Ignore URLs that match an already crawled URL but have different query params
I do not want to crawl a URL that has already been crawled but has different query params. How can I do this?
optimistic-gold (OP) • 2y ago
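// Assumes `visitedUrls` is a Set created once, outside the request handler,
// e.g. `const visitedUrls = new Set();`, and that EnqueueStrategy is imported from 'crawlee'.
await enqueueLinks({
    selector: 'a[href]',
    transformRequestFunction: (request) => {
        // Drop the query string so URLs that differ only in params compare equal.
        const urlWithoutQuery = request.url.split('?')[0];
        // Returning false skips a link whose base URL was already enqueued.
        if (visitedUrls.has(urlWithoutQuery)) return false;
        visitedUrls.add(urlWithoutQuery);
        // Enqueue the stripped URL, keeping the other request options intact.
        request.url = urlWithoutQuery;
        return request;
    },
    strategy: EnqueueStrategy.SameHostname,
});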
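
If you want to try the filtering logic outside the crawler, here is a minimal sketch of the same idea as a standalone function. The names `stripQuery` and `createQueryDedupTransform` are made up for this example and are not part of Crawlee. It uses the built-in `URL` class instead of `split('?')`, so it also drops `#fragment` parts, which is a small change from the snippet above.

```javascript
// Hypothetical helper: return the URL without its query string and fragment.
function stripQuery(rawUrl) {
    const parsed = new URL(rawUrl);
    parsed.search = '';
    parsed.hash = '';
    return parsed.href;
}

// Hypothetical factory: builds a function with the same shape as the
// transformRequestFunction above (takes request options, returns them or false).
function createQueryDedupTransform(seen = new Set()) {
    return (request) => {
        const key = stripQuery(request.url);
        // Skip links whose stripped URL has already been seen.
        if (seen.has(key)) return false;
        seen.add(key);
        request.url = key;
        return request;
    };
}
```

Assuming it behaves the way the snippet above does, you should be able to pass `createQueryDedupTransform()` as `transformRequestFunction`. Create it once outside the handler so the set of seen URLs is shared across pages.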