Prevent Crawler from adding failed requests to the default RequestQueue

Is there a way to prevent the crawler from adding a failed request to the default RequestQueue?
import { PuppeteerCrawler, RequestList } from 'crawlee';

const crawler = new PuppeteerCrawler({
    proxyConfiguration,
    requestHandler: router,
    maxRequestRetries: 25, // failed requests are retried up to 25 times
    requestList: await RequestList.open(null, [initUrl]),
    requestHandlerTimeoutSecs: 2000,
    maxConcurrency: 1,
}, config);
I'm using the default RequestQueue to add product URLs, and they are handled inside the default request handler. When one of them fails, I purposely throw an error, expecting the failed request (which is the initUrl) to go back to the RequestList. Instead, it is added to the default RequestQueue as well, which is not what I want.
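For context, a minimal sketch of the setup described above (the default route, the '.product a' selector, and the "no products found" check are assumptions for illustration, not taken from the original post):

import { createPuppeteerRouter } from 'crawlee';

export const router = createPuppeteerRouter();

// Handles initUrl (coming from the RequestList) and pushes product URLs
// into the crawler's default RequestQueue. Throwing here marks the request
// as failed, and the crawler schedules a retry for it.
router.addDefaultHandler(async ({ enqueueLinks, log, request }) => {
    log.info(`Collecting product URLs from ${request.url}`);
    const { processedRequests } = await enqueueLinks({
        selector: '.product a', // placeholder selector
        label: 'PRODUCT',
    });
    if (processedRequests.length === 0) {
        // Purposely thrown when the page looks blocked or empty.
        throw new Error('No product URLs found, retrying');
    }
});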
1 Reply
flat-fuchsia • 2y ago
Do not throw an error if you do not want the same request to be retried. The retry-on-error logic exists so that scraping can recover from blocking by retrying.
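A minimal sketch of that suggestion (the handler, selector, and timeout are assumptions, not from this thread): instead of throwing, either log and skip the request, or mark it as non-retryable before rethrowing.

import { createPuppeteerRouter } from 'crawlee';

export const router = createPuppeteerRouter();

router.addDefaultHandler(async ({ request, page, log }) => {
    try {
        await page.waitForSelector('.product a', { timeout: 30_000 }); // placeholder selector
    } catch (err) {
        // Option A: swallow the error and return, so no retry is scheduled
        // and nothing extra lands in the default RequestQueue.
        log.warning(`Skipping ${request.url}: ${err.message}`);
        return;

        // Option B: keep the failure visible but disable retries for this
        // request before rethrowing; it then goes to failedRequestHandler
        // instead of being retried.
        // request.noRetry = true;
        // throw err;
    }
    // ...normal scraping / enqueueing continues here
});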
