Best practice to stop/crash the actor/crawler on high ratio of errors?

The following snippet works well for me, but it smells... does somebody have a cleaner approach?
// Every 3s, compare finished (= successful) and failed requests and stop the process if failures pull too far ahead
setInterval(() => {
    const { requestsFinished, requestsFailed } = crawler.stats.state
    if (requestsFailed > requestsFinished + 10) { // once failures outnumber successes by more than 10, stop trying
        console.warn(`💣 Too many failed requests, stopping! (${requestsFailed} failed, ${requestsFinished} finished)`)
        process.exit(1)
    }
}, 3000)
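
For comparison, here is a minimal sketch of the same watchdog that stops the crawler itself instead of killing the whole process. It assumes the Crawlee crawler exposes its autoscaled pool as crawler.autoscaledPool with an abort() method that resolves the running crawler.run() promise; the stats fields are the same ones used above.

// Watchdog sketch (assumption: crawler.autoscaledPool.abort() stops the run gracefully)
const watchdog = setInterval(async () => {
    const { requestsFinished, requestsFailed } = crawler.stats.state
    if (requestsFailed > requestsFinished + 10) {
        console.warn(`💣 Too many failed requests, aborting! (${requestsFailed} failed, ${requestsFinished} finished)`)
        clearInterval(watchdog)
        // Stop the crawler gracefully so open handles and storages can be flushed
        await crawler.autoscaledPool?.abort()
    }
}, 3000)

await crawler.run(startUrls)
clearInterval(watchdog)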
3 Replies
rare-sapphire · 2y ago
There is now a message on Apify that, I guess, comes from the crawler when there are problems. Maybe you can use that, if you can find out what is generating that message.
xenial-black (OP) · 2y ago
This @HonzaS guy knows stuff 🙏
afraid-scarlet · 2y ago
You can use stats https://crawlee.dev/api/browser-crawler/class/BrowserCrawler#stats, however the approach itself is not safe - you are supposed to handle sessions and/or bot protection to resolve blocking with logic, not by hammering the website across many runs. I.e. set concurrency, max request retries, logic for session.markBad(), etc., and implement a scalable crawler - see the sketch below.
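
To make that concrete, a minimal sketch of such a configuration with a CheerioCrawler, using the options the reply names. The option values, the "Access denied" block check, and the start URL are placeholders, not anything from this thread; adapt the block detection to whatever the target site actually returns.

import { CheerioCrawler } from 'crawlee'

const crawler = new CheerioCrawler({
    maxConcurrency: 10,             // keep the load on the target site reasonable
    maxRequestRetries: 3,           // give up on a single URL after a few attempts
    useSessionPool: true,           // rotate sessions (and proxies, if configured)
    persistCookiesPerSession: true,
    sessionPoolOptions: { maxPoolSize: 50 },
    async requestHandler({ request, session, $ }) {
        // Hypothetical block detection - blocked status codes are already handled by Crawlee,
        // so this only covers "soft" blocks served with a 200 response
        if ($('title').text().includes('Access denied')) {
            session?.markBad()      // lower the session's score so the pool rotates it out
            throw new Error(`Blocked on ${request.url}`) // throwing triggers a retry with another session
        }
        session?.markGood()
        // ...extract data here...
    },
})

await crawler.run(['https://example.com'])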
