Hey everyone, would love it if we had a way to shut down crawlers from inside the request handler. I went through the docs today, and the only way to do it right now is via the `crawler` itself, either using `crawler.teardown()` or `crawler.requestQueue.drop()` (not sure about this one). The main use case for it is saving on proxy costs, stopping crawlers from redundantly scraping data, or some other arbitrary condition.
I have found a workaround for this by setting a shutdown flag in state (or even a variable) and checking for it inside the handlers; if it's true, I just do a `return;` (to empty out the queue). While this works, it adds a lot of noise to the logs (and the code), because we need to log that we are skipping requests because of this flag for debugging purposes. I wish it were handled a little more gracefully by the scraper instead of every request handler checking for it.
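For illustration, a minimal sketch of the flag workaround described above, assuming a CheerioCrawler; the `shuttingDown` variable, the stop condition, and the log message are made up for the example:
```ts
import { CheerioCrawler } from 'crawlee';

// Illustrative module-level flag; it could equally live in useState()
// or a key-value store.
let shuttingDown = false;

const crawler = new CheerioCrawler({
    async requestHandler({ request, log, $ }) {
        if (shuttingDown) {
            // This is exactly the log/code noise the workaround forces
            // into every single handler.
            log.info(`Skipping ${request.url} because the shutdown flag is set`);
            return;
        }

        const title = $('title').text();
        // ...actual scraping logic...

        // Illustrative stop condition.
        if (title.includes('no new data')) {
            shuttingDown = true;
        }
    },
});

await crawler.run(['https://example.com']);
```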
sensitive-blue•2y ago
This is more for the #crawlee-js channel, but why doesn't `crawler.teardown()` work? It does shut down the crawling. Yes, the requests that are already in progress will continue, but that's how JS promises work.
You can also do a process hard-kill: `await Actor.exit()`, or `process.exit()` if you're not on Apify.
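For reference, a sketch of those shutdown options triggered from outside the handlers; the `setTimeout` trigger and the URL are illustrative:
```ts
import { CheerioCrawler } from 'crawlee';
// On Apify you would also have:
// import { Actor } from 'apify';

const crawler = new CheerioCrawler({
    async requestHandler({ request, log }) {
        log.info(`Processing ${request.url}`);
    },
});

// Illustrative trigger: stop the crawl after 60 seconds.
setTimeout(() => {
    // Graceful stop: no new requests are started, but the ones already
    // in progress run to completion, as noted above.
    void crawler.teardown();
}, 60_000);

await crawler.run(['https://example.com']);

// Hard alternatives instead of teardown():
// await Actor.exit();  // when running on Apify
// process.exit(0);     // plain Node.js
```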
optimistic-gold•2y ago
I did make a thread for it. We don't have access to the `crawler` instance inside the `requestHandler`, from what I know; my suggestion was to have some interface inside the request handlers that could handle this.
sensitive-blue•2y ago
You have access to `crawler` in `context` (and also from the outer scope).
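For anyone landing here later, a sketch of what that looks like: `crawler` destructured straight off the handler context and used to call `teardown()`; the stop condition is made up:
```ts
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    // `crawler` here comes from the handler context.
    async requestHandler({ request, crawler, log }) {
        // ...scraping logic...

        // Illustrative stop condition.
        if (request.url.endsWith('/last-page')) {
            log.info('Stop condition met, shutting the crawler down');
            // Not awaited: teardown() waits for in-flight requests
            // (including this one), so awaiting it here could stall.
            void crawler.teardown();
        }
    },
});

await crawler.run(['https://example.com/first-page']);
```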
optimistic-gold•2y ago
I'll check that out, my bad.