Hey everyone, would love it if we had a way to shut down crawlers from inside the request handler. I went through the docs today, and the only way to do it right now is via the `crawler` itself, either using `crawler.teardown()` or `crawler.requestQueue.drop()` (not sure about this one). The main use case for it is saving on proxy costs, stopping crawlers from redundantly scraping data, or some other arbitrary condition.
I have found a workaround for this by setting a shutdown flag in state (or even a variable) and checking for it inside the handlers; if it's true, I just do a `return;` (to empty out the queue). While this works, it adds a lot of noise to the logs (and the code), because we need to log that we are skipping requests because of this flag for debugging purposes. I wish it were handled a little more gracefully by the scraper instead of every request handler checking for it.
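For illustration, a minimal sketch of the flag workaround described above, assuming a CheerioCrawler; the `shuttingDown` variable, the stop condition, and the log message are made up for the example:
```ts
import { CheerioCrawler } from 'crawlee';

// Illustrative module-level flag; it could equally live in useState()
// or a key-value store.
let shuttingDown = false;

const crawler = new CheerioCrawler({
    async requestHandler({ request, log, $ }) {
        if (shuttingDown) {
            // This is exactly the log/code noise the workaround forces
            // into every single handler.
            log.info(`Skipping ${request.url} because the shutdown flag is set`);
            return;
        }

        const title = $('title').text();
        // ...actual scraping logic...

        // Illustrative stop condition.
        if (title.includes('no new data')) {
            shuttingDown = true;
        }
    },
});

await crawler.run(['https://example.com']);
```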
sensitive-blue•2y ago
This is more for the #crawlee-js channel, but why doesn't `crawler.teardown()` work? It does shut down the crawling. Yes, the requests that are already in progress will continue, but that's how JS promises work.
You can also do a process hard-kill: `await Actor.exit()`, or `process.exit()` if you're not on Apify.
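For reference, a sketch of those shutdown options triggered from outside the handlers; the `setTimeout` trigger and the URL are illustrative:
```ts
import { CheerioCrawler } from 'crawlee';
// On Apify you would also have:
// import { Actor } from 'apify';

const crawler = new CheerioCrawler({
    async requestHandler({ request, log }) {
        log.info(`Processing ${request.url}`);
    },
});

// Illustrative trigger: stop the crawl after 60 seconds.
setTimeout(() => {
    // Graceful stop: no new requests are started, but the ones already
    // in progress run to completion, as noted above.
    void crawler.teardown();
}, 60_000);

await crawler.run(['https://example.com']);

// Hard alternatives instead of teardown():
// await Actor.exit();  // when running on Apify
// process.exit(0);     // plain Node.js
```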
optimistic-gold•2y ago
I did make a thread for it. We don't have access to the `crawler` instance inside the `requestHandler`, from what I know; my suggestion was to have some interface inside the request handlers that could handle this.
sensitive-blue•2y ago
You have access to `crawler` in `context` (and also from the outer scope).
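For anyone landing here later, a sketch of what that looks like: `crawler` destructured straight off the handler context and used to call `teardown()`; the stop condition is made up:
```ts
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    // `crawler` here comes from the handler context.
    async requestHandler({ request, crawler, log }) {
        // ...scraping logic...

        // Illustrative stop condition.
        if (request.url.endsWith('/last-page')) {
            log.info('Stop condition met, shutting the crawler down');
            // Not awaited: teardown() waits for in-flight requests
            // (including this one), so awaiting it here could stall.
            void crawler.teardown();
        }
    },
});

await crawler.run(['https://example.com/first-page']);
```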
optimistic-gold•2y ago
I'll check that out, my bad.