Is it possible to stop the crawler if a condition is met?

Hi. I'm making a crawler (CheerioCrawler) that scrapes a news website. I start the crawler by giving it a list of URLs for all the pages of the article list (an array containing site.com/?page=1, site.com/?page=2, etc.). For every article list page, I scrape every article inside it. I was wondering: if my URL site.com/?page=60 (for instance) doesn't have any articles on it, can I stop the execution of the crawler at that point? I know how to check whether there are any articles on the page, but I can't find how to stop the crawler partway through (without completing all the URLs in the list). Thank you very much!
3 Replies
correct-apricot (OP) · 3y ago
Does anyone have an idea? Thanks
Pepa J · 3y ago
Hi @Vince. The general approach would be to put only the first page (?page=1) in the RequestQueue at the start. Then, if the requestHandler finds any articles, it adds the next page's link (?page=2) to the RequestQueue by itself. If it doesn't find any articles, it can simply return without adding the next page to the RequestQueue, which ends the crawl of the list pages exactly when needed.
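For reference, here is a minimal sketch of that approach with Crawlee's CheerioCrawler. The `.article` selector and the site.com URLs are placeholders for whatever the real site uses:

```ts
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ request, $, crawler }) {
        // Placeholder selector — replace with the site's real article markup.
        const articles = $('.article');

        if (articles.length === 0) {
            // No articles on this list page: return without enqueueing the
            // next page, and the crawler finishes once the queue drains.
            return;
        }

        // ... scrape each article found on this list page ...

        // Enqueue the next list page only while articles keep appearing.
        const page = Number(new URL(request.url).searchParams.get('page') ?? '1');
        await crawler.addRequests([`https://site.com/?page=${page + 1}`]);
    },
});

// Start from page 1 only; the handler paginates from there.
await crawler.run(['https://site.com/?page=1']);
```

This way you never need to stop the crawler explicitly: since pages are enqueued one at a time, the crawl simply ends on the first empty list page.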
correct-apricot (OP) · 3y ago
Thank you @Pepa J! I'll try that.
