Is it possible to stop the crawler if a condition is met?

Hi. I'm making a crawler (CheerioCrawler) that scrapes a news website. I start the crawler by giving it a list of URLs for all the pages of the article list (an array containing site.com/?page=1, site.com/?page=2, etc.). For every article list page, I scrape every article inside it. I was wondering: if my URL site.com/?page=60 (for instance) doesn't have any articles on it, can I stop the execution of the crawler at that point? I know how to check whether there are any articles on the page, but I can't find how to stop the crawler partway through (without completing all the URLs in the list). Thank you very much!
3 Replies
correct-apricot (OP) · 3y ago
Does anyone have an idea? Thanks
Pepa J · 3y ago
Hi @Vince. The general approach would be to put only the first page (?page=1) in the RequestQueue at the start. Then, if the requestHandler finds any articles, it adds the next page's link (?page=2) to the RequestQueue by itself. If it doesn't find any articles, it can simply return without adding the next page to the RequestQueue, which ends the crawl of the list pages exactly when needed.
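For reference, here is a minimal sketch of that approach with Crawlee's CheerioCrawler. The `.article` selector and the site.com URLs are placeholders for whatever the real site uses:

```ts
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ request, $, crawler }) {
        // Placeholder selector — replace with the site's real article markup.
        const articles = $('.article');

        if (articles.length === 0) {
            // No articles on this list page: return without enqueueing the
            // next page, and the crawler finishes once the queue drains.
            return;
        }

        // ... scrape each article found on this list page ...

        // Enqueue the next list page only while articles keep appearing.
        const page = Number(new URL(request.url).searchParams.get('page') ?? '1');
        await crawler.addRequests([`https://site.com/?page=${page + 1}`]);
    },
});

// Start from page 1 only; the handler paginates from there.
await crawler.run(['https://site.com/?page=1']);
```

This way you never need to stop the crawler explicitly: since pages are enqueued one at a time, the crawl simply ends on the first empty list page.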
correct-apricot (OP) · 3y ago
Thank you @Pepa J! I'll try that.
