How to avoid scraping the same pages when the crawler is restarted?
Hello everyone, is there a way to avoid scraping the same pages even if the crawler is restarted? I'm currently working on a news website crawler, but with each run of the scraper up to 80% of the news items are duplicates from previous runs. Any suggestions on how to address this issue effectively?
3 Replies
correct-apricot•12mo ago
If running locally, you can set the environment variable CRAWLEE_PURGE_ON_START to false; the crawler will then reuse the same request queue across runs and skip requests it has already handled.
https://crawlee.dev/api/3.8/core/interface/ConfigurationOptions#purgeOnStart
https://crawlee.dev/api/3.8/core/interface/ConfigurationOptions#purgeOnStart
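A minimal sketch of that approach in a Crawlee (Node.js) project. The same option can be set in code via a Configuration object instead of the environment variable; the handler body and start URL are just placeholders:
```ts
import { CheerioCrawler, Configuration } from 'crawlee';

// Equivalent to setting CRAWLEE_PURGE_ON_START=false:
// keep the default request queue between runs instead of purging it.
const config = new Configuration({ purgeOnStart: false });

const crawler = new CheerioCrawler({
    async requestHandler({ request, enqueueLinks }) {
        console.log(`Processing ${request.url}`);
        await enqueueLinks(); // URLs already handled in earlier runs are skipped by the queue
    },
}, config);

await crawler.run(['https://example.com']); // placeholder start URL
```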
flat-fuchsia•12mo ago
Thanks
correct-apricot•12mo ago
If running on Apify, try naming your request queue; named storages persist between runs and are not purged on start.
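For example (a sketch; the queue name news-queue is just an illustration), you can open a named queue explicitly and pass it to the crawler. Because named storages on the Apify platform survive between runs, requests marked as handled in a previous run stay handled:
```ts
import { CheerioCrawler, RequestQueue } from 'crawlee';

// Named storages are not purged on start, so requests handled
// in previous runs remain marked as done.
const requestQueue = await RequestQueue.open('news-queue'); // example name

const crawler = new CheerioCrawler({
    requestQueue,
    async requestHandler({ request }) {
        console.log(`Processing ${request.url}`);
    },
});

await crawler.run(['https://example.com']); // placeholder start URL
```
If you're using the Apify SDK directly, Actor.openRequestQueue('news-queue') opens the same named queue.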