Shared external queue between multiple crawlers

Hello folks! Is there any way I can force the Cheerio/Playwright crawlers to stop using their own internal request queue and instead enqueue links into an external queue service such as Redis? I'd like to run multiple crawlers against a single website, and they would need to share the same queue so they don't process duplicate links. Thanks in advance!
-# This post was marked as solved by mesca4046.
unwilling-turquoise
unwilling-turquoise•3mo ago
Hello! The request queue is managed by Crawlee, not by Cheerio or Playwright directly. What you could try is creating a custom RequestQueue that inherits from Crawlee's class: https://crawlee.dev/api/core/class/RequestQueue. Here is the source code: https://github.com/apify/crawlee/blob/master/packages/core/src/storages/request_queue.ts#L77. You could then pass the custom queue to the (Cheerio/Playwright) crawler via its requestQueue option: https://crawlee.dev/api/cheerio-crawler/class/CheerioCrawler#requestQueue.
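To illustrate the idea, here is a minimal, self-contained sketch of the deduplication logic such a shared queue needs. This is not Crawlee's real API: the class and method names below (`SharedRequestQueueSketch`, etc.) are hypothetical and only mirror the shape of the `RequestQueue` methods (`addRequest`, `fetchNextRequest`, `markRequestHandled`, `isFinished`) that a subclass would back with Redis. An in-memory `Set` and array stand in for the Redis structures (e.g. a set for seen keys, a list for pending requests) that all crawler processes would share:

```typescript
// Sketch only, with hypothetical names. A real implementation would subclass
// Crawlee's RequestQueue and replace the in-memory structures below with
// Redis commands (e.g. SADD/SISMEMBER for dedup, LPUSH/RPOP for the queue)
// so every crawler process sees the same state.

interface QueuedRequest {
  url: string;
  uniqueKey: string; // Crawlee deduplicates on uniqueKey (defaults to the URL)
}

class SharedRequestQueueSketch {
  private seen = new Set<string>();       // stands in for a shared Redis set
  private pending: QueuedRequest[] = [];  // stands in for a shared Redis list
  private inProgress = new Map<string, QueuedRequest>();

  // Mirrors addRequest: enqueue only if this uniqueKey was never seen.
  async addRequest(req: { url: string; uniqueKey?: string }): Promise<boolean> {
    const key = req.uniqueKey ?? req.url;
    if (this.seen.has(key)) return false; // duplicate -> dropped
    this.seen.add(key);
    this.pending.push({ url: req.url, uniqueKey: key });
    return true;
  }

  // Mirrors fetchNextRequest: pop the next pending request for a crawler.
  async fetchNextRequest(): Promise<QueuedRequest | null> {
    const req = this.pending.shift() ?? null;
    if (req) this.inProgress.set(req.uniqueKey, req);
    return req;
  }

  // Mirrors markRequestHandled: remove a finished request from in-progress.
  async markRequestHandled(req: QueuedRequest): Promise<void> {
    this.inProgress.delete(req.uniqueKey);
  }

  // Mirrors isFinished: done when nothing is pending or in progress.
  async isFinished(): Promise<boolean> {
    return this.pending.length === 0 && this.inProgress.size === 0;
  }
}
```

The key point is that the dedup check and the enqueue must both hit the shared store, so two crawlers enqueuing the same link race against one atomic operation (in Redis, a single `SADD` return value) rather than against local state.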