Managing a queue with Redis or something similar and having worker nodes listen on the queue

I'm trying to run Crawlee in production and scale it so that a cluster of worker nodes is ready to crawl pages on request. How can I achieve this? As far as I can tell, the RequestQueue just writes requests to files and does not use any queueing system. I couldn't find any docs that explain how to use a Redis queue or something similar.
harsh-harlequin•5mo ago
I'm not aware of such a possibility. Actually, I don't think Crawlee's queues were intended for concurrent access; they are meant for keeping track of todo/done jobs within a single run or across multiple consecutive runs. You would need to develop your own solution to manage and scale workers, or look at existing solutions such as Apify.
rising-crimsonOP•5mo ago
If I create a custom RequestQueue that uses Redis, then this should be possible, right? Or can I use the Apify managed queue and still run the crawler on my own infrastructure instead of in managed actors? @Marco
harsh-harlequin•5mo ago
To the latter question, I'd say no: Apify does not provide on-premise solutions. As for implementing a RequestQueue that uses Redis, I think it would be possible! You can take a look at the code here: https://github.com/apify/crawlee/blob/master/packages/core/src/storages/request_queue_v2.ts#L55
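To make the idea a bit more concrete, here is a rough sketch of how a shared queue could work on top of Redis list and set commands. It is not Crawlee's RequestQueue and does not extend it; `QueueStore`, `InMemoryStore` and `DistributedQueue` are hypothetical names made up for this example, and the in-memory store only stands in for a real Redis client so the code runs without a server. The queue method names are chosen to echo the ones on Crawlee's RequestQueue.

```typescript
// Hypothetical store interface: the method names mirror the Redis commands
// SADD, LPUSH, RPOPLPUSH and LREM, so a real Redis client could back it.
interface QueueStore {
  sadd(key: string, member: string): Promise<number>;
  lpush(key: string, value: string): Promise<number>;
  rpoplpush(source: string, destination: string): Promise<string | null>;
  lrem(key: string, count: number, value: string): Promise<number>;
}

// In-memory stand-in (an assumption for this sketch, not a Redis client).
class InMemoryStore implements QueueStore {
  private sets = new Map<string, Set<string>>();
  private lists = new Map<string, string[]>();

  private list(key: string): string[] {
    const items = this.lists.get(key) ?? [];
    this.lists.set(key, items);
    return items;
  }

  async sadd(key: string, member: string): Promise<number> {
    const set = this.sets.get(key) ?? new Set<string>();
    this.sets.set(key, set);
    if (set.has(member)) return 0; // already present
    set.add(member);
    return 1; // newly added
  }

  async lpush(key: string, value: string): Promise<number> {
    return this.list(key).unshift(value); // push at the head, return new length
  }

  async rpoplpush(source: string, destination: string): Promise<string | null> {
    const value = this.list(source).pop(); // take from the tail
    if (value === undefined) return null;
    this.list(destination).unshift(value);
    return value;
  }

  // Simplified LREM: only handles count > 0 (removes up to `count` matches from the head).
  async lrem(key: string, count: number, value: string): Promise<number> {
    const items = this.list(key);
    let removed = 0;
    for (let i = items.indexOf(value); i !== -1 && removed < count; i = items.indexOf(value)) {
      items.splice(i, 1);
      removed++;
    }
    return removed;
  }
}

// Hypothetical shared queue: every worker node talks to the same store.
class DistributedQueue {
  private readonly store: QueueStore;
  private readonly seenKey: string;
  private readonly pendingKey: string;
  private readonly inProgressKey: string;

  constructor(store: QueueStore, name: string) {
    this.store = store;
    this.seenKey = `${name}:seen`;
    this.pendingKey = `${name}:pending`;
    this.inProgressKey = `${name}:in-progress`;
  }

  // Enqueue a URL once; returns false if it was already seen.
  async addRequest(url: string): Promise<boolean> {
    if ((await this.store.sadd(this.seenKey, url)) === 0) return false;
    await this.store.lpush(this.pendingKey, url);
    return true;
  }

  // Move the oldest pending URL to the in-progress list and return it.
  async fetchNextRequest(): Promise<string | null> {
    return this.store.rpoplpush(this.pendingKey, this.inProgressKey);
  }

  // Drop a finished URL from the in-progress list.
  async markRequestHandled(url: string): Promise<void> {
    await this.store.lrem(this.inProgressKey, 1, url);
  }

  // Put a failed URL back on the pending list so a worker can retry it.
  async reclaimRequest(url: string): Promise<void> {
    const removed = await this.store.lrem(this.inProgressKey, 1, url);
    if (removed > 0) await this.store.lpush(this.pendingKey, url);
  }
}
```

With a real Redis server behind `QueueStore`, each worker node would call `fetchNextRequest()` in a loop. Redis executes each command atomically, so two workers should not receive the same URL from the pending list. Note that `addRequest` issues two separate commands here, so a crash between them could leave a URL marked as seen but never queued; wrapping both in a transaction or a script would be one way to close that gap.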
rising-crimsonOP•5mo ago
Okay, I will check it out. I guess extending the RequestQueue with Redis would do the trick for me.