About "Request Queue"
What is the purpose of request queue storages? Does it automatically fetch requests, or is it purely for storing URLs? Thank you in advance (sorry for the noob question).
optimistic-gold•3y ago
On the Apify platform, requests need to be stored, e.g. for a migration event: this way, after a migration, the actor can pretty much continue where it left off. Locally it's pretty much the same: you can abort the actor and restart it where it left off. Or, actually, with the latest Crawlee you could pretty much use memoryStorage for the queue: https://github.com/apify/crawlee/pull/1901, but it's definitely not a good fit for running on the platform.
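To connect the question with the reply: a request queue by itself doesn't fetch anything. The crawler pulls the next request from it, fetches and processes it, and marks it handled, while the queue remembers which requests are pending and which are done. Below is a hypothetical, simplified TypeScript model of that split. `ToyRequestQueue` and `runCrawler` are made-up names; the method names loosely mirror Crawlee's `RequestQueue`, but this is not its implementation. A hard-coded link map stands in for real HTTP requests, and a "migration" is simulated by snapshotting the queue to JSON and restoring it into a new queue.

```typescript
// Hypothetical model of what a request queue stores. NOT Crawlee's code.
interface QueuedRequest {
  url: string;
  handled: boolean;
}

class ToyRequestQueue {
  // Insertion-ordered list of every request the queue has seen.
  private requests: QueuedRequest[] = [];
  // Lookup of known URLs, used to skip duplicates.
  private known: Record<string, boolean> = Object.create(null);

  // Returns false if the URL was already queued or handled.
  addRequest(url: string): boolean {
    if (this.known[url]) return false;
    this.known[url] = true;
    this.requests.push({ url, handled: false });
    return true;
  }

  // Returns the oldest request not yet marked handled, if any.
  fetchNextRequest(): QueuedRequest | undefined {
    for (const r of this.requests) {
      if (!r.handled) return r;
    }
    return undefined;
  }

  markRequestHandled(request: QueuedRequest): void {
    request.handled = true;
  }

  // Snapshot of the full state (pending AND handled), e.g. before a migration.
  serialize(): string {
    return JSON.stringify(this.requests);
  }

  // Rebuilds a queue from a snapshot, keeping the handled flags.
  static restore(snapshot: string): ToyRequestQueue {
    const queue = new ToyRequestQueue();
    const saved: QueuedRequest[] = JSON.parse(snapshot);
    for (const r of saved) {
      queue.known[r.url] = true;
      queue.requests.push(r);
    }
    return queue;
  }
}

// The crawler, not the queue, drives the work: it asks the queue for the
// next request, runs the handler (which may enqueue new URLs), then marks it handled.
function runCrawler(
  queue: ToyRequestQueue,
  handle: (url: string, enqueue: (url: string) => void) => void,
  maxRequests: number,
): string[] {
  const visited: string[] = [];
  while (visited.length < maxRequests) {
    const request = queue.fetchNextRequest();
    if (!request) break; // nothing left to do
    handle(request.url, (u) => {
      queue.addRequest(u);
    });
    queue.markRequestHandled(request);
    visited.push(request.url);
  }
  return visited;
}

// Hypothetical link graph standing in for real page fetches.
const links: Record<string, string[]> = {
  "https://example.com/": ["https://example.com/a", "https://example.com/b"],
  "https://example.com/a": ["https://example.com/b", "https://example.com/c"],
  "https://example.com/b": [],
  "https://example.com/c": [],
};
const handler = (url: string, enqueue: (u: string) => void) => {
  for (const next of links[url] ?? []) enqueue(next);
};

const queue = new ToyRequestQueue();
queue.addRequest("https://example.com/");
const firstRun = runCrawler(queue, handler, 2); // "migrate" after 2 requests
const saved = queue.serialize();

const resumed = ToyRequestQueue.restore(saved);
const secondRun = runCrawler(resumed, handler, 10); // continues with b and c
```

The resumed run skips the two requests already marked handled and refuses to re-add them, which is the "continue where it left off" behaviour described in the reply. The snapshot only helps if it is written somewhere that outlives the process; a queue that lives purely in process memory loses this state when the process stops.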
harsh-harlequin•17mo ago
Why is it not good for running on the platform, @Andrey Bykov?
I found out that the request queue is expensive for my 95 scrapers.
It takes almost 60% of my budget.
Do you think we can use something cheaper?
optimistic-gold•17mo ago
That depends. If you are scraping a static list of URLs (meaning you have a set of URLs that you visit and extract data from, without adding more), then you could use RequestList (you will have to specify it explicitly in the crawler options). More here: https://crawlee.dev/api/core/class/RequestList
If you are adding more URLs during the run, then RequestQueue is the way to go.
RequestList | API | Crawlee
Represents a static list of URLs to crawl.
The URLs can be provided either in code or parsed from a text file hosted on the web.
RequestList is used by BasicCrawler, CheerioCrawler, PuppeteerCrawler and PlaywrightCrawler as a source of URLs to crawl.
Each URL is represented using an instance of the ...
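To make the static-list vs. queue distinction concrete, here is a hypothetical TypeScript sketch. `ToyRequestList` and `crawlStaticList` are made-up names and this is not Crawlee's `RequestList` implementation. It shows the property the reply relies on: a static list is fixed (and de-duplicated) up front, so the only progress worth remembering is a cursor, and any links discovered during the crawl have nowhere to go.

```typescript
// Hypothetical static URL list. NOT Crawlee's RequestList implementation.
class ToyRequestList {
  private readonly urls: string[];
  private nextIndex: number;

  // Sources are fixed and de-duplicated once; nothing can be added later.
  // savedIndex lets a restarted run resume from a remembered position.
  constructor(sources: string[], savedIndex = 0) {
    const seen: Record<string, boolean> = Object.create(null);
    this.urls = sources.filter((url) => {
      if (seen[url]) return false;
      seen[url] = true;
      return true;
    });
    this.nextIndex = savedIndex;
  }

  fetchNextUrl(): string | undefined {
    if (this.nextIndex >= this.urls.length) return undefined;
    const url = this.urls[this.nextIndex];
    this.nextIndex += 1;
    return url;
  }

  // Resuming only needs this one number, not a record per request.
  getState(): number {
    return this.nextIndex;
  }
}

// Visits every URL in the list; links found on pages are collected but
// cannot be crawled, because a static list has no way to grow.
function crawlStaticList(
  list: ToyRequestList,
  extractLinks: (url: string) => string[],
): { visited: string[]; droppedLinks: string[] } {
  const visited: string[] = [];
  const droppedLinks: string[] = [];
  let url = list.fetchNextUrl();
  while (url !== undefined) {
    visited.push(url);
    for (const link of extractLinks(url)) droppedLinks.push(link);
    url = list.fetchNextUrl();
  }
  return { visited, droppedLinks };
}

const sources = [
  "https://example.com/product/1",
  "https://example.com/product/2",
  "https://example.com/product/1", // duplicate, removed when the list is built
];
const list = new ToyRequestList(sources);
// Hypothetical page content: product/1 links to product/3.
const result = crawlStaticList(list, (url) =>
  url === "https://example.com/product/1" ? ["https://example.com/product/3"] : [],
);
```

If `droppedLinks` would stay empty for your scraper (you know every URL up front), a static list covers the job; if you need those discovered links crawled too, that is the case where the reply recommends RequestQueue.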