rival-black
How to set concurrency/CPUs/memory correctly
Hello, I would like to use PlaywrightCrawler for scraping, but it is not clear from the documentation how to set up concurrency, memory, CPUs, etc. correctly. Can someone help me out? What is the best practice for setting up this crawler so that scraping runs in parallel? Thanks in advance!
Apify Community
Hello! The best concurrency settings really depend on the context: for instance, the available resources, the use case, and the website being scraped. You can set the crawling options when creating the PlaywrightCrawler; see https://crawlee.dev/python/api/class/PlaywrightCrawler#__init__ and https://crawlee.dev/python/api/class/BasicCrawler#__init__. For instance, you can set concurrency_settings: https://crawlee.dev/python/api/class/ConcurrencySettings
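For reference, here is a minimal sketch of passing concurrency settings to the crawler. It assumes a recent Crawlee for Python release with Playwright installed; older releases exposed the crawler under a different module path. The numbers, the CONCURRENCY_KWARGS name, the request cap, and the handler are illustrative assumptions, not recommended values:

```python
# Illustrative values only (assumptions, not recommendations): tune them for
# your machine, your use case, and the site being scraped.
CONCURRENCY_KWARGS = {
    'min_concurrency': 1,         # lower bound on tasks running in parallel
    'desired_concurrency': 4,     # parallelism the pool starts with
    'max_concurrency': 10,        # upper bound on tasks running in parallel
    'max_tasks_per_minute': 120,  # rate limit across the whole pool
}


async def main() -> None:
    # Imported inside main() so the settings above can be inspected without
    # Crawlee installed. Import paths assume a recent Crawlee for Python release.
    from crawlee import ConcurrencySettings
    from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext

    crawler = PlaywrightCrawler(
        concurrency_settings=ConcurrencySettings(**CONCURRENCY_KWARGS),
        max_requests_per_crawl=50,  # safety cap while experimenting
    )

    @crawler.router.default_handler
    async def handle_page(context: PlaywrightCrawlingContext) -> None:
        # Each handler call runs as one task in the crawler's pool.
        context.log.info(f'Processing {context.request.url}')
        await context.push_data({
            'url': context.request.url,
            'title': await context.page.title(),
        })

    await crawler.run(['https://crawlee.dev'])
```

To start the crawl, call asyncio.run(main()) from a script. A reasonable way to tune this is to begin with a low max_concurrency, watch CPU and memory usage on your machine, and raise the limit gradually, since every parallel Playwright page consumes browser resources.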