rival-black

How to set concurrency/CPUs/memory correctly

Hello, I would like to use PlaywrightCrawler for scraping, but it is not clear from the documentation how to set up concurrency, memory, CPUs, etc. correctly. Can someone help me out? What is the best practice for setting up this crawler so that scraping runs in parallel? Thanks in advance!

2 Replies

Marco•8mo ago

Hello! The best concurrency settings depend on the context: the available resources, the use case, and the website being scraped. You can set the crawling options when creating the PlaywrightCrawler; see https://crawlee.dev/python/api/class/PlaywrightCrawler#__init__ and https://crawlee.dev/python/api/class/BasicCrawler#__init__. For instance, you can pass concurrency_settings: https://crawlee.dev/python/api/class/ConcurrencySettings.
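
As a rough illustration of passing concurrency_settings, here is a minimal sketch. It assumes a recent Crawlee for Python release where PlaywrightCrawler is importable from crawlee.crawlers and ConcurrencySettings from crawlee. The numbers, the handler name handle_page, and the start URL are placeholders chosen for the example, not recommended values. As far as I understand, the crawler scales the number of parallel tasks between the minimum and maximum depending on system load, so these settings are bounds rather than fixed values.

```python
import asyncio

# Illustrative starting values, not recommendations; tune them for your
# machine and for the site you are scraping.
CONCURRENCY_KWARGS = {
    "min_concurrency": 1,        # lower bound on parallel tasks
    "desired_concurrency": 2,    # parallelism to start with
    "max_concurrency": 5,        # upper bound on parallel tasks
    "max_tasks_per_minute": 60,  # overall rate limit
}


async def main() -> None:
    # Imported inside main() so the constants above can be inspected
    # without Crawlee and Playwright installed.
    from crawlee import ConcurrencySettings
    from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext

    crawler = PlaywrightCrawler(
        concurrency_settings=ConcurrencySettings(**CONCURRENCY_KWARGS),
        max_requests_per_crawl=20,  # safety cap while experimenting
        headless=True,
    )

    @crawler.router.default_handler
    async def handle_page(context: PlaywrightCrawlingContext) -> None:
        # Log the page title to show that pages are being processed.
        title = await context.page.title()
        context.log.info(f"{context.request.url}: {title}")

    # Placeholder start URL; replace with the site you want to scrape.
    await crawler.run(["https://crawlee.dev"])


# To start the crawl: asyncio.run(main())
```

A reasonable way to tune this is to start with a low max_concurrency, watch CPU and memory usage while the crawl runs, and raise the limit gradually, since every parallel Playwright page costs noticeably more resources than a plain HTTP request.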
