genetic-orange
Which browser is best for crawling?
As the title says.
I'm currently using Chromium, but it is CPU-heavy.
Killing the browser does not kill the process, so CPU usage reaches 100% pretty quickly.
(I'm crawling thousands of websites, looking for different data on each.) I already tried loading pure HTML without CSS, images, and other assets. That helped a lot, but the issue is still there.
4 Replies
Apify Community
absent-sapphire•8mo ago
Hi @Wojciech
I also recommend blocking unnecessary network requests with blockRequests.
Make sure that you are running it in headless mode.
You could also try using Cheerio if the use case allows it.
Regarding your question about the browser:
Firefox tends to be lighter on CPU usage.
Using Firefox browser with Playwright crawler | Crawlee · Build rel...
Crawlee helps you build and maintain your crawlers. It's open source, but built by developers who scrape millions of pages every day for a living.
PlaywrightCrawlingContext | API | Crawlee · Build reliable crawlers...
genetic-orangeOP•8mo ago
Yes, I already do that.
Unfortunately, I receive:
WARN Playwright Utils: blockRequests() helper is incompatible with non-Chromium browsers.
I didn't know that 😄
multiple-amethyst•7mo ago
You can block requests manually (I mean, without the utility function).
Example:
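A minimal sketch of manual blocking, not the poster's original snippet. It assumes Playwright's `page.route()`, `route.abort()`, and `route.continue()`, which work in Firefox as well as Chromium. The names `shouldBlock`, `installBlocking`, and both block lists are hypothetical, and the small interfaces only mirror the parts of Playwright's `Page`, `Route`, and `Request` used here so the snippet stands alone.

```typescript
// Structural stand-ins for the parts of Playwright's Request, Route and Page
// that this sketch touches (assumption: a real Playwright page fits them).
interface RequestLike {
    url(): string;
    resourceType(): string;
}

interface RouteLike {
    request(): RequestLike;
    abort(): Promise<void>;
    continue(): Promise<void>;
}

interface PageLike {
    route(pattern: string, handler: (route: RouteLike) => Promise<void>): Promise<void>;
}

// Hypothetical block lists; tune them to the data you actually need.
const BLOCKED_RESOURCE_TYPES = new Set(['image', 'stylesheet', 'font', 'media']);
const BLOCKED_URL_PARTS = ['google-analytics.com', 'doubleclick.net'];

// Decide whether a request should be dropped, by resource type or URL fragment.
function shouldBlock(url: string, resourceType: string): boolean {
    if (BLOCKED_RESOURCE_TYPES.has(resourceType)) return true;
    return BLOCKED_URL_PARTS.some((part) => url.includes(part));
}

// Register one catch-all route that aborts blocked requests and lets the rest through.
async function installBlocking(page: PageLike): Promise<void> {
    // '**/*' matches every request the page makes.
    await page.route('**/*', async (route) => {
        const request = route.request();
        if (shouldBlock(request.url(), request.resourceType())) {
            await route.abort();
        } else {
            await route.continue();
        }
    });
}
```

In a Crawlee PlaywrightCrawler, something like this would probably be called from a pre-navigation hook, e.g. `preNavigationHooks: [async ({ page }) => installBlocking(page)]`, so the route is registered before each page loads.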