Disable images in Playwright
How can I disable downloading images and videos and other media globally for my scraper?
22 Replies
grumpy-cyan•3y ago
You can create an array of resourceTypes that you'd like to block.
Then, within the preNavigationHooks of your crawler, add this function:
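The snippet originally posted here did not survive, so what follows is only a minimal sketch of the idea, not the original code. It assumes Playwright's `page.route()`, `route.abort()`, `route.continue()` and `request.resourceType()`; the names `BLOCKED_RESOURCE_TYPES` and `blockResources`, the choice of blocked types, and the small `*Like` interfaces (stand-ins for the real Playwright types so the sketch is self-contained) are all assumptions.

```typescript
// Structural stand-ins for the parts of Playwright's Page/Route/Request used here.
// In a real project, import the actual types from 'playwright' instead.
interface RequestLike {
  resourceType(): string;
}
interface RouteLike {
  request(): RequestLike;
  abort(): Promise<void>;
  continue(): Promise<void>;
}
interface PageLike {
  route(urlPattern: string, handler: (route: RouteLike) => Promise<void>): Promise<void>;
}

// Resource types to drop. This selection is an assumption; adjust it to your scraper.
const BLOCKED_RESOURCE_TYPES: string[] = ['image', 'media', 'font'];

// Pre-navigation hook: registers a route handler that aborts blocked resource types
// and lets every other request through unchanged.
async function blockResources({ page }: { page: PageLike }): Promise<void> {
  await page.route('**/*', async (route) => {
    if (BLOCKED_RESOURCE_TYPES.includes(route.request().resourceType())) {
      await route.abort();
    } else {
      await route.continue();
    }
  });
}

// Hypothetical wiring into a crawler (requestHandler is your own function):
// const crawler = new PlaywrightCrawler({
//   preNavigationHooks: [blockResources],
//   requestHandler,
// });
```

Filtering on the resource type rather than on file extensions also catches images served from URLs without an extension.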
fair-roseOP•3y ago
Thanks I will try that
grumpy-cyan•3y ago
You can also check out this article https://scrapingant.com/blog/block-requests-playwright
fair-roseOP•3y ago
Thanks
fair-roseOP•3y ago
I have this in my main.ts file:

fair-roseOP•3y ago
it does not work yet, can you spot an error?
fair-roseOP•3y ago
I inject it here:

grumpy-cyan•3y ago
Just add the function directly into the crawler
Here's one of my crawlers using the preNavigationHook
fair-roseOP•3y ago
thanks it works
however, I don't get why I consume so much bandwidth
fair-roseOP•3y ago
is it possible to see all the requests made for each url eg: https://dk.trustpilot.com/review/www.diba.dk
fair-roseOP•3y ago
so I can inspect and see which requests are unnecessary
in playwright
or do I need to use chrome dev tools for that
grumpy-cyan•3y ago
The reason is that request interception disables the cache in Playwright, so you are downloading everything again on every single page load
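One possible workaround, sketched here under assumptions and limited to Chromium, is to block by URL pattern through a Chrome DevTools Protocol session instead of `page.route()`, which is reported to leave the cache enabled. The sketch assumes Playwright's `context.newCDPSession(page)` and the CDP methods `Network.enable` and `Network.setBlockedURLs`; the name `blockUrlsViaCdp`, the pattern list, and the `*Like` interfaces are hypothetical.

```typescript
// Structural stand-ins for the parts of Playwright's API used here.
// In a real project, import the actual types from 'playwright' instead.
interface CdpSessionLike {
  send(method: string, params?: Record<string, unknown>): Promise<unknown>;
}
interface CdpPageLike {
  context(): { newCDPSession(page: CdpPageLike): Promise<CdpSessionLike> };
}

// URL patterns to block ('*' is a wildcard). The list is an assumption.
const BLOCKED_URL_PATTERNS: string[] = [
  '*.png*', '*.jpg*', '*.jpeg*', '*.gif*', '*.webp*', '*.svg*',
  '*.mp4*', '*.webm*', '*.woff*',
];

// Pre-navigation hook: asks the browser itself to refuse matching URLs.
// Works only in Chromium-based browsers, because it relies on CDP.
async function blockUrlsViaCdp({ page }: { page: CdpPageLike }): Promise<void> {
  const client = await page.context().newCDPSession(page);
  await client.send('Network.enable');
  await client.send('Network.setBlockedURLs', { urls: BLOCKED_URL_PATTERNS });
}
```

Note that this matches on the URL only, so media served from URLs without a recognizable extension will still be downloaded.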
grumpy-cyan•3y ago
It is possible to see them all! Just add this function to your preNavigationHooks:
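The function posted here was lost, so this is only a rough sketch of what such a hook could look like. It assumes Playwright's `page.on('request', ...)` event and the `request.method()`, `request.resourceType()` and `request.url()` methods; `createRequestLogger`, its `log` parameter, and the `*Like` interfaces are hypothetical names used to keep the example self-contained.

```typescript
// Structural stand-ins for the parts of Playwright's API used here.
// In a real project, import the actual types from 'playwright' instead.
interface LoggedRequestLike {
  method(): string;
  resourceType(): string;
  url(): string; // a method, so it must be called with ()
}
interface LoggingPageLike {
  on(event: 'request', listener: (request: LoggedRequestLike) => void): unknown;
}

// Builds a pre-navigation hook that logs one line per request the page makes.
// The log function is injectable so the output can go somewhere other than the console.
function createRequestLogger(log: (line: string) => void = console.log) {
  return async ({ page }: { page: LoggingPageLike }): Promise<void> => {
    page.on('request', (request) => {
      log(`${request.method()} ${request.resourceType()} ${request.url()}`);
    });
  };
}

// Hypothetical wiring into a crawler:
// const crawler = new PlaywrightCrawler({
//   preNavigationHooks: [createRequestLogger()],
//   requestHandler,
// });
```

Logging the resource type next to the URL makes it easier to spot which categories of requests are worth blocking.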
grumpy-cyan•3y ago
All of this stuff is covered in our Playwright/Puppeteer course in the academy:
https://developers.apify.com/academy/puppeteer-playwright
fair-roseOP•3y ago
thanks. I have this, but I cannot get access to the url. I presume it is because I need to await it, but I can't use await there:

grumpy-cyan•3y ago
req.url() is a function and does not need to be awaited.
fair-roseOP•3y ago
thanks. I missed the ()
grumpy-cyan•3y ago
I agree that it should be a getter instead of a function. req.url makes much more sense than req.url().
fair-roseOP•3y ago
yeah
but it is a small issue
amazing how much bandwidth is saved by cache:
98 requests 1.6 MB without cache
96 requests 54 KB with cache
Is there a better option to not download unnecessary files than manually intercepting requests?
grumpy-cyan•3y ago
Nope
sadly