How to use network mocking?

I use Playwright, and I'd like to mock some network requests so that I don't hit my CDN too hard. With Playwright, we can do something like this (documentation):
test.beforeEach(async ({ context }) => {
    // Block any CSS requests for each test in this file.
    // The dot is escaped so the pattern only matches URLs ending in ".css".
    await context.route(/\.css$/, route => route.abort());
});
Do you know how to do this with Crawlee?
2 Replies
rival-black, 2y ago
preNavigationHooks is the right place for it: https://crawlee.dev/api/puppeteer-crawler/interface/PuppeteerCrawlerOptions#preNavigationHooks
You can create an array of the resource types that you'd like to block. Example for Playwright:
const BLOCKED = ['image', 'stylesheet', 'media', 'font', 'other'];

Then within your preNavigationHooks of your crawler, add this function:
async ({ page }) => {
    await page.route('**/*', (route) => {
        if (BLOCKED.includes(route.request().resourceType())) return route.abort();
        return route.continue();
    });
}
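To make the wiring a bit more explicit, here is a minimal sketch of the same idea as a reusable hook. The names `createBlockingHook` and `blockedTypes` are made up for illustration; only `page.route`, `route.request().resourceType()`, `route.abort()` and `route.continue()` come from Playwright.

```javascript
// Hypothetical helper: builds a pre-navigation hook that blocks the given resource types.
function createBlockingHook(blockedTypes) {
    // Route handler: abort blocked resource types, let everything else through.
    const handleRoute = (route) => {
        if (blockedTypes.includes(route.request().resourceType())) return route.abort();
        return route.continue();
    };

    // Crawlee calls each pre-navigation hook with the crawling context,
    // which exposes the Playwright `page`, before navigating.
    return async ({ page }) => {
        await page.route('**/*', handleRoute);
    };
}

// Same list as in the reply above.
const blockResourcesHook = createBlockingHook(['image', 'stylesheet', 'media', 'font', 'other']);
```

You would then pass it in the crawler options, e.g. `preNavigationHooks: [blockResourcesHook]` when constructing a `PlaywrightCrawler`.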
Or you can use the Crawlee utility functions (also in the preNavigationHooks option):
https://crawlee.dev/api/3.0/playwright-crawler/namespace/playwrightUtils#blockRequests
https://crawlee.dev/api/3.0/puppeteer-crawler/namespace/puppeteerUtils#blockRequests
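A rough sketch of how that utility could be plugged into a hook. `createUrlBlockingHook` is a hypothetical name, and the `utils` argument is assumed to be `playwrightUtils` (or `puppeteerUtils`) imported from `crawlee`; it is passed in as a parameter so the snippet does not hard-code the import. The `blockRequests(page, { urlPatterns })` call follows the linked API docs. As far as I know it works through the Chrome DevTools Protocol, so treat it as Chromium-only and check the docs for your version.

```javascript
// Hypothetical helper: builds a pre-navigation hook around Crawlee's blockRequests utility.
// `utils` is assumed to be `playwrightUtils` or `puppeteerUtils` from the 'crawlee' package.
function createUrlBlockingHook(utils, urlPatterns) {
    return async ({ page }) => {
        // Blocks every URL matching one of the patterns, e.g. '.css' or '.png'.
        await utils.blockRequests(page, { urlPatterns });
    };
}
```

Usage would look like `preNavigationHooks: [createUrlBlockingHook(playwrightUtils, ['.css', '.png'])]`. Unlike `page.route`, this blocks by URL pattern rather than by resource type.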
metropolitan-bronze, 2y ago
When you use a route (as suggested above), be aware that you are also disabling the browser cache. So if you want to respect browser cache settings for requests that are not blocked, you will need to cache and serve those responses yourself. This article covers that topic: https://docs.apify.com/academy/node-js/caching-responses-in-puppeteer#solving-the-problem-by-creating-an-in-memory-cache
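For Playwright, a very simplified in-memory cache could look something like the sketch below. This is not the code from the linked article (which targets Puppeteer); it is an adaptation under a few assumptions: it only caches GET requests that return status 200, it ignores Cache-Control entirely, and it drops the `content-encoding` and `content-length` headers as a precaution because the body is stored as already-decoded bytes. `cachingRouteHandler` and `responseCache` are made-up names; `route.fetch()` and `route.fulfill()` are Playwright APIs.

```javascript
// In-memory cache keyed by URL; it lives as long as the Node.js process.
const responseCache = new Map();

// Headers describing the original wire encoding. Dropped as a precaution (assumption):
// the cached body is kept as decoded bytes, so these may no longer match it.
const SKIPPED_HEADERS = ['content-encoding', 'content-length'];

async function cachingRouteHandler(route) {
    const request = route.request();
    // Only GET requests are cached; everything else goes straight to the network.
    if (request.method() !== 'GET') return route.continue();

    const url = request.url();
    const cached = responseCache.get(url);
    if (cached) return route.fulfill(cached);

    // route.fetch() performs the request without fulfilling it, so a copy can be kept.
    const response = await route.fetch();
    const headers = {};
    for (const [name, value] of Object.entries(response.headers())) {
        if (!SKIPPED_HEADERS.includes(name.toLowerCase())) headers[name] = value;
    }
    const entry = { status: response.status(), headers, body: await response.body() };

    // Keep only successful responses; errors and redirects are served but not stored.
    if (entry.status === 200) responseCache.set(url, entry);
    return route.fulfill(entry);
}
```

It can be registered the same way as the blocking handler, e.g. `await page.route('**/*', cachingRouteHandler)` inside a pre-navigation hook. For real use you would probably want to honor Cache-Control and cap the cache size, as the article discusses.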
