Crawlee Playwright Access to Network requests

Hello, is there a way to access the network requests that are sent during the crawl? I’m trying to store image URLs and am currently using page.$$eval, but sites vary in how they embed image URLs. For example, lazy-loaded images trigger network requests that I can see in the DevTools Network tab. Is there any way to access these requests and store them? Please let me know if my question isn’t clear. Thanks!
2 Replies
Pepa J, 3y ago
Hello @cryptorex: add a hook to the crawler's preNavigationHooks option. In that hook, set a listener on the page's requests:
page.on('request', (req) => {
    if (req.resourceType() === 'image') {
        console.log(req.url());
    }
});
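As a rough sketch of how the listener above could be collected into an array instead of logged: the helper name collectImageRequests and the imageUrls array are hypothetical, not part of Crawlee or Playwright. The helper only relies on the page object exposing on('request', ...) and on each request exposing resourceType() and url(), as in the snippet above.

```javascript
// Hypothetical helper: attaches a 'request' listener to a Playwright page
// and records the URL of every request the browser classifies as an image.
function collectImageRequests(page, found) {
    page.on('request', (req) => {
        if (req.resourceType() === 'image') {
            found.push(req.url());
        }
    });
    return found;
}

// Assumed wiring inside the crawler options (shown as a comment, not executed here):
// preNavigationHooks: [
//     async ({ page }) => { collectImageRequests(page, imageUrls); },
// ],
```

Attaching the listener in a pre-navigation hook matters because it is registered before the page starts loading, so requests fired during the initial load are seen as well.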
extended-salmon (OP), 3y ago
Thanks for your quick reply!
For anyone else, based on @Pepa J's guidance I'm going with this:
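// Note: cb (the results array) and excludedImgUrls (a regex of URLs to skip)
// are defined elsewhere in my code and not shown here.
preNavigationHooks: [
    async (crawlingContext) => {
        const { page, request } = crawlingContext;
        // Listen for every request the page makes while it loads.
        page.on('request', (pageobj) => {
            const requestUrl = pageobj.url();
            // Keep only image requests whose URL ends in a known image extension.
            if (pageobj.resourceType() === 'image' && requestUrl.match(/\.(webp|bmp|tif?f|png|jpe?g|gif|svg)$/i)) {
                if (requestUrl.match(excludedImgUrls) == null && requestUrl.length > 0) {
                    cb.push({ imgurl: requestUrl, pageurl: request.url });
                }
            }
        });
    },
]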
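One thing to watch with the extension check above: the regex is anchored with $ against the full URL, so an image served as photo.jpg?w=300 would not match. A possible variation, sketched below, tests the extension against the URL path only. The name isWantedImageUrl and its excluded parameter are hypothetical and stand in for the excludedImgUrls check; this is a sketch under those assumptions, not the code used in the thread.

```javascript
const IMAGE_EXT = /\.(webp|bmp|tiff?|png|jpe?g|gif|svg)$/i;

// Hypothetical helper: true when the URL path ends in a known image extension
// and the URL does not match the optional exclusion pattern.
// Assumes `excluded` is a RegExp without the g flag (g makes .test() stateful).
function isWantedImageUrl(rawUrl, excluded = null) {
    let pathname;
    try {
        // Query string and fragment are not part of pathname.
        pathname = new URL(rawUrl).pathname;
    } catch {
        return false; // not a parseable absolute URL
    }
    if (!IMAGE_EXT.test(pathname)) return false;
    return excluded === null || !excluded.test(rawUrl);
}
```

Inside the request listener, this could replace the two match() calls, for example: if (pageobj.resourceType() === 'image' && isWantedImageUrl(requestUrl, excludedImgUrls)) { ... }.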
