page.setRequestInterception(true)

How can I use page.setRequestInterception(true) in PuppeteerCrawler (without using raw Puppeteer)?
9 Replies
ratty-blush•3y ago
preNavigationHooks is the right place for it: https://crawlee.dev/api/puppeteer-crawler/interface/PuppeteerCrawlerOptions#preNavigationHooks Example:
preNavigationHooks: [async ({ page }, gotoOptions) => {
    gotoOptions.waitUntil = 'domcontentloaded';
    // Interception has to stay enabled during navigation; switching it
    // back off in this hook would defeat the handler below.
    await page.setRequestInterception(true);

    page.on('request', async (req) => {
        if (req.url().includes('something-you-re-looking-for')) {
            // your logic
        }
        await req.continue(); // if you don't call this, the request will hang indefinitely
    });
}],
stormy-gold (OP)•3y ago
thank you
quickest-silver•3y ago
You are enabling HTTP request tracking; after that you need page.on('request') (or 'response'). If you are not adding your own logic to process requests or responses, then it makes no sense to enable interception, right? 😉
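For illustration, here is a minimal sketch of what "your own logic" in a request handler could look like: it aborts a few heavy resource types and lets everything else through. The names handleRequest and BLOCKED_TYPES are hypothetical, and the sketch assumes a Puppeteer version that provides req.isInterceptResolutionHandled().

```javascript
// Hypothetical example: resource types we do not want to download.
const BLOCKED_TYPES = new Set(['image', 'media', 'font']);

// Hypothetical handler for page.on('request', ...) with interception enabled.
async function handleRequest(req) {
    // Another handler may already have continued or aborted this request.
    if (req.isInterceptResolutionHandled()) return;

    if (BLOCKED_TYPES.has(req.resourceType())) {
        await req.abort();
    } else {
        // Every intercepted request must be resolved, otherwise it hangs.
        await req.continue();
    }
}

// Usage sketch inside a preNavigationHook (not executed here):
// await page.setRequestInterception(true);
// page.on('request', handleRequest);
```

If the handler only ever calls req.continue(), interception adds nothing and can be left off.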
stormy-gold (OP)•3y ago
I want to get the AJAX response that is triggered when the main URL is requested.
conscious-sapphire•3y ago
@songoku is the Ajax request made on the page?
stormy-gold (OP)•3y ago
yes
conscious-sapphire•3y ago
Within your preNavigationHooks, you can add a function like this. It listens for responses and stops listening once the one you want has been handled:
async ({ page }) => {
    page.on('response', function handleResponse(res) {
        if (res.url().includes('foo')) {
            // do something

            // stop listening only after the matching response was seen
            page.off('response', handleResponse);
        }
    });
}
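As a rough sketch of how this could be packaged for reuse, the helper below builds such a hook from a URL fragment and a callback. captureResponses, urlPart and onMatch are hypothetical names, not part of Crawlee or Puppeteer; only page.on('response', ...) and res.url() come from Puppeteer itself.

```javascript
// Hypothetical helper: returns a preNavigationHook that passes every
// response whose URL contains `urlPart` to the `onMatch` callback.
function captureResponses(urlPart, onMatch) {
    return async ({ page }) => {
        page.on('response', (res) => {
            if (res.url().includes(urlPart)) {
                onMatch(res);
            }
        });
    };
}

// Usage sketch inside a crawler (not executed here):
// const crawler = new PuppeteerCrawler({
//     preNavigationHooks: [
//         captureResponses('foo', (res) => console.log(res.status(), res.url())),
//     ],
//     requestHandler: async () => {},
// });
```

Because the listener is attached before navigation starts, it also sees AJAX calls that fire while the page is still loading.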
optimistic-gold•3y ago
Yes, you don't need request interception at all. You only care about responses. You can also call the AJAX endpoint directly and scrape its output, as explained here: https://developers.apify.com/academy/api-scraping
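A minimal sketch of the direct approach, assuming the endpoint returns JSON and that Node.js 18+ is used so fetch is available globally. The endpoint URL and query parameters below are placeholders; the real ones would come from the Network tab in the browser's DevTools. buildApiUrl and fetchJson are hypothetical helper names.

```javascript
// Hypothetical helper: builds the endpoint URL from a base and query params.
function buildApiUrl(base, params) {
    const url = new URL(base);
    for (const [key, value] of Object.entries(params)) {
        url.searchParams.set(key, String(value));
    }
    return url.toString();
}

// Hypothetical helper: fetches the endpoint and parses the JSON body.
// Assumes Node.js 18+ (global fetch).
async function fetchJson(url) {
    const res = await fetch(url, { headers: { accept: 'application/json' } });
    if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
    return res.json();
}

// Usage sketch with a placeholder endpoint (not executed here):
// const data = await fetchJson(buildApiUrl('https://example.com/api/items', { page: 1 }));
```

This skips the browser entirely, so it is usually much faster, but it only works when the endpoint does not depend on cookies or tokens that the page sets up first.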