Crawlee & Apify•13mo ago
conscious-sapphire

How to determine if dynamic content is loaded or not. PuppeteerCrawler

In the requestHandler I'm trying to click the pagination's next button, and I cannot determine whether the content has changed or not. How can I do it? waitForNetworkIdle does not seem to work here. Any ideas? See the GIF.
```javascript
import fs from 'fs';
import { PuppeteerCrawler } from 'crawlee';

const maxNumberOfPages = 10; // placeholder; defined elsewhere in my code

new PuppeteerCrawler({
  preNavigationHooks: [
    async ({ page }) => {
      page.on('response', async (res) => {
        if (res.url().includes('api/offersearches/filters')) {
          try {
            const json = await res.json();
            const jsonString = JSON.stringify(json);
            const filePath = 'data.json';
            fs.appendFile(filePath, jsonString + '\n', () => {});
          } catch (err) {
            console.error("Response wasn't JSON or failed to parse response.");
          }
        }
      });
    },
  ],
  async requestHandler({ request, page }) {
    for (let i = 0; i < maxNumberOfPages; i++) {
      const isDisabled = await page.evaluate(
        () => document.querySelector('[data-testid="mo-pagination-next"] button.mo-button--pagination').disabled,
      );
      if (isDisabled) {
        break;
      }

      await Promise.all([
        page.waitForNetworkIdle(),
        page.click('[data-testid="mo-pagination-next"] button.mo-button--pagination'),
      ]);
      console.log('clicked'); // it never reaches this line
    }
  },
});
```
Here's my code so far. Currently the button is clicked OK and the data is fetched OK; it just hangs at the end. I guess waitForNetworkIdle never resolves.
4 Replies
unwilling-turquoise
unwilling-turquoise•13mo ago
Hi @4unkur, you might want to try the Page.waitForFunction() method: https://pptr.dev/api/puppeteer.page.waitforfunction
Or you could wait for a specific selector that appears once the request resolves: https://pptr.dev/api/puppeteer.page.waitforselector
Or you could wait for the request that fetches the data with the Page.waitForResponse() method: https://pptr.dev/api/puppeteer.page.waitforresponse
It depends what works best for you 🙂. Hope this helps!
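The waitForFunction suggestion boils down to polling: re-evaluate a predicate until it becomes truthy or a timeout expires. A minimal stdlib-Node sketch of that idea (no Puppeteer needed; `waitForCondition` and all other names here are illustrative, not part of any library):

```javascript
// Sketch of the polling loop behind an API like page.waitForFunction:
// re-evaluate a predicate until it returns truthy or a deadline passes.
async function waitForCondition(predicate, { timeout = 10000, interval = 100 } = {}) {
  const deadline = Date.now() + timeout;
  while (Date.now() < deadline) {
    if (await predicate()) return;
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
  throw new Error('waitForCondition: timed out');
}

// Usage: snapshot the content before the click, then wait until it differs.
let content = 'page 1';                          // stands in for the DOM text
setTimeout(() => { content = 'page 2'; }, 200);  // simulates the AJAX update
const before = content;
waitForCondition(() => content !== before)
  .then(() => console.log('content changed'));
// prints "content changed" once the simulated update lands
```

In the browser the predicate would read the paginated element's text instead of a local variable, which is exactly what page.waitForFunction evaluates in the page context.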
conscious-sapphire
conscious-sapphireOP•13mo ago
@Lukas Celnar Thank you for your response. waitForSelector does not seem to work, because the HTML is already there and it just updates once the AJAX request completes. waitForResponse fires before the DOM is changed, so that does not work either. I was able to make it work by adding a 3-second timer after the response is ready, but I guess that's not the right way. As for waitForFunction, I am not sure how I can utilize it in my case. Anyway, I was able to implement the scraper by adding the next page URL to the request queue instead. So the task is complete, but the question I asked is still open for me (
unwilling-turquoise
unwilling-turquoise•13mo ago
Waiting for a fixed time with page.waitForTimeout is another way, but if the website has delays you will run into trouble, so I would use it only as a last resort if nothing else works.

With waitForFunction you could save the initial content:

```javascript
const initialContent = await page.evaluate(
  () => document.querySelector('[data-testid="content-element"]').textContent,
);
```

and then wait for it to change:

```javascript
await page.waitForFunction(
  (initialContent) => {
    const newContent = document.querySelector('[data-testid="content-element"]').textContent;
    return newContent !== initialContent;
  },
  { timeout: 10000 },
  initialContent,
);
```

But if you can just add the URLs to the request queue, then I would go with that approach.
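For the enqueue-next-URL alternative mentioned above, the control flow can be sketched without a browser. This is a hedged illustration, not the OP's actual code: a real Crawlee crawler exposes crawler.addRequests(), and here a tiny in-memory queue stands in for it; the URL pattern is an assumption.

```javascript
// Stand-in for a Crawlee request queue, so the flow runs without a crawler.
class FakeRequestQueue {
  constructor() { this.requests = []; }
  async addRequests(urls) { this.requests.push(...urls); }
}

// Each handler call scrapes one page and enqueues the next page's URL,
// so there is no need to click "next" and guess when the DOM has settled.
async function handlePage(pageNumber, maxNumberOfPages, queue) {
  // ... extract the current page's data here ...
  if (pageNumber < maxNumberOfPages) {
    // URL pattern is hypothetical; real sites vary.
    await queue.addRequests([`https://example.com/offers?page=${pageNumber + 1}`]);
  }
}

const queue = new FakeRequestQueue();
handlePage(1, 3, queue).then(() => console.log(queue.requests[0]));
// prints "https://example.com/offers?page=2"
```

Each page then goes through the crawler's normal retry and concurrency handling, which is why this tends to be more robust than in-page pagination clicks.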
conscious-sapphire
conscious-sapphireOP•13mo ago
Thank you @Lukas Celnar !
