Puppeteer Crawler cannot open the page

Hi, I have a Puppeteer scraper that worked just fine until this Monday. Nothing has changed, but the scraper stopped working. The HTML markup of the page has not changed either: the element a[data-testid="search-listing-title"] is still there. The Apify run log says it fails to find this element: TimeoutError: waiting for selector a[data-testid="search-listing-title"] failed: timeout 30000ms exceeded. I launched the scraper from my local machine and it worked, but it does not work on the Apify platform. I guess it has something to do with the proxy. This is part of my code:
//...
const proxyConfiguration = await Apify.createProxyConfiguration();

const launchContext = {
    useChrome: true,
    stealth: true,
    launchOptions: {
        headless: true,
    },
};

const crawler = new Apify.PuppeteerCrawler({
    requestList,
    requestQueue,
    proxyConfiguration,
    launchContext: launchContext as any,
    maxRequestRetries: 5,
    handlePageTimeoutSecs: 180,
    navigationTimeoutSecs: 180,
    async handlePageFunction({ page, request }): Promise<void> {
        await utils.puppeteer.saveSnapshot(page, { key: 'beforescrap', saveHtml: false });
        const cheerio = load(await page.content());

        const html = cheerio.html();
        await Apify.setValue('htmlstring', html, { contentType: 'text/html' });

        await page.waitForSelector('a[data-testid="search-listing-title"]');
        //...
I tried taking a screenshot to see what the page looks like, and it shows a blank white page. I also tried changing the proxy settings to use residential proxies and a different country, which did not work either. How can I debug this? A screenshot of the logs is also attached.
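One way to narrow this down is to record what the browser actually received at the moment the selector wait fails, rather than only the timeout message. The following is a minimal sketch, not part of the Apify SDK: `PageLike`, `SelectorDiagnostics`, and `waitOrDiagnose` are hypothetical names introduced here for illustration, and `PageLike` only mirrors the three Puppeteer `page` methods the helper calls (`url`, `content`, `waitForSelector`), which a real page object should satisfy.

```typescript
// Minimal shape of the Puppeteer Page methods used below, declared locally
// so the sketch does not need to import puppeteer. (Assumption: the real
// `page` passed to handlePageFunction is structurally compatible.)
interface PageLike {
    url(): string;
    content(): Promise<string>;
    waitForSelector(selector: string, options?: { timeout?: number }): Promise<unknown>;
}

// Hypothetical diagnostics record; not an Apify or Puppeteer type.
interface SelectorDiagnostics {
    url: string;          // final URL, which reveals redirects to a block page
    selector: string;     // the selector that was not found
    htmlLength: number;   // a very small value suggests an empty or blocked page
    htmlPreview: string;  // first 500 characters of the received HTML
    error: string;        // original error message from waitForSelector
}

// Hypothetical helper: waits for the selector and returns null on success.
// On failure it returns a record describing the page instead of throwing.
async function waitOrDiagnose(
    page: PageLike,
    selector: string,
    timeout = 30000,
): Promise<SelectorDiagnostics | null> {
    try {
        await page.waitForSelector(selector, { timeout });
        return null;
    } catch (err) {
        const html = await page.content();
        return {
            url: page.url(),
            selector,
            htmlLength: html.length,
            htmlPreview: html.slice(0, 500),
            error: err instanceof Error ? err.message : String(err),
        };
    }
}
```

Inside handlePageFunction, a non-null result could be stored with Apify.setValue and the request rethrown so it is retried. A short htmlLength or an unexpected url would point toward blocking or a redirect rather than a changed selector.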
10 Replies
Pepa J•14mo ago
Hi @4unkur, can you try running https://apify.com/apify/screenshot-url on the URL that failed for you? Just out of curiosity, I ran it on a page like https://www.autotrader.co.uk/car-search?postcode=PO16%207GZ&refresh=true and all the data seems to be there 🤔
exotic-emeraldOP•14mo ago
OK, I'll try it. The problem is not with a single URL. As you can see, it's a car listings website, and basically we are scraping the vehicles.
exotic-emeraldOP•14mo ago
@Pepa J The React skeleton can be seen here. My guess is that Puppeteer's "stealth" mode is not working or something. I'm thinking of upgrading the Apify SDK to the latest version. I don't know...
exotic-emeraldOP•14mo ago
@Pepa J Not a big deal. In my actor I get a completely blank screenshot.
Pepa J•14mo ago
I mean the data seems to be there for me, and I am using just DATACENTER proxies. I am trying to think about what may be causing this in your case.
exotic-emeraldOP•13mo ago
@Pepa J Hi, I have upgraded to Apify SDK v3 (I rewrote the actor in pure JS), and the selector can be found now. That part is solved. I guess Puppeteer's stealth plugin was not doing its job and I was being blocked... With the new SDK it works OK. So the main problem is solved. Should I close the topic or something? BTW, I have a new problem (: which I've posted here: https://discord.com/channels/801163717915574323/1231166329076191274
Pepa J•13mo ago
@4unkur Thank you. No, this is fine.
