Hello!

I'm trying to scrape product data using the Web Scraper actor (https://www.opticsplanet.com/s/026509005698), but I can't open the product page. If I use $('div.specials-contents').find('span.grid__text').click(), the log says "target closed". How can I add the product page so I can scrape data from it?
14 Replies
conscious-sapphire•3y ago
Can you provide a link to your run, please, so I can check the logs and input?
wise-white•3y ago
You need to enqueue the URL of the product page.
correct-apricotOP•3y ago
Like this: context.enqueueRequest($('div.specials-contents').find('span.grid__text'))?
wise-white•3y ago
No, that passes an element object. You need to pass a string that is the URL of the page, for example https://www.opticsplanet.com/allen-miscellaneous-accessories-569.html
await context.enqueueRequest($('div.specials-contents').find('.grid__link').attr('href'));
Something like this should enqueue the first detail page.
correct-apricotOP•3y ago
Thank you, I'll try it now.
I'm doing something wrong. The log says: Reclaiming failed request back to the list or queue. Expected property url to be of type string but received type undefined in object requestLike
async function pageFunction(context) {
    const $ = context.jQuery;
    context.log.info(`URL: ${context.request.url}`);
    await context.enqueueRequest($('div.specials-contents').find('.grid__link').attr('href'));
    return {
        name: $('div.page-header_product-page').find('h1').text(),
    };
}
conscious-sapphire•3y ago
In enqueueRequest you should pass a request object:
const myUrl = $('div.specials-contents').find('.grid__link').attr('href');
await context.enqueueRequest({ url: myUrl });
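Since .attr('href') gives undefined when the selector matches nothing, and the href may be relative, it can help to build the request object in one guarded place. Here is a minimal sketch of that idea. The toRequest helper is a hypothetical name, not part of the Web Scraper API, and it only assumes that the standard URL constructor is available, which it is in the browser and in Node.

```javascript
// Hypothetical helper (not part of the Web Scraper API): turn an href read
// from the page into the request object that enqueueRequest expects.
function toRequest(href, baseUrl) {
    // .attr('href') yields undefined when the selector matched nothing.
    if (typeof href !== 'string' || href.trim() === '') return null;
    // new URL() resolves relative links against baseUrl and keeps absolute ones as they are.
    return { url: new URL(href.trim(), baseUrl).href };
}

// Possible use inside pageFunction:
//   const href = $('div.specials-contents').find('.grid__link').attr('href');
//   const req = toRequest(href, context.request.url);
//   if (req) await context.enqueueRequest(req);
```

With a guard like this, a page without the link is simply skipped instead of passing an undefined url to enqueueRequest.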
correct-apricotOP•3y ago
Now the scraper opens the page and the log says:
INFO URL: https://www.opticsplanet.com/allen-hunting-accessories-562.html
WARN PuppeteerCrawler: Reclaiming failed request back to the list or queue. Expected property url to be of type string but received type undefined in object requestLike
Is that OK? And why does it return a blank name field?
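wise-white•3y ago
That is probably because the page function runs again on the detail page and tries to enqueue that selector again. It does not find it there, because you are already on the detail page and not on the list page, so the URL is undefined. You need to add a check for which kind of page you are on and only enqueue from the list page.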
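One simple way to do such a check is to look at the URL of the current request. This is only a sketch: isListPage is a hypothetical helper, and the rule that list pages live under /s/ is an assumption taken from the two URLs in this thread, so it may not hold for every page on the site.

```javascript
// Hypothetical helper: guess the page type from the URL alone.
// Assumption: search/list pages have a path starting with /s/, like the
// start URL in this thread; everything else is treated as a detail page.
function isListPage(url) {
    return new URL(url).pathname.startsWith('/s/');
}

// Possible use inside pageFunction:
//   if (isListPage(context.request.url)) {
//       // enqueue detail pages here
//   } else {
//       // extract product data here
//   }
```

Marking enqueued requests with userData and checking that flag is another option, and it does not depend on the URL layout.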
correct-apricotOP•3y ago
Thank you! Now it can open the page and get the data.
I have one more question: how can I scrape all the items in this product into separate rows? Like row 1: upc, name, price; row 2: upc2, name, price2; not like this: row 1: upc, name, price, upc2, name, price2.
rare-sapphire•3y ago
To scrape all items in the product to different rows, you can modify your pageFunction to iterate through all the product items and enqueue a request for each item's detail page. Then, for each detail page, extract the relevant data and return it as a separate row. Here's an example implementation:
async function pageFunction(context) {
    const $ = context.jQuery;
    const { request } = context;
    const isDetailPage = Boolean(request.userData && request.userData.isDetailPage);

    if (!isDetailPage) {
        // List page: enqueue every item link and flag the request as a detail page
        const itemLinks = $('div.specials-contents').find('.grid__link').toArray();
        for (const link of itemLinks) {
            const href = $(link).attr('href');
            if (!href) continue; // skip links without an href
            await context.enqueueRequest({
                // Resolve relative links against the current page URL
                url: new URL(href, request.url).href,
                userData: { isDetailPage: true },
            });
        }
        return null; // nothing to extract from the list page
    }

    // Detail page: the page function runs once per opened page, so extract one row here.
    // The .product-code__value and .product-price__value selectors are unverified guesses.
    const $detail = $('.page-header_product-page');
    const upc = $detail.find('.product-code__value').text().trim();
    const name = $detail.find('h1').text().trim();
    const price = $detail.find('.product-price__value').text().trim();
    return { upc, name, price };
}
In this implementation, the page function runs once for every page the scraper opens. On the list page it iterates through all the item links and enqueues a request for each item's detail page, setting the userData.isDetailPage flag to true so that later runs can tell detail pages apart from the list page. On each detail page it extracts the relevant data and returns it as one object, so every detail page ends up as its own row.
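If a single product page lists several models, the same "one row per item" idea can be applied inside that page: collect the per-model values first, then return an array with one object per model. As far as I know, Web Scraper stores each element of a returned array as a separate row. The sketch below only shows the reshaping step. toRows is a hypothetical helper, and how the variants are read from the page is left out because the selectors for the model table are not known here.

```javascript
// Hypothetical helper: combine the shared product name with each variant
// so that every variant becomes its own row.
function toRows(name, variants) {
    return variants.map(({ upc, price }) => ({ upc, name, price }));
}

// Possible use at the end of pageFunction on a detail page, assuming
// `variants` was filled by iterating over the model rows on the page:
//   return toRows(name, variants);
```

For example, toRows('Kit', [{ upc: '1', price: '$5' }, { upc: '2', price: '$6' }]) gives two objects that share the name 'Kit' but have their own upc and price.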
correct-apricotOP•3y ago
Thank you!