Hello!
I'm trying to scrape product data using the Web Scraper actor (https://www.opticsplanet.com/s/026509005698), but I can't open the product page. If I use $('div.specials-contents').find('span.grid__text').click(), the log says "target closed". How can I add the product page so I can scrape data from it?
conscious-sapphire•3y ago
Can you provide a link to your run, please, so I can check all the logs and the input?
correct-apricotOP•3y ago
wise-white•3y ago
You need to enqueue the URL of the product page.
correct-apricotOP•3y ago
Like this: context.enqueueRequest($('div.specials-contents').find('span.grid__text'))?
wise-white•3y ago
No, this returns a cheerio element. You need to pass a string that is the URL of the page, for example
https://www.opticsplanet.com/allen-miscellaneous-accessories-569.html
Something like this should enqueue the first detail page.
correct-apricotOP•3y ago
Thank you, I'll try it now.
I'm doing something wrong.
Reclaiming failed request back to the list or queue. Expected property url to be of type string but received type undefined in object requestLike
async function pageFunction(context) {
    const $ = context.jQuery;
    context.log.info(`URL: ${context.request.url}`);
    await context.enqueueRequest($('div.specials-contents').find('.grid__link').attr('href'));
    return {
        name: $('div.page-header_product-page').find('h1').text(),
    };
}
conscious-sapphire•3y ago
In enqueueRequest you should use a request object:
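A minimal sketch of what that could look like, assuming the `.grid__link` selector from the snippet above really matches the product link on the list page. The request object only needs a `url` string; resolving the href against the current page URL and skipping the call when the href is missing are additions made here for illustration.

```javascript
async function pageFunction(context) {
    const $ = context.jQuery;
    context.log.info(`URL: ${context.request.url}`);

    // attr('href') returns undefined when the selector matches nothing.
    const href = $('div.specials-contents').find('.grid__link').attr('href');

    // Only enqueue when a link was actually found, and pass an object with
    // a string `url` property instead of a bare value.
    if (typeof href === 'string' && href !== '') {
        await context.enqueueRequest({
            // Resolve relative hrefs against the page that is being scraped.
            url: new URL(href, context.request.url).href,
        });
    }

    return {
        name: $('div.page-header_product-page').find('h1').text(),
    };
}
```

With this guard, a page without the link no longer passes `undefined` as the URL.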
correct-apricotOP•3y ago
Now the scraper opens the page and the log says:
INFO URL: https://www.opticsplanet.com/allen-hunting-accessories-562.html
WARN PuppeteerCrawler: Reclaiming failed request back to the list or queue. Expected property url to be of type string but received type undefined in object requestLike
Is that OK? And why does it return a blank name field?
wise-white•3y ago
Maybe it is because it tries to enqueue that selector again and does not find it, since you are already on the detail page and not on the list page,
so you need to implement some checks and logic.
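One way such a check could look is sketched below. It assumes the detail request is tagged with a label in `userData` when it is enqueued; the label value `'DETAIL'` is made up for this example, and the selectors are the ones already used in this thread.

```javascript
async function pageFunction(context) {
    const $ = context.jQuery;
    // Start URLs usually carry no label, so this is undefined on the list page.
    const { label } = context.request.userData || {};

    if (label === 'DETAIL') {
        // Detail page: only extract data, do not enqueue anything.
        return {
            url: context.request.url,
            name: $('div.page-header_product-page').find('h1').text().trim(),
        };
    }

    // List page: enqueue the product link, but only if the selector found one.
    const href = $('div.specials-contents').find('.grid__link').attr('href');
    if (href) {
        await context.enqueueRequest({
            url: new URL(href, context.request.url).href,
            userData: { label: 'DETAIL' }, // hypothetical label name
        });
    }
    // No product data is returned from the list page.
}
```

Branching on the label keeps the list-page selector from running on the detail page, which is where the `undefined` URL came from.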
correct-apricotOP•3y ago
Thank you! Now it can open the page and get data.
I have one more question: how can I scrape all items of this product into separate rows? Like row 1: upc, name, price; row 2: upc2, name, price2. Not like this: row 1: upc, name, price, upc2, name, price2.
rare-sapphire•3y ago
To scrape all items in the product to different rows, you can modify your pageFunction to iterate through all the product items and enqueue a request for each item's detail page. Then, for each detail page, extract the relevant data and return it as a separate row.
Here's an example implementation:
In this implementation, we first iterate through all the item links on the page and enqueue a request for each item's detail page. We set the userData.isDetailPage flag to true for each detail page request so we can later filter them out from the list of requests.
After all the detail page requests have completed, we iterate through them and extract the relevant data from each page. We then push the data into an array and return it.
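A different, simpler sketch for the row-per-item part: as far as I know, Web Scraper accepts an array of objects as the return value of the pageFunction and stores each element as its own dataset row. The selectors `.variant-row`, `.variant-upc` and `.variant-price` are placeholders invented for this example; they have to be replaced with the real ones from the product page.

```javascript
async function pageFunction(context) {
    const $ = context.jQuery;
    const name = $('div.page-header_product-page').find('h1').text().trim();

    // ASSUMPTION: every model of the product sits in its own element.
    // '.variant-row', '.variant-upc' and '.variant-price' are hypothetical.
    const rows = $('.variant-row').toArray().map((el) => ({
        upc: $(el).find('.variant-upc').text().trim(),
        name,
        price: $(el).find('.variant-price').text().trim(),
    }));

    // One object per model, so each model can end up in a separate row.
    return rows;
}
```

Each returned object repeats the product name and carries its own upc and price, which matches the "row 1 / row 2" layout asked for above.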
correct-apricotOP•3y ago
Thank you!