Pushing to Dataset but getting nothing
Hi, I'm new. I'm trying to build something like https://crawlee.dev/docs/examples/playwright-crawler
but no data ends up in storage :/
I really don't understand how it works:
- The URL shows up in the log
- Playwright is OK (??)
With ".dxp-node" I expected to fetch 153 text nodes ...
sensitive-blueOP•2y ago
Does anyone have an idea what is happening? (Up)
rare-sapphire•2y ago
Hey there, what are you trying to enqueue? Currently you start the crawler with this URL, https://fr.wikipedia.org/wiki/Lexique_de_l%27orgue, and then try to find new links with the provided glob patterns, but the pattern is the page URL itself. That means enqueueLinks does not find anything on the page, and the crawler just shuts down.
I'd say the question isn't really what you are trying to enqueue. Could you share what the overall workflow is?
sensitive-blueOP•2y ago
Thank you for taking the time ... I understand why it's going nowhere ... I just want to fetch data from one page only: https://fr.wikipedia.org/wiki/Lexique_de_l%27orgue I understand that I could also do it in different ways, but obviously sometimes we have to deal with exceptions ...
I found this: https://crawlee.dev/api/core/function/enqueueLinks Maybe I will try putting my link in "urls".
rare-sapphire•2y ago
I am still not following. You already run the crawler with the above start URL, meaning that in requestHandler you have the first page loaded. If you want to scrape data from this page, why are you trying to enqueue more pages? You either have to use the "detail" handler as the default handler, or add the start URL as an object with url set to your start URL and label set to "detail".
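To make the point about labels concrete, here is a minimal sketch of label-based routing. It is a simplified model written for this thread, not Crawlee's actual router: the names StartRequest, Handler and dispatch are hypothetical, and the fallback-to-default behaviour is an assumption of the model.

```typescript
// Simplified model of label-based routing (NOT Crawlee's router).
// StartRequest, Handler and dispatch are hypothetical names.
type StartRequest = { url: string; label?: string };
type Handler = (request: StartRequest) => string;

// Pick the handler registered for the request's label; if the request has
// no label (or no handler is registered for it), use the default handler.
function dispatch(
    labelled: Map<string, Handler>,
    fallback: Handler,
    request: StartRequest,
): string {
    const handler =
        request.label !== undefined ? labelled.get(request.label) : undefined;
    return (handler ?? fallback)(request);
}

const handlers = new Map<string, Handler>([
    ['detail', (r) => `detail handler scraped ${r.url}`],
]);
const defaultHandler: Handler = (r) => `default handler saw ${r.url}`;
const startUrl = 'https://fr.wikipedia.org/wiki/Lexique_de_l%27orgue';

// Start URL given as a plain URL: no label, so the "detail" handler never runs.
console.log(dispatch(handlers, defaultHandler, { url: startUrl }));

// Start URL given as an object with label "detail": the detail handler runs.
console.log(dispatch(handlers, defaultHandler, { url: startUrl, label: 'detail' }));
```

In Crawlee itself the two options would roughly correspond to registering the scraping logic with router.addDefaultHandler, or starting the crawler with crawler.run([{ url: startUrl, label: 'detail' }]).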
sensitive-blueOP•2y ago
I tested that but it doesn't work ...
"why are you trying to enqueue more pages?" You expect that I wrote it correctly. If it is easy, I would love to see some code that works [...]. I am not able to run the code in the documentation :/ ANYWAY, "await Dataset.pushData" is not executed ... It should be ... I guess.
rare-sapphire•2y ago
Have you checked https://docs.apify.com/academy? In the first snippet you are missing an await on the crawler.run() call. Also, $ is not part of the PlaywrightCrawlingContext.
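As a rough sketch of what a single-page scrape could look like without $, the helper below reads text through the Playwright page object instead. The snippets from the thread are not shown here, so this is an assumption-laden illustration: scrapeTexts, PageLike, LocatorLike and PushData are hypothetical names, and the small interfaces only stand in for the parts of Playwright's Page that are used, so the block runs without any packages installed.

```typescript
// Minimal structural stand-ins for the parts of Playwright's Page used here.
// (Hypothetical names; a real Playwright Page satisfies this shape.)
interface LocatorLike {
    allTextContents(): Promise<string[]>;
}
interface PageLike {
    locator(selector: string): LocatorLike;
}
type PushData = (item: { url: string; texts: string[] }) => Promise<void>;

// Hypothetical helper: collect the trimmed text of every element matching
// `selector` and store it as one item. Returns the number of non-empty texts.
async function scrapeTexts(
    page: PageLike,
    url: string,
    selector: string,
    pushData: PushData,
): Promise<number> {
    const raw = await page.locator(selector).allTextContents();
    const texts = raw.map((t) => t.trim()).filter((t) => t.length > 0);
    if (texts.length === 0) {
        // Nothing matched (for example, the selector is not on the page),
        // so nothing is stored.
        return 0;
    }
    await pushData({ url, texts });
    return texts.length;
}

// Possible wiring with Crawlee (assumes `crawlee` and `playwright` are installed):
//
//   import { PlaywrightCrawler, Dataset } from 'crawlee';
//
//   const crawler = new PlaywrightCrawler({
//       async requestHandler({ page, request, log }) {
//           const count = await scrapeTexts(page, request.url, '.dxp-node',
//               (item) => Dataset.pushData(item));
//           log.info(`Stored ${count} text nodes from ${request.url}`);
//       },
//   });
//   // Note the await: without it the script may finish before the crawl does.
//   await crawler.run(['https://fr.wikipedia.org/wiki/Lexique_de_l%27orgue']);
```

A returned count of 0 is a quick hint that the selector matches nothing on the page, which is worth checking before suspecting the dataset.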
Second snippet: you're not adding any requests to the crawler. You open the request queue explicitly but don't pass it to the crawler. Otherwise, pretty much the same comments as in my previous message apply. Also, I don't see any elements matching the selector you provide, ".dxp-node", on the page.
sensitive-blueOP•2y ago
Thanks for reporting that (".dxp-node" indeed doesn't exist ... now ...), maybe Wikipedia has been updated ... I will test some changes later. I will reply once I have looked at your resource/suggestion.
Aaaaah ... OK ... Crawlee only fetches links ...
Hello @Landerfine l'écarlate,
Not sure if I follow. Crawling is the process of getting URLs from a website, navigating through them, and collecting further links.
Crawlee is a framework capable of doing this, but it also lets you extract "any" information from the website and store it so you can use it later.
At this point I am not sure whether you are still having issues with your data not being stored. If so, please let me know and we can investigate further. 🙂
sensitive-blueOP•2y ago
For me it is closed! Thanks