Trying to use enqueueLinksByClickingElements

This function needs the page and requestQueue parameters, but I don't know what I should pass for them. This is the doc: https://crawlee.dev/api/playwright-crawler/namespace/playwrightClickElements#enqueueLinksByClickingElements Thanks for the help.
21 Replies
optimistic-gold (3y ago)
You could use https://crawlee.dev/api/playwright-crawler/interface/PlaywrightCrawlingContext#enqueueLinksByClickingElements instead. It's the same function, but it's context-aware and part of the PlaywrightCrawlingContext, so you don't need to provide the request queue and page.
quickest-silver (OP, 3y ago)
I'm a little bit lost, to be fair. This is an example of what I'm trying to do:
    router.addHandler(labels.PAGE, async ({ page, enqueueLinksByClickingElements }) => {
        // Do something
        await enqueueLinksByClickingElements({
            label: labels.PAGE,
            selector: "#next",
        });
    });
optimistic-gold (3y ago)
Then it should pretty much work. This function is context-aware, so you don't need to provide the page or requestQueue params there.
quickest-silver (OP, 3y ago)
OK thanks, but it's actually not working.
optimistic-gold (3y ago)
Just to clarify - there are two ways you could use this function. The one from the link you sent above can be imported separately, and then you need to provide page/requestQueue. When used inside the crawler, it's part of the context, and the function already knows about the page where it's being called and the requestQueue which is being used.
Note that your link goes to the playwrightClickElements namespace, while the second link (the one I sent) goes to PlaywrightCrawlingContext.
Basically, you could use it outside the crawler for some edge case, just by using an instance of Playwright and a separate request queue. But when used in the crawler, those params are not needed.
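To make the context-aware variant concrete, here is a minimal sketch of a handler body. It is typed against a small local interface (ClickContext, a stand-in I made up for this example) rather than Crawlee's own PlaywrightCrawlingContext, so it can be read and run on its own; the labels object is also hypothetical. In a real project the context would come from the crawler and the function would be registered with router.addHandler.

```typescript
// Hypothetical label map, mirroring the `labels.PAGE` used in the question.
const labels = { PAGE: 'PAGE' } as const;

// Options passed in the context-aware form: no `page`, no `requestQueue`.
interface ClickEnqueueOptions {
    selector: string;
    label?: string;
}

// Local stand-in for the one context member this handler uses.
// Assumption: the real crawling context exposes a compatible function.
interface ClickContext {
    enqueueLinksByClickingElements(options: ClickEnqueueOptions): Promise<unknown>;
}

// Handler body: the context already knows the page and the request queue.
async function handlePage(context: ClickContext): Promise<void> {
    // ... extract data from the page first ...

    // Clicking mutates the page, so keep it as the last operation.
    await context.enqueueLinksByClickingElements({
        selector: '#next',
        label: labels.PAGE,
    });
}
```

The only things passed are the selector and the label for the next request; everything else is taken from the context.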
quickest-silver (OP, 3y ago)
OK, I got it.
quickest-silver (OP, 3y ago)
But nothing happens and I don't get an error, so I might be doing something wrong elsewhere.
optimistic-gold (3y ago)
Note these warnings in the docs:
IMPORTANT: To be able to do this, this function uses various mutations on the page, such as changing the Z-index of elements being clicked and their visibility. Therefore, it is recommended to only use this function as the last operation in the page.
USING HEADFUL BROWSER: When using a headful browser, this function will only be able to click elements in the focused tab, effectively limiting concurrency to 1. In headless mode, full concurrency can be achieved.
PERFORMANCE: Clicking elements with a mouse and intercepting requests is not a low level operation that takes nanoseconds. It’s not very CPU intensive, but it takes time. We strongly recommend limiting the scope of the clicking as much as possible by using a specific selector that targets only the elements that you assume or know will produce a navigation. You can certainly click everything by using the * selector, but be prepared to wait minutes to get results on a large and complex page.
Also - make sure that the selector is correct.
quickest-silver (OP, 3y ago)
OK thanks, I will check.
Looks like a selector problem. After this:
    await page.waitForSelector("#next");
the locator seems to be hidden:
    locator resolved to hidden <button id="next" type="button" class="btn pagingBtn hid…>…</button>
optimistic-gold (3y ago)
Could you share the URL?
quickest-silver (OP, 3y ago)
DBD DataWarehouse+
Juristic person information and financial statement services
optimistic-gold (3y ago)
I meant the URL on which you're trying to enqueue more pages 🙂 On this one I don't see the #next selector at all.
quickest-silver (OP, 3y ago)
That's this one, you need to reload after accepting the prompt.
optimistic-gold (3y ago)
Ah, I see it now 👍 Well, the button is indeed hidden, so it cannot really be clicked. I don't know why it's hidden, apparently something website-specific.
quickest-silver (OP, 3y ago)
Oh, OK.
optimistic-gold (3y ago)
It might be easier to replicate the XHR request. It's a web app, so it's not really reloading the page; it sends POST requests which only differ in the currentPage number (at least from what I saw, maybe there's more).
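As a rough sketch of that idea: if the POST requests really differ only in currentPage, the requests for all pages can be generated up front. Everything specific here is an assumption for illustration: the SEARCH_URL endpoint, the JSON content type, and the basePayload fields are hypothetical and would need to be copied from what the browser's network tab actually shows. The objects use the url/method/headers/payload/uniqueKey fields of Crawlee request options as I understand them.

```typescript
// Hypothetical endpoint; replace with the real XHR URL from the network tab.
const SEARCH_URL = 'https://example.com/api/search';

interface PageRequest {
    url: string;
    method: 'POST';
    headers: Record<string, string>;
    payload: string;
    uniqueKey: string;
}

// Builds one POST request per page; only `currentPage` changes between them.
function buildPageRequests(
    basePayload: Record<string, unknown>,
    totalPages: number,
): PageRequest[] {
    const requests: PageRequest[] = [];
    for (let currentPage = 1; currentPage <= totalPages; currentPage++) {
        requests.push({
            url: SEARCH_URL,
            method: 'POST',
            // Assumption: the site accepts a JSON body.
            headers: { 'content-type': 'application/json' },
            payload: JSON.stringify({ ...basePayload, currentPage }),
            // All requests share one URL, so give each page its own key
            // to keep a queue from treating them as duplicates.
            uniqueKey: `${SEARCH_URL}#page=${currentPage}`,
        });
    }
    return requests;
}
```

The resulting array could then be handed to a crawler's request queue. The explicit uniqueKey matters because requests to the same URL would otherwise look identical when deduplicated by URL alone.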
quickest-silver (OP, 3y ago)
How could I do this?
optimistic-gold (3y ago)
I guess you could start from here: https://docs.apify.com/academy/api-scraping
API scraping | Apify Documentation
Learn all about how the professionals scrape various types of APIs with various configurations, parameters, and requirements.
quickest-silver (OP, 3y ago)
OK thanks, I will take a look.
