Crawlee & Apify•3y ago

Trying to use enqueueLinksByClickingElements

The page and requestQueue parameters are required for this function, but I don't know what I should pass for them. This is the doc: https://crawlee.dev/api/playwright-crawler/namespace/playwrightClickElements#enqueueLinksByClickingElements Thanks for the help.

21 Replies

optimistic-gold•3y ago

You could use https://crawlee.dev/api/playwright-crawler/interface/PlaywrightCrawlingContext#enqueueLinksByClickingElements instead. It's the same function, but it's context-aware, so you don't need to provide the request queue and page. It's part of the PlaywrightCrawlingContext.

quickest-silverOP•3y ago

I'm a little bit lost, to be fair. This is an example of what I'm trying to do:

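router.addHandler(labels.PAGE, async ({ page, enqueueLinksByClickingElements }) => {
    // Do something with the page first.
    // Then click "#next" and enqueue the resulting requests under the same label.
    await enqueueLinksByClickingElements({
        label: labels.PAGE,
        selector: "#next",
    });
});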

optimistic-gold•3y ago

Then it should pretty much work. This function is context-aware, so you don't need to provide the page or requestQueue params there.

quickest-silverOP•3y ago

OK thanks, but it's actually not working.

optimistic-gold•3y ago

Just to clarify: there are two ways you could use this function. The one from the link you sent above can be imported separately, and then you need to provide page/requestQueue. When it's used inside the crawler, it's part of the context, and the function already knows about the page it's being called on and the requestQueue being used. Note that your link goes to the playwrightClickElements namespace, while the second link (the one I sent) goes to PlaywrightCrawlingContext. Basically, you could use it outside of a crawler for some edge case, just by using an instance of Playwright and a separate request queue. But when it's used in the crawler, that's not needed.
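
The difference between the two call shapes can be sketched roughly as follows. This is only an illustration: `labels.PAGE` and the `#next` selector come from the question above, the function names `handlePage` and `enqueueOutsideCrawler` are hypothetical, and the Crawlee function is passed in as an argument so the snippet can run without a browser.

```javascript
// Hypothetical label map; the thread only shows `labels.PAGE` being used.
const labels = { PAGE: 'PAGE' };

// 1) Inside a crawler: the function comes from the crawling context and
//    already knows the current page and the crawler's request queue.
async function handlePage({ enqueueLinksByClickingElements }) {
    await enqueueLinksByClickingElements({
        selector: '#next',
        label: labels.PAGE,
    });
}

// 2) Outside a crawler: the standalone function (assumed to be passed in
//    here as `clickAndEnqueue`) must be told which page and queue to use.
async function enqueueOutsideCrawler(clickAndEnqueue, page, requestQueue) {
    await clickAndEnqueue({
        page,
        requestQueue,
        selector: '#next',
        label: labels.PAGE,
    });
}
```

In a real project the first function would presumably be registered with `router.addHandler(labels.PAGE, handlePage)`, while the second would receive the standalone function described on the docs page linked in the question.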

quickest-silverOP•3y ago

ok i got it

quickest-silverOP•3y ago

But nothing happens and I don't get an error, so I might be doing something wrong elsewhere.

optimistic-gold•3y ago

note those warnings in the docs:

IMPORTANT: To be able to do this, this function uses various mutations on the page, such as changing the Z-index of elements being clicked and their visibility. Therefore, it is recommended to only use this function as the last operation in the page.

USING HEADFUL BROWSER: When using a headful browser, this function will only be able to click elements in the focused tab, effectively limiting concurrency to 1. In headless mode, full concurrency can be achieved.

PERFORMANCE: Clicking elements with a mouse and intercepting requests is not a low level operation that takes nanoseconds. It’s not very CPU intensive, but it takes time. We strongly recommend limiting the scope of the clicking as much as possible by using a specific selector that targets only the elements that you assume or know will produce a navigation. You can certainly click everything by using the * selector, but be prepared to wait minutes to get results on a large and complex page.

Also - make sure that selector is correct

quickest-silverOP•3y ago

ok thanks, i will check

Looks like a selector problem. After this:

await page.waitForSelector("#next");

the locator seems to be hidden:

locator resolved to hidden <button id="next" type="button" class="btn pagingBtn hid…>…</button>

optimistic-gold•3y ago

could you share the URL?

quickest-silverOP•3y ago

https://datawarehouse.dbd.go.th/searchJuristicInfo/86101/submitObjCode/1

DBD DataWarehouse+

Juristic person information and financial statements service

optimistic-gold•3y ago

I meant the URL on which you're trying to enqueue more pages 🙂 On this one I don't see the #next selector at all.

quickest-silverOP•3y ago

That's this one; you need to reload after accepting the prompt.

optimistic-gold•3y ago

ah, I see it now 👍 Well - the button is hidden indeed - so it cannot really click on it. I don't know why it's hidden - something website specific apparently..
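
One way to tell a wrong selector apart from a hidden element is to check both cases before clicking. A minimal sketch, assuming a Playwright-style `page` object; the helper name `selectorState` is made up for illustration, and it relies only on `page.locator()`, `locator.count()`, and `locator.first().isVisible()`.

```javascript
// Hypothetical helper: reports whether a selector matches nothing,
// matches a hidden element, or matches a visible element.
async function selectorState(page, selector) {
    const locator = page.locator(selector);
    // No matching element at all: the selector is probably wrong.
    if ((await locator.count()) === 0) return 'missing';
    // The element exists; check whether the first match is visible.
    return (await locator.first().isVisible()) ? 'visible' : 'hidden';
}
```

Inside the handler from the question, `await selectorState(page, '#next')` returning `'hidden'` would match the "locator resolved to hidden" log line quoted earlier, while `'missing'` would point to a selector problem instead.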

quickest-silverOP•3y ago

oh ok

optimistic-gold•3y ago

It might be easier to replicate the XHR request. As it's a web app, it's not really reloading the page; it sends POST requests which differ only in the currentPage number (at least from what I saw, maybe there's more).
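
The idea can be sketched as building one POST request per page number. Everything specific here is an assumption: the endpoint URL is a placeholder, the form-encoded body and the extra `sort` field are guesses, and the helper name `buildPageRequests` is made up; only the `currentPage` parameter name comes from the message above. The objects use the `url`, `method`, `payload`, `headers`, `label`, and `uniqueKey` request fields that Crawlee accepts.

```javascript
// Hypothetical helper: one POST request per page, differing only in currentPage.
function buildPageRequests(endpoint, baseParams, lastPage) {
    const requests = [];
    for (let currentPage = 1; currentPage <= lastPage; currentPage++) {
        // Assumed form-encoded body; the real site may expect JSON or other fields.
        const body = new URLSearchParams({ ...baseParams, currentPage: String(currentPage) });
        requests.push({
            url: endpoint,
            method: 'POST',
            payload: body.toString(),
            headers: { 'content-type': 'application/x-www-form-urlencoded' },
            label: 'PAGE',
            // All requests share one URL, so give each an explicit unique key;
            // by default the queue would treat them as duplicates.
            uniqueKey: `${endpoint}#page-${currentPage}`,
        });
    }
    return requests;
}

// Placeholder endpoint and parameters, not taken from the real site.
const requests = buildPageRequests('https://example.com/search', { sort: 'asc' }, 3);
console.log(requests.map((r) => r.payload));
```

The resulting array could then be handed to a crawler, for example with `crawler.addRequests(requests)`, once the real endpoint and payload have been copied from the browser's network tab.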

quickest-silverOP•3y ago

how could i do this?

optimistic-gold•3y ago

I guess you could start from here: https://docs.apify.com/academy/api-scraping

API scraping | Apify Documentation

Learn all about how the professionals scrape various types of APIs with various configurations, parameters, and requirements.

quickest-silverOP•3y ago

ok thanks i will take a look
