Connecting to a remote browser instance?
Is there a way we can specify a web socket endpoint in the PlaywrightCrawler config (or somewhere else) so we can connect to a remote browser?
8 Replies
quickest-silver•17mo ago
Hi @tim,
It looks like the solution is not straightforward. You could try writing your own PlaywrightPlugin, replacing every this.library.launch call with this.library.connectOverCDP('http://hostname:port') (e.g. http://localhost:9222), and then provide it to the PlaywrightCrawler via the browserPool option parameter (check the code of PlaywrightCrawlerOptions for more details).
GitHub
crawlee/packages/playwright-crawler/src/internals/playwright-crawle...
Crawlee—A web scraping and browser automation library for Node.js that helps you build reliable crawlers. Fast. - apify/crawlee
quickest-silver•17mo ago
For more info about connectOverCDP: https://playwright.dev/docs/api/class-browsertype#browser-type-connect-over-cdp
BrowserType | Playwright
BrowserType provides methods to launch a specific browser instance or connect to an existing one. The following is a typical example of using Playwright to drive automation:
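Separately from the docs excerpt above, here is a minimal TypeScript sketch of what attaching to a remote browser with connectOverCDP could look like. The helper name withRemoteBrowser and the small interfaces are made up for illustration; they only mirror the two Playwright calls used (connectOverCDP and close), on the assumption that chromium from playwright satisfies them structurally. The endpoint in the usage comment is the example one mentioned earlier in the thread.

```typescript
// Structural stand-ins for the parts of Playwright used here, so the sketch
// type-checks on its own. Assumption: `chromium` from `playwright` fits
// CdpBrowserType, and its Browser fits RemoteBrowser.
interface RemoteBrowser {
  close(): Promise<void>;
}

interface CdpBrowserType<B extends RemoteBrowser> {
  connectOverCDP(endpointURL: string): Promise<B>;
}

// Hypothetical helper: attach to an already running browser, run `task`,
// and always close the connection afterwards, even if `task` throws.
async function withRemoteBrowser<B extends RemoteBrowser, T>(
  browserType: CdpBrowserType<B>,
  endpointURL: string,
  task: (browser: B) => Promise<T>,
): Promise<T> {
  const browser = await browserType.connectOverCDP(endpointURL);
  try {
    return await task(browser);
  } finally {
    // For a browser that was connected to rather than launched, Playwright
    // documents close() as disconnecting from the browser server.
    await browser.close();
  }
}

// Usage (assumes `playwright` is installed and a Chromium instance exposes
// CDP on http://localhost:9222):
//   import { chromium } from 'playwright';
//   const title = await withRemoteBrowser(chromium, 'http://localhost:9222', async (browser) => {
//     const page = await browser.newPage();
//     await page.goto('https://example.com');
//     return page.title();
//   });
```

Passing the browser type in as a parameter keeps the helper independent of how Playwright is imported, which also makes it easy to exercise with a fake.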
correct-apricotOP•17mo ago
Thanks for the response! Unfortunately, when I try that I get an error:
Error: browserPoolOptions.browserPlugins is disallowed. Use launchContext.launcher instead.
national-gold•17mo ago
Hello! I have the same task with a remote browser.
@tim did you find a solution with launchContext.launcher? Could you share it?
quickest-silver•17mo ago
I was thinking about another solution: you can create a BasicCrawler and manage your browser page by yourself, for example:
Basic crawler | Crawlee
This is the most bare-bones example of using Crawlee, which demonstrates some of its building blocks such as the BasicCrawler. You probably don't need to go this deep though, and it would be better to start with one of the full-featured crawlers
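To make the BasicCrawler idea above a bit more concrete, here is a rough TypeScript sketch of a request handler that opens and closes its own page on a browser you connected to yourself. The factory name createRemotePageHandler, the onPage callback, and the small interfaces are all hypothetical; the interfaces only mirror the Playwright methods used (newPage, goto, close) and the request.url field of the crawling context. The wiring in the trailing comment is an untested assumption about how it would be combined with crawlee and playwright, not a confirmed recipe.

```typescript
// Structural stand-ins so the sketch compiles without crawlee/playwright.
// Assumption: Playwright's Browser and Page satisfy these shapes.
interface PageLike {
  goto(url: string): Promise<unknown>;
  close(): Promise<void>;
}

interface BrowserLike<P extends PageLike> {
  newPage(): Promise<P>;
}

// Only the part of the crawling context this handler reads.
interface RequestContextLike {
  request: { url: string };
}

// Hypothetical factory: returns a handler that manages one page per request
// on a browser the caller has already connected to.
function createRemotePageHandler<P extends PageLike>(
  browser: BrowserLike<P>,
  onPage: (page: P, url: string) => Promise<void>,
): (context: RequestContextLike) => Promise<void> {
  return async ({ request }) => {
    const page = await browser.newPage();
    try {
      await page.goto(request.url);
      await onPage(page, request.url);
    } finally {
      // Close the page even when navigation or onPage fails, so failed
      // requests do not leak pages in the remote browser.
      await page.close();
    }
  };
}

// Possible wiring (assumes `crawlee` and `playwright` are installed and a
// browser exposes CDP on http://localhost:9222):
//   import { BasicCrawler } from 'crawlee';
//   import { chromium } from 'playwright';
//   const browser = await chromium.connectOverCDP('http://localhost:9222');
//   const crawler = new BasicCrawler({
//     requestHandler: createRemotePageHandler(browser, async (page, url) => {
//       console.log(url, await page.title());
//     }),
//   });
//   await crawler.run(['https://crawlee.dev']);
//   await browser.close();
```

With this approach you still get BasicCrawler's request queue and retries, but the browser lifecycle is entirely yours to manage.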
genetic-orange•17mo ago
Yeah, I don't think this is possible with e.g. PlaywrightCrawler, but if there were more demand, it could technically be implemented. There is actually an issue for this: https://github.com/apify/crawlee/issues/1822
GitHub
Connect to remote browser services · Issue #1822 · apify/crawlee
Which package is the feature request for? If unsure which one to select, leave blank @crawlee/browser (BrowserCrawler) Feature There are cloud browser services like Browserless. So that we can use ...
wise-white•5mo ago
Hi @Lukas Krivka
Any update on this feature? I checked the GitHub issue and it is still open. We are building scraper functionality into our AI agent and hope to use Crawlee for the scraping part, but we need to connect to a remote browser.