How can I use the Playwright Crawler and BeautifulSoup Crawler in the same Actor?
I need Playwright to fill in and submit a website's search page, which relies on dynamic JavaScript. Once the results are shown, I want to use the BeautifulSoup crawler to open each product page and parse the information, because opening every product page with Playwright takes a very long time. I cannot seem to run both crawlers at the same time.
8 Replies
rare-sapphire•8mo ago
This is similar to my question here https://github.com/apify/crawlee-python/discussions/573
GitHub
Running different requests with different crawlers? · apify crawlee...
I'm trying to solve a situation where I want to make the initial request with a plain crawler (because it's an API or something), but continue with subsequent requests to detail pages with ...
rare-sapphire•8mo ago
The linked discussion contains the answer.
deep-jade•8mo ago
I want to build my own Actor with Playwright and BeautifulSoup.
This is exactly the solution I am looking for. First I want to send an HTTP request, get the HTML, and parse the data with BeautifulSoup; then I want to open the links obtained from that parsing with Playwright.
Correct me if I am wrong:
first use Python with BeautifulSoup to get the results, then use those results with Playwright.
So do we have to create and build two different Actors for this?
Hi @Abdul
The discussion above concerns the use of crawlee-python.
In an Actor, you can combine an HTTP client + BeautifulSoup with Playwright, either within a single Actor or by using a bundle of two Actors.
deep-jade•8mo ago
Thanks for clarifying. Do you have anything that would help me start working on an Actor with an HTTP client + BeautifulSoup + Playwright?
No, I don't have any code samples like that, since I don't usually use Playwright or browser automation.
But writing such an Actor is not much different from writing an ordinary scraper that uses the same combination of tools.
Refer to the official documentation to see how Playwright is instantiated in an Actor: https://docs.apify.com/sdk/python/docs/guides/playwright
Adding an HTTP client + BeautifulSoup integration on top of that will not be a problem.
Using Playwright | SDK for Python | Apify Documentation
Playwright is a tool for web automation and testing that can also be used for web scraping.
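For anyone landing here later, the single-Actor approach described above could look roughly like the sketch below. It is not taken from the linked discussion or from the Apify docs: it simply uses Playwright for the JavaScript-driven search step and httpx + BeautifulSoup for the product pages, as in the original question. The search URL, the CSS selectors, the `query` input field, and the helper names are all hypothetical placeholders, and httpx is just one possible HTTP client.

```python
from typing import Iterable, Optional
from urllib.parse import urljoin

# Hypothetical placeholders: swap in the real site, selectors, and input field.
SEARCH_URL = "https://example.com/search"
SEARCH_INPUT = "input[name='q']"
RESULT_LINKS = "a.product-link"


def absolute_links(base_url: str, hrefs: Iterable[Optional[str]]) -> list[str]:
    """Resolve hrefs against base_url, skipping empty values and duplicates."""
    unique: dict[str, None] = {}  # a dict keeps insertion order
    for href in hrefs:
        if href:
            unique.setdefault(urljoin(base_url, href), None)
    return list(unique)


async def collect_product_urls(query: str) -> list[str]:
    """Step 1: use a browser only for the dynamic search form."""
    # Third-party imports are deferred so the pure helper above stays importable.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(SEARCH_URL)
        await page.fill(SEARCH_INPUT, query)
        await page.press(SEARCH_INPUT, "Enter")
        await page.wait_for_selector(RESULT_LINKS)
        hrefs = await page.eval_on_selector_all(
            RESULT_LINKS, "els => els.map(e => e.getAttribute('href'))"
        )
        results_url = page.url
        await browser.close()
    return absolute_links(results_url, hrefs)


async def scrape_product(client, url: str) -> dict:
    """Step 2: fetch one product page over plain HTTP and parse it."""
    from bs4 import BeautifulSoup

    response = await client.get(url)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    heading = soup.select_one("h1")  # assumed location of the product title
    return {"url": url, "title": heading.get_text(strip=True) if heading else None}


async def main() -> None:
    import httpx
    from apify import Actor

    async with Actor:
        actor_input = await Actor.get_input() or {}
        query = actor_input.get("query", "")  # "query" is an assumed input field
        urls = await collect_product_urls(query)
        async with httpx.AsyncClient(follow_redirects=True, timeout=30) as client:
            for url in urls:
                await Actor.push_data(await scrape_product(client, url))
```

To try it, call `asyncio.run(main())` from the Actor's entry point, with `apify`, `playwright`, `httpx`, and `beautifulsoup4` installed and a Chromium build downloaded via `playwright install chromium`. The reverse order asked about earlier in the thread (HTTP + BeautifulSoup first, then Playwright on the extracted links) should work the same way by swapping the two steps. The browser is closed before the HTTP requests start, so only the search step pays the browser cost.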