Scrape JSON and HTML responses in different handlers
I don't know how to scrape a website that returns both JSON and HTML responses.
My scraper needs to:
1. Send a request and parse a JSON response containing a list of URLs, which I will enqueue.
2. Scrape those URLs as HTML, using cheerio or whatever is required to do so.
fair-rose•7mo ago
Hey,
For your task, I'd use 2 request handlers:
- The JSON handler handles the JSON response: it parses it and enqueues the HTML requests.
- The HTML handler parses the HTML response as usual with cheerio's $.
JSON and HTML are request labels; you can read more about labels here. Basically, if you label a request with, e.g., the HTML label, it will be handled by the HTML request handler.
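To make the flow concrete, here is a minimal, dependency-free sketch of label-based routing. It is not Crawlee's API: the names crawl, handlers, fetchBody, and LabeledRequest are hypothetical, the JSON shape { "urls": [...] } is an assumption about your endpoint, fetching is kept synchronous for brevity, and a regex stands in for cheerio's $ only so the sketch runs without any packages.

```typescript
// Hypothetical sketch: each request carries a label, and the label
// decides which handler processes the response.
type Label = 'JSON' | 'HTML';

interface LabeledRequest {
    url: string;
    label: Label;
}

interface PageResult {
    url: string;
    title: string;
}

interface HandlerContext {
    request: LabeledRequest;
    body: string; // raw response body
    enqueue: (requests: LabeledRequest[]) => void; // add follow-up requests
    results: PageResult[];
}

type Handler = (ctx: HandlerContext) => void;

const handlers: Record<Label, Handler> = {
    // JSON handler: parse the list and enqueue every URL with the HTML label.
    // Assumed response shape: { "urls": ["https://...", ...] }
    JSON: ({ body, enqueue }) => {
        const payload = JSON.parse(body) as { urls?: unknown };
        const urls = Array.isArray(payload.urls)
            ? payload.urls.filter((u): u is string => typeof u === 'string')
            : [];
        enqueue(urls.map((url) => ({ url, label: 'HTML' as const })));
    },
    // HTML handler: extract data from the page.
    // A regex stands in for cheerio's $ here to avoid dependencies.
    HTML: ({ request, body, results }) => {
        const match = /<title>([^<]*)<\/title>/i.exec(body);
        results.push({ url: request.url, title: match?.[1]?.trim() ?? '' });
    },
};

// Process the queue in order, dispatching each request by its label.
function crawl(
    start: LabeledRequest[],
    fetchBody: (url: string) => string,
): PageResult[] {
    const queue: LabeledRequest[] = [...start];
    const results: PageResult[] = [];
    while (queue.length > 0) {
        const request = queue.shift()!;
        handlers[request.label]({
            request,
            body: fetchBody(request.url),
            enqueue: (requests) => {
                queue.push(...requests);
            },
            results,
        });
    }
    return results;
}
```

You would start it with a single request labeled JSON, e.g. crawl([{ url: listUrl, label: 'JSON' }], fetchBody), where listUrl and fetchBody are placeholders for your endpoint and your HTTP client. In Crawlee itself, the same idea should map to a router created with createCheerioRouter(), one router.addHandler('JSON', ...) and one router.addHandler('HTML', ...), with label: 'HTML' set on each request you enqueue.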
Let me know if you have any questions.
Crawling the Store | Crawlee