How deep can website-content-crawler go?
Hello guys, I have a website made up of about 3 million pages. The homepage works like Google, so the crawler has to open each search result one by one and scrape the content inside it. Can website-content-crawler do all that automatically, or do I have to give it the links to those 3 million pages?
Also, can I customize what gets scraped inside each of those links? For example, by giving it the div id of the container?
1 Reply
national-gold•2y ago
Hello @CernunnoS, the actor can only follow links that it finds on a page or visit URLs that you specify as input. If the site has no page that lists all the queries, you will have to provide that list as input.
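If you do end up supplying the list yourself, a small script can generate it from your queries. This is only a rough sketch: the base URL `https://example.com/search` and the `q` parameter name are placeholders for whatever your site uses, and the `{"url": ...}` shape is my assumption about how the start URLs are passed in, so check it against the actor's input schema.

```python
from urllib.parse import urlencode


def build_start_urls(base_url, queries, param="q"):
    """Turn a list of search queries into a list of start-URL objects.

    base_url and param are hypothetical; adjust them to the real search
    endpoint of the site you want to crawl.
    """
    # urlencode escapes spaces and special characters in each query
    return [{"url": f"{base_url}?{urlencode({param: query})}"} for query in queries]


if __name__ == "__main__":
    for entry in build_start_urls("https://example.com/search", ["red shoes", "a&b"]):
        print(entry["url"])
```

For 3 million queries you would probably write the result to a file and upload that rather than paste it in by hand.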
The actor only lets you remove elements that match the selectors you specify and apply a transformer to the extracted content; it does not let you extract content from one specific element. If you need more control over the structure of the extracted content, you can use the Web Scraper actor: https://apify.com/apify/web-scraper.
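To show what extracting from one specific element means in practice, here is a minimal sketch that uses only Python's standard library. It is not the Web Scraper actor's API, just an illustration of pulling the text out of a container by its id. The class and function names are made up for this example, and it assumes the container is a normal (non-void) element such as a div.

```python
from html.parser import HTMLParser


class ContainerTextExtractor(HTMLParser):
    """Collects the text found inside the first element with a given id."""

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self._tag = None    # tag name of the matched container
        self._depth = 0     # nesting depth of same-named tags inside it
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self._depth == 0:
            # Only match the first element carrying the requested id
            if self._tag is None and dict(attrs).get("id") == self.container_id:
                self._tag = tag
                self._depth = 1
        elif tag == self._tag:
            self._depth += 1

    def handle_endtag(self, tag):
        if self._depth > 0 and tag == self._tag:
            self._depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only while we are inside the container
        if self._depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_container_text(html, container_id):
    parser = ContainerTextExtractor(container_id)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = '<div id="nav">Menu</div><div id="content"><h1>Title</h1><p>Body</p></div>'
    print(extract_container_text(page, "content"))
```

In a real scraper you would more likely use a CSS selector such as `#content` through whatever DOM or selector library the tool provides.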