Web Page Finder
New to Apify. Need advice on the best Actor to achieve the desired outcome.
Goal: Find all websites that use a set of specific root URLs.
Use case: A known company supplies its clients with dedicated web pages under a root URL that doesn't itself display any information. To reach a page with information, you need the complete URL, which always follows this format: http://company-url.com/client-identifier, where company-url is the creator/host of the web pages and client-identifier is the client's name or some variant of it.
Usually we type "inurl:http://company-url.com/*" into Google Search and go through the results by hand. Can this be automated in Apify? Which Actor would do this, and what settings do I need to be aware of for optimal results? Thanks!
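Whatever tool collects the candidate URLs (a search-results scraper or a crawler), the results still have to be narrowed down to pages that match the http://company-url.com/client-identifier format. Below is a minimal sketch of that filtering step, assuming you already have a list of candidate URLs. The host "company-url.com" is the placeholder from the question, and the function names are hypothetical. It also assumes that the first path segment is the client identifier, so deeper pages (e.g. /acme/contact) are grouped under their client and the bare root URL is ignored.

```python
from typing import Dict, Iterable, Optional
from urllib.parse import urlsplit

# Placeholder host from the question; replace with the real company domain.
COMPANY_HOST = "company-url.com"


def client_identifier(url: str, host: str = COMPANY_HOST) -> Optional[str]:
    """Return the client identifier (first path segment) if `url` is on `host`, else None."""
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https"):
        return None
    # Hostnames are case-insensitive; also tolerate a leading "www." (an assumption).
    hostname = (parts.hostname or "").lower()
    if hostname.startswith("www."):
        hostname = hostname[len("www."):]
    if hostname != host:
        return None
    # The bare root URL has no path segments and carries no client page.
    segments = [s for s in parts.path.split("/") if s]
    return segments[0] if segments else None


def unique_client_pages(urls: Iterable[str]) -> Dict[str, str]:
    """Map each client identifier to the first matching URL seen, preserving input order."""
    found: Dict[str, str] = {}
    for url in urls:
        ident = client_identifier(url)
        if ident is not None and ident not in found:
            found[ident] = url
    return found


if __name__ == "__main__":
    # Hypothetical search results, for illustration only.
    results = [
        "http://company-url.com/",
        "http://company-url.com/acme",
        "https://www.company-url.com/acme/contact",
        "http://company-url.com/globex?ref=search",
        "http://other-site.com/acme",
    ]
    print(unique_client_pages(results))
    # {'acme': 'http://company-url.com/acme', 'globex': 'http://company-url.com/globex?ref=search'}
```

Deduplicating on the identifier rather than the full URL means the same client found through several result pages is listed only once. Identifiers are compared case-sensitively here, since URL paths can be case-sensitive.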
@growthmonster dm me
rival-black•13mo ago
This can be done using this Actor:
https://apify.com/apify/web-scraper
Web Scraper · Apify: Crawls websites using Chrome and extracts data from pages using JavaScript. Supports recursive crawling and URL lists and automatically manages concurrency.