correct-apricot
API scraping
Hi, I want to make a bot that extracts all the data from an API endpoint and stores it in my database, to build up a history of the values.
I tried writing the script without Apify, but it took ages to finish, and after some requests my IP was blocked.
So the solution I found is to use Apify, because of its anti-blocking features and concurrency.
The problem is that I want to create a custom actor, and I did not find much information about scraping an API. I don't know if that's because this isn't the right tool for my problem.
In addition to that, I would love to know how actors are expected to work. At the moment I have a script that generates the URLs for all the pages (pagination) and another that goes to each page and makes the request.
For the actors, should I create one actor that makes a single request and call it once per URL, or is the better approach to have one actor that receives a list of URLs as input and makes all the requests itself? I'm asking because concurrency and proxy use matter to me, and I don't know whether each actor manages this by itself.
2 Replies
equal-aqua•3y ago
Leaving the same reply as in the chat, but please don't submit the same message several times:
Hey there! Please check our docs: https://docs.apify.com/academy - you can scrape an API the same way as a normal website; the Apify SDK and Crawlee handle both cases perfectly. An actor can add more URLs to the request queue during the run, the autoscaled pool takes care of managing concurrency and utilizing free resources, and proxy setup is very easy for each actor. Please check the links/docs above, and let us know if anything is still unclear.
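To make that concrete, here is a minimal sketch of the second setup the question describes: a single actor that receives a list of URLs as input and lets Crawlee's HttpCrawler handle concurrency, proxies, and pagination within one run. The input field `startUrls` and the response fields `items` / `nextPageUrl` are illustrative assumptions, not part of the original thread; adjust them to your actual API.
```ts
// main.ts - minimal sketch of one actor that scrapes a paginated JSON API.
import { Actor } from 'apify';
import { HttpCrawler } from 'crawlee';

await Actor.init();

// Assumed input shape: { "startUrls": ["https://example.com/api?page=1", ...] }
const input = (await Actor.getInput()) as { startUrls?: string[] } | null;
const startUrls = input?.startUrls ?? [];

// Apify proxy rotates IPs, so the target is less likely to block the actor.
const proxyConfiguration = await Actor.createProxyConfiguration();

const crawler = new HttpCrawler({
    proxyConfiguration,
    maxConcurrency: 20, // upper bound; the autoscaled pool decides the actual rate
    async requestHandler({ body }) {
        // The endpoint is assumed to return JSON with `items` and `nextPageUrl`.
        const data = JSON.parse(body.toString());

        // Store this page's records in the default dataset to build the history.
        await Actor.pushData(data.items);

        // Pagination handled inside the same run: enqueue the next page if any.
        if (data.nextPageUrl) {
            await crawler.addRequests([data.nextPageUrl]);
        }
    },
});

await crawler.run(startUrls);
await Actor.exit();
```
With this shape you run one actor per scrape and let the autoscaled pool and proxy configuration manage concurrency and blocking, rather than starting a separate actor run for every URL.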
correct-apricotOP•3y ago
Many thanks 👌