Crawlee & Apify•3y ago

scraping at scale

How should I structure my crawler in Crawlee when scraping possibly hundreds of different sites with different structures, while handling multiple requests at once?


flat-fuchsia•3y ago

Well, I am implementing something similar: 30-40 sites, but with SIMILAR structure. (If the structure of your sites is really different, you are implementing something like Google/Bing, i.e. a kind of generic web crawler.)

1. You might use something like an external message queue; we discussed it here and in a few other places: https://discord.com/channels/801163717915574323/1056348705407651941 beanstalkd is just fine for these purposes.
2. You can create one big config file (YML, JSON...) describing "where-to-find-what" on each site. Example:

    abc123.com:
      listOfTopics: h1 > div.list > div ...
    xyz987.com:
      listOfTopics: div.bigListClass > div > p ...
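A minimal sketch of how the two suggestions above could fit together, written in Python with only the standard library so it runs on its own. Assumptions: the per-site config is JSON rather than YAML (so no extra parser is needed); the `listOfTopics` selectors are copied from the example above with the elided parts left off; the external queue (beanstalkd) is swapped for an in-process `asyncio.Queue`; and the fetch/parse step is a placeholder that only records which selector would be used. The helper names (`site_key`, `selectors_for`, `worker`, `crawl`) are hypothetical. In a real Crawlee project the crawler already manages its own request queue and concurrency limit (e.g. the `maxConcurrency` crawler option in Crawlee for JavaScript), so the part worth keeping is the domain-to-selectors lookup inside the request handler.

```python
import asyncio
import json
from urllib.parse import urlparse

# Hypothetical per-site config mirroring the "where-to-find-what" idea.
# JSON is used only because the stdlib can parse it; YAML works the same way.
SITE_CONFIG = json.loads("""
{
  "abc123.com": {"listOfTopics": "h1 > div.list > div"},
  "xyz987.com": {"listOfTopics": "div.bigListClass > div > p"}
}
""")


def site_key(url: str) -> str:
    """Map a URL to its config key: lowercase hostname without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host.removeprefix("www.")


def selectors_for(url: str) -> dict:
    """Return the selectors for a URL's site; raises KeyError for unknown sites."""
    return SITE_CONFIG[site_key(url)]


async def worker(queue: asyncio.Queue, results: dict) -> None:
    """Pull URLs off the shared queue until the workers are cancelled."""
    while True:
        url = await queue.get()
        try:
            # Placeholder for the real fetch + parse (e.g. a crawler's request
            # handler); here we only record which selector would be applied.
            await asyncio.sleep(0)  # simulate an I/O wait
            results[url] = selectors_for(url)["listOfTopics"]
        except KeyError:
            results[url] = None  # no config entry for this site
        finally:
            queue.task_done()


async def crawl(urls: list[str], concurrency: int = 3) -> dict:
    """Process all URLs with at most `concurrency` workers running at once."""
    queue: asyncio.Queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)
    results: dict = {}
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(concurrency)]
    await queue.join()  # wait until every queued URL has been processed
    for w in workers:
        w.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return results


if __name__ == "__main__":
    urls = [
        "https://abc123.com/topics",
        "https://www.xyz987.com/forum",
        "https://unknown.example/page",
    ]
    print(asyncio.run(crawl(urls)))
```

Keeping the site-specific knowledge in data rather than in code means adding a new site is a config change, and a URL from an unconfigured site yields `None` instead of crashing a worker.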

wise-whiteOP•3y ago

thank you
