scraping at scale

How should I structure my crawler in Crawlee when scraping possibly hundreds of different sites with different structures, while handling multiple requests at once?
3 Replies
flat-fuchsia•3y ago
Well, I am implementing something similar: 30-40 sites, but with SIMILAR structure. (If the structure of your sites is different, you are implementing something like Google/Bing, a kind of generic web crawler.)
1. You might use something like an external message queue. We discussed it here and in a few other places: https://discord.com/channels/801163717915574323/1056348705407651941
   beanstalkd is just fine for these purposes.
2. You can create one big config file (YML, JSON...) describing "where to find what" on each site. Example:
   abc123.com:
     listOfTopics: h1 > div.list > div ...
   xyz987.com:
     listOfTopics: div.bigListClass > div > p ....
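To make the config-file idea a bit more concrete, here is a minimal sketch in TypeScript. It is not Crawlee-specific: the `SiteConfig` type, the `siteConfigs` map, and the `configForUrl` helper are hypothetical names made up for illustration, and the selectors are just the two from the example, used as placeholders (the original ones are truncated).

```typescript
// Hypothetical shape of one site's entry: which CSS selector finds what.
type SiteConfig = {
  listOfTopics: string;
};

// One big config keyed by hostname. In practice this could be loaded
// from a YML/JSON file; it is inlined here to keep the sketch self-contained.
const siteConfigs: Record<string, SiteConfig> = {
  "abc123.com": { listOfTopics: "h1 > div.list > div" },
  "xyz987.com": { listOfTopics: "div.bigListClass > div > p" },
};

// Resolve the config for a given page URL.
// Assumption: a leading "www." should map to the same site entry.
function configForUrl(url: string): SiteConfig | undefined {
  const host = new URL(url).hostname.replace(/^www\./, "");
  return siteConfigs[host];
}

// Example lookup; an unknown site yields undefined so the caller can skip it.
const known = configForUrl("https://www.abc123.com/topics?page=2");
const unknown = configForUrl("https://not-configured.example/");
console.log(known?.listOfTopics, unknown);
```

In a crawler's request handler you would probably call something like `configForUrl(request.url)` once per page and then run the returned selectors against the parsed document, so that one handler can serve all configured sites.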
wise-whiteOP•3y ago
thank you