running multiple scrapers with speed
I already have a web scraper for Amazon that outputs to a rawData.json file. It successfully scrapes product links and then goes through each of those links to get the data I need.
But I want to scale up to many, many scrapers, and I'm having trouble running multiple scrapers at once.
I essentially made a new router to handle the other site, and I want to make sure that only requests with a given label run the router handler with that same label, but it won't let me define both routers like this:
That didn't work, so I had to combine both routers in an awkward way to get it running. There were no errors, but I keep getting no scraped data from the second site (eBay), and the output sometimes shows objects that have the eBay site name but still contain an Amazon link with an Amazon product in it.
I want to run both scrapes at the same time, get rid of the combinedRouter, and define the sites as separate routes. I'd also like to make the scrapes faster and make it easy to add new routes, so I can scale up and keep adding new scrapers daily.
Here is my code:
modern-tealOP•2y ago
main.js:
combinedRouter.js:
amazon.js:
ebay.js:
I get outputs like this in my data file (the data is incomplete because some of my tag selectors aren't accurate):
and weird ones too:
passive-yellow•2y ago
That looks weird. By design, a crawler instance should have one router assigned. Here you should basically have one router with four different routes (and no default route at all). Then in the AMAZON and EBAY routes, you should assign new labels to requests while enqueueing them. Instead of
const result = await crawler.addRequests([link]);
it would be something like
const result = await crawler.addRequests([{ url: link, label: 'AMAZON_PRODUCT' }]);
and the same for eBay. You're experiencing this unexpected behavior because combining different routers like that does not really make sense to the crawler.
modern-tealOP•2y ago
Is it better for me to run each site's scrape on separate routes?
If so, how?
Thanks for the help.
When I do this, how do I import the routers from each file so they don't overwrite each other? That's the problem I hit when I import the router from amazon.js and then from ebay.js.
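One way around the overwrite is to have each site file export a function that registers its routes on a single router created in main.js, instead of each file creating and exporting its own router. The sketch below uses a tiny stand-in router object so it runs without Crawlee installed; in the real project you would create the shared router with createCheerioRouter() and pass it as requestHandler. The function and label names are illustrative assumptions, not taken from the original files:

```javascript
// Minimal stand-in router mimicking Crawlee's addHandler API, so this
// sketch runs standalone. With Crawlee, use createCheerioRouter() instead.
function createRouter() {
    const handlers = new Map();
    return {
        addHandler(label, fn) {
            if (handlers.has(label)) throw new Error(`duplicate route: ${label}`);
            handlers.set(label, fn);
        },
        labels() {
            return [...handlers.keys()];
        },
    };
}

// --- what amazon.js would export ---
function registerAmazonRoutes(router) {
    router.addHandler('AMAZON', async () => { /* enqueue links with label AMAZON_PRODUCT */ });
    router.addHandler('AMAZON_PRODUCT', async () => { /* scrape one Amazon product */ });
}

// --- what ebay.js would export ---
function registerEbayRoutes(router) {
    router.addHandler('EBAY', async () => { /* enqueue links with label EBAY_PRODUCT */ });
    router.addHandler('EBAY_PRODUCT', async () => { /* scrape one eBay product */ });
}

// --- main.js: one router, every site registered on it ---
const router = createRouter();
registerAmazonRoutes(router);
registerEbayRoutes(router);
console.log(router.labels()); // ['AMAZON', 'AMAZON_PRODUCT', 'EBAY', 'EBAY_PRODUCT']
```

Since each label can only be registered once, giving every site its own label prefix (AMAZON_, EBAY_) keeps new scraper files from clashing as you add more.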
passive-yellow•2y ago
As I mentioned above, you should NOT have several routers; the crawler should only have one router assigned. You should have several routes specified on that one router.
modern-tealOP•2y ago
ok