Anyone have any example scraping multiple different websites?

The structure I am using does not look like the best. I am basically creating several routers and then doing something like:
const crawler = new PlaywrightCrawler({
    // proxyConfiguration: new ProxyConfiguration({ proxyUrls: ['...'] }),
    requestHandler: async (ctx) => {
        if (ctx.request.url.includes("url1")) {
            await url1Router(ctx);
        }

        if (ctx.request.url.includes("url2")) {
            await url2Router(ctx);
        }

        if (ctx.request.url.includes("url3")) {
            await url3Router(ctx);
        }

        await Dataset.exportToJSON("data.json");
    },

    // Comment this option to scrape the full website.
    // maxRequestsPerCrawl: 20,
});
This does not seem correct. Anyone with a better way?
6 Replies
Hall
Hall7mo ago
This post has been pushed to the community knowledgebase. Any replies in this thread will be synced to the community site.
xenial-black
xenial-black7mo ago
You can use Crawlee's Router: https://crawlee.dev/api/playwright-crawler/function/createPlaywrightRouter. Create a route for each URL, then use labels to identify them.
equal-aqua
equal-aquaOP7mo ago
@Marco, how far is that from what I am doing there? It seems like I will have to do it somewhere anyway. In the example above I did a router per URL: url1Router, url2Router are defined on a per-URL basis. Am I wrong?
xenial-black
xenial-black7mo ago
It's actually very similar. Routes should be defined depending on your needs, so if you need a route per URL, just do that.
equal-aqua
equal-aquaOP7mo ago
My concern is that I have multiple websites, not just different URLs. Each website might have two URLs that I have to scrape independently. Is that how you would do it, @Marco? Would you have multiple routers?
xenial-black
xenial-black7mo ago
Oh, I see. I think I would still use one router, with labels such as "website1-page2", to keep things simple; a function called at the beginning would assign the correct label to each request based on the URL.