How to launch PlaywrightCrawler inside BasicCrawler?

So I have this code:
import { BasicCrawler } from 'crawlee';
import { CookieJar } from 'tough-cookie';
import destr from 'destr';

const cookieJar = new CookieJar();

export const basicCrawler = new BasicCrawler({
    async requestHandler({ sendRequest, request, log }) {
        try {
            const res = await sendRequest({
                url: request.url,
                method: 'GET',
                cookieJar,
            });
            const json = destr(res.body);
            const urls = json.map((v) => v.url);
            await playCrawler.run(urls);
        } catch (error) {
            console.log(error);
        }
    },
});

//code for playwright crawler here
I start the crawler by calling basicCrawler.run(['url']). The problem is that the URLs I pass to playCrawler seem to be queued to basicCrawler as well. How is that possible?
11 Replies
metropolitan-bronzeOP•3y ago
Also, the try/catch inside basicCrawler is triggered by errors from playCrawler.
harsh-harlequin•3y ago
So you are trying to run PlaywrightCrawler inside the handler of BasicCrawler? What is the use case for this? This is quite a wild construction. Maybe you are using the same default RequestQueue for both crawlers.
metropolitan-bronzeOP•3y ago
The use case would be calling an HTTP API and running PlaywrightCrawler on its results.
metropolitan-bronzeOP•3y ago
What I don't understand is that the URLs I pass to PlaywrightCrawler get queued to basicCrawler as well. How is that possible?
Pepa J•3y ago
That is because there is only one default RequestQueue related to the run. Since you didn't specify any requestQueue in the constructors for the crawlers, they are both using the same default one. You may need to create another, named RequestQueue for one of those crawlers.
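A minimal sketch of the fix, assuming the Playwright crawler is the playCrawler referenced in the question (the queue name 'playwright-queue' is arbitrary):

import { PlaywrightCrawler, RequestQueue } from 'crawlee';

// Open a separate, named queue so playCrawler no longer shares
// the default RequestQueue with basicCrawler.
const playwrightQueue = await RequestQueue.open('playwright-queue');

export const playCrawler = new PlaywrightCrawler({
    requestQueue: playwrightQueue,
    async requestHandler({ request, page, log }) {
        log.info(`Processing ${request.url}`);
        // ...scraping logic here
    },
});

With this, playCrawler.run(urls) enqueues into its own named queue, and basicCrawler's default queue is left untouched.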
metropolitan-bronzeOP•3y ago
Is there a way to limit the number of tabs in a window? Like using separate windows with one tab each?
Pepa J•3y ago
maxConcurrency: 4,
useSessionPool: true,
browserPoolOptions: {
    maxOpenPagesPerBrowser: 2,
},
In the PlaywrightCrawler constructor, this is probably what you are looking for. It should use two browsers, each with two tabs.
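A sketch of those options in context, reusing the playCrawler constructor from above:

export const playCrawler = new PlaywrightCrawler({
    maxConcurrency: 4,      // at most 4 pages processed in parallel
    useSessionPool: true,
    browserPoolOptions: {
        // 4 concurrent pages / 2 pages per browser = 2 browser windows
        maxOpenPagesPerBrowser: 2,
    },
    async requestHandler({ page }) {
        // ...scraping logic here
    },
});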
metropolitan-bronzeOP•3y ago
Thanks a lot. Is there a way to put a delay between two requests? Currently Crawlee opens the URLs almost at the same time.
Pepa J•3y ago
You would probably need to implement your own logic in a pre-navigation hook: https://docs.apify.com/sdk/js/docs/2.3/typedefs/puppeteer-crawler-options#prenavigationhooks This is a very poor implementation, but you may get the idea:
// Module-level counter that increases with each request.
let requestNumber = 0;

function increment() {
    requestNumber += 1;
    return requestNumber;
}
and then in the PlaywrightCrawler constructor define something like:
postNavigationHooks: [
    async ({ page }) => {
        // 1_000 ms = 1 s; the delay grows with each request.
        await page.waitForTimeout(increment() * 1_000);
    },
],
metropolitan-bronzeOP•3y ago
Can you take a look at this? https://discord.com/channels/801163717915574323/1076083814817869854 What's the use case of storing results in separate files?
