No such file or directory storage/request_queues/default/JoxD7mAqj47ssmS.json

I'm trying to run a fairly simple scraper, but I keep getting this error. I want to scrape around 64,000 pages, but I get the "no such file" error every time. Setting waitForAllRequestsToBeAdded to true doesn't fix the issue. This is how I'm setting up and running the crawler:
import { PlaywrightCrawler, Configuration, Request } from 'crawlee';

const opts = {
  navigationTimeoutSecs: 3,
  requestHandlerTimeoutSecs: 3,
  maxRequestRetries: 6,
  maxConcurrency: 20,
};

const config = new Configuration({
  memoryMbytes: 8000,
});

const crawler = new PlaywrightCrawler(opts, config);
crawler.router.addDefaultHandler(handlePage);

const requests = data.map(
  (d) =>
    new Request({
      url: d.url,
      userData: d,
    })
);

await crawler.run(requests, { waitForAllRequestsToBeAdded: true });
Thanks for any help!
6 Replies
unwilling-turquoise · 2y ago
@MaskedSparrow this is run locally, right? Can you share the rest of the source? Specifically anything related to where you're setting up your request queue
mute-gold (OP) · 2y ago
It is running locally, yeah. And this is actually all the setup I'm doing. I'm just using the default request queue and dataset. I also get the same error if I run it with an empty page handler, so I'm not sure where it's coming from
Pepa J · 2y ago
@MaskedSparrow What version of Crawlee is this? Have you tried updating to the latest one?
mute-gold (OP) · 2y ago
Yeah, I was using the latest version. I found Crawlee to be a little slow and unreliable so I just switched to using Puppeteer directly. My use case was pretty simple and it's now running several times faster with no issues
NeoNomade · 2y ago
Try setting useSessionPool: true.
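As a sketch, assuming the same opts object from the question, that suggestion would look like this (useSessionPool is a real Crawlee crawler option that enables automatic session rotation):

```javascript
// Crawler options from the original post, with the suggested
// `useSessionPool` flag added so Crawlee rotates sessions automatically.
const opts = {
  navigationTimeoutSecs: 3,
  requestHandlerTimeoutSecs: 3,
  maxRequestRetries: 6,
  maxConcurrency: 20,
  useSessionPool: true,
};
```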
rival-black · 2y ago
You're not supposed to create Request objects by mapping over the array; just call crawler.run(data) and the SDK will handle the rest.
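A minimal sketch of that approach, assuming data is an array of records that each carry a url field (as in the question). Crawlee's crawler.run() accepts plain source objects and converts them to Request instances internally, so the new Request() wrapping is unnecessary:

```javascript
// Hypothetical input records, each with a `url` plus extra fields,
// standing in for the OP's `data` array.
const data = [
  { url: 'https://example.com/page/1', category: 'a' },
  { url: 'https://example.com/page/2', category: 'b' },
];

// Plain objects are enough; extra fields can be attached via `userData`.
const sources = data.map((d) => ({ url: d.url, userData: d }));

// await crawler.run(sources);  // the SDK builds the Request objects itself
```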