Apify in NestJS scheduler

Hello everyone, I am using Apify + Crawlee CheerioCrawler + the NestJS scheduler in my project, and I have an issue: the NestJS server process quits when Actor.exit() is called. Below is my code:
@Cron('0 */5 * * * *')
async handleEvery20Minutes() {
    const config = new Configuration({ purgeOnStart: true, persistStorage: false });
    let cheerioCrawler = new CheerioCrawler({
        minConcurrency: 10,
        maxConcurrency: 50,

        // On error, retry each page at most once.
        maxRequestRetries: 1,

        // Increase the timeout for processing of each page.
        requestHandlerTimeoutSecs: 30,

        // Limit to 10 requests per one crawl
        maxRequestsPerCrawl: 10,
        requestHandler: defaultRouter
    }, config);

    await Actor.init();
    const crawlingCodes = await this.codesService.findAllCodesUrl();
    for (let i = 0; i < crawlingCodes.length; i++) {
        await cheerioCrawler.addRequests([
            {
                url: crawlingCodes[i].url,
                userData: {
                    code: crawlingCodes[i].name,
                },
                uniqueKey: uuidv4()
            },
        ]);
    }
    await cheerioCrawler.run();

    await cheerioCrawler.teardown();

    await Actor.exit(); // the NestJS process quits when the scheduler reaches this line
}
I would like to call Actor.exit() to reset the index of the dataset JSON files. I can remove Actor.exit(), but then I get this error:
[Nest] 43924 - 06/08/2024, 8:30:02 PM ERROR [Scheduler] Error: ENOENT: no such file or directory, open '/storage/datasets/default/000000001.json'
Has anyone had a similar issue when running Apify and Crawlee on the NestJS framework? Can you please help? Thank you.
2 Replies
stormy-gold, 12mo ago
Hello @anh.tran.conf, calling the exit method like this should do the trick:
await Actor.exit({ exit: false });
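To show where that call would sit in a scheduled job, here is a minimal sketch. The `ActorLike` interface and the `runWithActor` helper are hypothetical names introduced only for illustration (they are not part of Apify or NestJS); the sketch assumes the object passed in exposes `init()` and `exit({ exit: false })` the way `Actor` is used in this thread.

```typescript
// Hypothetical minimal shape of the calls used in this thread (an assumption,
// not the library's full type).
interface ActorLike {
    init(): Promise<void>;
    exit(options?: { exit?: boolean }): Promise<void>;
}

// Hypothetical helper: initialise the actor, run one scheduled job, and always
// shut the actor down without terminating the host (NestJS) process.
async function runWithActor<T>(actor: ActorLike, job: () => Promise<T>): Promise<T> {
    await actor.init();
    try {
        return await job();
    } finally {
        // `exit: false` asks the actor to clean up but skip ending the process.
        await actor.exit({ exit: false });
    }
}
```

Inside the cron handler, this could presumably be used as `await runWithActor(Actor, async () => { /* add requests, run the crawler */ });`, so the exit call still runs when the crawl throws.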
wise-white (OP), 12mo ago
Thank you @vojtechmaslan, it works.
