Unable to run Crawlee in AWS Lambda (Protocol error (Target.setAutoAttach): Target closed)

I am trying to run Crawlee on AWS Lambda, but I get this error message:
Reclaiming failed request back to the list or queue. Protocol error (Target.setAutoAttach): Target closed.
Chromium version: 109
Node version: 16
Code:
exports.handler = async (event, context, callback) => {
  const finalResult = [];
  const url = ``;

  try {
    const crawler = new PuppeteerCrawler({
      launchContext: {
        useIncognitoPages: true,
        launchOptions: {
          executablePath: await chromium.executablePath(),
          args: ['--no-sandbox', '--disable-setuid-sandbox']
        },
        launcher: puppeteer
      },
      useSessionPool: true,
      requestHandlerTimeoutSecs: 60,
      browserPoolOptions: {
        useFingerprints: true,
        fingerprintOptions: {
          fingerprintGeneratorOptions: {
            browsers: ['chrome'],
            operatingSystems: ['windows'],
            devices: ['desktop'],
            locales: ['en-US', 'en']
          },
        },
      },
      headless: true,

      async requestHandler({ request, page, enqueueLinks }) {
        log.info(`Processing ${request.url}...`);
      },

      // This function is called if the page processing failed more than maxRequestRetries+1 times.
      failedRequestHandler({ request }) {
        log.error(`Request ${request.url} failed too many times.`);
      },
    });

    // Run the crawler and wait for it to finish.
    await crawler.run([url]);
    log.info('Crawler finished.');
  } catch (error) {
    return callback(error);
  } finally {
  }
  return callback(null, finalResult);
};
5 Replies
absent-sapphire (3y ago)
GitHub
Error Running in AWS Lambda via chrome-aws-lambda (Error: spawn ps ...
Now describe the bug When running in AWS Lambda using chrome-aws-lambda, I'm hitting an Error: spawn ps ENOENT error. I'm just trying to run a simple job based on the HackerNews example. Err...
fair-rose (OP, 3y ago)
No, it did not work for me.
"dependencies": {
  "@sparticuz/chromium": "^109.0.1",
  "crawlee": "^3.1.4",
  "puppeteer-core": "^19.4.0",
  "puppeteer-extra": "^3.3.4",
  "puppeteer-extra-plugin-stealth": "^2.11.1"
}
I am running Crawlee on Node.js v16, which chrome-aws-lambda does not support. That is why I added @sparticuz/chromium, which does support Node v16.
metropolitan-bronze (3y ago)
Is launcher: puppeteer referring to an existing puppeteer instance or browser session?
fair-rose (OP, 3y ago)
Sorry, I did not get that. What do you mean? Are you able to run Crawlee in AWS Lambda?
absent-sapphire (3y ago)
@pmt11 No, not a browser. It should be the puppeteer launcher variable.
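
To illustrate the point about the launcher, here is a minimal sketch of how the launchContext from the question could be assembled. It rests on several assumptions: buildLaunchContext is a hypothetical helper (not part of Crawlee), chromium is assumed to be the @sparticuz/chromium module, puppeteer is assumed to be the puppeteer-core module, and chromium.args is assumed to hold the flags recommended for the Lambda Chromium build. It shows the shape of the config only and is not confirmed to resolve the "Target closed" error.

```javascript
// Sketch only: buildLaunchContext is a hypothetical helper, not a Crawlee API.
// `chromium` is assumed to be the @sparticuz/chromium module and `puppeteer`
// the puppeteer-core module; both are passed in by the caller.
async function buildLaunchContext(chromium, puppeteer) {
  // Merge the flags shipped with the Chromium build (assumed: chromium.args)
  // with the sandbox flags from the question, dropping duplicates.
  const args = [
    ...new Set([...chromium.args, '--no-sandbox', '--disable-setuid-sandbox']),
  ];

  return {
    // The launcher is the library module itself, not a Browser object
    // returned by an earlier puppeteer.launch() call.
    launcher: puppeteer,
    useIncognitoPages: true,
    launchOptions: {
      executablePath: await chromium.executablePath(),
      args,
      headless: true,
    },
  };
}
```

Inside the handler, this could be used as launchContext: await buildLaunchContext(require('@sparticuz/chromium'), require('puppeteer-core')), again assuming those are the modules in play.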
