Crawlee & Apify•3y ago

Unable to run crawlee in aws lambda (Protocol error (Target.setAutoAttach): Target closed)

I am trying to run Crawlee on AWS Lambda, but I get this error message: "Reclaiming failed request back to the list or queue. Protocol error (Target.setAutoAttach): Target closed." Chromium version: 109. Node version: 16. Code:

exports.handler = async (event, context, callback) => {
    const finalResult = [];
    const url = ``;

    try {
        const crawler = new PuppeteerCrawler({
            launchContext: {
                useIncognitoPages: true,
                launchOptions: {
                    executablePath: await chromium.executablePath(),
                    args: ['--no-sandbox', '--disable-setuid-sandbox']
                },
                launcher: puppeteer
            },
            useSessionPool: true,
            requestHandlerTimeoutSecs: 60, 
            browserPoolOptions: {
                useFingerprints: true,
                fingerprintOptions: {
                    fingerprintGeneratorOptions: {
                        browsers: ['chrome'],
                        operatingSystems: ['windows'],
                        devices: ['desktop'],
                        locales: ['en-US', 'en']
                    },
                },
            },
            headless: true,

            async requestHandler({ request, page, enqueueLinks }) {
                log.info(`Processing ${request.url}...`);

            },

            // This function is called if the page processing failed more than maxRequestRetries+1 times.
            failedRequestHandler({ request }) {
                log.error(`Request ${request.url} failed too many times.`);
            },
        });

        // Run the crawler and wait for it to finish.
        await crawler.run([url]);
        log.info('Crawler finished.');

    } catch (error) {
        return callback(error);
    } finally {

    }
    return callback(null, finalResult);
};

5 Replies

absent-sapphire•3y ago

check if this helps - https://github.com/apify/crawlee/issues/702

GitHub

Error Running in AWS Lambda via chrome-aws-lambda (Error: spawn ps ...

Describe the bug: When running in AWS Lambda using chrome-aws-lambda, I'm hitting an Error: spawn ps ENOENT error. I'm just trying to run a simple job based on the HackerNews example. Err...

fair-roseOP•3y ago

No, it did not work for me.

"dependencies": {
    "@sparticuz/chromium": "^109.0.1",
    "crawlee": "^3.1.4",
    "puppeteer-core": "^19.4.0",
    "puppeteer-extra": "^3.3.4",
    "puppeteer-extra-plugin-stealth": "^2.11.1"
  }

I am running Crawlee on Node.js v16, which chrome-aws-lambda does not support, so I added @sparticuz/chromium, which does support Node v16.

metropolitan-bronze•3y ago

Is launcher: puppeteer referring to an existing puppeteer instance/browser session?

fair-roseOP•3y ago

Sorry, I did not understand. What do you mean by that? Are you able to run Crawlee in AWS Lambda?

absent-sapphire•3y ago

@pmt11 Not a browser; it should be the puppeteer launcher variable.
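
To make the launcher point concrete, here is a minimal sketch of a guard that could run before the crawler is constructed. Both isPuppeteerLauncher and buildLaunchContext are hypothetical helpers made up for illustration; they are not part of Crawlee or Puppeteer. The sketch assumes the only thing that matters is that the launcher exposes launch(), which the puppeteer / puppeteer-extra modules do and an already-launched Browser object does not. It does not claim to fix the Target closed error.

```javascript
// Hypothetical helper (not a Crawlee API): returns true when the value looks
// like the Puppeteer module itself rather than an already-launched browser.
function isPuppeteerLauncher(candidate) {
    // The puppeteer / puppeteer-extra modules expose launch();
    // a Browser instance exposes newPage() and close(), but not launch().
    return candidate != null && typeof candidate.launch === 'function';
}

// Hypothetical helper: assembles the same launchContext shape used in the
// question, after checking that `launcher` is the module and not a browser.
function buildLaunchContext(launcher, executablePath, extraArgs = []) {
    if (!isPuppeteerLauncher(launcher)) {
        throw new TypeError('launcher must be the Puppeteer module, not a browser instance');
    }
    return {
        launcher,
        useIncognitoPages: true,
        launchOptions: {
            executablePath,
            // The two sandbox flags come from the question; extraArgs is optional.
            args: ['--no-sandbox', '--disable-setuid-sandbox', ...extraArgs],
        },
    };
}
```

Inside the Lambda handler this could be called as buildLaunchContext(puppeteer, await chromium.executablePath(), chromium.args), so that the argument list shipped with @sparticuz/chromium is passed along with the sandbox flags. Whether those extra arguments resolve the error in this thread is untested here.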
