NeoNomade
Crawlee & Apify
•Created by NeoNomade on 4/23/2025 in #crawlee-js
preNavigationHooks not followed
I'm using the Camoufox JS integration.
If I log something before the await page.route(...) call, the log appears; a log inside the page.route handler never does.
13 replies
•Created by NeoNomade on 3/13/2025 in #crawlee-js
Camoufox failing
I have a project that uses the PlaywrightCrawler from Crawlee.
If I create a project from the Camoufox template, it runs perfectly. When I take the same commands from the template's package.json and follow the same example in my own project, I get the following error:
Of course, neither of those two suggestions helps: the Camoufox binary is already there, and playwright install --with-deps has already been run because the project previously used Firefox.
The entire error log is attached.
5 replies
•Created by NeoNomade on 2/28/2025 in #crawlee-js
Replace default logger
Hello, has anybody managed to completely replace Crawlee's logging with plain console logs?
If so, could you please share your implementation?
3 replies
•Created by NeoNomade on 2/14/2025 in #crawlee-js
CheerioCrawler headerGenerator help
Hello!
I kept reading the docs but couldn't find clear information about this. When we use Puppeteer or Playwright, we can tweak the fingerprintGenerator in browserPool. For Cheerio we have the headerGenerator from got-scraping; how can we adjust it inside the CheerioCrawler?
4 replies
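A minimal sketch, assuming that CheerioCrawler's preNavigationHooks are called as (crawlingContext, gotOptions) and that gotOptions is forwarded to got-scraping, which accepts a headerGeneratorOptions field. The option names (browsers, devices, locales, operatingSystems) follow the header-generator package; makeHeaderHook, desktopChromeHook and the chosen values are hypothetical. The stand-in types keep the sketch self-contained and may need adjusting to match Crawlee's own hook types.

```typescript
// Field names mirror the header-generator package's options;
// the values used below are placeholders, not recommendations.
interface HeaderGeneratorOptionsSketch {
  browsers?: { name: "chrome" | "firefox" | "safari" | "edge"; minVersion?: number }[];
  devices?: ("desktop" | "mobile")[];
  locales?: string[];
  operatingSystems?: ("windows" | "macos" | "linux" | "android" | "ios")[];
}

// Minimal stand-in for the got-scraping options object a hook receives (assumption).
interface GotOptionsLike {
  headerGeneratorOptions?: HeaderGeneratorOptionsSketch;
}

// Hypothetical factory: returns a function shaped like a preNavigationHook,
// (crawlingContext, gotOptions) => void, that pins the generated header profile.
function makeHeaderHook(options: HeaderGeneratorOptionsSketch) {
  return (_crawlingContext: unknown, gotOptions: GotOptionsLike): void => {
    // Shallow copy so one hook's settings are not mutated through gotOptions.
    gotOptions.headerGeneratorOptions = { ...options };
  };
}

const desktopChromeHook = makeHeaderHook({
  browsers: [{ name: "chrome", minVersion: 110 }],
  devices: ["desktop"],
  locales: ["en-US"],
  operatingSystems: ["windows"],
});
```

In a real project the hook would be passed as preNavigationHooks: [desktopChromeHook] in the CheerioCrawler options.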
•Created by NeoNomade on 10/16/2023 in #crawlee-js
error handling
Can we somehow throw an error that closes the page without retrying the request?
4 replies
•Created by NeoNomade on 10/6/2023 in #crawlee-js
Duplicate requests
How can I allow duplicate requests in the request queue by default?
2 replies
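Crawlee's request queue de-duplicates by each request's uniqueKey, which by default is derived from the normalized URL. A minimal sketch, assuming you only need the queue to accept the same URL again: give every request its own key. allowDuplicate is a hypothetical helper, and the #uuid suffix is an arbitrary choice.

```typescript
import { randomUUID } from "node:crypto";

// Shape of the plain request options this sketch produces.
interface DuplicateFriendlyRequest {
  url: string;
  uniqueKey: string;
  userData: Record<string, unknown>;
}

// Hypothetical helper: because the queue de-duplicates on uniqueKey,
// a fresh key per call lets the same URL be enqueued as often as needed.
function allowDuplicate(
  url: string,
  userData: Record<string, unknown> = {},
): DuplicateFriendlyRequest {
  // Keep the URL as a readable prefix and append a random UUID so the key is new.
  return { url, uniqueKey: `${url}#${randomUUID()}`, userData };
}
```

The returned objects could be passed to crawler.addRequests([...]) as ordinary request options.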
•Created by NeoNomade on 9/19/2023 in #crawlee-js
Throw error that respects maxRequestRetries
Hello,
With RetryRequestError, the request is retried indefinitely until it succeeds. What error should I throw so that maxRequestRetries is respected?
5 replies
•Created by NeoNomade on 9/12/2023 in #crawlee-js
TSConfig in Crawlee projects.
The linter gives this error even on the template project.
Does this need attention, or can I leave it as it is?
18 replies
•Created by NeoNomade on 9/7/2023 in #crawlee-js
Xvfb fails on the server.
I've deployed a Playwright crawler with Chromium on AWS Batch, using the default Docker image.
This is the error I'm getting. This crawler must run headful, because otherwise some buttons I need to click don't load.
(Error log attached.)
I've also tried creating a custom, slimmer image, but I run into the same Xvfb issue.
17 replies
•Created by NeoNomade on 9/7/2023 in #crawlee-js
Handle browser failure
I have a Puppeteer scraper that performs many actions on a page, and at some point the browser fails.
It's a page with infinite scroll where I have to click a button and scroll down. After 70-80 interactions the browser crashes, and the request is retried as usual.
The point of those actions is to collect URLs that I want to navigate to.
I want to handle the browser crash somehow, so that when it happens I can continue with the URLs already collected.
3 replies
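One way to avoid losing work is to hand the collected URLs to the queue in batches while scrolling, instead of only at the end. A minimal sketch: UrlBatcher is hypothetical, and the flush callback is assumed to be something like (urls) => crawler.addRequests(urls). Because the queue de-duplicates by uniqueKey, URLs collected again after the crashed request is retried should not be processed twice.

```typescript
// Hypothetical helper: buffers URLs discovered while scrolling and hands them
// off in batches, so URLs collected before a browser crash are already queued.
class UrlBatcher {
  private readonly seen = new Set<string>();
  private pending: string[] = [];
  private readonly batchSize: number;
  // e.g. (urls) => crawler.addRequests(urls) inside a Crawlee handler (assumption)
  private readonly flushFn: (urls: string[]) => unknown;

  constructor(batchSize: number, flushFn: (urls: string[]) => unknown) {
    this.batchSize = batchSize;
    this.flushFn = flushFn;
  }

  /** Records new URLs and flushes once at least `batchSize` are pending. */
  async add(urls: Iterable<string>): Promise<void> {
    for (const url of urls) {
      if (this.seen.has(url)) continue; // already pending or handed off
      this.seen.add(url);
      this.pending.push(url);
    }
    if (this.pending.length >= this.batchSize) await this.flush();
  }

  /** Hands off whatever is pending; call it once more when scrolling ends. */
  async flush(): Promise<void> {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    await this.flushFn(batch);
  }
}
```

Smaller batches lose less on a crash but call the queue more often; the batch size is a trade-off to tune per site.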
•Created by NeoNomade on 8/9/2023 in #crawlee-js
Interception error in Puppeteer
I'm getting this error in Puppeteer, but I'm not doing any request interception in my script. I just create a request and add it to the crawler using crawler.addRequests; the request is a GET where I only provide the URL and headers.
2 replies
•Created by NeoNomade on 7/28/2023 in #crawlee-js
Crawler works locally but not on cloud
Hello, I've built a Puppeteer crawler, nothing special about it.
It works flawlessly locally. I tried deploying it to AWS Batch with Fargate and got navigation timeouts after 60 seconds. I switched to EC2: navigation timeouts after 60 seconds. I increased the navigation timeout to 120 seconds: same error.
I switched proxies between BrightData and OxyLabs: same issue.
I deployed to Apify: same issue.
I'm losing my mind trying to understand why this is happening.
18 replies
•Created by NeoNomade on 7/25/2023 in #crawlee-js
enqueueLinksByClickingElements help
This is the code:
This is the error:
I have imported RequestQueue from crawlee, so I don't understand where it goes wrong.
4 replies
•Created by NeoNomade on 6/30/2023 in #crawlee-js
Cookies failure Playwright
I'm just opening this URL, with no hooks or anything else; it only has to open the page and print the title.
And it fails with this error... any ideas?
2 replies
•Created by NeoNomade on 6/28/2023 in #crawlee-js
change session storage in preNavigationHooks
Hello,
I'm trying to change session storage before navigating to a page.
The hook is attached; here is the error:
Help!
2 replies
•Created by NeoNomade on 6/20/2023 in #crawlee-js
Crawler only working in headed mode.
I have a Puppeteer crawler that works almost flawlessly in headed mode, but in headless mode every request gets a 403 error.
I thought Xvfb would fix this, but unfortunately it doesn't. Any other ideas?
11 replies
•Created by NeoNomade on 6/13/2023 in #crawlee-js
Cheerio memory error
Hello,
I have deployed a CheerioCrawler on AWS on a machine with 2 vCPUs and 4 GB of RAM, but I get the following error:
What could be causing it?
3 replies
•Created by NeoNomade on 6/6/2023 in #crawlee-js
Pause concurrent requests?
Hello,
I have the following issue: I'm scraping a website where I need to log in every 100-150 items.
If I run more than one concurrent request, then by the time a login is needed there are already requests in progress, and those will fail.
I have a marker that I extract to know when I need to log in again.
I want to run with more than one concurrent request, stop everything when that marker is found, log in, and then resume.
Is it possible to achieve that?
23 replies
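A minimal sketch of the coordination logic, assuming the request handler can tell from the page whether the re-login marker was present. LoginGate is a hypothetical helper, not a Crawlee API: once a task reports the marker, it stops new tasks from starting, waits for in-flight tasks to finish, runs the login once, and then lets the waiting tasks continue. Crawlee's AutoscaledPool also has pause() and resume() methods (reachable as crawler.autoscaledPool while the crawler runs), which may be a more direct route but are not used here.

```typescript
// Hypothetical helper (not part of Crawlee): tasks run concurrently, but once a
// task reports the "log in again" marker, new tasks wait, in-flight tasks drain,
// the login runs exactly once, and the waiting tasks then resume.
class LoginGate {
  private inFlight = 0;
  private loginInProgress: Promise<void> | null = null;
  private onDrained: (() => void) | null = null;
  private readonly login: () => Promise<void>;

  constructor(login: () => Promise<void>) {
    this.login = login;
  }

  /** Runs `task`, which resolves to true if it saw the re-login marker. */
  async run(task: () => Promise<boolean>): Promise<boolean> {
    // Hold new work back while a login is pending or running.
    // If the login fails, waiting tasks reject with the same error.
    while (this.loginInProgress) await this.loginInProgress;

    this.inFlight++;
    let sawMarker = false;
    try {
      sawMarker = await task();
    } finally {
      this.inFlight--;
      // Wake up a login that is waiting for in-flight work to finish.
      if (this.inFlight === 0 && this.onDrained) this.onDrained();
    }

    if (sawMarker) await this.relogin();
    return sawMarker;
  }

  private relogin(): Promise<void> {
    // Several tasks may see the marker at once; they all share one login.
    if (this.loginInProgress) return this.loginInProgress;

    const pending = this.drainThenLogin();
    this.loginInProgress = pending;
    // Reopen the gate whether the login succeeded or failed.
    const reopen = () => {
      this.loginInProgress = null;
    };
    pending.then(reopen, reopen);
    return pending;
  }

  private async drainThenLogin(): Promise<void> {
    if (this.inFlight > 0) {
      await new Promise<void>((resolve) => {
        this.onDrained = resolve;
      });
      this.onDrained = null;
    }
    await this.login();
  }
}
```

In a handler, the page work would be wrapped as await gate.run(async () => { ...; return markerFound; }). A request that saw the marker probably still needs to be retried after the login, for example by throwing once run() returns true.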
•Created by NeoNomade on 6/5/2023 in #crawlee-js
Error when running in Docker Container
I'm deploying a Crawlee (Cheerio) project in a Docker container based on amazonlinux:2023.
I get the following error:
3 replies
•Created by NeoNomade on 5/31/2023 in #crawlee-js
setCookie and session.getCookies don't work together
I'm trying to run this code in my default handler:
Error:
2 replies