Stuck with the Python SDK, works on and off with curl
I just self-hosted a Firecrawl instance yesterday. It worked immediately with a curl request to /v1/crawl, which was a nice surprise! And then....
1. With curl
I built playwright-ts, and it didn't work at first because I didn't really think when I copied PLAYWRIGHT_MICROSERVICE_URL=http://localhost:3000/scrape from SELF_HOST.md into my env file, which is obviously incorrect. After I fixed it, the example request works.
Then, without my doing anything to the server, this same request and a crawl request to any website won't work anymore. I keep getting
from the LLM scrape request. (Of course I got my LLM API key right, otherwise it wouldn't have worked the first time.)
Then I restarted all the Docker services; /v1/crawl works again, but /v1/scrape still doesn't work.
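For reference, the requests look roughly like this -- assuming the default self-hosted API port 3002 and auth disabled; the exact URLs and body options are placeholders, not my literal commands:

```bash
# Crawl request -- this is the one that worked right after setup.
curl -X POST http://localhost:3002/v1/crawl \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://example.com", "limit": 10}'

# Single-page scrape -- the one that stopped working for me.
curl -X POST http://localhost:3002/v1/scrape \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://example.com", "formats": ["markdown"]}'
```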
2. With the Python SDK
I put curl aside for now and went on to try the Python SDK.
After looking into the code, I changed example.py (something like the sketch below the list).
It should be correct, right? But I never got it working:
- crawling gets stuck at the sitemap
- the scrape request times out
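My actual snippet didn't make it into this post, so here is a minimal sketch of the kind of change I mean, assuming the v1 firecrawl-py FirecrawlApp client pointed at a local api_url (the key, URL, and param names are placeholders and may differ by SDK version):

```python
from firecrawl import FirecrawlApp

# Point the SDK at the self-hosted instance instead of api.firecrawl.dev.
# api_key / api_url here are placeholders for whatever your .env uses.
app = FirecrawlApp(api_key="fc-YOUR-KEY", api_url="http://localhost:3002")

# Single-page scrape -- the kind of request that times out for me.
scrape_result = app.scrape_url(
    "https://example.com",
    params={"formats": ["markdown"]},
)
print(scrape_result)

# Small crawl -- the kind that gets stuck at the sitemap step.
crawl_result = app.crawl_url(
    "https://example.com",
    params={"limit": 10},
)
print(crawl_result)
```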
PS. My worker container keeps printing "Cant accept connection".
I could deal with failure, but I'm really struggling with these inconsistencies... Help please, thanks a million!
Hi @thousandmiles --
"Cant accept connection" usually means the CPU/RAM usage is too high for the worker to take on new jobs. We use this metric in production to ensure our worker machines never get too overloaded. If you have tighter margins on your end, you can adjust the MAX_CPU and MAX_RAM environment variables (both are percentages, our defaults are 0.8 = 80% for each).

That's definitely a possibility... the server I used has only 1GB of memory. I will try deploying on better hardware.
That'll be the issue. In production, each of our workers has 8GB of RAM.
Now I've got it up and running on a server with 4 cores and 4GB of RAM. The weird thing is that when I launched a crawl task with about 60+ pages, playwright-ts actually runs fine, but the same "Cant accept connection" appeared.
In less than a minute the task almost finishes with
but then it refuses to accept any new tasks...
Can I simply change the MAX_CPU and MAX_RAM limits to at least allow the worker to keep working? I really don't have powerful enough hardware in the cloud for now.
You can set them as environment variables.
MAX_CPU=1 and MAX_RAM=1 will essentially disable the checks (the worker only stops at 100%).
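For example, in the .env that docker compose picks up (exact file layout depends on your setup):

```bash
# Effectively disable the worker load checks -- the worker only refuses
# new jobs if CPU or RAM usage hits 100%.
MAX_CPU=1
MAX_RAM=1
```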
Thanks, the env vars are set now. I'll keep an eye on it for a while.
Or, should I reduce NUM_WORKERS_PER_QUEUE to cut the memory and CPU usage?
Alright. You might want to monitor CPU and RAM usage as well so you can correlate issues with redlining.
We don't use that variable anymore; it's been replaced by MAX_CPU and MAX_RAM.
Oh! But it's still in the SELF_HOST guide. Where can I get an up-to-date example of the full env vars?
Nowhere, unfortunately. We want to update the guide at some point, but we're super busy maintaining v1 right now.
That change is mostly it, though.
No worries then, functionality is the key. Bravo!