`crawl` results in `waiting` but `scrape` works
Hello, when running locally I'm able to scrape successfully using curl. However, if I try the crawl endpoint, it results in a job that is constantly waiting.
Is this because it depends on scrapingbee?
I do see the following log, which may be relevant:

The `404` status seems misleading, as the same URL works from the scrape endpoint.
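For reference, here's roughly what I'm running. A minimal sketch, assuming the default self-hosted port 3002 and the v0 routes; adjust the base URL to your instance:

```python
import requests

BASE = "http://localhost:3002"  # assumed default self-hosted API port

# Scrape: returns content synchronously (this works)
scrape = requests.post(f"{BASE}/v0/scrape", json={"url": "https://example.com"})
print(scrape.json())

# Crawl: returns a jobId, but the job never leaves "waiting"
crawl = requests.post(f"{BASE}/v0/crawl", json={"url": "https://example.com"})
job_id = crawl.json()["jobId"]

# Status check keeps reporting "waiting" because no worker picks up the job
status = requests.get(f"{BASE}/v0/crawl/status/{job_id}")
print(status.json())
```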
@Magick you need to run the workers separately. Try doing `npm run workers` in a separate terminal.

Thanks @Adobe.Flash - if I'm using `docker compose up` to start, which container should I run this command in?

Oh I see. I think it should have automatically handled that for you. cc'ing @rafaelmiller, who can probably help you better in this area.
@Magick you should have a `worker` container running automatically when you run `docker compose` at root. Can you confirm this container is running? You can use `docker ps` to check.

Hi @rafaelmiller - yes, I do see the `worker` container running.
I also see this in the api container: `api-1 | Worker 73 listening on port 3002`
As I am running `docker compose up` (not including `-d`), I'm seeing all output from the running containers. The last log output I see from the worker container is:

It seems as if it never gets a queue message from the API. When I send a `crawl` request, this is the only thing logged:
I also encountered this problem
I am also running into this problem
Hey y'all, quick update: https://discord.com/channels/1226707384710332458/1261330279348572251/1261330647910191254
I'm also running into this problem... scraping works but crawling keeps timing out. The update you sent is only for people using the API key, right? Not for self-host?
Correct. Are you having this issue while self-hosting?
Yes I am, I'm trying to do it via Docker now
Make sure that you are running the workers
Got it. If you are doing it manually, run `npm run start` and `npm run workers` in separate terminals.
Hmm
Do you have Redis running?
Yes, I did that and had all the workers online
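One quick way to sanity-check the queue side from Python: a minimal sketch, assuming Redis on the default `localhost:6379` (match it to the `REDIS_URL` in your `.env`):

```python
import redis  # pip install redis

# Assumes the default local Redis; change host/port to match your REDIS_URL
r = redis.Redis(host="localhost", port=6379)
print(r.ping())  # True means the API and workers can at least reach the queue backend
```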
Quick question in between: I got it running in Docker Desktop now, but which .env is it using? The one from the folder I composed it from?
Gotcha
I believe it should be using the one from the `apps/api` folder
When I `docker-compose up` it uses this:

And so it creates another env. I just can't seem to find my way around Docker Desktop, haha
Okay, so when I run `docker-compose config` it prints my .env file, and that's all correct; it states `USE_DB_AUTHENTICATION: "false"`. But in Docker Desktop it shows this in the logs:

Hey @Nijn! This looks like a warning message. Are you able to run crawl or scrape?
I tried opening the workers and the queue in 2 separate cmds. When I tried to POST via Python, the scrape worked but the crawl keeps timing out.
I'm trying to compose it into Docker Desktop to see if it works there, but for some reason it picks up a different env.
Okay, so I got it to work in Docker now by deleting and re-composing it! For some reason, when using cmd I needed to change .env.local to .env, and in Docker it probably needed the .local.
Oh ok! Does crawl work now?
Crawl now gives a jobId, which it never has before, so that's a step forward! However, I now get this:
Oh, I think there's a bug there
1 sec
alright
Okay, so despite the error it does work. I can retrieve it by job ID.
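For reference, retrieving it from Python looks roughly like this (a sketch assuming the v0 status route on a default local instance; the job ID is a placeholder):

```python
import requests

job_id = "..."  # the jobId returned by the crawl request
resp = requests.get(f"http://localhost:3002/v0/crawl/status/{job_id}")
print(resp.json())  # includes the status and, once finished, the crawled data
```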
I just pushed a fix for this error
You can update your firecrawl repo to get it.
awesome
Thanks so much! If I find any more errors/bugs I will let you know! Really cool what you guys are working on!
Awesome that it worked! And thank you! 🔥
Question: how do I use LLM extract locally? And are these inputs correct for Python?


Or is LLM extract only for the scrape function?
I see that you need an API key for LLM extraction, which seems logical because this is run on your network.
So I'm trying out the markdown function, but for some reason the markdown is the same as the output... And it also gives markdown with the standard crawl function.
Hey @Nijn, to use LLM extract, you need to set up the `extractorOptions` parameter when using the `scrape` function. Also, if you're using it self-hosted, you'll need to configure the `OPENAI_API_KEY` in your `.env` file.
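A rough sketch of what that can look like from Python, assuming the v0 `scrape` route on a default local instance; the prompt and schema below are made-up placeholders:

```python
import requests

payload = {
    "url": "https://example.com",
    "extractorOptions": {
        "mode": "llm-extraction",
        # Placeholder prompt; describe what you want pulled out of the page
        "extractionPrompt": "Extract the page title and a one-line summary.",
        # Placeholder JSON schema for the structured output
        "extractionSchema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "summary": {"type": "string"},
            },
            "required": ["title", "summary"],
        },
    },
}

# OPENAI_API_KEY must be set in the API's .env for this to work self-hosted
resp = requests.post("http://localhost:3002/v0/scrape", json=payload)
print(resp.json())
```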
I'm trying to keep away from using a paid LLM like OpenAI. Is there any way to use a self-hosted LLM like Llama for LLM extraction?
We have one open PR for that. It's still under review, though.
And how about prioritizing certain paths instead of including them?