(YC W24) Inconsistent crawl results between prod and local
I'm trying to test crawling https://fanfiction.net with a local script before I switch to the Firecrawl API, but I'm getting different results.
Locally I am just running Firecrawl with the Docker setup (docker compose up) from the SELF_HOST.md instructions, with the default .env variables and no DB.
I am initiating the crawl sequence with the following command and a 200 response is returned with the jobId:
The response I'm seeing from the completed job in the MQ at http://localhost:3002/admin/@/queues/queue/web-scraper?status=completed is:
However, when I try crawling from the Playground:
https://www.firecrawl.dev/playground?url=https%3A%2F%2Fwww.fanfiction.net%2F&mode=crawl&limit=5&excludes=&includes=&returnOnlyUrls=false&ignoreSitemap=false&maxDepth=&onlyMainContent=false&includeHtml=false&removeTags=&onlyIncludeTags=&waitFor=
I am getting appropriately returned results from there.
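To compare the two environments like for like, the options encoded in the Playground URL above can be replayed against the local instance. The following is a minimal sketch, not the command from the original post. It assumes the self-hosted API accepts a POST to /v0/crawl on port 3002 with the options grouped under crawlerOptions and pageOptions; the /v0 prefix and that grouping are assumptions, and the helper names (playground_params, build_crawl_body, build_crawl_request) are hypothetical.

```python
import json
from urllib.parse import parse_qs, urlsplit
from urllib.request import Request

# The Playground link quoted in the post, split across lines for readability.
PLAYGROUND_URL = (
    "https://www.firecrawl.dev/playground?url=https%3A%2F%2Fwww.fanfiction.net%2F"
    "&mode=crawl&limit=5&excludes=&includes=&returnOnlyUrls=false&ignoreSitemap=false"
    "&maxDepth=&onlyMainContent=false&includeHtml=false&removeTags=&onlyIncludeTags=&waitFor="
)


def playground_params(playground_url):
    """Return the Playground query parameters as a flat dict, keeping empty values."""
    query = urlsplit(playground_url).query
    return {key: values[0] for key, values in parse_qs(query, keep_blank_values=True).items()}


def _as_bool(value):
    """Interpret the 'true'/'false' strings used in the Playground URL."""
    return value.lower() == "true"


def build_crawl_body(params):
    """Build a JSON body from the Playground parameters.

    ASSUMPTION: the crawlerOptions/pageOptions grouping is a guess at the
    request shape; adjust it to whatever the local instance expects.
    """
    body = {
        "url": params["url"],
        "crawlerOptions": {
            "returnOnlyUrls": _as_bool(params.get("returnOnlyUrls", "false")),
            "ignoreSitemap": _as_bool(params.get("ignoreSitemap", "false")),
        },
        "pageOptions": {
            "onlyMainContent": _as_bool(params.get("onlyMainContent", "false")),
            "includeHtml": _as_bool(params.get("includeHtml", "false")),
        },
    }
    # Numeric options are only sent when the Playground URL gave them a value.
    for name in ("limit", "maxDepth"):
        if params.get(name):
            body["crawlerOptions"][name] = int(params[name])
    return body


def build_crawl_request(base_url, body):
    """Prepare (but do not send) the POST request for the local instance."""
    return Request(
        base_url.rstrip("/") + "/v0/crawl",  # ASSUMPTION: /v0 path prefix
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    body = build_crawl_body(playground_params(PLAYGROUND_URL))
    request = build_crawl_request("http://localhost:3002", body)
    print(request.get_method(), request.full_url)
    print(json.dumps(body, indent=2))
    # To submit it, pass `request` to urllib.request.urlopen while the
    # docker compose stack is running.
```

Note that the Playground link targets https://www.fanfiction.net/ while the local test mentions https://fanfiction.net; replaying the exact same URL and options removes that variable from the comparison.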
Can you tell me what the difference is between running /crawl locally and in the Playground environment?
I tried crawling other websites locally (like mendable.ai) and they seemed to be crawled appropriately with a reasonable returnValue.
1 Reply
Hey there rachael!
On the cloud-hosted version, we use fire-engine, a custom-built scraping service that does a better job of grabbing content.
Also, nice to see someone from YC 🙂 🟧