Issues and Inconsistencies During FireCrawl Testing

Hey everyone! We’ve been testing FireCrawl for a bit at our company, and we’ve come across some issues that I wanted to get your thoughts on:

1. Crashes when viewing logs: We keep seeing crashes when trying to view crawl results on the logs page, with the error: "Application error: a client-side exception has occurred (see the browser console for more info)." It looks like a 504 Gateway Timeout. Is this something you’re aware of? Any idea when it might be fixed?
2. waitFor impacting performance: On some websites, using waitFor seems to be the only way we can get valid results, but it really increases the crawl time. Has anyone else experienced this? Any tips for speeding things up? We’re seeing a lot of 200 status codes but with empty content. (A sketch of how we currently call the scrape endpoint is at the end of this message.)
3. Lack of detailed error info: When scraping fails, we only get generic errors like "500 INTERNAL ERROR" or "Request timed out". Is there any way to get more specific error details to help debug?
4. Inconsistent crawl results: We’ve noticed some inconsistencies when crawling the same site multiple times. A couple of examples:
   - bootbarn.com
     - id b0d4744e-3cf9-4e30-bf6d-576530d8fc87: 60 errors (pages returning 500 INTERNAL ERROR), but the same pages scraped fine on a different run (e.g., https://www.bootbarn.com/sale/promotions/denim-deals/).
     - id d8bd1cee-d469-4d2f-bf6f-c1f9a6fad863: no errors at all.
   - delaval.com
     - ids 975ba184-8d29-473e-adfe-73174ebe0a57 and bae8c29a-bd92-47c9-925e-894529ced16c: a ton of timeouts (496 in one run, 510 in the next), but when we scrape these pages one by one, they work fine (e.g., https://www.delaval.com/en-gb/animal-welfare/cow-comfort-stalls/light-installation/).
   - We also noticed inconsistent page counts for the same site, e.g. Wikipedia (Bicycle page):
     - id d34788dd-d2d5-4252-8ba9-76524ff4a32c: 286 pages scraped.
     - id 47a9224b-e75b-4c57-ad3e-e2cb7a33d1e6: 318 pages scraped.

Would love to hear any advice or thoughts on these issues! Thanks in advance! 🙌
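For reference, this is roughly how we call the scrape endpoint today. It’s only a minimal sketch: the endpoint path, the waitFor field (in milliseconds), the formats field, and the response shape reflect how we use the v1 REST API on our side, so treat them as assumptions rather than a statement of the official interface.

```python
import requests

FIRECRAWL_API_KEY = "fc-..."  # placeholder key

# Minimal scrape call; "waitFor" (ms), "formats", and the response shape are
# assumptions based on how we currently use the v1 API on our side.
resp = requests.post(
    "https://api.firecrawl.dev/v1/scrape",
    headers={"Authorization": f"Bearer {FIRECRAWL_API_KEY}"},
    json={
        "url": "https://www.delaval.com/en-gb/animal-welfare/cow-comfort-stalls/light-installation/",
        "formats": ["markdown"],
        # Without a generous waitFor we often get 200 responses with empty
        # content, but it slows crawls down noticeably.
        "waitFor": 5000,
    },
    timeout=120,
)
resp.raise_for_status()
print(len(resp.json().get("data", {}).get("markdown", "")))
```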
9 Replies
Adobe.Flash · 13mo ago
@renata-k Hey there! Thanks for reaching out! I’ve forwarded this to the team and we’re investigating. In regard to your questions:

1. Does that happen when you click on a job in the log?
2. We’re in the process of rolling out a smarter way of "waiting for the page to load" internally, so users can minimize the use of waitFor. Has this happened recently? Could you share a URL if so?
3. Yes, good point. Although it’s sometimes hard to understand why a request failed, we’re in the process of surfacing more specific details when we can. Those will start showing up in the warning params.
4. cc’ing @mogery to look into that one.

Thanks again for the feedback and comments! Also happy to set up a Slack Connect to iterate on these issues more quickly.
renata-k (OP) · 13mo ago
Thank you @Adobe.Flash!

1. Yes, when I either try to download the documents or get more details about the job.
2. That’s great to hear! Do you have an ETA there? Example URLs: https://www.delaval.com and https://support.carta.com/s/
Adobe.Flash · 13mo ago
Gotcha! Thanks!

1. How big is the crawl that you are trying to view/download (how many docs)?
2. This week! cc’ing @thomas on #2.
thomas · 13mo ago
Hey @renata-k, I just pushed some fixes; it should work now. Let me know if you face any issues 🙂
renata-k (OP) · 13mo ago
@Adobe.Flash The biggest I had was 1,000 documents. I also got some errors even when I was trying to get just one result, but that was last Friday and I saw you had some issues then, so it may have been related. Perfect! Let me test this a bit 🙂

@thomas Results are much better after the fix, thank you! 👏

I have a couple more questions:
1. Is there a way to check token usage through the API itself?
2. For rate limits, do we need to track how many scraping requests we’re sending, or does the API just throttle the responses automatically when we hit the limit?

Thanks in advance!
thomas · 13mo ago
1. Not sure about this one, @Adobe.Flash?
2. Yes, it will throttle you.
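If you want to be defensive about it on your side anyway, something like the sketch below works. It assumes throttling surfaces as an HTTP 429 response and that a Retry-After header may be present; both are assumptions, so check the responses you actually receive.

```python
import time
import requests

FIRECRAWL_API_KEY = "fc-..."  # placeholder key

def scrape_with_backoff(payload: dict, max_retries: int = 5) -> requests.Response:
    """Retry a scrape request with exponential backoff when throttled.

    Assumes throttling shows up as HTTP 429; honors Retry-After if present,
    otherwise backs off exponentially. Both details are assumptions.
    """
    headers = {"Authorization": f"Bearer {FIRECRAWL_API_KEY}"}
    delay = 1.0
    resp = None
    for _ in range(max_retries):
        resp = requests.post(
            "https://api.firecrawl.dev/v1/scrape",
            headers=headers,
            json=payload,
            timeout=120,
        )
        if resp.status_code != 429:
            return resp
        # Back off before retrying the same payload.
        time.sleep(float(resp.headers.get("Retry-After", delay)))
        delay *= 2
    return resp
```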
Adobe.Flash · 13mo ago
Not yet! Curious how you would use that. It shouldn’t be hard to get it prioritized.
renata-k (OP) · 12mo ago
We’d like to track token usage so we can avoid exceeding the limit. We’re planning to use FireCrawl in different environments, including local and testing setups, and in those environments we’d prefer to cap usage ourselves rather than risk going over the token limit.
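To make the use case concrete, this is roughly the kind of guard we have in mind for local and testing environments. It’s purely client-side bookkeeping: the cost-per-request value is a hypothetical estimate we would maintain ourselves, not something the API reports today.

```python
class CreditBudget:
    """Client-side guard that stops issuing scrape requests once a local
    budget is spent. cost_per_request is a hypothetical estimate, not a
    value reported by the FireCrawl API."""

    def __init__(self, max_credits: int, cost_per_request: int = 1):
        self.max_credits = max_credits
        self.cost_per_request = cost_per_request
        self.used = 0

    def allow(self) -> bool:
        # True if one more request still fits within the local budget.
        return self.used + self.cost_per_request <= self.max_credits

    def record(self) -> None:
        self.used += self.cost_per_request


budget = CreditBudget(max_credits=500)  # small cap for a local/test environment
if budget.allow():
    # ... issue the scrape request here ...
    budget.record()
else:
    raise RuntimeError("Local FireCrawl budget exhausted for this environment")
```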
Adobe.Flash · 12mo ago
For sure, @renata-k, that’s a common request and we’re working on it. In the meantime, we do send alert emails at 80% usage and again when you hit the limit.
