Issues and Inconsistencies During FireCrawl Testing
Hey everyone!
We’ve been testing FireCrawl for a bit at our company, and we’ve come across some issues that I wanted to get your thoughts on:
- Crashes when viewing logs: We keep seeing crashes when trying to view crawl results on the logs page, with the error: "Application error: a client-side exception has occurred (see the browser console for more info)." It looks like a 504 Gateway Timeout. Is this something you guys are aware of? Any idea when it might be fixed?
- waitFor impacting performance: On some websites, setting waitFor is the only way we get valid results; without it we see a lot of 200 responses with empty content. But waitFor really increases crawl time. Has anyone else experienced this? Any tips for speeding things up? (The first sketch after the list shows the fallback we have in mind.)
- Lack of detailed error info: When scraping fails, we only get generic errors like "500 INTERNAL ERROR" or "Request timed out". Is there any way to get more specific error details to help debug?
- Inconsistent crawl results: We’ve noticed inconsistencies when crawling the same site multiple times (a retry sketch is included after this list). A couple of examples:
  - bootbarn.com:
    - id: b0d4744e-3cf9-4e30-bf6d-576530d8fc87: 60 errors (pages returning 500 INTERNAL ERROR), but the same pages scraped fine on a different run (e.g., https://www.bootbarn.com/sale/promotions/denim-deals/).
    - id: d8bd1cee-d469-4d2f-bf6f-c1f9a6fad863: No errors at all.
  - delaval.com:
    - id: 975ba184-8d29-473e-adfe-73174ebe0a57 and id: bae8c29a-bd92-47c9-925e-894529ced16c: Hundreds of timeouts (496 in one run, 510 in the next), but when we scrape those pages one by one, they work fine (e.g., https://www.delaval.com/en-gb/animal-welfare/cow-comfort-stalls/light-installation/).
  - We’ve also seen inconsistent page counts for the same site (the last sketch below diffs two runs). Wikipedia (Bicycle page):
    - id: d34788dd-d2d5-4252-8ba9-76524ff4a32c: 286 pages scraped.
    - id: 47a9224b-e75b-4c57-ad3e-e2cb7a33d1e6: 318 pages scraped.
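
For reference on the waitFor point, here is a minimal sketch of the fallback we have in mind: scrape without waitFor first and only retry with it when the content comes back empty. It calls the v1 /scrape REST endpoint directly; the endpoint path, the waitFor and formats fields, and the data.markdown response field are our assumptions from the docs, so please correct us if any of that is off.

```python
import os
import requests

API_KEY = os.environ["FIRECRAWL_API_KEY"]                 # assumes the key is set in the environment
SCRAPE_ENDPOINT = "https://api.firecrawl.dev/v1/scrape"   # assumed v1 REST endpoint
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}


def scrape(url, wait_for_ms=0):
    """Single-page scrape; waitFor is only sent when explicitly requested."""
    payload = {"url": url, "formats": ["markdown"]}
    if wait_for_ms:
        payload["waitFor"] = wait_for_ms                   # extra render wait, in milliseconds
    resp = requests.post(SCRAPE_ENDPOINT, json=payload, headers=HEADERS, timeout=120)
    resp.raise_for_status()
    return resp.json().get("data", {})


def scrape_with_fallback(url):
    """Try the fast path first; only pay the waitFor cost when the page comes back empty."""
    data = scrape(url)
    if not (data.get("markdown") or "").strip():
        data = scrape(url, wait_for_ms=5000)               # retry with a 5 s render wait
    return data
```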
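
For the inconsistent crawl results, this is the retry idea mentioned above: fetch the crawl's documents by id, log each page's status, and re-scrape anything that looks failed or empty (which, as noted, usually works one page at a time). It reuses HEADERS and scrape_with_fallback from the first sketch, and it assumes a GET /v1/crawl/{id} endpoint whose documents carry metadata.sourceURL and metadata.statusCode; if there is a better way to get per-page error detail, we would love to know.

```python
def fetch_crawl_documents(crawl_id):
    """Collect all documents for a finished crawl, following pagination if present."""
    url = f"https://api.firecrawl.dev/v1/crawl/{crawl_id}"  # assumed v1 crawl status/results endpoint
    docs = []
    while url:
        resp = requests.get(url, headers=HEADERS, timeout=60)
        resp.raise_for_status()
        body = resp.json()
        docs.extend(body.get("data", []))
        url = body.get("next")      # assumption: large result sets paginate via a "next" URL
    return docs


def retry_problem_pages(crawl_id):
    """Log per-page status and re-scrape anything that looks failed or empty."""
    recovered = {}
    for doc in fetch_crawl_documents(crawl_id):
        meta = doc.get("metadata", {}) or {}
        source_url = meta.get("sourceURL")      # assumption: metadata carries the original page URL
        status = meta.get("statusCode")         # assumption: ...and the HTTP status code
        empty = not (doc.get("markdown") or "").strip()
        if source_url and (empty or (status and status >= 400)):
            print(f"retrying {source_url} (status={status}, empty={empty})")
            recovered[source_url] = scrape_with_fallback(source_url)
    return recovered
```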
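
And for the page-count differences, a small diff of two runs of the same site (reusing fetch_crawl_documents above and the same sourceURL assumption) to see which URLs one run picked up and the other missed:

```python
def crawl_url_set(crawl_id):
    """All sourceURLs returned by one crawl run."""
    urls = set()
    for doc in fetch_crawl_documents(crawl_id):
        source_url = (doc.get("metadata", {}) or {}).get("sourceURL")
        if source_url:
            urls.add(source_url)
    return urls


def diff_runs(run_a, run_b):
    """Print the URLs that appear in one run but not the other."""
    a, b = crawl_url_set(run_a), crawl_url_set(run_b)
    for label, only in ((run_a, a - b), (run_b, b - a)):
        print(f"only in {label} ({len(only)}):")
        for url in sorted(only):
            print("  " + url)


# e.g. for the two Bicycle crawls above:
# diff_runs("d34788dd-d2d5-4252-8ba9-76524ff4a32c", "47a9224b-e75b-4c57-ad3e-e2cb7a33d1e6")
```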
Would love to hear any advice or thoughts on these issues!
Thanks in advance!