Firecrawl•14mo ago
chichin

Scrape requests fail with timeouts

Hello! All our website scraping requests are failing with timeouts. The domains are available, and most of their pages were scraped successfully an hour ago. I tried scraping our own domain (which doesn't have any protection) and that fails too. What additional information should I provide to help figure this out? Thank you in advance!
36 Replies
mogery
mogery•14mo ago
Hey chichin, we're having an incident and we're working on it right now.
chichin
chichinOP•14mo ago
Thank you 🤝
mogery
mogery•14mo ago
@chichin We've made some changes and it should be working better now
chichin
chichinOP•14mo ago
Yeah, that works! Thank you a lot!
Get Rec'd
Get Rec'd•14mo ago
@mogery we are still getting 408s on /scrape
MikeFreeman
MikeFreeman•14mo ago
I think it's back down, I'm getting 400 errors as well. @Moderator
mogery
mogery•14mo ago
Playground works for me so that seems like a different issue -- ccing @Adobe.Flash @rafaelmiller
MikeFreeman
MikeFreeman•14mo ago
Whatever it was, it's now working again. Maybe you guys got the hug of death! Not a bad problem to have. OK, I'm shutting things down for the night; it keeps breaking, and I'm pretty sure it's the API, not what I'm passing it.
mogery
mogery•14mo ago
We're getting hugged to death quite often haha. Launching some things to make it much better soon, and we're applying hotfixes 24/7. Thanks for the patience 🙏🏻 🙏🏻 🙏🏻
Sachin
Sachin•14mo ago
I have been getting the same error frequently as well. Hasn't this been fixed yet? "Request Timeout: Failed to scrape URL as the request timed out. Request timed out"
mogery
mogery•14mo ago
We're working on it, will let you know if we have updates
Sachin
Sachin•14mo ago
@mogery Seems like the crawl jobs are stuck now. What's the matter now? This is creating serious concerns about the reliability of your services. Adding for visibility: @Adobe.Flash @Caleb
mogery
mogery•14mo ago
We're on it, deploying a fix now. Still working on fixing the issues fully, but we're back up
MikeFreeman
MikeFreeman•14mo ago
Hey guys, can we get an update on the status of the API? I'm doing some things and getting errors, and I'm not sure whether they're related to the incident or something else is causing them.
mogery
mogery•14mo ago
Hey @MikeFreeman, things should be working right now. What errors are you getting?
MikeFreeman
MikeFreeman•14mo ago
Thanks for getting back to me so quickly. My scrape seems to be working about 50% of the time and failing the other 50%. I'm not sure whether the errors are from the earlier issue with Firecrawl or from what I'm trying to scrape. I'm going to continue testing to make sure it's functional, but from what I can see, it's kind of working. I might just build in some sort of "fallback" so if it does fail, it will loop until it doesn't. That's a "fix", I guess, right? Thanks again for the quick response; it's really helpful having good support, so I wanted to make sure and tell you I appreciate that.
mogery
mogery•14mo ago
@MikeFreeman Of course :) Can you send the error message you're getting? A fallback/retry system is a good idea in general
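A minimal sketch of that fallback/retry idea in Python, assuming the v0 REST /scrape endpoint referenced earlier in this thread; the helper name, backoff values, and retryable status codes are illustrative, not Firecrawl's official guidance:

```python
import time
import requests

API_URL = "https://api.firecrawl.dev/v0/scrape"  # assumed from the /scrape references above
API_KEY = "fc-..."  # your Firecrawl API key

def scrape_with_retry(url, max_attempts=3, backoff=5):
    """Retry a scrape on timeouts and transient server errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            resp = requests.post(
                API_URL,
                headers={"Authorization": f"Bearer {API_KEY}"},
                json={"url": url},
                timeout=60,  # client-side timeout in seconds
            )
            if resp.status_code == 200:
                return resp.json()
            if resp.status_code not in (408, 429, 500, 502, 503):
                resp.raise_for_status()  # 400-style errors are not worth retrying
        except requests.Timeout:
            pass  # fall through and retry
        if attempt < max_attempts:
            time.sleep(backoff * attempt)  # simple linear backoff between attempts
    raise RuntimeError(f"Scrape failed after {max_attempts} attempts: {url}")
```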
MikeFreeman
MikeFreeman•14mo ago
I'm fairly certain I've identified the issue. It looks like Zapier's API response timeout hits before the call comes back, and that's causing it to fail intermittently. @mogery ^ I think their API response timeout is around 30 seconds, and if what I'm scraping takes longer than that, it fails. This explains why it doesn't happen every time: I'll make no changes, then randomly it'll fail.
mogery
mogery•14mo ago
Interesting. A scrape shouldn't take more than a few seconds. Can you give me an example of what page you are scraping? The page specifically might be something we can optimize for on our end.
MikeFreeman
MikeFreeman•14mo ago
sure, would you DM me though? Is that possible?
mogery
mogery•14mo ago
sent! are my DMs not open by default?
Sachin
Sachin•14mo ago
Hi @mogery, ideally a scrape should take only a few seconds no matter what URL is being used, to make this a scalable approach. But I have been experiencing slowness in my scraping requests. Can you help with this?
mogery
mogery•14mo ago
Yes @Sachin, what URLs are you encountering slowness on?
Sachin
Sachin•14mo ago
For example, scraping 60 URLs from GRAIL.COM takes around 5 minutes to complete (which is slow as well), but scraping the same number of pages from SYNEOSHEALTH.COM takes around 12 minutes.
mogery
mogery•14mo ago
Is this a crawl or individual scrape jobs? Are you running LLM Extract?
Sachin
Sachin•14mo ago
I'm scraping URLs from these websites individually. Nope, just a simple scrape request.
mogery
mogery•14mo ago
Okay, are you parallelizing them? Or just sending one at a time
Sachin
Sachin•14mo ago
Right now it's an iterative approach, in order to keep the scrape requests within the rate limit.
mogery
mogery•14mo ago
Unfortunately, the only way to speed this up is to parallelize. We'll be increasing rate limits soon and launching a bulk scrape endpoint, which might help with this workflow. Parallelization lets you make use of our full server fleet, and you should see a significant speed-up.
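A sketch of what client-side parallelization could look like with Python's concurrent.futures, reusing the hypothetical scrape_with_retry helper from the earlier sketch; the worker count is an assumption to tune against your plan's rate limit:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def scrape_many(urls, max_workers=10):
    """Scrape several URLs concurrently instead of one at a time."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(scrape_with_retry, u): u for u in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                results[url] = {"error": str(exc)}  # keep going if one URL fails
    return results
```

Scraping is network-bound, so threads (rather than processes) are usually enough here; wall-clock time drops roughly in proportion to max_workers until the rate limit or the slowest page dominates.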
Sachin
Sachin•14mo ago
By parallelization, do you mean performing scrape requests in parallel using local compute, or is there a new feature on the Firecrawl side that can be enabled through some parameter?
mogery
mogery•14mo ago
I mean sending multiple requests on your end, yes, the first option.
Sachin
Sachin•14mo ago
But this would exhaust the per-minute scrape limits pretty quickly, and then I would have to employ a sleep timer. Wouldn't that be slower as well? My initial point was to highlight why the scraping mechanism performs differently for individual URLs of different websites.
mogery
mogery•14mo ago
The ideal solution would be to parallelize up to the point where you're not yet hitting the rate limits. You can also upgrade your subscription to get higher rate limits if your workflow requires it, or alternatively wait until we launch increased rate limits (next week).
Different URLs require different methods of scraping, and network speed also varies (e.g. if you try to load syneoshealth in a browser, it's noticeably slower than grail). We have a bunch of methods that we fall back through in case one fails, and once a page is calibrated to the right method, we eventually hit the point where the speed of electrons going down a network cable is what makes things slow :D
There's still quite a bit of overhead in the scrape and crawl logic, though, which we're reducing all the time, so expect scrapes to get faster in the future. Making fast APIs while properly dealing with authentication and billing is unfortunately not as easy as it should be.
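One way to "parallelize up to the point where you're not hitting the rate limits" is to gate each request behind a rolling-window limiter. This is a sketch only; the per_minute value is a placeholder for whatever your plan allows, and scrape_with_retry is the hypothetical helper from the earlier sketch:

```python
import threading
import time

class RateLimiter:
    """Allow at most `per_minute` acquisitions per rolling 60-second window."""

    def __init__(self, per_minute):
        self.per_minute = per_minute
        self.lock = threading.Lock()
        self.timestamps = []

    def acquire(self):
        while True:
            with self.lock:
                now = time.monotonic()
                # drop timestamps that have aged out of the window
                self.timestamps = [t for t in self.timestamps if now - t < 60]
                if len(self.timestamps) < self.per_minute:
                    self.timestamps.append(now)
                    return
                wait = 60 - (now - self.timestamps[0])
            time.sleep(wait)  # sleep outside the lock, then re-check

limiter = RateLimiter(per_minute=20)  # placeholder; use your plan's scrape limit

def scrape_throttled(url):
    limiter.acquire()
    return scrape_with_retry(url)
```

Passing scrape_throttled instead of scrape_with_retry to the scrape_many pool above keeps the workers busy right up to the limit without tripping 429s.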
Sachin
Sachin•14mo ago
Got it, thanks for the clarification. Hi @mogery, quick question: does the code throw this type of exception as well? "Expecting value: line 1 column 1 (char 0)"
mogery
mogery•14mo ago
We're not seeing this on our end. Where are you getting this error?
Sachin
Sachin•14mo ago
OK, might be related to the code design then. I need to investigate, though nothing much is happening in the code block.
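For what it's worth, "Expecting value: line 1 column 1 (char 0)" is the message Python's json module raises when asked to parse an empty or non-JSON body, so it usually means .json() was called on an error response rather than the API returning broken data. A defensive sketch, assuming the requests library:

```python
import requests

def parse_scrape_response(resp: requests.Response):
    """Guard against the empty/non-JSON bodies that raise
    'Expecting value: line 1 column 1 (char 0)'."""
    if resp.status_code != 200:
        raise RuntimeError(f"Scrape failed: HTTP {resp.status_code}: {resp.text[:200]}")
    try:
        return resp.json()
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        raise RuntimeError(f"Non-JSON response body: {resp.text[:200]!r}")
```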
