Scrape requests failing with timeouts
Hello!
All of our website scraping requests are failing with timeouts.
The domains are reachable, and most of their pages were scraped successfully an hour ago.
I tried scraping our own domain (which doesn't have any protection) and that one fails too.
What additional information should I provide to help figure this out?
Thank you in advance!
Hey chichin, we're having an incident and we're working on it right now.
Thank you!
@chichin We've made some changes and it should be working better now
Yeah, that works!
Thanks a lot!
@mogery we are still getting 408s on /scrape
I think it's back down, I'm getting 400 errors as well. @Moderator
Playground works for me so that seems like a different issue -- ccing @Adobe.Flash @rafaelmiller
whatever it was, it's now working again. Maybe you guys got the hug of death! Not a bad problem to have.
OK, I'm shutting things down for the night. It keeps breaking, and I'm pretty sure it's the API, not what I'm passing it.
We're getting hugged to death quite often haha. Launching some things to make it much better soon, and we're applying hotfixes 24/7. Thanks for the patience 🙏
I have been getting the same error frequently as well. Hasn't this been fixed yet?
Request Timeout: Failed to scrape URL as the request timed out. Request timed out
We're working on it, will let you know if we have updates
@mogery Seems like the CRAWL jobs are stuck now. What's the matter this time? This is creating serious concerns about the reliability of your services.
adding for visibility @Adobe.Flash @Caleb
We're on it, deploying a fix now.
Still working on fixing the issues fully, but we're back up
hey guys, can we get an update on the status of the API? I'm running some things and getting errors, and I'm not sure whether those errors are related to the incident or something else is causing them.
Hey @MikeFreeman, things should be working right now. What errors are you getting?
Thanks for getting back to me so quickly. My scrape seems to be working about 50% of the time, and failing the other 50%.
I'm not sure whether the errors are coming from the earlier FireCrawl incident or from what I'm trying to scrape. I'm going to keep testing to make sure it's functional, but from what I can see, it's kind of working. I might just build in some sort of "fallback" so that if a request fails, it loops until it succeeds. That's a "fix", I guess, right?
Thanks again for the quick response. It's really helpful having good support, and I wanted to make sure to tell you I appreciate it.
@MikeFreeman Of course :) Can you send the error message you're getting?
A fallback/retry system is a good idea in general
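For anyone finding this thread later, a retry loop with exponential backoff is a reasonable shape for that fallback. A minimal sketch in Python; `scrape_fn` here is a stand-in for whatever client call you already make (it is not a real Firecrawl API), and the attempt counts and delays are arbitrary:

```python
import time

def scrape_with_retry(scrape_fn, url, max_attempts=4, base_delay=2.0):
    """Call scrape_fn(url), retrying with exponential backoff on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return scrape_fn(url)
        except Exception as exc:  # ideally narrow this to your client's timeout error
            if attempt == max_attempts:
                raise  # give up after the last attempt
            delay = base_delay * 2 ** (attempt - 1)  # 2s, 4s, 8s, ...
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.0f}s")
            time.sleep(delay)
```

Backing off between attempts avoids hammering the API while it's recovering, which a tight retry loop would do.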
I'm fairly certain I've identified the issue. It looks like Zapier's API timeout is hitting before the call comes back, and that's causing it to intermittently fail.
@mogery ^
I think their API timeout is around 30 seconds, so if what I'm scraping takes longer than that, it fails. This explains why it doesn't happen every time: I'll make no changes and then it'll randomly fail.
Interesting. A scrape shouldn't take more than a few seconds. Can you give me an example of what page you are scraping?
The page specifically might be something we can optimize for on our end.
sure, would you DM me though? Is that possible?
sent!
are my DMs not open by default?
Hi @mogery, ideally a scrape should only take a few seconds, no matter what URL is used, for the approach to be scalable.
But I've been experiencing slowness in the scraping requests in my pipeline. Can you help with this?
Yes @Sachin, what URLs are you encountering slowness on?
For example, scraping 60 URLs from GRAIL.COM takes around 5 minutes to complete (which is already slow), but scraping the same number of pages from SYNEOSHEALTH.COM takes around 12 minutes.
Is this a crawl or individual scrape jobs? Are you running LLM Extract?
I'm scraping URLs from these websites individually. Nope, just a simple scrape request.
Okay, are you parallelizing them? Or just sending one at a time
Right now I'm sending them one at a time, to keep the scrape requests within the rate limit.
Unfortunately the only way to speed this up would be to parallelize. We'll be increasing rate limits soon and launching a bulk scrape endpoint which might help with this workflow.
Parallelization lets you make use of our full server fleet, and you should see a substantial speed-up.
By parallelization, do you mean performing scrape requests in parallel using local compute, or is there a new feature on FireCrawl's side that can be enabled through some parameter?
I mean sending multiple requests on your end, yes
The first option
But this would exhaust the per-minute scrape limits pretty quickly, and then I'd have to add a sleep timer. Wouldn't that be slower as well?
My initial point was to ask why the scraping mechanism performs differently for individual URLs of different websites.
The ideal solution would be to parallelize up to the point where you're not hitting the rate limits yet. You can also upgrade your subscription to get higher rate limits if your workflow requires it. Or alternatively, wait until we launch increased rate limits (next week)
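To make that concrete, here is one way to parallelize in batches while respecting a requests-per-minute cap. A minimal sketch: `scrape_fn` is again a placeholder for your actual client call, and `per_minute_limit` should be set to your plan's actual rate limit (the value below is made up):

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def scrape_all(scrape_fn, urls, per_minute_limit=15):
    """Scrape URLs in parallel batches sized to a requests-per-minute limit."""
    results = {}
    for start in range(0, len(urls), per_minute_limit):
        batch = urls[start:start + per_minute_limit]
        batch_started = time.monotonic()
        # Fire the whole batch concurrently instead of one at a time.
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            futures = {pool.submit(scrape_fn, u): u for u in batch}
            for fut in as_completed(futures):
                results[futures[fut]] = fut.result()
        # Wait out the remainder of the minute before the next batch.
        elapsed = time.monotonic() - batch_started
        if start + per_minute_limit < len(urls) and elapsed < 60:
            time.sleep(60 - elapsed)
    return results
```

Since the requests are network-bound, a batch of N finishes in roughly the time of its slowest request rather than the sum of all of them, which is where the speed-up over a sequential loop comes from.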
Different URLs require different methods of scraping. Network speed also varies (e.g. if you load syneoshealth in a browser, it's noticeably slower than grail).
We have a bunch of methods that we go through in case one fails, and once a page is calibrated to the right method, eventually we just hit a point where the speed of electrons going down a network cable is making things slow :D
There's still quite a bit of overhead on scrape and crawl logic though that we're reducing all the time, so expect scrapes to get faster in the future
Making fast APIs while properly dealing with authentication and billing is unfortunately not as easy as it should be
got it, thanks for the clarification.
Hi @mogery, quick question.
Does the code throw this type of exception as well?
Expecting value: line 1 column 1 (char 0)
We're not seeing this on our end. Where are you getting this error?
OK, it might be related to the code design then. I need to investigate, though nothing much is happening in the code block.
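For reference: "Expecting value: line 1 column 1 (char 0)" is the message Python's `json` module raises when asked to parse an empty or non-JSON body, so it usually points at the HTTP response rather than the scraper itself. A minimal sketch of how it arises and how to guard against it, using only the standard library (no Firecrawl specifics assumed):

```python
import json

raw = ""  # e.g. an empty body from a timed-out or errored HTTP response
try:
    data = json.loads(raw)
except json.JSONDecodeError as exc:
    # Raises: Expecting value: line 1 column 1 (char 0)
    print(f"Response body was not valid JSON: {exc}")
    data = None
```

Checking the response's status code and content type before calling `json.loads` on it is usually enough to surface the real failure.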