Scrape requests failing with timeouts
Hello!
All of our website scraping requests are failing with timeouts.
The domains are reachable, and most of their pages were scraped successfully an hour ago.
I tried scraping our own domain (which doesn't have any protection) and that one fails too.
What additional information should I provide to help figure this out?
Thank you in advance!
Hey chichin, we're having an incident and we're working on it right now.
Thank you!
@chichin We've made some changes and it should be working better now
Yeah, that works!
Thanks a lot!
@mogery we are still getting 408s on /scrape
I think it's back down, I'm getting 400 errors as well. @Moderator
Playground works for me so that seems like a different issue -- ccing @Adobe.Flash @rafaelmiller
whatever it was, it's now working again. Maybe you guys got the hug of death! Not a bad problem to have.
OK, I'm shutting things down for the night. It keeps breaking, and I'm pretty sure it's the API, not what I'm passing it.
We're getting hugged to death quite often haha. Launching some things to make it much better soon, and we're applying hotfixes 24/7. Thanks for the patience 🙏
I have been getting the same error frequently as well. Hasn't this been fixed yet?
Request Timeout: Failed to scrape URL as the request timed out. Request timed out
We're working on it, will let you know if we have updates
@mogery Seems like the CRAWL jobs are stuck now. What's the matter this time? This is creating serious concerns about the reliability of your services.
adding for visibility @Adobe.Flash @Caleb
We're on it, deploying a fix now.
Still working on fixing the issues fully, but we're back up
hey guys, can we get an update on the status of the API? I'm running some things and getting errors, and I'm not sure whether those errors are related to the incident or something else is causing them.
Hey @MikeFreeman, things should be working right now. What errors are you getting?
Thanks for getting back to me so quickly. My scrape seems to be working about 50% of the time, and failing the other 50%.
I'm not sure whether the errors are coming from the earlier FireCrawl incident or from what I'm trying to scrape. I'm going to keep testing to make sure it's functional, but from what I can see, it's kind of working. I might just build in some sort of "fallback" so that if a request fails, it loops until it succeeds. That's a "fix", I guess, right?
Thanks again for the quick response. It's really helpful having good support, and I wanted to make sure to tell you I appreciate it.
@MikeFreeman Of course :) Can you send the error message you're getting?
A fallback/retry system is a good idea in general
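For anyone finding this thread later, a retry loop with exponential backoff is a reasonable shape for that fallback. A minimal sketch in Python; `scrape_fn` here is a stand-in for whatever client call you already make (it is not a real Firecrawl API), and the attempt counts and delays are arbitrary:

```python
import time

def scrape_with_retry(scrape_fn, url, max_attempts=4, base_delay=2.0):
    """Call scrape_fn(url), retrying with exponential backoff on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return scrape_fn(url)
        except Exception as exc:  # ideally narrow this to your client's timeout error
            if attempt == max_attempts:
                raise  # give up after the last attempt
            delay = base_delay * 2 ** (attempt - 1)  # 2s, 4s, 8s, ...
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.0f}s")
            time.sleep(delay)
```

Backing off between attempts avoids hammering the API while it's recovering, which a tight retry loop would do.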
I'm fairly certain I've identified the issue. It looks like Zapier's API timeout is hitting before the call comes back, and that's causing it to intermittently fail.
@mogery ^
I think their API timeout is around 30 seconds, so if what I'm scraping takes longer than that, it fails. This explains why it doesn't happen every time: I'll make no changes and then it'll randomly fail.
Interesting. A scrape shouldn't take more than a few seconds. Can you give me an example of what page you are scraping?
The page specifically might be something we can optimize for on our end.
sure, would you DM me though? Is that possible?
sent!
are my DMs not open by default?
Hi @mogery, ideally a scrape should only take a few seconds, no matter what URL is used, for the approach to be scalable.
But I've been experiencing slowness in the scraping requests in my pipeline. Can you help with this?
Yes @Sachin, what URLs are you encountering slowness on?
For example, scraping 60 URLs from GRAIL.COM takes around 5 minutes to complete (which is already slow), but scraping the same number of pages from SYNEOSHEALTH.COM takes around 12 minutes.
Is this a crawl or individual scrape jobs? Are you running LLM Extract?
I'm scraping URLs from these websites individually. Nope, just a simple scrape request.
Okay, are you parallelizing them? Or just sending one at a time
Right now I'm sending them one at a time, to keep the scrape requests within the rate limit.
Unfortunately the only way to speed this up would be to parallelize. We'll be increasing rate limits soon and launching a bulk scrape endpoint which might help with this workflow.
Parallelization lets you make use of our full server fleet, and you should see a substantial speed-up.
By parallelization, do you mean performing scrape requests in parallel using local compute, or is there a new feature on FireCrawl's side that can be enabled through some parameter?
I mean sending multiple requests on your end, yes
The first option
But this would exhaust the per-minute scrape limits pretty quickly, and then I'd have to add a sleep timer. Wouldn't that be slower as well?
My initial point was to ask why the scraping mechanism performs differently for individual URLs of different websites.
The ideal solution would be to parallelize up to the point where you're not hitting the rate limits yet. You can also upgrade your subscription to get higher rate limits if your workflow requires it. Or alternatively, wait until we launch increased rate limits (next week)
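To make that concrete, here is one way to parallelize in batches while respecting a requests-per-minute cap. A minimal sketch: `scrape_fn` is again a placeholder for your actual client call, and `per_minute_limit` should be set to your plan's actual rate limit (the value below is made up):

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def scrape_all(scrape_fn, urls, per_minute_limit=15):
    """Scrape URLs in parallel batches sized to a requests-per-minute limit."""
    results = {}
    for start in range(0, len(urls), per_minute_limit):
        batch = urls[start:start + per_minute_limit]
        batch_started = time.monotonic()
        # Fire the whole batch concurrently instead of one at a time.
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            futures = {pool.submit(scrape_fn, u): u for u in batch}
            for fut in as_completed(futures):
                results[futures[fut]] = fut.result()
        # Wait out the remainder of the minute before the next batch.
        elapsed = time.monotonic() - batch_started
        if start + per_minute_limit < len(urls) and elapsed < 60:
            time.sleep(60 - elapsed)
    return results
```

Since the requests are network-bound, a batch of N finishes in roughly the time of its slowest request rather than the sum of all of them, which is where the speed-up over a sequential loop comes from.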
Different URLs require different methods of scraping. Network speed also varies (e.g. if you load syneoshealth in a browser, it's noticeably slower than grail).
We have a bunch of methods that we go through in case one fails, and once a page is calibrated to the right method, eventually we just hit a point where the speed of electrons going down a network cable is making things slow :D
There's still quite a bit of overhead on scrape and crawl logic though that we're reducing all the time, so expect scrapes to get faster in the future
Making fast APIs while properly dealing with authentication and billing is unfortunately not as easy as it should be
got it, thanks for the clarification.
Hi @mogery, quick question.
Does the code throw this type of exception as well?
Expecting value: line 1 column 1 (char 0)
We're not seeing this on our end. Where are you getting this error?
OK, it might be related to the code design then. I need to investigate, though nothing much is happening in the code block.
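For reference: "Expecting value: line 1 column 1 (char 0)" is the message Python's `json` module raises when asked to parse an empty or non-JSON body, so it usually points at the HTTP response rather than the scraper itself. A minimal sketch of how it arises and how to guard against it, using only the standard library (no Firecrawl specifics assumed):

```python
import json

raw = ""  # e.g. an empty body from a timed-out or errored HTTP response
try:
    data = json.loads(raw)
except json.JSONDecodeError as exc:
    # Raises: Expecting value: line 1 column 1 (char 0)
    print(f"Response body was not valid JSON: {exc}")
    data = None
```

Checking the response's status code and content type before calling `json.loads` on it is usually enough to surface the real failure.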