Firecrawl14mo ago
Amoggh

Every crawl returns null data

I have been facing this for the past 5 days and it hasn't been resolved yet, hence posting here: I just crawled a site and, even after waiting 1-2 minutes past the completed status, the data still isn't right. Instead of sending just null in data, it now sends a list of nulls, which breaks my checks for whether I received an empty list or a null value in the data field. The job_id is 56834938-0b41-458c-9012-9c7bcd7f7cbf. Also, can't I access older crawls by their job_ids? I get "job_id doesn't exist" for older jobs even though they appear in the dashboard. The same happens with every crawl I try; you can check job 3232284b-94d1-4c95-90b9-d19065587d5b as well. The data returned by the API is of the format:
"data": [null, null, null, ...]
As suggested, I delayed my final request for the data by another 10 seconds, but the problem persists. Even downloading the data from the activity logs gets me only this null data.
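For anyone hitting the same symptom, a defensive polling loop helps: treat a `data` array full of nulls the same as missing data, rather than trusting the `completed` status alone. A minimal sketch, assuming the v1 REST status endpoint and response fields as documented (endpoint path, field names, and the placeholder key are assumptions, not verified here):

```python
import time
import requests

API = "https://api.firecrawl.dev/v1"
API_KEY = "fc-..."  # hypothetical placeholder key

def has_usable_data(data):
    """True only if data is a non-empty list with at least one non-null document."""
    return isinstance(data, list) and any(doc is not None for doc in data)

def wait_for_crawl(job_id, timeout=300, interval=5):
    """Poll a crawl job until it is completed AND returns usable documents."""
    headers = {"Authorization": f"Bearer {API_KEY}"}
    deadline = time.time() + timeout
    while time.time() < deadline:
        resp = requests.get(f"{API}/crawl/{job_id}", headers=headers)
        resp.raise_for_status()
        body = resp.json()
        if body.get("status") == "completed" and has_usable_data(body.get("data")):
            return body["data"]
        time.sleep(interval)
    raise TimeoutError(f"crawl {job_id} never returned usable data")
```

The key point is `has_usable_data`: a list of nulls, an empty list, and a plain null all fail the same check, so downstream code only ever sees real documents.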
16 Replies
Adobe.Flash14mo ago
That's really odd, we are looking into it! We found the issue and are working on a potential fix! @Amoggh Quick question: how many URLs did it say it crawled?
andreichiro14mo ago
Hi, this also happened to me :/
AmogghOP14mo ago
@Adobe.Flash they all had the same problem
AmogghOP14mo ago
And the 1000 shown here is incorrect, as there were only 23 pages in the path
Julia Schroder14mo ago
yes, I had the same issue, and it says I used 100 credits (the max pages I requested) even though the website was smaller
andreichiro14mo ago
Unfortunately, I've also used a lot of credits. More importantly, though: is the crawler working for larger requests?
Adobe.Flash14mo ago
Hey yall, we are releasing v1 of the API today, which fixes these issues. I will add the credits back to your accounts. DM me your emails please. Ccing @mogery here for visibility too
Adobe.Flash14mo ago
@Amoggh @Julia Schroder @andreichiro could you all try the same url that failed on our new v1 endpoints? Thank youuu! https://docs.firecrawl.dev/features/crawl
Julia Schroder14mo ago
yes i'm getting better results!
andreichiro14mo ago
Hello, I tried using v1 and I'm also getting null results.
Adobe.Flash14mo ago
DM'd you @andreichiro
andreichiro14mo ago
Thanks! Just an update: in the dashboard, the number of documents from the crawl appears correct, but the number available for download or in the SDK response is far smaller (from 1093 down to around 40, from 75 down to 9, and so on). The expected number of credits is consumed (matching the crawl count rather than the documents actually returned), but the documents are not available. I tested both the async method and the regular crawler. It seems to happen only at 1k documents or more (everything works for 100-200 documents). Thanks!
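If the missing documents are a pagination issue, the v1 status response is documented to include a `next` URL when the result set is large; following it until it disappears should recover every page. A sketch assuming that response shape (`status`/`next`/`data` fields and the placeholder key are assumptions from the docs, not verified here):

```python
import requests

API_KEY = "fc-..."  # hypothetical placeholder key

def merge_pages(pages):
    """Flatten the `data` arrays from a sequence of paginated status responses."""
    docs = []
    for page in pages:
        docs.extend(page.get("data") or [])
    return docs

def fetch_all_documents(status_url):
    """Follow `next` links on a completed v1 crawl until all pages are collected."""
    headers = {"Authorization": f"Bearer {API_KEY}"}
    pages = []
    url = status_url
    while url:
        resp = requests.get(url, headers=headers)
        resp.raise_for_status()
        page = resp.json()
        pages.append(page)
        url = page.get("next")  # absent/None on the last page
    return merge_pages(pages)
```

Comparing `len(fetch_all_documents(...))` against the dashboard's document count is a quick way to tell whether the shortfall is on the client side (unfollowed pages) or the server side.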
mogery14mo ago
Which SDK is this?
andreichiro14mo ago
Latest version of the Python SDK! Also, the map function works in the dashboard (showing all links), but when calling the map function via the API there seems to be a hard limit of 50 links
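If that 50-link cap is a default rather than a true hard limit, passing an explicit `limit` in the map request body may raise it. A sketch assuming the v1 map endpoint accepts a `limit` field (the field name, default, and placeholder key are assumptions from the docs, not verified here):

```python
import requests

API_KEY = "fc-..."  # hypothetical placeholder key

def build_map_payload(url, limit=5000):
    """Request body for /v1/map; `limit` raises the default cap on returned links."""
    return {"url": url, "limit": limit}

def map_site(url, limit=5000):
    resp = requests.post(
        "https://api.firecrawl.dev/v1/map",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=build_map_payload(url, limit),
    )
    resp.raise_for_status()
    return resp.json().get("links", [])
```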
Adobe.Flash14mo ago
Thanks for the heads up! It should be fixed.
andreichiro13mo ago
Hi! I did the pagination but am still unable to download the data. Error: "No documents to download, something went wrong. Please contact help@firecrawl.dev". Activity log entry: /crawl · 4946 · October 7, 2024 at 11:45:29 AM · API · https://docs.snowflake.com/en · crawl · id: bec81f38-529b-451d-93ab-bc760d750dc3 · 1938.434s · success