Firecrawl14mo ago
Amoggh

Every crawl returns null data

I have been facing this for the past 5 days and it hasn't been resolved yet, hence posting here: I just crawled a site and, even after waiting 1-2 minutes past the completed status, the data still isn't right. Instead of sending just null in data, it now sends a list of nulls, which breaks my checks for whether I received an empty list or a null value in the data field. The job_id is 56834938-0b41-458c-9012-9c7bcd7f7cbf. Also, can't I access older crawls by their job_ids? I get "job_id doesn't exist" for older jobs even though they appear in the dashboard. The same happens with every crawl I try; you can check job 3232284b-94d1-4c95-90b9-d19065587d5b as well. The data returned by the API is of the format:
"data": [null, null, null, ...]
As suggested, I delayed my final request for the data by another 10 seconds, but the problem persists. Even downloading the data from the activity logs gets me only this null data.
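For anyone hitting the same symptom, a defensive polling loop helps: treat a `data` array full of nulls the same as missing data, rather than trusting the `completed` status alone. A minimal sketch, assuming the v1 REST status endpoint and response fields as documented (endpoint path, field names, and the placeholder key are assumptions, not verified here):

```python
import time
import requests

API = "https://api.firecrawl.dev/v1"
API_KEY = "fc-..."  # hypothetical placeholder key

def has_usable_data(data):
    """True only if data is a non-empty list with at least one non-null document."""
    return isinstance(data, list) and any(doc is not None for doc in data)

def wait_for_crawl(job_id, timeout=300, interval=5):
    """Poll a crawl job until it is completed AND returns usable documents."""
    headers = {"Authorization": f"Bearer {API_KEY}"}
    deadline = time.time() + timeout
    while time.time() < deadline:
        resp = requests.get(f"{API}/crawl/{job_id}", headers=headers)
        resp.raise_for_status()
        body = resp.json()
        if body.get("status") == "completed" and has_usable_data(body.get("data")):
            return body["data"]
        time.sleep(interval)
    raise TimeoutError(f"crawl {job_id} never returned usable data")
```

The key point is `has_usable_data`: a list of nulls, an empty list, and a plain null all fail the same check, so downstream code only ever sees real documents.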
16 Replies
Adobe.Flash14mo ago
That's really odd, we are looking into it! We found the issue and are working on a potential fix! @Amoggh Quick question: how many URLs did it say it crawled?
andreichiro14mo ago
Hi, this also happened to me :/
AmogghOP14mo ago
@Adobe.Flash they all had the same problem
AmogghOP14mo ago
And the 1000 shown here is incorrect, as there were only 23 pages in the path
Julia Schroder14mo ago
yes, I had the same issue, and it says I used 100 credits (the max pages I requested) even though the website was smaller
andreichiro14mo ago
Unfortunately, I've also used a lot of credits. More importantly, though: is the crawler working for larger requests?
Adobe.Flash14mo ago
Hey yall, we are releasing v1 of the API today, which fixes these issues. I will add the credits back to your accounts. DM me your emails please. Ccing @mogery here for visibility too
Adobe.Flash14mo ago
@Amoggh @Julia Schroder @andreichiro could you all try the same url that failed on our new v1 endpoints? Thank youuu! https://docs.firecrawl.dev/features/crawl
Julia Schroder14mo ago
yes i'm getting better results!
andreichiro14mo ago
Hello, I tried using v1 and I'm also getting null results.
Adobe.Flash14mo ago
DM'd you @andreichiro
andreichiro14mo ago
Thanks! Just an update: in the dashboard, the number of documents from the crawl appears correct, but the number available for download or in the SDK response is far smaller (from 1093 down to around 40, from 75 down to 9, and so on). The expected number of credits is consumed (matching the crawl count rather than the documents actually returned), but the documents are not available. I tested both the async method and the regular crawler. It seems to happen only at 1k documents or more (everything works for 100-200 documents). Thanks!
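If the missing documents are a pagination issue, the v1 status response is documented to include a `next` URL when the result set is large; following it until it disappears should recover every page. A sketch assuming that response shape (`status`/`next`/`data` fields and the placeholder key are assumptions from the docs, not verified here):

```python
import requests

API_KEY = "fc-..."  # hypothetical placeholder key

def merge_pages(pages):
    """Flatten the `data` arrays from a sequence of paginated status responses."""
    docs = []
    for page in pages:
        docs.extend(page.get("data") or [])
    return docs

def fetch_all_documents(status_url):
    """Follow `next` links on a completed v1 crawl until all pages are collected."""
    headers = {"Authorization": f"Bearer {API_KEY}"}
    pages = []
    url = status_url
    while url:
        resp = requests.get(url, headers=headers)
        resp.raise_for_status()
        page = resp.json()
        pages.append(page)
        url = page.get("next")  # absent/None on the last page
    return merge_pages(pages)
```

Comparing `len(fetch_all_documents(...))` against the dashboard's document count is a quick way to tell whether the shortfall is on the client side (unfollowed pages) or the server side.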
mogery14mo ago
Which SDK is this?
andreichiro14mo ago
Latest version of the Python SDK! Also, the map function works in the dashboard (showing all links), but when calling the map function via the API there seems to be a hard limit of 50 links
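If that 50-link cap is a default rather than a true hard limit, passing an explicit `limit` in the map request body may raise it. A sketch assuming the v1 map endpoint accepts a `limit` field (the field name, default, and placeholder key are assumptions from the docs, not verified here):

```python
import requests

API_KEY = "fc-..."  # hypothetical placeholder key

def build_map_payload(url, limit=5000):
    """Request body for /v1/map; `limit` raises the default cap on returned links."""
    return {"url": url, "limit": limit}

def map_site(url, limit=5000):
    resp = requests.post(
        "https://api.firecrawl.dev/v1/map",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=build_map_payload(url, limit),
    )
    resp.raise_for_status()
    return resp.json().get("links", [])
```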
Adobe.Flash14mo ago
Thanks for the heads up! It should be fixed.
andreichiro13mo ago
Hi! I did the pagination but am still unable to download the data. Error: "No documents to download, something went wrong. Please contact help@firecrawl.dev". Activity log entry: /crawl · 4946 · October 7, 2024 at 11:45:29 AM · API · https://docs.snowflake.com/en · crawl · id: bec81f38-529b-451d-93ab-bc760d750dc3 · 1938.434s · success