Firecrawl · 14mo ago
Amoggh

Incorrect response structure

I have been using Firecrawl to crawl websites. Very frequently, when I crawl a site with more than 100 pages, the response structure gets corrupted for some reason and becomes:
{"status": "completed", "current": 164, "total": 164, "data": {"success": true, "result": {"links": [{"content": {[All data], "source": "url"}],
}
Secondly, I just scraped a site with job ID cca00388-e453-4446-8225-6a22ef379216. The data field for this job is returned as null by the status API. Response:
{"status": "completed", "current": 1000, "total": 1000, "data": null, "partial_data": []}
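A minimal sketch of how a response like the one above could be flagged before it reaches the rest of a pipeline. The helper name `completed_without_data` is hypothetical and not part of the Firecrawl SDK; the field names are taken from the response quoted above.

```python
from typing import Any, Mapping


def completed_without_data(status: Mapping[str, Any]) -> bool:
    """Return True when a job says it completed pages but carries no data.

    Hypothetical helper: it only inspects the fields seen in the status
    response quoted in this post (status, total, data, partial_data).
    """
    finished = status.get("status") == "completed"
    reported_pages = (status.get("total") or 0) > 0
    has_data = bool(status.get("data")) or bool(status.get("partial_data"))
    return finished and reported_pages and not has_data


# The response quoted above, written as a Python dict.
response = {
    "status": "completed",
    "current": 1000,
    "total": 1000,
    "data": None,
    "partial_data": [],
}
print(completed_without_data(response))  # True
```

Flagging this case separately from an ordinary empty crawl makes it easier to report the job ID instead of silently returning nothing.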
My 1000 credits were used for this crawl as well, and the request doesn't show up in the activity logs dashboard either. I need help understanding what is going wrong here, because this breaks my data extraction pipeline. I am using the Firecrawl Python SDK to make the call. Here is the call:
crawl_result = self.app.crawl_url(
    self.source_url,
    params={
        "crawlerOptions": {
            "limit": self.max_urls,
            "maxDepth": self.crawl_depth,
        },
        "pageOptions": {
            "onlyMainContent": True,
            "includeRawHtml": True,
        },
    },
    wait_until_done=False,
)
This happened during the trial period too, and I thought it was because I was on the trial, but this is a very bad user experience in a product you are paying for. For now I have to add the following checks in my code:
where crawl_result = response.json()['data']
if not crawl_result:
    print("[Firecrawl Error] Crawl data is null")
    return []
if isinstance(crawl_result, dict):
    # Corrupted shape: pages are nested under result -> links -> content
    if crawl_result.get('result'):
        crawl_result = [c['content'] for c in crawl_result['result']['links']]
    else:
        print("[Firecrawl Error] crawl result data of unknown format")
        return []
if not isinstance(crawl_result, list):
    print("[Firecrawl Error] crawl result data of unknown format")
    return []
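The checks above could also be wrapped in a standalone function, roughly as in the sketch below. The name `normalize_crawl_data` is hypothetical, the assumption that a healthy response carries `data` as a plain list comes only from the checks in this post, and the sample page content is a placeholder.

```python
from typing import Any, List


def normalize_crawl_data(data: Any) -> List[Any]:
    """Return the crawled pages as a list, or [] if the shape is unusable.

    Assumes (from this post, not from Firecrawl docs) that `data` is either
    a list of pages, null, or the nested {"result": {"links": [...]}} shape.
    """
    if not data:
        # Covers null and empty containers.
        return []
    if isinstance(data, list):
        # Assumed normal shape: already a list of pages.
        return data
    if isinstance(data, dict):
        result = data.get("result")
        links = result.get("links") if isinstance(result, dict) else None
        if isinstance(links, list):
            # Nested shape: pull each page out of its "content" key.
            return [
                link["content"]
                for link in links
                if isinstance(link, dict) and "content" in link
            ]
    # Anything else is an unknown format.
    return []


# Placeholder payload mimicking the nested shape described in this post.
nested = {
    "success": True,
    "result": {"links": [{"content": {"page": 1}, "source": "url"}]},
}
print(normalize_crawl_data(nested))  # [{'page': 1}]
print(normalize_crawl_data(None))    # []
```

Returning a list in every branch means the caller never has to distinguish between `[]` and `None`.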
2 Replies
Adobe.Flash · 14mo ago
Hey @Amoggh we just pushed a fix for it!
Amoggh (OP) · 14mo ago
Thanks Nick! That was fast
