Firecrawl•14mo ago

Incorrect response structure

I have been using Firecrawl to crawl websites. Very frequently, when I crawl a site with more than 100 pages, the response structure gets corrupted for some reason and looks like this:

{"status": "completed", "current": 164, "total": 164, "data": {"success": true, "result": {"links": [{"content": {[All data]}, "source": "url"}]}}}

Secondly, I just scraped a site with job ID cca00388-e453-4446-8225-6a22ef379216. The data field for this job is returned as null by the status API. Response:

{"status": "completed", "current": 1000, "total": 1000, "data": null, "partial_data": []}

My 1000 credits were still used for this crawl, and the request doesn't show up in the activity logs dashboard either. I need help understanding what is going wrong here, because this breaks my data extraction pipeline. I am using the Firecrawl Python SDK to make the call. Here is the call:

crawl_result = self.app.crawl_url(
    self.source_url,
    params={
        "crawlerOptions": {
            "limit": self.max_urls,
            "maxDepth": self.crawl_depth,
        },
        "pageOptions": {
            "onlyMainContent": True,
            "includeRawHtml": True,
        },
    },
    wait_until_done=False,
)
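
Since the call above uses wait_until_done=False, the job has to be polled through the status API afterwards. A minimal sketch of such a polling loop follows. The helper name wait_for_crawl, the fetch_status callable, and the "failed" terminal status are assumptions made for illustration; only the "completed" status appears in the responses quoted in this thread.

```python
import time
from typing import Any, Callable, Dict


def wait_for_crawl(
    fetch_status: Callable[[], Dict[str, Any]],
    interval: float = 2.0,
    max_polls: int = 150,
) -> Dict[str, Any]:
    """Call fetch_status repeatedly until the job reports a terminal status.

    fetch_status is a hypothetical zero-argument callable that returns the
    parsed JSON body of the status API for one job.
    """
    # "completed" is taken from the thread; "failed" is an assumed value.
    terminal_statuses = ("completed", "failed")
    for _ in range(max_polls):
        status = fetch_status()
        if status.get("status") in terminal_statuses:
            return status
        time.sleep(interval)
    raise TimeoutError(f"crawl did not finish after {max_polls} polls")
```

Passing the status lookup in as a callable keeps the loop independent of any particular SDK version, so it can wrap whatever status call your client provides.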

This also happened during the trial period, and I assumed it was because I was on a trial, but it is a very bad user experience in a product I am paying for. For now I have to add the following checks in my code:

# crawl_result is the "data" field of the status API response
crawl_result = response.json().get('data')
if not crawl_result:
    print("[Firecrawl Error] Crawl data is null")
    return []
if isinstance(crawl_result, dict):
    # Corrupted shape: pages are nested under data.result.links[].content
    if crawl_result.get('result'):
        crawl_result = [c['content'] for c in crawl_result['result']['links']]
    else:
        print("[Firecrawl Error] crawl result data of unknown format")
        return []
if not isinstance(crawl_result, list):
    print("[Firecrawl Error] crawl result data of unknown format")
    # Return an empty list here too, so every failure path has the same type
    return []
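
The checks above can also be collected into one standalone helper, sketched below. The function name extract_pages is hypothetical, and the handled shapes are only the three described in this thread: a plain list, the nested {"result": {"links": [...]}} dict, and null. Anything else is treated as unknown and yields an empty list.

```python
from typing import Any, Dict, List


def extract_pages(status_response: Dict[str, Any]) -> List[Any]:
    """Return the crawled pages from a status response as a flat list."""
    data = status_response.get("data")

    # Expected shape: "data" is already a list of page objects.
    if isinstance(data, list):
        return data

    # Corrupted shape from this thread: pages sit under data.result.links,
    # each link holding the page under its "content" key.
    if isinstance(data, dict):
        result = data.get("result")
        links = result.get("links") if isinstance(result, dict) else None
        if isinstance(links, list):
            return [
                link["content"]
                for link in links
                if isinstance(link, dict) and "content" in link
            ]

    # Null or unrecognised data: report it and return an empty list.
    print("[Firecrawl Error] crawl data is null or of unknown format")
    return []
```

Returning a list on every path means the caller never has to distinguish None from an empty result.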

2 Replies

Adobe.Flash•14mo ago

Hey @Amoggh we just pushed a fix for it!

AmogghOP•14mo ago

Thanks Nick! That was fast
