Firecrawl•3mo ago

scrape - json mode

I am trying to migrate from the cloud version to the self hosted version. The scrape - json format mode was working fine in the cloud based firecrawl api call. However, when I tried the self hosted version, its showing this error

Failed to parse URL from /chat/completions

Failed to parse URL from /chat/completions

The self hosted version is working fine for normal scraping, also I have provided the openai key through the env

message.txt

10 Replies

NalasoOP•3mo ago

am using scrape - json mode

NalasoOP•3mo ago

code.txt

Gaurav Chadha•3mo ago

Hi @Nalaso your script is incorrect, here's a fixed simple curl version of it you can try to test.

curl -X POST "http://localhost:3002/v2/scrape" \
  -H "Content-Type: application/json" \
  --connect-timeout 120 \
  --max-time 120 \
  -d '{
    "url": "https://example.com/article",
    "onlyMainContent": true,
    "formats": [
      {
        "type": "json",
        "schema": {
          "type": "object",
          "required": [],
          "properties": {
            "title": {
              "type": "string"
            },
            "author": {
              "type": "string"
            },
            "date": {
              "type": "string"
            },
            "formatted_content": {
              "type": "string",
              "description": "Main article content with basic formatting, no ads or promotional content, no links to sources or references. Use \\n\\n to separate paragraphs."
            },
            "summary": {
              "type": "string",
              "description": "Only the gist of the text as one plain sentence (≤25 words). Convey the single most important idea and why it matters."
            },
            "primary_tag": {
              "type": "string"
            },
            "secondary_tags": {
              "type": "string"
            },
            "reading_time": {
              "type": "string"
            },
            "images": {
              "type": "array",
              "items": {
                "type": "string"
              }
            },
            "thumbnail": {
              "type": "string"
            }
          }
        }
      }
    ]
  }'

curl -X POST "http://localhost:3002/v2/scrape" \
  -H "Content-Type: application/json" \
  --connect-timeout 120 \
  --max-time 120 \
  -d '{
    "url": "https://example.com/article",
    "onlyMainContent": true,
    "formats": [
      {
        "type": "json",
        "schema": {
          "type": "object",
          "required": [],
          "properties": {
            "title": {
              "type": "string"
            },
            "author": {
              "type": "string"
            },
            "date": {
              "type": "string"
            },
            "formatted_content": {
              "type": "string",
              "description": "Main article content with basic formatting, no ads or promotional content, no links to sources or references. Use \\n\\n to separate paragraphs."
            },
            "summary": {
              "type": "string",
              "description": "Only the gist of the text as one plain sentence (≤25 words). Convey the single most important idea and why it matters."
            },
            "primary_tag": {
              "type": "string"
            },
            "secondary_tags": {
              "type": "string"
            },
            "reading_time": {
              "type": "string"
            },
            "images": {
              "type": "array",
              "items": {
                "type": "string"
              }
            },
            "thumbnail": {
              "type": "string"
            }
          }
        }
      }
    ]
  }'

this will work, also, please note: you'll need to use a real url.

NalasoOP•3mo ago

@Gaurav Chadha From your screenshot i could only see data.metadata. Can you confirm if data.json is present in the response? can you try with some techcrunch article url?

Gaurav Chadha•3mo ago

@Nalaso in v2, when using structured extraction with formats, the response structure typically looks like this:

{
  "success": true,
  "data": {
    "metadata": { ... },
    "json": { ... }
  }
}

{
  "success": true,
  "data": {
    "metadata": { ... },
    "json": { ... }
  }
}

NalasoOP•3mo ago

@Gaurav Chadha Thanks for the quick reply

NalasoOP•3mo ago

Am referring to this one - https://docs.firecrawl.dev/features/llm-extract

Firecrawl Docs

JSON mode | Firecrawl

Extract structured data from pages via LLMs

NalasoOP•3mo ago

Also its data.json not data.data I think if llm extraction fails then response is returned success: true and without data.json

Gaurav Chadha•3mo ago

right, I see now, let me check and verify @Nalaso in the .env can you add these two below OPEN_API_KEY OPENAI_BASE_URL=https://api.openai.com/v1 MODEL_NAME=gpt-4o-mini and then restart the container? The above issue is due to missing baseurl and model

NalasoOP•3mo ago

@Gaurav Chadha Thank you so much This worked

Gaming

Programming

scrape - json mode

Did you find this page helpful?