scrape - json mode

I am trying to migrate from the cloud version to the self-hosted version. The scrape JSON format mode was working fine in the cloud-based Firecrawl API call. However, when I tried the self-hosted version, it shows this error:
Failed to parse URL from /chat/completions
The self-hosted version works fine for normal scraping, and I have provided the OpenAI key through the env.
10 Replies
Nalaso (OP), 2w ago
I am using scrape in JSON mode.
Gaurav Chadha, 2w ago
Hi @Nalaso, your script is incorrect. Here's a simple, fixed curl version you can use to test:
curl -X POST "http://localhost:3002/v2/scrape" \
  -H "Content-Type: application/json" \
  --connect-timeout 120 \
  --max-time 120 \
  -d '{
    "url": "https://example.com/article",
    "onlyMainContent": true,
    "formats": [
      {
        "type": "json",
        "schema": {
          "type": "object",
          "required": [],
          "properties": {
            "title": { "type": "string" },
            "author": { "type": "string" },
            "date": { "type": "string" },
            "formatted_content": {
              "type": "string",
              "description": "Main article content with basic formatting, no ads or promotional content, no links to sources or references. Use \\n\\n to separate paragraphs."
            },
            "summary": {
              "type": "string",
              "description": "Only the gist of the text as one plain sentence (≤25 words). Convey the single most important idea and why it matters."
            },
            "primary_tag": { "type": "string" },
            "secondary_tags": { "type": "string" },
            "reading_time": { "type": "string" },
            "images": {
              "type": "array",
              "items": { "type": "string" }
            },
            "thumbnail": { "type": "string" }
          }
        }
      }
    ]
  }'
This will work. Also, please note that you'll need to use a real URL.
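For anyone calling the endpoint from a script rather than curl, here is a rough Python sketch of the same request. The function name `build_scrape_request` is hypothetical, and the base URL `http://localhost:3002` is simply the one used in the curl command above; the sketch only builds the request so the payload can be inspected before anything is sent.

```python
from __future__ import annotations

import json
import urllib.request


def build_scrape_request(base_url: str, page_url: str, schema: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to /v2/scrape that asks for the json format.

    Hypothetical helper: it mirrors the curl payload from this thread.
    """
    payload = {
        "url": page_url,
        "onlyMainContent": True,
        "formats": [{"type": "json", "schema": schema}],
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v2/scrape",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually send it, pass the result to `urllib.request.urlopen(request, timeout=120)` and decode the body with `json.load`.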
Nalaso (OP), 2w ago
@Gaurav Chadha From your screenshot I could only see data.metadata. Can you confirm whether data.json is present in the response? Can you try with a TechCrunch article URL?
Gaurav Chadha, 2w ago
@Nalaso In v2, when using structured extraction with formats, the response structure typically looks like this:
{
  "success": true,
  "data": {
    "metadata": { ... },
    "json": { ... }
  }
}
Nalaso (OP), 2w ago
@Gaurav Chadha Thanks for the quick reply.
Nalaso (OP), 2w ago
Firecrawl Docs: JSON mode | Firecrawl (Extract structured data from pages via LLMs)
Nalaso (OP), 2w ago
Also, it's data.json, not data.data. I think that if LLM extraction fails, the response is still returned with success: true but without data.json.
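Since success: true apparently does not guarantee that data.json is present, a client may want to check for it explicitly. A minimal Python sketch, assuming the response has already been decoded into a dict; the helper name `extract_json` and the top-level `error` field it reads on failure are assumptions, not confirmed parts of the API.

```python
from __future__ import annotations

from typing import Any


def extract_json(response: dict[str, Any]) -> dict[str, Any]:
    """Return data.json from a decoded scrape response, or raise if it is absent."""
    if not response.get("success"):
        # "error" is an assumed field name for the failure message.
        raise RuntimeError(f"scrape failed: {response.get('error', 'unknown error')}")
    data = response.get("data") or {}
    extracted = data.get("json")
    if extracted is None:
        # The scrape itself worked, but the LLM extraction step returned nothing.
        raise RuntimeError("scrape succeeded but data.json is missing; check the LLM settings")
    return extracted
```

Raising here turns a silent extraction failure into a visible one, instead of a missing key surfacing later.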
Gaurav Chadha, 2w ago
Right, I see now. Let me check and verify. @Nalaso, in the .env, alongside OPENAI_API_KEY, can you add these two and then restart the container?
OPENAI_BASE_URL=https://api.openai.com/v1
MODEL_NAME=gpt-4o-mini
The issue above is due to the missing base URL and model.
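A quick way to confirm the .env has everything before restarting the container is a small check like the one below. This is only a sketch: the helper names `parse_env` and `missing_settings` are hypothetical, the parser handles only plain KEY=VALUE lines, and it assumes the key variable is spelled OPENAI_API_KEY.

```python
from __future__ import annotations

# Settings named in this thread (key name assumed to be OPENAI_API_KEY).
REQUIRED = ("OPENAI_API_KEY", "OPENAI_BASE_URL", "MODEL_NAME")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_settings(text: str) -> list[str]:
    """Return the required names that are absent or empty in the .env text."""
    values = parse_env(text)
    return [name for name in REQUIRED if not values.get(name)]
```

For example, `missing_settings(open(".env").read())` returns an empty list once all three settings have values.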
Nalaso (OP), 2w ago
@Gaurav Chadha Thank you so much. This worked.
