JSON schema not working like it did earlier

I'm on a hobby plan and was using the JSON schema (v2) method to scrape market research websites, extracting information with the payload attached here. Now I'm getting almost no table-of-contents data from these websites. Has something changed in the API? If not, can someone help me with a JSON schema that Firecrawl will respect? I'm under pressure from stakeholders because this affects a production app.
12 Replies
Ayan_Khan (OP), 5d ago
I was not able to use $defs and $ref in the JSON schema because the playground throws an error for them.
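Until $ref works again, one possible workaround (an assumption, not something proposed in this thread) is to unroll the recursive definition to a fixed depth before sending it, so the schema contains no $defs or $ref at all. A minimal Python sketch; TOC_SCHEMA is a hypothetical recursive TOC schema, not the one from this thread, and inline_refs is a hypothetical helper name:

```python
import json

# Hypothetical recursive TOC schema: each node has a title and optional children.
TOC_SCHEMA = {
    "type": "object",
    "properties": {
        "toc": {"type": "array", "items": {"$ref": "#/$defs/node"}},
    },
    "$defs": {
        "node": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "children": {"type": "array", "items": {"$ref": "#/$defs/node"}},
            },
            "required": ["title"],
        },
    },
}


def inline_refs(schema: dict, max_depth: int = 4) -> dict:
    """Return a new schema with local "#/$defs/<name>" refs expanded in place.

    Each ref expansion counts as one level; once max_depth levels have been
    unrolled, any further ref is replaced by a bare {"type": "object"} stub.
    Keys placed next to a "$ref" are dropped for simplicity.
    """
    defs = schema.get("$defs", {})
    prefix = "#/$defs/"

    def expand(node, depth):
        if isinstance(node, list):
            return [expand(item, depth) for item in node]
        if not isinstance(node, dict):
            return node
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith(prefix):
            if depth >= max_depth:
                return {"type": "object"}
            return expand(defs[ref[len(prefix):]], depth + 1)
        # Rebuild the dict, omitting "$defs" since nothing refers to it afterwards.
        return {key: expand(value, depth) for key, value in node.items() if key != "$defs"}

    return expand(schema, 0)


flat = inline_refs(TOC_SCHEMA, max_depth=3)
print(json.dumps(flat, indent=2))
```

The trade-off is that headings nested deeper than max_depth are only described as generic objects, so the depth should be chosen to cover the deepest TOC you expect.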
Ayan_Khan (OP), 5d ago
Sample response from Firecrawl for the above schema:
{
  "toc": [
    {
      "title": "INTRODUCTION",
      "children": [
        { "title": "Study Assumptions and Market Definition" },
        { "title": "Scope of the Study" }
      ]
    }
  ]
}
Link that I'm scraping: https://www.mordorintelligence.com/industry-reports/canada-cloud-computing-market
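To make "almost no data" measurable across runs, it can help to flatten the nested response and count its entries and depth. A minimal Python sketch, assuming each node has a title and an optional children list as in the sample above; flatten_toc is a hypothetical helper name:

```python
def flatten_toc(nodes, depth=1):
    """Yield (depth, title) pairs from a nested TOC, parents before children."""
    for node in nodes:
        yield depth, node.get("title", "")
        yield from flatten_toc(node.get("children", []), depth + 1)


# The sample response shown above.
sample = {
    "toc": [
        {
            "title": "INTRODUCTION",
            "children": [
                {"title": "Study Assumptions and Market Definition"},
                {"title": "Scope of the Study"},
            ],
        }
    ]
}

entries = list(flatten_toc(sample["toc"]))
max_depth = max((d for d, _ in entries), default=0)
print(len(entries), max_depth)  # 3 2
for depth, title in entries:
    print("  " * (depth - 1) + title)
```

Logging these two numbers per URL makes it easy to see when a schema change or API change suddenly collapses the TOC to a single section.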
Mordor Intelligence
Canada Cloud Computing Market Size, Trends, Growth & Outlook | 2030
The Canada Cloud Computing Market is expected to reach USD 54.78 billion in 2025 and grow at a CAGR of 17.30% to reach USD 121.65 billion by 2030. Amazon Web Services, Inc, Google LLC, Microsoft Corporation, IBM Corporation and Oracle Corporation are the major companies operating in this market.
Gaurav Chadha, 5d ago
Hi @Ayan_Khan, the issue is specific to this payload and site: the data in the table at https://www.mordorintelligence.com/industry-reports/canada-cloud-computing-market is embedded as visual HTML content, and the parsers option of /scrape currently supports only pdf.
Gaurav Chadha, 5d ago
To fix this, you can update your request to include a prompt alongside your schema that explicitly asks for the table data. Here's a sample updated payload:
Gaurav Chadha, 5d ago
{
  "url": "https://www.mordorintelligence.com/industry-reports/canada-cloud-computing-market",
  "onlyMainContent": true,
  "maxAge": 172800000,
  "parsers": ["pdf"],
  "formats": [
    {
      "type": "json",
      "prompt": "Extract any table(s) showing Drivers Impact Analysis from the page. For each driver, extract: driver name, percentage impact on CAGR, geographic relevance, and impact timeline. Also extract the summary and overview sections as before.",
      "schema": {
        "type": "object",
        "required": ["drivers_impact_analysis", "summary", "overview"],
        "properties": {
          "drivers_impact_analysis": {
            "type": "array",
            "items": {
              "type": "object",
              "required": ["driver", "impact_on_cagr", "geographic_relevance", "impact_timeline"],
              "properties": {
                "driver": { "type": "string" },
                "impact_on_cagr": { "type": "string" },
                "geographic_relevance": { "type": "string" },
                "impact_timeline": { "type": "string" }
              }
            }
          },
          "summary": {
            "type": "object",
            "required": [
              "base_year",
              "base_revenue",
              "base_revenue_unit",
              "forecast_year",
              "forecast_revenue",
              "forecast_revenue_unit",
              "cagr"
            ],
            "properties": {
              "base_year": { "type": "number" },
              "base_revenue": { "type": "number" },
              "base_revenue_unit": { "type": "string" },
              "forecast_year": { "type": "number" },
              "forecast_revenue": { "type": "number" },
              "forecast_revenue_unit": { "type": "string" },
              "cagr": { "type": "number" }
            }
          },
          "overview": { "type": "string" }
        }
      }
    }
  ]
}
This will scrape the data from the table.
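For reference, a payload of this shape can also be sent from a script. A minimal Python sketch using only the standard library; the endpoint URL, the Bearer-token header, and the data.json location of the extracted result are assumptions to verify against the Firecrawl API reference, and build_payload and scrape are hypothetical helper names:

```python
import json
import urllib.request

# Assumption: v2 scrape endpoint with Bearer-token auth; verify in the Firecrawl docs.
SCRAPE_URL = "https://api.firecrawl.dev/v2/scrape"


def build_payload(url: str, prompt: str, schema: dict) -> dict:
    """Assemble a /scrape request body shaped like the payload above."""
    return {
        "url": url,
        "onlyMainContent": True,
        "maxAge": 2 * 24 * 60 * 60 * 1000,  # 172800000 ms, i.e. accept a cached copy up to 2 days old
        "parsers": ["pdf"],
        "formats": [{"type": "json", "prompt": prompt, "schema": schema}],
    }


def scrape(payload: dict, api_key: str, timeout: float = 120.0) -> dict:
    """POST the payload as JSON and return the decoded response body."""
    request = urllib.request.Request(
        SCRAPE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Usage would look like result = scrape(build_payload(url, prompt, schema), api_key), after which (again assuming the response layout) the extracted object would be in result["data"]["json"]. Keeping the payload builder separate from the HTTP call makes it easy to diff the exact body you send against one that used to work.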
Gaurav Chadha, 5d ago
Also, I'll check with the team whether we can add direct HTML support to the scrape parsers. cc @micah.stairs
Ayan_Khan (OP), 4d ago
@Gaurav Chadha Hello, thanks for replying! It was working fine earlier and I don't know why it stopped; I had been using the same payload for a week or two. The main table we are interested in is this one. I provided a recursive schema for it, but Firecrawl is not following it at all. I already raised the same issue with support by email and shared the sample code, schema, and response with Micah.
Ayan_Khan (OP), 4d ago
It is not working properly on other market research websites either. I also tried adding a prompt earlier while debugging, but it did not help.
Gaurav Chadha, 4d ago
Can you please share the other URLs you've tested? And is the schema I shared above also not working for you?
Ayan_Khan (OP), 3d ago
@Gaurav Chadha The schema you provided is fine. My main concern is the recursive table of contents that I want to extract from any market research website the user selects. Here are all the details I sent to Micah by email.
Gaurav Chadha, 3d ago
@Ayan_Khan thanks for sharing this, I see the issue now. A few days ago a change was made that dropped support for $ref in schemas, which is why your recursive schema stopped working. I've opened a PR with a fix: https://github.com/firecrawl/firecrawl/pull/2238
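Recursive schemas are a special case for any $ref handling: a definition that points back to itself cannot simply be inlined, because the expansion never terminates, so a normalizer has to detect or bound the cycle. A minimal Python sketch that finds the $defs entries that can reach themselves through local refs, using a depth-first walk of the ref graph; it is illustrative only and is not the code from the linked PR, and local_refs and find_recursive_defs are hypothetical names:

```python
def local_refs(node) -> set:
    """Collect the names of all "#/$defs/<name>" refs anywhere inside node."""
    found = set()
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/$defs/"):
            found.add(ref[len("#/$defs/"):])
        for value in node.values():
            found |= local_refs(value)
    elif isinstance(node, list):
        for item in node:
            found |= local_refs(item)
    return found


def find_recursive_defs(schema: dict) -> set:
    """Return the $defs names that can reach themselves through $ref."""
    defs = schema.get("$defs", {})
    # Edge name -> names it references (ignoring refs to unknown definitions).
    graph = {name: local_refs(body) & set(defs) for name, body in defs.items()}
    recursive = set()
    for start in graph:
        stack, seen = list(graph[start]), set()
        while stack:
            name = stack.pop()
            if name == start:
                recursive.add(start)
                break
            if name not in seen:
                seen.add(name)
                stack.extend(graph[name])
    return recursive


schema = {
    "type": "object",
    "properties": {"toc": {"type": "array", "items": {"$ref": "#/$defs/node"}}},
    "$defs": {
        "node": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "children": {"type": "array", "items": {"$ref": "#/$defs/node"}},
            },
        },
        "unit": {"type": "string"},
    },
}
print(find_recursive_defs(schema))  # {'node'}
```

Definitions flagged this way are the ones that must be kept as $ref (or unrolled to a fixed depth); non-recursive ones such as unit can be inlined safely.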
GitHub
fix: handle $ref for recursive schema validation by Chadha93 · P...
ref: https://discord.com/channels/1226707384710332458/1421817976566452234 Summary by cubic Fix handling of $ref in schema normalization and validation to support recursive JSON Schemas. Prevents ...
Ayan_Khan (OP), 2d ago
Thanks @Gaurav Chadha. I really appreciate your efforts!