JSON schema not working like earlier
I'm on a hobby plan and was using the JSON schema (v2) method to scrape market research websites. I used to extract information with the payload attached here. Now I'm getting almost no table-of-contents data from the websites. Has something changed in the API? If not, can someone help me with a JSON schema that Firecrawl will respect? I'm facing pressure from stakeholders because this affects a production app.
12 Replies
I was not able to use $defs and $ref from JSON Schema, as the playground throws an error for them.
Sample response of firecrawl on the above schema:
{
  "toc": [
    {
      "title": "INTRODUCTION",
      "children": [
        {
          "title": "Study Assumptions and Market Definition"
        },
        {
          "title": "Scope of the Study"
        }
      ]
    }
  ]
}
Link that I'm scraping: https://www.mordorintelligence.com/industry-reports/canada-cloud-computing-market
Mordor Intelligence
Canada Cloud Computing Market Size, Trends, Growth & Outlook | 2030
The Canada Cloud Computing Market is expected to reach USD 54.78 billion in 2025 and grow at a CAGR of 17.30% to reach USD 121.65 billion by 2030. Amazon Web Services, Inc, Google LLC, Microsoft Corporation, IBM Corporation and Oracle Corporation are the major companies operating in this market.
Hi @Ayan_Khan, the issue is with the payload and this site specifically: the data in the table at https://www.mordorintelligence.com/industry-reports/canada-cloud-computing-market is embedded as visual HTML content, and /scrape currently supports only pdf in the parser.
To fix this, you can update your request to include a prompt alongside your schema to extract the table data explicitly.
Here's the sample updated payload:
This will scrape the data from the table.
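The attached payload is not reproduced in this thread. As a rough sketch only, a request body that pairs a prompt with a schema might be built like this in Python. The field names (`formats`, `type`, `prompt`, `schema`) are an assumption about the v2 /scrape request shape, and the helper name, prompt text, and schema here are hypothetical, so check them against the current API docs before relying on them.

```python
import json

def build_scrape_payload(url, schema, prompt):
    """Build a request body pairing a JSON schema with an explicit prompt.

    Assumption: the v2 /scrape endpoint takes a "formats" list whose JSON
    entry carries both "prompt" and "schema". Verify against the API docs.
    """
    return {
        "url": url,
        "formats": [
            {"type": "json", "prompt": prompt, "schema": schema},
        ],
    }

# Hypothetical flat schema: a list of section titles from the table.
toc_schema = {
    "type": "object",
    "properties": {
        "toc": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"title": {"type": "string"}},
                "required": ["title"],
            },
        }
    },
    "required": ["toc"],
}

payload = build_scrape_payload(
    "https://www.mordorintelligence.com/industry-reports/canada-cloud-computing-market",
    toc_schema,
    "Extract every entry of the table of contents shown in the page's table.",
)

# Serialize to the JSON body that would be sent with the request.
body = json.dumps(payload, indent=2)
```

The sketch only builds and serializes the body; sending it (with an API key header) is left out on purpose.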

Also, I will check with the team whether we can add direct support for html in the scrape parser... cc @micah.stairs
@Gaurav Chadha Hello, thanks for replying! It was working fine earlier and I don't know why it stopped. I used the same payload for about a week or two. The main table we are interested in is this one. I provided a recursive schema, but Firecrawl is not following it at all. I have already raised the same issue with support by email and shared the sample code, schema, and response with Micah.

It is not working properly on other market research websites as well.
I have also tried adding the prompt earlier in my debugging attempts but it did not help.
Can you please share other URLs you've tested? And the above schema I shared is also not working for you?
@Gaurav Chadha the schema you provided is okay. My main concern is the recursive table of contents that I want from any market research website the user selects.
Here are all the details that I sent to Micah by email.
@Ayan_Khan thanks for sharing this. I see the issue now: a few days ago a change was made that broke support for $ref in schemas, which is why your recursive schema was not working.
I've added a PR to fix this: https://github.com/firecrawl/firecrawl/pull/2238
GitHub
fix: handle $ref for recursive schema validation by Chadha93 · P...
ref: https://discord.com/channels/1226707384710332458/1421817976566452234
Summary by cubic
Fix handling of $ref in schema normalization and validation to support recursive JSON Schemas. Prevents ...
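For readers following along, here is a minimal sketch of what a recursive table-of-contents schema using $ref could look like, written as a Python dict together with a tiny stdlib-only checker. The names `TOC_SCHEMA`, `tocNode`, `resolve`, and `conforms` are made up for illustration, and the checker covers only the few keywords used here. It is not Firecrawl's validator and not the schema from the original email.

```python
# Hypothetical recursive schema: each node has a title and optional children,
# and the children are themselves nodes (the self-reference goes through $ref).
TOC_SCHEMA = {
    "type": "object",
    "properties": {
        "toc": {"type": "array", "items": {"$ref": "#/$defs/tocNode"}},
    },
    "required": ["toc"],
    "$defs": {
        "tocNode": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "children": {
                    "type": "array",
                    "items": {"$ref": "#/$defs/tocNode"},
                },
            },
            "required": ["title"],
        }
    },
}

def resolve(schema, root):
    """Follow a local '#/...' reference; return the schema unchanged otherwise."""
    ref = schema.get("$ref")
    if ref is None:
        return schema
    node = root
    for part in ref.split("/")[1:]:  # drop the leading "#"
        node = node[part]
    return node

def conforms(value, schema, root):
    """Check value against the small keyword subset used above."""
    schema = resolve(schema, root)
    kind = schema.get("type")
    if kind == "string":
        return isinstance(value, str)
    if kind == "array":
        return isinstance(value, list) and all(
            conforms(item, schema["items"], root) for item in value
        )
    if kind == "object":
        if not isinstance(value, dict):
            return False
        if any(key not in value for key in schema.get("required", [])):
            return False
        return all(
            conforms(value[key], sub, root)
            for key, sub in schema.get("properties", {}).items()
            if key in value
        )
    return True

# The sample response posted earlier in this thread.
sample = {
    "toc": [
        {
            "title": "INTRODUCTION",
            "children": [
                {"title": "Study Assumptions and Market Definition"},
                {"title": "Scope of the Study"},
            ],
        }
    ]
}

is_valid = conforms(sample, TOC_SCHEMA, TOC_SCHEMA)
```

Because `children` points back at `tocNode`, the same definition covers any nesting depth, which is the behaviour that depends on $ref being resolved.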
Thanks @Gaurav Chadha. Really appreciate your efforts!