Firecrawl · 2mo ago
Foxxy

Search w/ JSON

Hey everyone! I'm testing Firecrawl against our app's current Perplexity integration, doing a head-to-head comparison between sonar-pro with structured JSON and Firecrawl's /search plus JSON schema extraction. First, I'm struggling to get search to return ANY JSON from the search SDK/endpoint. Is this working as of v2? I'm using the v2 SDK and just trying to get any search or scrape result to return JSON that follows my simple test schema, and it isn't working for either endpoint. The search endpoint seems to ignore my JSON format completely and returns only the web source results. The scrape endpoint returns a "json" section in the response, but it doesn't contain any of my schema fields; it just looks like a standard JSON response from Firecrawl. Am I thinking about these endpoints correctly? I'm only trying simple web examples and struggling to get JSON mode to work. Any help or direction would be appreciated!
5 Replies
micah.stairs · 2mo ago
Hey! Did you remember to pass both the prompt and the schema? See https://docs.firecrawl.dev/features/llm-extract for more info. You should be able to call /search and get structured JSON for each of the search results.
Foxxy (OP) · 2mo ago
Yes, I believe so. This is still not working, or I'm not understanding the search endpoint. Taking the example from the link you provided and replacing .scrape(url) with .search("firecrawl"), I still don't get JSON matching my schema for any result. Here's the example I just tested:
// Define schema to extract contents into
const schema = z.object({
  company_mission: z.string(),
  supports_sso: z.boolean(),
  is_open_source: z.boolean(),
  is_in_yc: z.boolean()
});

const result = await firecrawl.search("Firecrawl Company", {
  scrapeOptions: {
    formats: [{
      type: "json",
      schema: schema
    }],
  }
});

console.log(result);
Which just gives me the regular search web results like this:
{
  web: [
    {
      url: 'https://www.firecrawl.dev/',
      title: 'Firecrawl - The Web Data API for AI',
      description: 'The web crawling, scraping, and search API for AI. Built for scale. Firecrawl delivers the entire internet to AI agents and builders.'
    },
    ...
The /extract endpoint without URLs does essentially what I'm trying to do, but it's very slow. Am I misunderstanding how search works? What I want is for /search to return a single JSON object that follows my schema, built from the context of all the search results. I know I could ask /search for markdown and then make a second call to a different LLM to get the JSON, but I was hoping to use the search endpoint as an answer engine: "Give me this JSON schema based on your search results."
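For anyone who ends up on the two-step fallback described above, here is a minimal sketch of the second step: folding all search hits into one prompt for a separate LLM call. The `SearchHit` shape copies the `url`/`title`/`description` fields from the output shown earlier; the `markdown` field, the `buildAnswerPrompt` helper, and the character cap are assumptions for illustration, not part of the Firecrawl SDK.

```typescript
// Shape of one search hit. `url`, `title` and `description` match the output
// shown in the thread; `markdown` is an assumption (only present if requested).
interface SearchHit {
  url: string;
  title?: string;
  description?: string;
  markdown?: string;
}

// Hypothetical helper: build one prompt that asks for a single JSON object
// across all hits, instead of one extraction per page.
function buildAnswerPrompt(
  query: string,
  hits: SearchHit[],
  fieldNames: string[],
  maxCharsPerHit: number = 4000,
): string {
  const sources = hits
    .map((hit, i) => {
      // Prefer full markdown, fall back to the short description.
      const body = (hit.markdown ?? hit.description ?? "").slice(0, maxCharsPerHit);
      return `[${i + 1}] ${hit.url}\n${body}`;
    })
    .join("\n\n");

  return [
    `Question: ${query}`,
    `Answer with one JSON object containing exactly these keys: ${fieldNames.join(", ")}.`,
    "Use only the sources below.",
    "",
    sources,
  ].join("\n");
}
```

The resulting string would go to whichever LLM client you already use; that call is left out on purpose because it depends on your provider.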
micah.stairs · 2mo ago
Hmm, the /search endpoint should support scraping with JSON mode without a secondary call. Can you try using the API directly to rule out an SDK bug?
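A rough sketch of what a direct API test could look like, bypassing the SDK. The URL `https://api.firecrawl.dev/v2/search`, the bearer-token header, and the request body layout are assumptions based on the SDK options used in this thread, so check them against the API reference. The schema is written as plain JSON Schema because a raw HTTP body cannot carry a Zod object. `buildSearchRequest` and `missingSchemaKeys` are hypothetical helper names.

```typescript
// Assumed endpoint; verify the exact path in the API reference.
const SEARCH_URL = "https://api.firecrawl.dev/v2/search";

// Plain JSON Schema version of the Zod schema from the thread.
const companySchema = {
  type: "object",
  properties: {
    company_mission: { type: "string" },
    supports_sso: { type: "boolean" },
    is_open_source: { type: "boolean" },
    is_in_yc: { type: "boolean" },
  },
  required: ["company_mission", "supports_sso", "is_open_source", "is_in_yc"],
};

// Build the URL and fetch options for a search that requests JSON extraction
// with BOTH a prompt and a schema (body layout is an assumption).
function buildSearchRequest(apiKey: string, query: string, prompt: string) {
  return {
    url: SEARCH_URL,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        query,
        limit: 3,
        scrapeOptions: {
          formats: [{ type: "json", prompt, schema: companySchema }],
        },
      }),
    },
  };
}

// Diagnostic: list the schema keys that are absent from one result's `json`
// field. An empty array means the result follows the schema's key set.
function missingSchemaKeys(json: unknown): string[] {
  const obj =
    json !== null && typeof json === "object" ? (json as Record<string, unknown>) : {};
  return companySchema.required.filter((key) => !(key in obj));
}
```

To run it, pass `url` and `init` from `buildSearchRequest(...)` to `fetch`, then call `missingSchemaKeys` on the `json` field of each returned result. If every result reports all four keys missing, the schema is being dropped server-side rather than by the SDK.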
Foxxy (OP) · 2mo ago
I'm able to reproduce the same issue via the direct API as well. What I've found with some additional testing is that neither of the v2 endpoints I tested (/scrape and /search) works with a provided JSON schema. For example, if I provide my scrape format as
const results = await firecrawl.search('business information on the company "Apple, Inc" ', {
  limit: 5,
  sources: ['web', 'news'],
  scrapeOptions: {
    formats: [{
      type: "json",
      schema: ExtractionSchema,
    }],
    parsers: []
  },
  location: 'United States'
});
I technically get the "json" field in all of the responses, but it seems to be some sort of default scrape schema with general information about the page, and it has nothing to do with the ExtractionSchema Zod schema I'm actually passing. It gets more interesting when I add a "prompt" to the scrapeOptions. If I change my JSON format to include a prompt like this:
formats: [{
  type: "json",
  schema: ExtractionSchema,
  prompt: "Extract the social media links, addresses, contact information, and hours of operation from the page"
}],
Then I get closer to my intended schema, but only because I'm naming the fields I want in the prompt; it is still completely ignoring the actual schema passed in through the API or the SDK. Another thing I noticed is that the "prompt" for the JSON format seems to lack context from my actual search query. My assumption was that the extraction would look for this information about Apple, Inc., but it seems to extract ANY "social media links, addresses", etc. that match the JSON prompt. So if a search result talks about Microsoft, for example, that result returns information about Microsoft, not Apple. (Hopefully that makes sense.)
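One possible stopgap for the missing query context, sketched under the assumption that the per-page extraction only sees the prompt text: put the entity name into the prompt itself, and drop hits that never mention it before trusting their extracted JSON. `scopePrompt` and `hitsMentioning` are hypothetical helpers, not SDK functions, and the substring match is deliberately crude.

```typescript
// Prefix the extraction prompt with the entity from the search query, so the
// per-page extraction is told which company the fields should describe.
function scopePrompt(entity: string, basePrompt: string): string {
  return (
    `Only extract information about "${entity}". ` +
    `If the page is not about "${entity}", leave every field empty. ` +
    basePrompt
  );
}

// Minimal hit shape, matching the `web` results shown earlier in the thread.
interface Hit {
  url: string;
  title?: string;
  description?: string;
}

// Client-side guard: keep only hits whose title or description mentions the
// entity (case-insensitive substring match).
function hitsMentioning<T extends Hit>(entity: string, hits: T[]): T[] {
  const needle = entity.toLowerCase();
  return hits.filter((hit) =>
    `${hit.title ?? ""} ${hit.description ?? ""}`.toLowerCase().includes(needle),
  );
}
```

For the Apple example, `scopePrompt("Apple, Inc", "Extract the social media links, addresses, contact information, and hours of operation from the page")` would replace the bare prompt, and `hitsMentioning("apple", results.web)` would filter out a Microsoft-only result. This does not address the schema being ignored; it only narrows what the prompt extracts.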
micah.stairs · 2mo ago
Thanks for sharing these details! We will look into this and I will keep you posted once I have an update for you.