Search w/ JSON
hey everyone!
I am testing Firecrawl against our app's current Perplexity integration, and trying to do a head-to-head comparison between sonar-pro with structured JSON and Firecrawl's /search + JSON schema extraction.
First, I am struggling to get search to return ANY JSON from the search SDK/endpoint. Is this working as of v2? I'm using the v2 SDK and just trying to get any search or scrape result to return JSON according to my simple testing schema, and it works for neither search nor scrape.
The search endpoint seems to completely ignore my JSON format and just returns the web source results. The scrape endpoint returns a "json" section in the response, but it does not follow my schema at all; it just looks like a standard JSON response from Firecrawl. Am I thinking about these endpoints correctly? I am just trying simple web examples and struggling to get JSON mode to work. Any help/direction would be appreciated!
5 Replies
Hey! Did you remember to pass both the prompt and the schema? See https://docs.firecrawl.dev/features/llm-extract for more info. You should be able to call /search and get structured JSON for each of the search results.
Yes, I believe so. This is still not working, or I'm not understanding the search endpoint. Taking the example from the link you provided and replacing .scrape(url) with .search("firecrawl"), I still do not get JSON matching my schema for any of the results. Here's the example I just tried to test:
Which just gives me the regular search web results like this:
The /extract endpoint without URLs does essentially what I am trying to do, but it's very slow. Am I misunderstanding how search works? What I am essentially trying to do is use /search to return a single JSON object, matching my schema, based on the context of the search results. I know I could just ask for markdown from /search and then make a secondary call to a different LLM to get the JSON format, but I was hoping I could use the search endpoint like an answer engine: "Give me this JSON schema based on your search results."
Hmm, the /search endpoint should support scraping with JSON mode without doing a secondary call.
Can you try using the API directly to rule out an SDK bug?
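For anyone following along, a minimal sketch of what a direct request could look like, useful for separating SDK behavior from API behavior. The endpoint path, the body fields (query, limit, scrapeOptions.formats) and the object form of the JSON format are assumptions about the v2 API that should be checked against the API reference; buildSearchRequest is a hypothetical helper, and it only builds the request without sending it.

```typescript
// Hypothetical helper: builds (but does not send) a direct v2 /search request.
// ASSUMPTION: the URL and the body field names below mirror the v2 API; verify before relying on them.
type JsonSchema = Record<string, unknown>;

interface SearchRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildSearchRequest(
  apiKey: string,
  query: string,
  schema: JsonSchema,
  extractionPrompt?: string,
): SearchRequest {
  // ASSUMPTION: in v2 the JSON format is an object that carries its own schema and prompt.
  const jsonFormat: Record<string, unknown> = { type: "json", schema };
  if (extractionPrompt !== undefined) {
    jsonFormat.prompt = extractionPrompt;
  }
  return {
    url: "https://api.firecrawl.dev/v2/search",
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    // limit: 3 is an arbitrary small value for testing.
    body: JSON.stringify({ query, limit: 3, scrapeOptions: { formats: [jsonFormat] } }),
  };
}
```

Logging the serialized body before sending it makes it easy to confirm that the schema actually leaves the client, which narrows the problem down to the server side if the response still ignores it.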
I am able to recreate the same issue via the direct API as well. What I have found with some additional testing is that neither of the v2 endpoints I tested (/scrape and /search) works with a provided JSON schema. For example, if I provide my scrape format as
I technically get the "json" field in all of the responses, but it seems to just be some sort of default scrape schema about information on the page, and nothing to do with the ExtractionSchema zod schema I am actually passing.
It gets a bit more interesting when I then try to add the "prompt" to the scrapeOptions. If I then change my JSON format to include a prompt like this
Then I actually get closer to my intended schema, but only because I am including the fields I want in the prompt. It is still completely ignoring the actual schema that is passed in through the API or the SDK.
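One way to confirm that a schema is being ignored, rather than eyeballing each response, is to compare the keys of the returned "json" object with the keys the schema requires. The sketch below assumes a plain JSON Schema object with properties and an optional required list; missingSchemaKeys and the sample data are hypothetical and are not real API output.

```typescript
// Hypothetical check: which keys expected by the schema are absent from the returned "json" object?
interface ObjectSchema {
  type: "object";
  properties: Record<string, unknown>;
  required?: string[];
}

function missingSchemaKeys(json: unknown, schema: ObjectSchema): string[] {
  // Fall back to every declared property when the schema lists no required keys.
  const expected = schema.required ?? Object.keys(schema.properties);
  if (typeof json !== "object" || json === null) {
    return expected;
  }
  const present = new Set(Object.keys(json));
  return expected.filter((key) => !present.has(key));
}

// Made-up example data for illustration only.
const exampleSchema: ObjectSchema = {
  type: "object",
  properties: { companyName: { type: "string" }, address: { type: "string" } },
  required: ["companyName", "address"],
};
const missingKeys = missingSchemaKeys({ title: "Some page", summary: "..." }, exampleSchema);
console.log(missingKeys);
```

If every result reports all required keys as missing, with and without a prompt, that points at the schema being dropped rather than at the model failing to fill it in.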
An additional thing I noticed is that the "prompt" for the JSON schema seems to lack context from my actual search query. My assumption was that the extraction prompt would look for this information for Apple, Inc, but it seems to just extract ANY "social media links, addresses", etc. named in the JSON prompt. So if there's a search result that talks about Microsoft, for example, the extracted JSON contains information about Microsoft, not Apple. (Hopefully that makes sense.)
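A possible workaround while this is investigated, assuming the extraction prompt is applied to each result page without seeing the search query: put the subject into the prompt itself. scopedPrompt is a hypothetical helper, and nothing in this thread confirms that the extractor will honor the instruction.

```typescript
// Hypothetical helper: scopes an extraction prompt to the entity being searched for.
function scopedPrompt(subject: string, fields: string[]): string {
  return (
    `Extract the following only if the page is about ${subject}: ` +
    `${fields.join(", ")}. ` +
    `If the page is about a different entity, return null for every field.`
  );
}

// The subject and fields come from the example discussed above.
const applePrompt = scopedPrompt("Apple, Inc", ["social media links", "addresses"]);
console.log(applePrompt);
```

This keeps the query and the extraction instructions in sync from a single source, so a result about Microsoft would at least be asked for Apple's details instead of whatever the page happens to contain.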
Thanks for sharing these details! We will look into this and I will keep you posted once I have an update for you.