Firecrawl

Join builders, developers, and users turning any website into LLM-ready data, enabling developers to power their AI applications with clean, structured information crawled from the web.

Does the self-hosted version have a rate limiter?

How could we rate limit crawling/scraping on the self-hosted Firecrawl?

100% of a Domain's URLs?

Is there any way to make MAP return 100% of a domain's public URLs? I ask because certain domains return less than half of their URLs.
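
For context, the call in question looks roughly like this; a minimal sketch assuming the v2 /map REST endpoint, where raising `limit`, including sitemap URLs, and widening to subdomains are the usual levers for coverage (the parameter names are assumptions to verify against the current docs):

```
# Minimal sketch, assuming the v2 /map endpoint; verify parameter names
# against the current API reference.
import requests

resp = requests.post(
    "https://api.firecrawl.dev/v2/map",
    headers={"Authorization": "Bearer fc-YOUR_KEY"},
    json={
        "url": "https://example.com",   # placeholder domain
        "limit": 30000,                 # raise the cap on returned URLs
        "sitemap": "include",           # blend sitemap URLs with crawled links
        "includeSubdomains": True,      # widen coverage across subdomains
    },
)
print(len(resp.json().get("links", [])))
```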

JSON SCHEMA NOT WORKING

Hi team, I am using the scrape endpoint for JSON schema output, but after 50 or more URLs (never more than 80) the API request just hangs. I have been facing this issue for the past 2 days. Can someone help me here? Is this a rate-limit issue or something else? Because if it is a rate-limit issue, why am I still facing the same issue after 2 days? Thanks....

n8n - looping through 300 records to search each result - 500 error

I am new to n8n + Firecrawl. I'm building a workflow to iterate through 300 entries from JSON and run firecrawl/search on each. From n8n I am using a generic HTTP POST with this payload: { "query": "{{ $json.name }}, {{$json.city}}, {{$json.state}}", "sources": ["news"],...
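
For reference, the fully rendered body such a workflow sends might look like the following (a sketch; the `limit` field is an assumption). A common cause of 500s here is an entry whose name or city contains quotes or commas that break the rendered JSON, so letting the HTTP Request node build the body as JSON rather than raw text is worth trying:

```
{
  "query": "{{ $json.name }}, {{ $json.city }}, {{ $json.state }}",
  "sources": ["news"],
  "limit": 5
}
```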

invalid JSON schema with python client

I have been unable to send a request with a prompt and schema. I get OK results when I write a simplified schema within the prompt; however, passing the class itself would be ideal, per the documentation. Code is as follows: ``` class SecondaryAsset(BaseModel):...
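
One pattern that often resolves this: convert the Pydantic class to a plain JSON-schema dict before sending it. A minimal sketch, assuming a v1-style firecrawl-py call shape (method and option names vary across SDK versions, so treat the call itself as an assumption; the model's fields are placeholders):

```
from pydantic import BaseModel
from firecrawl import FirecrawlApp

class SecondaryAsset(BaseModel):
    name: str
    asset_type: str

app = FirecrawlApp(api_key="fc-YOUR_KEY")

result = app.scrape_url(
    "https://example.com/assets",  # placeholder URL
    formats=["json"],
    json_options={
        # model_json_schema() yields the plain dict form of the schema;
        # passing the class object itself triggers "invalid JSON schema"
        # errors in some SDK versions.
        "schema": SecondaryAsset.model_json_schema(),
        "prompt": "Extract the secondary asset details.",
    },
)
```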

Best Firecrawl methods for large recurring tasks

I have another question, if you @Gaurav Chadha don't mind.
Without addressing the appropriateness of such tasks, I'd like to do the following using Firecrawl (see the sketch after this list):

1. Do a search for all bid-publishing sites for one service.
2. Scrape each site, extracting bids with a "Current" or "Open" status (dealing with pagination if present).
3. Scrape each bid, extracting a known schema for each bid...
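
As a rough shape for steps 1 and 2, a sketch against the REST API (paths follow the v1 docs as I read them; the query, response field names, and filtering logic are illustrative assumptions):

```
# Sketch of the search -> scrape pipeline; not a drop-in implementation.
import requests

HEADERS = {"Authorization": "Bearer fc-YOUR_KEY"}
BASE = "https://api.firecrawl.dev/v1"

# Step 1: find candidate bid-publishing sites for one service.
search = requests.post(f"{BASE}/search", headers=HEADERS,
                       json={"query": "open bids plumbing services", "limit": 10})

# Step 2: scrape each hit; step 3's known bid schema would go into a
# json/extract options block here instead of plain markdown.
for hit in search.json().get("data", []):
    page = requests.post(f"{BASE}/scrape", headers=HEADERS,
                         json={"url": hit["url"], "formats": ["markdown"]})
    markdown = page.json().get("data", {}).get("markdown", "")
    # ... filter for "Current"/"Open" bids and handle pagination here.
```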

Firecrawl Pro, n8n, Claude Code - upgraded but unsure how to continue.

I have subscriptions to each of those, and I have a list of 7,000 URLs I need to scrape from nrd.gov. Could someone suggest the most efficient way to scrape them all, so I can add them to my Supabase backend?
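
For a list that size, the batch-scrape endpoint (submit once, then poll the job) is usually more efficient than issuing 7,000 individual calls from n8n. A minimal sketch, assuming the v1 REST shape; verify paths and response fields against the current docs:

```
# Sketch: submit a batch job, then poll until it completes.
import time
import requests

HEADERS = {"Authorization": "Bearer fc-YOUR_KEY"}
BASE = "https://api.firecrawl.dev/v1"
urls = ["https://www.nrd.gov/page-1", "https://www.nrd.gov/page-2"]  # placeholders

job = requests.post(f"{BASE}/batch/scrape", headers=HEADERS,
                    json={"urls": urls, "formats": ["markdown"]}).json()

while True:
    status = requests.get(f"{BASE}/batch/scrape/{job['id']}", headers=HEADERS).json()
    if status.get("status") == "completed":
        break
    time.sleep(10)

# status["data"] holds the scraped documents, ready for the Supabase insert.
```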

Search Query Operators

I am using the /search endpoint, and we're making pretty heavy use of the query operators described here: https://docs.firecrawl.dev/api-reference/endpoint/search#supported-query-operators I just had a quick question around recommended usage and limits. Is there a maximum number of operators we can include in a single query? Or is there a recommended limit we should stay under for performance reasons? Or is it really just a free-for-all? Our use case is to exclude a fair number of URLs from our Firecrawl queries, and I couldn't find any guidance in the documentation on whether it matters if we use 10 query operators, 50 query operators, etc....
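
For readers unfamiliar with the pattern, an exclusion-heavy query along the lines described would look like this (a hypothetical example using operators from the linked docs page):

```
# Each -site: operator excludes one domain from the results.
query = "industrial pumps -site:pinterest.com -site:reddit.com -site:ebay.com"
```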

New to scraping. Have over 100k pages I need to scrape and maintain.

Been doing some AI research and I'm stuck between Browserbase and Firecrawl for efficiency and cost consciousness. We're a bootstrapped startup, so we're worried about taking the wrong steps. Any advice? Are there any services that do this for you?

How to execute an API fetch inside a public web page

When I use: var aResult = await app.scrapeUrl( aURL, { ... } ) I get a nice table of many rows. The web page has a form for filtering rows. The form's key-value pairs get sent to the web server via an API call.
In DevTools, I see this.ajax( aURL, "GET", { data: { key1: value, key2: value, ... } } ) ...
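
One common approach in this situation: since a GET with a data object usually just serializes to query parameters, the page's backing API can often be called directly instead of driving the form. A sketch under that assumption (the endpoint and key names are placeholders standing in for what DevTools shows):

```
# Sketch: call the page's backing API directly rather than scraping the
# filtered page; key1/key2 stand in for the form's actual field names.
import requests

resp = requests.get(
    "https://example.com/api/rows",               # the aURL seen in DevTools
    params={"key1": "value1", "key2": "value2"},  # the form's key-value pairs
)
rows = resp.json()
```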

maxDiscoveryDepth

Hello, I would like to ask how maxDiscoveryDepth works. Right now I am setting the depth to two and the limit to 10 for https://books.toscrape.com/ to test these parameters, but I somehow don't get it. The results were like this: ``` "data": [ { "links": [ ... },...
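
For what it's worth, my reading (hedged, as the docs are terse here) is that maxDiscoveryDepth counts link hops from the entry URL, so depth 2 covers the entry page plus two levels of discovered links, while `limit` independently caps the total number of pages, whichever bound is hit first. A sketch of the test call, assuming the v2 REST endpoint:

```
# Sketch of the crawl described above; parameter semantics per my reading.
import requests

resp = requests.post(
    "https://api.firecrawl.dev/v2/crawl",
    headers={"Authorization": "Bearer fc-YOUR_KEY"},
    json={
        "url": "https://books.toscrape.com/",
        "maxDiscoveryDepth": 2,  # entry page plus two levels of links
        "limit": 10,             # hard cap on total pages crawled
    },
)
print(resp.json())  # returns a job id to poll for the crawled pages
```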

Facing an issue while scraping a blog website on Firecrawl

Hey, I am facing an issue here. While scraping one of the blog pages, Firecrawl is still generating the main website's content. For example, I am scraping (www.main-brand.com/blog/any-blog-content), but it still gives me the scraped content for (www.main-brand.com) only. It seems scraping is not working for any /blog subsection of the domain. Is there any prerequisite for scraping a blog page? Am I missing something here? I have tried all the feature corrections like zero maxAge, stealth mode, main content: false, etc., but still get no correct response. ...

Disappointing quality of PDF page scraping

We have several documents which contain tables with a lot of relevant information for AI tools. How can I create good markdown files from these PDFs?

Extract (with fire-agent) taking a long time.

Why is extract (with fire-agent) taking so long?

SCRAPE_SITE_ERROR : ERR_TUNNEL_CONNECTION_FAILED

Hello, I get this error on two different websites: {"success":false,"code":"SCRAPE_SITE_ERROR","error":"Specified URL is failing to load in the browser. Error code: ERR_TUNNEL_CONNECTION_FAILED"}. It happens 50% of the time; for the same URL, sometimes it works, sometimes it doesn't. A few months ago, it worked fine....
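
Since the failure is intermittent rather than consistent, a retry wrapper is a reasonable stopgap while the underlying tunnel/proxy issue is investigated. A minimal sketch (the scrape call shape is an assumption based on the v1 REST docs):

```
# Sketch: retry an intermittently failing scrape with exponential backoff.
import time
import requests

def scrape_with_retry(url, attempts=4):
    body = {}
    for i in range(attempts):
        resp = requests.post(
            "https://api.firecrawl.dev/v1/scrape",
            headers={"Authorization": "Bearer fc-YOUR_KEY"},
            json={"url": url, "formats": ["markdown"]},
        )
        body = resp.json()
        if body.get("success"):
            return body
        time.sleep(2 ** i)  # back off: 1s, 2s, 4s between attempts
    return body  # last failure, e.g. ERR_TUNNEL_CONNECTION_FAILED
```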

/crawl robots.txt

Does /crawl respect the robots.txt Crawl-delay?

Help Request: /map + search not returning product URLs

I'm using the /v2/map endpoint with a search query to discover product detail page URLs, but I've run into an issue where some PDPs aren't being returned even though they're live and accessible on the site. Example site: https://www.bstgroup.eu Query: "PRF Mouse"...
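
For comparison, the failing call roughly looks like the first request below (a sketch assuming the v2 shape). Since the `search` parameter appears to rank and trim results rather than guarantee completeness, mapping without it and filtering client-side is worth trying:

```
# Sketch: map with a server-side search term vs. map everything and
# filter locally; response field names are assumptions to verify.
import requests

HEADERS = {"Authorization": "Bearer fc-YOUR_KEY"}

# As used in the report: the search term may omit live PDPs.
with_search = requests.post("https://api.firecrawl.dev/v2/map", headers=HEADERS,
                            json={"url": "https://www.bstgroup.eu", "search": "PRF Mouse"})

# Alternative: fetch all discovered URLs, then filter client-side.
everything = requests.post("https://api.firecrawl.dev/v2/map", headers=HEADERS,
                           json={"url": "https://www.bstgroup.eu", "limit": 10000})
pdps = [l for l in everything.json().get("links", []) if "mouse" in str(l).lower()]
```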