Firecrawl

Join builders, developers, and users who turn any website into LLM-ready data, powering their AI applications with clean, structured information crawled from the web.

❓┃community-help

💬┃general

🛠️┃self-hosting

Debugging /map endpoint

I'm looking to map the URLs on this Intercom-hosted support center: https://intercom.help/autods---new/en/ I'd expect 49 articles, based on the website's table of contents, but the map endpoint only returns two results when I run it in the playground. I've also tried ignoring the sitemap and got the same results...
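One way to check whether /map is missing pages is to extract the article links from the table-of-contents HTML yourself and compare the count. Below is a minimal standard-library sketch; `article_links` is a hypothetical helper (not part of Firecrawl), the `/articles/` filter is an assumption about how this help center structures its URLs, and the sample markup is made up.

```python
from html.parser import HTMLParser
from typing import Dict, List
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects every <a href> on a page, resolved to an absolute URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            self.links.append(urljoin(self.base_url, href))


def article_links(html: str, base_url: str, marker: str = "/articles/") -> List[str]:
    """Unique links containing `marker` (a hypothetical filter), fragments dropped, in page order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    seen: Dict[str, None] = {}
    for link in parser.links:
        if marker in link:
            seen.setdefault(link.split("#", 1)[0], None)
    return list(seen)


if __name__ == "__main__":
    # Made-up sample markup; feed in the real table-of-contents HTML instead.
    sample = """
    <a href="/autods---new/en/articles/1-first">First</a>
    <a href="/autods---new/en/articles/2-second">Second</a>
    <a href="/autods---new/en/articles/1-first#faq">First again</a>
    <a href="/autods---new/en/collections/9-topic">Collection</a>
    """
    print(len(article_links(sample, "https://intercom.help/autods---new/en/")))  # → 2
```

If the count from the real page is close to 49 while /map still returns two, that points at discovery on the Firecrawl side rather than at the page itself.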

Output differs from actual HTML despite scraping rawHtml

It looks like scraping with the rawHtml format returns an HTML body that differs slightly from the actual HTML (as returned by curl or a web browser). The docs say that rawHtml is "with no modifications" (see https://docs.firecrawl.dev/features/scrape#scrape-formats), but that doesn't seem to be the case; it looks modified. For example, scraping https://www.example.com with these options gives different HTML:...
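To pin down exactly where the rawHtml output differs, you can save the curl response and the returned rawHtml string and diff them line by line. A minimal sketch with Python's `difflib`; `html_diff` is a hypothetical helper and the sample strings below are made up, not real Firecrawl output.

```python
import difflib
from typing import List


def html_diff(expected: str, actual: str, context: int = 1) -> List[str]:
    """Unified-diff lines between two HTML strings, compared line by line.

    Trailing whitespace is stripped from each line so indentation and
    line-ending noise does not dominate the diff.
    """
    a = [line.rstrip() for line in expected.splitlines()]
    b = [line.rstrip() for line in actual.splitlines()]
    return list(
        difflib.unified_diff(a, b, fromfile="curl", tofile="rawHtml", lineterm="", n=context)
    )


if __name__ == "__main__":
    # Made-up samples: replace with the saved curl body and the rawHtml field.
    curl_html = "<html>\n<head><title>Example Domain</title></head>\n<body></body>\n</html>"
    raw_html = "<html>\n<head><title>Example Domain</title><style></style></head>\n<body></body>\n</html>"
    for line in html_diff(curl_html, raw_html):
        print(line)
```

An empty list means the two bodies match after whitespace normalization.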

Retry failed pages during batch scrape [FIRECRAWL SDK]

Is there a way to retry failed pages in batch scrape? I'm using the webhook method with batch scrape in the Firecrawl SDK, and only the successfully scraped pages are returned with batchscrape.page. Am I missing something here? I can see in the dashboard that the failed pages are actually being logged, but I don't think they're being returned to the webhook.
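A client-side workaround, assuming your code knows which URLs it submitted: treat any URL missing from the returned results as failed and resubmit just those. `scrape_batch` below is a hypothetical callable standing in for whatever batch-scrape call and webhook collection you use; it is not a Firecrawl SDK function.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical: takes a list of URLs and returns {url: result} for the pages
# that succeeded. Wire it to whatever batch-scrape call you actually use.
ScrapeBatch = Callable[[List[str]], Dict[str, object]]


def scrape_with_retries(
    urls: List[str], scrape_batch: ScrapeBatch, max_attempts: int = 3
) -> Tuple[Dict[str, object], List[str]]:
    """Resubmit only URLs missing from earlier results; return (results, still_failed)."""
    results: Dict[str, object] = {}
    pending = list(dict.fromkeys(urls))  # de-duplicate while keeping order
    for _ in range(max_attempts):
        if not pending:
            break
        results.update(scrape_batch(pending))
        # Anything not returned yet is treated as failed and retried.
        pending = [u for u in pending if u not in results]
    return results, pending
```

Comparing submitted URLs against returned ones only works if the returned URLs are spelled the same way; if they come back normalized (for example with a trailing slash), normalize both sides first.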

Can't retrieve results from a crawl job

client.get_crawl_status(crawl_job_id) has been stuck for 25 minutes. I also tried to download the results from the website UI, but that also seems to be stuck, although the job is marked as completed. JOB_ID = 019acae3-c1ea-712d-a07f-8f0bdd3e127f...
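If the status call itself blocks, one defensive option is to run it in a daemon thread and stop waiting after a deadline, so the rest of your program can retry or log the job ID. A minimal sketch; `call_with_timeout` is a hypothetical helper, and it cannot cancel the underlying request, only stop you from waiting on it.

```python
import queue
import threading
from typing import Any, Callable, Tuple


def call_with_timeout(fn: Callable[[], Any], timeout_s: float) -> Any:
    """Run fn() in a daemon thread; raise TimeoutError if it has not returned in time.

    The worker cannot be killed, so a hung call keeps running in the background
    (as a daemon thread it does not block interpreter exit). Exceptions raised
    by fn() are re-raised in the caller.
    """
    box: "queue.Queue[Tuple[bool, Any]]" = queue.Queue(maxsize=1)

    def worker() -> None:
        try:
            box.put((True, fn()))
        except BaseException as exc:  # forward any error to the caller
            box.put((False, exc))

    threading.Thread(target=worker, daemon=True).start()
    try:
        ok, value = box.get(timeout=timeout_s)
    except queue.Empty:
        raise TimeoutError(f"call did not return within {timeout_s}s") from None
    if ok:
        return value
    raise value
```

With the `client` and `crawl_job_id` from the post, `call_with_timeout(lambda: client.get_crawl_status(crawl_job_id), 60)` would raise `TimeoutError` after a minute instead of hanging indefinitely.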

Question about deactivating YouTube transcript in Python SDK

Hi, is there a way to disable YouTube transcripts when using the Python SDK? I’m having issues with long videos because the transcripts consume too many tokens, and I’m getting this warning: “The extraction content would have used more tokens (198238) than the maximum we allow (120000). The input has been...

Student Program Not Accessible

Hey guys, I just wanted to join the student program to get the 20,000 credits, but my academic email address won't be accepted. Where can I request the credits manually? Thanks for helping 🙂...

[N8N] Custom Body Not Being Applied in n8n Firecrawl Node

Hi, I'm using the Firecrawl node in n8n with "Use Custom Body" enabled, but it appears that my custom body configuration is not being applied at all. Here's my custom body:...

Skip JS rendering and get raw content

Hi, it seems like Firecrawl cloud always runs JS rendering. Is there a way to skip or disable it per request? Also, if an XML/RSS document URL is requested and formats is set to only [ "rawHtml" ], the response content is wrapped in HTML instead of being returned as raw XML. For example: ```...

Guidance for getting screenshots when using /scrape

We're finding that screenshots often have blank spaces, especially with dynamically rendered sites. A few questions: - Any general tips for getting better screenshots via /scrape? - Should we be adding delays (scroll delay, pre-screenshot delay, etc.)?...
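A sketch of a request body that waits, scrolls in steps so lazy-loaded content has a chance to render, and only then captures a full-page screenshot. The field names (`waitFor`, `actions`, and the `wait`/`scroll`/`screenshot` action types with their `milliseconds`, `direction`, and `fullPage` keys) are assumptions based on the v1 `/scrape` documentation; check them against the current API reference. `screenshot_payload` is a hypothetical helper, and the endpoint URL and auth header are deliberately left out.

```python
import json
from typing import Any, Dict, List


def screenshot_payload(url: str, scroll_steps: int = 3, settle_ms: int = 1500) -> Dict[str, Any]:
    """Build a /scrape body: wait, scroll down in steps (waiting after each), then screenshot.

    All field names are assumptions from the v1 /scrape docs.
    """
    actions: List[Dict[str, Any]] = [{"type": "wait", "milliseconds": settle_ms}]
    for _ in range(scroll_steps):
        # Scrolling tends to trigger lazy-loaded images and sections.
        actions.append({"type": "scroll", "direction": "down"})
        actions.append({"type": "wait", "milliseconds": settle_ms})
    actions.append({"type": "screenshot", "fullPage": True})
    return {"url": url, "waitFor": settle_ms, "actions": actions}


if __name__ == "__main__":
    print(json.dumps(screenshot_payload("https://example.com"), indent=2))
```

Longer `settle_ms` values trade speed for a better chance that dynamic content has finished rendering; tune per site.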

Open Lovable

I can't seem to get Vercel to generate an OIDC key any time I'm trying to scrape a website. I keep getting the error "Invalid Vercel OIDC token: Cannot read properties of undefined (reading 'replace')". Can anyone help?...

Is there a way to set the "theme" when scraping?

I'm using the /scrape endpoint for "branding" and the "screenshot" formats. For these, Firecrawl seems to default to the "light" theme when browsing sites. However, I want the browser to be set to the "dark" theme. Is this possible?...

The response does not return all of the products on the page.

By making an API call to this address https://www.svapoebasta.com/sa1_207/, I get a maximum of 30 products. The JSON for the call is as follows: ```json {...

Crawl endpoint missing a page that is both in sitemap and linked

We're crawling a site with allowSubdomains set to true and sitemap set to "include" in the request, but we're still noticing that many pages aren't getting pulled. In the past, when we didn't include the sitemap, this was because a page was orphaned: nothing linked to it, so the recursive crawl couldn't reach it. However, with the sitemap included, shouldn't this be solved?
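To see exactly which sitemap entries the crawl skipped (for example, whether they are mostly on subdomains), you can parse the sitemap and compare it against the URLs the crawl returned. A minimal standard-library sketch; the helper names are hypothetical, and the normalization rules (lower-case scheme and host, no fragment, no trailing slash) are an illustrative choice that may need adjusting for your site.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List
from urllib.parse import urlsplit, urlunsplit

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text: str) -> List[str]:
    """Every <loc> in a sitemap. For a sitemap index these are child sitemaps to fetch next."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


def normalize(url: str) -> str:
    """Comparison key: lower-case scheme/host, drop fragment and trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def missing_from_crawl(sitemap_xml: str, crawled: Iterable[str]) -> List[str]:
    """Sitemap URLs with no normalized match among the crawled URLs."""
    got = {normalize(u) for u in crawled}
    return [u for u in sitemap_urls(sitemap_xml) if normalize(u) not in got]
```

The resulting list is a concrete set of URLs to include when reporting the issue.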

Firecrawl Crawl Issue - n8n Integration

Issue: The Firecrawl crawl via the n8n node (@mendable/n8n-nodes-firecrawl v1) has three problems: Initial API error: The first 2 attempts fail with a scrapeOptions.formats validation error (expected array, received object), even though my config only has the default empty headers. After 2-3 retries, it starts. Rate limiting: I'm hitting "too many requests" on the free tier despite delay: 1000, maxConcurrency: 5, and batching...
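For the rate-limit part, a common client-side pattern is to retry on HTTP 429 with exponential backoff and full jitter, honoring a `Retry-After` value when the server sends one. A minimal sketch; `send` is a hypothetical callable returning `(status, retry_after_seconds, body)`, so wire it to whatever HTTP client you use. It does not change any Firecrawl or n8n setting.

```python
import random
import time
from typing import Callable, Optional, Tuple

# Hypothetical request function: returns (status_code, retry_after_seconds_or_None, body).
Request = Callable[[], Tuple[int, Optional[float], object]]


def with_backoff(
    send: Request,
    max_retries: int = 5,
    base_delay_s: float = 1.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> object:
    """Retry `send` on HTTP 429; return the body of the first non-429 response."""
    for attempt in range(max_retries + 1):
        status, retry_after, body = send()
        if status != 429:
            return body
        if attempt == max_retries:
            break
        if retry_after is not None:
            delay = retry_after  # the server told us how long to wait
        else:
            # Full jitter: random delay up to an exponentially growing cap.
            delay = random.uniform(0, min(max_delay_s, base_delay_s * 2 ** attempt))
        sleep(delay)
    raise RuntimeError(f"still rate-limited after {max_retries} retries")
```

The `sleep` parameter exists so the delays can be observed or skipped in tests.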

JSON result is incomplete

I've been trying to scrape <www.sfrestaurantweek.com/restaurants/>, but I'm unable to get all the data when specifying a JSON schema. If I switch to markdown, it gets all the data. I've just been running it via the playground: 1. Markdown...

Popup in Screenshot

Hi! I’m currently using the Firecrawl node in n8n, and it works great overall. However, I’ve noticed that when scraping some URLs, the extracted data is fine but the screenshots often include pop-ups (like cookie banners or subscription modals). Is there any way to remove or block pop-ups before taking the screenshot?...
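A sketch of a `/scrape` body that turns on ad and cookie-popup blocking and clicks known close buttons before the screenshot. `blockAds` and the `click`/`wait`/`screenshot` action types are assumptions based on the v1 `/scrape` documentation; `clean_screenshot_payload` is a hypothetical helper, and the selectors are site-specific and hypothetical, so only pass ones you have confirmed exist on the page.

```python
import json
from typing import Any, Dict, List, Optional


def clean_screenshot_payload(url: str, dismiss_selectors: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build a /scrape body with popup blocking plus explicit close-button clicks.

    "blockAds" and the action types are assumptions from the v1 /scrape docs.
    """
    actions: List[Dict[str, Any]] = [{"type": "wait", "milliseconds": 1000}]
    for selector in dismiss_selectors or []:
        actions.append({"type": "click", "selector": selector})
        actions.append({"type": "wait", "milliseconds": 500})  # let the modal close
    actions.append({"type": "screenshot"})
    return {"url": url, "blockAds": True, "actions": actions}


if __name__ == "__main__":
    # "#cookie-accept" is a made-up selector for illustration.
    print(json.dumps(clean_screenshot_payload("https://example.com", ["#cookie-accept"]), indent=2))
```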

Is there a way to crawl the next pages?

Here is my current crawl setting: crawl = firecrawl.crawl( url=i, max_discovery_depth=4, scrape_options={"formats": ["html"]},...
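If the pages you are missing are paginated listings, it can help to check whether the site declares its next page with `rel="next"`. A minimal standard-library sketch; `next_page_url` is a hypothetical helper, not a Firecrawl option, and it returns `None` for sites that paginate with plain `?page=2` links and no `rel` attribute.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Records the href of the first <a> or <link> whose rel includes "next"."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if self.href is not None or tag not in ("a", "link"):
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()  # rel may hold several tokens
        if "next" in rel and attr.get("href"):
            self.href = attr["href"]


def next_page_url(html: str, page_url: str) -> Optional[str]:
    """Absolute URL of the declared next page, or None if there is none."""
    finder = NextLinkFinder()
    finder.feed(html)
    return urljoin(page_url, finder.href) if finder.href else None
```

Following the chain page by page yields a list of listing URLs that can then be scraped or used as crawl starting points.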

Custom Web Crawling Result Format

Is there a way to customize the web crawling result? For example, if the output format is markdown, can I skip the image URLs / hyperlinks in the markdown? I also wonder what the difference is between raw HTML and HTML in the result.
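If no built-in option fits, a post-processing step can strip images and inline links from the markdown after the crawl returns. A minimal regex sketch; `strip_images_and_links` is a hypothetical helper, and it handles only inline `![alt](url)` and `[text](url)` syntax, not reference-style links or URLs that contain parentheses.

```python
import re

# ![alt](url) or ![alt](url "title"): removed entirely.
_IMAGE = re.compile(r"!\[[^\]]*\]\([^)]*\)")
# [text](url): replaced by its text. Applied after images so "![...]" is never half-matched.
_LINK = re.compile(r"\[([^\]]*)\]\([^)]*\)")


def strip_images_and_links(markdown: str) -> str:
    """Remove inline images and replace inline links with their link text."""
    text = _IMAGE.sub("", markdown)
    return _LINK.sub(r"\1", text)
```

A linked image such as `[![logo](logo.png)](https://example.com)` is removed entirely, because the image goes first and leaves an empty link behind.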

reCAPTCHA token is required

When I try to scrape using the paid version of Firecrawl, I get the message "reCAPTCHA token is required", but there's no mention of this error message in the docs or on the web. What's happening?...

Version 2 Question

What are the differences between "web" and "news" under the sources key inside the payload?
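A sketch of a v2-style `/search` body that requests both sources in one call, so the two result sets can be compared side by side. The `sources` field and its `"web"`/`"news"`/`"images"` values are assumptions based on the v2 documentation; `search_payload` is a hypothetical helper that only builds the body and rejects names outside that assumed set.

```python
from typing import Any, Dict, Iterable

# Assumed from the v2 docs at the time of writing; verify against the API reference.
KNOWN_SOURCES = {"web", "news", "images"}


def search_payload(query: str, sources: Iterable[str] = ("web",), limit: int = 5) -> Dict[str, Any]:
    """Build a /search body, de-duplicating sources and rejecting unknown names."""
    chosen = list(dict.fromkeys(sources))
    unknown = [s for s in chosen if s not in KNOWN_SOURCES]
    if unknown:
        raise ValueError(f"unknown sources: {unknown}")
    return {"query": query, "sources": chosen, "limit": limit}
```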