Firecrawl



"markdown":"### Your browser version is not supported.

I'm using the self-hosted version and I can scrape/crawl the Firecrawl site, but when I try to scrape this link, https://airtable.com/developers, I get a response saying my browser version isn't supported. I've tried adding custom headers, but I don't know what else may be causing this or how to fix it. Anyone know how to fix this?
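One thing to check is whether the scrape request is actually overriding the User-Agent the target sniffs. A minimal sketch of such a request body, assuming the hosted v1 `/scrape` shape (the UA string here is illustrative; self-hosted instances use the same body against their own URL):

```python
import json

# Sketch of a /v1/scrape request that overrides the User-Agent header,
# assuming the v1 API body shape. Sites that sniff browser versions may
# reject outdated or headless-looking defaults.
API_URL = "https://api.firecrawl.dev/v1/scrape"  # or your self-hosted instance

def build_scrape_payload(url, user_agent):
    """Build a scrape request with a custom User-Agent."""
    return {
        "url": url,
        "headers": {
            "User-Agent": user_agent,  # a modern desktop UA string
        },
    }

payload = build_scrape_payload(
    "https://airtable.com/developers",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
)
print(json.dumps(payload, indent=2))
```

If the site still rejects the request, the detection is likely deeper than headers (e.g. TLS or JS fingerprinting) and headers alone won't fix it.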

Crawl process using n8n cloud

Hi everyone! 👋 I’m trying to figure out how to use crawl on n8n cloud. I managed to set up an HTTP node that can scrape data, but when I try to use the crawl functionality, I keep running into a Bad Request error. Does anyone know a good workflow?...
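A common cause of a Bad Request here is sending a scrape-style body to the crawl route. A minimal sketch of a valid crawl body, assuming the v1 `/crawl` endpoint shape (the URL and limit are illustrative), which you can paste into the n8n HTTP node's JSON body:

```python
import json

# Minimal /v1/crawl request body, assuming the v1 API shape. Note that
# crawl takes "scrapeOptions" nested inside the body, unlike /scrape.
crawl_body = {
    "url": "https://docs.firecrawl.dev",          # start page (illustrative)
    "limit": 10,                                   # cap on pages crawled
    "scrapeOptions": {"formats": ["markdown"]},    # per-page scrape settings
}
print(json.dumps(crawl_body))
```

Crawl is also asynchronous: the first response returns a job ID, and a second HTTP node has to poll the job's status URL for results.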

scraping results only return 1 item from the list of items on the webpage

When I use Firecrawl to scrape a webpage, it returns only the first item in a list. Do I need to tell it to iterate over the list to capture each item as an object? This is what my code looks like: https://gist.github.com/natea/8224320f7cfce8a4d39ed58b300a41bc

Scrape - Lazy Load Images

Hello, using the scrape endpoint, I am capturing a fullPage screenshot. The screenshot has a lot of blank areas where images should have lazy loaded. ...
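One approach worth trying is scrolling the page and waiting before the screenshot is taken, so lazy loaders fire. A sketch of such a request, assuming the v1 action types (`scroll` / `wait`) and the `screenshot@fullPage` format; tune the waits to the page:

```python
# Sketch: force lazy-loaded images to render before a fullPage screenshot
# by scrolling down and pausing. The "actions" steps are an assumption
# based on the hosted v1 API's action types; adjust to your version.
screenshot_body = {
    "url": "https://example.com",                  # illustrative target
    "formats": ["screenshot@fullPage"],
    "actions": [
        {"type": "scroll", "direction": "down"},   # trigger lazy loaders
        {"type": "wait", "milliseconds": 2000},    # let images fetch
        {"type": "scroll", "direction": "down"},
        {"type": "wait", "milliseconds": 2000},
    ],
}
```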

Batch Scrape Failure / Success

Hi all, I have a question regarding batch scrapes. When a payload is returned I get the successes back from the API; is there a way to also see the failures, and the error message explaining why each of those scrapes failed? For the scrapes my organisation is looking at, we ideally need to rescrape if a webpage was down, but we have no way of knowing whether the website is dead or simply timed out from the batch...
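In the absence of per-failure details in the response, one workaround is to diff the URLs you submitted against the URLs present in the completed job's data. A sketch, assuming the v1 response shape where each item carries `metadata.sourceURL`:

```python
# Sketch: recover which URLs failed in a batch scrape by diffing the
# submitted URLs against those present in the job's "data" array.
# Field names ("metadata", "sourceURL") assume the v1 response shape.
def find_failed_urls(requested_urls, job_data):
    scraped = {
        item.get("metadata", {}).get("sourceURL")
        for item in job_data
    }
    return [u for u in requested_urls if u not in scraped]

# Illustrative completed-job payload:
job = {"data": [{"metadata": {"sourceURL": "https://a.example"}}]}
failed = find_failed_urls(
    ["https://a.example", "https://b.example"], job["data"]
)
print(failed)  # only the missing URL remains
```

The missing URLs can then be retried individually with `/scrape`, which does surface a per-request error.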

/map endpoint seems to have an arbitrary limit of ~300 urls

I've attempted to use the /map endpoint for several domains, but I've struggled to get the full list of urls that exist. Even when I use a search parameter, the limit still seems to be about 300. I've tested this both with the Playground and API. For example, if I do the /map endpoint for simplicity.com, it responds with ~300 general urls within that overarching domain (even though I know there are 4000+ urls on the site). If I do the /map endpoint with search term '/butterick/' as a subfolder, I'll continue to get ~300 urls that belong within the '/butterick/' subfolder — however, I know for a fact that there are at least 808 urls (products) within that directory. I've tested with domains including thefoldline.com (302 results) and simplicity.com (308 results) but feel like I'm just burning credits at this point. Given that the domain should be returning A LOT more URLs and the response is consistent regardless of search terms, can someone explain why this is happening?...
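Before concluding the endpoint is truncating, it is worth passing an explicit `limit`, since an unset limit may fall back to a service-side default. A sketch of the request body, assuming the v1 `/map` parameter names:

```python
# Sketch: /v1/map request with an explicit "limit". Parameter names
# assume the v1 docs; if the cap persists at ~300 with a high limit,
# that points at server-side discovery limits rather than the request.
map_body = {
    "url": "https://simplicity.com",
    "search": "/butterick/",   # narrow results to the subfolder
    "limit": 5000,             # request far more than the ~300 observed
}
```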

Crawl results of url/ vs url/index.html are different

I tried the Crawl function on the playground and also in my code using the SDK. I noticed that if I use the url with index.html at the end, it only crawls the single page. However, if I remove the index.html, it crawls the page and all the links in the page. Is this expected? Why are they handled differently? Thanks!...

Help me to scrape all the data from a specific URL

Hello everybody! I am currently scraping all the data, including long positions and short positions, from the https://hyperdash.info/ticker/BTC page with Firecrawl. Of course I can extract data from this page, but not all of it. I only get some of the positions, and the number of scraped positions changes every time. How can I solve this?...

Crawling 101 Doubts

Hey, so I'm trying to crawl a website using FireCrawl, and my aim is to create a RAG system from the crawled data. Is it possible to get the output after the crawl in JSON format, or would it be better to get the response in markdown itself? Which format is better for a RAG system?

Entire scrape failed due to a link

Ran into this error: requests.exceptions.HTTPError: Unexpected error during start batch scrape job: Status code 400. Bad Request - [{'code': 'custom', 'message': 'This website is no longer supported, please reach out to help@firecrawl.com for more info on how to activate it on your account.', 'path': ['urls', 36]}]. First time I'm encountering this one. @Adobe.Flash, @mogery would love to know where this went wrong
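The 400 body does point at the offending entry via the `['urls', 36]` path, so one way to keep the batch alive is to parse that index out, drop the unsupported URL, and resubmit the rest. A sketch, assuming the error message keeps the format quoted above (it may differ across versions):

```python
import re

# Sketch: extract the failing index from a batch-scrape 400 message so
# the unsupported URL can be removed and the job resubmitted. The message
# format mirrors the error quoted above and is an assumption.
def offending_index(error_message):
    m = re.search(r"'path':\s*\['urls',\s*(\d+)\]", error_message)
    return int(m.group(1)) if m else None

msg = ("Bad Request - [{'code': 'custom', 'message': 'This website is no "
       "longer supported...', 'path': ['urls', 36]}]")
idx = offending_index(msg)
print(idx)  # 36
```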

Eric's most recent trend finder repo

[nodemon] app crashed - waiting for file changes before starting...
bdavies@bs-macbook-pro trendFinder % npm run start
> trend-finder@1.0.0 start
> nodemon src/index.ts...

What Software Does Everyone Use With FireCrawl ?

I am a non-techie person running a blog and I need some help. My use case is to crawl multiple webpages on the same topic with FireCrawl and use that information to create one comprehensive article with an LLM. I have been using TypingMind premium version and it used to work about 5 out of 10 times initially. However, since the last one week or so, I always get an error from the LLM like below.
"Claude has rejected your request with error code 429. Here are the possible reasons: 1. You are sending requests too quickly; 2. You have hit your maximum monthly spend (hard limit); 3. The model is currently overloaded. Here is the error message from Claude: This request would exceed your organization’s rate limit of 40,000 input tokens per minute." For context, I have Claude Pro and Paid OPENAI accounts and use these API....

Batch Scrape Delay

Hi there, might be something very obvious that I am missing, but I will ask anyhow. I am trying to play a bit with batch scrape. I use the documentation sample and I am not getting any results with sync or async use. After checking the IDs that I get (they appear normally on the activity log on the dashboard), they seem to be on status "scraping" when I check through the API. It has already been 12 hours since I sent the request. Is that normal? And if so, what does the 'success': True mean anyhow? {'success': True, 'status': 'scraping', 'total': 0, 'completed': 0, 'creditsUsed': 0, 'expiresAt': '2025-01-02T20:13:18.000Z', 'data': [], 'error': None, 'next': 'https://api.firecrawl.dev/v1/batch/.....'}...
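Note that `'success': True` on submission only means the job was accepted, not finished; results come from polling the status URL. A sketch of a polling loop with a deadline, where `fetch_status` stands in for your actual GET against the job's status endpoint (the `next` URL / job ID from the submit response):

```python
import time

# Sketch: poll a batch scrape job until it leaves the "scraping" state or
# a deadline passes. `fetch_status` is a hypothetical callable wrapping
# your GET request against the job's status URL.
def poll_job(fetch_status, timeout_s=300, interval_s=5):
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        status = fetch_status()
        if status.get("status") != "scraping":
            return status          # completed, failed, or cancelled
        time.sleep(interval_s)
    raise TimeoutError("batch scrape still 'scraping' after timeout")
```

A job stuck on "scraping" for 12 hours with `total: 0` suggests it never started; that is worth reporting with the job ID.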

Crawl gets blocked on sites like MSNBC

Is there a workaround for sites like MSNBC being blocked? I get a 403 error when trying to reach them.

cookies after scraping web with Firecrawl

Hello everyone. I’m currently exploring Firecrawl and trying to retrieve cookies from a website after scraping it. Is there a way to get them? With ZenRows, for example, it returns cookies after scraping the site, like this: "Zr-Cookies": "64a4bb21e29e708eed03141ffa38edba=cbf9a029b80c6a705da7fe1e417a5a27; coesessionid=A4798E244C11D9E4A13534A5829D52D8; COE-Tab-ID=b2026dbf-7041-43bb-b1c3-962fe9749d38; COE-Customer=02a82883-6c4e-43ed-b12e-c53bd91d7301","...

scrape's markdown contents different on API vs Playground

When I run a scrape on a URL via the API, I get a significantly larger response size than I do when it is run on the playground. The markdown content can be up to 10 times larger. My API function is just using the standard scrape request config: const response = await fetch(FIRECRAWL_API_URL, {...
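A size difference like this is often down to main-content extraction being enabled in one place but not the other. Setting it explicitly in the API call makes the two comparable; a sketch of the request body, assuming the v1 `onlyMainContent` parameter:

```python
# Sketch: /v1/scrape body with main-content extraction set explicitly.
# If the Playground strips nav/footer boilerplate by default and a bare
# API call does not, the markdown sizes will diverge sharply.
scrape_body = {
    "url": "https://example.com",     # illustrative target
    "formats": ["markdown"],
    "onlyMainContent": True,          # strip nav, footers, boilerplate
}
```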

crawler in playground not extracting the content of all the sub pages

Hey, I'm trying to extract the content of all these subpages and I don't know why it's not working: https://feverup.com/es/barcelona/when/this-weekend

WaitFor doesn't seem to work

I use the self-hosted version of Firecrawl in a project where I scrape over 100 websites. I run into an issue with a specific website that uses a lot of Javascript. About half the time the markdown that Firecrawl outputs doesn't contain the information that I'm after (which is being loaded in by Javascript). The rest of the time it works fine and I get all the expected info in the markdown. I tried using a waitFor time of 10000 to force Firecrawl to wait for everything to load. However, I see that half the time it finishes after 6-8 seconds and the info I want is still not in the markdown. Am I misinterpreting the waitFor parameter? Isn't it a minimum wait time? Hope someone can explain this to me.. Thanks! 😅...
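As I understand it, waitFor is a fixed pre-extraction delay, not a guarantee that specific content has rendered, which would explain runs finishing before the JS-loaded data appears. Waiting on a selector is usually more reliable; a sketch, assuming the v1 `wait` action supports a selector (the selector here is a hypothetical placeholder for whatever element carries your data):

```python
# Sketch: wait for a specific element rather than a fixed delay before
# extracting markdown from a JS-heavy page. The action shape assumes the
# v1 API; "#loaded-content" is a placeholder selector.
scrape_body = {
    "url": "https://example.com/js-heavy-page",    # illustrative target
    "formats": ["markdown"],
    "actions": [
        {"type": "wait", "selector": "#loaded-content"},
    ],
}
```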

The /map api is not working for https://razorpay.com/docs/payments/

Using it on the playground, it returns only the base URL. The odd thing is that 3 days ago I got 150+ URLs from the above base URL with the map API. Help would be appreciated.

How to fetch estimated delivery dates of products on Amazon.com for different zip codes?

To get an overview of product availability on Amazon.com, I want to fetch the delivery date for multiple ASINs (Amazon Product Identifiers). I am therefore trying to update the size (variation) of the product and then iterate through different zip codes, fetching the date for each. My coding skills are limited, but I am trying. What am I missing here with the example? I want to: 1. Open https://www.amazon.com/dp/B0DFH6N4SC 2. Select Size "Large"...
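The steps above can be sketched as an actions sequence, assuming the v1 action types (`click` / `write` / `press` / `wait`). The CSS selectors below are placeholders, not Amazon's real ones, which must be inspected per page; each (size, zip) combination needs its own scrape:

```python
# Sketch: select a size variant, enter a zip code, and wait for the
# delivery estimate to refresh before extraction. Action types assume
# the v1 API; all selectors are hypothetical placeholders.
def delivery_check_body(asin, zip_code):
    return {
        "url": f"https://www.amazon.com/dp/{asin}",
        "formats": ["markdown"],
        "actions": [
            {"type": "click", "selector": "#size-option-large"},   # placeholder
            {"type": "click", "selector": "#delivery-zip-widget"}, # placeholder
            {"type": "write", "text": zip_code},
            {"type": "press", "key": "Enter"},
            {"type": "wait", "milliseconds": 2000},                # let estimate reload
        ],
    }

body = delivery_check_body("B0DFH6N4SC", "10001")
```

Amazon also varies its page layout heavily by session, so selectors that work once may not be stable.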