Firecrawl


How to transfer all data from a Squarespace site to a Lovable site?

How do I transfer all data from a Squarespace site to a Lovable site? Someone said I can use Firecrawl to do this, but I'm lost.
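
No answer is shown in this preview, but "use Firecrawl for this" usually means crawling the Squarespace site to markdown and then bringing that content into Lovable by hand or with Lovable's own tooling. A minimal sketch of the crawl half, assuming the site URL and page limit below (both placeholders):

```python
import os
import time
import requests

API_KEY = os.environ["FIRECRAWL_API_KEY"]
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

# Start a crawl of the whole Squarespace site (placeholder URL).
start = requests.post(
    "https://api.firecrawl.dev/v1/crawl",
    headers=HEADERS,
    json={
        "url": "https://example.squarespace.com",
        "limit": 500,  # assumed page budget; adjust to the real site size
        "scrapeOptions": {"formats": ["markdown"], "onlyMainContent": True},
    },
).json()

# Poll the crawl job until it finishes, then collect the markdown per page.
# (Large crawls may paginate results via a "next" link; ignored in this sketch.)
while True:
    status = requests.get(
        f"https://api.firecrawl.dev/v1/crawl/{start['id']}", headers=HEADERS
    ).json()
    if status.get("status") == "completed":
        break
    time.sleep(5)

pages = {doc["metadata"]["sourceURL"]: doc["markdown"] for doc in status["data"]}
print(f"Fetched {len(pages)} pages")  # this content is what you would move into Lovable
```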

Crawling a website to get all data, including paginated data

There is a website with company data that uses pagination. Initially it displays page 1, which lists 15 company names. I used the "https://api.firecrawl.dev/v1/scrape" endpoint with a prompt to get all company names, but it only gives me the 15 names listed on page 1. How do I fetch the company names on page 2, page 3, and so on? I also tried the "https://api.firecrawl.dev/v1/crawl" endpoint. The response is passed to my webhook URL and I do receive data, but it is the complete page data, and I only want the company names. That endpoint also does not support a prompt, which I could have used to get only company names. Please suggest how to approach this problem.
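
One approach, sketched below under the assumption that the pagination is reachable through predictable `?page=N` URLs (the example.com URLs and page count are hypothetical): enumerate the paginated listing URLs yourself, call /v1/scrape on each page with the same prompt/extraction setup that already works for page 1, and merge the results. The extraction step itself is left as a comment because it depends on the options you already pass.

```python
import os
import requests

API_KEY = os.environ["FIRECRAWL_API_KEY"]
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
BASE = "https://api.firecrawl.dev/v1"

# Hypothetical paginated listing: page 1 is /companies?page=1, page 2 is ?page=2, etc.
page_urls = [f"https://example.com/companies?page={n}" for n in range(1, 11)]

results = []
for url in page_urls:
    resp = requests.post(
        f"{BASE}/scrape",
        headers=HEADERS,
        json={"url": url, "formats": ["markdown"], "onlyMainContent": True},
    ).json()
    page_md = resp["data"]["markdown"]
    # ...run the same prompt/extraction you already use on page 1 against page_md,
    # then collect the company names it returns...
    results.append(page_md)

print(f"Scraped {len(results)} listing pages")
```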

Searching beyond the 100-page limit

If I want to search beyond the limit of 100 results per request, how can I do this? I'm looking to get the top 5,000 results for my search.

Downloading activity log

Hello, if I want to download all of the markdown/JSON outputs of my scraped websites (I have over 120,000), is there a way to do this from my activity log without downloading each one individually?

Error: onlyMainContent parameter not working in scraping

Hey, I'm trying to scrape these URLs, "https://www.northeastern.edu/research" and "https://www.northeastern.edu/graduate", to get the main content from the pages. But I get back messy, polluted data full of HTML content and UI/navigation residue that should be prevented by the onlyMainContent option. What do I need to do to scrape only the main content from these URLs? PS: I'm building a large dataset, so using the extract function would be too time-consuming....
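
For comparison, a minimal scrape request with the parameter spelled in camelCase (onlyMainContent) and an explicit excludeTags list as a fallback; the tag list here is only an assumption about where the residue comes from:

```python
import os
import requests

API_KEY = os.environ["FIRECRAWL_API_KEY"]

resp = requests.post(
    "https://api.firecrawl.dev/v1/scrape",
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json={
        "url": "https://www.northeastern.edu/research",
        "formats": ["markdown"],
        "onlyMainContent": True,  # note the camelCase spelling
        # If boilerplate still leaks through, excluding nav/footer tags explicitly can help.
        "excludeTags": ["nav", "header", "footer", "aside"],
    },
).json()

print(resp["data"]["markdown"][:2000])
```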

Error: The requested URL could not be retrieved

Hey guys, we're getting this error on most of our scraped data: "Markdown": "# ERROR\n\n## The requested URL could not be retrieved\n\n* * \n\nThe following error was encountered while trying to retrieve the URL: http://superdry.com.au/products/vintage-b-boy-cap-eclipse-navy\n\n> Access Denied.\n\nAccess control configuration prevents your request from being allowed at this time. Please contact your service provider if you feel this is incorrect.\n\nYour cache administrator is webmaster%20Chrome%2F137.0.0.0%20Safari%2F537.36%0D%0AAccept-Language%3A%20en-AU,%20en-US%3Bq%3D0.7%3Bq%3D0.9%0D%0AAccept%3A%20text%2Fhtml,application%2Fxhtml+xml,application%2Fxml%3Bq%3D0.9,image%2Favif,image%2Fwebp,image%2Fapng,%2F%3Bq%3D0.8,application%2Fsigned-exchange%3Bv%3Db3%3Bq%3D0.7%0D%0AAccept-Encoding%3A%20gzip,%20deflate%0D%0AHost%3A%20superdry.com.au%0D%0A%0D%0A%0D%0A).\n\n * *" Can you advise on what to do? This is the first time I've seen this error. Thank you...
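
The quoted markdown is the target site's own "Access Denied" page rather than a Firecrawl error, so the request is being blocked upstream. A hedged sketch of one thing to try, assuming the stealth proxy option described in the Firecrawl proxies documentation applies to this plan; whether it actually gets past the block depends on the site:

```python
import os
import requests

API_KEY = os.environ["FIRECRAWL_API_KEY"]

# Retry the same URL with the stealth proxy tier
# (see https://docs.firecrawl.dev/features/proxies).
resp = requests.post(
    "https://api.firecrawl.dev/v1/scrape",
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json={
        "url": "http://superdry.com.au/products/vintage-b-boy-cap-eclipse-navy",
        "formats": ["markdown"],
        "proxy": "stealth",
    },
).json()

print(resp.get("data", {}).get("markdown", resp))
```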

Excluding SVG data URIs from the markdown output

I'm getting very long data:image/svg+xml data URIs in the markdown outputs from the bulk scrape endpoint. I'm trying to exclude all media using the parameters below: `{ "removeBase64Images": true, ...`
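
A hedged sketch of one way to tighten this, assuming the inline SVGs come from `<img>`/`<svg>` elements: keep removeBase64Images but also exclude image-bearing tags, since data:image/svg+xml URIs are URL-encoded XML rather than base64 and may not be caught by that flag alone. The URL and excludeTags list are assumptions; the same options should carry over to the batch scrape endpoint.

```python
import os
import requests

API_KEY = os.environ["FIRECRAWL_API_KEY"]

resp = requests.post(
    "https://api.firecrawl.dev/v1/scrape",
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json={
        "url": "https://example.com",  # placeholder URL
        "formats": ["markdown"],
        "removeBase64Images": True,
        # Drop image-bearing elements before markdown conversion.
        "excludeTags": ["img", "svg", "picture", "source"],
    },
).json()

print(resp["data"]["markdown"][:1000])
```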

Proxy issue

May I ask about your proxies? 1. Do you only support the 11 countries listed in this doc? https://docs.firecrawl.dev/features/proxies 2. And can I only use the stealth mode proxy in BR and US?...
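
For reference, a sketch of how the two knobs in question are usually combined on a scrape request: a geographic location hint plus the proxy tier. Which country codes and proxy tiers are actually supported is exactly what the linked docs page should be treated as authoritative for; the URL below is a placeholder.

```python
import os
import requests

API_KEY = os.environ["FIRECRAWL_API_KEY"]

resp = requests.post(
    "https://api.firecrawl.dev/v1/scrape",
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json={
        "url": "https://example.com",       # placeholder URL
        "formats": ["markdown"],
        "location": {"country": "BR"},      # requested geo, per the proxies/location docs
        "proxy": "stealth",                 # proxy tier, if available for that geo
    },
).json()

print(resp.get("data", {}).get("metadata", {}))
```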

Can't access dashboard

After signing in, when I click Dashboard the page quickly redirects through the following URLs: 1. firecrawl.dev/app 2. firecrawl.dev/signin 3. firecrawl.dev/password_signin...

Scraping data from linked pages

Hi everyone, I'm new to Firecrawl and trying to get used to the Crawl functionality. There's a page I'm interested in (https://www.ussportscamps.com/soccer/nike) that lists a number of soccer camps with hyperlinks. Ideally I want to open each link and retrieve information on each camp, like address, cost, etc. Here is an example camp: https://www.ussportscamps.com/soccer/nike/nike-soccer-camp-pima-county-surf Is this possible?...
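
Yes, this is a typical map-then-scrape pattern: discover the linked camp URLs first, then scrape each one. A minimal sketch, assuming the camp pages all live under /soccer/nike/ (the URL filter and the sample size are assumptions):

```python
import os
import requests

API_KEY = os.environ["FIRECRAWL_API_KEY"]
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
BASE = "https://api.firecrawl.dev/v1"

# Step 1: discover the individual camp URLs under the listing page.
mapped = requests.post(
    f"{BASE}/map",
    headers=HEADERS,
    json={"url": "https://www.ussportscamps.com/soccer/nike", "limit": 200},
).json()
camp_urls = [u for u in mapped["links"] if "/soccer/nike/" in u]

# Step 2: scrape each camp page; pulling address/cost out of the markdown
# (or via a structured-extraction prompt) is left as a follow-up step.
camps = []
for url in camp_urls[:10]:  # small sample for the sketch
    page = requests.post(
        f"{BASE}/scrape",
        headers=HEADERS,
        json={"url": url, "formats": ["markdown"], "onlyMainContent": True},
    ).json()
    camps.append({"url": url, "markdown": page["data"]["markdown"]})

print(f"Scraped {len(camps)} camp pages")
```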

n8n Integration - Extract + FIRE-1

I am trying to build a DB of available rows on a site that is behind auth; I pass the auth through an HTTPS request, so that's not the issue. I'm using a community node in n8n (n8n-nodes-firecrawl-scraper). The issue is that when I use the same prompt in n8n to perform the extract, the results only come from the first page of the table, whereas when I execute the extract in the playground (with the agent enabled, which might be the difference) it extracts the entire table successfully. Additionally the...
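
If the difference really is the agent, one workaround is to bypass the community node for this step and call the extract endpoint directly from an n8n HTTP Request node. A hedged sketch of the request the playground run likely corresponds to, with a placeholder URL and prompt; the agent parameter is taken from Firecrawl's FIRE-1 documentation and may not be exposed by the community node at all:

```python
import os
import time
import requests

API_KEY = os.environ["FIRECRAWL_API_KEY"]
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
BASE = "https://api.firecrawl.dev/v1"

# Kick off an extract job with the FIRE-1 agent enabled.
job = requests.post(
    f"{BASE}/extract",
    headers=HEADERS,
    json={
        "urls": ["https://example.com/table-page"],  # placeholder URL
        "prompt": "Extract every row of the table, across all pages.",
        "agent": {"model": "FIRE-1"},
    },
).json()

# Extract runs asynchronously: poll the job until it completes.
while True:
    status = requests.get(f"{BASE}/extract/{job['id']}", headers=HEADERS).json()
    if status.get("status") == "completed":
        break
    time.sleep(5)

print(status["data"])
```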

self-hosted /scrape doesn't populate the JSON schema

Running batch/scrape on a local setup using Docker, I get all the metadata but no actual content. See an example of my results: `[{ ...`
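
One way to narrow this down, sketched against the default self-hosted port (an assumption; adjust the base URL and auth to your setup): run a plain /v1/scrape on one of the same URLs and check whether markdown/html come back empty there too. If they do, the problem is in the scraping/rendering layer of the self-hosted stack rather than in batch/scrape specifically.

```python
import requests

resp = requests.post(
    "http://localhost:3002/v1/scrape",  # assumed default port for the self-hosted API
    headers={"Content-Type": "application/json"},  # add an Authorization header if your setup requires one
    json={"url": "https://example.com", "formats": ["markdown", "html"]},
).json()

data = resp.get("data", {})
print("markdown length:", len(data.get("markdown") or ""))
print("html length:", len(data.get("html") or ""))
print("metadata:", data.get("metadata"))
```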

`/map`: sitemapOnly is FALSE and still only sitemap.xml is used

I'm running the same /map request both on Firecrawl Cloud and locally (self-hosted Docker). Cloud returns 149 links, while my local setup returns 117 links, exactly the number of links in the website's sitemap.xml. This is my curl payload: ...
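
Since the payload is truncated above, here is a hedged A/B test against the local instance (base URL and target URL are placeholders): setting ignoreSitemap skips sitemap.xml entirely, so if the self-hosted map is only ever reading the sitemap, the link count from this request should change dramatically.

```python
import requests

payload = {
    "url": "https://example.com",  # placeholder for the site in question
    "sitemapOnly": False,
    "ignoreSitemap": True,  # force link discovery without sitemap.xml
    "limit": 5000,
}
resp = requests.post(
    "http://localhost:3002/v1/map",  # assumed default self-hosted port
    headers={"Content-Type": "application/json"},
    json=payload,
).json()

print(len(resp.get("links", [])), "links")
```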

Getting an error when using extract while self-hosting

Getting an error when self-hosting: aiohttp.client_exceptions.ClientError: Failed to parse Firecrawl error response as JSON. Status code: 404...

<style> replaced with <link>

Hey all... I'm having an issue with <style> tags being replaced with <link> tags. I'm using formats: ['rawHtml'] and onlyMainContent: false, and I've tried a whole bunch of other stuff, but no joy... This is what I'm getting in place of the <style> tags in the output: <link rel="stylesheet" type="text/css" href="cid:css-6f98369c-94e5-4096-a6f6-755b9e5c5aff@mhtml.blink">...

Error when using Firecrawl MCP

Hey guys! How are you? I am Clara from Darwin's product team. I am trying the Firecrawl MCP and I am running into an error that maybe you could help me with. When I scrape the web, the output of the MCP says it was unable to extract information (image attached), but when I check the activity logs in Firecrawl I see that the URL was scraped correctly. Can you help me understand what the issue is, please?

Source URL is not allowed by includePaths/excludePaths

I am trying to crawl the website below, but I noticed I get the same error on other websites whenever I try to crawl a URL with many path segments. The error is "Source URL is not allowed by includePaths/excludePaths". https://www.dhl.com/se-sv/home/frakt/hjalpcenter-for-europeisk-vag-och-jarnvag/anvandbar-information-och-hamtningsbara-filer.html I get the same error with this URL:...
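
For reference, a hedged sketch of a crawl request where includePaths is written to cover the deep start path itself, since this error typically fires when the source URL fails the include/exclude filters. The regex pattern is an assumption, and the quickest sanity check is to start the crawl with no includePaths at all.

```python
import os
import requests

API_KEY = os.environ["FIRECRAWL_API_KEY"]

resp = requests.post(
    "https://api.firecrawl.dev/v1/crawl",
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json={
        "url": "https://www.dhl.com/se-sv/home/frakt/hjalpcenter-for-europeisk-vag-och-jarnvag/anvandbar-information-och-hamtningsbara-filer.html",
        "limit": 50,
        # Pattern chosen so the start URL's own path matches the filter.
        "includePaths": ["se-sv/home/frakt/.*"],
    },
).json()

print(resp)
```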

I am not able to create an account.

Trying to create an account and I keep getting an error message.