Firecrawl

F

Firecrawl

Join the community to ask questions about Firecrawl and get answers from other members.

Join

❓┃community-help

💬┃general

🛠️┃self-hosting

Help Needed with Scraping Website Behind Anti-Bot Protection!

I've been trying to scrape this website: https://de.pandora.net/de/charms-armbander/charms/charms-mit-anhanger/bicolor-fahrrad-mit-drehenden-radern-charm-anhanger/763354C01.html The script works perfectly on my local playground setup, but when I move it to a self-hosted environment, it fails to scrape. I’ve also added a proxy server to bypass any unauthorized access issues, but I still can't get it to work. Here's the error message I'm encountering:...

Incomplete scraping

I'm trying to capture links from a page's table of contents. I can use pyppeteer with chromium and it returns the entire page contents, but when I use the scrape and crawl, it doesn't. I've tried both the firecrawl API and the self-hosted, neither one provides a complete response. it doesn't seem like playwright is even being used, but the logging doesn't work very well once everything is running, so I'm having a hard time troubleshooting. I know my playwright service is up, but it doesn't even seem to be used. I have the waitFor page option enabled in my scrape query, but it still doesn't seem to use playwright. https://docs.venafi.com/Docs/currentSDK/TopNav/Content/SDK/WebSDK/r-SDK-CertificatesModuleProgramming-Interfaces.php?tocpath=Web%20SDK%20REST%7CCertificate%20endpoints%20for%20TLS%7CCertificates%20API%7C_____0...

Error: Supabase Client Not Configured in Self-Hosted Firecrawl

I'm currently self-hosting Firecrawl and running into an issue with configuring the Supabase client. I'm using the example code from the documentation: import uuid from firecrawl.firecrawl import FirecrawlApp ...

Support dynamic content?

Hi, is it possible in the moment? Thanks.

(YC W24) Inconsistent crawl results between prod and local

I'm trying to test out crawling https://fanfiction.net with a script locally before I switch to the Firecrawl API, however I'm getting different results. Locally I am just running Firecrawl with the docker setup: docker compose up from the SELF_HOST.md instructions and default .env variables with no DB. I am initiating the crawl sequence with the following command and a 200 response is returned with the jobId:...

Scraping openai . com

Can't seem to scrape any data from openai urls for example: https://platform.openai.com/docs/models any work arounds?...

SSL url3 lib

I'm using the following example to test the API but I get the shown error. Any ideas? https://docs.firecrawl.dev/api-reference/endpoint/crawl...
No description

Unauthorized - even after adding the valid token

Has anyone else experienced this, I am getting a 401 error even after adding a valid token. The same token was working a while ago.
No description

How can I use LLM Extract?

Any docs I can look into?

is FireEngine live?

Just saw the blog post about the new system. Is this live today or has it already been deployed?

api updates

hi, we are developing a system with your api and we want auto update our requests to the api when the documentation is updated, is it possible?

Cloudflare blocking scrape

I'm running into an issue with the /scrape endpoint. It's been running into Cloudflare verification. Can Firecrawl get passed this?
No description

Can I Retrieve Data from an SPA Website?

I am trying to retrieve information from a website built with SPA. However, my attempts have resulted in empty responses. Does your current service support data extraction from SPA websites? If so, could you provide guidance on how to achieve this?

Support

Shit, i thought this meant 100,000 scrapes = 100,000 front page websites scraped 🤦‍♂️ 🤦‍♂️ 🤦‍♂️ I'm using an http api integration to Clay. Any solution to this? Was just planning on uploading a video tutorial on firecrawl on my linkedin https://www.linkedin.com/in/codycarnes, but if it cant scrape one page only, big waste of creds on my end imo...
No description

Extraction of data from PDF files

Hi team @Adobe.Flash @rafaelmiller , Do we have the data extraction feature from PDF files (.pdf) using the FireCrawl Service in place at the moment?...

Crawl job failed or was stopped. Status: Stuck

Is anyone seeing this now? File "C:\Python312\Lib\site-packages\firecrawl\firecrawl.py", line 288, in _monitor_job_status raise Exception(f'Crawl job failed or was stopped. Status: {status_data["status"]}') Exception: Crawl job failed or was stopped. Status: stuck...

Can i llm_extract from already scraped markdown seperately?

Is there a a way to seperately use the llm_extract method?

Use Azure Openai for llm extraction

Hi I am using a local instance of firecrawl and tried the extraction function using an Openai model and it worked. I tried to use azure openai by setting up the base url and api key of my azure openai function but I am getting an error(Internal Server Error: Failed to scrape URL. Invalid status code: undefined). How do I resolve this? I want to use the azure instance as my employer pays for it.

Scraping yelp.com

Hey, just wondering how proficient is Firecrawl at scraping yelp.com? I’m developing a product that relies on scraping businesses profile details from ‘yep.com/biz/{x}’ pages. Firecrawl works when I test it on a few pages, but I’m worried about it getting blocked by Captcha when deployed at scale. I’m not so worried about being IP blocked, as it seems that Firecrawl handles this....