Firecrawl


How to fetch estimated delivery dates of products on Amazon.com for different zip codes?

To get an overview of product availability on Amazon.com, I want to fetch the delivery date for multiple ASINs (Amazon Standard Identification Numbers). To do that, I am trying to update the size (variation) of the product and then iterate through different zip codes, fetching the date for each. My coding skills are limited, but I am trying. What am I missing here with the example? I want to: 1. Open https://www.amazon.com/dp/B0DFH6N4SC 2. Select Size "Large"...

I am using the Docker-based (self-hosted) setup, and it's not working for exclusion or inclusion tags:

```
{
  "url": "https://google.com",
  "limit": 50,
  "formats": ["html"],
  "onlyMainContent": true,
  ...
```
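One thing worth checking (this is an assumption about the v1 API, so verify it against your self-hosted version): in a /crawl request, per-page scrape options such as `formats`, `onlyMainContent`, `includeTags`, and `excludeTags` are nested under `scrapeOptions` rather than placed at the top level. A minimal sketch of that shape:

```python
import json

# Sketch of a v1 /crawl payload, assuming the v1 request shape where
# per-page scrape options live under "scrapeOptions". The includeTags
# and excludeTags values here are CSS selectors.
payload = {
    "url": "https://google.com",
    "limit": 50,
    "scrapeOptions": {
        "formats": ["html"],
        "onlyMainContent": True,
        "includeTags": ["article", "main"],  # elements to keep
        "excludeTags": ["nav", "footer"],    # elements to drop
    },
}

print(json.dumps(payload, indent=2))
```

If the tags are left at the top level of a crawl request, the server may silently ignore them, which would look exactly like "not working".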

Has anyone successfully scraped and aggregated US realtor emails with FireCrawl?

I need help aggregating high-quality realtor emails across the US and Canada to start an email campaign - please help if you have any background in this and can share lessons!

'Failed to load event' and 2 parallel crawls

See attached image. 1/ I cannot load jobs in the activity logs. 2/ It seems it somehow started two crawls almost at the same time, spending a lot of credits. How do I avoid this? I'm not sure what happened....

How do I set the number of pages I want crawled? For example, only 10 pages on this run?

Is there a way I can set the number of pages crawled from a root URL?

Instant "Request timed out"

I just signed up, generated a key, and tested my query in the playground, where it works, but when I try to call via cURL or Postman, I get an instant 408 "Request timed out" response. If I intentionally mangle the POST body, it does return a response about the malformed body or invalid parameters. I've tried from multiple connections (via VPN and without). ...
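As a point of comparison (not a diagnosis), here is a minimal sketch of the same call from Python, assuming the v1 /scrape endpoint. Odd failures from cURL/Postman are sometimes down to a missing `Authorization` or `Content-Type` header, so both are spelled out explicitly; the key value is a placeholder:

```python
import json

# "fc-YOUR-KEY" is a placeholder; the endpoint and body shape assume
# the v1 hosted API. Both headers are needed for a JSON POST.
headers = {
    "Authorization": "Bearer fc-YOUR-KEY",
    "Content-Type": "application/json",
}
body = {"url": "https://example.com", "formats": ["markdown"]}

# Actual request, commented out to keep the sketch self-contained:
# import requests
# resp = requests.post("https://api.firecrawl.dev/v1/scrape",
#                      headers=headers, data=json.dumps(body), timeout=120)

print(json.dumps(body))
```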

Iterating over batch result pages (python)

New to the developer world, so excuse me if this is a basic question I should know already. When I call the batch scrape endpoint (synchronously), I want to write a loop that iterates over each page and builds a structured response from the 'extract' data of each page. Does anyone have advice on the best way to do this? I use batch scrape with the extract format, using a prompt instead of a schema, since the websites I'm scraping don't all share the same page structure....
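One way to approach the loop above, assuming the synchronous batch-scrape response is a dict with a `data` list whose entries carry an `extract` field (that shape is an assumption based on the v1 API, so adjust to what your SDK actually returns):

```python
# Collect the "extract" payload from each page of a batch-scrape result,
# skipping pages where extraction returned nothing.
def collect_extracts(batch_result):
    rows = []
    for page in batch_result.get("data", []):
        extract = page.get("extract")
        if extract is not None:
            rows.append({
                "url": page.get("metadata", {}).get("sourceURL"),
                "extract": extract,
            })
    return rows

# Example with a mocked response:
mock = {"data": [
    {"metadata": {"sourceURL": "https://a.example"}, "extract": {"title": "A"}},
    {"metadata": {"sourceURL": "https://b.example"}},  # no extract returned
]}
print(collect_extracts(mock))
# → [{'url': 'https://a.example', 'extract': {'title': 'A'}}]
```

Since a prompt (rather than a schema) is used, the extracted dicts may have different keys per page, so keeping them alongside their source URL makes downstream normalization easier.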

Why is the worker not stable?

I deployed Firecrawl on my VPS. It works and responds very fast under light traffic, but when a few jobs come in at once, the workers all break down, and I don't know how to fix it.

Unable to crawl more than the base page of https://www.trustpilot.com/review/huel.com

The page has pagination at the bottom with direct links that tack "?page=2/3/4/etc" onto the end of the URL. Shouldn't Firecrawl pick up on that? My setup: I've tried 'includePaths': ['page='] and similar variations with no luck. What am I missing?...
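One possible explanation (an assumption about how the path filters work, not confirmed behavior): if `includePaths` patterns are matched against the URL *path* only, a pattern containing `page=` can never match, because `?page=2` lives in the query string, not the path. A quick illustration of that distinction:

```python
import re
from urllib.parse import urlparse

# The paginated Trustpilot URLs differ only in the query string.
url = "https://www.trustpilot.com/review/huel.com?page=2"
path = urlparse(url).path

print(path)                                        # → /review/huel.com
print(bool(re.search(r"review/huel\.com", path)))  # → True: matches the path
print(bool(re.search(r"page=", path)))             # → False: query string is not in the path
```

If that is indeed the behavior, a pattern like `review/huel\.com` should match every paginated listing URL, since they all share the same path.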

Issues with /map

Hi. I have some issues with /map. 1) These URLs do not work - I assume the crawler is blocked, but I am not sure: - https://www.card-corner.de/ - https://www.cardbuddys.de/...

No markdown content is returned for https://api.sharefile.com

I get back HTML content, but the markdown field is completely empty. This happens both when I scrape and when I crawl.

Webhook is intermittent

I've been dabbling with the webhook for scrape and I've noticed that it's very intermittent/flaky. I don't get all the events through (started/page/completed), and when I do, I usually only get page events. At the moment I'm only batching one URL, with plans to expand to more, but I really need this functionality to be consistent/reliable. Is there an issue with webhooks currently? This is how I'm calling it:

```
const result = await fetch("https://api.firecrawl.dev/v1/batch/scrape", {
...
```
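For reference, a minimal sketch of the request body with a webhook attached, assuming the v1 batch-scrape API where `urls`, `formats`, and `webhook` are top-level fields (the receiver URL below is a made-up placeholder):

```python
import json

# Hypothetical webhook receiver; the request shape assumes the v1
# /batch/scrape endpoint, where scrape options sit at the top level.
payload = {
    "urls": ["https://example.com"],
    "formats": ["markdown"],
    "webhook": "https://my-service.example/firecrawl-hook",
}

print(json.dumps(payload))
```

When debugging intermittent delivery, it also helps to log every request your receiver gets (including retries), so you can tell whether events are never sent or sent but dropped on your side.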

Local Environment Max Retries Error

I am trying to self-host Firecrawl, but am running into a weird error. Here is my code:

```
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="hi", api_url="http://localhost:3002")

# Crawl a website:
...
```

Missing all child pages when crawling

Hi! Any ideas why, when crawling e.g. https://www.mt.com/us/en/home/products/Laboratory_Analytics_Browse/Product_Family_Browse_titrators_main.html, I don't get any child pages? I run into this problem in both the Playground and the Python API. When I toggle backward links I get more pages, but that goes back to the top domain, which I'd like to avoid. A page like https://www.mt.com/us/en/home/products/Laboratory_Analytics_Browse/Product_Family_Browse_titrators_main/karl-fischer-titrators.html seems to be a legitimate and easy-to-follow child page. Any tips on what I might be missing? I am on the managed (not self-hosted) plan....

Unable to crawl https://developers.tryprive.com/

It's only returning 1 webpage, when there are clearly many more...

I have 3,000 credits. Why can't I crawl more websites?

When I use Firecrawl to crawl more content, it throws an error: data: { error: 'Insufficient credits. You may be requesting with a higher limit than the amount of credits you have left. If not, upgrade your plan at https://firecrawl.dev/pricing or contact us at help@firecrawl.com' } However, I do have enough credits....