Skip JS rendering and get raw content
Hi, it seems like Firecrawl cloud always runs JS rendering. Is there a way to skip/disable it per request?
Also, if an XML/RSS document URL is requested and formats is set to only [ "rawHtml" ], the response content is enclosed in HTML instead of being returned as raw XML. For example:
Is there a way to return raw XML instead of HTML, and if not what's the best way to extract the XML (decoded) from the returned HTML?
Thank you.
9 Replies
You can try fastMode, which can skip JS rendering when it doesn't execute JS - https://docs.firecrawl.dev/features/fast-scraping#faster-scraping
Otherwise, you'll have to parse the HTML and then extract the text content.
Hey! I also noticed another way to do the same: set the waitFor parameter to 0.
e.g.:
reference: https://docs.firecrawl.dev/advanced-scraping-guide#wait-for-page-readiness-waitfor
Thanks Gaurav, it appears that fastMode uses a cache of the last successful response, so it may already contain JS-rendered content. Anyhow, is the fastMode cache global (shared among all users) or isolated (per-account)?
If you want Firecrawl to do a fresh scrape, just pass maxAge=0
Our cache is global but it doesn't have anything to do with fastMode
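For anyone following along, here is a minimal sketch of what a fresh-scrape request could look like in Python. The parameter names (formats, rawHtml, maxAge) come from this thread; the endpoint URL, the Bearer auth header, and the helper name build_scrape_request are assumptions for illustration, so check the API reference before relying on them.

```python
import json
import urllib.request

# Assumption: the endpoint path is not stated in this thread; verify it in the API reference.
SCRAPE_ENDPOINT = "https://api.firecrawl.dev/v2/scrape"


def build_scrape_request(url, api_key, endpoint=SCRAPE_ENDPOINT):
    """Build (but do not send) a scrape request that bypasses the cache.

    Hypothetical helper: it only shows how the options discussed above
    could be combined in one JSON body.
    """
    body = {
        "url": url,
        "formats": ["rawHtml"],  # ask only for the raw HTML
        "maxAge": 0,             # 0 = do not accept a cached copy, scrape fresh
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: Bearer-token auth
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually send it, pass the returned object to urllib.request.urlopen and decode the JSON response.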
Hi @ash4cord! Firecrawl uses rendering to handle dynamic sites, and there isn’t a documented option to disable rendering. For a fully static site, you can just fetch the page yourself with a simple HTTP request.
If the XML is wrapped inside HTML, parse the HTML first and extract the XML from the right element. In Node.js, Cheerio is a good choice for that.
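The same idea can be sketched in Python with only the standard library. This assumes the wrapper puts the entity-escaped XML inside a <pre> element, which is a guess, since the actual wrapper markup isn't shown in this thread; adjust the tag to whatever element your response really uses. The names PreTextExtractor and extract_wrapped_xml are made up for this example.

```python
from html.parser import HTMLParser


class PreTextExtractor(HTMLParser):
    """Collects the entity-decoded text found inside <pre> elements."""

    def __init__(self):
        # convert_charrefs=True turns &lt; / &amp; etc. back into characters
        super().__init__(convert_charrefs=True)
        self._depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        # Keep text only while we are inside a <pre> element
        if self._depth > 0:
            self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def extract_wrapped_xml(raw_html: str) -> str:
    """Return the decoded XML text from an HTML wrapper (assumed <pre>)."""
    parser = PreTextExtractor()
    parser.feed(raw_html)
    parser.close()
    return parser.text().strip()
```

The returned string can then be handed to an XML parser such as xml.etree.ElementTree.fromstring.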
Thanks, looks like the default value of waitFor is 0 (see https://docs.firecrawl.dev/api-reference/endpoint/scrape#body-wait-for), so it won't wait anyway, which is fine. But if the default wait time is zero, why does the website say "Firecrawl intelligently waits for content to load" (see https://www.firecrawl.dev/#:~:text=Firecrawl%20intelligently%20waits%20for%20content%20to%20load)? Ideally it should wait for an event like DOM content loaded and network-idle (like in Puppeteer). Any clarification on this?
Oh, waitFor just adds extra wait time. We already have a "smart wait" feature on our side.
Thanks for clarifying. Maybe the docs can mention that waitFor adds extra wait time on top of "smart wait" time.
Good idea! Done: https://github.com/firecrawl/firecrawl-docs/pull/168