Skip JS rendering and get raw content

Hi, it seems like Firecrawl cloud always runs JS rendering. Is there a way to skip or disable it per request? Also, if an XML/RSS document URL is requested and formats is set to only [ "rawHtml" ], the response content is wrapped in HTML instead of being returned as raw XML. For example:
{
"rawHtml": "<html><head><meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\"><meta name=\"color-scheme\" content=\"light dark\"></head><body><pre style=\"word-wrap: break-word; white-space: pre-wrap;\">&lt;?xml version=\"1.0\" encoding=\"UTF-8\"?&gt;\n</pre></body></html>"
}
Is there a way to return raw XML instead of HTML? If not, what's the best way to extract the decoded XML from the returned HTML? Thank you.
9 Replies
Gaurav Chadha (2w ago)
You can try fastMode, which can skip JS rendering for pages that don't execute JS (see https://docs.firecrawl.dev/features/fast-scraping#faster-scraping). Otherwise, you'll have to parse the HTML and then extract the text content.
jit (2w ago)
Hey! I also noticed another way to do the same: setting the waitFor parameter to 0, e.g.:
const scrapeResult = await firecrawl.scrape('https://example.com', {
  formats: ['html'],
  waitFor: 0
});
reference: https://docs.firecrawl.dev/advanced-scraping-guide#wait-for-page-readiness-waitfor
ash4cord (OP, 2w ago)
Thanks Gaurav. It appears that fastMode uses a cache of the last successful response, so it may already contain JS-rendered content. In any case, is the fastMode cache global (shared among all users) or isolated (per account)?
Gaurav Chadha (2w ago)
If you want Firecrawl to do a fresh scrape, just pass maxAge=0. Our cache is global, but it doesn't have anything to do with fastMode.
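For readers who call the REST API directly, a fresh scrape with maxAge set to 0 might look roughly like the sketch below. The endpoint path, the Bearer header, and the helper name buildFreshScrapeRequest are assumptions that do not appear in this thread, so check them against the current API reference before relying on them.

```javascript
// Assumed endpoint; verify against the current Firecrawl API reference.
const SCRAPE_ENDPOINT = 'https://api.firecrawl.dev/v2/scrape';

// Hypothetical helper: builds the fetch() arguments for a scrape that
// asks for rawHtml only and refuses a cached copy.
function buildFreshScrapeRequest(url, apiKey) {
  return {
    endpoint: SCRAPE_ENDPOINT,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        url,
        formats: ['rawHtml'],
        maxAge: 0, // 0 requests a fresh scrape, as described in the reply above
      }),
    },
  };
}

// Usage (needs Node 18+ for the global fetch, plus a real API key):
// const { endpoint, init } = buildFreshScrapeRequest(
//   'https://example.com/feed.xml',
//   process.env.FIRECRAWL_API_KEY
// );
// const result = await (await fetch(endpoint, init)).json();
```

Building the request separately from sending it keeps the payload easy to inspect or log before any credits are spent.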
Himanshu (2w ago)
Hi @ash4cord! Firecrawl uses rendering to handle dynamic sites, and there isn’t a documented option to disable rendering. For a fully static site, you can just fetch the page yourself with a simple HTTP request. If the XML is wrapped inside HTML, parse the HTML first and extract the XML from the right element. In Node.js, Cheerio is a good choice for that.
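A minimal sketch of that extraction, assuming the wrapper always has the `<body><pre>…</pre></body>` shape shown in the question. It swaps Cheerio for a plain regex to stay dependency-free, and the function names extractXmlFromRawHtml and decodeEntities are made up for illustration. Entities are decoded in a single pass so that a sequence like `&amp;lt;` in the feed ends up as `&lt;` rather than `<`.

```javascript
// Named entities handled here; anything else is left untouched.
const NAMED_ENTITIES = new Map([
  ['lt', '<'],
  ['gt', '>'],
  ['amp', '&'],
  ['quot', '"'],
  ['apos', "'"],
]);

// Decodes named and numeric entities in one pass (no double decoding).
function decodeEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#\d+|[a-z]+);/gi, (match, body) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x' || body[1] === 'X';
      const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range references as they are instead of throwing.
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
    }
    const named = NAMED_ENTITIES.get(body.toLowerCase());
    return named === undefined ? match : named;
  });
}

// Returns the decoded XML if rawHtml is the <body><pre>...</pre></body>
// wrapper; otherwise returns the input unchanged.
function extractXmlFromRawHtml(rawHtml) {
  const wrapper = /<body[^>]*>\s*<pre\b[^>]*>([\s\S]*?)<\/pre>\s*<\/body>/i;
  const match = wrapper.exec(rawHtml);
  return match ? decodeEntities(match[1]) : rawHtml;
}

// Usage: const xml = extractXmlFromRawHtml(scrapeResult.rawHtml);
```

For a fully static feed, fetching the URL yourself (for example with `fetch(url).then((r) => r.text())` in Node 18+) returns the XML with no wrapper, so this step is not needed.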
ash4cord (OP, 2w ago)
Thanks. It looks like the default value of waitFor is 0 (see https://docs.firecrawl.dev/api-reference/endpoint/scrape#body-wait-for), so it won't wait anyway, which is fine. But if the default wait time is zero, why does the website say "Firecrawl intelligently waits for content to load" (see https://www.firecrawl.dev/#:~:text=Firecrawl%20intelligently%20waits%20for%20content%20to%20load)? Ideally it should wait for an event like DOM content loaded and network idle (like in Puppeteer). Any clarification on this?
Gaurav Chadha (2w ago)
Oh, waitFor just adds extra wait time. We already have a "smart wait" feature on our side.
ash4cord (OP, 2w ago)
Thanks for clarifying. Maybe the docs can mention that waitFor adds extra wait time on top of "smart wait" time.
