Firecrawl
7mo ago
Albin

Prevent crawling of .xlsx / non-text/html pages

Is there a way to filter out pages based on the Content-Type header? Some docs we are crawling contain a link that downloads an .xlsx file.
Firecrawl returns this as raw text: unreadable UTF tokens, the same thing you get if you take the .xlsx file and rename it to .txt.
We use formats: ['markdown'], so I don't understand how this is possible.
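One possible workaround, sketched under assumptions: check each URL's Content-Type header (or its file extension) yourself before handing it to the scraper, and build regex patterns for known download extensions. The helper names and the extension list below are hypothetical. Feeding the patterns into a crawl assumes your Firecrawl version exposes a regex-based path exclusion option such as `excludePaths`; check the crawl API reference for your version rather than taking this as confirmed Firecrawl behavior.

```python
import re
import urllib.request
from urllib.parse import urlparse

# Hypothetical list of extensions treated as non-HTML downloads.
BINARY_EXTENSIONS = (".xlsx", ".xls", ".docx", ".pdf", ".zip")


def is_html_content_type(header_value):
    """Return True if a Content-Type header value denotes an HTML page."""
    if not header_value:
        return False
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in ("text/html", "application/xhtml+xml")


def fetch_content_type(url, timeout=10):
    """Send a HEAD request and return the Content-Type header (may be None).

    Assumption: the server answers HEAD requests; some servers do not.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type")


def looks_like_binary_download(url):
    """Cheap pre-filter on the URL path; ignores the query string."""
    path = urlparse(url).path.lower()
    return path.endswith(BINARY_EXTENSIONS)


def exclude_patterns(extensions=BINARY_EXTENSIONS):
    """Build regexes such as r'.*\\.xlsx$', one per extension.

    Assumption: the crawler matches these against the URL path.
    """
    return [r".*" + re.escape(ext) + r"$" for ext in extensions]
```

A URL would then only be scraped if `looks_like_binary_download(url)` is false and, when a HEAD request is affordable, `is_html_content_type(fetch_content_type(url))` is true. The extension check costs no network round trip but misses downloads served from extension-less URLs; the header check catches those at the price of one extra request per URL.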

thanks!