Firecrawl · 4mo ago
Albin

Prevent crawling of .xlsx / non-text/html pages

Is there a way to filter out pages based on the Content-Type header? A link in some docs we are crawling downloads an .xlsx file. Firecrawl returns it as raw text: unreadable UTF tokens, as if you had taken the .xlsx and renamed it to .txt. We use formats: ['markdown'], so I don't understand how this is possible. Thanks!
2 Replies
Albin (OP) · 4mo ago
Is anyone here from the @Firecrawl Team?
I am a paying customer. Is there a better way to get support?
Adobe.Flash · 4mo ago
Hey @Albin, the best way is to email help@firecrawl.com. We can help you there for sure! I don't believe we have a way to filter based on Content-Type yet, but that's a great idea!
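Until a built-in filter exists, one possible workaround is to screen URLs on the client side before handing them to the scraper. The sketch below is only an illustration under assumptions: the helper names (`has_binary_extension`, `is_html_content_type`, `fetch_content_type`) and the extension list are hypothetical, it uses only the Python standard library, and it does not call any Firecrawl API.

```python
import posixpath
from typing import Optional
from urllib.parse import urlparse
from urllib.request import Request, urlopen

# Hypothetical list of extensions to skip; adjust to your own docs.
BINARY_EXTENSIONS = {".xlsx", ".xls", ".docx", ".pptx", ".zip"}


def has_binary_extension(url: str) -> bool:
    """Return True if the URL path ends in a known non-HTML extension."""
    path = urlparse(url).path  # drops the query string and fragment
    ext = posixpath.splitext(path)[1].lower()
    return ext in BINARY_EXTENSIONS


def is_html_content_type(header_value: Optional[str]) -> bool:
    """Return True if a Content-Type header value denotes an HTML page."""
    if not header_value:
        return False
    # "text/html; charset=utf-8" -> "text/html"
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in {"text/html", "application/xhtml+xml"}


def fetch_content_type(url: str, timeout: float = 10.0) -> Optional[str]:
    """Send a HEAD request and return the Content-Type header, if any.

    Assumes the server answers HEAD requests; some servers do not.
    """
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type")


def should_scrape(url: str) -> bool:
    """Cheap extension check first, then confirm with a HEAD request."""
    if has_binary_extension(url):
        return False
    return is_html_content_type(fetch_content_type(url))
```

The extension check catches the obvious .xlsx links without any network traffic; the HEAD request covers download links that have no file extension in the URL.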
