Prevent crawling of .xlsx / non-text/html pages
Is there a way to filter out pages based on the Content-Type header? Some docs we are crawling contain a link that downloads a .xlsx file.
Firecrawl returns it as raw text: unreadable UTF tokens, as if you had renamed the .xlsx to .txt.
We use formats: ['markdown'], so I don't understand how this is possible.
thanks!
anyone here from @Firecrawl Team
I am a paying customer, is there a better way to get support?
Hey @Albin, the best way is to email help@firecrawl.com
We can help you there for sure!
I don't believe we have a way to filter based on content-type yet
But that's a great idea!
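In the meantime, one possible workaround is to check each URL's Content-Type yourself before handing it to the crawler. Below is a minimal sketch of that idea; the `should_crawl` helper and the allow-list are assumptions for illustration, not part of the Firecrawl API:

```python
# Sketch: filter URLs by Content-Type before crawling.
# should_crawl() is a hypothetical helper, not a Firecrawl feature.

ALLOWED_TYPES = {"text/html", "application/xhtml+xml"}

def should_crawl(content_type_header: str) -> bool:
    """Return True if the Content-Type header names an allowed media type.

    Strips parameters like '; charset=utf-8' before comparing.
    """
    media_type = content_type_header.split(";")[0].strip().lower()
    return media_type in ALLOWED_TYPES

# Example: pretend we issued HEAD requests and got these headers back.
# (In practice you could use requests.head(url).headers["Content-Type"].)
discovered = {
    "https://example.com/docs/page": "text/html; charset=utf-8",
    "https://example.com/docs/data.xlsx": (
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
    ),
}

crawlable = [url for url, ctype in discovered.items() if should_crawl(ctype)]
print(crawlable)
```

This adds one extra HEAD request per link, but it keeps binary downloads like .xlsx out of the crawl entirely.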