Prevent crawling of .xlsx / non-text/html pages
Is there a way to filter out pages based on the Content-Type header? Some docs we are crawling contain a link that downloads a .xlsx file.
Firecrawl returns it as raw text: unreadable UTF tokens, as if you had renamed the .xlsx to .txt.
We use formats: ['markdown'], so I don't understand how this is possible.
thanks!
anyone here from @Firecrawl Team
I am a paying customer, is there a better way to get support?
Hey @Albin, the best way is to email help@firecrawl.com
We can help you there for sure!
I don't believe we have a way to filter based on content-type yet
But that's a great idea!
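In the meantime, one possible workaround is to check each URL's Content-Type yourself before handing it to the crawler. Below is a minimal sketch of that idea; the `should_crawl` helper and the allow-list are assumptions for illustration, not part of the Firecrawl API:

```python
# Sketch: filter URLs by Content-Type before crawling.
# should_crawl() is a hypothetical helper, not a Firecrawl feature.

ALLOWED_TYPES = {"text/html", "application/xhtml+xml"}

def should_crawl(content_type_header: str) -> bool:
    """Return True if the Content-Type header names an allowed media type.

    Strips parameters like '; charset=utf-8' before comparing.
    """
    media_type = content_type_header.split(";")[0].strip().lower()
    return media_type in ALLOWED_TYPES

# Example: pretend we issued HEAD requests and got these headers back.
# (In practice you could use requests.head(url).headers["Content-Type"].)
discovered = {
    "https://example.com/docs/page": "text/html; charset=utf-8",
    "https://example.com/docs/data.xlsx": (
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
    ),
}

crawlable = [url for url, ctype in discovered.items() if should_crawl(ctype)]
print(crawlable)
```

This adds one extra HEAD request per link, but it keeps binary downloads like .xlsx out of the crawl entirely.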