Crawling a PDF page does not retrieve the entire content
I recently noticed that your system now supports scraping PDF pages, which is great! However, I ran into an issue while testing it.
I’m trying to crawl the following URL: https://www.druva.com/documents/l4-cyber-investigations.pdf
Some content appears to be missing from the first paragraph in the extracted result.
I’ll send the result below for reference.
2 Replies
Hey! We are about to release some large improvements to our PDF parser. I will let you know once that has rolled out.