Invalid PDF structure

I'm using the crawl endpoint, and one of the URLs it discovered is https://www.gamweb.com/assets/files/lsk.pdf. However, I get an "Invalid PDF structure" error when the page is scraped by FireCrawl. I can see why: it's a webpage with an embedded PDF instead of the raw PDF that the URL implies. Still, I do think FireCrawl should be able to handle this gracefully.
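To illustrate the distinction, here is a minimal sketch of how a scraper could tell a raw PDF from an HTML page that only embeds one. It is not FireCrawl's implementation: the function names (`looks_like_pdf`, `find_embedded_sources`, `classify`) are hypothetical, and it assumes the PDF is embedded through an `embed`, `iframe`, or `object` tag, which may not be how this particular page does it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

PDF_MAGIC = b"%PDF-"  # every PDF file starts with this header


def looks_like_pdf(body: bytes) -> bool:
    """Check the magic bytes instead of trusting the .pdf extension."""
    return body.lstrip().startswith(PDF_MAGIC)


class _EmbedFinder(HTMLParser):
    """Collects URLs referenced by <embed>, <iframe>, and <object> tags."""

    # Assumption: these are the tags/attributes used to embed the PDF.
    SOURCE_ATTR = {"embed": "src", "iframe": "src", "object": "data"}

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.candidates: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_name = self.SOURCE_ATTR.get(tag)
        if attr_name is None:
            return
        value = dict(attrs).get(attr_name)
        if value:
            # Resolve relative paths against the page URL.
            self.candidates.append(urljoin(self.base_url, value))


def find_embedded_sources(html: str, base_url: str) -> list[str]:
    finder = _EmbedFinder(base_url)
    finder.feed(html)
    return finder.candidates


def classify(body: bytes, url: str) -> tuple[str, list[str]]:
    """Return ("pdf", []) for a raw PDF, else ("html", candidate embed URLs)."""
    if looks_like_pdf(body):
        return ("pdf", [])
    html = body.decode("utf-8", errors="replace")
    return ("html", find_embedded_sources(html, url))
```

With something like this, a URL ending in `.pdf` that returns HTML could fall back to normal page scraping, or follow one of the candidate URLs, rather than failing in the PDF parser.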
3 Replies
Adobe.Flash (12mo ago)
Hey @micah.stairs, adding this as a GitHub issue. We should def be able to handle it! Thanks for letting us know!
Adobe.Flash (12mo ago)
GitHub
[Feat] Ability to scrape embedded pdfs · Issue #839 · mendableai/fi...
micah.stairs (OP, 12mo ago)
Thanks! I subscribed to the issue
