Firecrawl•12mo ago

Invalid PDF structure

I'm using the crawl endpoint, and one of the URLs it discovered is https://www.gamweb.com/assets/files/lsk.pdf. However, I get an "Invalid PDF structure" error when Firecrawl scrapes the page. I can see why: it's a webpage with an embedded PDF rather than the raw PDF that the URL implies. Still, I think Firecrawl should be able to handle this gracefully.
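For anyone hitting the same error, here is a minimal client-side sketch of how this case could be detected before handing a URL to a PDF parser. It is not Firecrawl's implementation; the function names (`looks_like_pdf`, `find_embedded_sources`) are hypothetical, and it assumes the page embeds the document through an `<embed>`, `<iframe>`, or `<object>` tag, which may not hold for every site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

PDF_MAGIC = b"%PDF-"


def looks_like_pdf(body: bytes) -> bool:
    """Return True if the response body appears to be a real PDF."""
    # A PDF starts with the "%PDF-" header. Some readers tolerate a little
    # leading junk, so this sketch scans the first 1024 bytes instead of
    # requiring the header at offset 0.
    return PDF_MAGIC in body[:1024]


class _EmbedFinder(HTMLParser):
    """Collects the source attribute of tags commonly used to embed documents."""

    # Hypothetical mapping for this sketch: tag name -> attribute holding the URL.
    SOURCE_ATTR = {"embed": "src", "iframe": "src", "object": "data"}

    def __init__(self) -> None:
        super().__init__()
        self.candidates: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_name = self.SOURCE_ATTR.get(tag)
        if attr_name is None:
            return
        value = dict(attrs).get(attr_name)
        if value:
            self.candidates.append(value)


def find_embedded_sources(html: str, base_url: str) -> list[str]:
    """Return absolute URLs of embedded resources found in an HTML wrapper page."""
    finder = _EmbedFinder()
    finder.feed(html)
    # Resolve relative links against the page URL.
    return [urljoin(base_url, src) for src in finder.candidates]
```

In use, you would fetch the `.pdf` URL with any HTTP client, call `looks_like_pdf` on the raw bytes, and only if that fails decode the body as HTML and try the URLs returned by `find_embedded_sources`. Each candidate still needs the same header check, since an `<iframe>` can point at another HTML viewer rather than the PDF itself.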

3 Replies

Adobe.Flash•12mo ago

Hey @micah.stairs, adding this as a GitHub issue. We should def be able to handle it! Thanks for letting us know!

Adobe.Flash•12mo ago

https://github.com/mendableai/firecrawl/issues/839

GitHub

[Feat] Ability to scrape embedded pdfs · Issue #839 · mendableai/fi...

micah.stairs (OP)•12mo ago

Thanks! I subscribed to the issue.
