Firecrawl•2mo ago

Images in PDF

Hello, when we try to scrape a PDF, the result contains image links at the correct locations, but sadly the links do not work, which results in the response shown in the image. What can we do?

7 Replies

Gaurav Chadha•2mo ago

Hmm, they must use short-term image links when rendering their PDFs. I logged a feature request for Firecrawl to host the images for you and return those instead. I'll let you know if/when we implement it.

OkanOP•2mo ago

Thank you! @micah.stairs We are using one of these plans. We are not sure, but if enterprise can help with that, please let us know so that we can contact you.

Gaurav Chadha•2mo ago

Nothing to do with enterprise! If we build this feature, we would likely include it as a general feature.

OkanOP•2mo ago

Thank you! @micah.stairs great to hear
Hello @micah.stairs, is there any workaround we can use until it is released? Maybe making multiple calls, etc.

Gaurav Chadha•2mo ago

Hmm I can't think of a good workaround, since the issue is that the site serving the PDF is only using short-lived links to render the images (if I'm understanding the issue correctly). If you share the PDF URL, I will take a closer look to confirm.

OkanOP•2mo ago

Example PDF: the-guild.eu/publications/position-papers/the-guild-s-position-paper-on-the-use-of-ai-in-research_nov2024.pdf
The URL to the image: https://www.the-guild.eu/publications/position-papers/images/23dc0b86a720a02fff852d34bed200c17e190396db045212a3403fbc93491909.jpg
It is actually happening for any PDF I try.

Gaurav Chadha•2mo ago

Oh thanks for flagging! We will definitely dig into this.
