Crawlee & Apify•15mo ago

Batch PDF Text extraction

Hello, I'm new to Apify and tested your Website Content Crawler, which worked great. I downloaded several PDFs in that process, which are now stored in a key-value store on Apify. I can manually extract the text with the PDF Text Extractor by giving it the key-value store link of each PDF. However, for multiple PDFs that is not efficient. If I provide a dataset link or a key-value store link covering all the PDFs, the PDF extraction reports an invalid file format. Is there a way to batch process all these PDFs? Thank you very much 🙂

2 Replies

conscious-sapphire•15mo ago

We will get back to you soon!

fascinating-indigo•15mo ago

Input can be an array of URLs, so you can process multiple URLs simultaneously: https://console.apify.com/actors/QbKEOrw6PkLcy4Xms/information/latest/readme#input

However, there's no direct way to retrieve all values of a key-value store at once. You can use the Apify API: https://docs.apify.com/api/v2/#/reference/key-value-stores or you can access the key-value store from your own code: https://docs.apify.com/sdk/js/reference/class/Actor#openKeyValueStore https://docs.apify.com/sdk/js/docs/next/guides/result-storage#key-value-store

Simply loop over the keys, build the record URL for each one, and then use the result as the array of URLs.
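For anyone landing here later, the "loop over the keys" step could look roughly like the sketch below. It uses only the Python standard library against the key-value store endpoints of the Apify API linked above. The helper names (`record_url`, `list_keys`, `pdf_urls`) are made up for this example, and the `.pdf` suffix filter is an assumption about how the crawler named the stored files, so check the actual keys in your store first.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2/key-value-stores"


def _http_get_json(url, token=None):
    """GET a URL and parse the JSON body; the token is sent as a Bearer header."""
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def record_url(store_id, key):
    """Build the API URL of a single record in the store."""
    return f"{API_BASE}/{store_id}/records/{urllib.parse.quote(key, safe='')}"


def list_keys(store_id, token=None, fetch=None):
    """Yield every key in the store, following the API's pagination."""
    get = fetch or (lambda url: _http_get_json(url, token))
    start = None
    while True:
        params = {"limit": 1000}
        if start:
            params["exclusiveStartKey"] = start
        url = f"{API_BASE}/{store_id}/keys?{urllib.parse.urlencode(params)}"
        data = get(url)["data"]
        for item in data["items"]:
            yield item["key"]
        start = data.get("nextExclusiveStartKey")
        # Stop when the listing is complete or no continuation key is given.
        if not data.get("isTruncated") or not start:
            break


def pdf_urls(store_id, token=None, fetch=None, suffix=".pdf"):
    """Collect record URLs for keys ending in `suffix` (assumed naming scheme)."""
    return [
        record_url(store_id, key)
        for key in list_keys(store_id, token, fetch)
        if key.lower().endswith(suffix)
    ]


# Example (hypothetical store ID and token; this performs real HTTP requests):
# print(json.dumps(pdf_urls("YOUR_STORE_ID", token="YOUR_API_TOKEN"), indent=2))
```

The printed JSON array can then be pasted into the URL input of the PDF Text Extractor. Pass `suffix=""` to keep every key. If the store is not publicly readable, the extractor may not be able to download the record URLs without additional authentication, which this sketch does not handle.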
