Batch PDF Text extraction
Hello,
I'm new to Apify and tested your Website Content Crawler, which worked great. In the process I downloaded several PDFs, which are now stored in a database file on Apify.
I can manually extract the text from each PDF with the PDF Text Extractor, using that PDF's key-value store link.
However, that is not efficient for multiple PDFs.
If I provide a database link or a key-value store link covering all the PDFs, the PDF extraction reports an invalid file format.
Is there a way to batch process all these PDFs?
Thank you very much 🙂
2 Replies
conscious-sapphire•15mo ago
We will get back to you soon!
fascinating-indigo•15mo ago
Input can be an array of URLs. This way, you can process multiple URLs simultaneously:
https://console.apify.com/actors/QbKEOrw6PkLcy4Xms/information/latest/readme#input
However, there's no direct way to retrieve all the values in a key-value store at once.
You can try the Apify API:
https://docs.apify.com/api/v2/#/reference/key-value-stores
or you can access the key-value store from your own code:
https://docs.apify.com/sdk/js/reference/class/Actor#openKeyValueStore
https://docs.apify.com/sdk/js/docs/next/guides/result-storage#key-value-store
Simply loop over the keys, build the record URL for each one, and use the result as the array of URLs.
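In case it helps, here is a rough sketch of that loop in Python against the key-value store API linked above, using only the standard library. It is not an official snippet. The function names are made up for illustration, the token is only needed if your store requires one, and the `.pdf` filter assumes the crawler saved the files under keys ending in `.pdf` (drop the filter if your keys look different).

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def record_url(store_id, key):
    """Build the API URL of a single record in a key-value store."""
    quoted = urllib.parse.quote(key, safe="")
    return f"{API_BASE}/key-value-stores/{store_id}/records/{quoted}"


def fetch_json(url):
    """Download a URL and parse the body as JSON."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def list_keys(store_id, token=None, fetch=fetch_json):
    """Yield every key in the store, following the API's pagination."""
    start = None
    while True:
        params = {"limit": 1000}
        if token:
            params["token"] = token
        if start:
            params["exclusiveStartKey"] = start
        query = urllib.parse.urlencode(params)
        page = fetch(f"{API_BASE}/key-value-stores/{store_id}/keys?{query}")["data"]
        for item in page["items"]:
            yield item["key"]
        start = page.get("nextExclusiveStartKey")
        # Stop when the API reports no further pages.
        if not page.get("isTruncated") or not start:
            break


def pdf_urls(store_id, token=None, fetch=fetch_json):
    """Return record URLs for all keys that look like PDFs (assumed '.pdf' suffix)."""
    return [
        record_url(store_id, key)
        for key in list_keys(store_id, token=token, fetch=fetch)
        if key.lower().endswith(".pdf")
    ]
```

Calling `pdf_urls("YOUR_STORE_ID")` (with your own store ID in place of the placeholder) should give you a plain list of URLs that you can paste into the extractor's URL input described in the readme above.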