C# 8mo ago
Eri…..

❔ Converting emails with an attached PDF to a txt file using OCR

I'm building a program that uses IMAP to access a specified email account, looks for emails with a PDF attached, and converts each PDF to a txt file by using OCR to read its text. Does anyone know how to solve this error:
Processing PDF: output\Utenlandsbetaling-FERJE - OCRDrift-25.10.23 (1).pdf
OCR Process Error: Error in pixReadStream: Pdf reading is not supported
Error in pixRead: pix not read
Error during processing.

Extracted Text:
Error: Failed to extract text from Utenlandsbetaling-FERJE - OCRDrift-25.10.23 (1).pdf
Processing PDF: output\Utenlandsbetaling-FERJE - OCRDrift-25.10.23 (1).pdf
OCR Process Error: Error in pixReadStream: Pdf reading is not supported
Error in pixRead: pix not read
Error during processing.

Extracted Text:
Error: Failed to extract text from Utenlandsbetaling-FERJE - OCRDrift-25.10.23 (1).pdf
This is my C# code in a Replit because it was too big to post here (it's also attached in a text file): https://replit.com/@ersor29/CSharp#main.cs
4 Replies
SinFluxx 8mo ago
Guess you'd need the file you're trying to OCR to be one of those file types first
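For context, the `pixReadStream` and `pixRead` messages come from Leptonica, the image library Tesseract uses to load its input, and it does not read PDFs. One possible way around that is to rasterize each PDF page to an image first and OCR the images. The sketch below is only an illustration, not the code from the Replit: it assumes .NET 6 or later, and that Poppler's `pdftoppm` and the `tesseract` command-line tool are both installed and on the PATH. The class and method names (`PdfOcr`, `RasterizeArgs`, `OcrArgs`, `ExtractText`) are made up for this example.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;

public static class PdfOcr
{
    // Arguments for Poppler's pdftoppm: render every page as a PNG at the given DPI.
    public static string[] RasterizeArgs(string pdfPath, string outputPrefix, int dpi = 300) =>
        new[] { "-r", dpi.ToString(CultureInfo.InvariantCulture), "-png", pdfPath, outputPrefix };

    // Arguments for the tesseract CLI: read one image, write the recognized text to stdout.
    public static string[] OcrArgs(string imagePath, string language = "eng") =>
        new[] { imagePath, "stdout", "-l", language };

    // Runs an external tool and returns its standard output; throws on a non-zero exit code.
    static string Run(string tool, string[] args)
    {
        var psi = new ProcessStartInfo(tool)
        {
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            StandardOutputEncoding = Encoding.UTF8,
            UseShellExecute = false,
        };
        foreach (var arg in args) psi.ArgumentList.Add(arg); // copes with spaces in file names

        using var process = Process.Start(psi)
            ?? throw new InvalidOperationException($"Could not start {tool}");
        var stderr = process.StandardError.ReadToEndAsync(); // drain stderr concurrently to avoid a deadlock
        string stdout = process.StandardOutput.ReadToEnd();
        process.WaitForExit();
        if (process.ExitCode != 0)
            throw new InvalidOperationException($"{tool} failed: {stderr.Result}");
        return stdout;
    }

    // Rasterizes the PDF into a temporary folder, OCRs each page image in order,
    // and returns the combined text.
    public static string ExtractText(string pdfPath, string language = "eng")
    {
        string workDir = Directory.CreateDirectory(
            Path.Combine(Path.GetTempPath(), Path.GetRandomFileName())).FullName;
        try
        {
            Run("pdftoppm", RasterizeArgs(pdfPath, Path.Combine(workDir, "page")));
            var text = new StringBuilder();
            // pdftoppm zero-pads page numbers, so an ordinal sort keeps the pages in order.
            foreach (string image in Directory.GetFiles(workDir, "page-*.png")
                                              .OrderBy(p => p, StringComparer.Ordinal))
                text.AppendLine(Run("tesseract", OcrArgs(image, language)));
            return text.ToString();
        }
        finally
        {
            Directory.Delete(workDir, recursive: true);
        }
    }
}
```

Calling `PdfOcr.ExtractText(path)` on a saved attachment should then return the page text, provided both tools are available. For Norwegian documents, passing `"nor"` as the language would be the natural choice, assuming that language pack is installed.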
Eri….. 8mo ago
Oh okay. I tried making it in Python first, but I forgot I'm using pytesseract with that, not normal Tesseract... ty
Accord 8mo ago
Was this issue resolved? If so, run /close - otherwise I will mark this as stale and this post will be archived until there is new activity.