C#•2y ago

❔ Converting emails with an attached PDF to a .txt file using OCR

I'm creating a program that, when run, uses IMAP to access a specified email account, looks for emails with a PDF attachment, and converts that PDF to a .txt file by using OCR to read the PDF's text. Does anyone know how to solve this error?

Processing PDF: output\Utenlandsbetaling-FERJE - OCRDrift-25.10.23 (1).pdf
OCR Process Error: Error in pixReadStream: Pdf reading is not supported
Error in pixRead: pix not read
Error during processing.

Extracted Text:
Error: Failed to extract text from Utenlandsbetaling-FERJE - OCRDrift-25.10.23 (1).pdf

Here is my C# code on Replit, since it was too big to post here (it's also attached as a text file): https://replit.com/@ersor29/CSharp#main.cs

message.txt
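The log itself names the cause: `pixRead` and `pixReadStream` are functions from Leptonica, the image library Tesseract uses to load its input, and they do not decode PDFs. Below is a minimal sketch, assuming the attachment has already been saved to disk, of checking a file's `%PDF-` signature so the program can report the problem clearly (or route the file to a conversion step) instead of failing inside OCR. `InputCheck`, `IsPdf` and `EnsureImageInput` are hypothetical names, not part of the OP's code or of any OCR library.

```csharp
using System;
using System.IO;

// Sketch of an early input check (hypothetical helper names): detect a PDF by its
// "%PDF-" signature and refuse it before it reaches Leptonica's pixRead.
public static class InputCheck
{
    private static readonly byte[] PdfSignature = { (byte)'%', (byte)'P', (byte)'D', (byte)'F', (byte)'-' };

    // True if the file begins with "%PDF-". Simplification: some PDF readers also
    // tolerate a header that appears slightly later in the file; this check does not.
    public static bool IsPdf(string path)
    {
        var header = new byte[PdfSignature.Length];
        using var stream = File.OpenRead(path);
        int total = 0;
        // Stream.Read may return fewer bytes than requested, so keep reading until full or EOF.
        while (total < header.Length)
        {
            int n = stream.Read(header, total, header.Length - total);
            if (n == 0) return false; // file is shorter than the signature
            total += n;
        }
        return header.AsSpan().SequenceEqual(PdfSignature);
    }

    // Throws a descriptive error instead of letting the OCR engine fail on a PDF.
    public static void EnsureImageInput(string path)
    {
        if (IsPdf(path))
            throw new NotSupportedException(
                $"{Path.GetFileName(path)} is a PDF; convert its pages to images (e.g. PNG or TIFF) before running OCR.");
    }
}
```

Calling `InputCheck.EnsureImageInput(path)` right before the OCR call would turn the Leptonica message into an explicit "this is a PDF" error.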

4 Replies

SinFluxx•2y ago

https://tesseract-ocr.github.io/tessdoc/InputFormats.html

SinFluxx•2y ago

I guess you'd need the file you're trying to OCR to be one of those file types first.
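That conversion can be sketched as a two-step pipeline: rasterize each PDF page to a PNG, then OCR the images. The sketch below is not the OP's code; it shells out to two external command-line tools, poppler's `pdftoppm` and the `tesseract` CLI, and assumes both are installed and on PATH (a .NET PDF rendering library could replace the `pdftoppm` step). `PdfOcr`, `Run`, `PageNumber` and `OcrPdf` are hypothetical names.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text;

// Sketch: rasterize a PDF with poppler's `pdftoppm`, then OCR each page image with the
// `tesseract` command-line tool. Assumes both executables are on PATH. Names are hypothetical.
public static class PdfOcr
{
    // Runs an external program, returns its standard output, throws on a non-zero exit code.
    public static string Run(string fileName, params string[] args)
    {
        var psi = new ProcessStartInfo(fileName)
        {
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            UseShellExecute = false,
        };
        // ArgumentList passes each argument as-is, so paths containing spaces need no manual quoting.
        foreach (var arg in args) psi.ArgumentList.Add(arg);

        using var process = Process.Start(psi)
            ?? throw new InvalidOperationException($"Could not start {fileName}");
        // Read stderr concurrently so neither redirected pipe can fill up and block the child.
        var stderrTask = process.StandardError.ReadToEndAsync();
        string stdout = process.StandardOutput.ReadToEnd();
        process.WaitForExit();
        if (process.ExitCode != 0)
            throw new InvalidOperationException($"{fileName} exited with code {process.ExitCode}: {stderrTask.Result}");
        return stdout;
    }

    // Extracts the page number from a pdftoppm output name such as "page-3.png" or "page-03.png".
    public static int PageNumber(string imagePath)
    {
        string name = Path.GetFileNameWithoutExtension(imagePath);
        return int.Parse(name.Substring(name.LastIndexOf('-') + 1));
    }

    // Converts every page of the PDF to PNG, OCRs the pages in order and returns the combined text.
    public static string OcrPdf(string pdfPath, string language = "eng", int dpi = 300)
    {
        string workDir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(workDir);
        try
        {
            // Produces workDir/page-1.png, workDir/page-2.png, ... (page numbers may be zero-padded).
            Run("pdftoppm", "-r", dpi.ToString(), "-png", pdfPath, Path.Combine(workDir, "page"));

            var text = new StringBuilder();
            foreach (string png in Directory.GetFiles(workDir, "page-*.png").OrderBy(PageNumber))
            {
                // "stdout" as the output base makes tesseract print the text instead of writing a .txt file.
                text.AppendLine(Run("tesseract", png, "stdout", "-l", language));
            }
            return text.ToString();
        }
        finally
        {
            Directory.Delete(workDir, recursive: true);
        }
    }
}
```

For the file in the log, something like `File.WriteAllText("out.txt", PdfOcr.OcrPdf(pdfPath, "nor"))` would use Tesseract's Norwegian model, assuming `nor.traineddata` is installed. Sorting by the parsed page number, rather than by file name, keeps pages in order whether or not the tool zero-pads the numbers.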

Erik…OP•2y ago

Oh, okay. I tried making it in Python first, but I forgot I was using pytesseract there, not plain Tesseract... Thanks!

Accord•2y ago

Was this issue resolved? If so, run /close - otherwise I will mark this as stale and this post will be archived until there is new activity.
