I'm having problems converting PDFs in a JS project
So I'm working on a project where I need to accurately convert PDFs into something with structure (I chose Markdown for now) so I can analyze it. The resulting text has been very inaccurate: page numbers end up in the text, and I get runs of repeated headings, like h2, h2, h2.
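One crude workaround for the two symptoms above is a cleanup pass over the converter's Markdown output. This is a hypothetical sketch (`cleanMarkdown` is not part of pdf2md) that assumes page numbers come through as lines containing only a number (or "Page N", "Page N of M"), and that the repeats are identical heading lines that end up back to back once the page numbers are removed. It only patches symptoms: it cannot recover structure the converter never detected, and it will also drop a legitimate line that consists of nothing but a number.

```javascript
// Hypothetical cleanup pass over converter output (not part of pdf2md).
// Assumes page numbers sit on their own line, e.g. "12", "Page 12" or
// "Page 12 of 40", and that repeated headings are identical lines like "## Title".
function cleanMarkdown(markdown) {
  const pageNumber = /^\s*(page\s+)?\d+(\s*(of|\/)\s*\d+)?\s*$/i;
  const heading = /^#{1,6}\s+\S/;
  const out = [];
  let lastContent = null; // last non-blank line we kept (trimmed)

  for (const line of markdown.split(/\r?\n/)) {
    // Drop lines that are only a page number.
    if (pageNumber.test(line)) continue;

    // Drop a heading identical to the previous non-blank line we kept.
    if (heading.test(line) && line.trim() === lastContent) continue;

    if (line.trim() === '') {
      // Collapse runs of blank lines left behind by the removals above.
      if (out.length > 0 && out[out.length - 1].trim() === '') continue;
    } else {
      lastContent = line.trim();
    }
    out.push(line);
  }
  return out.join('\n').trim();
}

// Usage: a page break that produced a stray page number and a repeated heading.
const input = [
  '## Introduction',
  '',
  '3',
  '',
  '## Introduction',
  '## Introduction',
  'Some text.',
  'Page 4',
].join('\n');

console.log(JSON.stringify(cleanMarkdown(input)));
// → "## Introduction\n\nSome text."
```

Only headings that repeat back to back are removed; a running header that reappears between paragraphs on every page would need a different rule, such as dropping heading lines that occur on most pages.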
Has anyone worked with PDFs before? Is there a recommended way to make sure the PDF's structure is preserved?
I used this library (a PDF to Markdown converter): https://github.com/opengovsg/pdf2md
