Extract PDF Formatting

I'm building a web app (using the T3 stack) that prompts the user to upload a PDF of about 100 pages. The PDF is made up of 20-30 sections, and each section will get its own page in the app. On each section page, I want to display the full text of that section with its original formatting (text positioning, bold, underline, etc.). I can isolate each section by matching the section titles with a regex pattern (e.g. "Section 1", "Section 2", etc.). The problem I'm running into is that when I extract the text from the PDF, I lose all of the formatting. Is there a way to extract the formatting along with the text, or a way to render just a specific section of the PDF instead of a specific page or the whole document? Note: I still want the text to be highlightable by the user, so turning it into a canvas/image won't work.
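The regex-based splitting described above could look roughly like the sketch below. It assumes the text has already been extracted into runs in reading order, each with a `bold` flag. The `StyledRun` shape, `splitIntoSections`, and `sectionToHtml` are hypothetical names for illustration. pdf.js's `getTextContent()` does return items with `str` and `fontName`, but deriving a reliable `bold` flag from the font is left as an assumption here. Underlines are usually drawn as separate graphics rather than stored as a text attribute, so they would not show up this way at all. Rendering the result as plain HTML keeps it selectable.

```typescript
// Hypothetical, simplified shape for one extracted run of text.
// ASSUMPTION: `bold` has already been derived somehow (e.g. from the font);
// pdf.js text items do not carry such a flag directly.
interface StyledRun {
  str: string;
  bold: boolean;
}

interface Section {
  title: string;
  runs: StyledRun[];
}

// Matches headings such as "Section 1" or "Section 12". Adjust to the real titles.
const SECTION_TITLE = /^Section\s+\d+\b/;

// Walks the runs in reading order and starts a new section whenever a run
// looks like a section title. Runs before the first title are dropped.
function splitIntoSections(runs: StyledRun[]): Section[] {
  const sections: Section[] = [];
  let current: Section | null = null;
  for (const run of runs) {
    const text = run.str.trim();
    if (SECTION_TITLE.test(text)) {
      current = { title: text, runs: [] };
      sections.push(current);
    } else if (current !== null) {
      current.runs.push(run);
    }
  }
  return sections;
}

// Escapes text so it can be inserted into HTML safely.
function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Renders one section as selectable HTML, wrapping bold runs in <strong>.
function sectionToHtml(section: Section): string {
  const body = section.runs
    .map((r) => (r.bold ? `<strong>${escapeHtml(r.str)}</strong>` : escapeHtml(r.str)))
    .join(" ");
  return `<h2>${escapeHtml(section.title)}</h2><p>${body}</p>`;
}
```

Joining runs with a single space discards the original positioning. Preserving layout would require using the coordinates that pdf.js exposes on each text item, which this sketch does not attempt.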