HTML Chunking via Mastra RAG MDocument

I'm looking for a simple but realistic example of HTML chunking. The current example is really basic and not representative of real-world pages. I wrote the code below, but it does not work: the chunks come out too big. Bottom line, I just want a working real-world example of HTML chunking for websites (otherwise I'm thinking about using Firecrawl -> Markdown -> chunk via Markdown).

    import { MDocument } from "@mastra/rag";

    // Load the paper
    console.time("Paper text loaded");
    const paperUrl = "https://arxiv.org/html/1706.03762";
    const response = await fetch(paperUrl);
    const paperText = await response.text();
    console.timeEnd("Paper text loaded");

    // Create document and chunk it
    console.time("Chunks created");
    const doc = MDocument.fromHTML(paperText);
    const chunks = await doc.chunk({
      strategy: "html",
      headers: [
        ["h1", "Header 1"],
        ["p", "Paragraph"],
      ],
    });
2 Replies
Mastra Triager (2mo ago)
GitHub
[DISCORD:1417833432456757339] HTML Chunking via Mastra rag Mdocumen...
This issue was created from Discord post: https://discord.com/channels/1309558646228779139/1417833432456757339 I want a simple example of it, in the current example it is really basic and not real ...
_roamin_ (2mo ago)
Hey, @yairtheyair! You could try converting your HTML to Markdown yourself; I'm sure libraries already exist for that. Then you could use the markdown strategy, as you mentioned.
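
To make the idea concrete, here is a rough, dependency-free sketch of the HTML -> Markdown -> chunk flow. Everything in it is an assumption for illustration: `htmlToMarkdown`, `chunkMarkdown`, the `Chunk` shape, and the sample HTML are hypothetical names and data, not part of Mastra. The regex-based conversion is deliberately naive; in a real project you would probably swap it for a proper converter library and hand the resulting Markdown to the markdown strategy instead.

```typescript
// Hypothetical sketch: not Mastra's implementation, just the general idea.

interface Chunk {
  heading: string; // nearest preceding heading, "" if none was seen yet
  text: string;    // one or more paragraphs joined by blank lines
}

// Naive HTML -> Markdown-like text. Keeps headings and paragraph text only.
function htmlToMarkdown(html: string): string {
  const collapse = (s: string) => s.replace(/\s+/g, " ").trim();
  const text = html
    // Drop script and style elements together with their contents.
    .replace(/<(script|style)[\s\S]*?<\/\1>/gi, "")
    // <h2>Title</h2> -> "## Title" on its own block.
    .replace(/<h([1-6])[^>]*>([\s\S]*?)<\/h\1>/gi,
      (_m: string, level: string, body: string) =>
        `\n\n${"#".repeat(Number(level))} ${collapse(body.replace(/<[^>]+>/g, ""))}\n\n`)
    // Closing block-level tags become paragraph breaks.
    .replace(/<\/(p|div|li|section)>/gi, "\n\n")
    // Remove every remaining tag.
    .replace(/<[^>]+>/g, "")
    // Decode a handful of common entities (&amp; last to avoid double decoding).
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&");
  // Normalize: one blank line between blocks, single spaces inside a block.
  return text
    .split(/\n{2,}/)
    .map(collapse)
    .filter((block) => block.length > 0)
    .join("\n\n");
}

// Split Markdown on headings, then pack paragraphs into chunks of at most
// maxSize characters. A single paragraph longer than maxSize is kept whole.
function chunkMarkdown(markdown: string, maxSize: number): Chunk[] {
  const chunks: Chunk[] = [];
  let heading = "";
  let buffer = "";
  const flush = () => {
    if (buffer) chunks.push({ heading, text: buffer });
    buffer = "";
  };
  for (const block of markdown.split(/\n{2,}/)) {
    const match = /^#{1,6} (.*)$/.exec(block);
    if (match) {
      flush(); // a new heading always starts a new chunk
      heading = match[1] ?? "";
      continue;
    }
    // Start a new chunk if adding this paragraph would exceed the limit.
    if (buffer && buffer.length + block.length + 2 > maxSize) flush();
    buffer = buffer ? `${buffer}\n\n${block}` : block;
  }
  flush();
  return chunks;
}

// Made-up sample page, standing in for a fetched document.
const sampleHtml = `
<html><body>
<h1>Guide</h1>
<p>Alpha paragraph.</p>
<h2>Setup</h2>
<p>Beta paragraph with <b>bold</b> text.</p>
<p>Gamma paragraph.</p>
</body></html>`;

const sampleChunks = chunkMarkdown(htmlToMarkdown(sampleHtml), 40);
console.log(sampleChunks);
```

The size cap is what keeps chunks from growing too big: splitting on headings alone leaves a long section as one chunk, so paragraphs under the same heading are packed only until `maxSize` is reached. Each chunk keeps its heading, which can be stored as metadata alongside the text.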
