HTML Chunking via Mastra RAG MDocument
I'd like a simple example of this. The current example is really basic and not representative of real-world use.
I wrote the code below, but it does not work: the chunks are too big.
Bottom line: I just want a real-world, working example of HTML chunking for websites (otherwise I'm thinking about using Firecrawl -> Markdown -> chunk via Markdown).
  import { MDocument } from "@mastra/rag";

  // Load the paper as raw HTML
  console.time("Paper text loaded");
  const paperUrl = "https://arxiv.org/html/1706.03762";
  const response = await fetch(paperUrl);
  const paperText = await response.text();
  console.timeEnd("Paper text loaded");
  // Create document and chunk it
  console.time("Chunks created");
  const doc = MDocument.fromHTML(paperText);
  const chunks = await doc.chunk({
    strategy: "html",
    headers: [
      ["h1", "Header 1"],
      ["p", "Paragraph"],
    ],
  });
  // Stop the timer started above and report how many chunks came back
  console.timeEnd("Chunks created");
  console.log(`Created ${chunks.length} chunks`);
2 Replies
📝 Created GitHub issue: https://github.com/mastra-ai/mastra/issues/7942
Hey, @yairtheyair! You could try converting your HTML to Markdown manually; I'm sure libraries for that already exist. Then you could use the Markdown strategy, as you mentioned.
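For illustration, here is a minimal, dependency-free sketch of the HTML -> Markdown -> chunk idea. It is not Mastra's API: `htmlToMarkdown`, `chunkMarkdown`, `Chunk`, and `maxSize` are hypothetical names, and the regex-based conversion only handles a small subset of HTML. On real pages an existing HTML-to-Markdown library would likely be more robust.

```typescript
// Hypothetical shape for a chunk: the nearest heading plus the text under it.
interface Chunk {
  heading: string;
  text: string;
}

// Decode only a handful of common entities (assumption: enough for a demo).
function decodeEntities(s: string): string {
  return s
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&");
}

// Very rough HTML -> Markdown: keeps headings and paragraph breaks only.
function htmlToMarkdown(html: string): string {
  const stripped = html
    // Drop page chrome that should never end up in a chunk.
    .replace(/<(script|style|nav|footer)\b[\s\S]*?<\/\1>/gi, "")
    // <h1>..<h6> become "#".."######" headings on their own block.
    .replace(
      /<h([1-6])\b[^>]*>([\s\S]*?)<\/h\1>/gi,
      (_m: string, level: string, inner: string) =>
        `\n\n${"#".repeat(Number(level))} ${inner.replace(/\s+/g, " ").trim()}\n\n`
    )
    // Closing block-level tags become paragraph breaks.
    .replace(/<\/(p|div|li|section)>/gi, "\n\n")
    .replace(/<br\s*\/?>/gi, "\n")
    // Remove every remaining tag.
    .replace(/<[^>]+>/g, "");

  return decodeEntities(stripped)
    .replace(/[ \t]+/g, " ")
    .replace(/ ?\n ?/g, "\n")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}

// Split on headings, then pack paragraphs into chunks of at most maxSize
// characters. A single paragraph longer than maxSize is kept whole.
function chunkMarkdown(markdown: string, maxSize: number): Chunk[] {
  const chunks: Chunk[] = [];
  let heading = "";
  let buffer = "";

  const flush = (): void => {
    if (buffer.trim() !== "") chunks.push({ heading, text: buffer.trim() });
    buffer = "";
  };

  for (const block of markdown.split(/\n{2,}/)) {
    const match = /^#{1,6} (.*)$/.exec(block);
    if (match) {
      flush(); // a new heading always starts a new chunk
      heading = match[1] ?? "";
      continue;
    }
    // Start a new chunk if adding this paragraph would exceed the cap.
    if (buffer !== "" && buffer.length + block.length + 2 > maxSize) flush();
    buffer = buffer === "" ? block : `${buffer}\n\n${block}`;
  }
  flush();
  return chunks;
}
```

Something like `chunkMarkdown(htmlToMarkdown(paperText), 1000)` would then give heading-scoped chunks capped at roughly 1000 characters, which is the kind of size control the original snippet was missing.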