I’m looking for advice on the current best practices for building a RAG system over a large corpus of technical documents (think specs, manuals, internal docs, etc.).
Context:
* Very large document set (tens to hundreds of thousands of files)
* Mostly technical text (structured + unstructured)
* Need accurate retrieval, minimal hallucinations
Questions:
1. What architectures are people using in 2025/2026? (Classic embeddings + vector DB, hybrid keyword + vector search, or graph-RAG?)
2. Recommended chunking strategies for technical docs?
3. How are you evaluating retrieval quality and checking that answers are grounded in the retrieved text?
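To make question 1 concrete: by "hybrid" I mean something like merging a keyword ranking with an embedding ranking. Below is a minimal sketch of one way to do that, assuming reciprocal rank fusion. The fusion method, the function name, the doc IDs, and `k=60` are all my own placeholders for illustration, not something I have validated on a real corpus.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one list, best first.

    Each doc gets 1 / (k + rank) from every list it appears in;
    k is a smoothing constant (60 is an assumed default here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results from a keyword search and a vector search.
keyword_hits = ["spec_12", "manual_3", "doc_7"]
vector_hits = ["manual_3", "doc_9", "spec_12"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Docs that show up in both lists rise to the top, which is the behavior I would expect from a hybrid setup. I'm unsure whether this kind of simple fusion holds up at this scale or whether people add a reranker on top.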
Would love real-world lessons learned or links to solid repos/blogs. The advice I find online is inconsistent, and I haven’t found much high-quality research on this topic yet.