Apache TinkerPop
•Created by niuzj on 5/21/2025 in #questions
Seeking Help: Building a Text-to-Gremlin Corpus Generator - AST Parsing
Hey everyone,
I'm working on fine-tuning a large language model for text-to-Gremlin generation. To do this, I need a substantial dataset of natural language queries paired with their corresponding Gremlin queries. I'm currently building a corpus generator for this.
I've seen some text-to-Cypher work that builds its corpus by parsing the Cypher AST (Abstract Syntax Tree). However, the ASTs for Cypher and Gremlin are quite different.
Does anyone have suggestions on how to tackle this? Specifically:
* Are there any existing tools for parsing Gremlin ASTs?
* Alternatively, are there any methods to build such a corpus generator without relying on AST parsing?
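To make the second option concrete, here is a minimal sketch of a template-based generator that needs no AST parsing. Everything in it is an assumption for illustration: the toy `SCHEMA`, the `TEMPLATES` list, and the `generate_pairs` helper are hypothetical names, and the Gremlin strings are emitted as plain text without being validated against a real graph.

```python
import random

# Hypothetical toy schema: vertex labels mapped to property keys.
# These names are illustrative only, not taken from any real dataset.
SCHEMA = {
    "person": ["name", "age"],
    "software": ["name", "lang"],
}

# Each template pairs a natural-language pattern with a Gremlin pattern.
# Placeholders are filled from the schema, so no parsing is involved.
TEMPLATES = [
    ("Find all {label} vertices.",
     "g.V().hasLabel('{label}')"),
    ("How many {label} vertices are there?",
     "g.V().hasLabel('{label}').count()"),
    ("List the {prop} of every {label}.",
     "g.V().hasLabel('{label}').values('{prop}')"),
]


def generate_pairs(n, seed=0):
    """Return n (text, gremlin) pairs sampled from the templates."""
    rng = random.Random(seed)  # seeded so the corpus is reproducible
    pairs = []
    for _ in range(n):
        label = rng.choice(sorted(SCHEMA))
        prop = rng.choice(SCHEMA[label])
        text_tpl, query_tpl = rng.choice(TEMPLATES)
        pairs.append({
            "text": text_tpl.format(label=label, prop=prop),
            "gremlin": query_tpl.format(label=label, prop=prop),
        })
    return pairs


if __name__ == "__main__":
    for pair in generate_pairs(3):
        print(pair["text"], "->", pair["gremlin"])
```

A generator like this only covers the query shapes written into the templates, which is the main trade-off against an AST-based approach.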
Any help or ideas would be greatly appreciated! Thanks!