I'm trying to use AutoRAG to spell check

I'm trying to use AutoRAG to spell check names. I have a list of names in an R2 bucket (split into ~3 MB chunks). AutoRAG claims it finished processing them. I then use c.env.AI.autorag('my-rag-name').aiSearch() to ask it, basically, "give me the correct version of the given name", hoping it would fix the spelling of "Suma Covjek" to "Šuma Čovjek", as listed in one of the files. But the results are underwhelming. When I looked in the AI Gateway at what aiSearch() used for the default system message, I saw "...no relevant documents were retrieved for the user's query..." for every single request. If I do a search in the playground, it sometimes returns a file, but often not the right or relevant one, even when I give the exact correct spelling. What am I missing? How come aiSearch() can't find my content?
3 Replies
samjs, 5w ago
Hey @maj. I'm not a deep expert on the ML side of things, but from my understanding of LLMs + RAG, I don't think that's a use case LLMs shine at. Under the surface, RAG uses vector search over LLM embeddings. Embeddings are great at finding semantically similar concepts, but a misspelled name and its correct spelling aren't "semantically" related, so I don't think it's a given that two similar names would land close to each other in the embedding space. Furthermore, if you have a bunch of names in one file, the embedding represents the whole chunk rather than any single name, which makes it even harder for the two to appear "similar" to each other. You may instead be better off using a more traditional method like Levenshtein distance to measure the distance between the given name and each word in your wordlist.
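To make the Levenshtein suggestion concrete, here is a minimal sketch, not a tuned implementation. The helper names (fold, levenshtein, closestName) are made up for illustration, and folding diacritics and case before comparing is an assumption about what "correct spelling" should mean for names like "Šuma Čovjek".

```typescript
// Strip diacritics and case so "Šuma Čovjek" and "Suma Covjek" compare equal.
// (Assumption: accents and case should not count as spelling differences.)
function fold(s: string): string {
  return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

// Classic edit distance (insert, delete, substitute), using a single rolling row.
function levenshtein(a: string, b: string): number {
  const row: number[] = [];
  for (let j = 0; j <= b.length; j++) row.push(j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // distance for the prefixes a[0..i-1), b[0..j-1)
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return row[b.length];
}

// Hypothetical helper: return the listed name with the smallest folded distance.
function closestName(query: string, names: string[]): string | undefined {
  const q = fold(query);
  let best: string | undefined;
  let bestDist = Infinity;
  for (const name of names) {
    const d = levenshtein(q, fold(name));
    if (d < bestDist) {
      bestDist = d;
      best = name;
    }
  }
  return best;
}
```

With this sketch, closestName("Suma Covjek", ["Tiësto", "Šuma Čovjek"]) returns "Šuma Čovjek", because the folded strings are identical. Scanning the whole list is linear in its size, so for a large wordlist you would probably want to narrow the candidates first.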
maj (OP), 5w ago
Hehe... that's pretty much what I've been working on for a few days now: use D1's full-text search for partial matches, then Fuse.js (FuseJS.io) to do a fuzzy search on those results. That's looking much more promising so far - and faster, too. Thanks for getting back to me!
steve silk hurley f -> Steve “Silk” Hurley
tiesto -> Tiësto
ceccarelli trio -> André Ceccarelli Trio
j. geils band -> The J. Geils Band
Just a few examples I've seen so far. Perfect matches.
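For anyone curious about the shape of that two-stage approach, here is a rough, self-contained sketch. It is not the actual D1 or Fuse.js code: the token-overlap filter only stands in for the full-text partial match, and a character-bigram Dice score stands in for the fuzzy ranking. All function names and the 0.5 cutoff are assumptions made for illustration.

```typescript
// Fold diacritics, case, and punctuation: "Steve “Silk” Hurley" -> "steve silk hurley".
function normalizeName(s: string): string {
  return s
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")
    .trim();
}

// Count character bigrams of a string.
function bigrams(s: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (let i = 0; i < s.length - 1; i++) {
    const g = s.slice(i, i + 2);
    counts[g] = (counts[g] || 0) + 1;
  }
  return counts;
}

// Dice coefficient over bigram multisets: 1 for identical strings, 0 for no overlap.
function dice(a: string, b: string): number {
  if (a === b) return 1;
  const ca = bigrams(a);
  const cb = bigrams(b);
  let overlap = 0;
  let total = 0;
  for (const g of Object.keys(ca)) {
    overlap += Math.min(ca[g], cb[g] || 0);
    total += ca[g];
  }
  for (const g of Object.keys(cb)) total += cb[g];
  return total === 0 ? 0 : (2 * overlap) / total;
}

// Stage 1: keep names sharing at least one whole token with the query
// (a stand-in for a full-text partial match).
// Stage 2: rank the survivors by fuzzy similarity and return the best one.
function correctName(query: string, names: string[], minScore = 0.5): string | undefined {
  const q = normalizeName(query);
  const qTokens = q.split(" ");
  let best: string | undefined;
  let bestScore = minScore;
  for (const name of names) {
    const n = normalizeName(name);
    const nTokens = n.split(" ");
    if (!qTokens.some((t) => nTokens.indexOf(t) !== -1)) continue;
    const score = dice(q, n);
    if (score >= bestScore) {
      bestScore = score;
      best = name;
    }
  }
  return best;
}
```

Given a list containing the four names above, correctName("ceccarelli trio", list) returns "André Ceccarelli Trio" and correctName("tiesto", list) returns "Tiësto". The cheap first stage keeps the fuzzy comparison from running over the entire list, which is presumably where the speed-up comes from.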
samjs, 5w ago
Nice!
