Effect Community, 6mo ago
5 replies
Dmitrii Bykov

Hash Function Collisions in String Hashing Algorithm

Hi! Recently I've been fighting a weird bug in my application. It turns out the hashing function for strings produces collisions that I did not expect for my use case. It's easy to fix by introducing a custom hashing function for the data type, but I decided to share my experience to hear your thoughts.
First of all, the documentation doesn't mention that collisions are possible when using the Hash trait.
Perhaps we should reconsider the string hashing algorithm, because it doesn't produce unique hashes even for ordinary use cases. For example, I have a list of 1040 unique IATA codes, and after hashing I get only 895 unique hashes. The same thing happens with country codes.
Of course, I'm no expert in hashing algorithms, but I always thought the distribution should be strong enough for common use cases such as the ones I described.
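A measurement like this can be reproduced by hashing every code and counting the distinct results. Below is a minimal, self-contained TypeScript sketch. The helper names `countUniqueHashes` and `findCollisions` are made up for illustration, and `charSumHash` is a deliberately weak toy hash, not the library's algorithm. To test the real thing, pass the actual hash function in its place.

```typescript
type StringHash = (s: string) => number

// Toy hash used only to demonstrate the helpers: sums the UTF-16 code units.
// It is order-insensitive, so anagrams such as "JFK" and "KFJ" always collide.
const charSumHash: StringHash = (s) => {
  let sum = 0
  for (let i = 0; i < s.length; i++) sum += s.charCodeAt(i)
  return sum
}

// Number of distinct hash values produced for the given inputs.
function countUniqueHashes(values: ReadonlyArray<string>, hash: StringHash): number {
  return new Set(values.map(hash)).size
}

// Groups of distinct inputs that share the same hash value.
function findCollisions(values: ReadonlyArray<string>, hash: StringHash): string[][] {
  const buckets = new Map<number, string[]>()
  // Deduplicate inputs first so repeated strings are not reported as collisions.
  Array.from(new Set(values)).forEach((v) => {
    const h = hash(v)
    const bucket = buckets.get(h)
    if (bucket) bucket.push(v)
    else buckets.set(h, [v])
  })
  return Array.from(buckets.values()).filter((group) => group.length > 1)
}

const sampleCodes = ["JFK", "KFJ", "LAX"]
console.log(countUniqueHashes(sampleCodes, charSumHash)) // 2
console.log(findCollisions(sampleCodes, charSumHash)) // [["JFK", "KFJ"]]
```

Comparing `countUniqueHashes(codes, hash)` with `new Set(codes).size` gives the same kind of "1040 in, 895 out" figure, and `findCollisions` shows which inputs are responsible.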
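For short fixed-alphabet keys like these, a custom hash can avoid collisions entirely, because the whole key space fits in a 32-bit integer (there are only 26^3 = 17576 three-letter uppercase codes). The sketch below shows one way to do that with base-27 packing. The function name `packUppercaseCode` and the six-letter limit are assumptions made for this example, and wiring the result into a data type's hashing hook is not shown.

```typescript
// Packs an uppercase A-Z code of 1 to 6 letters into a unique non-negative integer.
// Letters map to digits 1..26 in base 27, so codes of different lengths
// (for example "A" and "AA") also get different values.
// 27^6 = 387,420,489, which stays below 2^31.
function packUppercaseCode(code: string): number {
  if (!/^[A-Z]{1,6}$/.test(code)) {
    throw new RangeError("expected 1 to 6 uppercase letters, got: " + code)
  }
  let n = 0
  for (let i = 0; i < code.length; i++) {
    n = n * 27 + (code.charCodeAt(i) - 64) // 'A' is 65, so 'A' -> 1
  }
  return n
}

console.log(packUppercaseCode("JFK")) // 7463
console.log(packUppercaseCode("KFJ")) // 8191
```

This only works because the inputs are short and drawn from a small alphabet. A general-purpose string hash maps an unbounded set of strings onto a fixed range of integers, so it cannot promise uniqueness, and code that relies on hashes should still fall back on an equality check.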
You are welcome to read through this thread to see what an LLM thinks about the current implementation of the string hashing algorithm.