Hash Function Collisions in String Hashing Algorithm
Hi! Recently I've been fighting a weird bug in my application. It turns out the hashing function for strings produces collisions, which I did not expect for my use case. It's easy to fix by introducing a custom hashing function for the data type, but I decided to share my experience to hear your thoughts.
First of all, the documentation doesn't mention possible collisions when using the Hash trait.
Perhaps we should reconsider the string hashing algorithm, because it doesn't produce unique hashes even for ordinary use cases. For example, I have a list of 1040 unique IATA codes, and after hashing I get only 895 unique hashes. The same thing happens with country codes.
Of course, I'm no expert in hashing algorithms, but I always thought the distribution should be good enough for common use cases like the ones I described.
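For anyone who wants to reproduce this kind of count, here is a minimal sketch of the measurement. It is written in Rust purely as an assumption, since the post only mentions a "Hash trait" and does not name the language. The helpers `hash_str` and `count_unique_hashes`, the deliberately weak `ByteSumHasher`, and the five sample codes are hypothetical illustrations, not the implementation or the data discussed above.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Deliberately weak hasher (hypothetical): it only sums the bytes it is fed,
/// so any two strings made of the same characters collide.
#[derive(Default)]
struct ByteSumHasher(u64);

impl Hasher for ByteSumHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 = self.0.wrapping_add(u64::from(b));
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Hash a single string with a fresh hasher of type `H`.
fn hash_str<H: Hasher + Default>(s: &str) -> u64 {
    let mut hasher = H::default();
    s.hash(&mut hasher);
    hasher.finish()
}

/// Count how many distinct hash values a list of keys produces.
fn count_unique_hashes<H: Hasher + Default>(keys: &[&str]) -> usize {
    keys.iter()
        .map(|&k| hash_str::<H>(k))
        .collect::<HashSet<u64>>()
        .len()
}

fn main() {
    // Placeholder sample, not the 1040-code list from the post.
    let codes = ["AMS", "SAM", "MAS", "JFK", "LHR"];
    let strong = count_unique_hashes::<DefaultHasher>(&codes);
    let weak = count_unique_hashes::<ByteSumHasher>(&codes);
    println!(
        "{} keys: {} unique (DefaultHasher), {} unique (byte sum)",
        codes.len(),
        strong,
        weak
    );
}
```

Any function that maps arbitrary strings to a fixed-width integer has to collide for some inputs, which is why hash-based containers also compare keys for equality. Still, if the hash in question is 64 bits wide and reasonably uniform, 1040 short keys should essentially never collide, so a result like 895 unique values would point to a weak distribution rather than bad luck.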
You're welcome to join this thread and see what an LLM thinks about the current implementation of the string hashing algorithm.
