Hash Function Collisions in String Hashing Algorithm

Hi! Recently I've been fighting a weird bug in my application. It turns out that the hashing function for strings produces collisions that I did not expect for my use case. It's easy to fix by introducing a custom hashing function for the data type, but I decided to share my experience here to hear your thoughts.
First of all, the documentation doesn't mention possible collisions when using the Hash trait.
Perhaps we should reconsider the string hashing algorithm, because it doesn't produce unique hashes even for ordinary use cases. For example, I have a list of 1040 unique IATA codes, and after hashing I get only 895 unique hashes. The same thing happens with country codes.
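For anyone who wants to reproduce this kind of measurement, here is a minimal sketch in Python. It is not the hashing implementation discussed in this thread: `count_unique_hashes` and `weak_hash` are hypothetical helpers, the weak hash is deliberately bad so that collisions show up, and the input is every three-letter uppercase string rather than the real IATA list.

```python
from itertools import product
from string import ascii_uppercase


def count_unique_hashes(items, hash_fn):
    """Return (number of distinct items, number of distinct hash values)."""
    distinct_items = set(items)
    distinct_hashes = {hash_fn(item) for item in distinct_items}
    return len(distinct_items), len(distinct_hashes)


def weak_hash(s):
    # Hypothetical, deliberately weak hash: the sum of the byte values.
    # Any two anagrams (e.g. "ABC" and "CBA") collide by construction.
    return sum(s.encode("utf-8"))


# Stand-in data (assumption): all three-letter uppercase codes,
# not the actual list of 1040 IATA codes from the post.
codes = ["".join(p) for p in product(ascii_uppercase, repeat=3)]

n_items, n_hashes = count_unique_hashes(codes, weak_hash)
print(n_items, n_hashes)  # 17576 distinct codes, 76 distinct hash values
```

Swapping `weak_hash` for the hash function under test, and `codes` for the real list, gives the same kind of "unique inputs vs. unique hashes" comparison as the numbers above.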
Of course, I'm no expert in hashing algorithms, but I always thought the distribution should be strong enough for common use cases such as the one I described.
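To put a rough number on that expectation, here is a back-of-the-envelope birthday estimate, again as a Python sketch. It assumes an ideal hash whose outputs are uniformly distributed over `2**bits` values. The 64-bit width is my assumption, since I haven't checked the width of the actual implementation, and `expected_colliding_pairs` is a hypothetical helper name.

```python
def expected_colliding_pairs(n, bits):
    """Expected number of colliding pairs among n items, assuming an
    ideal hash that is uniform over 2**bits possible values."""
    pairs = n * (n - 1) / 2   # number of unordered pairs of items
    return pairs / 2**bits    # each pair collides with probability 1 / 2**bits


# 1040 items with an assumed 64-bit ideal hash.
print(expected_colliding_pairs(1040, 64))
```

Under those assumptions the result is on the order of 1e-14, so going from 1040 unique inputs to 895 unique hashes is far more than an ideal 64-bit hash would be expected to produce.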
You're welcome to join this thread and see what an LLM thinks about the current implementation of the string hashing algorithm.