Implementing a good GetHashCode
If you've ever implemented GetHashCode you probably did it the way suggested in MSDN which is using XOR. And if you use R# you might have seen that it generates a different GetHashCode using prime numbers. So what should you do? I think there are three properties you want to aim for when it comes to implement GetHashCode, in order of importance.
- GetHashCode must be consistent. This should be a no-brainer. It simply means that for two objects that are equal, the should generate the same hash code every time.
- GetHashCode should provide random distribution output. Random here just means that it appears to be random, not that it is truly random since that would break rule number one. Also the output of GetHashCode is only an integer. That is only 32 bits. So if your object has a lot more possible states there will be hash collisions and you don't want them to be likely. For example if you have an object that consists of two different integers that you just XOR together, then (a ^ b) and (b ^ a) will return the same hash. You most definitely don't want this.
- GetHashCode should be fast. Since GetHashCode is used in a lot of collection classes you don't want this method to be slow.
Given all this it turns out that the approach R# uses is pretty powerful into achieving all these goals. But the best possible approach is to use pick an algorithm that is suitable for your data. A few examples:
- If your object have up to int.MAX possible values then a simple bit shift is probably a good way. Example: our object have two short values: GetHashCode() { return (a << 16) | b; }
- If your object consists of other objects that generate good distribution hash codes and XOR or just multiplying them together is likely good enough for you. Please note that the GetHashCode for a lot of basic types do not generate good distribution values (especially bool and integers).
- In all other cases and especially when in doubt it is preferable to go with an implementation like the one R# generates for you.
If you want to read more about a number of different hashing options there a is a pretty good list here with lots of good information.