Hash Code - Introduction
By Tomasz Kuczma
Welcome to the first article of the Hash Code miniserie where you can read how the hash codes (non-cryptographic hashing algorithms) and hash collections work in different programming languages. Every software engineer uses hash collections like Python’s dictionary, Java’s hash map or C++’s unordered map. You get them to know pretty early in your learning path as they are key data structures to solve many problems effectively but are you sure that you know them well? In this miniserie, I will shed some light on this subject.
What is the hash code
As mention above a hash code is the result of a non-cryptographic hashing algorithm so it cannot be used to provide security.
It maps a value (let it be object or primitive value) to the fixed-length binary value in a deterministic way.
In Java, it is a primitive 32-bit integer provided by every object via hashCode()
method.
How does the hash codes work with Java primitives?
Java does not treat primitives the same way as objects.
You cannot call hashCode
method directly on the value but you can get it via wrapper class e.g. Integer.hashCode(fooInt)
.
For int
it is just the value of this integer. Types shorter than 4 bytes are casted to int
so there is no precision lost.
Long is a 64-bit number so it has to somehow “compress” itself:
public static int hashCode(long value) {
return (int)(value ^ (value >>> 32));
}
Note that values that could be represented by 32-bit integer have the same hashcodes as plain integers.
Floating points numbers are a little more tricky. Java uses floatToIntBits
(it makes sense since floats are 4 bytes) but there special cases for NaN
value:
public static int hashCode(float value) {
return floatToIntBits(value);
}
public static int floatToIntBits(float value) {
if (!isNaN(value)) {
return floatToRawIntBits(value);
}
return 0x7fc00000;
}
public static boolean isNaN(float v) {
return (v != v);
}
public static native int floatToRawIntBits(float var0);
Note how isNaN
is implemented.
Quite interesting is magic number used for NaN
. The quick experiment shows us that NaN
is greater than positive infinity and Float.MAX_VALUE
.
Table of selected float
values:
Code | Hex value | Binary value | Decimal value |
---|---|---|---|
Float.floatToRawIntBits(POSITIVE_INFINITY) | 0x7f800000 | 11111111-00000000-00000000-00000000 | 2139095040 |
Float.floatToRawIntBits(Float.MAX_VALUE) | 0x7f7fffff | 11111110-11111111-11111111-11111111 | 2139095039 |
Float.floatToRawIntBits(Float.NaN) | 0x7fc00000 | 11111111-10000000-00000000-00000000 | 2143289344 |
Float.hashcode(Float.NaN) | 0x7fc00000 | 11111111-10000000-00000000-00000000 | 2143289344 |
Integer.MAX_VALUE | 0x7fffffff | 11111111-11111111-11111111-11111111 | 2147483647 |
Note that bit representations of float’s Nan, MAX_VALUE, POSITIVE_INFINITY differ significantly and do not reach a maximum 32-bit integer value. All as per IEEE 754 standard.
The distinguished number is used for NaN
to hash it always to the same value, no matter how we get it. Remember that various unpermitted operations can produce NaN
but with different bit representation.
double
uses the same algorithm as float
but stores bits in long
and then does the same XOR trick as long
.
boolean
is actually a small surprise as it has own magic numbers too even though it represented only one of two states:
public static int hashCode(boolean value) {
return value ? 1231 : 1237;
}
Why exactly those two magic numbers has been chosen? They are just two significantly large prime numbers. You will understand why it is important after reading the next article :)
What about objects?
The default hashCode()
implementation is inherited from Object
class and it is a native method.
A lot of people used to say that this value is related to the address of the object in the memory.
The truth is in documentation
“The hashCode may or may not be implemented as some function of an object’s memory address at some point in time.”
If you override it in your class hierarchy and for some reason need to get hash code value from the original object implementation then you can use System.identityHashCode(Object)
.
It is up to the programmer how he implements hash code in own classes. The default implementation is inherited from Object
and should be overwritten. I will explain the reason later.
What about enums?
enum
is a special class in Java. It does not provide any new hash code implementation - it just uses Object
one.
This is sufficient and there is no need to change that as enums are singletons and this is forced by Java itself.
It’s not the end
Stay tuned for the next article when I will write how hash codes are generated for Java collections
Note that all code examples come from OpenJDK 11.
Software engineer with a passion. Interested in computer networks and large-scale distributed computing. He loves to optimize and simplify software on various levels of abstraction starting from memory ordering through non-blocking algorithms up to system design and end-user experience. Geek. Linux user.