Hash Code - Introduction

By Tomasz Kuczma

January 14, 2020

Welcome to the first article of the Hash Code miniserie where you can read how the hash codes (non-cryptographic hashing algorithms) and hash collections work in different programming languages. Every software engineer uses hash collections like Python’s dictionary, Java’s hash map or C++’s unordered map. You get them to know pretty early in your learning path as they are key data structures to solve many problems effectively but are you sure that you know them well? In this miniserie, I will shed some light on this subject.

What is the hash code

As mention above a hash code is the result of a non-cryptographic hashing algorithm so it cannot be used to provide security. It maps a value (let it be object or primitive value) to the fixed-length binary value in a deterministic way. In Java, it is a primitive 32-bit integer provided by every object via hashCode() method.

How does the hash codes work with Java primitives?

Java does not treat primitives the same way as objects. You cannot call hashCode method directly on the value but you can get it via wrapper class e.g. Integer.hashCode(fooInt). For int it is just the value of this integer. Types shorter than 4 bytes are casted to int so there is no precision lost. Long is a 64-bit number so it has to somehow “compress” itself:

public static int hashCode(long value) {
	return (int)(value ^ (value >>> 32));
}

Note that values that could be represented by 32-bit integer have the same hashcodes as plain integers.

Floating points numbers are a little more tricky. Java uses floatToIntBits (it makes sense since floats are 4 bytes) but there special cases for NaN value:

public static int hashCode(float value) {
	return floatToIntBits(value);
}

public static int floatToIntBits(float value) {
	if (!isNaN(value)) {
		return floatToRawIntBits(value);
	}
	return 0x7fc00000;
}

public static boolean isNaN(float v) {
	return (v != v);
}

public static native int floatToRawIntBits(float var0);

Note how isNaN is implemented.

Quite interesting is magic number used for NaN. The quick experiment shows us that NaN is greater than positive infinity and Float.MAX_VALUE. Table of selected float values:

Code	Hex value	Binary value	Decimal value
Float.floatToRawIntBits(POSITIVE_INFINITY)	0x7f800000	11111111-00000000-00000000-00000000	2139095040
Float.floatToRawIntBits(Float.MAX_VALUE)	0x7f7fffff	11111110-11111111-11111111-11111111	2139095039
Float.floatToRawIntBits(Float.NaN)	0x7fc00000	11111111-10000000-00000000-00000000	2143289344
Float.hashcode(Float.NaN)	0x7fc00000	11111111-10000000-00000000-00000000	2143289344
Integer.MAX_VALUE	0x7fffffff	11111111-11111111-11111111-11111111	2147483647

Note that bit representations of float’s Nan, MAX_VALUE, POSITIVE_INFINITY differ significantly and do not reach a maximum 32-bit integer value. All as per IEEE 754 standard.

The distinguished number is used for NaN to hash it always to the same value, no matter how we get it. Remember that various unpermitted operations can produce NaN but with different bit representation.

double uses the same algorithm as float but stores bits in long and then does the same XOR trick as long.

boolean is actually a small surprise as it has own magic numbers too even though it represented only one of two states:

public static int hashCode(boolean value) {
	return value ? 1231 : 1237;
}

Why exactly those two magic numbers has been chosen? They are just two significantly large prime numbers. You will understand why it is important after reading the next article :)

What about objects?

The default hashCode() implementation is inherited from Object class and it is a native method. A lot of people used to say that this value is related to the address of the object in the memory. The truth is in documentation “The hashCode may or may not be implemented as some function of an object’s memory address at some point in time.” If you override it in your class hierarchy and for some reason need to get hash code value from the original object implementation then you can use System.identityHashCode(Object).

It is up to the programmer how he implements hash code in own classes. The default implementation is inherited from Object and should be overwritten. I will explain the reason later.

What about enums?

enum is a special class in Java. It does not provide any new hash code implementation - it just uses Object one. This is sufficient and there is no need to change that as enums are singletons and this is forced by Java itself.

It’s not the end

Stay tuned for the next article when I will write how hash codes are generated for Java collections

Note that all code examples come from OpenJDK 11.

Article by Tomasz Kuczma

Software engineer with a passion. Interested in computer networks and large-scale distributed computing. He loves to optimize and simplify software on various levels of abstraction starting from memory ordering through non-blocking algorithms up to system design and end-user experience. Geek. Linux user.

The views I express are my alone and they do not necessarily express the views of my employer or ex-employers. They are not investment advice nor based on any non-public information of any kind. Poglądy, które wyrażam, są tylko moje i niekoniecznie wyrażają opinie mojego pracodawcy lub byłych pracodawców. Nie są poradami inwestycyjnymi ani nie opierają się na jakichkolwiek niepublicznych informacjach.