Introduction
Mapping partitions (or keys) to slices (or buckets) in a distributed or sharded system has a large impact on performance.
Different hash-based solutions for this problem exist; each has drawbacks.
The goal is to choose a hash function that is simple to implement and gives acceptable performance with as few collisions as possible.
The problem is defined as follows: given a logical partition number $P$, compute the corresponding slice number $S = s(P)$, where $0 \le P < 2^{32}$, $0 \le S < M$, and $M$ denotes the table size (e.g. $M = 2^{14}$).
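As a minimal illustration of this interface (a baseline sketch, not the function developed later in this text), the naive choice is a plain modulo reduction, shown below with the example table size $M = 2^{14}$:

```c
#include <stdint.h>

#define M (1u << 14)  /* table size, M = 2^14 as in the example above */

/* Naive baseline: s(P) = P mod M. With M a power of two this keeps
 * only the low 14 bits of P, so any regularity in those bits (e.g.
 * sequential or stride-aligned partition numbers) maps straight
 * into collisions. */
static inline uint32_t s_naive(uint32_t p)
{
    return p % M;  /* equals p & (M - 1) for power-of-two M */
}
```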
In our “binary” world, the assumption that the input data (partition numbers) are uniformly distributed does not always hold.
Therefore, the hash function $s: P \to S$ must be designed carefully: in addition to producing uniformly distributed hash values (slice numbers), it has to introduce some randomness.
Luckily, this field is well studied: the well-known textbooks by Cormen et al. and Knuth give a good introduction to it. The latter offers the more detailed treatment; therefore, without hesitation, we make use of Knuth’s book (Section 6.4), usually spending two paragraphs explaining what the book says in a single sentence.
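For concreteness, here is a sketch of the multiplicative (Fibonacci) hashing scheme described in that section of Knuth: the key is multiplied by an integer approximation of $2^{32} \cdot (\sqrt{5}-1)/2$, and the high bits of the product select the slice. The constant and shift below are the standard choices for 32-bit keys and $M = 2^{14}$, not necessarily the exact parameters adopted later in this text:

```c
#include <stdint.h>

/* Multiplicative (Fibonacci) hashing, Knuth Section 6.4:
 * s(P) = floor(M * frac(P * A / 2^32)), where A / 2^32 is close to
 * the golden ratio conjugate (sqrt(5) - 1) / 2. For M = 2^14 the
 * slice number is simply the top 14 bits of the 32-bit product. */
#define FIB_A 2654435769u  /* round(2^32 * (sqrt(5) - 1) / 2) */

static inline uint32_t s_fib(uint32_t p)
{
    /* uint32_t multiplication wraps mod 2^32, which computes
     * frac(P * A / 2^32) scaled by 2^32 for free. */
    return (p * FIB_A) >> (32 - 14);  /* keep the high 14 bits */
}
```

The multiplication scrambles the low-order bits of $P$ into the high-order bits of the product, which is why taking the top bits (rather than the bottom bits, as plain modulo does) spreads patterned inputs more evenly across slices.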