Introduction
Mapping partitions (or keys) to slices (or buckets) in a distributed or sharded system has a large impact on performance.
Different hash-based solutions for this problem exist; each has drawbacks.
The goal is to choose a hash function that is simple to implement and gives acceptable performance with as few collisions as possible.
The problem is defined as follows: given a logical partition number $P$, compute the corresponding slice number $S = s(P)$, where $0 \le P < 2^{32}$, $0 \le S < M$, and $M$ denotes the table size (e.g. $M = 2^{14}$).
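As a minimal illustration of this interface (a baseline sketch, not the function developed later in this text), the naive choice is a plain modulo reduction, shown below with the example table size $M = 2^{14}$:

```c
#include <stdint.h>

#define M (1u << 14)  /* table size, M = 2^14 as in the example above */

/* Naive baseline: s(P) = P mod M. With M a power of two this keeps
 * only the low 14 bits of P, so any regularity in those bits (e.g.
 * sequential or stride-aligned partition numbers) maps straight
 * into collisions. */
static inline uint32_t s_naive(uint32_t p)
{
    return p % M;  /* equals p & (M - 1) for power-of-two M */
}
```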
In our “binary” world, the assumption that the input data (partition numbers) are uniformly distributed does not always hold.
Therefore, the hash function $s: P \to S$ must be designed carefully: in addition to producing uniformly distributed hash values (slice numbers), it has to introduce some randomness.
Luckily, this field is well studied: the well-known textbooks by Cormen et al. and Knuth give a good introduction to it. The latter offers the more detailed treatment; therefore, without hesitation, we make use of Knuth’s book (Section 6.4), usually spending two paragraphs explaining what the book says in a single sentence.
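For concreteness, here is a sketch of the multiplicative (Fibonacci) hashing scheme described in that section of Knuth: the key is multiplied by an integer approximation of $2^{32} \cdot (\sqrt{5}-1)/2$, and the high bits of the product select the slice. The constant and shift below are the standard choices for 32-bit keys and $M = 2^{14}$, not necessarily the exact parameters adopted later in this text:

```c
#include <stdint.h>

/* Multiplicative (Fibonacci) hashing, Knuth Section 6.4:
 * s(P) = floor(M * frac(P * A / 2^32)), where A / 2^32 is close to
 * the golden ratio conjugate (sqrt(5) - 1) / 2. For M = 2^14 the
 * slice number is simply the top 14 bits of the 32-bit product. */
#define FIB_A 2654435769u  /* round(2^32 * (sqrt(5) - 1) / 2) */

static inline uint32_t s_fib(uint32_t p)
{
    /* uint32_t multiplication wraps mod 2^32, which computes
     * frac(P * A / 2^32) scaled by 2^32 for free. */
    return (p * FIB_A) >> (32 - 14);  /* keep the high 14 bits */
}
```

The multiplication scrambles the low-order bits of $P$ into the high-order bits of the product, which is why taking the top bits (rather than the bottom bits, as plain modulo does) spreads patterned inputs more evenly across slices.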