Minimal Perfect Hash Functions
A hash function: collisions!
A hash function: injective / perfect
A hash function: bijective / minimal & perfect
Problem statement
Given a set of \(n\) keys \(K\subseteq \mathbb K\), build a function \(h\) satisfying
- \(h(K) = \{0, \dots, n-1\}\): \(h\) is minimal & perfect.
Why? Array of \(n\) values as space-efficient value store.
Solutions take \(\log_2(e) {=} 1.443\) to \(3\) bits/key.
Goals of PtrHash:
- at most 3 bits/key (you probably store ≥16 bit values anyway),
- fast to evaluate,
- fast to construct.
Previous & parallel work
- FCH: Fox, Chen, and Heath (1992)
- CHD: Belazzougui, Botelho, and Dietzfelbinger (2009)
- PTHash: Pibiri and Trani (2021)
- PHOBIC: Hermann et al. (2024)
- PHast: Beling and Sanders (2025)
- Consensus: Lehmann, Sanders, et al. (2025)
- Survey: Lehmann, Mueller, et al. (2025)
\[
\newcommand{\lo}{\mathsf{lo}}
\newcommand{\hi}{\mathsf{hi}}
\]
Naive: \(n^n / n!\approx e^n\) tries, \(n\cdot\log_2 e\) bits
Build \(h\) incrementally using buckets
Build \(h\) incrementally
Build \(h\) incrementally
Build \(h\) incrementally
Build \(h\) incrementally
Build \(h\) incrementally
Build \(h\) incrementally
Build \(h\) incrementally
Build \(h\) incrementally
Build \(h\) incrementally
Build \(h\) incrementally
Build \(h\) incrementally
Build \(h\) incrementally
Better: assign buckets from large to small
Remapping: avoiding the hard part
Remapping: add \(1\%\) extra slots
Remapping: add \(1\%\) extra slots
Remapping: map extra slots back
Remapping: map extra slots back
PTHash
- Iterate seed values (pilots) until a hit is found.
- Dictionary compression to handle large values.
PtrHash
- Iterate seed values (pilots) until a hit is found.
- Dictionary compression to handle large values.
- “Inlined” 8-bit pilots for fast lookup
- Handle impossible buckets by evicting others
- When none of the \(2^8=256\) seeds are collision free, choose the one with
minimal number of collisions, and break ties towards evicting small buckets.
- Eviction: “unassign” a seed, and put the bucket back in the queue.
- Distributes the entropy from the hard buckets over all seeds.
<no animation here :(>
\(\varepsilon\) cost sharding
- \(S\) shards
- PTHash, PHOBIC: shards of expected size \(a = n/S\), real size \(a_i\)
- \(a / \lambda\) buckets per shard, \(a_i/\alpha\) slots per shard
- Store shard offsets
- PtrHash: \(a / \lambda\) buckets per shard, \(a /\alpha\) slots per shard
- Shard size follows from Sebastiano’s formula
PtrHash with array vs VFUNC
- VFUNC, storing \(b\) bit values:
- 10% memory overhead: \(b n + 0.1 b n\)
- 3 independent/parallel reads
- PtrHash:
- 3 bits overhead: \(b n + 3n\)
- 2 dependent/sequential reads
Construction time vs. size
Comparison
- 300M variable length string keys
- <3 bits/key
- fastest queries by 2x to 4x
- also fast construction
Queries: looping, batching, streaming
A simple loop
- Setup: PtrHash on \(10^9\) integer keys; 300 MB.
- Fact: Most queries need uncached data from RAM.
- Fact: Reading RAM takes 80 ns
- Question: how long does this take? (>90, ≈80, 40-60, <30)
for k in keys {
h.query(k);
}
- Answer: 12 ns!
- The CPU works on 7 iterations in parallel!
- To measure latency:
let mut seed = 0;
for k in keys {
seed ^= h.query(seed ^ k);
}
Batching & Streaming
Batching:
for batch in keys.array_chunks::<32>() {
for k in batch {
h.prefetch_seed(k);
}
for k in batch {
h.query(k);
}
}
Streaming:
for i in 0..n {
h.prefetch_seed(keys[i+32]);
h.query(keys[i]);
}
Batching with Prefetching
Batching with Prefetching
Conclusions
- Simple \(\implies\) fast
- Partition data into cache-sized chunks
- Streaming & prefetching for max throughput
Simple operations
- Hashing via
\[h(x) = \lo(C\cdot x) = (C\cdot x)\mod 2^{64}.\]
- Quadratic bucket function
\[\gamma(x) = \hi(x\cdot x) = \lfloor x^2/2^{64}\rfloor.\]
- Use fastmod for \(\mod n\)