PtrHash

Minimal Perfect Hash Functions

A set of \(n\) keys

A hash function

A hash function: collisions!

A hash function: injective / perfect

A hash function: bijective / minimal & perfect

Problem statement

Given a set of \(n\) keys \(K\subseteq \mathbb K\), build a function \(h\) satisfying

  • \(h(K) = \{0, \dots, n-1\}\): \(h\) is minimal & perfect.

Why? Array of \(n\) values as space-efficient value store.

Solutions take \(\log_2(e) \approx 1.443\) to \(3\) bits/key.
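
A minimal sketch of that value-store use, with a placeholder closure `h` standing in for any MPHF (not PtrHash's actual API):

fn lookup<'a, V>(h: impl Fn(&str) -> usize, values: &'a [V], key: &str) -> &'a V {
    // h maps each original key to a distinct index in 0..n, so `values`
    // needs no empty slots and stores no keys.
    // For a key outside the original set the returned value is arbitrary.
    &values[h(key)]
}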

Goals of PtrHash:

  • at most 3 bits/key (you probably store ≥16 bit values anyway),
  • fast to evaluate,
  • fast to construct.

Previous & parallel work

  • FCH: Fox, Chen, and Heath (1992)
  • CHD: Belazzougui, Botelho, and Dietzfelbinger (2009)

  • PTHash: Pibiri and Trani (2021)
  • PHOBIC: Hermann et al. (2024)

  • PHast: Beling and Sanders (2025)

  • Consensus: Lehmann, Sanders, et al. (2025)

  • Survey: Lehmann, Mueller, et al. (2025)

\[ \newcommand{\lo}{\mathsf{lo}} \newcommand{\hi}{\mathsf{hi}} \]

Naive: try seeded hashes

Naive: \(n^n / n!\approx e^n\) tries, \(n\cdot\log_2 e\) bits
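
A minimal sketch of this naive construction, with an arbitrary seeded multiply-xor hash standing in (not the hash PtrHash uses):

// Try global seeds until the seeded hash maps the keys bijectively onto 0..n.
// Expected ~e^n attempts, so only feasible for tiny n; the winning seed
// needs about n*log2(e) bits to write down.
fn find_seed(keys: &[u64]) -> u64 {
    let n = keys.len();
    'seeds: for seed in 0u64.. {
        let mut taken = vec![false; n];
        for &k in keys {
            // Arbitrary seeded hash for illustration.
            let slot = (k ^ seed).wrapping_mul(0x9E3779B97F4A7C15) as usize % n;
            if taken[slot] {
                continue 'seeds; // collision: this seed fails
            }
            taken[slot] = true;
        }
        return seed; // bijective: every key got its own slot
    }
    unreachable!()
}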

Construction

Build \(h\) incrementally using buckets

Better: assign buckets from large to small

Skew bucket sizes

Remapping: avoiding the hard part

Remapping: add \(1\%\) extra slots

Remapping: map extra slots back
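
A minimal sketch of building the remap table, assuming slots have already been assigned and `taken[s]` marks occupied slots (about \(n/\alpha\) slots in total; names are illustrative):

// Keys placed in the ~1% of slots at index >= n are redirected back into
// the slots < n that stayed free, in order.
fn build_remap(taken: &[bool], n: usize) -> Vec<usize> {
    let mut free_slots = (0..n).filter(|&s| !taken[s]);
    taken[n..]
        .iter()
        .map(|&used| {
            if used {
                // An occupied extra slot maps to the next free slot below n.
                free_slots.next().unwrap()
            } else {
                // An empty extra slot is never queried; any value will do.
                usize::MAX
            }
        })
        .collect()
}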

Queries

Query: 1. Hash key

Query: 2. Lookup bucket

Query: 3. Compute slot
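
Put together, a query is only a handful of operations. A minimal sketch; field names, constants, and the pilot mixing are illustrative, not PtrHash's exact internals:

struct PtrHashSketch {
    pilots: Vec<u8>,   // one 8-bit pilot per bucket
    remap: Vec<u32>,   // redirects for the occupied slots >= n
    n: usize,          // number of keys
    slots: usize,      // ~ n/alpha
    buckets: usize,    // ~ n/lambda
}

impl PtrHashSketch {
    fn query(&self, key: u64) -> usize {
        // 1. Hash the key (multiplicative hash; see the bonus slides).
        let h = key.wrapping_mul(0xC2B2AE3D27D4EB4F);
        // 2. Look up the key's bucket and its pilot
        //    (the skewed bucket function is elided here).
        let b = ((h as u128 * self.buckets as u128) >> 64) as usize;
        let pilot = self.pilots[b] as u64;
        // 3. Compute the slot from the hash and the pilot.
        let hp = h ^ pilot.wrapping_mul(0x9E3779B97F4A7C15);
        let slot = ((hp as u128 * self.slots as u128) >> 64) as usize;
        // Slots at or beyond n are remapped back below n.
        if slot < self.n { slot } else { self.remap[slot - self.n] as usize }
    }
}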

PTHash

  • Iterate seed values (pilots) until a hit is found.
  • Dictionary compression to handle large values.

PtrHash

  • Iterate seed values (pilots) until a hit is found.
  • Dictionary compression to handle large values.

  • “Inlined” 8-bit pilots for fast lookup
  • Handle impossible buckets by evicting others
    • When none of the \(2^8=256\) seeds are collision free, choose the one with minimal number of collisions, and break ties towards evicting small buckets.
    • Eviction: “unassign” a seed, and put the bucket back in the queue.
    • Distributes the entropy from the hard buckets over all seeds.

<no animation here :(>
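
In lieu of the animation, a minimal sketch of the eviction loop, assuming `buckets[b]` holds the hashes of bucket \(b\) (processed from large to small) and `slot(h, pilot)` is the slot computation from the query. This simplifies the real algorithm, e.g. collisions within a single bucket are ignored:

use std::collections::VecDeque;

// For each bucket, search the 256 pilots and evict colliding buckets
// when no pilot is collision-free.
fn assign_pilots(
    buckets: &[Vec<u64>],
    slot: impl Fn(u64, u8) -> usize,
    n_slots: usize,
) -> Vec<u8> {
    let mut pilots = vec![0u8; buckets.len()];
    // Which bucket currently owns each slot, if any.
    let mut owner: Vec<Option<usize>> = vec![None; n_slots];
    let mut queue: VecDeque<usize> = (0..buckets.len()).collect();
    while let Some(b) = queue.pop_front() {
        // Pick the pilot with the fewest collisions; the real algorithm
        // also breaks ties towards evicting small buckets.
        let pilot = (0u8..=255)
            .min_by_key(|&p| {
                buckets[b]
                    .iter()
                    .filter(|&&h| owner[slot(h, p)].is_some())
                    .count()
            })
            .unwrap();
        pilots[b] = pilot;
        for &h in &buckets[b] {
            let s = slot(h, pilot);
            if let Some(evicted) = owner[s].replace(b) {
                if evicted != b {
                    // "Unassign" the evicted bucket and put it back in the queue.
                    for &h2 in &buckets[evicted] {
                        let s2 = slot(h2, pilots[evicted]);
                        if owner[s2] == Some(evicted) {
                            owner[s2] = None;
                        }
                    }
                    queue.push_back(evicted);
                }
            }
        }
    }
    pilots
}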

\(\varepsilon\) cost sharding

  • \(S\) shards
  • PTHash, PHOBIC: shards of expected size \(a = n/S\), real size \(a_i\)
    • \(a / \lambda\) buckets per shard, \(a_i/\alpha\) slots per shard
    • Store shard offsets
  • PtrHash: \(a / \lambda\) buckets per shard, \(a /\alpha\) slots per shard
    • Direct indexing
  • Shard size follows from Sebastiano’s formula
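
A minimal sketch of the difference (names illustrative): with a fixed \(a/\alpha\) slots per shard, the global slot is a single multiply-add, whereas shards of real size \(a_i\) need a stored offset per shard.

// Direct indexing (PtrHash): every shard gets the same a/alpha slots,
// so no per-shard offset table is needed.
fn global_slot_direct(shard: usize, slots_per_shard: usize, local_slot: usize) -> usize {
    shard * slots_per_shard + local_slot
}

// With shards of real size a_i (PTHash/PHOBIC style), a prefix sum of
// shard sizes has to be stored and read at query time.
fn global_slot_offsets(offsets: &[usize], shard: usize, local_slot: usize) -> usize {
    offsets[shard] + local_slot
}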

PtrHash with array vs VFUNC

  • VFUNC, storing \(b\) bit values:
    • 10% memory overhead: \(b n + 0.1 b n\)
    • 3 independent/parallel reads
  • PtrHash:
    • 3 bits overhead: \(b n + 3n\)
    • 2 dependent/sequential reads
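
Worked out for, say, \(b = 32\)-bit values: VFUNC needs about \(1.1\cdot 32 = 35.2\) bits/key, while PtrHash needs \(32 + 3 = 35\) bits/key.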

Results

Construction time vs. size

Comparison

  • 300M variable length string keys

  • <3 bits/key
  • fastest queries by 2x to 4x
  • also fast construction

Queries: looping, batching, streaming

A simple loop

  • Setup: PtrHash on \(10^9\) integer keys; 300 MB.
  • Fact: Most queries need uncached data from RAM.
  • Fact: Reading RAM takes 80 ns.
  • Question: how long does this take? (>90, ≈80, 40-60, <30)
for k in keys {
    h.query(k);
}
  • Answer: 12 ns!
    • The CPU works on about 7 iterations in parallel (80 ns / 12 ns ≈ 7)!
  • To measure latency:
// Each query now depends on the previous result, so the loads
// can no longer overlap.
let mut seed = 0;
for k in keys {
    seed ^= h.query(seed ^ k);
}

Batching & Streaming

Batching:

for batch in keys.array_chunks::<32>() {
    for k in batch {
        h.prefetch_seed(k);
    }
    for k in batch {
        h.query(k);
    }
}

Streaming:

for i in 0..n {
    // Prefetch the pilot needed 32 iterations ahead
    // (index clamped at the end of the array).
    h.prefetch_seed(keys[(i + 32).min(n - 1)]);
    h.query(keys[i]);
}

Batching & Prefetching

Multithreading

Conclusions

  • Simple \(\implies\) fast
  • Partition data into cache-sized chunks
  • Streaming & prefetching for max throughput

Brought to you by

Bonus slides

Simple operations

  • Hashing via \[h(x) = \lo(C\cdot x) = (C\cdot x)\mod 2^{64}.\]
  • Quadratic bucket function \[\gamma(x) = \hi(x\cdot x) = \lfloor x^2/2^{64}\rfloor.\]
  • Use fastmod for \(\mod n\)
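
A minimal sketch of the first two operations in Rust (the constant \(C\) is an arbitrary odd 64-bit example, not the one PtrHash uses; fastmod itself is not reproduced here):

const C: u64 = 0x9E3779B97F4A7C15; // arbitrary odd example constant

// h(x) = lo(C*x): the low 64 bits of the product, i.e. (C*x) mod 2^64.
fn hash(x: u64) -> u64 {
    C.wrapping_mul(x)
}

// gamma(x) = hi(x*x): the high 64 bits of the 128-bit square,
// i.e. floor(x^2 / 2^64). Squaring skews the output towards 0,
// which is what produces the skewed bucket sizes.
fn bucket_fn(x: u64) -> u64 {
    ((x as u128 * x as u128) >> 64) as u64
}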

Overview

Parameters