PtrHash
Minimal Perfect Hashing at RAM Throughput

Ragnar {Groot Koerkamp}

@curiouscoding.nl

SEA 2025, July 24, Venice

curiouscoding.nl/papers/ptrhash.pdf curiouscoding.nl/slides/ptrhash

Minimal Perfect Hash Functions

A set of \(n\) keys

h-keys.svg

A hash function

h-hash.svg

A hash function: collisions!

h-collision.svg

A hash function: injective / perfect

h-injection.svg

A hash function: bijective / minimal & perfect

h-bijection.svg

Problem statement

Given a set of \(n\) keys \(K\subseteq \mathbb K\), build a function \(h\) satisfying

  • \(h(K) = \{0, \dots, n-1\}\): \(h\) is minimal & perfect.

Why? Array of \(n\) values as space-efficient value store.

Solutions take \(\log_2(e) {=} 1.443\) to \(3\) bits/key.

Goals of PtrHash:

  • at most 3 bits/key (you probably store ≥16 bit values anyway),
  • fast to evaluate,
  • fast to construct.

Previous & parallel work

  • FCH: Fox, Chen, and Heath (1992)
  • CHD: Belazzougui, Botelho, and Dietzfelbinger (2009)

  • PTHash: Pibiri and Trani (2021)
  • PHOBIC: Hermann et al. (2024)

  • PHast: Beling and Sanders (2025)
  • Consensus: Lehmann, Sanders, et al. (2025)
  • Survey: Lehmann, Mueller, et al. (2025)

\[ \newcommand{\lo}{\mathsf{lo}} \newcommand{\hi}{\mathsf{hi}} \]

Naive: try seeded hashes

n-1.svg

Naive: try seeded hashes

n-2.svg

Naive: try seeded hashes

n-3.svg

Naive: \(n^n / n!\approx e^n\) tries, \(n\cdot\log_2 e\) bits

n-4.svg

Construction

Build \(h\) incrementally using buckets

buckets-before.svg

Build \(h\) incrementally

incremental-1.svg

Build \(h\) incrementally

incremental-2.svg

Build \(h\) incrementally

incremental-3.svg

Build \(h\) incrementally

incremental-4.svg

Build \(h\) incrementally

incremental-5.svg

Build \(h\) incrementally

incremental-6.svg

Build \(h\) incrementally

incremental-7.svg

Build \(h\) incrementally

incremental-8.svg

Build \(h\) incrementally

incremental-9.svg

Build \(h\) incrementally

incremental-10.svg

Build \(h\) incrementally

incremental-11.svg

Build \(h\) incrementally

incremental-12.svg

Better: assign buckets from large to small

incremental-12.svg

Skew bucket sizes

bucket-sizes.svg

Remapping: avoiding the hard part

incremental-12.svg

Remapping: add \(1\%\) extra slots

remap-1.svg

Remapping: add \(1\%\) extra slots

remap-2.svg

Remapping: map extra slots back

remap-3.svg

Remapping: map extra slots back

remap-4.svg

Queries

Query: 1. Hash key

query-1.svg

Query: 2. Lookup bucket

query-2.svg

Query: 3. Compute slot

query-3.svg

PTHash

  • Iterate seed values (pilots) until a hit is found.
  • Dictionary compression to handle large values.

dictionary.svg

PtrHash

  • Iterate seed values (pilots) until a hit is found.
  • Dictionary compression to handle large values.

  • “Inlined” 8-bit pilots for fast lookup
  • Handle impossible buckets by evicting others
    • When none of the \(2^8=256\) seeds are collision free, choose the one with minimal number of collisions, and break ties towards evicting small buckets.
    • Eviction: “unassign” a seed, and put the bucket back in the queue.
    • Distributes the entropy from the hard buckets over all seeds.

<no animation here :(>

\(\varepsilon\) cost sharding

  • \(S\) shards
  • PTHash, PHOBIC: shards of expected size \(a = n/S\), real size \(a_i\)
    • \(a / \lambda\) buckets per shard, \(a_i/\alpha\) slots per shard
    • Store shard offsets
  • PtrHash: \(a / \lambda\) buckets per shard, \(a /\alpha\) slots per shard
    • Direct indexing
  • Shard size follows from Sebastiano’s formula

PtrHash with array vs VFUNC

  • VFUNC, storing \(b\) bit values:
    • 10% memory overhead: \(b n + 0.1 b n\)
    • 3 independent/parallel reads
  • PtrHash:
    • 3 bits overhead: \(b n + 3n\)
    • 2 dependent/sequential reads

Results

Construction time vs. size

size.svg

Comparison

  • 300M variable length string keys


  • <3 bits/key
  • fastest queries by 2x to 4x
  • also fast construction

table2.png

Queries: looping, batching, streaming

A simple loop

  • Setup: PtrHash on \(10^9\) integer keys; 300 MB.
  • Fact: Most queries need uncached data from RAM.
  • Fact: Reading RAM takes 80 ns
  • Question: how long does this take? (>90, ≈80, 40-60, <30)
for k in keys {
    h.query(k);
}
  • Answer: 12 ns!
    • The CPU works on 7 iterations in parallel!
  • To measure latency:
let mut seed = 0;
for k in keys {
    seed ^= h.query(seed ^ k);
}

Batching & Streaming

Batching:

for batch in keys.array_chunks::<32>() {
    for k in batch {
        h.prefetch_seed(k);
    }
    for k in batch {
        h.query(k);
    }
}

Streaming:

for i in 0..n {
    h.prefetch_seed(keys[i+32]);
    h.query(keys[i]);
}

Batching with Prefetching

batching-1.svg

Batching with Prefetching

batching-2.svg

Batching & Prefetching

query_batching.svg

Multithreading

query_throughput.svg

Conclusions

  • Simple \(\implies\) fast
  • Partition data into cache-sized chunks
  • Streaming & prefetching for max throughput

Brought to you by

ptr.jpg

Bonus slides

Simple operations

  • Hashing via \[h(x) = \lo(C\cdot x) = (C\cdot x)\mod 2^{64}.\]
  • Quadratic bucket function \[\gamma(x) = \hi(x\cdot x) = \lfloor x^2/2^{64}\rfloor.\]
  • Use fastmod for \(\mod n\)

Overview

overview.drawio.svg

Parameters

table1.png