PtrHash
Minimal Perfect Hashing at RAM Throughput

Ragnar {Groot Koerkamp}

@curiouscoding.nl

SEA 2025, July 24, Venice

curiouscoding.nl/papers/ptrhash.pdf curiouscoding.nl/slides/ptrhash

Minimal Perfect Hash Functions

A set of \(n\) keys

A hash function

A hash function: collisions!

A hash function: injective / perfect

A hash function: bijective / minimal & perfect

Problem statement

Given a set of \(n\) keys \(K\subseteq \mathbb K\), build a function \(h\) satisfying

\(h(K) = \{0, \dots, n-1\}\): \(h\) is minimal & perfect.

Why? Array of \(n\) values as space-efficient value store.

Solutions take \(\log_2(e) {=} 1.443\) to \(3\) bits/key.

Goals of PtrHash:

at most 3 bits/key (you probably store ≥16 bit values anyway),
fast to evaluate,
fast to construct.

Previous & parallel work

FCH: Fox, Chen, and Heath (1992)
CHD: Belazzougui, Botelho, and Dietzfelbinger (2009)

PTHash: Pibiri and Trani (2021)
PHOBIC: Hermann et al. (2024)

PHast: Beling and Sanders (2025)
Consensus: Lehmann, Sanders, et al. (2025)
Survey: Lehmann, Mueller, et al. (2025)

\[ \newcommand{\lo}{\mathsf{lo}} \newcommand{\hi}{\mathsf{hi}} \]

Naive: try seeded hashes

Naive: \(n^n / n!\approx e^n\) tries, \(n\cdot\log_2 e\) bits

Construction

Build \(h\) incrementally using buckets

Build \(h\) incrementally

Better: assign buckets from large to small

Skew bucket sizes

Remapping: avoiding the hard part

Remapping: add \(1\%\) extra slots

Remapping: map extra slots back

Queries

Query: 1. Hash key

Query: 2. Lookup bucket

Query: 3. Compute slot

PTHash

Iterate seed values (pilots) until a hit is found.
Dictionary compression to handle large values.

PtrHash

Iterate seed values (pilots) until a hit is found.
Dictionary compression to handle large values.

“Inlined” 8-bit pilots for fast lookup
Handle impossible buckets by evicting others
- When none of the \(2^8=256\) seeds are collision free, choose the one with minimal number of collisions, and break ties towards evicting small buckets.
- Eviction: “unassign” a seed, and put the bucket back in the queue.
- Distributes the entropy from the hard buckets over all seeds.

\(\varepsilon\) cost sharding

\(S\) shards
PTHash, PHOBIC: shards of expected size \(a = n/S\), real size \(a_i\)
- \(a / \lambda\) buckets per shard, \(a_i/\alpha\) slots per shard
- Store shard offsets
PtrHash: \(a / \lambda\) buckets per shard, \(a /\alpha\) slots per shard
- Direct indexing
Shard size follows from Sebastiano’s formula

PtrHash with array vs VFUNC

VFUNC, storing \(b\) bit values:
- 10% memory overhead: \(b n + 0.1 b n\)
- 3 independent/parallel reads
PtrHash:
- 3 bits overhead: \(b n + 3n\)
- 2 dependent/sequential reads

Results

Construction time vs. size

Comparison

300M variable length string keys

<3 bits/key
fastest queries by 2x to 4x
also fast construction

Queries: looping, batching, streaming

A simple loop

Setup: PtrHash on \(10^9\) integer keys; 300 MB.
Fact: Most queries need uncached data from RAM.
Fact: Reading RAM takes 80 ns
Question: how long does this take? (>90, ≈80, 40-60, <30)

for k in keys {
    h.query(k);
}

Answer: 12 ns!
- The CPU works on 7 iterations in parallel!
To measure latency:

let mut seed = 0;
for k in keys {
    seed ^= h.query(seed ^ k);
}

Batching & Streaming

Batching:

for batch in keys.array_chunks::<32>() {
    for k in batch {
        h.prefetch_seed(k);
    }
    for k in batch {
        h.query(k);
    }
}

Streaming:

for i in 0..n {
    h.prefetch_seed(keys[i+32]);
    h.query(keys[i]);
}

Batching with Prefetching

Batching & Prefetching

Multithreading

Conclusions

Simple \(\implies\) fast
Partition data into cache-sized chunks
Streaming & prefetching for max throughput

Brought to you by

Bonus slides

Simple operations

Hashing via \[h(x) = \lo(C\cdot x) = (C\cdot x)\mod 2^{64}.\]
Quadratic bucket function \[\gamma(x) = \hi(x\cdot x) = \lfloor x^2/2^{64}\rfloor.\]
Use fastmod for \(\mod n\)

Overview

overview.drawio.svg

PtrHashMinimal Perfect Hashing at RAM Throughput

Ragnar {Groot Koerkamp}

@curiouscoding.nl

SEA 2025, July 24, Venice

Minimal Perfect Hash Functions

A set of \(n\) keys

A hash function

A hash function: collisions!

A hash function: injective / perfect

A hash function: bijective / minimal & perfect

Problem statement

Previous & parallel work

Naive: try seeded hashes

Naive: try seeded hashes

Naive: try seeded hashes

Naive: \(n^n / n!\approx e^n\) tries, \(n\cdot\log_2 e\) bits

Construction

Build \(h\) incrementally using buckets

Build \(h\) incrementally

Build \(h\) incrementally

Build \(h\) incrementally

Build \(h\) incrementally

Build \(h\) incrementally

Build \(h\) incrementally

Build \(h\) incrementally

Build \(h\) incrementally

Build \(h\) incrementally

Build \(h\) incrementally

Build \(h\) incrementally

Build \(h\) incrementally

Better: assign buckets from large to small

Skew bucket sizes

Remapping: avoiding the hard part

Remapping: add \(1\%\) extra slots

Remapping: add \(1\%\) extra slots

Remapping: map extra slots back

Remapping: map extra slots back

Queries

Query: 1. Hash key

Query: 2. Lookup bucket

Query: 3. Compute slot

PTHash

PtrHash

\(\varepsilon\) cost sharding

PtrHash with array vs VFUNC

Results

Construction time vs. size

Comparison

Queries: looping, batching, streaming

A simple loop

Batching & Streaming

Batching with Prefetching

Batching with Prefetching

Batching & Prefetching

Multithreading

Conclusions

Brought to you by

Bonus slides

Simple operations

Overview

Parameters

PtrHash
Minimal Perfect Hashing at RAM Throughput