[WIP] PTRhash: Improving the PTHash Minimal Perfect Hash Function

Table of Contents

Abstract

Motivation: Given a set \(S\) of \(n\) objects, a minimal perfect hash function (MPHF) is a collision-free bijective map \(f\) from the elements of \(S\) to \(\{0, \dots, n-1\}\). These functions have uses in databases, search engines, and are used in bioinformatics indexing tools such as Pufferfish (using BBHash), SSHash, and Piscem (both using PTHash). This work presents an MPHF that prioritizes query throughput and can be constructed efficiently for billions or more elements using \(2\) to \(4\) bits of memory per key.

Contributions: PTRHash builds on PTHash by 1) partitioning the table for faster construction, and 2) using cuckoo hashing to find $8$bit pilots, 3) making compression redundant. We further speed up queries by 4) simplifying hash functions and modulo operations, and 5) prefetching reads from memory.

Results: We implemented PTRhash in Rust and show that at \(3\) bits/key memory usage it is \(3\times\) faster to construct than PTHash while achieving $5$ns/key query throughput, \(5\times\) faster than state-of-the-art methods.

Source: github.com/RagnarGrootKoerkamp/pthash-rs