BBHash Limasset et al. (2017) uses multiple layers to create a minimal perfect hashing functions (MPFH), that hashes some input set into \([n]\).
(See also my note on PTHash (Pibiri and Trani 2021).)
Simply said, it maps the \(n\) elements into \([\gamma \cdot n]\) using hashing function \(h_0\). The \(k_0\) elements that have collisions are mapped into \([\gamma \cdot k_0]\) using \(h_1\). Then, the \(k_1\) elements with collisions are mapped into \([\gamma \cdot k_1]\), and so on.
Possible speedup?
One bottleneck seems to be that this needs multiple layers of lookups each at different positions in memory. A possible idea:
- Using \(h_0\), map all items into buckets with \(\sim 100\) elements each.
- Hash each of those buckets using the original ‘recursive’ BBhash technique into a cacheline of \(256\) bits.
This way, only two lookups should be needed (apart from the rank operation that follows).
I have no idea how the analysis works out though – maybe the recursive strategy works better on a larger scale and when hashing only so few items there isn’t much benefit to the BBhash approach in the first place.