BBHash (Limasset et al. 2017) uses multiple *layers* to construct a minimal perfect
hash function (MPHF), which maps some input set of \(n\) elements bijectively onto \([n]\).

(See also my note on PTHash (Pibiri and Trani 2021).)

Simply put, it maps the \(n\) elements into \([\gamma \cdot n]\) using hash function \(h_0\). The \(k_0\) elements that have collisions are mapped into \([\gamma \cdot k_0]\) using \(h_1\). Then, the \(k_1\) elements with collisions are mapped into \([\gamma \cdot k_1]\), and so on. Each layer stores a bitvector marking the slots that received exactly one element, and the final index of a key is the rank of its set bit over the concatenated bitvectors.
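A minimal Python sketch of this layered construction, to make the fall-through and the final rank concrete. The hash function and the \(\gamma = 2\) default are illustrative choices, not the paper's:

```python
import hashlib

def _hash(x, layer, size):
    # Illustrative hash: mix the key with the layer index as a salt.
    d = hashlib.blake2b(f"{layer}:{x}".encode()).digest()
    return int.from_bytes(d[:8], "little") % size

def build_layers(keys, gamma=2.0):
    """Build BBHash-style layers: each layer is a bit array marking
    slots that received exactly one key; colliding keys fall through
    to the next layer."""
    layers = []
    remaining = list(keys)
    layer = 0
    while remaining:
        size = max(1, int(gamma * len(remaining)))
        counts = [0] * size
        for k in remaining:
            counts[_hash(k, layer, size)] += 1
        layers.append([c == 1 for c in counts])
        remaining = [k for k in remaining
                     if counts[_hash(k, layer, size)] != 1]
        layer += 1
    return layers

def mphf(layers, key):
    """Minimal perfect hash of a key from the original set: the rank
    of its set bit over the concatenation of all layer bit arrays.
    Exactly n bits are set in total, so the result lies in [n]."""
    rank = 0
    for layer, bits in enumerate(layers):
        slot = _hash(key, layer, len(bits))
        if bits[slot]:
            return rank + sum(bits[:slot])
        rank += sum(bits)
    raise KeyError(key)
```

Note that querying a key walks the layers until it finds a set bit, so each query may touch several bit arrays at unrelated memory locations; this is the bottleneck discussed below. (A real implementation would use a precomputed rank structure instead of `sum(bits[:slot])`.)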

## Possible speedup?

One bottleneck seems to be that this needs multiple layers of lookups each at different positions in memory. A possible idea:

- Using \(h_0\), map all items into buckets with \(\sim 100\) elements each.
- Hash each of those buckets, using the original ‘recursive’ BBHash technique, into a single cache line of \(256\) bits.

This way, only two lookups should be needed (apart from the rank operation that follows).
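A self-contained Python sketch of this two-level idea. The bucket count, salts, and per-bucket layer cap are all assumptions here, and the per-bucket layers are kept as Python lists; a real implementation would pack each bucket's layers into one \(256\)-bit word so the second lookup touches a single cache line:

```python
import hashlib

def _h(x, salt, size):
    # Illustrative salted hash; any well-mixed hash family works.
    d = hashlib.blake2b(f"{salt}:{x}".encode()).digest()
    return int.from_bytes(d[:8], "little") % size

def build(keys, bucket_count, gamma=2.0, max_layers=16):
    """First level: split keys into buckets. Second level: a tiny
    layered (BBHash-style) structure per bucket."""
    buckets = [[] for _ in range(bucket_count)]
    for k in keys:
        buckets[_h(k, "b", bucket_count)].append(k)
    tables = []
    for b, bucket in enumerate(buckets):
        layers = []
        remaining = bucket
        for layer in range(max_layers):
            if not remaining:
                break
            size = max(1, int(gamma * len(remaining)))
            counts = [0] * size
            for k in remaining:
                counts[_h(k, (b, layer), size)] += 1
            layers.append([c == 1 for c in counts])
            remaining = [k for k in remaining
                         if counts[_h(k, (b, layer), size)] != 1]
        assert not remaining, "in practice: retry the bucket with a new seed"
        tables.append(layers)
    # Prefix sums of bucket sizes give each bucket's global offset,
    # playing the role of the rank operation.
    offsets = [0]
    for bucket in buckets:
        offsets.append(offsets[-1] + len(bucket))
    return tables, offsets

def query(tables, offsets, key):
    b = _h(key, "b", len(offsets) - 1)        # lookup 1: pick the bucket
    rank = 0
    for layer, bits in enumerate(tables[b]):  # lookup 2: one small table
        slot = _h(key, (b, layer), len(bits))
        if bits[slot]:
            return offsets[b] + rank + sum(bits[:slot])
        rank += sum(bits)
    raise KeyError(key)
```

The point of the split is that the second-level walk, however many layers it takes, stays within one bucket's data, so it costs one cache miss instead of one per layer.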

I have no idea how the analysis works out, though – maybe the recursive strategy works better at a larger scale, and when hashing only so few items there isn’t much benefit to the BBHash approach in the first place.

## References

Limasset, Antoine, Guillaume Rizk, Rayan Chikhi, and Pierre Peterlongo. 2017. “Fast and Scalable Minimal Perfect Hashing for Massive Key Sets.” In *16th International Symposium on Experimental Algorithms (SEA 2017)*.

Pibiri, Giulio Ermanno, and Roberto Trani. 2021. “PTHash: Revisiting FCH Minimal Perfect Hashing.” In *Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval*, July. https://doi.org/10.1145/3404835.3462849.