BBHash: some ideas

BBHash (Limasset et al. 2017) uses multiple layers to create a minimal perfect hash function (MPHF), which hashes some input set of \(n\) elements into \([n]\).

(See also my note on PTHash (Pibiri and Trani 2021).)

Simply put, it maps the \(n\) elements into \([\gamma \cdot n]\) using hash function \(h_0\). The \(k_0\) elements that collide are then mapped into \([\gamma \cdot k_0]\) using \(h_1\); the \(k_1\) elements that still collide are mapped into \([\gamma \cdot k_1]\), and so on.
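As a concrete illustration, here is a minimal, unoptimized sketch of the construction and query in Rust. The `hash` mixer is a stand-in for a real seeded hash function, and the plain `Vec<bool>` layers stand in for the rank-supporting bit vectors a real implementation would use; this is my own sketch, not code from the paper.

```rust
/// Stand-in 64-bit mixer; a real implementation would use a proper
/// seeded hash function.
fn hash(key: u64, seed: u64) -> u64 {
    let mut x = key ^ seed.wrapping_mul(0x9E37_79B9_7F4A_7C15);
    x ^= x >> 33;
    x = x.wrapping_mul(0xFF51_AFD7_ED55_8CCD);
    x ^ (x >> 33)
}

/// Build the layers: bit `i` of layer `l` is set iff exactly one of the
/// keys remaining at layer `l` hashes to slot `i`.
fn build(mut keys: Vec<u64>, gamma: f64) -> Vec<Vec<bool>> {
    let mut layers = Vec::new();
    for seed in 0u64.. {
        if keys.is_empty() {
            break;
        }
        let m = (keys.len() as f64 * gamma).ceil() as usize;
        let mut count = vec![0u8; m];
        for &k in &keys {
            let slot = hash(k, seed) as usize % m;
            count[slot] = count[slot].saturating_add(1);
        }
        let layer: Vec<bool> = count.iter().map(|&c| c == 1).collect();
        // Keys that landed alone in a slot are done; the colliding ones
        // move on to the next layer with a fresh seed.
        keys.retain(|&k| !layer[hash(k, seed) as usize % m]);
        layers.push(layer);
    }
    layers
}

/// Query: probe the layers until a set bit is hit; the final index is the
/// rank of that bit over the concatenation of all layers, which is in [n].
fn query(layers: &[Vec<bool>], key: u64) -> Option<usize> {
    let mut offset = 0;
    for (seed, layer) in layers.iter().enumerate() {
        let slot = hash(key, seed as u64) as usize % layer.len();
        if layer[slot] {
            return Some(offset + layer[..slot].iter().filter(|&&b| b).count());
        }
        offset += layer.iter().filter(|&&b| b).count();
    }
    None // only happens for keys outside the build set
}
```

The linear-time bit counts in `query` are what a real implementation replaces with a precomputed rank structure.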

Possible speedup?

One bottleneck seems to be that each query needs a chain of dependent lookups, one per layer, each at a different position in memory. A possible idea:

  1. Using \(h_0\), map all items into buckets with \(\sim 100\) elements each.
  2. Hash each of those buckets using the original ‘recursive’ BBHash technique into a block of \(256\) bits, so that each bucket fits in a single cacheline.

This way, only two lookups should be needed (apart from the rank operation that follows).
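Hypothetically, the query path could look as follows, reusing the `hash` mixer from the sketch above. The `TwoLevel` struct, the `find_in_block` helper, and the fixed in-block layer schedule of \(160 + 64 + 32\) bits are all my own inventions to make the idea concrete.

```rust
const WORDS: usize = 4; // 4 * 64 = 256 bits per bucket

/// Hypothetical fixed layer sizes inside a block; they sum to 256 bits.
const LAYER_SIZES: [usize; 3] = [160, 64, 32];

struct TwoLevel {
    blocks: Vec<[u64; WORDS]>, // one mini-BBHash per bucket
    offsets: Vec<usize>,       // number of set bits in all earlier blocks
}

fn get_bit(block: &[u64; WORDS], i: usize) -> bool {
    (block[i / 64] >> (i % 64)) & 1 == 1
}

/// Run the recursive BBHash probes entirely inside one block.
fn find_in_block(block: &[u64; WORDS], key: u64) -> Option<usize> {
    let mut start = 0;
    for (l, &m) in LAYER_SIZES.iter().enumerate() {
        // Seed 0 is reserved for the bucket hash h_0, so layers use 1, 2, ...
        let slot = start + hash(key, 1 + l as u64) as usize % m;
        if get_bit(block, slot) {
            // Local rank: set bits before `slot` within this block.
            return Some((0..slot).filter(|&i| get_bit(block, i)).count());
        }
        start += m;
    }
    None
}

impl TwoLevel {
    fn query(&self, key: u64) -> Option<usize> {
        // Lookup 1: h_0 picks the bucket.
        let b = hash(key, 0) as usize % self.blocks.len();
        // Lookup 2: all remaining probes touch only this one block.
        Some(self.offsets[b] + find_in_block(&self.blocks[b], key)?)
    }
}
```

One consequence of the fixed \(256\)-bit budget is that construction would have to retry a bucket with fresh seeds until all of its keys resolve within the block.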

I have no idea how the analysis works out though – maybe the recursive strategy works better at a larger scale, and when hashing so few items there isn’t much benefit to the BBHash approach in the first place.

References

Limasset, Antoine, Guillaume Rizk, Rayan Chikhi, and Pierre Peterlongo. 2017. “Fast and Scalable Minimal Perfect Hashing for Massive Key Sets.” arXiv. https://doi.org/10.48550/ARXIV.1702.03154.
Pibiri, Giulio Ermanno, and Roberto Trani. 2021. “PTHash: Revisiting FCH Minimal Perfect Hashing.” Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, July. https://doi.org/10.1145/3404835.3462849.