Static Hash Sets using $k$-Perfect Hashing

Table of Contents

1 libs
2 TODO

In this post, we’ll have a look at developing a fast static hash set of 32 bit elements. Specifically, we want high throughput: given a list of integers to be queried, we want to answer all those queries together in as little time as possible.

Problem statement: Given a fixed set of 32 bit integers, build a data structure that can answer contains queries with maximal throughput.

More precisely, we want some data structure that can answer whether an element is in

Table 1: u32 keys. hb: hashbrown implementation of ABSL flat hash map. me: my variant. +pf: with prefetching. Pos%: fraction of positive queries.

pos%	hb	hb	me	+pf	me	+pf
n	117M	118M	117M		118M
size	+43%	+184%	+50%		+50%
0	29.13	14.33	23.25	12.33	23.48	12.47
1	29.57	14.72	23.24	12.40	23.51	12.55
10	32.08	16.56	23.34	12.39	23.58	12.50
50	30.95	21.47	23.34	12.30	23.51	12.48
90	24.02	20.80	23.40	12.39	23.68	12.53
99	22.79	20.37	23.32	12.45	23.67	12.51
100	22.30	20.29	23.44	12.44	23.73	12.54

Table 2: 100M u32 keys

pos%	hb	EF	me	+pf
size/input	1.678	0.240	1.401
0	18.92	96.19	22.90	15.47
1	18.86	101.17	22.93	15.35
10	20.87	105.48	23.13	15.43
50	23.94	107.56	23.52	15.39
90	21.95	97.65	23.71	15.97
99	20.95	93.64	24.15	16.49
100	21.46	92.93	24.46	16.58

1 libs Link to heading

cuckoo filter: https://github.com/efficient/cuckoofilter

TODO 2 Link to heading

partitioning by ~1k at a time
given 2^k parts, do not store top 2^k bits of key
- 52-bit keys with 2^20 parts -> 32bit values in the map