Table of Contents

In this post, we’ll have a look at developing a fast static hash set of 32 bit elements. Specifically, we want high throughput: given a list of integers to be queried, we want to answer all those queries together in as little time as possible.

Problem statement: Given a fixed set of 32 bit integers, build a data structure that can answer contains queries with maximal throughput.

More precisely, we want some data structure that can answer whether an element is in

Table 1: u32 keys. hb: hashbrown implementation of ABSL flat hash map. me: my variant. +pf: with prefetching. Pos%: fraction of positive queries.
pos%hbhbme+pfme+pf
n117M118M117M118M
size+43%+184%+50%+50%
029.1314.3323.2512.3323.4812.47
129.5714.7223.2412.4023.5112.55
1032.0816.5623.3412.3923.5812.50
5030.9521.4723.3412.3023.5112.48
9024.0220.8023.4012.3923.6812.53
9922.7920.3723.3212.4523.6712.51
10022.3020.2923.4412.4423.7312.54
Table 2: 100M u32 keys
pos%hbEFme+pf
size/input1.6780.2401.401
018.9296.1922.9015.47
118.86101.1722.9315.35
1020.87105.4823.1315.43
5023.94107.5623.5215.39
9021.9597.6523.7115.97
9920.9593.6424.1516.49
10021.4692.9324.4616.58

1 libs Link to heading

TODO 2 Link to heading

  • partitioning by ~1k at a time
  • given 2^k parts, do not store top 2^k bits of key
    • 52-bit keys with 2^20 parts -> 32bit values in the map