- 1 Classification of static data structures
- 2 Space lower bounds and practical approaches
- 2.1 Rank
- 2.2 Rank + Select
- 2.3 Minimal perfect hash function (MPHF)
- 2.4 TODO Order-preserving MPHF
- 2.5 Static retrieval: Static function with static values
- 2.6 Updatable retrieval: Static function with mutable values
- 2.7 Static set (membership)
- 2.8 Static ordered set
- 2.9 Static dictionary: static keys and values
- 2.10 Updatable dictionary with mutable values
- 2.11 TODO Dynamic dictionary with mutable keys and values
- 2.12 TODO Static filter
- 2.13 TODO Ordered static/updatable/dynamic dictionary?
- 3 Summary table
\[ \newcommand{\K}{\mathbb K} \newcommand{\V}{\mathbb V} \newcommand{\c}[1]{\mathbf{\mathsf{#1}}} \]
This post summarizes different types of static data structures:
- their definition,
- space lower bounds, and
- practical approaches.
Surely some relevant methods are still missing. Feel free to suggest more.
This post was written with input from Stefan Hermann and Stefan Walzer.
See the end for a summarizing table.
1 Classification of static data structures Link to heading
We assume that we are given
- a function \(f: K \to \V\)
- from a set of \(n\) $k$-bit keys \(K\subseteq \K = [2^{k}]\)
- with the assumption that \(|K| \ll |\K|\) (say, \(|K| \leq 0.01 |\K|\))
- to $v$-bit values \(f(K) = V \subseteq \V=[2^{v}]\).
Given \(x\in \K\) and \(y\in \V\), we consider a few operations one can do in such a setting (Walzer 2020):
- \(\c{member}(x)\): returns whether \(x\in K\).
- \(\c{lookup}(x)\): returns \(f(x)\) when \(x\in K\) and \(\bot\) otherwise.
- \(\c{eval}(x)\): returns \(f(x)\) when \(x\in K\) and an arbitrary value in \(\mathbb V\) otherwise.
- \(\c{update}(x, y)\): updates \(f(x) \gets v\).
There are different classes of data structures that support different subsets of these operations:
| problem | eval | member | lookup | update | note |
|---|---|---|---|---|---|
| minimal perfect hashing | x | \(f\) is some bijection from \(K\) to \([n]\) | |||
| order-preserving (monotone) MPHF | x | \(f(x) = \vert\{x’\in K \mid x’ < x\}\vert\) | |||
| static retrieval | x | ||||
| updatable retrieval | x | x | |||
| static set | x | ||||
| static ordered set | x | Also supports returning the smallest key \(\geq x\) | |||
| static dictionary | x | x | |||
| updatable dictionary | x | x | x | ||
| dynamic dictionary | x | x | x | Also supports inserting/removing keys |
Not listed in this table are rank and select, because their input is a typically a bit-array instead.
Terminology:
- retrieval: supports \(\c{eval}\) and possibly \(\c{update}\)
- static function: somewhat an alias for retrieval
- dictionary: supports \(\c{lookup}\)
- static: static keys and values
- updatable: static keys but updatable values
- dynamic: updatable keys and values
- non-keys: values \(x’ \in K’ = \K - K\).
2 Space lower bounds and practical approaches Link to heading
In the following, we summarize each problem, list space lower bounds, and give some examples of data structures that can be used. Space usage ignores lower-order terms, and we do not consider construction time. For reference, \(\log e =1.44\).
2.1 Rank Link to heading
Given an $n$-bit array of \(0\) and \(1\), answer queries: how many $1$s are there before position \(x\)?
- Space usage: usually between \(1.01n\) and \(1.2n\) bits.
- Implementations:
- Rank9 (Vigna 2008, 2024)
- sdsl (Gog et al. 2014)
- pasta (Kurpicz 2022)
- poppy-cs (Zhou, Andersen, and Kaminsky 2013)
- qwt/RSWide/RSNarrow (Matteo Ceregini 2023)
- paired blocks (Gottlieb and Reinert 2025)
- btree (Pibiri and Kanda 2021)
- quadrank (wip, post)
2.2 Rank + Select Link to heading
Given an $n$-bit array of \(0\) and \(1\), answer queries: what is the position of the $i$th \(1\)?
- Space usage: usually between \(1.01n\) and \(1.2n\) bits.
- Implementations:
- simple (Vigna 2008, 2024)
- sdsl (Gog et al. 2014)
- pasta (Kurpicz 2022)
- poppy-cs (Zhou, Andersen, and Kaminsky 2013)
- paired blocks (Gottlieb and Reinert 2025)
- 3* (Kurpicz, Rigi-Luperti, and Sanders 2025)
- btree (Pibiri and Kanda 2021)
2.3 Minimal perfect hash function (MPHF) Link to heading
An MPHF bijectively maps the \(n\) keys of \(K\) to \([n]\). It is undefined for non-keys.
MPHF’s form a building block for other static data structures listed below.
- Space lower bound: \(\log e = 1.44\) bits/key
- Examples:
- PTHash (Pibiri and Trani 2021), Phobic (Hermann et al. 2024), PtrHash (Groot Koerkamp 2025), Phast (Beling and Sanders 2025): \(2\) to \(3\) bits/key, \(O(1)\) queries mostly via 1 cache miss
- Consensus (Lehmann, Sanders, et al. 2025): as small as \(1.45\) bits/key and query time \(\log 1/\varepsilon\) or \(\log n\) depending on the exact method.
- survey: Lehmann, Mueller, et al. (2025)
TODO 2.4 Order-preserving MPHF Link to heading
- Space lower bound: TODO
- Examples: TODO
2.5 Static retrieval: Static function with static values Link to heading
Static retrieval corresponds to a static function \(f\) storing \(f(x)\) for all \(x\in K\), that is undefined for all non-keys. Values are static as well.
- Space lower bound: \(v\) bits/key
- Examples:
- 3-way hypergraph peeling: \(1.23v\) bits/key, \(O(1)\) query via 3 parallel reads (Botelho, Pagh, and Ziviani 2013)
- fuse graphs: \(1.12v\) bits/key via more efficient hypergraph peeling, still 3 parallel reads (Dietzfelbinger and Walzer 2019) (sux-rs (Vigna 2024) has an implementation)
- BuRR (bumped ribbon retrieval): \(v\) bits/key, \(O(v)\) query time (\(v\) times
an unaligned
u64read) (Dillinger et al. 2022) - MPHF+array: \(v+\log e\) bits/key, \(O(1)\) query time via \(2\) sequential reads.
- see also $ε$-cost sharding for partitioned construction (Vigna 2025)
2.6 Updatable retrieval: Static function with mutable values Link to heading
Updatable retrieval corresponds to a static function \(f\) storing \(f(x)\) for all \(x\in K\), that is undefined for all non-keys, for which values can be updated, i.e., we can assign \(f(x) \gets v’\).
- Space lower bound: \(v + 0.16\) bits/key (Kuszmaul and Walzer 2024)
- In practice, also \(v+\log e\) bits/key for \(v\gg 1\)
- Example:
- MPHF+array: \(v+\log e\) bits/key, \(O(1)\) query time via \(2\) sequential reads.
2.7 Static set (membership) Link to heading
A static set \(f\) answers for each \(x\) whether it is in \(K\) or not.
- Space lower bound: \(k - \log n + \log e\) bits/key
- Note: when \(n\) is large and keys are dense in \(K\), effectively one only has to store the distances between adjacent keys, which takes just over \(\log (2^k/n) = k - \log n\) bits/key.
- Examples:
- standard HashSet (C++
unordered_hashset, RustHashSet): \(1.2k\) to \(2.4k\) bits/key; space overhead for \(2\times\) growth factor and \(\approx 20\%\) empty slots. \(O(1)\) queries with probing and mostly 1 cache miss - cuckoo hashing (Pagh and Rodler 2001; Fernholz and Ramachandran 2007; Cain, Sanders, and Wormald 2007): \((1+\varepsilon)k\) bits/key; \(O(1)\) queries with either 2 parallel reads or the second read conditional on the first.
- Elias Fano (EF) encoding: \(2+0.06+\log (2^k/n) = 2.06+k - \log n\) bits/key
- Stores the low \(b=\lceil k-\log n\rceil\) bits of each key explicitly, and uses a 1-hot encoding for the number of times a multiple of \(2^b\) is crossed between adjacent values. Then uses a rank and/or select structure on the bitvector for index operations that take an additional \(0.06\) to \(0.4\) bits/key.
- Answers whether \(x\) is in the set by checking if the smallest value \(\geq x\) is \(x\) itself.
- optimal static dictionary (Hu et al. 2024): \(n^\varepsilon\) overhead to lowerbound, \(O(1)\) queries
- standard HashSet (C++
2.8 Static ordered set Link to heading
Compared to a static set, a static ordered set allows successor queries: what is the smallest element in the set \(\geq x\)?
We assume that all implementations of this somehow store an ordered list of keys, and that returning the first element \(\geq x\) implicitly computes an index into this list.
- Space lower bound: \(k - \log n + \log e\) bits/key (as before)
- Examples:
- C++
set(red-black tree): pointer-based; \(k+64\) bits/key? - Rust
BTreeSet(B-tree): around \(2k\) or so? - EF with select: \(2+0.06+\log (2^k/n) = 2.06+k - \log n\) bits/key.
- C++
2.9 Static dictionary: static keys and values Link to heading
A dictionary differs from a function in that it recognizes non-keys \(x’\): \(f(x’) = \bot\) is guaranteed, whereas \(f(x)\in \V\) is the usual value. In a static dictionary, both keys and values are static.
- Space lower bound: \(k-\log n + v + \log e\) bits/key
- Examples:
- static set + static retrieval: \(k-\log n+\log e + v\) bits/key (\(\log e\) away)
- static ordered set + array: \(k - \log n + \log e + v\) bits/key (\(\log e\) away)
- Retrieve the sorted position via the ordered set, and use that to index an array.
- static ordered set on \((k,v)\) pairs: \(k-\log n + \log e + v\) bits/key (\(\log e\) away)
- MPHF + key-value array: \(\log e + k + v\) bits/key (\(\log n\) away).
- cuckoo hashing: \((1+\varepsilon)(k+v)\) bits/key
- optimal static dictionary (Hu et al. 2024): \(n^\varepsilon\) overhead to lowerbound, \(O(1)\) queries
2.10 Updatable dictionary with mutable values Link to heading
In an updatable dictionary, values can be updated.
In practice, all known static dictionaries associate a unique memory location to each value (unlike e.g. static retrieval structures). Thus, we can trivially update the value as needed, and all static dictionary methods also work as an updatable dictionary.
- Space lower bound: \(k-\log n + v + \log e\) bits/key
- (This equals the static case, because the dictionary contains all needed information to build a modified version from scratch on each update.)
TODO 2.11 Dynamic dictionary with mutable keys and values Link to heading
In a dynamic dictionary, keys are mutable and can thus be added and removed.
- Space lower bound: ?
- Examples:
- C++
absl::flat_hash_map, RustHashMap: \(1.2(k+v)\) to \(2.4(k+v)\) bits/key with \(2\times\) growth factor
- C++
TODO 2.12 Static filter Link to heading
A static set with false-positive ratio \(2^{-p}\).
- Space lower bound: \(p\) bits/key?
- Examples:
- bloom filter (Bloom 1970)
- cuckoo filter (Fan et al. 2014)
- xor filter (Graf and Lemire 2020)
- binary fuse filters (Graf and Lemire 2022): \(1.12p\) bits/key
- blocked bloom filters (Putze, Sanders, and Singler 2009)
- many more …
TODO 2.13 Ordered static/updatable/dynamic dictionary? Link to heading
Like a normal dictionary, but with successor queries.
- Space lower bound: ?
- Examples:
- C++
map - Rust
BTreeMap
- C++
3 Summary table Link to heading
| data structure/method | member | values | space (bits/key) | query |
|---|---|---|---|---|
| MPHF | no | - | \(\geq \log e\) | |
| - Consensus* | \(\log e + 0.001\) | \(O(1/\varepsilon)\), \(O(\log n)\) | ||
| - PTHash/PHOBIC/PtrHash/Phast | \(2\) to \(3\) | \(O(1)\) | ||
| static retrieval | no | static | \(\geq v\) | |
| - BuRR** | \(v\) | \(O(v)\) | ||
| - MPHF + array* | \(v+\log e\) | \(O(1)\) | ||
| - hypergraph peeling | \(1.23v\) | \(O(1)\) | ||
| updatable retrieval | no | mutable | \(\geq v+\log e\) (for \(v\gg 1\)) | |
| - MPHF + array* | \(v+\log e\) | |||
| static set | yes | - | \(\geq k - \log n + \log e\) | |
| - Elias Fano* | \(k - \log n + 2\) | |||
| - cuckoo hashing | \((1+\varepsilon)k\) | \(O(1)\) | ||
| - hashset | \(1.2k\) to \(2.4k\) | \(O(1)\) | ||
| - optimal static dictionary | \(k - \log n + O(n^{\varepsilon-1})\) | \(O(1)\) | ||
| static ordered set | yes | - | \(\geq k - \log n + \log e\) | |
| - Elias Fano* | \(k - \log n + 2\) | |||
| static dictionary | yes | static | \(\geq k-\log n + v + \log e\) | |
| updatable dictionary | yes | mutable | \(\geq k-\log n + v + \log e\) | |
| - static set + inline values** | \(k-\log n + v + \log e\) | |||
| - ordered set on $(k,v)$** | \(k-\log n + v + \log e\) | |||
| - MPHF + kv-array | \(k + v + \log e\) | \(O(1)\) | ||
| - cuckoo hashing | \((1+\varepsilon)(k+v)\) | \(O(1)\) | ||
| - optimal static dictionary | \(k - \log n + v + O(n^{\varepsilon-1})\) | \(O(1)\) | ||
| standard data structures | ||||
- HashSet | yes | - | \(1.2k\) to \(2.4k\) | \(O(1)\) |
- HashMap | yes | mutable | \(1.2(k+v)\) to \(2.4(k+v)\) | \(O(1)\) |
- BTreeSet | yes | - | \(\approx 2k\)? | \(O(B\lg n)\) |
- BTreeMap | yes | mutable | \(\approx 2(k+v)\)? | \(O(B\lg n)\) |
| - static hashset | yes | - | \(\approx 1.2k\) | \(O(1)\) |
| - static hashmap | yes | mutable | \(\approx 1.2(k+v)\) | \(O(1)\) |
| - static btree set | yes | - | \((1+1/(B-1))k\) | \(O(B\lg n)\) |
| - static btree map | yes | mutable | \((1+1/(B-1))(k+v)\) | \(O(B\lg n)\) |