Overview of static data structures

Table of Contents

1 Classification of static data structures
2 Space lower bounds and practical approaches
3 Summary table

\[ \newcommand{\K}{\mathbb K} \newcommand{\V}{\mathbb V} \newcommand{\c}[1]{\mathbf{\mathsf{#1}}} \]

This post summarizes different types of static data structures:

their definition,
space lower bounds, and
practical approaches.

Surely some relevant methods are still missing. Feel free to suggest more.

This post was written with input from Stefan Hermann and Stefan Walzer.

See the end for a summarizing table.

1 Classification of static data structures Link to heading

We assume that we are given

a function $f: K \to \V$
from a set of $n$ $k$-bit keys $K\subseteq \K = [2^{k}]$
- with the assumption that $|K| \ll |\K|$ (say, $|K| \leq 0.01 |\K|$)
to $v$-bit values $f(K) = V \subseteq \V=[2^{v}]$.

Given $x\in \K$ and $y\in \V$, we consider a few operations one can do in such a setting (Walzer 2020):

$\c{member}(x)$: returns whether $x\in K$.
$\c{lookup}(x)$: returns $f(x)$ when $x\in K$ and $\bot$ otherwise.
$\c{eval}(x)$: returns $f(x)$ when $x\in K$ and an arbitrary value in $\mathbb V$ otherwise.
$\c{update}(x, y)$: updates $f(x) \gets v$.

There are different classes of data structures that support different subsets of these operations:

problem	eval	member	lookup	update	note
minimal perfect hashing	x				$f$ is some bijection from $K$ to $[n]$
monotone MPHF	x				$f(x) = \vert\{x’\in K \mid x’ < x\}\vert$
static retrieval	x
updatable retrieval	x			x
static set		x
static ordered set		x			Also supports returning the smallest key $\geq x$
static dictionary		x	x
updatable dictionary		x	x	x
dynamic dictionary		x	x	x	Also supports inserting/removing keys

Not listed in this table are rank and select, because their input is a typically a bit-array instead.

Terminology:

retrieval: supports $\c{eval}$ and possibly $\c{update}$
static function: somewhat an alias for retrieval
dictionary: supports $\c{lookup}$
static: static keys and values
updatable: static keys but updatable values
dynamic: updatable keys and values
non-keys: values $x’ \in K’ = \K - K$.

2 Space lower bounds and practical approaches Link to heading

In the following, we summarize each problem, list space lower bounds, and give some examples of data structures that can be used. Space usage ignores lower-order terms, and we do not consider construction time. For reference, $\log e =1.44$.

2.1 Rank Link to heading

Given an $n$-bit array of $0$ and $1$, answer queries: how many $1$s are there before position $x$?

Space usage: usually between $1.01n$ and $1.2n$ bits.
Implementations:
- Rank9 (Vigna 2008, 2024)
- sdsl (Gog et al. 2014)
- pasta (Kurpicz 2022)
- poppy-cs (Zhou, Andersen, and Kaminsky 2013)
- qwt/RSWide/RSNarrow (Ceregini, Kurpicz, and Venturini 2024)
- paired blocks (Gottlieb and Reinert 2025)
- btree (Pibiri and Kanda 2021)
- quadrank (wip, post)

2.2 Rank + Select Link to heading

Given an $n$-bit array of $0$ and $1$, answer queries: what is the position of the $i$th $1$?

Space usage: usually between $1.01n$ and $1.2n$ bits.
Implementations:
- simple (Vigna 2008, 2024)
- sdsl (Gog et al. 2014)
- pasta (Kurpicz 2022)
- poppy-cs (Zhou, Andersen, and Kaminsky 2013)
- paired blocks (Gottlieb and Reinert 2025)
- 3* (Kurpicz, Rigi-Luperti, and Sanders 2025)
- btree (Pibiri and Kanda 2021)

2.3 Minimal perfect hash function (MPHF) Link to heading

An MPHF bijectively maps the $n$ keys of $K$ to $[n]$. It is undefined for non-keys.

MPHF’s form a building block for other static data structures listed below.

Space lower bound: $\log e = 1.44$ bits/key
Examples:
- PTHash (Pibiri and Trani 2021), Phobic (Hermann et al. 2024), PtrHash (Groot Koerkamp 2025), Phast (Beling and Sanders 2025): $2$ to $3$ bits/key, $O(1)$ queries mostly via 1 cache miss
- Consensus (Lehmann, Sanders, et al. 2025): as small as $1.45$ bits/key and query time $\log 1/\varepsilon$ or $\log n$ depending on the exact method.
- survey: Lehmann, Mueller, et al. (2025)

TODO 2.4 Monotone MPHF Link to heading

MPHF that maps keys to their index in the sorted list.
Space lower bound: TODO
Examples: TODO

TODO 2.5 Order-preserving MPHF Link to heading

MPHF that maps keys to $[n]$ in the order as given in the input.
Not very interesting, because the space lower bound is $\lg n$ bits per element, and so an MPHF followed by explicit values is near-optimal.

2.6 Static retrieval: Static function with static values Link to heading

Static retrieval corresponds to a static function $f$ storing $f(x)$ for all $x\in K$, that is undefined for all non-keys. Values are static as well.

Space lower bound: $v$ bits/key
Examples:
- 3-way hypergraph peeling: $1.23v$ bits/key, $O(1)$ query via 3 parallel reads (Botelho, Pagh, and Ziviani 2013)
- fuse graphs: $1.12v$ bits/key via more efficient hypergraph peeling, still 3 parallel reads (Dietzfelbinger and Walzer 2019) (sux-rs (Vigna 2024) has an implementation)
- BuRR (bumped ribbon retrieval): $v$ bits/key, $O(v)$ query time ($v$ times an unaligned u64 read) (Dillinger et al. 2022)
- MPHF+array: $v+\log e$ bits/key, $O(1)$ query time via $2$ sequential reads.
- see also $ε$-cost sharding for partitioned construction (Vigna 2025)

2.7 Updatable retrieval: Static function with mutable values Link to heading

Updatable retrieval corresponds to a static function $f$ storing $f(x)$ for all $x\in K$, that is undefined for all non-keys, for which values can be updated, i.e., we can assign $f(x) \gets v’$.

Space lower bound: $v + 0.16$ bits/key (Kuszmaul and Walzer 2024)
- In practice, also $v+\log e$ bits/key for $v\gg 1$
Example:
- MPHF+array: $v+\log e$ bits/key, $O(1)$ query time via $2$ sequential reads.

2.8 Static set (membership) Link to heading

A static set $f$ answers for each $x$ whether it is in $K$ or not.

Space lower bound: $k - \log n + \log e$ bits/key
- Note: when $n$ is large and keys are dense in $K$, effectively one only has to store the distances between adjacent keys, which takes just over $\log (2^k/n) = k - \log n$ bits/key.
Examples:
- standard HashSet (C++ unordered_hashset, Rust HashSet): $1.2k$ to $2.4k$ bits/key; space overhead for $2\times$ growth factor and $\approx 20\%$ empty slots. $O(1)$ queries with probing and mostly 1 cache miss
- cuckoo hashing (Pagh and Rodler 2001; Fernholz and Ramachandran 2007; Cain, Sanders, and Wormald 2007): $(1+\varepsilon)k$ bits/key; $O(1)$ queries with either 2 parallel reads or the second read conditional on the first.
- Elias Fano (EF) encoding: $2+0.06+\log (2^k/n) = 2.06+k - \log n$ bits/key
  - Stores the low $b=\lceil k-\log n\rceil$ bits of each key explicitly, and uses a 1-hot encoding for the number of times a multiple of $2^b$ is crossed between adjacent values. Then uses a rank and/or select structure on the bitvector for index operations that take an additional $0.06$ to $0.4$ bits/key.
  - Answers whether $x$ is in the set by checking if the smallest value $\geq x$ is $x$ itself.
- optimal static dictionary (Hu et al. 2024): $n^\varepsilon$ overhead to lowerbound, $O(1)$ queries

2.9 Static ordered set Link to heading

Compared to a static set, a static ordered set allows successor queries: what is the smallest element in the set $\geq x$?

We assume that all implementations of this somehow store an ordered list of keys, and that returning the first element $\geq x$ implicitly computes an index into this list.

Space lower bound: $k - \log n + \log e$ bits/key (as before)
Examples:
- C++ set (red-black tree): pointer-based; $k+64$ bits/key?
- Rust BTreeSet (B-tree): around $2k$ or so?
- EF with select: $2+0.06+\log (2^k/n) = 2.06+k - \log n$ bits/key.

2.10 Static dictionary: static keys and values Link to heading

A dictionary differs from a function in that it recognizes non-keys $x’$: $f(x’) = \bot$ is guaranteed, whereas $f(x)\in \V$ is the usual value. In a static dictionary, both keys and values are static.

Space lower bound: $k-\log n + v + \log e$ bits/key
Examples:
- static set + static retrieval: $k-\log n+\log e + v$ bits/key ($\log e$ away)
- static ordered set + array: $k - \log n + \log e + v$ bits/key ($\log e$ away)
  - Retrieve the sorted position via the ordered set, and use that to index an array.
- static ordered set on $(k,v)$ pairs: $k-\log n + \log e + v$ bits/key ($\log e$ away)
- MPHF + key-value array: $\log e + k + v$ bits/key ($\log n$ away).
- cuckoo hashing: $(1+\varepsilon)(k+v)$ bits/key
- optimal static dictionary (Hu et al. 2024): $n^\varepsilon$ overhead to lowerbound, $O(1)$ queries

2.11 Updatable dictionary with mutable values Link to heading

In an updatable dictionary, values can be updated.

In practice, all known static dictionaries associate a unique memory location to each value (unlike e.g. static retrieval structures). Thus, we can trivially update the value as needed, and all static dictionary methods also work as an updatable dictionary.

Space lower bound: $k-\log n + v + \log e$ bits/key
- (This equals the static case, because the dictionary contains all needed information to build a modified version from scratch on each update.)

TODO 2.12 Dynamic dictionary with mutable keys and values Link to heading

In a dynamic dictionary, keys are mutable and can thus be added and removed.

Space lower bound: ?
Examples:
- C++ absl::flat_hash_map, Rust HashMap: $1.2(k+v)$ to $2.4(k+v)$ bits/key with $2\times$ growth factor

TODO 2.13 Static filter Link to heading

A static set with false-positive ratio $2^{-p}$.

Space lower bound: $p$ bits/key?
Examples:
- bloom filter (Bloom 1970)
- cuckoo filter (Fan et al. 2014)
- xor filter (Graf and Lemire 2020)
- binary fuse filters (Graf and Lemire 2022): $1.12p$ bits/key
- blocked bloom filters (Putze, Sanders, and Singler 2009)
- many more …

TODO 2.14 Ordered static/updatable/dynamic dictionary? Link to heading

Like a normal dictionary, but with successor queries.

Space lower bound: ?
Examples:
- C++ map
- Rust BTreeMap

3 Summary table Link to heading

Table 1: Summary of static data structures, their space lower bounds, and practical solutions with their space usage. Double starred methods (**) are within $o(1)$ bit/key from optimal. Single starred methods (*) are within $\Theta(1)$ bit/key from optimal.

data structure/method	member	values	space (bits/key)	query
MPHF	no	-	$\geq \log e$
- Consensus*			$\log e + 0.001$	$O(1/\varepsilon)$, $O(\log n)$
- PTHash/PHOBIC/PtrHash/Phast			$2$ to $3$	$O(1)$

static retrieval	no	static	$\geq v$
- BuRR**			$v$	$O(v)$
- MPHF + array*			$v+\log e$	$O(1)$
- hypergraph peeling			$1.23v$	$O(1)$

updatable retrieval	no	mutable	$\geq v+\log e$ (for $v\gg 1$)
- MPHF + array*			$v+\log e$

static set	yes	-	$\geq k - \log n + \log e$
- Elias Fano*			$k - \log n + 2$
- cuckoo hashing			$(1+\varepsilon)k$	$O(1)$
- hashset			$1.2k$ to $2.4k$	$O(1)$
- optimal static dictionary			$k - \log n + O(n^{\varepsilon-1})$	$O(1)$

static ordered set	yes	-	$\geq k - \log n + \log e$
- Elias Fano*			$k - \log n + 2$

static dictionary	yes	static	$\geq k-\log n + v + \log e$
updatable dictionary	yes	mutable	$\geq k-\log n + v + \log e$
- static set + inline values**			$k-\log n + v + \log e$
- ordered set on $(k,v)$**			$k-\log n + v + \log e$
- MPHF + kv-array			$k + v + \log e$	$O(1)$
- cuckoo hashing			$(1+\varepsilon)(k+v)$	$O(1)$
- optimal static dictionary			$k - \log n + v + O(n^{\varepsilon-1})$	$O(1)$

standard data structures
- `HashSet`	yes	-	$1.2k$ to $2.4k$	$O(1)$
- `HashMap`	yes	mutable	$1.2(k+v)$ to $2.4(k+v)$	$O(1)$
- `BTreeSet`	yes	-	$\approx 2k$?	$O(B\lg n)$
- `BTreeMap`	yes	mutable	$\approx 2(k+v)$?	$O(B\lg n)$
- static hashset	yes	-	$\approx 1.2k$	$O(1)$
- static hashmap	yes	mutable	$\approx 1.2(k+v)$	$O(1)$
- static btree set	yes	-	$(1+1/(B-1))k$	$O(B\lg n)$
- static btree map	yes	mutable	$(1+1/(B-1))(k+v)$	$O(B\lg n)$

References Link to heading

Beling, Piotr, and Peter Sanders. 2025. “Phast – Perfect Hashing with Fast Evaluation.” arXiv; arXiv. https://doi.org/10.48550/ARXIV.2504.17918.

Bloom, Burton H. 1970. “Space/Time Trade-Offs in Hash Coding with Allowable Errors.” Communications of the Acm 13 (7): 422–26. https://doi.org/10.1145/362686.362692.

Botelho, Fabiano C., Rasmus Pagh, and Nivio Ziviani. 2013. “Practical Perfect Hashing in Nearly Optimal Space.” Information Systems 38 (1): 108–31. https://doi.org/10.1016/j.is.2012.06.002.

Cain, Julie Anne, Peter Sanders, and Nick Wormald. 2007. “The Random Graph Threshold for K-Orientiability and a Fast Algorithm for Optimal Multiple-Choice Allocation.” In Proceedings of the Eighteenth Annual Acm-Siam Symposium on Discrete Algorithms, 469–76. Soda ’07. New Orleans, Louisiana: Society for Industrial and Applied Mathematics.

Ceregini, Matteo, Florian Kurpicz, and Rossano Venturini. 2024. “Faster Wavelet Tree Queries.” In 2024 Data Compression Conference (DCC). IEEE. https://doi.org/10.1109/dcc58796.2024.00030.

Dietzfelbinger, Martin, and Stefan Walzer. 2019. “Dense Peelable Random Uniform Hypergraphs.” In Lipics, Volume 144, Esa 2019, 144:38:1–38:16. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. https://doi.org/10.4230/LIPICS.ESA.2019.38.

Dillinger, Peter C., Lorenz Hübschle-Schneider, Peter Sanders, and Stefan Walzer. 2022. “Fast Succinct Retrieval and Approximate Membership Using Ribbon.” In Lipics, Volume 233, Sea 2022, 233:4:1–4:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. https://doi.org/10.4230/LIPICS.SEA.2022.4.

Fan, Bin, Dave G. Andersen, Michael Kaminsky, and Michael D. Mitzenmacher. 2014. “Cuckoo Filter.” Proceedings of the 10th Acm International on Conference on Emerging Networking Experiments and Technologies, December. https://doi.org/10.1145/2674005.2674994.

Fernholz, Daniel, and Vijaya Ramachandran. 2007. “The $k$-Orientability Thresholds for $g_n,p$.” In Proceedings of the Eighteenth Annual Acm-Siam Symposium on Discrete Algorithms, 459–68. Soda ’07. New Orleans, Louisiana: Society for Industrial and Applied Mathematics.

Gog, Simon, Timo Beller, Alistair Moffat, and Matthias Petri. 2014. “From Theory to Practice: Plug and Play with Succinct Data Structures.” In SEA 2014, 326–37. Springer International Publishing. https://doi.org/10.1007/978-3-319-07959-2_28.

Gottlieb, Simon Gene, and Knut Reinert. 2025. “Engineering Rank Queries on Bit Vectors and Strings.” Algorithms for Molecular Biology 20 (1). https://doi.org/10.1186/s13015-025-00291-9.

Graf, Thomas Mueller, and Daniel Lemire. 2020. “Xor Filters.” Acm Journal of Experimental Algorithmics 25 (March): 1–16. https://doi.org/10.1145/3376122.

———. 2022. “Binary Fuse Filters: Fast and Smaller than Xor Filters.” Acm Journal of Experimental Algorithmics 27 (March): 1–15. https://doi.org/10.1145/3510449.

Groot Koerkamp, Ragnar. 2025. “PtrHash: Minimal Perfect Hashing at RAM Throughput.” In SEA 2025, 338:21:1–21:21. Lipics. https://doi.org/10.4230/LIPIcs.SEA.2025.21.

Hermann, Stefan, Hans-Peter Lehmann, Giulio Ermanno Pibiri, Peter Sanders, and Stefan Walzer. 2024. “PHOBIC: Perfect Hashing with Optimized Bucket Sizes and Interleaved Coding.” In Esa. https://doi.org/10.4230/LIPICS.ESA.2024.69.

Hu, Yang, Jingxun Liang, Huacheng Yu, Junkai Zhang, and Renfei Zhou. 2024. “Optimal Static Dictionary with Worst-Case Constant Query Time.” arXiv. https://doi.org/10.48550/ARXIV.2412.10655.

Kurpicz, Florian. 2022. “Engineering Compact Data Structures for Rank and Select Queries on Bit Vectors.” In SPIRE 2022, 257–72. Springer International Publishing. https://doi.org/10.1007/978-3-031-20643-6_19.

Kurpicz, Florian, Niccolò Rigi-Luperti, and Peter Sanders. 2025. “Theory Meets Practice for Bit Vectors Supporting Rank and Select.” arXiv. https://doi.org/10.48550/ARXIV.2509.17819.

Kuszmaul, William, and Stefan Walzer. 2024. “Space Lower Bounds for Dynamic Filters and Value-Dynamic Retrieval.” In Proceedings of the 56th Annual Acm Symposium on Theory of Computing, 1153–64. Stoc ’24. ACM. https://doi.org/10.1145/3618260.3649649.

Lehmann, Hans-Peter, Thomas Mueller, Rasmus Pagh, Giulio Ermanno Pibiri, Peter Sanders, Sebastiano Vigna, and Stefan Walzer. 2025. “Modern Minimal Perfect Hashing: A Survey.” arXiv. https://doi.org/10.48550/ARXIV.2506.06536.

Lehmann, Hans-Peter, Peter Sanders, Stefan Walzer, and Jonatan Ziegler. 2025. “Combined Search and Encoding for Seeds, with an Application to Minimal Perfect Hashing.” arXiv; arXiv. https://doi.org/10.48550/ARXIV.2502.05613.

Pagh, Rasmus, and Flemming Friche Rodler. 2001. “Cuckoo Hashing.” In Algorithms — Esa 2001, 121–33. Springer Berlin Heidelberg. https://doi.org/10.1007/3-540-44676-1_10.

Pibiri, Giulio Ermanno, and Shunsuke Kanda. 2021. “Rank/Select Queries over Mutable Bitmaps.” Information Systems 99 (July): 101756. https://doi.org/10.1016/j.is.2021.101756.

Pibiri, Giulio Ermanno, and Roberto Trani. 2021. “PTHash: Revisiting FCH Minimal Perfect Hashing.” Proceedings of the 44th International Acm Sigir Conference on Research and Development in Information Retrieval, July. https://doi.org/10.1145/3404835.3462849.

Putze, Felix, Peter Sanders, and Johannes Singler. 2009. “Cache-, Hash-, and Space-Efficient Bloom Filters.” Acm Journal of Experimental Algorithmics 14 (December). https://doi.org/10.1145/1498698.1594230.

Vigna, Sebastiano. 2008. “Broadword Implementation of Rank/Select Queries.” In WEA 2008, edited by Catherine C. McGeoch, 154–68. Berlin, Heidelberg: Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-540-68552-4_12.

———. 2024. “sux-rs.” https://github.com/vigna/sux-rs.

———. 2025. “$\varepsilon$-Cost Sharding: Scaling Hypergraph-Based Static Functions and Filters to Trillions of Keys.” arXiv. https://doi.org/10.48550/ARXIV.2503.18397.

Walzer, Stefan. 2020. “Random Hypergraphs for Hashing-Based Data Structures.” Ilmenau. https://www.db-thueringen.de/receive/dbt_mods_00047127.

Zhou, Dong, David G. Andersen, and Michael Kaminsky. 2013. “Space-Efficient, High-Performance Rank and Select Structures on Uncompressed Bit Sequences.” In SEA 2013, 151–63. Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-38527-8_15.

problem	eval	member	lookup	update	note
minimal perfect hashing	x				\(f\) is some bijection from \(K\) to \([n]\)
monotone MPHF	x				\(f(x) = \vert\{x’\in K \mid x’ < x\}\vert\)
static retrieval	x
updatable retrieval	x			x
static set		x
static ordered set		x			Also supports returning the smallest key \(\geq x\)
static dictionary		x	x
updatable dictionary		x	x	x
dynamic dictionary		x	x	x	Also supports inserting/removing keys

data structure/method	member	values	space (bits/key)	query
MPHF	no	-	\(\geq \log e\)
- Consensus*			\(\log e + 0.001\)	\(O(1/\varepsilon)\), \(O(\log n)\)
- PTHash/PHOBIC/PtrHash/Phast			\(2\) to \(3\)	\(O(1)\)

static retrieval	no	static	\(\geq v\)
- BuRR**			\(v\)	\(O(v)\)
- MPHF + array*			\(v+\log e\)	\(O(1)\)
- hypergraph peeling			\(1.23v\)	\(O(1)\)

updatable retrieval	no	mutable	\(\geq v+\log e\) (for \(v\gg 1\))
- MPHF + array*			\(v+\log e\)

static set	yes	-	\(\geq k - \log n + \log e\)
- Elias Fano*			\(k - \log n + 2\)
- cuckoo hashing			\((1+\varepsilon)k\)	\(O(1)\)
- hashset			\(1.2k\) to \(2.4k\)	\(O(1)\)
- optimal static dictionary			\(k - \log n + O(n^{\varepsilon-1})\)	\(O(1)\)

static ordered set	yes	-	\(\geq k - \log n + \log e\)
- Elias Fano*			\(k - \log n + 2\)

static dictionary	yes	static	\(\geq k-\log n + v + \log e\)
updatable dictionary	yes	mutable	\(\geq k-\log n + v + \log e\)
- static set + inline values**			\(k-\log n + v + \log e\)
- ordered set on $(k,v)$**			\(k-\log n + v + \log e\)
- MPHF + kv-array			\(k + v + \log e\)	\(O(1)\)
- cuckoo hashing			\((1+\varepsilon)(k+v)\)	\(O(1)\)
- optimal static dictionary			\(k - \log n + v + O(n^{\varepsilon-1})\)	\(O(1)\)

standard data structures
- `HashSet`	yes	-	\(1.2k\) to \(2.4k\)	\(O(1)\)
- `HashMap`	yes	mutable	\(1.2(k+v)\) to \(2.4(k+v)\)	\(O(1)\)
- `BTreeSet`	yes	-	\(\approx 2k\)?	\(O(B\lg n)\)
- `BTreeMap`	yes	mutable	\(\approx 2(k+v)\)?	\(O(B\lg n)\)
- static hashset	yes	-	\(\approx 1.2k\)	\(O(1)\)
- static hashmap	yes	mutable	\(\approx 1.2(k+v)\)	\(O(1)\)
- static btree set	yes	-	\((1+1/(B-1))k\)	\(O(B\lg n)\)
- static btree map	yes	mutable	\((1+1/(B-1))(k+v)\)	\(O(B\lg n)\)