\[ \newcommand{\K}{\mathbb K} \newcommand{\V}{\mathbb V} \newcommand{\c}[1]{\mathbf{\mathsf{#1}}} \]

This post summarizes different types of static data structures:

  • their definition,
  • space lower bounds, and
  • practical approaches.

Surely some relevant methods are still missing. Feel free to suggest more.

This post was written with input from Stefan Hermann and Stefan Walzer.

See the end for a summarizing table.

1 Classification of static data structures Link to heading

We assume that we are given

  • a function \(f: K \to \V\)
  • from a set of \(n\) $k$-bit keys \(K\subseteq \K = [2^{k}]\)
    • with the assumption that \(|K| \ll |\K|\) (say, \(|K| \leq 0.01 |\K|\))
  • to $v$-bit values \(f(K) = V \subseteq \V=[2^{v}]\).

Given \(x\in \K\) and \(y\in \V\), we consider a few operations one can do in such a setting (Walzer 2020):

  • \(\c{member}(x)\): returns whether \(x\in K\).
  • \(\c{lookup}(x)\): returns \(f(x)\) when \(x\in K\) and \(\bot\) otherwise.
  • \(\c{eval}(x)\): returns \(f(x)\) when \(x\in K\) and an arbitrary value in \(\mathbb V\) otherwise.
  • \(\c{update}(x, y)\): updates \(f(x) \gets v\).

There are different classes of data structures that support different subsets of these operations:

problemevalmemberlookupupdatenote
minimal perfect hashingx\(f\) is some bijection from \(K\) to \([n]\)
order-preserving (monotone) MPHFx\(f(x) = \vert\{x’\in K \mid x’ < x\}\vert\)
static retrievalx
updatable retrievalxx
static setx
static ordered setxAlso supports returning the smallest key \(\geq x\)
static dictionaryxx
updatable dictionaryxxx
dynamic dictionaryxxxAlso supports inserting/removing keys

Not listed in this table are rank and select, because their input is a typically a bit-array instead.

Terminology:

  • retrieval: supports \(\c{eval}\) and possibly \(\c{update}\)
  • static function: somewhat an alias for retrieval
  • dictionary: supports \(\c{lookup}\)
  • static: static keys and values
  • updatable: static keys but updatable values
  • dynamic: updatable keys and values
  • non-keys: values \(x’ \in K’ = \K - K\).

2 Space lower bounds and practical approaches Link to heading

In the following, we summarize each problem, list space lower bounds, and give some examples of data structures that can be used. Space usage ignores lower-order terms, and we do not consider construction time. For reference, \(\log e =1.44\).

2.1 Rank Link to heading

Given an $n$-bit array of \(0\) and \(1\), answer queries: how many $1$s are there before position \(x\)?

2.2 Rank + Select Link to heading

Given an $n$-bit array of \(0\) and \(1\), answer queries: what is the position of the $i$th \(1\)?

2.3 Minimal perfect hash function (MPHF) Link to heading

An MPHF bijectively maps the \(n\) keys of \(K\) to \([n]\). It is undefined for non-keys.

MPHF’s form a building block for other static data structures listed below.

TODO 2.4 Order-preserving MPHF Link to heading

  • Space lower bound: TODO
  • Examples: TODO

2.5 Static retrieval: Static function with static values Link to heading

Static retrieval corresponds to a static function \(f\) storing \(f(x)\) for all \(x\in K\), that is undefined for all non-keys. Values are static as well.

  • Space lower bound: \(v\) bits/key
  • Examples:
    • 3-way hypergraph peeling: \(1.23v\) bits/key, \(O(1)\) query via 3 parallel reads (Botelho, Pagh, and Ziviani 2013)
    • fuse graphs: \(1.12v\) bits/key via more efficient hypergraph peeling, still 3 parallel reads (Dietzfelbinger and Walzer 2019) (sux-rs (Vigna 2024) has an implementation)
    • BuRR (bumped ribbon retrieval): \(v\) bits/key, \(O(v)\) query time (\(v\) times an unaligned u64 read) (Dillinger et al. 2022)
    • MPHF+array: \(v+\log e\) bits/key, \(O(1)\) query time via \(2\) sequential reads.
    • see also $ε$-cost sharding for partitioned construction (Vigna 2025)

2.6 Updatable retrieval: Static function with mutable values Link to heading

Updatable retrieval corresponds to a static function \(f\) storing \(f(x)\) for all \(x\in K\), that is undefined for all non-keys, for which values can be updated, i.e., we can assign \(f(x) \gets v’\).

  • Space lower bound: \(v + 0.16\) bits/key (Kuszmaul and Walzer 2024)
    • In practice, also \(v+\log e\) bits/key for \(v\gg 1\)
  • Example:
    • MPHF+array: \(v+\log e\) bits/key, \(O(1)\) query time via \(2\) sequential reads.

2.7 Static set (membership) Link to heading

A static set \(f\) answers for each \(x\) whether it is in \(K\) or not.

  • Space lower bound: \(k - \log n + \log e\) bits/key
    • Note: when \(n\) is large and keys are dense in \(K\), effectively one only has to store the distances between adjacent keys, which takes just over \(\log (2^k/n) = k - \log n\) bits/key.
  • Examples:
    • standard HashSet (C++ unordered_hashset, Rust HashSet): \(1.2k\) to \(2.4k\) bits/key; space overhead for \(2\times\) growth factor and \(\approx 20\%\) empty slots. \(O(1)\) queries with probing and mostly 1 cache miss
    • cuckoo hashing (Pagh and Rodler 2001; Fernholz and Ramachandran 2007; Cain, Sanders, and Wormald 2007): \((1+\varepsilon)k\) bits/key; \(O(1)\) queries with either 2 parallel reads or the second read conditional on the first.
    • Elias Fano (EF) encoding: \(2+0.06+\log (2^k/n) = 2.06+k - \log n\) bits/key
      • Stores the low \(b=\lceil k-\log n\rceil\) bits of each key explicitly, and uses a 1-hot encoding for the number of times a multiple of \(2^b\) is crossed between adjacent values. Then uses a rank and/or select structure on the bitvector for index operations that take an additional \(0.06\) to \(0.4\) bits/key.
      • Answers whether \(x\) is in the set by checking if the smallest value \(\geq x\) is \(x\) itself.
    • optimal static dictionary (Hu et al. 2024): \(n^\varepsilon\) overhead to lowerbound, \(O(1)\) queries

2.8 Static ordered set Link to heading

Compared to a static set, a static ordered set allows successor queries: what is the smallest element in the set \(\geq x\)?

We assume that all implementations of this somehow store an ordered list of keys, and that returning the first element \(\geq x\) implicitly computes an index into this list.

  • Space lower bound: \(k - \log n + \log e\) bits/key (as before)
  • Examples:
    • C++ set (red-black tree): pointer-based; \(k+64\) bits/key?
    • Rust BTreeSet (B-tree): around \(2k\) or so?
    • EF with select: \(2+0.06+\log (2^k/n) = 2.06+k - \log n\) bits/key.

2.9 Static dictionary: static keys and values Link to heading

A dictionary differs from a function in that it recognizes non-keys \(x’\): \(f(x’) = \bot\) is guaranteed, whereas \(f(x)\in \V\) is the usual value. In a static dictionary, both keys and values are static.

  • Space lower bound: \(k-\log n + v + \log e\) bits/key
  • Examples:
    • static set + static retrieval: \(k-\log n+\log e + v\) bits/key (\(\log e\) away)
    • static ordered set + array: \(k - \log n + \log e + v\) bits/key (\(\log e\) away)
      • Retrieve the sorted position via the ordered set, and use that to index an array.
    • static ordered set on \((k,v)\) pairs: \(k-\log n + \log e + v\) bits/key (\(\log e\) away)
    • MPHF + key-value array: \(\log e + k + v\) bits/key (\(\log n\) away).
    • cuckoo hashing: \((1+\varepsilon)(k+v)\) bits/key
    • optimal static dictionary (Hu et al. 2024): \(n^\varepsilon\) overhead to lowerbound, \(O(1)\) queries

2.10 Updatable dictionary with mutable values Link to heading

In an updatable dictionary, values can be updated.

In practice, all known static dictionaries associate a unique memory location to each value (unlike e.g. static retrieval structures). Thus, we can trivially update the value as needed, and all static dictionary methods also work as an updatable dictionary.

  • Space lower bound: \(k-\log n + v + \log e\) bits/key
    • (This equals the static case, because the dictionary contains all needed information to build a modified version from scratch on each update.)

TODO 2.11 Dynamic dictionary with mutable keys and values Link to heading

In a dynamic dictionary, keys are mutable and can thus be added and removed.

  • Space lower bound: ?
  • Examples:
    • C++ absl::flat_hash_map, Rust HashMap: \(1.2(k+v)\) to \(2.4(k+v)\) bits/key with \(2\times\) growth factor

TODO 2.12 Static filter Link to heading

A static set with false-positive ratio \(2^{-p}\).

TODO 2.13 Ordered static/updatable/dynamic dictionary? Link to heading

Like a normal dictionary, but with successor queries.

  • Space lower bound: ?
  • Examples:
    • C++ map
    • Rust BTreeMap

3 Summary table Link to heading

Table 1: Summary of static data structures, their space lower bounds, and practical solutions with their space usage. Double starred methods (**) are within \(o(1)\) bit/key from optimal. Single starred methods (*) are within \(\Theta(1)\) bit/key from optimal.
data structure/methodmembervaluesspace (bits/key)query
MPHFno-\(\geq \log e\)
- Consensus*\(\log e + 0.001\)\(O(1/\varepsilon)\), \(O(\log n)\)
- PTHash/PHOBIC/PtrHash/Phast\(2\) to \(3\)\(O(1)\)
static retrievalnostatic\(\geq v\)
- BuRR**\(v\)\(O(v)\)
- MPHF + array*\(v+\log e\)\(O(1)\)
- hypergraph peeling\(1.23v\)\(O(1)\)
updatable retrievalnomutable\(\geq v+\log e\) (for \(v\gg 1\))
- MPHF + array*\(v+\log e\)
static setyes-\(\geq k - \log n + \log e\)
- Elias Fano*\(k - \log n + 2\)
- cuckoo hashing\((1+\varepsilon)k\)\(O(1)\)
- hashset\(1.2k\) to \(2.4k\)\(O(1)\)
- optimal static dictionary\(k - \log n + O(n^{\varepsilon-1})\)\(O(1)\)
static ordered setyes-\(\geq k - \log n + \log e\)
- Elias Fano*\(k - \log n + 2\)
static dictionaryyesstatic\(\geq k-\log n + v + \log e\)
updatable dictionaryyesmutable\(\geq k-\log n + v + \log e\)
- static set + inline values**\(k-\log n + v + \log e\)
- ordered set on $(k,v)$**\(k-\log n + v + \log e\)
- MPHF + kv-array\(k + v + \log e\)\(O(1)\)
- cuckoo hashing\((1+\varepsilon)(k+v)\)\(O(1)\)
- optimal static dictionary\(k - \log n + v + O(n^{\varepsilon-1})\)\(O(1)\)
standard data structures
- HashSetyes-\(1.2k\) to \(2.4k\)\(O(1)\)
- HashMapyesmutable\(1.2(k+v)\) to \(2.4(k+v)\)\(O(1)\)
- BTreeSetyes-\(\approx 2k\)?\(O(B\lg n)\)
- BTreeMapyesmutable\(\approx 2(k+v)\)?\(O(B\lg n)\)
- static hashsetyes-\(\approx 1.2k\)\(O(1)\)
- static hashmapyesmutable\(\approx 1.2(k+v)\)\(O(1)\)
- static btree setyes-\((1+1/(B-1))k\)\(O(B\lg n)\)
- static btree mapyesmutable\((1+1/(B-1))(k+v)\)\(O(B\lg n)\)

References Link to heading

Beling, Piotr, and Peter Sanders. 2025. “Phast – Perfect Hashing with Fast Evaluation.” arXiv; arXiv. https://doi.org/10.48550/ARXIV.2504.17918.
Bloom, Burton H. 1970. “Space/Time Trade-Offs in Hash Coding with Allowable Errors.” Communications of the Acm 13 (7): 422–26. https://doi.org/10.1145/362686.362692.
Botelho, Fabiano C., Rasmus Pagh, and Nivio Ziviani. 2013. “Practical Perfect Hashing in Nearly Optimal Space.” Information Systems 38 (1): 108–31. https://doi.org/10.1016/j.is.2012.06.002.
Cain, Julie Anne, Peter Sanders, and Nick Wormald. 2007. “The Random Graph Threshold for K-Orientiability and a Fast Algorithm for Optimal Multiple-Choice Allocation.” In Proceedings of the Eighteenth Annual Acm-Siam Symposium on Discrete Algorithms, 469–76. Soda ’07. New Orleans, Louisiana: Society for Industrial and Applied Mathematics.
Dietzfelbinger, Martin, and Stefan Walzer. 2019. “Dense Peelable Random Uniform Hypergraphs.” In Lipics, Volume 144, Esa 2019, 144:38:1–38:16. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. https://doi.org/10.4230/LIPICS.ESA.2019.38.
Dillinger, Peter C., Lorenz Hübschle-Schneider, Peter Sanders, and Stefan Walzer. 2022. “Fast Succinct Retrieval and Approximate Membership Using Ribbon.” In Lipics, Volume 233, Sea 2022, 233:4:1–4:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. https://doi.org/10.4230/LIPICS.SEA.2022.4.
Fan, Bin, Dave G. Andersen, Michael Kaminsky, and Michael D. Mitzenmacher. 2014. “Cuckoo Filter.” Proceedings of the 10th Acm International on Conference on Emerging Networking Experiments and Technologies, December. https://doi.org/10.1145/2674005.2674994.
Fernholz, Daniel, and Vijaya Ramachandran. 2007. “The $k$-Orientability Thresholds for $g_n,p$.” In Proceedings of the Eighteenth Annual Acm-Siam Symposium on Discrete Algorithms, 459–68. Soda ’07. New Orleans, Louisiana: Society for Industrial and Applied Mathematics.
Gog, Simon, Timo Beller, Alistair Moffat, and Matthias Petri. 2014. “From Theory to Practice: Plug and Play with Succinct Data Structures.” In Experimental Algorithms, 326–37. Springer International Publishing. https://doi.org/10.1007/978-3-319-07959-2_28.
Gottlieb, Simon Gene, and Knut Reinert. 2025. “Engineering Rank Queries on Bit Vectors and Strings.” Algorithms for Molecular Biology 20 (1). https://doi.org/10.1186/s13015-025-00291-9.
Graf, Thomas Mueller, and Daniel Lemire. 2020. “Xor Filters.” Acm Journal of Experimental Algorithmics 25 (March): 1–16. https://doi.org/10.1145/3376122.
———. 2022. “Binary Fuse Filters: Fast and Smaller than Xor Filters.” Acm Journal of Experimental Algorithmics 27 (March): 1–15. https://doi.org/10.1145/3510449.
Groot Koerkamp, Ragnar. 2025. “PtrHash: Minimal Perfect Hashing at RAM Throughput.” In Sea 2025, 338:21:1–21:21. Lipics. https://doi.org/10.4230/LIPIcs.SEA.2025.21.
Hermann, Stefan, Hans-Peter Lehmann, Giulio Ermanno Pibiri, Peter Sanders, and Stefan Walzer. 2024. “PHOBIC: Perfect Hashing with Optimized Bucket Sizes and Interleaved Coding.” In Esa. https://doi.org/10.4230/LIPICS.ESA.2024.69.
Hu, Yang, Jingxun Liang, Huacheng Yu, Junkai Zhang, and Renfei Zhou. 2024. “Optimal Static Dictionary with Worst-Case Constant Query Time.” arXiv. https://doi.org/10.48550/ARXIV.2412.10655.
Kurpicz, Florian. 2022. “Engineering Compact Data Structures for Rank and Select Queries on Bit Vectors.” In String Processing and Information Retrieval, 257–72. Springer International Publishing. https://doi.org/10.1007/978-3-031-20643-6_19.
Kurpicz, Florian, Niccolò Rigi-Luperti, and Peter Sanders. 2025. “Theory Meets Practice for Bit Vectors Supporting Rank and Select.” arXiv. https://doi.org/10.48550/ARXIV.2509.17819.
Kuszmaul, William, and Stefan Walzer. 2024. “Space Lower Bounds for Dynamic Filters and Value-Dynamic Retrieval.” In Proceedings of the 56th Annual Acm Symposium on Theory of Computing, 1153–64. Stoc ’24. ACM. https://doi.org/10.1145/3618260.3649649.
Lehmann, Hans-Peter, Thomas Mueller, Rasmus Pagh, Giulio Ermanno Pibiri, Peter Sanders, Sebastiano Vigna, and Stefan Walzer. 2025. “Modern Minimal Perfect Hashing: A Survey.” arXiv. https://doi.org/10.48550/ARXIV.2506.06536.
Lehmann, Hans-Peter, Peter Sanders, Stefan Walzer, and Jonatan Ziegler. 2025. “Combined Search and Encoding for Seeds, with an Application to Minimal Perfect Hashing.” arXiv; arXiv. https://doi.org/10.48550/ARXIV.2502.05613.
Matteo Ceregini, Rossano Venturini Florian Kurpicz. 2023. “Faster Wavelet Trees with Quad Vectors.” arXiv. https://doi.org/10.48550/ARXIV.2302.09239.
Pagh, Rasmus, and Flemming Friche Rodler. 2001. “Cuckoo Hashing.” In Algorithms — Esa 2001, 121–33. Springer Berlin Heidelberg. https://doi.org/10.1007/3-540-44676-1_10.
Pibiri, Giulio Ermanno, and Shunsuke Kanda. 2021. “Rank/Select Queries over Mutable Bitmaps.” Information Systems 99 (July): 101756. https://doi.org/10.1016/j.is.2021.101756.
Pibiri, Giulio Ermanno, and Roberto Trani. 2021. “PTHash: Revisiting FCH Minimal Perfect Hashing.” Proceedings of the 44th International Acm Sigir Conference on Research and Development in Information Retrieval, July. https://doi.org/10.1145/3404835.3462849.
Putze, Felix, Peter Sanders, and Johannes Singler. 2009. “Cache-, Hash-, and Space-Efficient Bloom Filters.” Acm Journal of Experimental Algorithmics 14 (December). https://doi.org/10.1145/1498698.1594230.
Vigna, Sebastiano. 2008. “Broadword Implementation of Rank/Select Queries.” In Experimental Algorithms, edited by Catherine C. McGeoch, 154–68. Berlin, Heidelberg: Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-540-68552-4_12.
———. 2024. “sux-rs.” https://github.com/vigna/sux-rs.
———. 2025. “$\varepsilon$-Cost Sharding: Scaling Hypergraph-Based Static Functions and Filters to Trillions of Keys.” arXiv. https://doi.org/10.48550/ARXIV.2503.18397.
Walzer, Stefan. 2020. “Random Hypergraphs for Hashing-Based Data Structures.” Ilmenau. https://www.db-thueringen.de/receive/dbt_mods_00047127.
Zhou, Dong, David G. Andersen, and Michael Kaminsky. 2013. “Space-Efficient, High-Performance Rank and Select Structures on Uncompressed Bit Sequences.” In Experimental Algorithms, 151–63. Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-38527-8_15.