Advanced Data Structures – Summer 2026 – Lecture 2

How do we efficiently store a list of integers \[ 0\leq x_0 \leq x_1 \leq \dots \leq x_{n-1} < U. \]
Using stars and bars, there are \(\binom{U+n-1}{n}\) ways of choosing \(n\) elements (with duplicates) from \(\{0, 1, \dots, U-1\}\).
Since \[ n \log_2 \left(\frac{U+n-1}{n}\right) \leq \log_2 \binom{U+n-1}{n} \leq n \log_2 \left(e\cdot \frac{U+n-1}{n}\right), \] an optimal encoding needs at most \(\log_2 e + \log_2\left(\frac{U+n-1}{n}\right)\) bits per key.
For \(n = o(\sqrt U)\) and \(n\to\infty\), \[ \frac 1n \log_2 \binom{U+n-1}{n} \to \log_2 \left(e\cdot \frac{U}{n}\right). \]
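The sandwich bound above can be checked numerically. The following is a minimal sketch, not part of the lecture material; the function name `bounds_per_key` is hypothetical, and it simply evaluates the three quantities per key using Python's exact big-integer binomial.

```python
from math import comb, e, log2

def bounds_per_key(U, n):
    """Return (lower, exact, upper) bits per key for n sorted values in [0, U)."""
    # Exact information-theoretic cost: log2 of the number of multisets, per key.
    exact = log2(comb(U + n - 1, n)) / n
    # Lower and upper estimates from (N/k)^k <= C(N, k) <= (e*N/k)^k.
    lower = log2((U + n - 1) / n)
    upper = log2(e * (U + n - 1) / n)
    return lower, exact, upper

if __name__ == "__main__":
    for U, n in [(1000, 10), (2**20, 100), (2**40, 1000)]:
        lower, exact, upper = bounds_per_key(U, n)
        print(f"U={U}, n={n}: {lower:.4f} <= {exact:.4f} <= {upper:.4f}")
```

For \(n\) small relative to \(\sqrt U\), the exact value should sit close to the upper estimate, in line with the limit stated above.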

What if \(n \gg U\)?
When \(n\gg U\), store how often each element occurs.

What if \(n \approx U\)?
(Hint: What if all elements are distinct?)
When all elements are distinct: store a bitvector of \(U\) bits, where bit \(v\) is \(1\) if the value \(v\) occurs in the list and \(0\) otherwise.
Otherwise: encode the count of each value in unary (that many 1s followed by a 0) and concatenate the codes, e.g., a count of 3 becomes 1110. This takes \(n + U\) bits in total.
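The unary count encoding can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the bitvector is represented as a Python string of `'0'`/`'1'` characters for readability rather than as packed bits.

```python
from collections import Counter

def encode_counts_unary(xs, U):
    """Encode a sorted multiset xs over [0, U) as concatenated unary counts."""
    counts = Counter(xs)
    # For every value v: count(v) ones, terminated by a single zero.
    return "".join("1" * counts[v] + "0" for v in range(U))

def decode_counts_unary(bits):
    """Recover the sorted list from the unary count encoding."""
    xs, v = [], 0
    for b in bits:
        if b == "1":
            xs.append(v)   # one more occurrence of the current value
        else:
            v += 1         # a zero closes the current value's count
    return xs

if __name__ == "__main__":
    xs = [0, 0, 0, 2, 3, 3]
    bits = encode_counts_unary(xs, U=5)
    print(bits, len(bits))
    print(decode_counts_unary(bits))
```

The encoding contains exactly one 1 per element and one 0 per possible value, so its length is always the number of elements plus the universe size.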

What if \(n \ll U\)?
Store each element directly, using \(\lceil\log_2 U\rceil\) bits per element.
Where is the "waste"/inefficiency?
Consecutive elements share the \(\approx \log_2 n\) high bits!

A list of \(n\) integers \(0\leq x_0\leq x_1\leq \dots \leq x_{n-1} < U\) can be stored in \[ n (\log_2 (U/n) +2) \] bits while supporting \(O(1)\) access time.
This is only \(2-\log_2(e) \approx 0.5573\) bits/key away from the lower bound when \(n = o(\sqrt U)\)!
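As a quick numerical sanity check of this gap, the sketch below compares the space bound \(\log_2(U/n) + 2\) per key against the exact lower bound \(\frac 1n \log_2 \binom{U+n-1}{n}\). The function name `elias_fano_gap` and the chosen parameters are assumptions made for illustration.

```python
from math import comb, log2

def elias_fano_gap(U, n):
    """Bits per key by which log2(U/n) + 2 exceeds the exact lower bound."""
    lower = log2(comb(U + n - 1, n)) / n   # information-theoretic minimum
    upper = log2(U / n) + 2                # space bound from the statement above
    return upper - lower

if __name__ == "__main__":
    # n is far below sqrt(U) here, so the gap should be near 2 - log2(e).
    print(elias_fano_gap(2**40, 1000))
    print(2 - log2(2.718281828459045))
```

For finite \(n\) the measured gap is slightly larger than \(2 - \log_2 e\), because the exact lower bound approaches its limit from below.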
Idea:

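Split each integer into an \(\ell=\lfloor \log_2(U/n)\rfloor\)-bit low part and a \((\lceil\log_2 U\rceil - \ell)\)-bit high part.
Store the low parts verbatim, and encode the high parts with the unary count encoding from the \(n \approx U\) case.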


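To get \(x_i\):
Get the high part of the \(i\)'th value, where \(\mathsf{select}_1(i)\) is the position of the \(i\)'th 1 (counting from 0) in the unary-encoded high parts:
\[\mathsf{high}(i) = \mathsf{select}_1(i) - i\]
Then append the \(\ell\) low bits of the \(i\)'th value: \(x_i = \mathsf{high}(i)\cdot 2^{\ell} + \mathsf{low}(i)\).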
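The lookup can be sketched end to end as below. This is a minimal illustration under stated assumptions, not the sux implementation: the class name `EliasFanoSketch` is hypothetical, the low parts are assumed to be stored verbatim, the high parts are assumed to be stored as unary counts so that element \(i\) with high part \(h\) sets bit \(h + i\), and \(\mathsf{select}_1\) is replaced by a precomputed list of 1-positions instead of a succinct constant-time select structure.

```python
class EliasFanoSketch:
    """Illustrative Elias-Fano encoding of a non-empty sorted list in [0, U)."""

    def __init__(self, xs, U):
        if not xs:
            raise ValueError("xs must be non-empty")
        n = len(xs)
        # l = floor(log2(U / n)), clamped to 0 when n > U.
        self.l = max(0, (U // n).bit_length() - 1)
        mask = (1 << self.l) - 1
        # Low parts: the l least significant bits of each value, stored as-is.
        self.low = [x & mask for x in xs]
        # High parts in unary: element i with high part h sets bit h + i.
        self.high_bits = [0] * (n + (U >> self.l) + 1)
        for i, x in enumerate(xs):
            self.high_bits[(x >> self.l) + i] = 1
        # Stand-in for select: positions of all 1 bits (uses extra space).
        self.ones = [p for p, b in enumerate(self.high_bits) if b]

    def select1(self, i):
        """Position of the i-th 1 bit, counting from 0."""
        return self.ones[i]

    def access(self, i):
        """Return x_i by combining its high and low parts."""
        high = self.select1(i) - i
        return (high << self.l) | self.low[i]

if __name__ == "__main__":
    xs = [2, 3, 5, 7, 11, 13, 24]
    ef = EliasFanoSketch(xs, U=32)
    print(ef.l, [ef.access(i) for i in range(len(xs))])
```

Subtracting \(i\) works because exactly \(i\) ones precede the \(i\)'th one, so the remaining positions before it are zeros, one per high-part value that has been passed.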


sux::EliasFano implementation



