Succinct trees

Previous slides: Rank & Select, Elias-Fano Link to heading

  • Rank: counting 1 bits
    • 2-levels of blocks of length \(\log n\) and \(\log^2 n\)
  • Select: finding 1 bits
    • 2-levels of blocks with \(\log^2 n\) 1-bits and \(\sqrt{\log n}\) 1-bits.
  • There exist succinct \(o(n)\) space data structures for rank and select.
  • Elias-Fano: efficient encoding of sorted list of integers
    • Split in high and \(\ell\)-bit low part; \(\approx 0.5\) bit/elem overhead.
    • Encode high parts using stars & bars:
      1 for each element, 0 each time a multiple of \(2^\ell\) is crossed.

Today: Succinct representation of trees Link to heading

A.k.a.: applications of rank and select.

  • What is a tree?
  • How many trees on \(n\) nodes are there?
  • How do we encode the tree?
    • 3 options: LOUDS (BFS),
    • balanced parentheses (in-order DFS),
    • DFUDS (pre-order DFS)
  • Operations on trees
    • degree, parent, \(i\)’th child, subtree size

Application: in-place json parsing

Ordered trees Link to heading

Definition 1.

An ordered tree is a tree with:

  • \(n\) nodes, \(n-1\) edges,
  • a single root,
  • an order on the children of each node.
Question 1.

How many ordered trees are there at least? At most?

Bijection to balanced parentheses Link to heading

Theorem 1 Balanced parentheses.

There is a bijection between ordered trees on \(n\) nodes and balanced parentheses (BP) sequences of \(n-1\) “(” and “)”, where each “(” has a matching “)”.

“Trace” the outline of the tree in DFS-order: each time an edge goes down, write “(”, and each time an edge goes up, write “)”.

Upper bound Link to heading

Theorem 2 Upper bound.

There are \(o(4^n)\) ordered trees on \(n\) nodes.

The number of BP sequences is at most \[\textsf{#trees} \leq \binom{2(n-1)}{n-1} = \Theta\left(\frac{2^{2n}}{\sqrt n}\right)\]

Lower bound Link to heading

Theorem 3 Lower bound.

There are \(\Omega(4^n/n^{1.5})\) ordered trees on \(n\) nodes.

Partition the \(\binom{2(n-1)}{n-1}\) possible BP sequences in equivalence classes under rotation of the string. Take a random element of any class. Find the prefix with the largest value of \(\mathsf{count}_)(i) - \mathsf{count}_((i)\), and consider the rotation starting after this prefix. This is a BP. Thus, each class has size \(\leq 2(n-1)\), and contains at least one BP:

\[\textsf{#trees} \geq \frac 1{2(n-1)}\binom{2(n-1)}{n-1} = \Omega\left(\frac{2^{2n}}{n^{1.5}}\right).\]

Space lower bound Link to heading

Theorem 4.

We need at least \[\frac 1n\log_2(4^n/n^{1.5}) = 2 - o(1)\] bits per node, and this is tight.

Problem 1 Goal for the lecture.

Represent trees using \(2 + o(1)\) bits per node, while supporting common operations.

Aside: Catalan numbers Link to heading

Theorem 5.

The exact number of ordered trees on \(n\) nodes is the Catalan number: \[ C_{n-1} := \frac1{n} \binom{2(n-1)}{n-1} \sim \frac{4^{n-1}}{n^{3/2} \sqrt \pi}. \]

Aside: Bijection to binary trees Link to heading

Example 1 left-child right-sibling binary tree.
  • Bijection between ordered trees and binary trees.
  • Given a binary tree, transform it to an ordered tree like this:
  • a left child of a node is a child in the ordered tree.
  • a right child of a node is a sibling in the ordered tree.

Level Ordered Unary Degree Sequence (LOUDS) Link to heading

Definition 2.

Start with 10. Then append the degree of each node in level/BFS order in unary: 1 for each child, and then a 0.

Example 2.

10111100110011001100000
a_bchiABdeCHjkIDfgEJKFG

  • lowercase: introduce node as child.
  • uppercase: end the list of children of the node.

\(2n+1\) bits:

  • a 1 to add the node as child,
  • an ’end’ marker 0 for each node,
  • an extra 0.

Operations Link to heading

Question 2.

How to implement operations:

  • degree,
  • \(i\)’th child,
  • parent,
  • subtree size.

Advanced:

  • depth,
  • lowest common ancestor (LCA),
  • rank (pre- or post-order),

\[ \newcommand{\rank}{\mathsf{rank}} \newcommand{\select}{\mathsf{select}} \]

Implementing the operations Link to heading

Example 3.

10111100110011001100000
a_bchiABdeCHjkIDfgEJKFG

Definition 3.

Number nodes \(0\) to \(n-1\) in level/BFS-order.

  • Position of 0 of \(u\): \(p_0(u):=\select_0(u+1)\)
  • Position of 1 of \(u\): \(p_1(u):=\select_1(u)\)
  • Children of \(u\): position \(\select_0(u)+1\) to \(\select_0(u+1)\)
  • Degree of \(u\): \(\select_0(u+1) - \select_0(u) - 1\).
  • \(i\)’th child of \(u\): (number of children before \(u) + i = \rank_1(\select_0(u)) + 1 + i\)
  • Parent of \(u\): the node containing the \(u\)’th 1: \(\rank_0(\select_1(u))-1\)
    • Roughly inverse of \(\mathsf{child}\)
  • Subtree size: hard; data is all over the place

Balanced Parentheses (BP) Link to heading

Definition 4.

Traverse tree in DFS-order. Write a ( each time a node/subtree is first entered, and a ) a node/subtree is left.

Example 4.

(()(()(()()))()(()()))
1101101101000101101000 10 instead of ()
abBcdDefFgGEChHijJkKIA

  • lowercase: enter subtree
  • uppercase: leave subtree

\(2n\) bits: a ( and ) for each node.

Note: the children of a are all over the place!
\(\rank\) and \(\select\) won’t work here.

Additional operations on BP Link to heading

Definition 5.
  • \(\mathsf{findclose(i)}\): given ( at pos \(i\), get the position of the matching ).
  • \(\mathsf{findopen(i)}\): given ) at pos \(i\), get the position of the matching (.
  • \(\mathsf{enclose(i)}\): given ( at pos \(i\), get the position of the closest ( left of it enclosing it.

Can all be done in \(O(1)\) time and \(o(n)\) space

Implementing the BP operations Link to heading

Example 5.

(()(()(()()))()(()()))
1101101101000101101000 10 instead of ()
abBcdDefFgGEChHijJkKIA

Definition 6.

Number nodes \(0\) to \(n-1\) in DFS-order.

  • Position of ( of \(u\): \(p=\select_((u)\)
  • Subtree size of \(u\): \((\mathsf{findclose}(p) - p + 1)/2\)
  • Parent of \(u\): \(\mathsf{enclose}(p)\)
  • Degree, \(i\)’th child: hard (Navarro and Sadakane 2014)

Depth-First Unary Degree Sequence (DFUDS) Link to heading

Definition 7.

Start with 1. Then traverse the nodes in DFS order, and append for each its degree in unary: a 1 for each child and then a 0.

Then convert 1 to ( and 0 to ) to get a BP sequence.

Example 6.

((((())(())(())))(()))
1111100110011000011000
aihcbABedCDgfEFGHkjIJK

  • lowercase: introduce node as child.
  • uppercase: end the list of children of the node.

\(2n\) bits: 1 to introduce each node,
0 to end the unary degree encoding of each node.

Implementing the DFUDS operations Link to heading

Example 7.

((((())(())(())))(()))
1111100110011000011000
aihcbABedCDgfEFGHkjIJK

Definition 8.
  • Position of 0 or ) of \(u\): \(q=\select_0(u)\)
  • Degree of \(u\): \(\select_0(u) - \select_0(u-1) - 1\). (Edge case for \(u=0\) )
  • \(i\)’th child (position of first 1-bit): next after the matching ): \(\mathsf{findclose}(q-1-i)+1\)
  • Parent of \(u\): get preceding ), then matching (, then succeeding ): \(\select_0(\rank_0(\mathsf{findopen}(\select_0(u-1))))\)
  • Subtree size of \(u\), starting at first bit \(p\): find succeeding ) that leaves the subtree: \((\mathsf{findclose}(\mathsf{enclose}(p)) - p)/2 + 1\)

Implementing findclose and enclose Link to heading

  • Uses a segment tree (next week) over \(\mathsf{excess}(i) = \rank_((i) - \rank_)(i)\).

    • matches depth in the tree
  • \(\mathsf{findclose}(i) = \min\{j > i: \mathsf{excess}(j)=\mathsf{excess}(i-1)\}\)

  • \(\mathsf{findopen}(i) = \max\{j < i: \mathsf{excess}(i)=\mathsf{excess}(j-1)\}\)

  • \(\mathsf{enclose}(i) = \max\{j < i: \mathsf{excess}(i)=\mathsf{excess}(j-1)+2\}\)

  • Each node of the segment tree stores the difference of ( and ) over its range, and the minimum depth node it contains.

Further reading / sources Link to heading

Bibliography Link to heading

References Link to heading

Navarro, Gonzalo, and Kunihiko Sadakane. 2014. “Fully Functional Static and Dynamic Succinct Trees.” Acm Transactions on Algorithms 10 (3): 1–39. https://doi.org/10.1145/2601073.

Possible exam questions Link to heading

  • How many bit are needed to succinctly encode an ordered tree?
    • Why is this sufficient?
    • Why is this required?
  • What are three different ways to succinctly encode an ordered tree?
    • How do they relate?
    • How do they differ?
    • Which operations are (not) easily supported by each variant?
  • Why is it hard to support both ‘subtree size’ and ‘\(i\)’th child’ queries at the same time?