Notes on implementing Longest Common Repeat (LCR)

These are my running notes on implementing an algorithm for Longest Common Repeat using minimizers.

Notes

Coloured Tree Problem

See Lemma 3 at here

Generic sparse suffix array

For random strings and \(b \leq n / \log n\), direct radix sort on $2log n + log log n$-bit prefixes is sufficient for \(O(n)\) runtime. In fact, since computer word size \(w\geq \log n\), we only need at most \(2\) rounds of radix sort! (See simple-saca.)

Sparse suffix array on minimizers

  1. Fix \(w\), the window length, and \(k\), the minimizer length.
  2. Find all minimizers. We expect a density of \(2/w\), and there is one at least every \(w\) positions.
  3. Radix sort the suffixes of length \(2\cdot w\) starting at each minimizer in \(O(2/w \cdot 2w \cdot n) = O(n)\) time.
  4. Use prefix doubling to resolve missing elements.

Discussion / TODOs

Evals

  • Plot space/time tradeoff in \(l\)
  • Plot space/time in \(n\)
  • Compare to full SA/LCP construction algorithm

References