<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Text-Indexing on CuriousCoding</title><link>https://curiouscoding.nl/tags/text-indexing/</link><description>Recent content in Text-Indexing on CuriousCoding</description><generator>Hugo</generator><language>en</language><lastBuildDate>Thu, 26 Feb 2026 00:00:00 +0100</lastBuildDate><atom:link href="https://curiouscoding.nl/tags/text-indexing/index.xml" rel="self" type="application/rss+xml"/><item><title>Wheeler graphs</title><link>https://curiouscoding.nl/posts/wheeler-graphs/</link><pubDate>Thu, 26 Feb 2026 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/wheeler-graphs/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#deterministic-finite-automaton--dfa" &gt;Deterministic Finite Automaton (DFA)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#wheeler-dfa" &gt;Wheeler-DFA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#linear-graphs-prefix-array" &gt;Linear graphs: Prefix array&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4&lt;/span&gt; &lt;a href="#de-bruijn-graphs-are-wheeler" &gt;De Bruijn graphs are Wheeler&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;5&lt;/span&gt; &lt;a href="#not-all-dfas-are-wheeler" &gt;Not all DFAs are Wheeler&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6&lt;/span&gt; &lt;a href="#locating-patterns-via-binary-search" &gt;Locating patterns via binary search&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;7&lt;/span&gt; &lt;a href="#locating-patterns-via-the-boss-table" &gt;Locating patterns via the BOSS table&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;These are some notes on Wheeler DFAs after chatting with Nicola
Prezza&lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt; and others
from the RAVEN lab at DSB 2026 in Venice.&lt;/p&gt;
&lt;h3 id="deterministic-finite-automaton--dfa"&gt;
 &lt;span class="section-num"&gt;1&lt;/span&gt; Deterministic Finite Automaton (DFA)
 &lt;a class="heading-link" href="#deterministic-finite-automaton--dfa"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h3&gt;
&lt;p&gt;A DFA is a graph where edges are labelled by characters.
Each node can have at most one outgoing edge with each label. (Otherwise it
would be &lt;em&gt;non-deterministic&lt;/em&gt;.)&lt;/p&gt;</description></item><item><title>A lemma on suffix array searching</title><link>https://curiouscoding.nl/posts/suffix-array-searching-lemma/</link><pubDate>Sat, 05 Oct 2024 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/suffix-array-searching-lemma/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#suffix-arrays" &gt;Suffix arrays&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#searching-methods" &gt;Searching methods&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.1&lt;/span&gt; &lt;a href="#naive-o--p-cdot-lg-2-n--search" &gt;Naive \(O(|P|\cdot \lg_2 n)\) search&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.2&lt;/span&gt; &lt;a href="#faster-search" &gt;Faster \(O(|P|\cdot \lg_2 n)\) search&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.3&lt;/span&gt; &lt;a href="#lcp-based-o--p-plus-lg-2-n--search" &gt;LCP-based \(O(|P| + \lg_2 n)\) search&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#analysing-the-faster-search" &gt;Analysing the faster search&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;We&amp;rsquo;ll prove that the &amp;ldquo;faster&amp;rdquo; binary search algorithm (see &lt;a href="#faster-search" &gt;2.2&lt;/a&gt;), which tracks the LCP
with the left and right boundaries of the remaining search interval, has amortized
runtime&lt;/p&gt;
&lt;p&gt;\[
O\Big(\lg_2(n) + |P| + |P| \cdot \lg_2(Occ(P))\Big),
\]
when \(P\) is a randomly sampled fixed-length pattern from the text and \(Occ(P)\) counts the number of occurrences of \(P\) in the text.&lt;/p&gt;</description></item><item><title>FM-index implementations</title><link>https://curiouscoding.nl/posts/fm-index-implementations/</link><pubDate>Wed, 02 Oct 2024 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/fm-index-implementations/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#a-note-on-sdsl-versions" &gt;A note on SDSL versions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;Here I&amp;rsquo;ll briefly list some FM-index and related implementations around the web.
Implementations seem relatively inconsistent, mostly because the FM-index is
more of a &amp;lsquo;wrapper&amp;rsquo; type around a given Burrows-Wheeler-transform and an
&lt;em&gt;occurrences&lt;/em&gt; list implementation. Both can be implemented in various ways. In particular,
occurrences should be stored using a wavelet tree for optimal compression.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;a href="https://github.com/wafflespeanut/nucleic-acid/blob/2adbf5181081245423f974a88b5ccf53d7bf26ac/src/bwt.rs#L96" class="external-link" target="_blank" rel="noopener"&gt;nucleic-acid repo&lt;/a&gt; contains a completely unoptimised version.&lt;/li&gt;
&lt;li&gt;The Rust-bio crate contains a &lt;a href="https://github.com/rust-bio/rust-bio/blob/master/src/data_structures/fmindex.rs#L209" class="external-link" target="_blank" rel="noopener"&gt;generic FM-index&lt;/a&gt;. It stores a &lt;a href="https://github.com/rust-bio/rust-bio/blob/master/src/data_structures/bwt.rs#L75-L94" class="external-link" target="_blank" rel="noopener"&gt;sampled
occurrences array&lt;/a&gt;, so that space is relatively small but lookups take \(O(k)\)
time for sampling factor \(k\).&lt;/li&gt;
&lt;li&gt;SDSL-lite contains a &lt;a href="https://github.com/simongog/sdsl-lite/blob/c32874cb2d8524119f25f3b501526fe692df29f4/include/sdsl/wavelet_" class="external-link" target="_blank" rel="noopener"&gt;wavelet tree&lt;/a&gt; and &lt;a href="https://github.com/simongog/sdsl-lite/blob/master/include/sdsl/csa_wt.hpp#L48" class="external-link" target="_blank" rel="noopener"&gt;compressed suffix array&lt;/a&gt; implementation based
on it, that provides the same functionality as an FM-index.&lt;/li&gt;
&lt;li&gt;There is the &lt;a href="https://github.com/rossanoventurini/qwt" class="external-link" target="_blank" rel="noopener"&gt;Quad Wavelet Tree&lt;/a&gt; (QWT) Rust crate (Ceregini, Kurpicz, and Venturini 2024). This uses a 4-ary
tree instead of the usual binary wavelet tree, and improves latency by around
a factor of 2 over SDSL wavelet trees.&lt;/li&gt;
&lt;li&gt;Dominik Kempa has the &lt;a href="https://github.com/dominikkempa/faster-minuter?tab=readme-ov-file" class="external-link" target="_blank" rel="noopener"&gt;Faster-Minuter index&lt;/a&gt; (Gog et al. 2019) that contains
an improved wavelet tree as well.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/achacond/gem-cutter" class="external-link" target="_blank" rel="noopener"&gt;GEM-Cutter&lt;/a&gt; contain a GPU implementation of the FM-index (Chacon et al. 2015).&lt;/li&gt;
&lt;li&gt;There is also &lt;a href="https://github.com/lh3/ropebwt3" class="external-link" target="_blank" rel="noopener"&gt;RopeBWT3&lt;/a&gt; (Li 2024), which is basically a run-length
compressed BWT with a B+ tree on top for fast queries.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/UM-Applied-Algorithms-Lab/AWRY" class="external-link" target="_blank" rel="noopener"&gt;AWRY&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="a-note-on-sdsl-versions"&gt;
 A note on SDSL versions
 &lt;a class="heading-link" href="#a-note-on-sdsl-versions"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/simongog/sdsl" class="external-link" target="_blank" rel="noopener"&gt;github:simongog/sdsl&lt;/a&gt; is the original, with last commit in 2013.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/simongog/sdsl-lite" class="external-link" target="_blank" rel="noopener"&gt;github:simongog/sdsl-lite&lt;/a&gt; is v2, with last commit in 2019, and seems the most
used currently.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/xxsds/sdsl-lite" class="external-link" target="_blank" rel="noopener"&gt;github:xxsds/sdsl-lite&lt;/a&gt; is v3 and seems to be actively maintained at the time
of writing (Jan 2025), and is &lt;a href="https://www.reddit.com/r/rust/comments/nlxhym/comment/gzpqejn/?utm_source=share&amp;amp;utm_medium=web3x&amp;amp;utm_name=web3xcss&amp;amp;utm_term=1&amp;amp;utm_content=share_button" class="external-link" target="_blank" rel="noopener"&gt;recommended&lt;/a&gt; by the original developers. From a
quick glance, I think it&amp;rsquo;s somewhat restructured and truly a v3, not just a v2.1.
However, it seems to be much less popular.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/vgteam/sdsl-lite" class="external-link" target="_blank" rel="noopener"&gt;github:vgteam/sdsl-lite&lt;/a&gt; is a fork of the original &lt;code&gt;sdsl-lite&lt;/code&gt;, with, I think,
a number of small bug fixes and some updates for recent compiler versions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Then there are also some Rust versions:&lt;/p&gt;</description></item><item><title>Tools for suffix array searching</title><link>https://curiouscoding.nl/posts/suffix-array-searching/</link><pubDate>Fri, 14 Jun 2024 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/suffix-array-searching/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#sapling" &gt;Sapling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#pla-index" &gt;PLA-Index&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#lisa-learned-index" &gt;LISA: learned index&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;Let&amp;rsquo;s summarize some tools for efficiently searching suffix arrays.&lt;/p&gt;
&lt;h2 id="sapling"&gt;
 &lt;span class="section-num"&gt;1&lt;/span&gt; Sapling
 &lt;a class="heading-link" href="#sapling"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;Sapling (&lt;a href="#citeproc_bib_item_2"&gt;Kirsche, Das, and Schatz 2020&lt;/a&gt;) works as follows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Choose a parameter \(p\) and store, for each of the \(2^p\) &lt;strong&gt;\(p\)-bit prefixes&lt;/strong&gt;, the
corresponding position in the suffix array.&lt;/li&gt;
&lt;li&gt;When querying, first find the bucket for the query prefix. Then do a &lt;strong&gt;linear
interpolation&lt;/strong&gt; inside the bucket.&lt;/li&gt;
&lt;li&gt;Search the area \([-E, +E]\) around the interpolated position, where \(E\) is a
bound on the error of the linear approximation. In practice \(E\) is only a
\(95\%\)-confidence bound, and if the true value is not in the range, a linear
search with steps of size \(E\) is done.&lt;/li&gt;
&lt;/ol&gt;
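&lt;p&gt;A minimal Python sketch of these three steps, not Sapling&amp;rsquo;s actual code. It assumes a sorted list of fixed-width integers as a stand-in for the encoded suffixes in suffix-array order, and it widens the window until the answer is bracketed rather than modelling the confidence bound. The names &lt;code&gt;build_buckets&lt;/code&gt;, &lt;code&gt;lookup&lt;/code&gt;, and &lt;code&gt;err&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
from bisect import bisect_left


def build_buckets(keys, bits, p):
    """Step 1: starts[b] is the index of the first key whose top p bits are >= b.

    Assumption: keys is a sorted list of integers in [0, 2**bits).
    """
    shift = bits - p
    return [bisect_left(keys, b * 2**shift) for b in range(2**p + 1)]


def lookup(keys, starts, bits, p, err, q):
    """Return the index of the first key >= q, like bisect_left(keys, q).

    err plays the role of the error bound E and must be at least 1.
    """
    shift = bits - p
    b = q >> shift                      # bucket given by the p-bit prefix of q
    lo, hi = starts[b], starts[b + 1]   # the answer always lies in [lo, hi]

    # Step 2: linear interpolation inside the bucket.
    frac = (q % 2**shift) / 2**shift
    pred = lo + int(frac * (hi - lo))

    # Step 3: search the window [pred - err, pred + err], clamped to the bucket.
    left = max(lo, pred - err)
    right = min(hi, pred + err)

    # Fallback: if the window misses, walk outward in steps of size err.
    while left > lo and keys[left - 1] >= q:
        left = max(lo, left - err)
    while hi > right and q > keys[right]:
        right = min(hi, right + err)

    return bisect_left(keys, q, left, right)


keys = [3, 7, 7, 20, 33, 34, 35, 36, 90, 91, 200, 201, 202, 250]
starts = build_buckets(keys, bits=8, p=3)
print(lookup(keys, starts, bits=8, p=3, err=2, q=36))  # 7
```

&lt;p&gt;In this sketch the fallback loops keep the result exact for any &lt;code&gt;err&lt;/code&gt; of at least 1; a tighter error bound only reduces how often they run.&lt;/p&gt;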
&lt;p&gt;The paper also introduces a neural network approach to approximating buckets,
but this takes over a day to learn and is slower to query in practice.&lt;/p&gt;</description></item><item><title>Crates for suffix array construction</title><link>https://curiouscoding.nl/posts/suffix-array-crates/</link><pubDate>Thu, 13 Jun 2024 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/suffix-array-crates/</guid><description>&lt;p&gt;Popular C libraries are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/y-256/libdivsufsort" class="external-link" target="_blank" rel="noopener"&gt;divsufsort&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/IlyaGrebnov/libsais" class="external-link" target="_blank" rel="noopener"&gt;libsais&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Both have a &lt;code&gt;..64&lt;/code&gt; variant that supports input strings longer than &lt;code&gt;2GB&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Rust wrappers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://crates.io/crates/divsufsort" class="external-link" target="_blank" rel="noopener"&gt;divsufsort&lt;/a&gt;: rust reimplementation, does not support large inputs.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://crates.io/crates/cdivsufsort" class="external-link" target="_blank" rel="noopener"&gt;cdivsufsort&lt;/a&gt;: c-wrapper, does not support large inputs&lt;/li&gt;
&lt;li&gt;&lt;a href="https://crates.io/crates/libdivsufsort-rs" class="external-link" target="_blank" rel="noopener"&gt;livdivsufsort-rs&lt;/a&gt;: c-wrapper, &lt;strong&gt;does&lt;/strong&gt; support large inputs&lt;/li&gt;
&lt;li&gt;&lt;a href="https://crates.io/crates/sais" class="external-link" target="_blank" rel="noopener"&gt;sais&lt;/a&gt;: unrelated to the original library; does not implement a linear time
algorithm anyway&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Daniel-Liu-c0deb0t/libsais-rs" class="external-link" target="_blank" rel="noopener"&gt;libsais-rs&lt;/a&gt;: Daniel Liu&amp;rsquo;s fork-of-fork of &lt;a href="https://github.com/hucsmn/libsais-rs" class="external-link" target="_blank" rel="noopener"&gt;the original&lt;/a&gt;, but not on crates.io. Supports multithreading
using OpenMP and wraps both the original and 64bit version.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Daniel-Liu-c0deb0t/simple-saca" class="external-link" target="_blank" rel="noopener"&gt;simple-saca&lt;/a&gt;: Daniel Liu&amp;rsquo;s bounded-context suffix array construction that is
faster than divsufsort and libsais, but does not return a true fully sorted
suffix array.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="references"&gt;
 References
 &lt;a class="heading-link" href="#references"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;style&gt;.csl-entry{text-indent: -1.5em; margin-left: 1.5em;}&lt;/style&gt;&lt;div class="csl-bib-body"&gt;
&lt;/div&gt;</description></item><item><title>String algorithm visualizations</title><link>https://curiouscoding.nl/posts/alg-viz/</link><pubDate>Tue, 08 Nov 2022 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/alg-viz/</guid><description>&lt;ol&gt;
&lt;li&gt;Select the algorithm to visualize&lt;/li&gt;
&lt;li&gt;Click the buttons, or click the canvas and use the indicated keys&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Suffix-array construction is explained &lt;a href="https://curiouscoding.nl/posts/suffix-array-construction/" &gt;here&lt;/a&gt; and BWT is explained &lt;a href="https://curiouscoding.nl/posts/bwt/" &gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Source code is &lt;a href="https://github.com/RagnarGrootKoerkamp/alg-viz" class="external-link" target="_blank" rel="noopener"&gt;on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;script defer src="https://curiouscoding.nl/js/alg-viz.js" type="module"&gt;&lt;/script&gt;
&lt;div class="controls"&gt;
&lt;label for="algorithm"&gt;Algorithm&lt;/label&gt;
&lt;select name="algorithm" id="algorithm"&gt;
 &lt;option value="suffix-array"&gt;Suffix Array Construction&lt;/option&gt;
 &lt;option value="bwt"&gt;Burrows-Wheeler Transform&lt;/option&gt;
 &lt;option value="bibwt"&gt;Bidirectional BWT&lt;/option&gt;
&lt;/select&gt;
&lt;br/&gt;
&lt;label for="string"&gt;String&lt;/label&gt; &lt;input type="string" name="string" id="string"/&gt;&lt;br/&gt;
&lt;label for="query"&gt;Query&lt;/label&gt; &lt;input type="string" name="query" id="query"/&gt;&lt;br/&gt;
&lt;button class="button-primary" id="prev"&gt;prev (←/backspace)&lt;/button&gt;
&lt;button class="button-primary" id="next"&gt;next (→/space)&lt;/button&gt;
&lt;br/&gt;
&lt;label for="delay"&gt;Delay (s)&lt;/label&gt; &lt;input type="number" name="delay" id="delay" value="0.8"/&gt;&lt;br/&gt;
&lt;button class="button-primary" id="faster"&gt;faster (↑/+/f)&lt;/button&gt;
&lt;button class="button-primary" id="slower"&gt;slower (↓/-/s)&lt;/button&gt;
&lt;button class="button-primary" id="pauseplay"&gt;pause/play (p/return)&lt;/button&gt;
&lt;/div&gt;
&lt;div class="canvas"&gt;
&lt;canvas id="canvas" tabindex='1' width="1600" height="1200"&gt;&lt;/canvas&gt;
&lt;/div&gt;</description></item><item><title>BWT and FM-index</title><link>https://curiouscoding.nl/posts/bwt/</link><pubDate>Tue, 18 Oct 2022 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/bwt/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#burrows-wheeler-transformation--bwt" &gt;Burrows-Wheeler Transformation (BWT)&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#last-to-first-mapping--lf-mapping" &gt;Last-to-first mapping (LF mapping)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#pattern-matching" &gt;Pattern matching&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#visualization" &gt;Visualization&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#bi-directional-bwt" &gt;Bi-directional BWT&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;These are some notes about the &lt;a href="https://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform" class="external-link" target="_blank" rel="noopener"&gt;Burrows-Wheeler Transform&lt;/a&gt; (BWT), &lt;a href="https://en.wikipedia.org/wiki/FM-index" class="external-link" target="_blank" rel="noopener"&gt;FM-index&lt;/a&gt;, and variants.&lt;/p&gt;
&lt;p&gt;See my post on the &lt;a href="../suffix-array-construction/" &gt;linear time suffix array construction algorithm&lt;/a&gt; for
notation and terminology.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;At &lt;a href="#visualization" &gt;the bottom&lt;/a&gt; you can find a visualization.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://curiouscoding.nl/posts/alg-viz/" &gt;&lt;strong&gt;&lt;strong&gt;This page&lt;/strong&gt;&lt;/strong&gt;&lt;/a&gt; has an &lt;strong&gt;&lt;strong&gt;interactive demo&lt;/strong&gt;&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Source code for the visualizations is in &lt;a href="https://github.com/RagnarGrootKoerkamp/suffix-array-construction" class="external-link" target="_blank" rel="noopener"&gt;this GitHub repo&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="burrows-wheeler-transformation--bwt"&gt;
 Burrows-Wheeler Transformation (BWT)
 &lt;a class="heading-link" href="#burrows-wheeler-transformation--bwt"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;The BWT of a string \(S\) is generated as follows:&lt;/p&gt;</description></item><item><title>Linear-time suffix array construction</title><link>https://curiouscoding.nl/posts/suffix-array-construction/</link><pubDate>Thu, 13 Oct 2022 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/suffix-array-construction/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#notation" &gt;Notation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#small-and-large-suffixes" &gt;Small and Large suffixes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#building-the-suffix-array-from-a-smaller-one" &gt;Building the suffix array from a smaller one&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#visualization" &gt;Visualization&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;These are some notes about linear-time suffix array (SA) construction algorithms (SACAs).&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;At &lt;a href="#visualization" &gt;the bottom&lt;/a&gt; you can find a visualization.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://curiouscoding.nl/posts/alg-viz/" &gt;&lt;strong&gt;&lt;strong&gt;This page&lt;/strong&gt;&lt;/strong&gt;&lt;/a&gt; has an &lt;strong&gt;&lt;strong&gt;interactive demo&lt;/strong&gt;&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;History of suffix array construction algorithms:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;1990 first algorithm: Manber and Myers (&lt;a href="#citeproc_bib_item_2"&gt;1993&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;2002 small/large suffixes, explained below: Ko and Aluru (&lt;a href="#citeproc_bib_item_1"&gt;2005&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;2009 recursion only on &lt;em&gt;LMS&lt;/em&gt; suffixes: Nong, Zhang, and Chan (&lt;a href="#citeproc_bib_item_3"&gt;2009&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="http://web.stanford.edu/class/archive/cs/cs166/cs166.1196/lectures/04/Small04.pdf" class="external-link" target="_blank" rel="noopener"&gt;These slides&lt;/a&gt; from Stanford are a nice reference for the last algorithm.&lt;/p&gt;</description></item></channel></rss>