<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Survey on CuriousCoding</title><link>https://curiouscoding.nl/categories/survey/</link><description>Recent content in Survey on CuriousCoding</description><generator>Hugo</generator><language>en</language><lastBuildDate>Mon, 11 May 2026 00:00:00 +0200</lastBuildDate><atom:link href="https://curiouscoding.nl/categories/survey/index.xml" rel="self" type="application/rss+xml"/><item><title>Range Minimum Queries</title><link>https://curiouscoding.nl/teaching/rmq-notes/</link><pubDate>Mon, 11 May 2026 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/teaching/rmq-notes/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#naive" &gt;Naive&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.1&lt;/span&gt; &lt;a href="#compute-on-the-fly" &gt;Compute on-the-fly&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.2&lt;/span&gt; &lt;a href="#precompute-everything" &gt;Precompute everything&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#space-lower-bounds" &gt;Space lower bounds&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#sparse-table-o--1--queries-o--n-log-2-n--words" &gt;Sparse Table: \(O(1)\) queries, \(O(n \log_2 n)\) words&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4&lt;/span&gt; &lt;a href="#segment-tree-o--log-n--queries-o--n--words" &gt;Segment Tree: \(O(\log n)\) queries, \(O(n)\) words&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;5&lt;/span&gt; &lt;a href="#blocks-o--s--queries-o--n-s-log-n--words" &gt;Blocks: \(O(s)\) queries, \(O(n/s \log n)\) words&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6&lt;/span&gt; &lt;a href="#cartesian-trees-o--1--queries-o--n--words" &gt;Cartesian Trees: \(O(1)\) queries, \(O(n)\) words&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;7&lt;/span&gt; &lt;a href="#recursive-cartesian-trees-o--1--queries-2n-plus-o--n--bits" &gt;Recursive Cartesian Trees: \(O(1)\) queries, \(2n+o(n)\) bits&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;8&lt;/span&gt; &lt;a href="#blackboard" &gt;Blackboard&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;\[
\DeclareMathOperator*{\argmin}{arg\,min}
\newcommand{\rmq}{\mathsf{rmq}}
\]&lt;/p&gt;
&lt;p&gt;These notes are based on slides by Florian Kurpicz and the paper by
Fischer and Heun (&lt;a href="#citeproc_bib_item_1"&gt;2011&lt;/a&gt;) that introduces a number of optimal implementations.&lt;/p&gt;</description></item><item><title>Wheeler graphs</title><link>https://curiouscoding.nl/posts/wheeler-graphs/</link><pubDate>Thu, 26 Feb 2026 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/wheeler-graphs/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#deterministic-finite-automaton--dfa" &gt;Deterministic Finite Automaton (DFA)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#wheeler-dfa" &gt;Wheeler-DFA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#linear-graphs-prefix-array" &gt;Linear graphs: Prefix array&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4&lt;/span&gt; &lt;a href="#de-bruijn-graphs-are-wheeler" &gt;De Bruijn graphs are Wheeler&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;5&lt;/span&gt; &lt;a href="#not-all-dfas-are-wheeler" &gt;Not all DFAs are Wheeler&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6&lt;/span&gt; &lt;a href="#locating-patterns-via-binary-search" &gt;Locating patterns via binary search&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;7&lt;/span&gt; &lt;a href="#locating-patterns-via-the-boss-table" &gt;Locating patterns via the BOSS table&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;These are some notes on Wheeler DFAs after chatting with Nicola
Prezza&lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt; and others
from the RAVEN lab at DSB 2026 in Venice.&lt;/p&gt;
&lt;h3 id="deterministic-finite-automaton--dfa"&gt;
 &lt;span class="section-num"&gt;1&lt;/span&gt; Deterministic Finite Automaton (DFA)
 &lt;a class="heading-link" href="#deterministic-finite-automaton--dfa"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h3&gt;
&lt;p&gt;A DFA is a graph where edges are labelled by characters.
Each node can have at most one outgoing edge with each label. (Otherwise it
would be &lt;em&gt;non-deterministic&lt;/em&gt;.)&lt;/p&gt;</description></item><item><title>Recent results on hash tables</title><link>https://curiouscoding.nl/posts/hash-table-bounds/</link><pubDate>Wed, 28 Jan 2026 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/hash-table-bounds/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#types-of-hash-tables" &gt;Types of hash tables&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#metrics" &gt;Metrics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#lower-bounds" &gt;Lower bounds&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4&lt;/span&gt; &lt;a href="#iceberg-hashing--2023" &gt;Iceberg hashing (2023)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;5&lt;/span&gt; &lt;a href="#iceberg-ht" &gt;IcebergHT (2023)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6&lt;/span&gt; &lt;a href="#tight-bounds-for-classical-open-addressing--2024" &gt;Tight Bounds for Classical Open Addressing (2024)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;7&lt;/span&gt; &lt;a href="#optimal-non-oblivious-open-addressing--2025" &gt;Optimal Non-oblivious Open Addressing (2025)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;8&lt;/span&gt; &lt;a href="#optimal-bounds-for-open-addressing-without-reordering" &gt;Optimal Bounds for Open Addressing Without Reordering&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;8.1&lt;/span&gt; &lt;a href="#funnel-hashing" &gt;Funnel hashing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;8.2&lt;/span&gt; &lt;a href="#elastic-hashing" &gt;Elastic hashing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;\[
\newcommand{\inv}{^{-1}}
\newcommand{\di}{\delta\inv}
\newcommand{\poly}{\mathrm{poly}}
\]&lt;/p&gt;
&lt;p&gt;This post summarizes some recent results and ideas on various types of hash tables.
Collected together with Stefan Walzer.&lt;/p&gt;</description></item><item><title>Overview of static data structures</title><link>https://curiouscoding.nl/posts/static-data-structures/</link><pubDate>Wed, 17 Dec 2025 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/static-data-structures/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#classification-of-static-data-structures" &gt;Classification of static data structures&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#space-lower-bounds-and-practical-approaches" &gt;Space lower bounds and practical approaches&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.1&lt;/span&gt; &lt;a href="#rank" &gt;Rank&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.2&lt;/span&gt; &lt;a href="#rank-plus-select" &gt;Rank + Select&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.3&lt;/span&gt; &lt;a href="#minimal-perfect-hash-function--mphf" &gt;Minimal perfect hash function (MPHF)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.4&lt;/span&gt; &lt;a href="#monotone-mphf" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Monotone MPHF&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.5&lt;/span&gt; &lt;a href="#order-preserving-mphf" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Order-preserving MPHF&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.6&lt;/span&gt; &lt;a href="#static-retrieval-static-function-with-static-values" &gt;Static retrieval: Static function with static values&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.7&lt;/span&gt; &lt;a href="#updatable-retrieval-static-function-with-mutable-values" &gt;Updatable retrieval: Static function with mutable values&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.8&lt;/span&gt; &lt;a href="#static-set--membership" &gt;Static set (membership)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.9&lt;/span&gt; &lt;a href="#static-ordered-set" &gt;Static ordered set&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.10&lt;/span&gt; &lt;a href="#static-dictionary-static-keys-and-values" &gt;Static dictionary: static keys and values&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.11&lt;/span&gt; &lt;a href="#updatable-dictionary-with-mutable-values" &gt;Updatable dictionary with mutable values&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.12&lt;/span&gt; &lt;a href="#dynamic-dictionary-with-mutable-keys-and-values" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Dynamic dictionary with mutable keys and values&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.13&lt;/span&gt; &lt;a href="#static-filter" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Static filter&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.14&lt;/span&gt; &lt;a href="#ordered-static-updatable-dynamic-dictionary" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Ordered static/updatable/dynamic dictionary?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#summary" &gt;Summary table&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;\[
\newcommand{\K}{\mathbb K}
\newcommand{\V}{\mathbb V}
\newcommand{\c}[1]{\mathbf{\mathsf{#1}}}
\]&lt;/p&gt;</description></item><item><title>A History of Pairwise Alignment</title><link>https://curiouscoding.nl/posts/pairwise-alignment/</link><pubDate>Wed, 09 Apr 2025 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/pairwise-alignment/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#a-brief-history" &gt;A Brief History&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.1&lt;/span&gt; &lt;a href="#a-pa" &gt;A*PA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.2&lt;/span&gt; &lt;a href="#a-pa2" &gt;A*PA2&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.3&lt;/span&gt; &lt;a href="#overview" &gt;Overview&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#problem-statement" &gt;Problem Statement&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#alignment-types" &gt;Alignment types&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4&lt;/span&gt; &lt;a href="#cost-models" &gt;Cost Models&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.1&lt;/span&gt; &lt;a href="#minimizing-cost-versus-maximizing-score" &gt;Minimizing Cost versus Maximizing Score&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;5&lt;/span&gt; &lt;a href="#dp" &gt;The Classic DP Algorithms&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6&lt;/span&gt; &lt;a href="#linear-memory-using-divide-and-conquer" &gt;Linear Memory using Divide and Conquer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;7&lt;/span&gt; &lt;a href="#graphs" &gt;Dijkstra&amp;rsquo;s Algorithm and A*&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;8&lt;/span&gt; &lt;a href="#computational-volumes" &gt;Computational Volumes and Band Doubling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;9&lt;/span&gt; &lt;a href="#diagonal-transition" &gt;Diagonal Transition&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;10&lt;/span&gt; &lt;a href="#parallelism" &gt;Parallelism&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;11&lt;/span&gt; &lt;a href="#lcs-and-contours" &gt;LCS and Contours&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;12&lt;/span&gt; &lt;a href="#some-tools" &gt;Some Tools&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;13&lt;/span&gt; &lt;a href="#subquadratic-methods-and-lower-bounds" &gt;Subquadratic Methods and Lower Bounds&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;14&lt;/span&gt; &lt;a href="#summary" &gt;Summary&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;This is Chapter 2 of my &lt;a href="https://curiouscoding.nl/posts/thesis/" &gt;thesis&lt;/a&gt; (&lt;a href="#citeproc_bib_item_27"&gt;Groot Koerkamp 2025&lt;/a&gt;), which introduces the first part, on Pairwise Alignment.
Please cite the thesis instead of this post.&lt;/p&gt;</description></item><item><title>Low Density Minimizers</title><link>https://curiouscoding.nl/posts/minimizers/</link><pubDate>Tue, 08 Apr 2025 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/minimizers/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#theory-of-sampling-schemes" &gt;Theory of Sampling Schemes&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.1&lt;/span&gt; &lt;a href="#introduction" &gt;Introduction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.2&lt;/span&gt; &lt;a href="#overview" &gt;Overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.3&lt;/span&gt; &lt;a href="#theory-of-sampling-schemes" &gt;Theory of sampling schemes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.4&lt;/span&gt; &lt;a href="#notation" &gt;Notation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.5&lt;/span&gt; &lt;a href="#types-of-sampling-schemes" &gt;Types of sampling schemes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.6&lt;/span&gt; &lt;a href="#computing-the-density" &gt;Computing the density&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.7&lt;/span&gt; &lt;a href="#random-mini-density" &gt;The density of random minimizers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.8&lt;/span&gt; &lt;a href="#universal-hitting-sets" &gt;Universal hitting sets&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.9&lt;/span&gt; &lt;a href="#asymptotic-results" &gt;Asymptotic results&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.10&lt;/span&gt; &lt;a href="#variants" &gt;Variants&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#lower-bounds" &gt;Lower Bounds on Sampling Scheme Density&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.1&lt;/span&gt; &lt;a href="#schleimer-et-al-dot-s-bound" &gt;Schleimer et al.&amp;rsquo;s bound&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.2&lt;/span&gt; &lt;a href="#mar%c3%a7ais-et-al-dot-s-bound" &gt;Marçais et al.&amp;rsquo;s bound&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.3&lt;/span&gt; &lt;a href="#improving-and-extending-mar%c3%a7ais-et-al-dot-s-bound" &gt;Improving and extending Marçais et al.&amp;rsquo;s bound&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.4&lt;/span&gt; &lt;a href="#near-tight-lb" &gt;A near-tight lower bound on the density of forward sampling schemes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.5&lt;/span&gt; &lt;a href="#lower-bound-eval" &gt;Discussion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#sampling-schemes" &gt;Practical Sampling Schemes&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.1&lt;/span&gt; &lt;a href="#lexmin" &gt;Variants of lexicographic minimizers&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#lex-eval" &gt;Evaluation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.2&lt;/span&gt; &lt;a href="#uhs-inspired-schemes" &gt;UHS-inspired schemes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.3&lt;/span&gt; &lt;a href="#syncmer-based-schemes" &gt;Syncmer-based schemes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.4&lt;/span&gt; &lt;a href="#open-closed-minimizer" &gt;Open-closed minimizer&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#oc-eval" &gt;Evaluation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.5&lt;/span&gt; &lt;a href="#modmini" &gt;Mod-minimizer&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#theoretical-density" &gt;Theoretical density&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#modmini-eval" &gt;Evaluation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.6&lt;/span&gt; &lt;a href="#sampling-schemes-discussion" &gt;Discussion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4&lt;/span&gt; &lt;a href="#selection-schemes" &gt;Towards Optimal Selection Schemes&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.1&lt;/span&gt; &lt;a href="#bd-anchors" &gt;Bidirectional anchors&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.2&lt;/span&gt; &lt;a href="#sus-anchors" &gt;Sus-anchors&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#sus-anchor-eval" &gt;Evaluation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.3&lt;/span&gt; &lt;a href="#selection-schemes-discussion" &gt;Discussion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;This is Part 2 of my &lt;a href="https://curiouscoding.nl/posts/thesis/" &gt;thesis&lt;/a&gt; (&lt;a href="#citeproc_bib_item_15"&gt;Groot Koerkamp 2025&lt;/a&gt;), containing chapters 6 to 9 on Low Density Minimizers.
Please cite the thesis instead of this post.&lt;/p&gt;</description></item><item><title>SimdSketch: a fast bucket sketch</title><link>https://curiouscoding.nl/posts/simd-sketch/</link><pubDate>Sun, 09 Mar 2025 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/simd-sketch/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#jaccard-similarity" &gt;Jaccard similarity&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#hash-schemes" &gt;Hash schemes&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.1&lt;/span&gt; &lt;a href="#minhash" &gt;MinHash&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.2&lt;/span&gt; &lt;a href="#s-mins-sketch" &gt;\(s\)-mins sketch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.3&lt;/span&gt; &lt;a href="#bottom-s" &gt;Bottom-\(s\) sketch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.4&lt;/span&gt; &lt;a href="#fracminhash" &gt;FracMinHash&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.5&lt;/span&gt; &lt;a href="#bucket-sketch" &gt;Bucket sketch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.6&lt;/span&gt; &lt;a href="#mod-bucket-hash--new" &gt;Mod-bucket hash (new?)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.7&lt;/span&gt; &lt;a href="#variants" &gt;Variants&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#compressing-sketches" &gt;Compressing sketches&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.1&lt;/span&gt; &lt;a href="#b-bit-hashing" &gt;\(b\)-bit hashing&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.1.1&lt;/span&gt; &lt;a href="#accounting-for-collisions" &gt;Accounting for collisions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.2&lt;/span&gt; &lt;a href="#hyperminhash" &gt;HyperMinHash&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4&lt;/span&gt; &lt;a href="#densification-strategies" &gt;Densification strategies&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;5&lt;/span&gt; &lt;a href="#simdsketch" &gt;SimdSketch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6&lt;/span&gt; &lt;a href="#evaluation" &gt;Evaluation&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6.1&lt;/span&gt; &lt;a href="#setup" &gt;Setup&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6.1.1&lt;/span&gt; &lt;a href="#tools" &gt;Tools&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6.1.2&lt;/span&gt; &lt;a href="#inputs" &gt;Inputs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6.1.3&lt;/span&gt; &lt;a href="#parameters" &gt;Parameters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6.1.4&lt;/span&gt; &lt;a href="#metrics" &gt;Metrics&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6.2&lt;/span&gt; &lt;a href="#raw-results" &gt;Raw results&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6.3&lt;/span&gt; &lt;a href="#correlation" &gt;Correlation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6.4&lt;/span&gt; &lt;a href="#comparison-speed" &gt;Comparison speed&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6.5&lt;/span&gt; &lt;a href="#low-similarity-data" &gt;Low-similarity data&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;7&lt;/span&gt; &lt;a href="#discussion" &gt;Discussion&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;8&lt;/span&gt; &lt;a href="#future-work" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; / Future work&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;\[
\newcommand{\sketch}{\mathsf{sketch}}
\]&lt;/p&gt;</description></item><item><title>Types of tigs</title><link>https://curiouscoding.nl/posts/tigs/</link><pubDate>Sun, 09 Mar 2025 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/tigs/</guid><description>&lt;h3 id="de-bruijn-graph"&gt;
 &lt;span class="section-num"&gt;1&lt;/span&gt; De Bruijn graph
 &lt;a class="heading-link" href="#de-bruijn-graph"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h3&gt;
&lt;p&gt;Consider an edge-centric De Bruijn graph, where each edge corresponds to a
k-mer, and nodes are the \((k-1)\)-mer overlaps between adjacent k-mers. In the figures,
all edges are directed towards the right.&lt;/p&gt;
&lt;figure class="inset medium"&gt;&lt;img src="https://curiouscoding.nl/ox-hugo/graph.svg"&gt;
&lt;/figure&gt;
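&lt;p&gt;A minimal Python sketch of this construction (the post itself gives no code, and the function names below are hypothetical): every k-mer becomes an edge from its \((k-1)\)-prefix to its \((k-1)\)-suffix, so two k-mers are adjacent exactly when the suffix of one equals the prefix of the other.&lt;/p&gt;

```python
from collections import defaultdict


def kmers_of(seq, k):
    """All k-mers of a string, in order of occurrence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def edge_centric_dbg(kmers):
    """Edge-centric de Bruijn graph as an adjacency list.

    Each distinct k-mer is one edge from its (k-1)-prefix node to its
    (k-1)-suffix node; nodes are the (k-1)-mer overlaps.
    """
    graph = defaultdict(list)
    for kmer in sorted(set(kmers)):
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


print(edge_centric_dbg(kmers_of("ACGTA", 3)))
```

&lt;p&gt;For &lt;code&gt;ACGTA&lt;/code&gt; and \(k=3\), the edges ACG, CGT and GTA form a single path AC, CG, GT, TA.&lt;/p&gt;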

&lt;h3 id="k-mers"&gt;
 &lt;span class="section-num"&gt;2&lt;/span&gt; k-mers
 &lt;a class="heading-link" href="#k-mers"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h3&gt;
&lt;p&gt;The goal is now to store all edges / k-mers of the graph efficiently.
A &lt;em&gt;spectrum preserving string set&lt;/em&gt; (SPSS) is a set of strings whose k-mers are exactly
the k-mers of the input graph, with each k-mer occurring exactly once (&lt;a href="#citeproc_bib_item_2"&gt;Rahman and Medvedev 2020&lt;/a&gt;).&lt;/p&gt;</description></item><item><title>Minimizer papers</title><link>https://curiouscoding.nl/posts/minimizer-papers/</link><pubDate>Mon, 17 Feb 2025 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/minimizer-papers/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#overview" &gt;Overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#introduction" &gt;Introduction&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#previous-reviews" &gt;Previous reviews&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#theory-of-sampling-schemes" &gt;Theory of sampling schemes&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.1&lt;/span&gt; &lt;a href="#questions" &gt;Questions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.2&lt;/span&gt; &lt;a href="#types-of-schemes" &gt;Types of schemes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.3&lt;/span&gt; &lt;a href="#parameter-regimes" &gt;Parameter regimes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.4&lt;/span&gt; &lt;a href="#different-perspectives" &gt;Different perspectives&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.5&lt;/span&gt; &lt;a href="#uhs-vs-minimizer-scheme" &gt;UHS vs minimizer scheme&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.6&lt;/span&gt; &lt;a href="#asymptotic--bounds" &gt;(Asymptotic) bounds&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.7&lt;/span&gt; &lt;a href="#lower-bounds" &gt;Lower bounds&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4&lt;/span&gt; &lt;a href="#minimizer-schemes" &gt;Minimizer schemes&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.1&lt;/span&gt; &lt;a href="#orders" &gt;Orders&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.2&lt;/span&gt; &lt;a href="#uhs-based-and-search-based-schemes" &gt;UHS-based and search-based schemes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.3&lt;/span&gt; &lt;a href="#pure-schemes" &gt;Pure schemes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.4&lt;/span&gt; &lt;a href="#other-variants" &gt;Other variants&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#selection-schemes" &gt;Selection schemes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#canonical-minimizers" &gt;Canonical minimizers&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.5&lt;/span&gt; &lt;a href="#non-overlapping-string-sets" &gt;Non-overlapping string sets&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;This post is simply a list of brief comments on many papers related to
minimizers, and forms the basis of &lt;a href="https://curiouscoding.nl/posts/minimizers/" &gt;/posts/minimizers/&lt;/a&gt;.&lt;/p&gt;</description></item><item><title>FM-index implementations</title><link>https://curiouscoding.nl/posts/fm-index-implementations/</link><pubDate>Wed, 02 Oct 2024 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/fm-index-implementations/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#a-note-on-sdsl-versions" &gt;A note on SDSL versions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;Here I&amp;rsquo;ll briefly list some FM-index and related implementations around the web.
Implementations vary quite a bit, mostly because the FM-index is
more of a &amp;lsquo;wrapper&amp;rsquo; type around a given Burrows-Wheeler transform and an
&lt;em&gt;occurrences&lt;/em&gt; list implementation, both of which can be implemented in various ways. In particular,
occurrences should be stored using a wavelet tree for optimal compression.&lt;/p&gt;
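&lt;p&gt;To make the &amp;lsquo;wrapper&amp;rsquo; view concrete, here is a minimal, deliberately unoptimised Python sketch (not taken from any of the implementations below; all names are hypothetical). It assumes a sentinel character that sorts before every other character, builds the BWT by sorting rotations, stores the occurrences as a full table, and counts a pattern by backward search using only the BWT-derived \(C\) array and that table.&lt;/p&gt;

```python
SENTINEL = "\x00"  # assumed end marker; sorts before every printable character


def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (quadratic, illustration only)."""
    text += SENTINEL
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def build_fm_index(text):
    """Return (C, occ, n) for the BWT of text.

    C[c]      = number of BWT characters smaller than c.
    occ[c][i] = number of occurrences of c in bwt[:i] (full, unsampled table).
    """
    b = bwt(text)
    alphabet = sorted(set(b))
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += b.count(c)
    occ = {c: [0] * (len(b) + 1) for c in alphabet}
    for i, ch in enumerate(b):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + int(ch == c)
    return C, occ, len(b)


def count(pattern, C, occ, n):
    """Backward search: number of occurrences of pattern in the text."""
    lo, hi = 0, n  # current interval [lo, hi) of sorted rotations
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


C, occ, n = build_fm_index("banana")
print(count("ana", C, occ, n))
```

&lt;p&gt;The full table costs one counter per character per BWT position, which is exactly the part that the implementations below sample or replace by a wavelet tree.&lt;/p&gt;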
&lt;ul&gt;
&lt;li&gt;The &lt;a href="https://github.com/wafflespeanut/nucleic-acid/blob/2adbf5181081245423f974a88b5ccf53d7bf26ac/src/bwt.rs#L96" class="external-link" target="_blank" rel="noopener"&gt;nucleic-acid repo&lt;/a&gt; contains a completely unoptimised version.&lt;/li&gt;
&lt;li&gt;The Rust-bio crate contains a &lt;a href="https://github.com/rust-bio/rust-bio/blob/master/src/data_structures/fmindex.rs#L209" class="external-link" target="_blank" rel="noopener"&gt;generic FM-index&lt;/a&gt;. It stores a &lt;a href="https://github.com/rust-bio/rust-bio/blob/master/src/data_structures/bwt.rs#L75-L94" class="external-link" target="_blank" rel="noopener"&gt;sampled
occurrences array&lt;/a&gt;, so that space is relatively small but lookups take \(O(k)\)
time for sampling factor \(k\).&lt;/li&gt;
&lt;li&gt;SDSL-lite contains a &lt;a href="https://github.com/simongog/sdsl-lite/blob/c32874cb2d8524119f25f3b501526fe692df29f4/include/sdsl/wavelet_" class="external-link" target="_blank" rel="noopener"&gt;wavelet tree&lt;/a&gt; and &lt;a href="https://github.com/simongog/sdsl-lite/blob/master/include/sdsl/csa_wt.hpp#L48" class="external-link" target="_blank" rel="noopener"&gt;compressed suffix array&lt;/a&gt; implementation based
on it, that provides the same functionality as an FM-index.&lt;/li&gt;
&lt;li&gt;There is the &lt;a href="https://github.com/rossanoventurini/qwt" class="external-link" target="_blank" rel="noopener"&gt;Quad Wavelet Tree&lt;/a&gt; (QWT) Rust crate (Ceregini, Kurpicz, and Venturini 2024). This uses a 4-ary
tree instead of the usual binary wavelet tree, and improves latency by around
a factor of 2 over SDSL wavelet trees.&lt;/li&gt;
&lt;li&gt;Dominik Kempa has the &lt;a href="https://github.com/dominikkempa/faster-minuter?tab=readme-ov-file" class="external-link" target="_blank" rel="noopener"&gt;Faster-Minuter index&lt;/a&gt; (Gog et al. 2019) that contains
an improved wavelet tree as well.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/achacond/gem-cutter" class="external-link" target="_blank" rel="noopener"&gt;GEM-Cutter&lt;/a&gt; contains a GPU implementation of the FM-index (Chacon et al. 2015).&lt;/li&gt;
&lt;li&gt;There is also &lt;a href="https://github.com/lh3/ropebwt3" class="external-link" target="_blank" rel="noopener"&gt;RopeBWT3&lt;/a&gt; (Li 2024), which is basically a run-length
compressed BWT with a B+ tree on top for fast queries.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/UM-Applied-Algorithms-Lab/AWRY" class="external-link" target="_blank" rel="noopener"&gt;AWRY&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
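&lt;p&gt;The space/time trade-off of the sampled occurrences array mentioned above for Rust-bio can be sketched as follows. This is a minimal Python illustration under our own assumptions, not the actual Rust-bio code, and the class name is hypothetical: counts are stored only at every \(k\)-th BWT position, and a query scans at most \(k-1\) characters forward from the nearest sample.&lt;/p&gt;

```python
class SampledOcc:
    """Occurrence counts over a BWT string, sampled every k positions.

    samples[j][c] holds the number of occurrences of c in bwt[: j * k].
    A query starts from the nearest sample at or before position i and scans
    at most k - 1 further characters, so it takes O(k) time.
    """

    def __init__(self, bwt, k):
        self.bwt = bwt
        self.k = k
        counts = {c: 0 for c in set(bwt)}
        self.samples = []
        for i, ch in enumerate(bwt):
            if i % k == 0:
                self.samples.append(dict(counts))  # snapshot before bwt[i]
            counts[ch] += 1
        if len(bwt) % k == 0:
            self.samples.append(dict(counts))  # sample at position len(bwt)

    def occ(self, c, i):
        """Occurrences of c in bwt[:i], for any prefix length i from 0 to len(bwt)."""
        j = i // self.k
        count = self.samples[j].get(c, 0)
        for ch in self.bwt[j * self.k : i]:
            if ch == c:
                count += 1
        return count


print(SampledOcc("annbxaa", 2).occ("a", 7))
```

&lt;p&gt;With \(k = 1\) every position is sampled, which is just the full occurrences table; larger \(k\) gives a roughly \(k\)-fold smaller table at the cost of longer scans.&lt;/p&gt;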
&lt;h2 id="a-note-on-sdsl-versions"&gt;
 A note on SDSL versions
 &lt;a class="heading-link" href="#a-note-on-sdsl-versions"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/simongog/sdsl" class="external-link" target="_blank" rel="noopener"&gt;github:simongog/sdsl&lt;/a&gt; is the original, with last commit in 2013.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/simongog/sdsl-lite" class="external-link" target="_blank" rel="noopener"&gt;github:simongog/sdsl-lite&lt;/a&gt; is v2, with last commit in 2019, and seems the most
used currently.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/xxsds/sdsl-lite" class="external-link" target="_blank" rel="noopener"&gt;github:xxsds/sdsl-lite&lt;/a&gt; is v3 and seems to be actively maintained at the time
of writing (Jan 2025), and is &lt;a href="https://www.reddit.com/r/rust/comments/nlxhym/comment/gzpqejn/?utm_source=share&amp;amp;utm_medium=web3x&amp;amp;utm_name=web3xcss&amp;amp;utm_term=1&amp;amp;utm_content=share_button" class="external-link" target="_blank" rel="noopener"&gt;recommended&lt;/a&gt; by the original developers. From a
quick glance, I think it&amp;rsquo;s somewhat restructured and truly a v3, not just a v2.1.
However, it seems to be much less popular.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/vgteam/sdsl-lite" class="external-link" target="_blank" rel="noopener"&gt;github:vgteam/sdsl-lite&lt;/a&gt; is a fork of the original &lt;code&gt;sdsl-lite&lt;/code&gt;, with, I think,
a number of small bug fixes and some updates for recent compiler versions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Then there are also some Rust versions:&lt;/p&gt;</description></item><item><title>PACE 24</title><link>https://curiouscoding.nl/posts/pace24/</link><pubDate>Thu, 05 Sep 2024 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/pace24/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#general-observations" &gt;General observations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#heuristic-track" &gt;Heuristic track&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#parameterized-track" &gt;Parameterized track&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4&lt;/span&gt; &lt;a href="#exact-track" &gt;Exact track&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;In this post I will collect some high-level ideas and approaches used to solve
the PACE 2024 challenge.
Very briefly, the goal is to write fast solvers for NP-hard problems. The
problem for the &lt;a href="https://pacechallenge.org/2024/" class="external-link" target="_blank" rel="noopener"&gt;2024 edition is one-sided crossing minimization&lt;/a&gt;: given a
bipartite graph \((A, B)\) drawn in the standard way, with the nodes of \(A\) and
the nodes of \(B\) on two parallel lines and the order of the nodes of \(A\) fixed, the goal is
to find a permutation of \(B\) that minimizes the number of edge crossings when
all edges are drawn as straight lines.&lt;/p&gt;</description></item><item><title>Tools for suffix array searching</title><link>https://curiouscoding.nl/posts/suffix-array-searching/</link><pubDate>Fri, 14 Jun 2024 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/suffix-array-searching/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#sapling" &gt;Sapling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#pla-index" &gt;PLA-Index&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#lisa-learned-index" &gt;LISA: learned index&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;Let&amp;rsquo;s summarize some tools for efficiently searching suffix arrays.&lt;/p&gt;
&lt;h2 id="sapling"&gt;
 &lt;span class="section-num"&gt;1&lt;/span&gt; Sapling
 &lt;a class="heading-link" href="#sapling"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;Sapling (&lt;a href="#citeproc_bib_item_2"&gt;Kirsche, Das, and Schatz 2020&lt;/a&gt;) works as follows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Choose a parameter \(p\) and store, for each of the \(2^p\) &lt;strong&gt;\(p\)-bit prefixes&lt;/strong&gt;, the
corresponding position in the suffix array.&lt;/li&gt;
&lt;li&gt;When querying, first find the bucket for the query prefix. Then do a &lt;strong&gt;linear
interpolation&lt;/strong&gt; inside the bucket.&lt;/li&gt;
&lt;li&gt;Search the range \([-E, +E]\) around the interpolated position, where \(E\) is a
bound on the error of the linear approximation. In practice, \(E\) is only a
\(95\%\)-confidence bound, and if the true position is not in this range, a linear
search with steps of size \(E\) is done.&lt;/li&gt;
&lt;/ol&gt;
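&lt;p&gt;The three steps above can be sketched in Python as follows. This is a minimal illustration under several assumptions, not Sapling&amp;rsquo;s code: the suffix array is represented by a sorted list of integer keys (hypothetically, the first few characters of each suffix packed into &lt;code&gt;bits&lt;/code&gt; bits), &lt;code&gt;E&lt;/code&gt; is a fixed input rather than a \(95\%\)-confidence bound measured on the data, the \([-E, +E]\) window is binary searched, and the function names &lt;code&gt;build_table&lt;/code&gt; and &lt;code&gt;query&lt;/code&gt; are made up.&lt;/p&gt;

```python
import bisect


def build_table(keys, bits, p):
    """table[b] = index of the first key whose top p bits are at least b.

    `keys` is the sorted list of suffix keys (hypothetical encoding: the first
    characters of each suffix packed into a `bits`-bit integer).
    The table has 2**p + 1 entries; the last one equals len(keys).
    """
    shift = bits - p
    table, j = [], 0
    for b in range(2 ** p + 1):
        while j != len(keys) and b > keys[j] >> shift:
            j += 1
        table.append(j)
    return table


def query(keys, table, bits, p, E, key):
    """Return the first index i with keys[i] >= key (a lower bound)."""
    n = len(keys)
    shift = bits - p
    width = 2 ** shift              # number of key values per bucket
    b = key >> shift                # 1. bucket given by the p-bit prefix
    lo, hi = table[b], table[b + 1]
    # 2. Linear interpolation of the position inside the bucket.
    guess = lo + (hi - lo) * (key % width) // width
    # 3. Search the window [guess - E, guess + E].
    left, right = max(0, guess - E), min(n, guess + E)
    # Fallback: if the answer lies outside the window, move it in steps of E.
    while left != 0 and keys[left - 1] >= key:
        left, right = max(0, left - E), left - 1
    while right != n and key > keys[right]:
        left, right = right + 1, min(n, right + 1 + E)
    return bisect.bisect_left(keys, key, left, right)


keys = [3, 5, 9, 12, 20, 33, 40, 41, 57, 63]        # sorted 6-bit keys
table = build_table(keys, bits=6, p=2)               # [0, 4, 5, 8, 10]
print(query(keys, table, bits=6, p=2, E=1, key=40))  # → 6
```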
&lt;p&gt;The paper also introduces a neural network approach to approximating buckets,
but this takes over a day to learn and is slower to query in practice.&lt;/p&gt;</description></item><item><title>Crates for suffix array construction</title><link>https://curiouscoding.nl/posts/suffix-array-crates/</link><pubDate>Thu, 13 Jun 2024 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/suffix-array-crates/</guid><description>&lt;p&gt;Popular C libraries are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/y-256/libdivsufsort" class="external-link" target="_blank" rel="noopener"&gt;divsufsort&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/IlyaGrebnov/libsais" class="external-link" target="_blank" rel="noopener"&gt;libsais&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Both have a &lt;code&gt;..64&lt;/code&gt; variant that supports input strings longer than &lt;code&gt;2GB&lt;/code&gt;.&lt;/p&gt;
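&lt;p&gt;A quick sanity check of where the 2GB limit comes from, assuming (as in their C headers) that the default versions store suffix array entries as signed 32-bit &lt;code&gt;int32_t&lt;/code&gt; values and the &lt;code&gt;..64&lt;/code&gt; versions use 64-bit integers. This bounds the input length \(n\) and doubles the size of the output array:&lt;/p&gt;
&lt;p&gt;\[
n \le 2^{31} - 1 = 2\,147\,483\,647 \text{ bytes} \approx 2\text{ GiB},
\qquad
|SA| = 4n \text{ bytes (32-bit)} \quad \text{vs.} \quad 8n \text{ bytes (64-bit)}.
\]&lt;/p&gt;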
&lt;p&gt;Rust wrappers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://crates.io/crates/divsufsort" class="external-link" target="_blank" rel="noopener"&gt;divsufsort&lt;/a&gt;: rust reimplementation, does not support large inputs.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://crates.io/crates/cdivsufsort" class="external-link" target="_blank" rel="noopener"&gt;cdivsufsort&lt;/a&gt;: c-wrapper, does not support large inputs&lt;/li&gt;
&lt;li&gt;&lt;a href="https://crates.io/crates/libdivsufsort-rs" class="external-link" target="_blank" rel="noopener"&gt;livdivsufsort-rs&lt;/a&gt;: c-wrapper, &lt;strong&gt;does&lt;/strong&gt; support large inputs&lt;/li&gt;
&lt;li&gt;&lt;a href="https://crates.io/crates/sais" class="external-link" target="_blank" rel="noopener"&gt;sais&lt;/a&gt;: unrelated to the original library; does not implement a linear time
algorithm anyway&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Daniel-Liu-c0deb0t/libsais-rs" class="external-link" target="_blank" rel="noopener"&gt;libsais-rs&lt;/a&gt;: Daniel Liu&amp;rsquo;s fork-of-fork of &lt;a href="https://github.com/hucsmn/libsais-rs" class="external-link" target="_blank" rel="noopener"&gt;the original&lt;/a&gt;, but not on crates.io. Supports multithreading
using OpenMP and wraps both the original and 64bit version.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Daniel-Liu-c0deb0t/simple-saca" class="external-link" target="_blank" rel="noopener"&gt;simple-saca&lt;/a&gt;: Daniel Liu&amp;rsquo;s bounded-context suffix array construction that is
faster than divsufsort and libsais, but does not return a true fully sorted
suffix array.&lt;/li&gt;
&lt;/ul&gt;
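&lt;p&gt;To make &amp;ldquo;not a true fully sorted suffix array&amp;rdquo; concrete, here is a naive Python sketch contrasting full sorting with bounded-context sorting. It assumes that bounded context means suffixes are only compared on their first &lt;code&gt;k&lt;/code&gt; characters, so suffixes sharing that prefix may end up in any relative order (here, text order, because Python&amp;rsquo;s sort is stable). The function names are hypothetical, this is not how divsufsort, libsais, or simple-saca construct suffix arrays, and it is far too slow for real inputs.&lt;/p&gt;

```python
def suffix_array(s):
    """Fully sorted suffix array: suffix start positions in lexicographic order."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def bounded_context_sa(s, k):
    """Sort suffixes by their first k characters only.

    Suffixes that share the same k-character prefix keep their text order
    (Python's sort is stable), so the result need not be fully sorted.
    """
    return sorted(range(len(s)), key=lambda i: s[i:i + k])


print(suffix_array("banana"))           # [5, 3, 1, 0, 4, 2]
print(bounded_context_sa("banana", 2))  # [5, 1, 3, 0, 2, 4]
```

&lt;p&gt;In the bounded output, &lt;code&gt;anana&lt;/code&gt; (position 1) comes before &lt;code&gt;ana&lt;/code&gt; (position 3) because both start with &lt;code&gt;an&lt;/code&gt;, which is exactly the kind of tie a bounded-context sort leaves unresolved.&lt;/p&gt;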
&lt;h2 id="references"&gt;
 References
 &lt;a class="heading-link" href="#references"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;style&gt;.csl-entry{text-indent: -1.5em; margin-left: 1.5em;}&lt;/style&gt;&lt;div class="csl-bib-body"&gt;
&lt;/div&gt;</description></item><item><title>Mod-minimizers and other minimizers</title><link>https://curiouscoding.nl/posts/mod-minimizers/</link><pubDate>Thu, 18 Jan 2024 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/mod-minimizers/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#applications" &gt;Applications&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#background" &gt;Background&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#minimizers" &gt;Minimizers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#density-bounds" &gt;Density bounds&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#robust-minimizers" &gt;Robust minimizers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#pasha" &gt;PASHA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#miniception" &gt;Miniception&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#closed-syncmers" &gt;Closed syncmers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#bd-anchors" &gt;Bd-anchors&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#new-mod-minimizers" &gt;New: Mod-minimizers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#experiments" &gt;Experiments&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#conclusion" &gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#small-k-experiments" &gt;Small k experiments&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#search-methods" &gt;Search methods&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#directed-minimizer" &gt;Directed minimizer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#k-1-w-2" &gt;\(k=1\), \(w=2\)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#k-1-w-4" &gt;\(k=1\), \(w=4\)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#k-1-w-5" &gt;\(k=1\), \(w=5\)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#k-2-w-2" &gt;\(k=2\), \(w=2\)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#k-2-w-4" &gt;\(k=2\), \(w=4\)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#notes" &gt;Notes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#reading-list" &gt;Reading list&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;\[
\newcommand{\d}{\mathrm{d}}
\newcommand{\L}{\mathcal{L}}
\]&lt;/p&gt;
&lt;p&gt;This post introduces some background on minimizers and presents some
experiments with a new minimizer variant. That new variant is now called the
&lt;em&gt;mod-minimizer&lt;/em&gt; and was published at WABI24 (&lt;a href="https://doi.org/10.4230/LIPIcs.WABI.2024.11" class="external-link" target="_blank" rel="noopener"&gt;&lt;strong&gt;DOI&lt;/strong&gt;&lt;/a&gt;, &lt;a href="https://curiouscoding.nl/papers/modmini.pdf" &gt;&lt;strong&gt;PDF&lt;/strong&gt;&lt;/a&gt;) (&lt;a href="#citeproc_bib_item_5"&gt;Groot Koerkamp and Pibiri 2024&lt;/a&gt;). The paper
also includes a review of existing methods, including pseudocode for
most of the methods covered below.&lt;/p&gt;</description></item><item><title>Shortest paths, bucket queues, and A* on the edit graph</title><link>https://curiouscoding.nl/posts/shortest_path_history/</link><pubDate>Sat, 29 Jul 2023 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/shortest_path_history/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#shortest-path-algorithms-dot-dot" &gt;Shortest path algorithms ..&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#dot-dot-in-general" &gt;.. in general&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#dot-dot-for-circuit-design" &gt;.. for circuit design&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#bucket-queues" &gt;Bucket queues&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#shortest-path-algorithms-by-hadlock" &gt;Shortest path algorithms by Hadlock&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#grid-graphs" &gt;Grid graphs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#strings" &gt;Strings&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#spouge-s-computational-volumes" &gt;Spouge&amp;rsquo;s computational volumes&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;This note summarizes some papers I was reading while investigating the history
of A* for pairwise alignment and, related to that, the first use of a &lt;em&gt;bucket
queue&lt;/em&gt;. Schrijver (&lt;a href="#citeproc_bib_item_16"&gt;2012&lt;/a&gt;) provides a nice overview of general shortest path methods.&lt;/p&gt;</description></item><item><title>The complexity and performance of WFA and band doubling</title><link>https://curiouscoding.nl/posts/wfa-edlib-perf/</link><pubDate>Thu, 17 Nov 2022 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/wfa-edlib-perf/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#complexity-analysis" &gt;Complexity analysis&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#complexity-of-edit-distance" &gt;Complexity of edit distance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#complexity-of-affine-cost-alignment" &gt;Complexity of affine cost alignment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#comparison" &gt;Comparison&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#implementation-efficiency" &gt;Implementation efficiency&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#band-doubling-for-affine-scores-was-never-implemented" &gt;Band doubling for affine scores was never implemented&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#wfa-vs-band-doubling-for-affine-costs" &gt;WFA vs band doubling for affine costs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#conclusion" &gt;Conclusion&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#future-work" &gt;Future work&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;This note explores the complexity and performance of band doubling (Edlib) and WFA under varying cost models.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/Martinsos/edlib" class="external-link" target="_blank" rel="noopener"&gt;Edlib&lt;/a&gt; (&lt;a href="#citeproc_bib_item_5"&gt;Šošić and Šikić 2017&lt;/a&gt;) uses band doubling and runs in \(O(ns)\) time, for sequence length \(n\)
and edit distance \(s\) between the two sequences.&lt;/p&gt;</description></item><item><title>Bidirectional A*</title><link>https://curiouscoding.nl/posts/bidirectional-astar/</link><pubDate>Thu, 28 Jul 2022 17:59:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/bidirectional-astar/</guid><description>&lt;p&gt;These are some links and papers on bidirectional A* variants. Nothing
insightful at the moment.&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;a href="https://www.coursera.org/lecture/algorithms-on-graphs/bidirectional-a-Qel6Q" class="external-link" target="_blank" rel="noopener"&gt;small lecture&lt;/a&gt;&lt;/dt&gt;
&lt;dd&gt;introduces \(h_f(u) = \frac 12 (\pi_f(u) - \pi_r(u))\). I have not found
a paper on it yet.&lt;/dd&gt;
&lt;dt&gt;An Improved Bidirectional Heuristic Search Algorithm (Champeaux 1977)&lt;/dt&gt;
&lt;dd&gt;introduces a bidirectional variant&lt;/dd&gt;
&lt;dt&gt;Bidirectional Heuristic Search Again (Champeaux 1983)&lt;/dt&gt;
&lt;dd&gt;fixes a bug in the
above paper&lt;/dd&gt;
&lt;dt&gt;Efficient modified bidirectional A* algorithm for optimal route-finding&lt;/dt&gt;
&lt;dd&gt;Didn&amp;rsquo;t read closely yet.&lt;/dd&gt;
&lt;dt&gt;A new bidirectional algorithm for shortest paths (Pijls 2008)&lt;/dt&gt;
&lt;dd&gt;Actually a
new method. Seems to cite useful papers.
&lt;p&gt;The two papers that cite this one may also be interesting.&lt;/p&gt;</description></item><item><title>A* variants</title><link>https://curiouscoding.nl/posts/astar-variants/</link><pubDate>Sun, 12 Jun 2022 12:04:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/astar-variants/</guid><description>&lt;p&gt;These are some quick notes listing papers related to A* itself and its variants. In
particular, here I&amp;rsquo;m interested in papers that update \(h\) during the A* search,
as background for &lt;a href="https://curiouscoding.nl/posts/pruning/" &gt;pruning&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Specifically, our version of pruning increases \(h\) during a &lt;em&gt;single&lt;/em&gt; A* search,
and in fact the heuristic becomes &lt;em&gt;inadmissible&lt;/em&gt; after pruning.&lt;/p&gt;
&lt;h2 id="changing-h"&gt;
 Changing \(h\)
 &lt;a class="heading-link" href="#changing-h"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;The original A* paper contains a proof of optimality. Later papers also consider optimality
with heuristics that change their value over time.&lt;/p&gt;</description></item><item><title>A survey of exact global pairwise alignment</title><link>https://curiouscoding.nl/posts/pairwise-alignment-history/</link><pubDate>Fri, 01 Apr 2022 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/pairwise-alignment-history/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#variants-of-pairwise-alignment" &gt;Variants of pairwise alignment&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#cost-models" &gt;Cost models&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#alignment-types" &gt;Alignment types&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-chronological-overview-of-global-pairwise-alignment" &gt;A chronological overview of global pairwise alignment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#algorithms-in-detail" &gt;Algorithms in detail&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#classic-dp-algorithms" &gt;Classic DP algorithms&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#cubic-dp" &gt;Cubic algorithm of Needleman and Wunsch (&lt;a href="#citeproc_bib_item_25"&gt;1970&lt;/a&gt;)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#quadratic-dp" &gt;A quadratic DP&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#local-alignment" &gt;Local alignment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#affine-costs" &gt;Affine costs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#minimizing-vs-dot-maximizing-duality" &gt;Minimizing vs. maximizing duality&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#four-russians" &gt;Four Russians method&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#o--ns--methods" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; \(O(ns)\) methods&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#exponential-band" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Exponential search on band&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#thresholds" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; LCS: thresholds, $k$-candidates and contours&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#diagonal-transition" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Diagonal transition: furthest reaching and wavefronts&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#ns2" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Suffixtree for \(O(n+s^2)\) expected runtime&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#using-less-memory" &gt;Using less memory&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#computing-the-score-in-linear-space" &gt;Computing the score in linear space&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#divide-and-conquer" &gt;Divide-and-conquer&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#lcsk-plus-plus-algorithms" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; LCSk[++] algorithms&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#theoretical-lower-bound" &gt;Theoretical lower bound&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-note-on-dp--toposort--vs-dijkstra-vs-a" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; A note on DP (toposort) vs Dijkstra vs A*&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#tools" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Tools&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#notes-for-other-posts" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Notes for other posts&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#semi-global-alignment-papers" &gt;Semi-global alignment papers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#approximate-pairwise-aligners" &gt;Approximate pairwise aligners&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#old-vs-new-papers" &gt;Old vs new papers&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;This post explains the many variants of pairwise alignment, and covers papers
defining and exploring the topic.&lt;/p&gt;</description></item><item><title>Spaced k-mer and assembler methods</title><link>https://curiouscoding.nl/posts/spaced-kmer-review/</link><pubDate>Wed, 14 Jul 2021 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/spaced-kmer-review/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#spaced-k-mers" &gt;Spaced \(k\)-mers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#minimap" &gt;Minimap&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#spades" &gt;SPAdes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#mummer4" &gt;MUMmer4&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#blasr" &gt;BLASR&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#bowtie-2" &gt;Bowtie 2&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#patternhunter" &gt;Patternhunter&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#spaced-seeds-improve-k-mer-based-metagenomic-classification" &gt;Spaced seeds improve \(k\)-mer-based metagenomic classification&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#lomex" &gt;LoMeX&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#meeting-notes" &gt;Meeting notes&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;Concepts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Mapping&lt;/strong&gt;: map a sequence onto a reference genome/dataset.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Assembly&lt;/strong&gt;: build a genome from a set of reads.
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;de novo&lt;/em&gt; (implied): without using a reference genome.&lt;/li&gt;
&lt;li&gt;With a reference genome, it is usually just called &lt;em&gt;mapping&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Typical complicating factors:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;read errors&lt;/li&gt;
&lt;li&gt;non-uniform coverage&lt;/li&gt;
&lt;li&gt;insert size variation&lt;/li&gt;
&lt;li&gt;chimeric reads (?)&lt;/li&gt;
&lt;li&gt;bireads&lt;/li&gt;
&lt;li&gt;non-uniform read coverage (as in metagenomics, i.e. multi-cell
assembly)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="spaced-k-mers"&gt;
 Spaced \(k\)-mers
 &lt;a class="heading-link" href="#spaced-k-mers"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;Also called&lt;/p&gt;</description></item></channel></rss>