<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Ideas on CuriousCoding</title><link>https://curiouscoding.nl/categories/ideas/</link><description>Recent content in Ideas on CuriousCoding</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sun, 31 May 2026 00:00:00 +0200</lastBuildDate><atom:link href="https://curiouscoding.nl/categories/ideas/index.xml" rel="self" type="application/rss+xml"/><item><title>Chunking for Fasta Parsing</title><link>https://curiouscoding.nl/posts/fasta-parsing/</link><pubDate>Wed, 06 Aug 2025 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/fasta-parsing/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#minimizing-critical-sections" &gt;Minimizing critical sections&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#rabbitfx-chunking" &gt;RabbitFx: chunking&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#different-chunking" &gt;Different chunking&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4&lt;/span&gt; &lt;a href="#experiments" &gt;Experiments&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;5&lt;/span&gt; &lt;a href="#outlook" &gt;Outlook&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6&lt;/span&gt; &lt;a href="#some-helicase-benchmarks" &gt;Some helicase benchmarks&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;This is a quick note on some thoughts &amp;amp; experiments on fasta parsing, alongside &lt;a href="https://github.com/RagnarGrootKoerkamp/fasta-parsing-playground" class="external-link" target="_blank" rel="noopener"&gt;github:RagnarGrootKoerkamp/fasta-parsing-playground&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s a common complaint these days that Fasta parsing is slow.
A common parser is Needletail (&lt;a href="https://github.com/onecodex/needletail" class="external-link" target="_blank" rel="noopener"&gt;github&lt;/a&gt;), which builds on seq_io (&lt;a href="https://github.com/markschl/seq_io" class="external-link" target="_blank" rel="noopener"&gt;github&lt;/a&gt;).
Another recent one is paraseq (&lt;a href="https://github.com/noamteyssier/paraseq" class="external-link" target="_blank" rel="noopener"&gt;github&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Paraseq helps the user by providing an interface for parallel processing of
reads; see, e.g., &lt;a href="https://github.com/noamteyssier/paraseq/blob/main/examples/multi_parallel.rs#L33" class="external-link" target="_blank" rel="noopener"&gt;this example&lt;/a&gt;.
Unfortunately, this still has a bottleneck: while the user&amp;rsquo;s processing of reads
is multi-threaded, the parsing itself is still single-threaded.&lt;/p&gt;</description></item><item><title>Practical minimizers</title><link>https://curiouscoding.nl/posts/practical-minimizers/</link><pubDate>Thu, 12 Sep 2024 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/practical-minimizers/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#sampling-schemes" &gt;Sampling schemes&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.1&lt;/span&gt; &lt;a href="#definitions" &gt;Definitions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.2&lt;/span&gt; &lt;a href="#miniception" &gt;Miniception&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.3&lt;/span&gt; &lt;a href="#mod-minimizer" &gt;Mod-minimizer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.4&lt;/span&gt; &lt;a href="#forward-scheme-lower-bound" &gt;Forward scheme lower bound&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.5&lt;/span&gt; &lt;a href="#open-syncmer-minimizer" &gt;Open syncmer minimizer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.6&lt;/span&gt; &lt;a href="#open-closed-minimizer" &gt;Open-closed minimizer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.7&lt;/span&gt; &lt;a href="#new-general-mod-minimizer" &gt;New: General mod-minimizer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.8&lt;/span&gt; &lt;a href="#variant-open-closed-minimizer-using-offsets" &gt;Variant: Open-closed minimizer using offsets&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#selection-schemes" &gt;Selection schemes&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.1&lt;/span&gt; &lt;a href="#definition" &gt;Definition&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.2&lt;/span&gt; &lt;a href="#bd-anchors" &gt;Bd-anchors&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.3&lt;/span&gt; &lt;a href="#new-smallest-unique-substring-anchors" &gt;New: Smallest unique substring anchors&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.4&lt;/span&gt; &lt;a href="#new-anti-lexicographic-sorting" &gt;New: Anti lexicographic sorting&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#more-sampling-schemes" &gt;More sampling schemes&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.1&lt;/span&gt; &lt;a href="#anti-lex-sus-anchors" &gt;Anti-lex sus-anchors&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.2&lt;/span&gt; &lt;a href="#threshold-anchors" &gt;Threshold anchors&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.3&lt;/span&gt; &lt;a href="#the-t-gap-disappears-for-large-alphabets" &gt;The $t$-gap disappears for large alphabets&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4&lt;/span&gt; &lt;a href="#computing-the-density-of-forward-schemes" &gt;Computing the density of forward schemes&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.1&lt;/span&gt; &lt;a href="#wip-anti-lexicographic-sus-anchor-density" &gt;WIP: Anti lexicographic sus-anchor density&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;5&lt;/span&gt; &lt;a href="#open-questions" &gt;Open questions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6&lt;/span&gt; &lt;a href="#ideas" &gt;Ideas&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;7&lt;/span&gt; &lt;a href="#optimal-schemes-for-k-in-w-w-plus-1" &gt;Optimal schemes for \(k \in \{w, w+1\}\)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;&lt;strong&gt;Most of the content here has now been absorbed into my &lt;a href="https://curiouscoding.nl/posts/minimizers/" &gt;thesis chapter on minimizers&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;</description></item><item><title>Mod-minimizers and other minimizers</title><link>https://curiouscoding.nl/posts/mod-minimizers/</link><pubDate>Thu, 18 Jan 2024 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/mod-minimizers/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#applications" &gt;Applications&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#background" &gt;Background&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#minimizers" &gt;Minimizers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#density-bounds" &gt;Density bounds&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#robust-minimizers" &gt;Robust minimizers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#pasha" &gt;PASHA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#miniception" &gt;Miniception&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#closed-syncmers" &gt;Closed syncmers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#bd-anchors" &gt;Bd-anchors&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#new-mod-minimizers" &gt;New: Mod-minimizers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#experiments" &gt;Experiments&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#conclusion" &gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#small-k-experiments" &gt;Small k experiments&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#search-methods" &gt;Search methods&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#directed-minimizer" &gt;Directed minimizer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#k-1-w-2" &gt;\(k=1\), \(w=2\)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#k-1-w-4" &gt;\(k=1\), \(w=4\)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#k-1-w-5" &gt;\(k=1\), \(w=5\)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#k-2-w-2" &gt;\(k=2\), \(w=2\)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#k-2-w-4" &gt;\(k=2\), \(w=4\)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#notes" &gt;Notes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#reading-list" &gt;Reading list&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;\[
\newcommand{\d}{\mathrm{d}}
\newcommand{\L}{\mathcal{L}}
\]&lt;/p&gt;
&lt;p&gt;This post introduces some background for minimizers and some
experiments for a new minimizer variant. That new variant is now called the
&lt;em&gt;mod-minimizer&lt;/em&gt; and was published at WABI 2024 (&lt;a href="https://doi.org/10.4230/LIPIcs.WABI.2024.11" class="external-link" target="_blank" rel="noopener"&gt;&lt;strong&gt;DOI&lt;/strong&gt;&lt;/a&gt;, &lt;a href="https://curiouscoding.nl/papers/modmini.pdf" &gt;&lt;strong&gt;PDF&lt;/strong&gt;&lt;/a&gt;) (&lt;a href="#citeproc_bib_item_5"&gt;Groot Koerkamp and Pibiri 2024&lt;/a&gt;). The paper
also includes a review of existing methods, including pseudocode for
most of the methods covered below.&lt;/p&gt;</description></item><item><title>Notes on implementing Longest Common Repeat (LCR)</title><link>https://curiouscoding.nl/posts/longest-common-repeat/</link><pubDate>Wed, 06 Dec 2023 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/longest-common-repeat/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#notes" &gt;Notes&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#coloured-tree-problem" &gt;Coloured Tree Problem&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#generic-sparse-suffix-array" &gt;Generic sparse suffix array&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#sparse-suffix-array-on-minimizers" &gt;Sparse suffix array on minimizers&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#discussion-todos" &gt;Discussion / TODOs&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#evals" &gt;Evals&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;These are my running notes on implementing an algorithm for Longest Common
Repeat using minimizers.&lt;/p&gt;
&lt;h2 id="notes"&gt;
 Notes
 &lt;a class="heading-link" href="#notes"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;h3 id="coloured-tree-problem"&gt;
 Coloured Tree Problem
 &lt;a class="heading-link" href="#coloured-tree-problem"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h3&gt;
&lt;p&gt;See Lemma 3 of &lt;a href="https://drops.dagstuhl.de/storage/00lipics/lipics-vol105-cpm2018/LIPIcs.CPM.2018.23/LIPIcs.CPM.2018.23.pdf" class="external-link" target="_blank" rel="noopener"&gt;this CPM 2018 paper&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="generic-sparse-suffix-array"&gt;
 Generic sparse suffix array
 &lt;a class="heading-link" href="#generic-sparse-suffix-array"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;paper: &lt;a href="https://arxiv.org/pdf/2310.09023.pdf" class="external-link" target="_blank" rel="noopener"&gt;https://arxiv.org/pdf/2310.09023.pdf&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;code: &lt;a href="https://github.com/lorrainea/SSA/blob/main/PA/ssa.cc" class="external-link" target="_blank" rel="noopener"&gt;https://github.com/lorrainea/SSA/blob/main/PA/ssa.cc&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For random strings and \(b \leq n / \log n\), a direct radix sort on
\((2\log n + \log\log n)\)-bit
prefixes is sufficient for \(O(n)\) runtime. In fact, since the computer word size
\(w\geq \log n\), we only need at most \(2\) rounds of radix sort! (See simple-saca.)&lt;/p&gt;</description></item><item><title>Research proposal: subquadratic string graph construction</title><link>https://curiouscoding.nl/posts/cwi-proposal/</link><pubDate>Mon, 10 Jul 2023 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/cwi-proposal/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#introduction" &gt;Introduction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#research-plan" &gt;Research plan&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#improve-query-performance-using-heavy-light-decomposition" &gt;Improve query performance using Heavy-Light Decomposition&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#add-more-query-types" &gt;Add more query types&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#extend-to-non-exact-suffix-prefix-overlap-that-allows-for-read-errors" &gt;Extend to non-exact suffix-prefix-overlap that allows for read errors&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#implement-an-algorithm-to-build-string-graphs-and-possibly-a-full-assembler" &gt;Implement an algorithm to build string graphs, and possibly a full assembler&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;This is a research proposal for a 5-month internship at CWI during autumn/winter 2023-2024.&lt;/p&gt;
&lt;h2 id="introduction"&gt;
 Introduction
 &lt;a class="heading-link" href="#introduction"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;An important problem in bioinformatics is &lt;em&gt;genome assembly&lt;/em&gt;:
DNA sequencing machines read substrings of a full DNA genome, and these pieces
must be &lt;em&gt;assembled&lt;/em&gt; together to recover the entire genome.&lt;/p&gt;</description></item><item><title>Doctoral plan</title><link>https://curiouscoding.nl/posts/research-proposal/</link><pubDate>Mon, 12 Dec 2022 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/research-proposal/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#research-proposal-near-linear-exact-pairwise-alignment" &gt;Research Proposal: Near-linear exact pairwise alignment&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#abstract" &gt;Abstract&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#introduction-and-current-state-of-research-in-the-field" &gt;Introduction and current state of research in the field&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#goals-of-the-thesis" &gt;Goals of the thesis&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#impact" &gt;Impact&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#progress-to-date" &gt;Progress to date&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#detailed-work-plan" &gt;Detailed work plan&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#wp1-a-pa-v1-initial-version" &gt;WP1: A*PA v1: initial version&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#wp2-visualizing-aligners" &gt;WP2: Visualizing aligners&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#wp3-benchmarking-aligners" &gt;WP3: Benchmarking aligners&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#wp4-theory-review" &gt;WP4: Theory review&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#wp5-a-pa-v2-efficient-implementation" &gt;WP5: A*PA v2: efficient implementation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#wp6-affine-costs" &gt;WP6: Affine costs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#wp7-ends-free-alignment-and-mapping" &gt;WP7: Ends-free alignment and mapping&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#wp8-further-extension-and-open-ended-research" &gt;WP8: Further extension and open ended research&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#wp9-thesis-writing" &gt;WP9: Thesis writing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#publication-plan" &gt;Publication plan&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#time-schedule" &gt;Time schedule&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#teaching-responsibilities" &gt;Teaching responsibilities&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#other-duties" &gt;Other duties&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#study-plan" &gt;Study plan&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#signatures" &gt;Signatures&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;h2 id="research-proposal-near-linear-exact-pairwise-alignment"&gt;
 Research Proposal: Near-linear exact pairwise alignment
 &lt;a class="heading-link" href="#research-proposal-near-linear-exact-pairwise-alignment"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;h3 id="abstract"&gt;
 Abstract
 &lt;a class="heading-link" href="#abstract"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;Pairwise alignment&lt;/em&gt;, and &lt;em&gt;edit distance&lt;/em&gt; specifically, is a problem that was
first stated around 1968 (&lt;a href="#citeproc_bib_item_20"&gt;Needleman and Wunsch 1970&lt;/a&gt;; &lt;a href="#citeproc_bib_item_26"&gt;Vintsyuk 1968&lt;/a&gt;). It involves finding the minimal
number of edits (substitutions, insertions, deletions) to transform one string/sequence
into another.
For sequences of length \(n\), the original algorithm takes quadratic \(O(n^2)\)
time (&lt;a href="#citeproc_bib_item_22"&gt;Sellers 1974&lt;/a&gt;).
In 1983, this was improved to \(O(ns)\) for sequences with low edit distance \(s\)
using Band-Doubling. At the same time, a further improvement to
\(O(n+s^2)\) expected runtime was presented using the diagonal-transition method (&lt;a href="#citeproc_bib_item_24"&gt;Ukkonen 1983&lt;/a&gt;, &lt;a href="#citeproc_bib_item_25"&gt;1985&lt;/a&gt;; &lt;a href="#citeproc_bib_item_17"&gt;Myers 1986&lt;/a&gt;).&lt;/p&gt;</description></item><item><title>String algorithm visualizations</title><link>https://curiouscoding.nl/posts/alg-viz/</link><pubDate>Tue, 08 Nov 2022 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/alg-viz/</guid><description>&lt;ol&gt;
&lt;li&gt;Select the algorithm to visualize&lt;/li&gt;
&lt;li&gt;Click the buttons, or click the canvas and use the indicated keys&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Suffix-array construction is explained &lt;a href="https://curiouscoding.nl/posts/suffix-array-construction/" &gt;here&lt;/a&gt; and BWT is explained &lt;a href="https://curiouscoding.nl/posts/bwt/" &gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Source code is &lt;a href="https://github.com/RagnarGrootKoerkamp/alg-viz" class="external-link" target="_blank" rel="noopener"&gt;on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;script defer src="https://curiouscoding.nl/js/alg-viz.js" type="module"&gt;&lt;/script&gt;
&lt;div class="controls"&gt;
&lt;label for="algorithm"&gt;Algorithm&lt;/label&gt;
&lt;select name="algorithm" id="algorithm"&gt;
 &lt;option value="suffix-array"&gt;Suffix Array Construction&lt;/option&gt;
 &lt;option value="bwt"&gt;Burrows-Wheeler Transform&lt;/option&gt;
 &lt;option value="bibwt"&gt;Bidirectional BWT&lt;/option&gt;
&lt;/select&gt;
&lt;br/&gt;
&lt;label for="string"&gt;String&lt;/label&gt; &lt;input type="string" name="string" id="string"/&gt;&lt;br/&gt;
&lt;label for="query"&gt;Query&lt;/label&gt; &lt;input type="string" name="query" id="query"/&gt;&lt;br/&gt;
&lt;button class="button-primary" id="prev"&gt;prev (←/backspace)&lt;/button&gt;
&lt;button class="button-primary" id="next"&gt;next (→/space)&lt;/button&gt;
&lt;br/&gt;
&lt;label for="delay"&gt;Delay (s)&lt;/label&gt; &lt;input type="number" name="delay" id="delay" value="0.8"/&gt;&lt;br/&gt;
&lt;button class="button-primary" id="faster"&gt;faster (↑/+/f)&lt;/button&gt;
&lt;button class="button-primary" id="slower"&gt;slower (↓/-/s)&lt;/button&gt;
&lt;button class="button-primary" id="pauseplay"&gt;pause/play (p/return)&lt;/button&gt;
&lt;/div&gt;
&lt;div class="canvas"&gt;
&lt;canvas id="canvas" tabindex='1' width="1600" height="1200"&gt;&lt;/canvas&gt;
&lt;/div&gt;</description></item><item><title>Thoughts on linear programming</title><link>https://curiouscoding.nl/posts/linear-programming/</link><pubDate>Fri, 04 Nov 2022 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/linear-programming/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#linear-programming" &gt;Linear programming&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#assumptions" &gt;Assumptions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#idea-for-an-algorithm" &gt;Idea for an algorithm&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;This note contains some ideas about linear programming and &lt;em&gt;most-orthogonal
faces&lt;/em&gt;. They&amp;rsquo;re mostly on an intuitive level and not very formal.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Postscriptum:&lt;/strong&gt; The ideas here don&amp;rsquo;t work.&lt;/p&gt;
&lt;h2 id="linear-programming"&gt;
 Linear programming
 &lt;a class="heading-link" href="#linear-programming"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;div style="display: none" class="div"&gt;
&lt;p&gt;\begin{equation*}
\newcommand{\v}[1]{\textbf{#1}}
\newcommand{\x}{\v x}
\newcommand{\t}{\v t}
\newcommand{\b}{\v b}
\end{equation*}&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Maximize \(\t\x\) subject to \(A\x \leq \b\).&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;\(\x\) is a vector of \(n\) variables \(x_i\).&lt;/li&gt;
&lt;li&gt;\(A\) is an \(m\times n\) matrix: there are \(m\) constraints \(A_j \x \leq b_j\).&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="assumptions"&gt;
 Assumptions
 &lt;a class="heading-link" href="#assumptions"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;We make the following assumptions:&lt;/p&gt;</description></item><item><title>Local Doubling</title><link>https://curiouscoding.nl/posts/local-doubling/</link><pubDate>Wed, 19 Oct 2022 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/local-doubling/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#notation" &gt;Notation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#needleman-wunsch-where-it-all-begins" &gt;Needleman-Wunsch: where it all begins&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#dijkstra-bfs-visiting-fewer-states" &gt;Dijkstra/BFS: visiting fewer states&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#band-doubling-dijkstra-but-more-efficient" &gt;Band doubling: Dijkstra, but more efficient&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#gapcost-a-first-heuristic" &gt;GapCost: A first heuristic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#computational-volumes-an-even-smaller-search" &gt;Computational volumes: an even smaller search&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#cheating-an-oracle-gave-us-g" &gt;Cheating: an oracle gave us \(g^*\)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-better-heuristics" &gt;A*: Better heuristics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#broken-idea-a-and-computational-volumes" &gt;Broken idea: A* and computational volumes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#local-doubling" &gt;Local doubling&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#without-heuristic" &gt;Without heuristic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#with-heuristic" &gt;With heuristic&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#diagonal-transition" &gt;Diagonal Transition&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-with-diagonal-transition-and-pruning-doing-less-work" &gt;A* with Diagonal Transition and pruning: doing less work&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#goal-diagonal-transition-plus-pruning-plus-local-doubling" &gt;Goal: Diagonal Transition + pruning + local doubling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#pruning-improving-a-heuristics-on-the-go" &gt;Pruning: Improving A* heuristics on the go&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#cheating-more-an-oracle-gave-us-the-optimal-path" &gt;Cheating more: an oracle gave us the optimal path&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#todo-aspriation-windows" &gt;TODO: aspriation windows&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;\begin{equation*}
\newcommand{\st}[2]{\langle #1,#2\rangle}
\newcommand{\g}{g^*}
\newcommand{\fm}{f_{max}}
\newcommand{\gap}{\operatorname{Gap}}
\end{equation*}&lt;/p&gt;</description></item><item><title>Reducing A* memory usage using fronts</title><link>https://curiouscoding.nl/posts/astar-memory-usage/</link><pubDate>Mon, 26 Sep 2022 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/astar-memory-usage/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#motivation" &gt;Motivation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#parititioning-a-memory-by-fronts" &gt;Parititioning A* memory by fronts&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#non-consistent-heuristics" &gt;Non-consistent heuristics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#front-indexing" &gt;Front indexing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#tracing-back-the-path" &gt;Tracing back the path&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;Here is an idea to reduce the memory usage of A* by only storing one &lt;em&gt;front&lt;/em&gt; at
a time, similar to what Edlib and WFA do. Note that for now this &lt;strong&gt;will not
work&lt;/strong&gt;, but I&amp;rsquo;m putting this online anyway.&lt;/p&gt;
&lt;h2 id="motivation"&gt;
 Motivation
 &lt;a class="heading-link" href="#motivation"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;In our &lt;a href="https://github.com/RagnarGrootKoerkamp/astar-pairwise-aligner" class="external-link" target="_blank" rel="noopener"&gt;implementation&lt;/a&gt; of A*PA, we use a hashmap to store the value of \(g\) for all
states visited (explored/expanded) by A*. This can take up a lot of memory, and
simply reading/writing \(g\) in the hashmap can take over half the total execution
time.&lt;/p&gt;</description></item><item><title>Speeding up A*: computational volumes and path-pruning</title><link>https://curiouscoding.nl/posts/speeding-up-astar/</link><pubDate>Fri, 23 Sep 2022 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/speeding-up-astar/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#motivation" &gt;Motivation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#summary" &gt;Summary&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#why-is-a-slow" &gt;Why is A* slow?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#computational-volumes" &gt;Computational volumes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#dealing-with-pruning" &gt;Dealing with pruning&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#thoughts-on-more-aggressive-pruning" &gt;Thoughts on more aggressive pruning&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#algorithm-summary" &gt;Algorithm summary&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#challenges" &gt;Challenges&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#results" &gt;Results&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#what-about-band-doubling" &gt;What about band-doubling?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#maybe-doubling-can-work-after-all" &gt;Maybe doubling can work after all?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#todos" &gt;TODOs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#extensions" &gt;Extensions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;This post builds on our recent preprint Groot Koerkamp and Ivanov (&lt;a href="#citeproc_bib_item_1"&gt;2024&lt;/a&gt;) and gives an
overview of some of my new ideas to significantly speed up exact global pairwise
alignment. It&amp;rsquo;s recommended you understand the &lt;em&gt;seed heuristic&lt;/em&gt; and &lt;em&gt;match
pruning&lt;/em&gt; before reading this post.&lt;/p&gt;</description></item><item><title>Linear memory WFA?</title><link>https://curiouscoding.nl/posts/linear-memory-wfa/</link><pubDate>Wed, 17 Aug 2022 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/linear-memory-wfa/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#motivation" &gt;Motivation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#path-traceback-two-strategies" &gt;Path traceback: two strategies&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#observations" &gt;Observations&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#what-information-is-needed-for-path-tracing" &gt;What information is needed for path tracing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-pragmatic-solution" &gt;A pragmatic solution&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#another-interpretation" &gt;Another interpretation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#affine-costs" &gt;Affine costs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#conclusion" &gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;&lt;a id="figure--result"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;figure&gt;&lt;a href="https://curiouscoding.nl/ox-hugo/simple-final.png"&gt;&lt;img src="https://curiouscoding.nl/ox-hugo/simple-final.png"
 alt="Figure 1: Only the red substitutions and blue indel need to be stored to trace the entire path."&gt;&lt;/a&gt;&lt;figcaption&gt;
 &lt;p&gt;&lt;span class="figure-number"&gt;Figure 1: &lt;/span&gt;Only the red substitutions and blue indel need to be stored to trace the entire path.&lt;/p&gt;</description></item><item><title>Transforming match bonus into cost</title><link>https://curiouscoding.nl/posts/alignment-scores-transform/</link><pubDate>Tue, 16 Aug 2022 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/alignment-scores-transform/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#tricks-with-match-bonus-or-how-to-fool-dijkstra-s-limitations" &gt;Tricks with match bonus or how to fool Dijkstra&amp;rsquo;s limitations&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#edit-graph" &gt;Edit graph&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#algorithms" &gt;Algorithms&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#potentials" &gt;Potentials&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#multiple-variants" &gt;Multiple variants&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#some-notes-on-algorithms" &gt;Some notes on algorithms&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#wfa" &gt;WFA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#a" &gt;A*&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#extending-to-different-cost-models" &gt;Extending to different cost models&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#affine-costs" &gt;Affine costs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#substitution-matrices" &gt;Substitution matrices&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#but-not-local-alignment" &gt;But not local alignment&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#evaluations" &gt;Evaluations&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#unequal-string-length" &gt;Unequal string length&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#equal-string-lengths" &gt;Equal string lengths&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#conclusion" &gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;h2 id="tricks-with-match-bonus-or-how-to-fool-dijkstra-s-limitations"&gt;
 Tricks with match bonus or how to fool Dijkstra&amp;rsquo;s limitations
 &lt;a class="heading-link" href="#tricks-with-match-bonus-or-how-to-fool-dijkstra-s-limitations"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;The reader is assumed to have basic knowledge about pairwise alignment and graph theory.&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Diamond optimisation for diagonal transition</title><link>https://curiouscoding.nl/posts/diamond-optimization/</link><pubDate>Mon, 01 Aug 2022 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/diamond-optimization/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#diamond-transition-or-how-technicalities-can-break-concepts" &gt;Diamond transition or how technicalities can break concepts&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#but-let-s-take-a-closer-look" &gt;But let’s take a closer look&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#conclusion" &gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;h2 id="diamond-transition-or-how-technicalities-can-break-concepts"&gt;
 Diamond transition or how technicalities can break concepts
 &lt;a class="heading-link" href="#diamond-transition-or-how-technicalities-can-break-concepts"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;We assume the reader has some basic knowledge of pairwise alignment,
and of the WFA algorithm in particular.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In this post we dive into a potential 2x speedup of WFA &amp;mdash; one that turns out not to work.&lt;/p&gt;</description></item><item><title>The BiWFA meeting condition</title><link>https://curiouscoding.nl/posts/biwfa-meeting-condition/</link><pubDate>Mon, 11 Jul 2022 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/biwfa-meeting-condition/</guid><description>&lt;p&gt;&lt;strong&gt;cross references:&lt;/strong&gt; &lt;a href="https://github.com/smarco/BiWFA-paper/issues/8" class="external-link" target="_blank" rel="noopener"&gt;BiWFA GitHub issue&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;It seems that getting the meeting/overlap condition of BiWFA
(Marco-Sola et al. (&lt;a href="#citeproc_bib_item_1"&gt;2023&lt;/a&gt;), Algorithm 1 and Lemma 2.1) correct is tricky.&lt;/p&gt;
&lt;p&gt;Let \(p := \max(x, o+e)\) be the maximal cost of any edge in the edit graph.
As in the BiWFA paper, let \(s_f\) and \(s_r\) be the distances of the &lt;em&gt;forward&lt;/em&gt; and
&lt;em&gt;reverse&lt;/em&gt; fronts computed so far.&lt;/p&gt;
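&lt;p&gt;As a small worked example, take illustrative costs chosen only for this example: mismatch cost \(x=4\), gap-open cost \(o=6\) and gap-extend cost \(e=2\). In the gap-affine model a gap of length \(\ell\) costs \(o + \ell e\), so the first edge of a gap pays both \(o\) and \(e\) and is the most expensive single edge:
\[ p = \max(x, o+e) = \max(4, 6+2) = 8. \]&lt;/p&gt;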
&lt;p&gt;We prove the following lemma:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Lemma&lt;/strong&gt;
Once BiWFA has expanded the forward and reverse fronts up to \(s_f\) and \(s_r\) and
has found &lt;em&gt;some&lt;/em&gt; path of cost \(s \leq s_f + s_r\),
expanding the fronts until \(s'_f + s'_r \geq s+p+o\) is guaranteed to find a
&lt;em&gt;shortest&lt;/em&gt; path.&lt;/p&gt;</description></item><item><title>Proof sketch for linear time seed heuristic alignment</title><link>https://curiouscoding.nl/posts/linear-time-pa/</link><pubDate>Sun, 24 Apr 2022 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/linear-time-pa/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#pairwise-alignment-in-subquadratic-time" &gt;Pairwise alignment in subquadratic time&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#random-model" &gt;Random model&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#algorithm" &gt;Algorithm&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#seed-heuristic" &gt;Seed heuristic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#match-pruning" &gt;Match pruning&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#analysis" &gt;Analysis&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#expanded-states" &gt;Expanded states&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#excess-errors" &gt;Excess errors&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#algorithmic-complexity" &gt;Algorithmic complexity&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;This post is a proof sketch to show that A* with the &lt;em&gt;seed heuristic&lt;/em&gt;
(&lt;a href="#citeproc_bib_item_3"&gt;Groot Koerkamp and Ivanov 2024&lt;/a&gt;) does exact pairwise alignment of random strings with random
mutations in near linear time.&lt;/p&gt;
&lt;h2 id="pairwise-alignment-in-subquadratic-time"&gt;
 Pairwise alignment in subquadratic time
 &lt;a class="heading-link" href="#pairwise-alignment-in-subquadratic-time"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;Backurs and Indyk (&lt;a href="#citeproc_bib_item_1"&gt;2018&lt;/a&gt;) show that computing edit distance can not be
done in strongly subquadratic time (i.e. \(O(n^{2-\delta})\) for any \(\delta &amp;gt;0\))
assuming the Strong Exponential Time Hypothesis.&lt;/p&gt;</description></item><item><title>Variations on the WFA recursion</title><link>https://curiouscoding.nl/posts/wfa-variations/</link><pubDate>Sun, 17 Apr 2022 03:14:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/wfa-variations/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#gap-open" &gt;Gap open&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#gap-close" &gt;Gap close&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#symmetric-alternatives" &gt;Symmetric alternatives&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#another-symmetry" &gt;Another symmetry&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#conclusions" &gt;Conclusions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;&lt;strong&gt;cross references:&lt;/strong&gt; &lt;a href="https://github.com/smarco/BiWFA-paper/issues/4" class="external-link" target="_blank" rel="noopener"&gt;BiWFA GitHub issue&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In this post I will explore some variations of the recursion used by WFA/BiWFA
for the affine version of the diagonal transition algorithm.
In particular, we will go over a &lt;em&gt;gap-close&lt;/em&gt; variant, and look into some more symmetric
formulations.&lt;/p&gt;
&lt;h2 id="gap-open"&gt;
 Gap open
 &lt;a class="heading-link" href="#gap-open"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;WFA (&lt;a href="#citeproc_bib_item_3"&gt;Marco-Sola et al. 2021&lt;/a&gt;) introduces the affine cost variant of the classic diagonal
transition method.
Let us call it a &lt;strong&gt;gap-open&lt;/strong&gt; variant, because the gap-open cost \(o\) is paid when
opening the gap, that is, when jumping from the \(M\) &lt;em&gt;layer&lt;/em&gt; to the \(I\) or \(D\) &lt;em&gt;layer&lt;/em&gt;.&lt;/p&gt;</description></item><item><title>Pruning for A* heuristics</title><link>https://curiouscoding.nl/posts/pruning/</link><pubDate>Sat, 11 Dec 2021 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/pruning/</guid><description>&lt;p&gt;Note: this post extends the concept of &lt;em&gt;multiple-path pruning&lt;/em&gt; presented in Poole and Mackworth (&lt;a href="#citeproc_bib_item_1"&gt;2017&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Say we&amp;rsquo;re running A* on a graph from \(s\) to \(t\), and let \(d(s,t)\) be the distance we are
looking for.&lt;/p&gt;
&lt;p&gt;An A* heuristic has to satisfy \(h(u) \leq d(u, t)\) for every vertex \(u\) to be &lt;em&gt;admissible&lt;/em&gt;: the
estimated distance to the end should never be larger than the actual distance, which is needed to
guarantee that the algorithm finds a shortest path.&lt;/p&gt;</description></item><item><title>Spaced K-mer Seeded Distance</title><link>https://curiouscoding.nl/posts/spaced-kmer-distance/</link><pubDate>Wed, 20 Oct 2021 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/spaced-kmer-distance/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#background" &gt;Background&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#k-mers" &gt;$k$-mers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#sketching" &gt;Sketching&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#minhash" &gt;MinHash&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#terminology" &gt;Terminology&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#introduction" &gt;Introduction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#spaced-k-mer-seeded-distance" &gt;Spaced $k$-mer Seeded Distance&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#improving-performance" &gt;Improving performance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#analysis" &gt;Analysis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#pruning-false-positive-candidate-matches" &gt;Pruning false positive candidate matches&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#phylogeny-reconstruction" &gt;Phylogeny reconstruction&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#running-the-algorithm" &gt;Running the algorithm&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#assembly" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Assembly&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;\[
\newcommand{\vp}{\varphi}
\newcommand{\A}{\mathcal A}
\newcommand{\O}{\mathcal O}
\newcommand{\N}{\mathbb N}
\newcommand{\ed}{\mathrm{ed}}
\newcommand{\mh}{\mathrm{mh}}
\newcommand{\hash}{\mathrm{hash}}
\]&lt;/p&gt;
&lt;h2 id="background"&gt;
 Background
 &lt;a class="heading-link" href="#background"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;Quickly finding similar pieces of DNA within large datasets is at the
core of computational biology. This has many applications:&lt;/p&gt;</description></item><item><title>Ideas for assembling [long] reads</title><link>https://curiouscoding.nl/posts/thoughts-on-assembling/</link><pubDate>Fri, 09 Jul 2021 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/thoughts-on-assembling/</guid><description>&lt;p&gt;\[
\newcommand{\vp}{\varphi}
\newcommand{\A}{\mathcal A}
\newcommand{\O}{\mathcal O}
\newcommand{\N}{\mathbb N}
\newcommand{\Z}{\mathbb Z}
\newcommand{\ed}{\mathrm{ed}}
\newcommand{\mh}{\mathrm{mh}}
\newcommand{\hash}{\mathrm{hash}}
\]&lt;/p&gt;
&lt;p&gt;Here is an idea for an algorithm to assemble long reads.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Go over all sequences and sketch their windows using the Hamming-distance-preserving
sketch method described &lt;a href="../hamming-similarity-search" &gt;here&lt;/a&gt;.
This method may need some tweaking to also work with an indel rate of
around 10%.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Let&amp;rsquo;s say we find a pair of matching windows between reads \(A\) and
\(B\) starting at positions \(i\) and \(j\). This indicates that
\(A\) and \(B\) may be related with an offset of \(j-i\).&lt;/p&gt;</description></item><item><title>Hamming Similarity Search</title><link>https://curiouscoding.nl/posts/hamming-similarity-search/</link><pubDate>Thu, 08 Jul 2021 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/hamming-similarity-search/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#background" &gt;Background&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#k-mers" &gt;$k$-mers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#sketching" &gt;Sketching&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#minhash" &gt;MinHash&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#introduction" &gt;Introduction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#hamming-similarity-search1" &gt;Hamming Similarity Search&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#improving-performance" &gt;Improving performance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#analysis" &gt;Analysis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#pruning-false-positive-candidate-matches" &gt;Pruning false positive candidate matches&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#phylogeny-reconstruction" &gt;Phylogeny reconstruction&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#running-the-algorithm" &gt;Running the algorithm&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#assembly" &gt;Assembly&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;\[
\newcommand{\vp}{\varphi}
\newcommand{\A}{\mathcal A}
\newcommand{\O}{\mathcal O}
\newcommand{\N}{\mathbb N}
\newcommand{\ed}{\mathrm{ed}}
\newcommand{\mh}{\mathrm{mh}}
\newcommand{\hash}{\mathrm{hash}}
\]&lt;/p&gt;
&lt;h2 id="background"&gt;
 Background
 &lt;a class="heading-link" href="#background"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;Quickly finding similar pieces of DNA within large datasets is at the
core of computational biology. This has many applications:&lt;/p&gt;</description></item></channel></rss>