<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Pairwise-Alignment on CuriousCoding</title><link>https://curiouscoding.nl/tags/pairwise-alignment/</link><description>Recent content in Pairwise-Alignment on CuriousCoding</description><generator>Hugo</generator><language>en</language><lastBuildDate>Wed, 10 Dec 2025 00:00:00 +0100</lastBuildDate><atom:link href="https://curiouscoding.nl/tags/pairwise-alignment/index.xml" rel="self" type="application/rss+xml"/><item><title>Thoughts on Singletrack</title><link>https://curiouscoding.nl/posts/singletrack/</link><pubDate>Tue, 04 Nov 2025 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/singletrack/</guid><description>&lt;p&gt;This is a quick post summarizing the idea of
&lt;a href="#citeproc_bib_item_2"&gt;“Singletrack: An Algorithm for Improving Memory Consumption and Performance of Gap-Affine Sequence Alignment”&lt;/a&gt; by &lt;a href="#citeproc_bib_item_2"&gt;López-Villellas, Iñiguez, Jiménez-Blanco, Aguado-Puig, Moretó, Alastruey-Benedé, Ibáñez, and Marco-Sola&lt;/a&gt; to reduce memory
usage of affine-cost alignment by removing the need to store the affine layers
of the DP matrix.&lt;/p&gt;
&lt;p&gt;Affine-cost alignment uses a gap-open
cost \(o&amp;gt;0\) on top of the per-character gap-extend cost \(e\), so that a gap of length \(\ell\) has cost \(o + \ell \cdot e\). The
classic DP solution for this is Gotoh&amp;rsquo;s method (&lt;a href="#citeproc_bib_item_1"&gt;Gotoh 1982&lt;/a&gt;), which uses two
additional DP matrices \(I\) and \(D\) (alongside the main \(M\) matrix):
one stores the best cost to reach state
\((i,j)\) while ending in an insertion, and the other the best cost while ending in a deletion.&lt;/p&gt;</description></item><item><title>A History of Pairwise Alignment</title><link>https://curiouscoding.nl/posts/pairwise-alignment/</link><pubDate>Wed, 09 Apr 2025 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/pairwise-alignment/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#a-brief-history" &gt;A Brief History&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.1&lt;/span&gt; &lt;a href="#a-pa" &gt;A*PA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.2&lt;/span&gt; &lt;a href="#a-pa2" &gt;A*PA2&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.3&lt;/span&gt; &lt;a href="#overview" &gt;Overview&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#problem-statement" &gt;Problem Statement&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#alignment-types" &gt;Alignment types&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4&lt;/span&gt; &lt;a href="#cost-models" &gt;Cost Models&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.1&lt;/span&gt; &lt;a href="#minimizing-cost-versus-maximizing-score" &gt;Minimizing Cost versus Maximizing Score&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;5&lt;/span&gt; &lt;a href="#dp" &gt;The Classic DP Algorithms&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6&lt;/span&gt; &lt;a href="#linear-memory-using-divide-and-conquer" &gt;Linear Memory using Divide and Conquer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;7&lt;/span&gt; &lt;a href="#graphs" &gt;Dijkstra&amp;rsquo;s Algorithm and A*&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;8&lt;/span&gt; &lt;a href="#computational-volumes" &gt;Computational Volumes and Band Doubling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;9&lt;/span&gt; &lt;a href="#diagonal-transition" &gt;Diagonal Transition&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;10&lt;/span&gt; &lt;a href="#parallelism" &gt;Parallelism&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;11&lt;/span&gt; &lt;a href="#lcs-and-contours" &gt;LCS and Contours&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;12&lt;/span&gt; &lt;a href="#some-tools" &gt;Some Tools&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;13&lt;/span&gt; &lt;a href="#subquadratic-methods-and-lower-bounds" &gt;Subquadratic Methods and Lower Bounds&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;14&lt;/span&gt; &lt;a href="#summary" &gt;Summary&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;This is Chapter 2 of my &lt;a href="https://curiouscoding.nl/posts/thesis/" &gt;thesis&lt;/a&gt; (&lt;a href="#citeproc_bib_item_27"&gt;Groot Koerkamp 2025&lt;/a&gt;), to introduce the first part on Pairwise Alignment.
Please cite the thesis instead of this post.&lt;/p&gt;</description></item><item><title>Beyond Global Alignment</title><link>https://curiouscoding.nl/posts/mapping/</link><pubDate>Mon, 07 Apr 2025 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/mapping/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#semi-global-variants" &gt;Variants of semi-global alignment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#text-searching" &gt;Fast text searching&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.1&lt;/span&gt; &lt;a href="#skip-cost-for-overlap-alignments" &gt;Skip-cost for overlap alignments&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.2&lt;/span&gt; &lt;a href="#search-results" &gt;Results&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#mapping" &gt;Mapping using A*Map&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.1&lt;/span&gt; &lt;a href="#seeding" &gt;Seeding&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.2&lt;/span&gt; &lt;a href="#chaining" &gt;Chaining&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.3&lt;/span&gt; &lt;a href="#aligning" &gt;Aligning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.4&lt;/span&gt; &lt;a href="#a-map" &gt;A*Map&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.5&lt;/span&gt; &lt;a href="#results" &gt;Results&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;This is Chapter 5 of my &lt;a href="https://curiouscoding.nl/posts/thesis/" &gt;thesis&lt;/a&gt; (&lt;a href="#citeproc_bib_item_11"&gt;Groot Koerkamp 2025&lt;/a&gt;).
Please cite the thesis instead of this post.&lt;/p&gt;
&lt;hr&gt;
&lt;div class="notice summary"&gt;
 &lt;div class="notice-title"&gt;
 &lt;i class="fa-solid " aria-hidden="true"&gt;&lt;/i&gt;Summary
 &lt;/div&gt;
 &lt;div class="notice-content"&gt;
&lt;p&gt;So far, we have considered only algorithms for &lt;em&gt;global&lt;/em&gt; alignment.
In this chapter, we consider &lt;em&gt;semi-global&lt;/em&gt; alignment and its variants instead,
where a pattern (query) is searched in a longer string (reference).
There are many flavours of semi-global alignment, depending on the
(relative) sizes of the inputs. We list these variants, and introduce
some common approaches to solve this problem.&lt;/p&gt;</description></item><item><title>Path Pruning Revisited</title><link>https://curiouscoding.nl/posts/path-pruning/</link><pubDate>Mon, 31 Mar 2025 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/path-pruning/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#early-idea-bottom-up-match-merging--aka-bummer" &gt;Early idea: Bottom-up match-merging (aka BUMMer?)&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.1&lt;/span&gt; &lt;a href="#some-previous-ideas" &gt;Some previous ideas&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.2&lt;/span&gt; &lt;a href="#divide-and-conquer" &gt;Divide &amp;amp; conquer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.3&lt;/span&gt; &lt;a href="#bottom-up-match-merging--bummer" &gt;Bottom-up match merging (BUMMer)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;h2 id="early-idea-bottom-up-match-merging--aka-bummer"&gt;
 &lt;span class="section-num"&gt;1&lt;/span&gt; Early idea: Bottom-up match-merging (aka BUMMer?)
 &lt;a class="heading-link" href="#early-idea-bottom-up-match-merging--aka-bummer"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;TODO: Move to separate post.&lt;/p&gt;
&lt;p&gt;One thing that becomes clear with mapping is that we don&amp;rsquo;t quite
know where exactly to start the semi-global alignments.
This can be fixed by adding some buffer/padding, but this remains slightly ugly
and iffy.&lt;/p&gt;</description></item><item><title>Thoughts on POASTA</title><link>https://curiouscoding.nl/posts/poasta/</link><pubDate>Tue, 28 May 2024 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/poasta/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#summary" &gt;Summary&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#background" &gt;Background&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#review-comments" &gt;Review comments&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#dfs" &gt;DFS&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#supplementary-methods" &gt;Supplementary methods&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#details-of-pruning" &gt;Details of pruning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#evals" &gt;Evals&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#discussion" &gt;Discussion&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#code-and-repo" &gt;Code &amp;amp; repo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;Here are some thoughts on POASTA (&lt;a href="#citeproc_bib_item_2"&gt;van Dijk et al. 2024&lt;/a&gt;), a recent affine-cost
sequence-to-DAG (POA) aligner inspired by WFA and using A*.&lt;/p&gt;
&lt;h2 id="summary"&gt;
 Summary
 &lt;a class="heading-link" href="#summary"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Take a query and a directed acyclic graph (DAG).&lt;/li&gt;
&lt;li&gt;Align the query to the &lt;strong&gt;full&lt;/strong&gt; DAG. It&amp;rsquo;s like global alignment for graphs.
&lt;ul&gt;
&lt;li&gt;In fact I think the graph doesn&amp;rsquo;t actually have to be acyclic, as long as it has
a start and end. (When there is a cycle, the maximum remaining path length
is simply \(\infty\).)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Do greedy extension of matches, similar to WFA and A*PA.
&lt;ul&gt;
&lt;li&gt;Note that this is not as strong as full diagonal transition as done by WFA
and &lt;a href="https://github.com/lh3/gwfa" class="external-link" target="_blank" rel="noopener"&gt;gWFA&lt;/a&gt; (graph WFA for unit costs only), which only consider farthest reaching states.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;In fact, this is &lt;strong&gt;the first&lt;/strong&gt; implementation of affine-cost WFA!&lt;/li&gt;
&lt;li&gt;It also uses A* with the classic gap-cost heuristic extended to graphs.
&lt;ul&gt;
&lt;li&gt;For each point in the graph the minimal and maximal remaining distance is
computed, and if the remaining query length is outside this range, the
difference to get into the range is a lower bound on the number of indels.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Greedy extension is applied (although this is inherent when using WFA).&lt;/li&gt;
&lt;li&gt;Suboptimal states in superbubbles are pruned using additional logic.&lt;/li&gt;
&lt;/ul&gt;
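&lt;p&gt;As a rough sketch of the gap-cost bound in the list above (the symbols are notation assumed here for illustration, not taken from the paper): write \(r\) for the number of query characters still to be aligned, and \(d_{\min}(v)\) and \(d_{\max}(v)\) for the minimal and maximal remaining path length from node \(v\) to the end of the graph. Assuming these lengths are measured in the same units as \(r\), the number of indels still needed is at least&lt;/p&gt;
&lt;p&gt;\begin{equation*}
\max\big(0,\ d_{\min}(v) - r,\ r - d_{\max}(v)\big).
\end{equation*}&lt;/p&gt;
&lt;p&gt;For example, with \(d_{\min}(v) = 2\) and \(d_{\max}(v) = 3\), a remaining query of length \(r = 5\) needs at least \(2\) indels, while \(r = 3\) falls inside the range and gives a bound of \(0\).&lt;/p&gt;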
&lt;h2 id="background"&gt;
 Background
 &lt;a class="heading-link" href="#background"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Daniel: why is nobody doing exact banded alignment, i.e., simple band
doubling, for exact DP-based alignment? We are still not convinced that A*/WFA
is faster than DP, especially when divergence is not very low (that is, not \(&amp;lt;1\%\)).&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="review-comments"&gt;
 Review comments
 &lt;a class="heading-link" href="#review-comments"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Fig 1 confuses me: (partly Daniel)&lt;/p&gt;</description></item><item><title>A*PA2: Up to 19x faster exact global alignment</title><link>https://curiouscoding.nl/posts/astarpa2/</link><pubDate>Sat, 23 Mar 2024 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/astarpa2/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#abstract" &gt;Abstract&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#introduction" &gt;Introduction&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.1&lt;/span&gt; &lt;a href="#contributions" &gt;Contributions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.2&lt;/span&gt; &lt;a href="#previous-work" &gt;Previous work&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.2.1&lt;/span&gt; &lt;a href="#needleman-wunsch" &gt;Needleman-Wunsch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.2.2&lt;/span&gt; &lt;a href="#graph-algorithms" &gt;Graph algorithms&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.2.3&lt;/span&gt; &lt;a href="#computational-volumes" &gt;Computational volumes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.2.4&lt;/span&gt; &lt;a href="#parallelism" &gt;Parallelism&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.2.5&lt;/span&gt; &lt;a href="#tools" &gt;Tools&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#preliminaries" &gt;Preliminaries&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#methods" &gt;Methods&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.1&lt;/span&gt; &lt;a href="#band-doubling" &gt;Band-doubling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.2&lt;/span&gt; &lt;a href="#blocks" &gt;Blocks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.3&lt;/span&gt; &lt;a href="#memory" &gt;Memory&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.4&lt;/span&gt; &lt;a href="#simd" &gt;SIMD&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.5&lt;/span&gt; &lt;a href="#simd-friendly-sequence-profile" &gt;SIMD-friendly sequence profile&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.6&lt;/span&gt; &lt;a href="#traceback" &gt;Traceback&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.7&lt;/span&gt; &lt;a href="#a" &gt;A*&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.7.1&lt;/span&gt; &lt;a href="#bulk-contours-update" &gt;Bulk-contours update&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.7.2&lt;/span&gt; &lt;a href="#pre-pruning" &gt;Pre-pruning&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.8&lt;/span&gt; &lt;a href="#determining-the-rows-to-compute" &gt;Determining the rows to compute&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.8.1&lt;/span&gt; &lt;a href="#sparse-heuristic-invocation" &gt;Sparse heuristic invocation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.9&lt;/span&gt; &lt;a href="#incremental-doubling" &gt;Incremental doubling&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4&lt;/span&gt; &lt;a href="#results" &gt;Results&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.1&lt;/span&gt; &lt;a href="#setup" &gt;Setup&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.2&lt;/span&gt; &lt;a href="#comparison-with-other-aligners" &gt;Comparison with other aligners&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.3&lt;/span&gt; &lt;a href="#effects-of-methods" &gt;Effects of methods&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;5&lt;/span&gt; &lt;a href="#discussion" &gt;Discussion&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#acknowledgements" &gt;Acknowledgements&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#conflict-of-interest" &gt;Conflict of interest&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6&lt;/span&gt; &lt;a href="#appendix" &gt;Appendix&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6.1&lt;/span&gt; &lt;a href="#bitpacking" &gt;Bitpacking&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6.2&lt;/span&gt; &lt;a href="#app-comparison" &gt;Comparison with other aligners&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6.3&lt;/span&gt; &lt;a href="#app-effects" &gt;Effects of methods&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;\begin{equation*}
\newcommand{\g}{g^*}
\newcommand{\h}{h^*}
\newcommand{\f}{f^*}
\newcommand{\cgap}{c_{\textrm{gap}}}
\newcommand{\xor}{\ \mathrm{xor}\ }
\newcommand{\and}{\ \mathrm{and}\ }
\newcommand{\st}[2]{\langle #1, #2\rangle}
\newcommand{\matches}{\mathcal M}
\end{equation*}&lt;/p&gt;</description></item><item><title>A*PA talk @ CWI</title><link>https://curiouscoding.nl/posts/astarpa-talk-cwi/</link><pubDate>Wed, 27 Dec 2023 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/astarpa-talk-cwi/</guid><description>&lt;p&gt;I recently gave a talk about A*PA at CWI.
Sadly the recording doesn&amp;rsquo;t show the blackboard, but either way, find it &lt;a href="https://ragnargrootkoerkamp.nl/upload/astarpa-talk-cwi.mp4" class="external-link" target="_blank" rel="noopener"&gt;here&lt;/a&gt;.&lt;/p&gt;</description></item><item><title>BitPAl bitpacking algorithm</title><link>https://curiouscoding.nl/posts/bitpal/</link><pubDate>Sun, 03 Sep 2023 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/bitpal/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#problem" &gt;Problem&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#input" &gt;Input&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#example" &gt;Example&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#discussion" &gt;Discussion&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#found-the-bug" &gt;Found the bug&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#outlook" &gt;Outlook&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;The supplement (&lt;a href="https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/bioinformatics/30/22/10.1093_bioinformatics_btu507/3/bioinformatics_30_22_3166_s1.zip?Expires=1695376479&amp;amp;Signature=vroWHrpg-P0tvOPcafVy~gh6mhZ-AZ8kj6lHr1DH7byZGTK2sy8chti7hDiWdbtGx6onKv94EAI5odd~GMBMG0GNXxfp1bZ~7ItGeNCXp0tosJpArez7Yo~PuKT77nJpgQYo5rabbkJ6qtvP3-V-41oznQ~Zh9Tl~GNLvjLo~5vq0D1wa4PMmqhc-C0zcEeh8ybqEK7hQdyvoxreWppOTZFIHIJwmZOSOeXBWM0fQhcPnM9ZU8cEsqAI64WuWt1AJgmDOPDTBVzQHmHpsl01F4Jt8Hf2gvDYwhmoM7t4U~qCIGFr4raran~hzr-eD2vhwexQhpC7e1U2~N2lMC7e7w__&amp;amp;Key-Pair-Id=APKAIE5G5CRDK6RD3PGA" class="external-link" target="_blank" rel="noopener"&gt;download&lt;/a&gt;) of the Loving, Hernandez, and Benson (&lt;a href="#citeproc_bib_item_1"&gt;2014&lt;/a&gt;) paper introduces a \(15\)
operation version of Myers&amp;rsquo; (&lt;a href="#citeproc_bib_item_2"&gt;1999&lt;/a&gt;) bitpacking algorithm, which uses \(16\)
operations when modified for edit distance.&lt;/p&gt;
&lt;p&gt;I tried implementing it, but it seems to have a bug that I will describe below.
The fix is &lt;a href="#found-the-bug" &gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="problem"&gt;
 Problem
 &lt;a class="heading-link" href="#problem"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;To recap, this algorithm solves the unit-cost edit distance problem by using
bitpacking to compute a \(1\times w\) block of the DP matrix at a time. As input, it takes&lt;/p&gt;</description></item><item><title>Shortest paths, bucket queues, and A* on the edit graph</title><link>https://curiouscoding.nl/posts/shortest_path_history/</link><pubDate>Sat, 29 Jul 2023 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/shortest_path_history/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#shortest-path-algorithms-dot-dot" &gt;Shortest path algorithms ..&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#dot-dot-in-general" &gt;.. in general&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#dot-dot-for-circuit-design" &gt;.. for circuit design&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#bucket-queues" &gt;Bucket queues&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#shortest-path-algorithms-by-hadlock" &gt;Shortest path algorithms by Hadlock&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#grid-graphs" &gt;Grid graphs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#strings" &gt;Strings&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#spouge-s-computational-volumes" &gt;Spouge&amp;rsquo;s computational volumes&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;This note summarizes some papers I was reading while investigating the history
of A* for pairwise alignment and, related to that, the first usage of a &lt;em&gt;bucket
queue&lt;/em&gt;. Schrijver (&lt;a href="#citeproc_bib_item_16"&gt;2012&lt;/a&gt;) provides a nice overview of general shortest path methods.&lt;/p&gt;</description></item><item><title>The complexity and performance of WFA and band doubling</title><link>https://curiouscoding.nl/posts/wfa-edlib-perf/</link><pubDate>Thu, 17 Nov 2022 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/wfa-edlib-perf/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#complexity-analysis" &gt;Complexity analysis&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#complexity-of-edit-distance" &gt;Complexity of edit distance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#complexity-of-affine-cost-alignment" &gt;Complexity of affine cost alignment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#comparison" &gt;Comparison&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#implementation-efficiency" &gt;Implementation efficiency&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#band-doubling-for-affine-scores-was-never-implemented" &gt;Band doubling for affine scores was never implemented&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#wfa-vs-band-doubling-for-affine-costs" &gt;WFA vs band doubling for affine costs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#conclusion" &gt;Conclusion&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#future-work" &gt;Future work&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;This note explores the complexity and performance of band doubling (Edlib) and WFA under varying cost models.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/Martinsos/edlib" class="external-link" target="_blank" rel="noopener"&gt;Edlib&lt;/a&gt; (&lt;a href="#citeproc_bib_item_5"&gt;Šošić and Šikić 2017&lt;/a&gt;) uses band doubling and runs in \(O(ns)\) time, for sequence length \(n\)
and edit distance \(s\) between the two sequences.&lt;/p&gt;</description></item><item><title>Local Doubling</title><link>https://curiouscoding.nl/posts/local-doubling/</link><pubDate>Wed, 19 Oct 2022 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/local-doubling/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#notation" &gt;Notation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#needleman-wunsch-where-it-all-begins" &gt;Needleman-Wunsch: where it all begins&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#dijkstra-bfs-visiting-fewer-states" &gt;Dijkstra/BFS: visiting fewer states&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#band-doubling-dijkstra-but-more-efficient" &gt;Band doubling: Dijkstra, but more efficient&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#gapcost-a-first-heuristic" &gt;GapCost: A first heuristic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#computational-volumes-an-even-smaller-search" &gt;Computational volumes: an even smaller search&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#cheating-an-oracle-gave-us-g" &gt;Cheating: an oracle gave us \(g^*\)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-better-heuristics" &gt;A*: Better heuristics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#broken-idea-a-and-computational-volumes" &gt;Broken idea: A* and computational volumes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#local-doubling" &gt;Local doubling&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#without-heuristic" &gt;Without heuristic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#with-heuristic" &gt;With heuristic&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#diagonal-transition" &gt;Diagonal Transition&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-with-diagonal-transition-and-pruning-doing-less-work" &gt;A* with Diagonal Transition and pruning: doing less work&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#goal-diagonal-transition-plus-pruning-plus-local-doubling" &gt;Goal: Diagonal Transition + pruning + local doubling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#pruning-improving-a-heuristics-on-the-go" &gt;Pruning: Improving A* heuristics on the go&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#cheating-more-an-oracle-gave-us-the-optimal-path" &gt;Cheating more: an oracle gave us the optimal path&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#todo-aspriation-windows" &gt;TODO: aspiration windows&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;\begin{equation*}
\newcommand{\st}[2]{\langle #1,#2\rangle}
\newcommand{\g}{g^*}
\newcommand{\fm}{f_{max}}
\newcommand{\gap}{\operatorname{Gap}}
\end{equation*}&lt;/p&gt;</description></item><item><title>Competitive Programming Lecture</title><link>https://curiouscoding.nl/posts/competitive-programming-lecture/</link><pubDate>Wed, 28 Sep 2022 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/competitive-programming-lecture/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#contest-strategies" &gt;Contest strategies&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#pairwise-alignment-using-a" &gt;Pairwise Alignment using A*&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#exercises" &gt;Exercises&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;h2 id="contest-strategies"&gt;
 Contest strategies
 &lt;a class="heading-link" href="#contest-strategies"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;Preparation&lt;/dt&gt;
&lt;dd&gt;&lt;ul&gt;
&lt;li&gt;Thinking costs energy!&lt;/li&gt;
&lt;li&gt;Sleep enough; early to bed the 2 nights before.&lt;/li&gt;
&lt;li&gt;No practising on contest day (and the day before); it just takes energy.&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;dt&gt;During the contest&lt;/dt&gt;
&lt;dd&gt;&lt;ul&gt;
&lt;li&gt;Eat! At the very least take a break halfway with the entire team and eat some snacks.&lt;/li&gt;
&lt;li&gt;Make sure to read &lt;strong&gt;all&lt;/strong&gt; the problems before the end of the contest. In the
beginning, split the problems to find the simple ones, but towards the end,
find a problem you think you can solve (because of the scoreboard or because
you like it), and work on it as a team.&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;dt&gt;Coding&lt;/dt&gt;
&lt;dd&gt;&lt;ul&gt;
&lt;li&gt;Ideally, use C++. Otherwise, Python can be used too.
&lt;ul&gt;
&lt;li&gt;For big-integer problems, prefer Python.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Use a TCR (e.g. &lt;a href="https://github.com/TimonKnigge/TCR" class="external-link" target="_blank" rel="noopener"&gt;https://github.com/TimonKnigge/TCR&lt;/a&gt;): a 25-page document
containing algorithms. Ideally, implement all of them yourself so you know
how they work. Otherwise download one.&lt;/li&gt;
&lt;li&gt;Make a template, and add it to your TCR. One person should type this in the
first minutes of the contest and copy it to &lt;code&gt;A.cpp&lt;/code&gt;, &lt;code&gt;B.cpp&lt;/code&gt;, &amp;hellip; .&lt;/li&gt;
&lt;li&gt;When you think you have solved a problem:
&lt;ul&gt;
&lt;li&gt;Decide &lt;em&gt;exactly&lt;/em&gt; how the code will look. Maybe write pseudocode on paper.&lt;/li&gt;
&lt;li&gt;For hard problems: verify your solution with a teammate.&lt;/li&gt;
&lt;li&gt;Once the keyboard is free, start typing it out. If needed, ask one
teammate to look while you code.&lt;/li&gt;
&lt;li&gt;Typical distribution:
&lt;ul&gt;
&lt;li&gt;1 person typing&lt;/li&gt;
&lt;li&gt;1 person solving a new problem&lt;/li&gt;
&lt;li&gt;1 person helping the other 2: spotting typos or working on problems.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;h2 id="pairwise-alignment-using-a"&gt;
 Pairwise Alignment using A*
 &lt;a class="heading-link" href="#pairwise-alignment-using-a"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;Some resources you can use:&lt;/p&gt;</description></item><item><title>Speeding up A*: computational volumes and path-pruning</title><link>https://curiouscoding.nl/posts/speeding-up-astar/</link><pubDate>Fri, 23 Sep 2022 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/speeding-up-astar/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#motivation" &gt;Motivation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#summary" &gt;Summary&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#why-is-a-slow" &gt;Why is A* slow?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#computational-volumes" &gt;Computational volumes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#dealing-with-pruning" &gt;Dealing with pruning&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#thoughts-on-more-aggressive-pruning" &gt;Thoughts on more aggressive pruning&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#algorithm-summary" &gt;Algorithm summary&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#challenges" &gt;Challenges&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#results" &gt;Results&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#what-about-band-doubling" &gt;What about band-doubling?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#maybe-doubling-can-work-after-all" &gt;Maybe doubling can work after all?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#todos" &gt;TODOs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#extensions" &gt;Extensions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;This post builds on our recent preprint Groot Koerkamp and Ivanov (&lt;a href="#citeproc_bib_item_1"&gt;2024&lt;/a&gt;) and gives an
overview of some of my new ideas to significantly speed up exact global pairwise
alignment. It&amp;rsquo;s recommended you understand the &lt;em&gt;seed heuristic&lt;/em&gt; and &lt;em&gt;match
pruning&lt;/em&gt; before reading this post.&lt;/p&gt;</description></item><item><title>Linear memory WFA?</title><link>https://curiouscoding.nl/posts/linear-memory-wfa/</link><pubDate>Wed, 17 Aug 2022 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/linear-memory-wfa/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#motivation" &gt;Motivation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#path-traceback-two-strategies" &gt;Path traceback: two strategies&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#observations" &gt;Observations&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#what-information-is-needed-for-path-tracing" &gt;What information is needed for path tracing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-pragmatic-solution" &gt;A pragmatic solution&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#another-interpretation" &gt;Another interpretation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#affine-costs" &gt;Affine costs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#conclusion" &gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;&lt;a id="figure--result"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;figure&gt;&lt;a href="https://curiouscoding.nl/ox-hugo/simple-final.png"&gt;&lt;img src="https://curiouscoding.nl/ox-hugo/simple-final.png"
 alt="Figure 1: Only the red substitutions and blue indel need to be stored to trace the entire path."&gt;&lt;/a&gt;&lt;figcaption&gt;
 &lt;p&gt;&lt;span class="figure-number"&gt;Figure 1: &lt;/span&gt;Only the red substitutions and blue indel need to be stored to trace the entire path.&lt;/p&gt;</description></item><item><title>Transforming match bonus into cost</title><link>https://curiouscoding.nl/posts/alignment-scores-transform/</link><pubDate>Tue, 16 Aug 2022 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/alignment-scores-transform/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#tricks-with-match-bonus-or-how-to-fool-dijkstra-s-limitations" &gt;Tricks with match bonus or how to fool Dijkstra&amp;rsquo;s limitations&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#edit-graph" &gt;Edit graph&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#algorithms" &gt;Algorithms&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#potentials" &gt;Potentials&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#multiple-variants" &gt;Multiple variants&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#some-notes-on-algorithms" &gt;Some notes on algorithms&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#wfa" &gt;WFA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#a" &gt;A*&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#extending-to-different-cost-models" &gt;Extending to different cost models&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#affine-costs" &gt;Affine costs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#substitution-matrices" &gt;Substitution matrices&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#but-not-local-alignment" &gt;But not local alignment&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#evaluations" &gt;Evaluations&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#unequal-string-length" &gt;Unequal string length&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#equal-string-lengths" &gt;Equal string lengths&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#conclusion" &gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;h2 id="tricks-with-match-bonus-or-how-to-fool-dijkstra-s-limitations"&gt;
 Tricks with match bonus or how to fool Dijkstra&amp;rsquo;s limitations
 &lt;a class="heading-link" href="#tricks-with-match-bonus-or-how-to-fool-dijkstra-s-limitations"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;The reader is assumed to have basic knowledge about pairwise alignment and graph theory.&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Diamond optimisation for diagonal transition</title><link>https://curiouscoding.nl/posts/diamond-optimization/</link><pubDate>Mon, 01 Aug 2022 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/diamond-optimization/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#diamond-transition-or-how-technicalities-can-break-concepts" &gt;Diamond transition or how technicalities can break concepts&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#but-let-s-take-a-closer-look" &gt;But let’s take a closer look&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#conclusion" &gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;h2 id="diamond-transition-or-how-technicalities-can-break-concepts"&gt;
 Diamond transition or how technicalities can break concepts
 &lt;a class="heading-link" href="#diamond-transition-or-how-technicalities-can-break-concepts"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;We assume the reader has some basic knowledge about pairwise alignment
and in particular the WFA algorithm.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In this post we dive into a potential 2x speedup of WFA &amp;mdash; one that turns out not to work.&lt;/p&gt;</description></item><item><title>The BiWFA meeting condition</title><link>https://curiouscoding.nl/posts/biwfa-meeting-condition/</link><pubDate>Mon, 11 Jul 2022 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/biwfa-meeting-condition/</guid><description>&lt;p&gt;&lt;strong&gt;cross references:&lt;/strong&gt; &lt;a href="https://github.com/smarco/BiWFA-paper/issues/8" class="external-link" target="_blank" rel="noopener"&gt;BiWFA GitHub issue&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;It seems that getting the meeting/overlap condition of BiWFA
(Marco-Sola et al. (&lt;a href="#citeproc_bib_item_1"&gt;2023&lt;/a&gt;), Algorithm 1 and Lemma 2.1) correct is tricky.&lt;/p&gt;
&lt;p&gt;Let \(p := \max(x, o+e)\) be the maximal cost of any edge in the edit graph.
As in the BiWFA paper, let \(s_f\) and \(s_r\) be the distances of the &lt;em&gt;forward&lt;/em&gt; and
&lt;em&gt;reverse&lt;/em&gt; fronts computed so far.&lt;/p&gt;
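&lt;p&gt;As a quick illustration, assume the penalties \(x=4\), \(o=6\), and \(e=2\) (values chosen here only as an example; they are not taken from the text above). Then&lt;/p&gt;
&lt;p&gt;\[ p = \max(x, o+e) = \max(4, 6+2) = 8, \]&lt;/p&gt;
&lt;p&gt;so under these assumed penalties the term \(p+o\) that appears in the lemma below equals \(8+6=14\).&lt;/p&gt;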
&lt;p&gt;We prove the following lemma:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Lemma&lt;/strong&gt;
Once BiWFA has expanded the forward and reverse fronts up to \(s_f\) and \(s_r\) and
has found &lt;em&gt;some&lt;/em&gt; path of cost \(s \leq s_f + s_r\),
expanding the fronts until \(s'_f + s'_r \geq s+p+o\) is guaranteed to find a
&lt;em&gt;shortest&lt;/em&gt; path.&lt;/p&gt;</description></item><item><title>Proof sketch for linear time seed heuristic alignment</title><link>https://curiouscoding.nl/posts/linear-time-pa/</link><pubDate>Sun, 24 Apr 2022 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/linear-time-pa/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#pairwise-alignment-in-subquadratic-time" &gt;Pairwise alignment in subquadratic time&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#random-model" &gt;Random model&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#algorithm" &gt;Algorithm&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#seed-heuristic" &gt;Seed heuristic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#match-pruning" &gt;Match pruning&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#analysis" &gt;Analysis&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#expanded-states" &gt;Expanded states&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#excess-errors" &gt;Excess errors&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#algorithmic-complexity" &gt;Algorithmic complexity&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;This post is a proof sketch to show that A* with the &lt;em&gt;seed heuristic&lt;/em&gt;
(&lt;a href="#citeproc_bib_item_3"&gt;Groot Koerkamp and Ivanov 2024&lt;/a&gt;) does exact pairwise alignment of random strings with random
mutations in near-linear time.&lt;/p&gt;
&lt;h2 id="pairwise-alignment-in-subquadratic-time"&gt;
 Pairwise alignment in subquadratic time
 &lt;a class="heading-link" href="#pairwise-alignment-in-subquadratic-time"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;Backurs and Indyk (&lt;a href="#citeproc_bib_item_1"&gt;2018&lt;/a&gt;) show that computing edit distance cannot be
done in strongly subquadratic time (i.e. \(O(n^{2-\delta})\) for any \(\delta &amp;gt;0\))
assuming the Strong Exponential Time Hypothesis.&lt;/p&gt;</description></item><item><title>Variations on the WFA recursion</title><link>https://curiouscoding.nl/posts/wfa-variations/</link><pubDate>Sun, 17 Apr 2022 03:14:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/wfa-variations/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#gap-open" &gt;Gap open&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#gap-close" &gt;Gap close&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#symmetric-alternatives" &gt;Symmetric alternatives&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#another-symmetry" &gt;Another symmetry&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#conclusions" &gt;Conclusions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;&lt;strong&gt;cross references:&lt;/strong&gt; &lt;a href="https://github.com/smarco/BiWFA-paper/issues/4" class="external-link" target="_blank" rel="noopener"&gt;BiWFA GitHub issue&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In this post I will explore some variations of the recursion used by WFA/BiWFA
for the affine version of the diagonal transition algorithm.
In particular, we will go over a &lt;em&gt;gap-close&lt;/em&gt; variant, and look into some more symmetric
formulations.&lt;/p&gt;
&lt;h2 id="gap-open"&gt;
 Gap open
 &lt;a class="heading-link" href="#gap-open"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;WFA (&lt;a href="#citeproc_bib_item_3"&gt;Marco-Sola et al. 2021&lt;/a&gt;) introduces the affine cost variant of the classic diagonal
transition method.
Let us call it a &lt;strong&gt;gap-open&lt;/strong&gt; variant, because the gap-open cost \(o\) is paid when
opening the gap, that is, when jumping from the \(M\) &lt;em&gt;layer&lt;/em&gt; to the \(I\) or \(D\) &lt;em&gt;layer&lt;/em&gt;.&lt;/p&gt;</description></item><item><title>A survey of exact global pairwise alignment</title><link>https://curiouscoding.nl/posts/pairwise-alignment-history/</link><pubDate>Fri, 01 Apr 2022 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/pairwise-alignment-history/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#variants-of-pairwise-alignment" &gt;Variants of pairwise alignment&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#cost-models" &gt;Cost models&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#alignment-types" &gt;Alignment types&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-chronological-overview-of-global-pairwise-alignment" &gt;A chronological overview of global pairwise alignment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#algorithms-in-detail" &gt;Algorithms in detail&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#classic-dp-algorithms" &gt;Classic DP algorithms&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#cubic-dp" &gt;Cubic algorithm of Needleman and Wunsch (&lt;a href="#citeproc_bib_item_25"&gt;1970&lt;/a&gt;)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#quadratic-dp" &gt;A quadratic DP&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#local-alignment" &gt;Local alignment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#affine-costs" &gt;Affine costs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#minimizing-vs-dot-maximizing-duality" &gt;Minimizing vs. maximizing duality&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#four-russians" &gt;Four Russians method&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#o--ns--methods" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; \(O(ns)\) methods&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#exponential-band" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Exponential search on band&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#thresholds" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; LCS: thresholds, $k$-candidates and contours&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#diagonal-transition" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Diagonal transition: furthest reaching and wavefronts&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#ns2" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Suffixtree for \(O(n+s^2)\) expected runtime&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#using-less-memory" &gt;Using less memory&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#computing-the-score-in-linear-space" &gt;Computing the score in linear space&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#divide-and-conquer" &gt;Divide-and-conquer&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#lcsk-plus-plus-algorithms" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; LCSk[++] algorithms&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#theoretical-lower-bound" &gt;Theoretical lower bound&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-note-on-dp--toposort--vs-dijkstra-vs-a" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; A note on DP (toposort) vs Dijkstra vs A*&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#tools" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Tools&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#notes-for-other-posts" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Notes for other posts&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#semi-global-alignment-papers" &gt;Semi-global alignment papers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#approximate-pairwise-aligners" &gt;Approximate pairwise aligners&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#old-vs-new-papers" &gt;Old vs new papers&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;This post explains the many variants of pairwise alignment, and covers papers
defining and exploring the topic.&lt;/p&gt;</description></item><item><title>AStarix</title><link>https://curiouscoding.nl/posts/astarix/</link><pubDate>Fri, 12 Nov 2021 13:05:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/astarix/</guid><description>&lt;p&gt;&lt;strong&gt;Papers&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.biorxiv.org/content/10.1101/2020.01.22.915496v2.full" class="external-link" target="_blank" rel="noopener"&gt;AStarix: Fast and Optimal Sequence-to-Graph Alignment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.biorxiv.org/content/10.1101/2021.11.05.467453v1" class="external-link" target="_blank" rel="noopener"&gt;Fast and Optimal Sequence-to-Graph Alignment Guided by Seeds&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;AStarix is a method for aligning sequences (reads) to graphs:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;Input&lt;/dt&gt;
&lt;dd&gt;&lt;ul&gt;
&lt;li&gt;A reference sequence or graph&lt;/li&gt;
&lt;li&gt;Alignment costs \((\Delta_{match}, \Delta_{subst}, \Delta_{del}, \Delta_{ins})\) for a match, substitution, deletion, and insertion&lt;/li&gt;
&lt;li&gt;Sequence(s) to align&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;dt&gt;Output&lt;/dt&gt;
&lt;dd&gt;An optimal alignment of each input sequence&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;The input is a reference graph (automaton really) \(G_r = (V_r, E_r)\) with edges \(E_r \subseteq
V_r\times V_r\times \Sigma\) that indicate the transitions between states.&lt;/p&gt;</description></item></channel></rss>