<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Note on CuriousCoding</title><link>https://curiouscoding.nl/tags/note/</link><description>Recent content in Note on CuriousCoding</description><generator>Hugo</generator><language>en</language><lastBuildDate>Tue, 17 Feb 2026 00:00:00 +0100</lastBuildDate><atom:link href="https://curiouscoding.nl/tags/note/index.xml" rel="self" type="application/rss+xml"/><item><title>HPRC v2 stats</title><link>https://curiouscoding.nl/posts/hprc-v2-stats/</link><pubDate>Fri, 26 Dec 2025 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/hprc-v2-stats/</guid><description>&lt;p&gt;The Movi 2 paper (&lt;a href="#citeproc_bib_item_3"&gt;Zakeri et al. 2025&lt;/a&gt;) builds a fast index on HPRCv2 (&lt;a href="#citeproc_bib_item_1"&gt;Human Pangenome Reference Consortium 2025&lt;/a&gt;), a collection of
466 human genomes. This post
collects some statistics on the number of BWT runs (\(r\)) of this dataset, and
uses them to estimate the number of unique mutations.&lt;/p&gt;
&lt;div class="ox-hugo-table small"&gt;
&lt;div class="table-caption"&gt;
 &lt;span class="table-number"&gt;Table 1:&lt;/span&gt;
 Number of BWT runs and average run length for a random 3.2Gbp string, a human genome, and HPRCv1 and v2. The average run-length in HPRC is taken from Zakeri et al. (&lt;a href="#citeproc_bib_item_3"&gt;2025&lt;/a&gt;).
&lt;/div&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;dataset&lt;/th&gt;
 &lt;th&gt;copies&lt;/th&gt;
 &lt;th&gt;rc?&lt;/th&gt;
 &lt;th&gt;length (Gbp)&lt;/th&gt;
 &lt;th&gt;avg run-len&lt;/th&gt;
 &lt;th&gt;runs (G)&lt;/th&gt;
 &lt;th&gt;est total mut&amp;rsquo;s&lt;/th&gt;
 &lt;th&gt;unique mut rate&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;random&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;no&lt;/td&gt;
 &lt;td&gt;3.2&lt;/td&gt;
 &lt;td&gt;1.33&lt;/td&gt;
 &lt;td&gt;2.40&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;CHM13v2.0&lt;/td&gt;
 &lt;td&gt;1&lt;/td&gt;
 &lt;td&gt;no&lt;/td&gt;
 &lt;td&gt;3.2&lt;/td&gt;
 &lt;td&gt;1.85&lt;/td&gt;
 &lt;td&gt;1.72&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;HPRCv1&lt;/td&gt;
 &lt;td&gt;94&lt;/td&gt;
 &lt;td&gt;yes&lt;/td&gt;
 &lt;td&gt;2x 301&lt;/td&gt;
 &lt;td&gt;134&lt;/td&gt;
 &lt;td&gt;2.25&lt;/td&gt;
 &lt;td&gt;33M&lt;/td&gt;
 &lt;td&gt;1/8900&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;HPRCv2&lt;/td&gt;
 &lt;td&gt;466&lt;/td&gt;
 &lt;td&gt;yes&lt;/td&gt;
 &lt;td&gt;2x 1500&lt;/td&gt;
 &lt;td&gt;535&lt;/td&gt;
 &lt;td&gt;2.80&lt;/td&gt;
 &lt;td&gt;68M&lt;/td&gt;
 &lt;td&gt;1/22000&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p&gt;TODO: Update for the fact that HPRC run-lengths are for the version &lt;span class="underline"&gt;with&lt;/span&gt; rc!
(Include single human genome with rc)&lt;/p&gt;</description></item><item><title>Path Pruning Revisited</title><link>https://curiouscoding.nl/posts/path-pruning/</link><pubDate>Mon, 31 Mar 2025 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/path-pruning/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#early-idea-bottom-up-match-merging--aka-bummer" &gt;Early idea: Bottom-up match-merging (aka BUMMer?)&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.1&lt;/span&gt; &lt;a href="#some-previous-ideas" &gt;Some previous ideas&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.2&lt;/span&gt; &lt;a href="#divide-and-conquer" &gt;Divide &amp;amp; conquer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.3&lt;/span&gt; &lt;a href="#bottom-up-match-merging--bummer" &gt;Bottom-up match merging (BUMMer)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;h2 id="early-idea-bottom-up-match-merging--aka-bummer"&gt;
 &lt;span class="section-num"&gt;1&lt;/span&gt; Early idea: Bottom-up match-merging (aka BUMMer?)
 &lt;a class="heading-link" href="#early-idea-bottom-up-match-merging--aka-bummer"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;TODO: Move to separate post.&lt;/p&gt;
&lt;p&gt;One thing that becomes clear with mapping is that we don&amp;rsquo;t quite
know where exactly to start the semi-global alignments.
This can be fixed by adding some buffer/padding, but this remains slightly ugly
and iffy.&lt;/p&gt;</description></item><item><title>Types of tigs</title><link>https://curiouscoding.nl/posts/tigs/</link><pubDate>Sun, 09 Mar 2025 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/tigs/</guid><description>&lt;h3 id="de-bruijn-graph"&gt;
 &lt;span class="section-num"&gt;1&lt;/span&gt; De Bruijn graph
 &lt;a class="heading-link" href="#de-bruijn-graph"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h3&gt;
&lt;p&gt;Consider an edge-centric De Bruijn graph, where each edge corresponds to a
k-mer, and nodes are the \(k-1\) overlaps between adjacent k-mers. In the figures,
all edges are directed towards the right.&lt;/p&gt;
&lt;figure class="inset medium"&gt;&lt;img src="https://curiouscoding.nl/ox-hugo/graph.svg"&gt;
&lt;/figure&gt;

&lt;h3 id="k-mers"&gt;
 &lt;span class="section-num"&gt;2&lt;/span&gt; k-mers
 &lt;a class="heading-link" href="#k-mers"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h3&gt;
&lt;p&gt;The goal is now to store all edges / k-mers of the graph efficiently.
A &lt;em&gt;spectrum preserving string set&lt;/em&gt; (SPSS) is a set of strings whose k-mers are
the k-mers of the input graph, that does not contain duplicate k-mers (&lt;a href="#citeproc_bib_item_2"&gt;Rahman and Medvedev 2020&lt;/a&gt;).&lt;/p&gt;</description></item><item><title>Minimizer papers</title><link>https://curiouscoding.nl/posts/minimizer-papers/</link><pubDate>Mon, 17 Feb 2025 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/minimizer-papers/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#overview" &gt;Overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#introduction" &gt;Introduction&lt;/a&gt;
- &lt;a href="#previous-reviews" &gt;Previous reviews&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#theory-of-sampling-schemes" &gt;Theory of sampling schemes&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.1&lt;/span&gt; &lt;a href="#questions" &gt;Questions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.2&lt;/span&gt; &lt;a href="#types-of-schemes" &gt;Types of schemes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.3&lt;/span&gt; &lt;a href="#parameter-regimes" &gt;Parameter regimes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.4&lt;/span&gt; &lt;a href="#different-perspectives" &gt;Different perspectives&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.5&lt;/span&gt; &lt;a href="#uhs-vs-minimizer-scheme" &gt;UHS vs minimizer scheme&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.6&lt;/span&gt; &lt;a href="#asymptotic--bounds" &gt;(Asymptotic) bounds&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.7&lt;/span&gt; &lt;a href="#lower-bounds" &gt;Lower bounds&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4&lt;/span&gt; &lt;a href="#minimizer-schemes" &gt;Minimizer schemes&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.1&lt;/span&gt; &lt;a href="#orders" &gt;Orders&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.2&lt;/span&gt; &lt;a href="#uhs-based-and-search-based-schemes" &gt;UHS-based and search-based schemes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.3&lt;/span&gt; &lt;a href="#pure-schemes" &gt;Pure schemes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.4&lt;/span&gt; &lt;a href="#other-variants" &gt;Other variants&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#selection-schemes" &gt;Selection schemes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#canonical-minimizers" &gt;Canonical minimizers&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.5&lt;/span&gt; &lt;a href="#non-overlapping-string-sets" &gt;Non-overlapping string sets&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;This post is simply a list of brief comments on many papers related to
minimizers, and forms the basis of &lt;a href="https://curiouscoding.nl/posts/minimizers/" &gt;/posts/minimizers/&lt;/a&gt;.&lt;/p&gt;</description></item><item><title>Code snippets for Latex, Rust, and Python</title><link>https://curiouscoding.nl/posts/snippets/</link><pubDate>Wed, 15 Jan 2025 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/snippets/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#latex" &gt;Latex&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#code-highlighting-minted" &gt;Code highlighting: &lt;code&gt;minted&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#wide-figure" &gt;Wide figure&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#fit-more-text-on-a-page" &gt;Fit more text on a page&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#rust" &gt;Rust&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#cargo-dot-toml-workspace" &gt;&lt;code&gt;Cargo.toml&lt;/code&gt; workspace&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#release-profile" &gt;Release profile&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#list-exported-functions" &gt;List exported functions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#read-human-genome-using-needletail" &gt;Read human genome using &lt;code&gt;needletail&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#prevent-auto-vectorization" &gt;Prevent auto-vectorization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#sorting" &gt;Sorting&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#python" &gt;Python&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#pretty-plots" &gt;Pretty plots&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#json-to-pivot-table-to-org-table" &gt;Json to pivot table to org table&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#ipe" &gt;IPE&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#make-all-text-textsf" &gt;Make all text &lt;code&gt;\textsf&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#export-all-named-views" &gt;Export all named views&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;These are some common libraries and code snippets for various tasks.&lt;/p&gt;</description></item><item><title>Tools for suffix array searching</title><link>https://curiouscoding.nl/posts/suffix-array-searching/</link><pubDate>Fri, 14 Jun 2024 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/suffix-array-searching/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#sapling" &gt;Sapling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#pla-index" &gt;PLA-Index&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#lisa-learned-index" &gt;LISA: learned index&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;Let&amp;rsquo;s summarize some tools for efficiently searching suffix arrays.&lt;/p&gt;
&lt;h2 id="sapling"&gt;
 &lt;span class="section-num"&gt;1&lt;/span&gt; Sapling
 &lt;a class="heading-link" href="#sapling"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;Sapling (&lt;a href="#citeproc_bib_item_2"&gt;Kirsche, Das, and Schatz 2020&lt;/a&gt;) works as follows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Choose a parameter \(p\) and store, for each of the \(2^p\) &lt;strong&gt;$p$-bit prefixes&lt;/strong&gt;, the
corresponding position in the suffix array.&lt;/li&gt;
&lt;li&gt;When querying, first find the bucket for the query prefix. Then do a &lt;strong&gt;linear
interpolation&lt;/strong&gt; inside the bucket.&lt;/li&gt;
&lt;li&gt;Search the area \([-E, +E]\) around the interpolated position, where \(E\) is a
bound on the error of the linear approximation. In practice \(E\) is only a
$95\%$-confidence bound, and if the true value is not in the range, a linear
search with steps of size \(E\) is done.&lt;/li&gt;
&lt;/ol&gt;
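&lt;p&gt;The three steps above can be sketched in Python as follows. This is a toy illustration under loud assumptions, not Sapling's actual code: a plain sorted list of distinct integer keys stands in for the suffix array, the names &lt;code&gt;build&lt;/code&gt;, &lt;code&gt;predict&lt;/code&gt;, and &lt;code&gt;lookup&lt;/code&gt; are made up, and \(E\) is taken as the exact maximum error rather than a confidence bound, so the fallback search outside the window is never needed here.&lt;/p&gt;

```python
def predict(index, q):
    """Step 2: find the bucket of q, then interpolate linearly inside it."""
    width = index["width"]
    b = q // width                      # top p bits of q select the bucket
    lo, hi = index["starts"][b], index["starts"][b + 1]
    return lo + (q - b * width) * (hi - lo) // width


def build(keys, bits, p):
    """Step 1: for each p-bit prefix, store where its bucket starts.

    `keys` is a non-empty sorted list of distinct integers that fit in
    `bits` bits (a hypothetical stand-in for encoded suffixes).
    """
    width = 2 ** (bits - p)             # number of key values per bucket
    starts = [0] * (2 ** p + 1)
    for key in keys:
        starts[key // width + 1] += 1
    for b in range(2 ** p):             # prefix sums give bucket start positions
        starts[b + 1] += starts[b]
    index = {"keys": keys, "width": width, "starts": starts}
    # Assumption: E is the exact worst-case prediction error over all keys.
    index["E"] = max(abs(predict(index, key) - i) for i, key in enumerate(keys))
    return index


def lookup(index, q):
    """Step 3: scan the window [guess - E, guess + E] around the prediction."""
    keys, E = index["keys"], index["E"]
    guess = predict(index, q)
    for i in range(max(0, guess - E), min(len(keys), guess + E + 1)):
        if keys[i] == q:
            return i
    return None                         # q is not among the keys


keys = [3, 7, 8, 20, 21, 22, 40, 63]
index = build(keys, bits=6, p=2)
positions = [lookup(index, key) for key in keys]
```

&lt;p&gt;With an exact \(E\), every stored key is guaranteed to lie inside the scanned window; a probabilistic bound trades that guarantee for a smaller window.&lt;/p&gt;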
&lt;p&gt;The paper also introduces a neural network approach to approximating buckets,
but this takes over a day to learn and is slower to query in practice.&lt;/p&gt;</description></item><item><title>Crates for suffix array construction</title><link>https://curiouscoding.nl/posts/suffix-array-crates/</link><pubDate>Thu, 13 Jun 2024 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/suffix-array-crates/</guid><description>&lt;p&gt;Popular C libraries are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/y-256/libdivsufsort" class="external-link" target="_blank" rel="noopener"&gt;divsufsort&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/IlyaGrebnov/libsais" class="external-link" target="_blank" rel="noopener"&gt;libsais&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Both have a &lt;code&gt;..64&lt;/code&gt; variant that supports input strings longer than &lt;code&gt;2GB&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Rust wrappers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://crates.io/crates/divsufsort" class="external-link" target="_blank" rel="noopener"&gt;divsufsort&lt;/a&gt;: Rust reimplementation, does not support large inputs.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://crates.io/crates/cdivsufsort" class="external-link" target="_blank" rel="noopener"&gt;cdivsufsort&lt;/a&gt;: C wrapper, does not support large inputs.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://crates.io/crates/libdivsufsort-rs" class="external-link" target="_blank" rel="noopener"&gt;libdivsufsort-rs&lt;/a&gt;: C wrapper, &lt;strong&gt;does&lt;/strong&gt; support large inputs.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://crates.io/crates/sais" class="external-link" target="_blank" rel="noopener"&gt;sais&lt;/a&gt;: unrelated to the original library; does not implement a linear time
algorithm anyway&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Daniel-Liu-c0deb0t/libsais-rs" class="external-link" target="_blank" rel="noopener"&gt;libsais-rs&lt;/a&gt;: Daniel Liu&amp;rsquo;s fork-of-fork of &lt;a href="https://github.com/hucsmn/libsais-rs" class="external-link" target="_blank" rel="noopener"&gt;the original&lt;/a&gt;, but not on crates.io. Supports multithreading
using OpenMP and wraps both the original and 64bit version.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Daniel-Liu-c0deb0t/simple-saca" class="external-link" target="_blank" rel="noopener"&gt;simple-saca&lt;/a&gt;: Daniel Liu&amp;rsquo;s bounded-context suffix array construction that is
faster than divsufsort and libsais, but does not return a true fully sorted
suffix array.&lt;/li&gt;
&lt;/ul&gt;
</description></item><item><title>Notes on SsHash</title><link>https://curiouscoding.nl/posts/sshash/</link><pubDate>Mon, 15 Jan 2024 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/sshash/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#paper-summary" &gt;Paper summary&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#intro" &gt;Intro&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#prelims" &gt;Prelims&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#related-work" &gt;Related work&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#sparse-and-skew-hashing" &gt;Sparse and skew hashing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#remarks" &gt;Remarks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#ideas" &gt;Ideas&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;\[\newcommand{\S}{\mathcal{S}}\]&lt;/p&gt;
&lt;h2 id="paper-summary"&gt;
 Paper summary
 &lt;a class="heading-link" href="#paper-summary"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;h3 id="intro"&gt;
 Intro
 &lt;a class="heading-link" href="#intro"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h3&gt;
&lt;p&gt;SsHash (&lt;a href="#citeproc_bib_item_7"&gt;Pibiri 2022&lt;/a&gt;) is a data structure for indexing kmers.
Given a set of kmers \(\S\), it supports two operations:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;\(Lookup(g)\)&lt;/dt&gt;
&lt;dd&gt;return the unique id \(i\in [|\S|]\) of the kmer \(g\).&lt;/dd&gt;
&lt;dt&gt;\(Access(i)\)&lt;/dt&gt;
&lt;dd&gt;return the kmer corresponding to id \(i\).&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;It also supports &lt;em&gt;streaming&lt;/em&gt; queries, looking up all kmers from a longer string
consecutively, by exploiting the overlap between them.&lt;/p&gt;</description></item><item><title>Notes on writing course</title><link>https://curiouscoding.nl/posts/writing-course/</link><pubDate>Tue, 14 Nov 2023 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/writing-course/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#lecture-1-14-november" &gt;Lecture 1, 14 November&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#resources" &gt;Resources&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#reader-friendlyness" &gt;Reader friendliness&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#typical-problems" &gt;Typical problems&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#lecture-2-21-november" &gt;Lecture 2, 21 November&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#paragraph-level-expectations" &gt;Paragraph level expectations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#flow" &gt;Flow&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#assignment-for-next-week" &gt;Assignment for next week&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#lecture-3-28-november" &gt;Lecture 3, 28 November&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#bad-organization" &gt;Bad organization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#figures" &gt;Figures&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#references-to-figures" &gt;References to figures&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#indicative-vs-informative--ex-dot-7" &gt;Indicative vs Informative (ex. 7)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#lecture-4-december-5" &gt;Lecture 4, December 5&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#introduction" &gt;Introduction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#conclusion" &gt;Conclusion&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#tense" &gt;Tense&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#lecture-5-december-12" &gt;Lecture 5, December 12&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#abstracts" &gt;Abstracts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#titles" &gt;Titles&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#punctuation" &gt;Punctuation&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#comma" &gt;Comma&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#dashes" &gt;Dashes&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;Some notes from the writing course I&amp;rsquo;m taking.&lt;/p&gt;</description></item><item><title>BBHash: some ideas</title><link>https://curiouscoding.nl/posts/bbhash/</link><pubDate>Mon, 04 Sep 2023 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/bbhash/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#possible-speedup" &gt;Possible speedup?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;BBHash (&lt;a href="#citeproc_bib_item_1"&gt;Limasset et al. 2017&lt;/a&gt;) uses multiple &lt;em&gt;layers&lt;/em&gt; to create a minimal perfect
hash function (MPHF) that hashes some input set into \([n]\).&lt;/p&gt;
&lt;p&gt;(See also my &lt;a href="https://curiouscoding.nl/posts/ptrhash/" &gt;note on PTHash&lt;/a&gt; (&lt;a href="#citeproc_bib_item_2"&gt;Pibiri and Trani 2021&lt;/a&gt;).)&lt;/p&gt;
&lt;p&gt;Simply put, it maps the \(n\) elements into \([\gamma \cdot n]\) using a hash function \(h_0\).
The \(k_0\) elements that have collisions are mapped into \([\gamma \cdot k_0]\)
using \(h_1\).
Then, the \(k_1\) elements with collisions are mapped into \([\gamma \cdot k_1]\),
and so on.&lt;/p&gt;</description></item><item><title>BitPAl bitpacking algorithm</title><link>https://curiouscoding.nl/posts/bitpal/</link><pubDate>Sun, 03 Sep 2023 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/bitpal/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#problem" &gt;Problem&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#input" &gt;Input&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#example" &gt;Example&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#discussion" &gt;Discussion&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#found-the-bug" &gt;Found the bug&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#outlook" &gt;Outlook&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;The supplement (&lt;a href="https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/bioinformatics/30/22/10.1093_bioinformatics_btu507/3/bioinformatics_30_22_3166_s1.zip?Expires=1695376479&amp;amp;Signature=vroWHrpg-P0tvOPcafVy~gh6mhZ-AZ8kj6lHr1DH7byZGTK2sy8chti7hDiWdbtGx6onKv94EAI5odd~GMBMG0GNXxfp1bZ~7ItGeNCXp0tosJpArez7Yo~PuKT77nJpgQYo5rabbkJ6qtvP3-V-41oznQ~Zh9Tl~GNLvjLo~5vq0D1wa4PMmqhc-C0zcEeh8ybqEK7hQdyvoxreWppOTZFIHIJwmZOSOeXBWM0fQhcPnM9ZU8cEsqAI64WuWt1AJgmDOPDTBVzQHmHpsl01F4Jt8Hf2gvDYwhmoM7t4U~qCIGFr4raran~hzr-eD2vhwexQhpC7e1U2~N2lMC7e7w__&amp;amp;Key-Pair-Id=APKAIE5G5CRDK6RD3PGA" class="external-link" target="_blank" rel="noopener"&gt;download&lt;/a&gt;) of the Loving, Hernandez, and Benson (&lt;a href="#citeproc_bib_item_1"&gt;2014&lt;/a&gt;) paper introduces a \(15\)
operation version of Myers&amp;rsquo; (&lt;a href="#citeproc_bib_item_2"&gt;1999&lt;/a&gt;) bitpacking algorithm, which uses \(16\)
operations when modified for edit distance.&lt;/p&gt;
&lt;p&gt;I tried implementing it, but it seems to have a bug that I will describe below.
The fix is &lt;a href="#found-the-bug" &gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="problem"&gt;
 Problem
 &lt;a class="heading-link" href="#problem"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;To recap, this algorithm solves the unit-cost edit distance problem by using
bitpacking to compute a \(1\times w\) block at a time. As input, it takes&lt;/p&gt;</description></item><item><title>Thoughts on linear programming</title><link>https://curiouscoding.nl/posts/linear-programming/</link><pubDate>Fri, 04 Nov 2022 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/linear-programming/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#linear-programming" &gt;Linear programming&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#assumptions" &gt;Assumptions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#idea-for-an-algorithm" &gt;Idea for an algorithm&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;This note contains some ideas about linear programming and &lt;em&gt;most-orthogonal
faces&lt;/em&gt;. They&amp;rsquo;re mostly on an intuitive level and not very formal.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Postscriptum:&lt;/strong&gt; The ideas here don&amp;rsquo;t work.&lt;/p&gt;
&lt;h2 id="linear-programming"&gt;
 Linear programming
 &lt;a class="heading-link" href="#linear-programming"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;div style="display: none" class="div"&gt;
&lt;p&gt;\begin{equation*}
\newcommand{\v}[1]{\textbf{#1}}
\newcommand{\x}{\v x}
\newcommand{\t}{\v t}
\newcommand{\b}{\v b}
\end{equation*}&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Maximize \(\t\x\) subject to \(A\x \leq \b\).&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;\(\x\) is a vector of \(n\) variables \(x_i\).&lt;/li&gt;
&lt;li&gt;\(A\) is a \(m\times n\) matrix: there are \(m\) constraints \(A_j \x \leq b_j\).&lt;/li&gt;
&lt;/ul&gt;
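&lt;p&gt;As a concrete instance of this formulation, here is a small Python sketch with \(n=2\) variables and \(m=3\) constraints. The numbers and the helper names &lt;code&gt;objective&lt;/code&gt; and &lt;code&gt;is_feasible&lt;/code&gt; are made up for illustration; the code only evaluates candidate points and does not solve the program.&lt;/p&gt;

```python
def objective(t, x):
    """The value t x that the linear program maximizes."""
    return sum(ti * xi for ti, xi in zip(t, x))


def is_feasible(A, b, x):
    """True when every constraint A_j x is at most b_j."""
    return all(bj >= sum(a * xi for a, xi in zip(row, x))
               for row, bj in zip(A, b))


# Hypothetical example: x0 at most 2, x1 at most 3, x0 + x1 at most 4.
A = [[1, 0], [0, 1], [1, 1]]
b = [2, 3, 4]
t = [1, 2]

candidates = [(2, 2), (1, 3), (2, 3)]
feasible = [x for x in candidates if is_feasible(A, b, x)]
best = max(feasible, key=lambda x: objective(t, x))
```

&lt;p&gt;Among these candidates, \((2, 3)\) violates the third constraint, and \((1, 3)\) has the larger objective of the two feasible points.&lt;/p&gt;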
&lt;h2 id="assumptions"&gt;
 Assumptions
 &lt;a class="heading-link" href="#assumptions"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;We make the following assumptions:&lt;/p&gt;</description></item><item><title>A Combinatorial Identity</title><link>https://curiouscoding.nl/posts/a-combinatorial-identity/</link><pubDate>Sun, 16 Oct 2022 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/a-combinatorial-identity/</guid><description>&lt;p&gt;Some notes regarding the identity&lt;/p&gt;
&lt;p&gt;\begin{equation}
\sum_{k=0}^n \binom{2k}k \binom{2n-2k}{n-k} = 4^n
\end{equation}&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Gould has two derivations:
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://web.archive.org/web/20171225173015/http://math.wvu.edu/~gould/Vol.5.PDF" class="external-link" target="_blank" rel="noopener"&gt;The first&lt;/a&gt;, from Jensen&amp;rsquo;s equality, (18) in (&lt;a href="#citeproc_bib_item_2"&gt;Jensen 1902&lt;/a&gt;; &lt;a href="#citeproc_bib_item_3"&gt;Shijie 1303&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://web.archive.org/web/20171118022119/http://www.math.wvu.edu/~gould/Vol.4.PDF" class="external-link" target="_blank" rel="noopener"&gt;A second&lt;/a&gt; via the Chu-Vandermonde convolution:&lt;/p&gt;
&lt;p&gt;\begin{equation}
\sum_{k=0}^n \binom{x}k \binom{y}{n-k} = \binom{x+y}n
\end{equation}&lt;/p&gt;
&lt;p&gt;with \(x=y=-\frac 12\), together with the &lt;em&gt;$-\frac 12$-transform&lt;/em&gt;:&lt;/p&gt;
&lt;p&gt;\begin{equation}
\binom{-1/2}{n} = (-1)^n\binom{2n}{n}\frac 1 {2^{2n}}
\end{equation}&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Duarte and de Oliveira (&lt;a href="#citeproc_bib_item_1"&gt;2012&lt;/a&gt;) give a combinatorial proof.&lt;/li&gt;
&lt;/ul&gt;
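&lt;p&gt;The identity is easy to sanity-check numerically. The following Python sketch (the function name &lt;code&gt;convolution&lt;/code&gt; is made up here) evaluates the left-hand side with exact integer arithmetic and compares it with \(4^n\) for small \(n\); this is a check, not a proof.&lt;/p&gt;

```python
from math import comb


def convolution(n):
    """Left-hand side: sum over k of C(2k, k) * C(2n - 2k, n - k)."""
    return sum(comb(2 * k, k) * comb(2 * n - 2 * k, n - k)
               for k in range(n + 1))


# Compare against 4^n for the first few values of n.
checks = [convolution(n) == 4 ** n for n in range(20)]
```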
&lt;h2 id="references"&gt;
 References
 &lt;a class="heading-link" href="#references"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;style&gt;.csl-entry{text-indent: -1.5em; margin-left: 1.5em;}&lt;/style&gt;&lt;div class="csl-bib-body"&gt;
 &lt;div class="csl-entry"&gt;&lt;a id="citeproc_bib_item_1"&gt;&lt;/a&gt;Duarte, Rui, and António Guedes de Oliveira. 2012. “New Developments of an Old Identity.” &lt;a href="https://doi.org/10.48550/ARXIV.1203.5424"&gt;https://doi.org/10.48550/ARXIV.1203.5424&lt;/a&gt;.&lt;/div&gt;
 &lt;div class="csl-entry"&gt;&lt;a id="citeproc_bib_item_2"&gt;&lt;/a&gt;Jensen, J. L. W. V. 1902. “Sur Une Identité D’abel et Sur D’autres Formules Analogues.” &lt;i&gt;Acta Mathematica&lt;/i&gt; 26 (0): 307–18. &lt;a href="https://doi.org/10.1007/bf02415499"&gt;https://doi.org/10.1007/bf02415499&lt;/a&gt;.&lt;/div&gt;
 &lt;div class="csl-entry"&gt;&lt;a id="citeproc_bib_item_3"&gt;&lt;/a&gt;Shijie, Zhu. 1303. &lt;i&gt;Jade Mirror of the Four Unknowns&lt;/i&gt;.&lt;/div&gt;
&lt;/div&gt;</description></item><item><title>Linear-time suffix array construction</title><link>https://curiouscoding.nl/posts/suffix-array-construction/</link><pubDate>Thu, 13 Oct 2022 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/suffix-array-construction/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#notation" &gt;Notation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#small-and-large-suffixes" &gt;Small and Large suffixes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#building-the-suffix-array-from-a-smaller-one" &gt;Building the suffix array from a smaller one&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#visualization" &gt;Visualization&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;These are some notes about linear time suffix array (SA) construction algorithms (SACA&amp;rsquo;s).&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;At &lt;a href="#visualization" &gt;the bottom&lt;/a&gt; you can find a visualization.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://curiouscoding.nl/posts/alg-viz/" &gt;&lt;strong&gt;This page&lt;/strong&gt;&lt;/a&gt; has an &lt;strong&gt;interactive demo&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;History of suffix array construction algorithms:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;1990 first algorithm: Manber and Myers (&lt;a href="#citeproc_bib_item_2"&gt;1993&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;2002 small/large suffixes, explained below: Ko and Aluru (&lt;a href="#citeproc_bib_item_1"&gt;2005&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;2009 recursion only on &lt;em&gt;LMS&lt;/em&gt; suffixes: Nong, Zhang, and Chan (&lt;a href="#citeproc_bib_item_3"&gt;2009&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="http://web.stanford.edu/class/archive/cs/cs166/cs166.1196/lectures/04/Small04.pdf" class="external-link" target="_blank" rel="noopener"&gt;These slides&lt;/a&gt; from Stanford are a nice reference for the last algorithm.&lt;/p&gt;</description></item><item><title>Reducing A* memory usage using fronts</title><link>https://curiouscoding.nl/posts/astar-memory-usage/</link><pubDate>Mon, 26 Sep 2022 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/astar-memory-usage/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#motivation" &gt;Motivation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#parititioning-a-memory-by-fronts" &gt;Partitioning A* memory by fronts&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#non-consistent-heuristics" &gt;Non-consistent heuristics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#front-indexing" &gt;Front indexing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#tracing-back-the-path" &gt;Tracing back the path&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;Here is an idea to reduce the memory usage of A* by only storing one &lt;em&gt;front&lt;/em&gt; at
a time, similar to what Edlib and WFA do. Note that for now this &lt;strong&gt;will not
work&lt;/strong&gt;, but I&amp;rsquo;m putting this online anyway.&lt;/p&gt;
&lt;h2 id="motivation"&gt;
 Motivation
 &lt;a class="heading-link" href="#motivation"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;In our &lt;a href="https://github.com/RagnarGrootKoerkamp/astar-pairwise-aligner" class="external-link" target="_blank" rel="noopener"&gt;implementation&lt;/a&gt; of A*PA, we use a hashmap to store the value of \(g\) of all
visited (explored/expanded) states by A*. This can take up a lot of memory and
simply reading/writing \(g\) in the hashmap can take over half the total execution
time.&lt;/p&gt;</description></item><item><title>Bidirectional A*</title><link>https://curiouscoding.nl/posts/bidirectional-astar/</link><pubDate>Thu, 28 Jul 2022 17:59:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/bidirectional-astar/</guid><description>&lt;p&gt;These are some links and papers on bidirectional A* variants. Nothing
insightful at the moment.&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;a href="https://www.coursera.org/lecture/algorithms-on-graphs/bidirectional-a-Qel6Q" class="external-link" target="_blank" rel="noopener"&gt;small lecture&lt;/a&gt;&lt;/dt&gt;
&lt;dd&gt;introduces \(h_f(u) = \frac 12 (\pi_f(u) - \pi_r(u))\). I have not found
a paper for this yet.&lt;/dd&gt;
&lt;dt&gt;An Improved Bidirectional Heuristic Search Algorithm (Champeaux 1977)&lt;/dt&gt;
&lt;dd&gt;introduces a bidirectional variant&lt;/dd&gt;
&lt;dt&gt;Bidirectional Heuristic Search Again (Champeaux 1983)&lt;/dt&gt;
&lt;dd&gt;fixes a bug in the
above paper&lt;/dd&gt;
&lt;dt&gt;Efficient modified bidirectional A* algorithm for optimal route-finding&lt;/dt&gt;
&lt;dd&gt;Didn&amp;rsquo;t read closely yet.&lt;/dd&gt;
&lt;dt&gt;A new bidirectional algorithm for shortest paths (Pijls 2008)&lt;/dt&gt;
&lt;dd&gt;Actually a
new method. Seems to cite useful papers.
&lt;p&gt;The 2 papers that cite this one may also be interesting.&lt;/p&gt;</description></item><item><title>The BiWFA meeting condition</title><link>https://curiouscoding.nl/posts/biwfa-meeting-condition/</link><pubDate>Mon, 11 Jul 2022 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/biwfa-meeting-condition/</guid><description>&lt;p&gt;&lt;strong&gt;cross references:&lt;/strong&gt; &lt;a href="https://github.com/smarco/BiWFA-paper/issues/8" class="external-link" target="_blank" rel="noopener"&gt;BiWFA GitHub issue&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;It seems that getting the meeting/overlap condition of BiWFA
(Marco-Sola et al. (&lt;a href="#citeproc_bib_item_1"&gt;2023&lt;/a&gt;), Algorithm 1 and Lemma 2.1) correct is tricky.&lt;/p&gt;
&lt;p&gt;Let \(p := \max(x, o+e)\) be the maximal cost of any edge in the edit graph.
As in the BiWFA paper, let \(s_f\) and \(s_r\) be the distances of the &lt;em&gt;forward&lt;/em&gt; and
&lt;em&gt;reverse&lt;/em&gt; fronts computed so far.&lt;/p&gt;
&lt;p&gt;We prove the following lemma:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Lemma&lt;/strong&gt;
Once BiWFA has expanded the forward and reverse fronts up to \(s_f\) and \(s_r\) and
has found &lt;em&gt;some&lt;/em&gt; path of cost \(s \leq s_f + s_r\),
expanding the fronts until \(s'_f + s'_r \geq s+p+o\) is guaranteed to find a
&lt;em&gt;shortest&lt;/em&gt; path.&lt;/p&gt;</description></item><item><title>Benchmark attention points</title><link>https://curiouscoding.nl/posts/benchmarks/</link><pubDate>Thu, 28 Apr 2022 23:33:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/benchmarks/</guid><description>&lt;p&gt;&lt;em&gt;Benchmarking is harder than you think, even when taking into account this rule.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This post lists some lessons I learned while attempting to run benchmarks for
&lt;a href="https://github.com/RagnarGrootKoerkamp/astar-pairwise-aligner" class="external-link" target="_blank" rel="noopener"&gt;A* pairwise aligner&lt;/a&gt;. I was doing this on a laptop, which likely has different
characteristics from CPUs in a typical server rack. All the programs I run are
single-threaded.&lt;/p&gt;
&lt;h2 id="hardware"&gt;
 Hardware
 &lt;a class="heading-link" href="#hardware"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;Do not run while charging the laptop&lt;/dt&gt;
&lt;dd&gt;Charging makes the battery hot and causes throttling. Run either on
battery power or with a completely full battery to prevent this.&lt;/dd&gt;
&lt;dt&gt;Disable hyperthreading&lt;/dt&gt;
&lt;dd&gt;Completely disable hyperthreading in the BIOS.
Multiple programs running on the same core may fight for resources.&lt;/dd&gt;
&lt;/dl&gt;
&lt;h2 id="cpu-settings"&gt;
 CPU settings
 &lt;a class="heading-link" href="#cpu-settings"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;Pin CPU frequency&lt;/dt&gt;
&lt;dd&gt;CPUs, especially those in laptops, have turbo boost, (thermal) throttling, and power-saving
features. Make sure to pin the CPU core frequency low enough that it can be
sustained for a long time without throttling.
&lt;p&gt;In my case, the &lt;code&gt;performance&lt;/code&gt; governor can fix the CPU frequency. The base
frequency of my CPU is &lt;code&gt;2.6GHz&lt;/code&gt;, so that&amp;rsquo;s where I pinned it.&lt;/p&gt;</description></item><item><title>Proof sketch for linear time seed heuristic alignment</title><link>https://curiouscoding.nl/posts/linear-time-pa/</link><pubDate>Sun, 24 Apr 2022 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/linear-time-pa/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#pairwise-alignment-in-subquadratic-time" &gt;Pairwise alignment in subquadratic time&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#random-model" &gt;Random model&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#algorithm" &gt;Algorithm&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#seed-heuristic" &gt;Seed heuristic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#match-pruning" &gt;Match pruning&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#analysis" &gt;Analysis&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#expanded-states" &gt;Expanded states&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#excess-errors" &gt;Excess errors&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#algorithmic-complexity" &gt;Algorithmic complexity&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;This post is a proof sketch to show that A* with the &lt;em&gt;seed heuristic&lt;/em&gt;
(&lt;a href="#citeproc_bib_item_3"&gt;Groot Koerkamp and Ivanov 2024&lt;/a&gt;) does exact pairwise alignment of random strings with random
mutations in near linear time.&lt;/p&gt;
&lt;h2 id="pairwise-alignment-in-subquadratic-time"&gt;
 Pairwise alignment in subquadratic time
 &lt;a class="heading-link" href="#pairwise-alignment-in-subquadratic-time"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;Backurs and Indyk (&lt;a href="#citeproc_bib_item_1"&gt;2018&lt;/a&gt;) show that computing edit distance cannot be
done in strongly subquadratic time (i.e. \(O(n^{2-\delta})\) for any \(\delta &amp;gt;0\))
assuming the Strong Exponential Time Hypothesis.&lt;/p&gt;</description></item><item><title>Neighbour joining</title><link>https://curiouscoding.nl/posts/neighbour-joining/</link><pubDate>Fri, 12 Nov 2021 11:57:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/neighbour-joining/</guid><description>&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Neighbor_joining" class="external-link" target="_blank" rel="noopener"&gt;Neighbour joining&lt;/a&gt; (NJ, &lt;a href="https://academic.oup.com/mbe/article/4/4/406/1029664" class="external-link" target="_blank" rel="noopener"&gt;paper&lt;/a&gt;) is a phylogeny reconstruction method.
It differs from UPGMA in the way it computes the distances between clusters.&lt;/p&gt;
&lt;p&gt;This algorithm first assumes that the phylogeny is a star graph.
Then it finds the pair of vertices that when merged and split out gives the
minimal total edge length \(S_{ij}\) of the new almost-star graph. (See eq. (4)
and figure 2a and 2b in the paper.)
\[
S_{i,j} = \frac1{2(n-2)} \sum_{k\not\in \{i,j\}}(d(i, k)+d(j,k)) + \frac 12
d(i,j)+\frac 1{n-2} \sum_{k&amp;lt;l,\, k, l\not\in\{i,j\}}d(k,l).
\]
After multiplying by \(2(n-2)\) and subtracting twice the sum of all pairwise distances (which is a constant that does not depend on \(i\) and \(j\)), we obtain
the familiar
\[
Q(i, j) = (n-2) d(i, j) - \sum_{k=1}^n d(i, k) - \sum_{k=1}^n d(j, k).
\]
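To spell out this step, here is a short derivation sketch. The symbols \(T\) and \(r_i\) are shorthand introduced here and do not appear in the paper: \(T = \sum_{k&amp;lt;l} d(k,l)\) is the sum of all pairwise distances and \(r_i = \sum_{k=1}^n d(i,k)\) is the row sum of vertex \(i\), assuming \(d(i,i)=0\). The two sums in \(S_{i,j}\) can be rewritten as
\[
\sum_{k\not\in \{i,j\}}(d(i, k)+d(j,k)) = r_i + r_j - 2d(i,j),
\qquad
\sum_{k&amp;lt;l,\, k, l\not\in\{i,j\}}d(k,l) = T - r_i - r_j + d(i,j).
\]
Substituting these gives
\[
2(n-2)\, S_{i,j} = (n-2)\, d(i,j) - r_i - r_j + 2T = Q(i,j) + 2T,
\]
so \(S_{i,j}\) and \(Q(i,j)\) are minimized by the same pair, since \(T\) does not depend on \(i\) and \(j\).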
Thus, we merge the two vertices that minimize \(Q\).
The distance from the merging of vertices \(i\) and \(j\) to each other vertex
\(k\) is \(d_{(i-j)k} = (d_{i,k} + d_{j,k})/2\).&lt;/p&gt;</description></item><item><title>UPGMA</title><link>https://curiouscoding.nl/posts/upgma/</link><pubDate>Thu, 28 Oct 2021 11:56:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/upgma/</guid><description>&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/UPGMA" class="external-link" target="_blank" rel="noopener"&gt;Unweighted pair group method with arithmetic mean&lt;/a&gt; (UPGMA) is a phylogeny reconstruction method.&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;Input&lt;/dt&gt;
&lt;dd&gt;Matrix of pairwise distances&lt;/dd&gt;
&lt;dt&gt;Output&lt;/dt&gt;
&lt;dd&gt;Phylogeny&lt;/dd&gt;
&lt;dt&gt;Algorithm&lt;/dt&gt;
&lt;dd&gt;Repeatedly merge the nearest two clusters. The distance between two
clusters is the average of all pairwise distances between their elements. When merging
two clusters, the distance from the new cluster to each other cluster is the average of
the distances from the two merged clusters, weighted by their sizes.&lt;/dd&gt;
&lt;dt&gt;Complexity&lt;/dt&gt;
&lt;dd&gt;\(O(n^3)\) naively, \(O(n^2 \ln n)\) using a heap.&lt;/dd&gt;
&lt;/dl&gt;</description></item><item><title>Spaced k-mer and assembler methods</title><link>https://curiouscoding.nl/posts/spaced-kmer-review/</link><pubDate>Wed, 14 Jul 2021 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/spaced-kmer-review/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#spaced-k-mers" &gt;Spaced \(k\)-mers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#minimap" &gt;Minimap&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#spades" &gt;SPAdes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#mummer4" &gt;MUMmer4&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#blasr" &gt;BLASR&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#bowtie-2" &gt;Bowtie 2&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#patternhunter" &gt;Patternhunter&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#spaced-seeds-improve-k-mer-based-metagenomic-classification" &gt;Spaced seeds improve \(k\)-mer-based metagenomic classification&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#lomex" &gt;LoMeX&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#meeting-notes" &gt;Meeting notes&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;Concepts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Mapping&lt;/strong&gt; Map a sequence onto a reference genome/dataset&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Assembly&lt;/strong&gt; Build a genome from a set of reads
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;de novo&lt;/em&gt; (implied): without using a reference genome&lt;/li&gt;
&lt;li&gt;Otherwise just called &lt;em&gt;mapping&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Typical complicating factors:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;read errors&lt;/li&gt;
&lt;li&gt;non-uniform coverage&lt;/li&gt;
&lt;li&gt;insert size variation&lt;/li&gt;
&lt;li&gt;chimeric reads (?)&lt;/li&gt;
&lt;li&gt;bireads&lt;/li&gt;
&lt;li&gt;non-uniform read coverage (as in metagenomics, i.e. multi cell
assembly)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="spaced-k-mers"&gt;
 Spaced \(k\)-mers
 &lt;a class="heading-link" href="#spaced-k-mers"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;Also called&lt;/p&gt;</description></item></channel></rss>