<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Paper-Review on CuriousCoding</title><link>https://curiouscoding.nl/categories/paper-review/</link><description>Recent content in Paper-Review on CuriousCoding</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sun, 18 Jan 2026 00:00:00 +0100</lastBuildDate><atom:link href="https://curiouscoding.nl/categories/paper-review/index.xml" rel="self" type="application/rss+xml"/><item><title> Quotes from "The Evolution of Mathematical Software"</title><link>https://curiouscoding.nl/posts/evolution-of-mathematical-software/</link><pubDate>Sun, 18 Jan 2026 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/evolution-of-mathematical-software/</guid><description>&lt;p&gt;These are some nice quotes from
&lt;a href="#citeproc_bib_item_1"&gt;“The Evolution of Mathematical Software”&lt;/a&gt;, Turing Lecture by the 2021
Turing Award winner Jack J. Dongarra, which talks about
algorithm and software development in the context of ever-improving hardware.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;a large infrastructure of mathematical libraries [&amp;hellip;] that must be maintained,
ported, and enhanced for many years to come if the value of the application
codes that depend on it are to be preserved and extended.
The software that encapsulates all this time, energy, and thought, routinely
outlasts the hardware it was originally designed to run on.&lt;/p&gt;</description></item><item><title> Thoughts on "Static Retrieval Revisited"</title><link>https://curiouscoding.nl/posts/static-retrieval-revisited/</link><pubDate>Tue, 04 Nov 2025 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/static-retrieval-revisited/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#problem-definitions" &gt;Problem definitions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#optimal-solutions" &gt;Optimal solutions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#why-an-overhead-is-necessary" &gt;Why an overhead is necessary&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4&lt;/span&gt; &lt;a href="#augmented-retrieval" &gt;Augmented Retrieval&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;These are some summarizing notes and thoughts on
&lt;a href="#citeproc_bib_item_2"&gt;“Static Retrieval Revisited: To Optimality and beyond”&lt;/a&gt; by &lt;a href="#citeproc_bib_item_2"&gt;Hu, Kuszmaul, Liang, Yu, Zhang, and Zhou&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="problem-definitions"&gt;
 &lt;span class="section-num"&gt;1&lt;/span&gt; Problem definitions
 &lt;a class="heading-link" href="#problem-definitions"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;&lt;strong&gt;Static Retrieval.&lt;/strong&gt;&lt;/strong&gt; Given \(n\) keys \(X\subseteq U\) and \(n\) \(v\)-bit values
\(f(X) \in [2^v]\), encode \(f: X\to [2^v]\).
The goal is to use a minimal number of bits on top of the trivial
\(nv\) lower bound, while allowing efficient queries.&lt;/p&gt;</description></item><item><title>Thoughts on Singletrack</title><link>https://curiouscoding.nl/posts/singletrack/</link><pubDate>Tue, 04 Nov 2025 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/singletrack/</guid><description>&lt;p&gt;This is a quick post summarizing the idea of
&lt;a href="#citeproc_bib_item_2"&gt;“Singletrack: An Algorithm for Improving Memory Consumption and Performance of Gap-Affine Sequence Alignment”&lt;/a&gt; by &lt;a href="#citeproc_bib_item_2"&gt;López-Villellas, Iñiguez, Jiménez-Blanco, Aguado-Puig, Moretó, Alastruey-Benedé, Ibáñez, and Marco-Sola&lt;/a&gt; to reduce memory
usage of affine-cost alignment by removing the need to store the affine layers
of the DP matrix.&lt;/p&gt;
&lt;p&gt;Affine-cost alignment uses a gap-open
cost \(o&amp;gt;0\) on top of the per-character gap-extend cost \(e\), so that a gap of length \(\ell\) has cost \(o + \ell \cdot e\). The
classic DP solution for this is Gotoh&amp;rsquo;s method (&lt;a href="#citeproc_bib_item_1"&gt;Gotoh 1982&lt;/a&gt;) that uses two
additional DP matrices \(I\) and \(D\) (alongside the main \(M\) matrix):
one to store the best cost to get to state
\((i,j)\) while ending in an insertion, and one that ends with a deletion.&lt;/p&gt;</description></item><item><title>Understanding GreedyMini</title><link>https://curiouscoding.nl/posts/greedymini-analysis/</link><pubDate>Sun, 27 Apr 2025 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/greedymini-analysis/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#greedymini-results" &gt;GreedyMini Results&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#a-first-look" &gt;A first look&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#comparison-with-optimal-ilp-values" &gt;Comparison with optimal ILP values&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#large-alphabets" &gt;Large alphabets&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#analysing-greedymini-at-w-3" &gt;Analysing GreedyMini at \(w=3\)&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#w-3-k-3" &gt;\(w=3\), \(k=3\)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#w-7-k-3" &gt;\(w=7\), \(k=3\)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#w-3-k-4" &gt;\(w=3\), \(k=4\)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#w-3-k-5" &gt;\(w=3\), \(k=5\)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#w-3-k-6" &gt;\(w=3\), \(k=6\)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#w-3-k-7" &gt;\(w=3\), \(k=7\)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#looking-at-fixed-k-5" &gt;Looking at fixed \(k=5\)&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#k-5-w-4" &gt;\(k=5\), \(w=4\)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#k-5-w-5" &gt;\(k=5\), \(w=5\)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#k-5-w-6" &gt;\(k=5\), \(w=6\)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#k-5-w-7" &gt;\(k=5\), \(w=7\)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#k-5-w-8" &gt;\(k=5\), \(w=8\)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#k-5-w-12" &gt;\(k=5\), \(w=12\)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#investigating-w-5" &gt;Investigating \(w=5\)&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#w-5-k-8" &gt;\(w=5\), \(k=8\)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#what-about-k-w-plus-1" &gt;What about \(k = w+1\)?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;In this post, we will look at the minimizer schemes generated by the greedy
minimizer (Golan et al. 2025).&lt;/p&gt;</description></item><item><title>Thoughts on Consensus MPHF and tiny pointers</title><link>https://curiouscoding.nl/posts/consensus/</link><pubDate>Wed, 12 Feb 2025 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/consensus/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#consensus" &gt;Consensus&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.1&lt;/span&gt; &lt;a href="#consensus-recsplit" &gt;Consensus-RecSplit&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#idea-consensus-ptrhash" &gt;IDEA: Consensus-PtrHash&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#hashing" &gt;Tiny pointers and optimal open addressing hash tables&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;These are some thoughts on the Consensus-based MPHF presented in
Lehmann et al. (&lt;a href="#citeproc_bib_item_4"&gt;2025&lt;/a&gt;), and how this could be applied to PtrHash:&lt;/p&gt;
&lt;p&gt;Lehmann, Hans-Peter, Peter Sanders, Stefan Walzer, and Jonatan Ziegler. 2025. “Combined Search and Encoding for Seeds, with an Application to Minimal Perfect Hashing.” &lt;i&gt;Arxiv&lt;/i&gt;. &lt;a href="https://doi.org/10.48550/ARXIV.2502.05613"&gt;&lt;a href="https://doi.org/10.48550/ARXIV.2502.05613" class="external-link" target="_blank" rel="noopener"&gt;https://doi.org/10.48550/ARXIV.2502.05613&lt;/a&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Below are also some thoughts on the papers on tiny pointers, used to achieve
hash tables with load factors very close to 1: Bender et al. (&lt;a href="#citeproc_bib_item_1"&gt;2021&lt;/a&gt;), Farach-Colton, Krapivin, and Kuszmaul (&lt;a href="#citeproc_bib_item_2"&gt;2024&lt;/a&gt;).&lt;/p&gt;</description></item><item><title>Comments on Brisk</title><link>https://curiouscoding.nl/posts/brisk/</link><pubDate>Fri, 29 Nov 2024 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/brisk/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#overview" &gt;Overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#detailed-comments" &gt;Detailed comments&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#general" &gt;General&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#abstract" &gt;Abstract&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#1-dot-introduction" &gt;1. Introduction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#2-dot-methods" &gt;2. Methods&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#2-dot-1-outline" &gt;2.1 Outline&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#2-dot-2-indexing-super-k-mers" &gt;2.2 Indexing super-k-mers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#2-dot-3-lazy-encoding" &gt;2.3 Lazy encoding&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#2-dot-4-probing" &gt;2.4 Probing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#2-dot-5-superbuckets" &gt;2.5 Superbuckets&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#2-dot-6-implementation-details" &gt;2.6 Implementation details&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#3-dot-results" &gt;3. Results&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#3-dot-1-parameters" &gt;3.1 Parameters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#3-dot-2-multicore" &gt;3.2 Multicore&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#3-dot-4-comparison" &gt;3.4 Comparison&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#3-dot-5-query-times" &gt;3.5 Query times&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#4-dot-conclusion" &gt;4. Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;These are some (biased) comments on Brisk,
a dynamic k-mer dictionary (&lt;a href="#citeproc_bib_item_5"&gt;Smith et al. 2024&lt;/a&gt;).&lt;/p&gt;</description></item><item><title>Comments on GreedyMini</title><link>https://curiouscoding.nl/posts/greedymini/</link><pubDate>Mon, 04 Nov 2024 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/greedymini/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#overview" &gt;Overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#detailed-comments" &gt;Detailed comments&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#terminology" &gt;Terminology&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#abstract" &gt;Abstract&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#preliminaries" &gt;Preliminaries&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#methods" &gt;Methods&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#3-dot-5-transformations" &gt;3.5 Transformations&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#results" &gt;Results&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#discussion" &gt;Discussion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#comments-on-expected-density-of-random-minimizers" &gt;Comments on &amp;ldquo;Expected density of random minimizers&amp;rdquo;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;These are some (biased) comments on &lt;a href="#citeproc_bib_item_2"&gt;“Greedymini: Generating Low-Density Dna Minimizers”&lt;/a&gt;
(&lt;a href="#citeproc_bib_item_2"&gt;Golan et al. 2024&lt;/a&gt;), which introduces the &lt;code&gt;GreedyMini&lt;/code&gt; minimizer scheme.
(Meanwhile, this has been published as Golan et al. (&lt;a href="#citeproc_bib_item_3"&gt;2025&lt;/a&gt;).)&lt;/p&gt;
&lt;p&gt;At the bottom, there are also some comments on Golan and Shur (&lt;a href="#citeproc_bib_item_4"&gt;2025&lt;/a&gt;).&lt;/p&gt;</description></item><item><title>Comments on 'When Less is More' minimizer review</title><link>https://curiouscoding.nl/posts/minimizer-review-comments/</link><pubDate>Tue, 15 Oct 2024 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/minimizer-review-comments/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#the-importance-of-ordering" &gt;The importance of ordering&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#asymptotically-optimal-minimizers" &gt;Asymptotically optimal minimizers&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;These are some (biased) comments on &lt;a href="#citeproc_bib_item_5"&gt;“When Less Is More: Sketching with Minimizers in Genomics”&lt;/a&gt; (&lt;a href="#citeproc_bib_item_5"&gt;Ndiaye et al. 2024&lt;/a&gt;).&lt;/p&gt;
&lt;h2 id="the-importance-of-ordering"&gt;
 The importance of ordering
 &lt;a class="heading-link" href="#the-importance-of-ordering"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;the interest lies in constructing a minimizer with a density within a constant
factor, i.e., \(O(1/w)\) for any \(k\). With lexicographic ordering, minimizers can
achieve such density, but with large \(k\) values (\(\geq \log_{|Σ|}(w)-c\) for a
constant \(c\)), which might not be desirable (&lt;a href="#citeproc_bib_item_9"&gt;Zheng, Kingsford, and Marçais 2020&lt;/a&gt;). However, random
ordering can result in a lower density than that of the lexicographic ordering.
Thus, random ordering (implemented with pseudo-random hash functions) is
usually used in practice.&lt;/p&gt;</description></item><item><title>Thoughts on POASTA</title><link>https://curiouscoding.nl/posts/poasta/</link><pubDate>Tue, 28 May 2024 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/poasta/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#summary" &gt;Summary&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#background" &gt;Background&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#review-comments" &gt;Review comments&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#dfs" &gt;DFS&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#supplementary-methods" &gt;Supplementary methods&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#details-of-pruning" &gt;Details of pruning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#evals" &gt;Evals&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#discussion" &gt;Discussion&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#code-and-repo" &gt;Code &amp;amp; repo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;Here are some thoughts on POASTA (&lt;a href="#citeproc_bib_item_2"&gt;van Dijk et al. 2024&lt;/a&gt;), a recent affine-cost
sequence-to-DAG (POA) aligner inspired by WFA and using A*.&lt;/p&gt;
&lt;h2 id="summary"&gt;
 Summary
 &lt;a class="heading-link" href="#summary"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Take a query and a directed acyclic graph (DAG).&lt;/li&gt;
&lt;li&gt;Align the query to the &lt;strong&gt;full&lt;/strong&gt; DAG. It&amp;rsquo;s like global alignment for graphs.
&lt;ul&gt;
&lt;li&gt;In fact I think the graph doesn&amp;rsquo;t actually have to be acyclic, as long as it has
a start and end. (When there is a cycle, the maximum remaining path length
is simply \(\infty\).)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Do greedy extension of matches, similar to WFA and A*PA.
&lt;ul&gt;
&lt;li&gt;Note that this is not as strong as full diagonal transition as done by WFA
and &lt;a href="https://github.com/lh3/gwfa" class="external-link" target="_blank" rel="noopener"&gt;gWFA&lt;/a&gt; (graph WFA for unit costs only), which only consider farthest reaching states.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;In fact, this is &lt;strong&gt;the first&lt;/strong&gt; implementation of affine-cost WFA!&lt;/li&gt;
&lt;li&gt;It also uses A* with the classic gap-cost heuristic extended to graphs.
&lt;ul&gt;
&lt;li&gt;For each point in the graph the minimal and maximal remaining distance is
computed, and if the remaining query length is outside this range, the
difference to get into the range is a lower bound on the number of indels.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Greedy extension is applied (although this is inherent when using WFA).&lt;/li&gt;
&lt;li&gt;Suboptimal states in superbubbles are pruned using additional logic.&lt;/li&gt;
&lt;/ul&gt;
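&lt;p&gt;As a sketch of the gap-cost heuristic above, in notation that is assumed here rather than taken from the paper: let \(d_{\min}(v)\) and \(d_{\max}(v)\) be the minimal and maximal remaining path length from graph node \(v\) to the end, and let \(r\) be the remaining query length. The lower bound on the number of indels could then be written as&lt;/p&gt;
&lt;p&gt;\[h(v, r) = \max\big(0,\ d_{\min}(v) - r,\ r - d_{\max}(v)\big).\]&lt;/p&gt;
&lt;p&gt;For example, with \(d_{\min}(v) = 10\), \(d_{\max}(v) = 14\), and \(r = 17\), the remaining query is \(17 - 14 = 3\) characters longer than any remaining path, so \(h = 3\); for any \(r\) between \(10\) and \(14\) the bound is \(0\).&lt;/p&gt;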
&lt;h2 id="background"&gt;
 Background
 &lt;a class="heading-link" href="#background"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Daniel: Why is nobody doing exact banded alignment, i.e., simple band
doubling, for exact DP-based alignment? We are still not convinced that A*/WFA
is faster than DP, especially when divergence is not super low (\(&amp;lt;1\%\)).&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="review-comments"&gt;
 Review comments
 &lt;a class="heading-link" href="#review-comments"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Fig 1 confuses me: (partly Daniel)&lt;/p&gt;</description></item><item><title>Review of refined minimizers</title><link>https://curiouscoding.nl/posts/refined-minimizer/</link><pubDate>Fri, 26 Jan 2024 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/refined-minimizer/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#summary" &gt;Summary&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#main-issues" &gt;Main issues&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#1-dot-introduction" &gt;1. Introduction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#2-dot-methods" &gt;2. Methods&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#2-dot-3-heuristic" &gt;2.3 heuristic&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#3-dot-results" &gt;3. Results&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#discussion" &gt;Discussion&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#code" &gt;Code&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;These are my review-like notes on refined minimizers, introduced in Pan and Reinert (&lt;a href="#citeproc_bib_item_4"&gt;2024&lt;/a&gt;).&lt;/p&gt;
&lt;h2 id="summary"&gt;
 Summary
 &lt;a class="heading-link" href="#summary"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;The paper introduces &lt;em&gt;refined minimizers&lt;/em&gt;, a new scheme for sampling canonical
minimizers that is less biased than the usual scheme.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Instead of taking the minimum of the minimizer of the forward and reverse
strand, the minimizer of the strand with the higher &lt;code&gt;GT&lt;/code&gt; density is chosen.&lt;/li&gt;
&lt;li&gt;The reduced bias towards small minimizers leads to a more uniform
frequency distribution of the selected kmers.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="main-issues"&gt;
 Main issues
 &lt;a class="heading-link" href="#main-issues"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The methods contain a number of mistakes in the math and proofs.&lt;/li&gt;
&lt;li&gt;The limit to \(|s|\) needs to be made much more precise. In fact it is a
\(k\to\infty\) limit (rather than a \(w\to\infty\) limit), which seems not as useful in practice.&lt;/li&gt;
&lt;li&gt;A comparison to NtHash2 should be made, for both kmer frequency distribution
and speed.&lt;/li&gt;
&lt;li&gt;The provided code (&lt;a href="https://github.com/xp3i4/mini_benchmark" class="external-link" target="_blank" rel="noopener"&gt;github:xp3i4/mini_benchmark&lt;/a&gt;) &lt;a href="https://github.com/xp3i4/mini_benchmark/issues/1" class="external-link" target="_blank" rel="noopener"&gt;segfaults&lt;/a&gt; and is undocumented.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="1-dot-introduction"&gt;
 1. Introduction
 &lt;a class="heading-link" href="#1-dot-introduction"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;the minimizer concept is a data structure&lt;/em&gt;: to me, minimizers by themselves are not a data structure.&lt;/li&gt;
&lt;li&gt;\(w&amp;gt;k\): &lt;strong&gt;not needed&lt;/strong&gt;. \(w\geq 1\) is sufficient.&lt;/li&gt;
&lt;li&gt;In many places, &lt;code&gt;\citep&lt;/code&gt; citations like (&lt;a href="#citeproc_bib_item_4"&gt;Pan and Reinert 2024&lt;/a&gt;) would have
been more appropriate than &lt;code&gt;\citet&lt;/code&gt; ones like Pan and Reinert (&lt;a href="#citeproc_bib_item_4"&gt;2024&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;em&gt;of a predefined ordering scheme&lt;/em&gt;: the minimum of/over some set &lt;strong&gt;with respect
to&lt;/strong&gt; some ordering scheme.&lt;/li&gt;
&lt;li&gt;nitpicky imprecision: \(X\) is the set of &lt;strong&gt;positions of kmers&lt;/strong&gt;, not simply the set
of &lt;strong&gt;kmer strings themselves&lt;/strong&gt;. (Or I suppose \(X\) could be a list of kmers.)
(Otherwise we have \(|X| \leq 4^k\) and \(|S|\to\infty\) so that
\(\rho\to 0\).)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;a k-mer \(X = x\)&lt;/em&gt; =&amp;gt; Why not just \(x\)? The notation is confusing.&lt;/li&gt;
&lt;li&gt;\(n(x)/|S|\) is not really an &lt;em&gt;average&lt;/em&gt; (there is only one string \(S\)); rather it&amp;rsquo;s a density.&lt;/li&gt;
&lt;li&gt;The definition of \(V\) is not clear to me. What is random? What is counted?&lt;/li&gt;
&lt;li&gt;&lt;em&gt;3. Its density converges&lt;/em&gt; =&amp;gt; For \(w\to \infty\) or \(k\to\infty\) or both?&lt;/li&gt;
&lt;li&gt;&lt;code&gt;CMP&lt;/code&gt; (branch conditions) can be one of the slowest instructions on modern
hardware. Branch misses in an inner loop for minimizer computation can
severely affect performance.&lt;/li&gt;
&lt;li&gt;Simple operations and L1 accesses can be pipelined so that their latency is hidden,
making them take 2-4x less time in practice. This makes branch misses up to 4
times as bad, relatively.&lt;/li&gt;
&lt;li&gt;Are lexicographic minimizers used much in practice?&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="2-dot-methods"&gt;
 2. Methods
 &lt;a class="heading-link" href="#2-dot-methods"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;There are a number of mistakes in the math here, and some unclear points that could use fixing.&lt;/p&gt;</description></item><item><title>Loukides, Pissis, Thankachan, Zuba :: Suffix-Prefix Queries on a Dictionary</title><link>https://curiouscoding.nl/posts/apsp/</link><pubDate>Fri, 07 Jul 2023 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/apsp/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#comments" &gt;Comments&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#prelims" &gt;Prelims&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#one-to-one" &gt;One-to-One&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#one-to-all" &gt;One-to-All&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#report-and-count" &gt;Report and Count&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#top-k" &gt;Top-\(K\)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-small-rant-on-and-tau-micro-macro-trees" &gt;A small rant on $τ$-micro-macro trees&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#ideas-for-simplification" &gt;Ideas for simplification&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#replace-and-tau-micro-macro-tree" &gt;Replace $τ$-micro-macro tree&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#heavy-light-decomposition--hld--for-count-queries-in-o--log-n--time" &gt;Heavy-Light-Decomposition (HLD) for \(Count\) queries in \(O(\log n)\) time&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#finding-the-largest-l-with-count--i-l--geq-k-in-o--log-n--time" &gt;Finding the largest \(l\) with \(Count(i, l) \geq K\) in \(O(\log n)\) time&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#reporting-matching-strings" &gt;Reporting matching strings&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#comparison" &gt;Comparison&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#closing-thoughts" &gt;Closing thoughts&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;\[\newcommand{\dol}{\$}\]&lt;/p&gt;
&lt;p&gt;These are some comments and new ideas on the paper by Loukides, Pissis, Thankachan, and Zuba (&lt;a href="#citeproc_bib_item_7"&gt;2023&lt;/a&gt;).&lt;/p&gt;</description></item></channel></rss>