<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Thesis on CuriousCoding</title><link>https://curiouscoding.nl/categories/thesis/</link><description>Recent content in Thesis on CuriousCoding</description><generator>Hugo</generator><language>en</language><lastBuildDate>Thu, 10 Apr 2025 00:00:00 +0200</lastBuildDate><atom:link href="https://curiouscoding.nl/categories/thesis/index.xml" rel="self" type="application/rss+xml"/><item><title>Thesis: Optimal Throughput Bioinformatics</title><link>https://curiouscoding.nl/posts/thesis/</link><pubDate>Thu, 10 Apr 2025 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/thesis/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#abstract" &gt;Abstract&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#introduction" &gt;Introduction&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.1&lt;/span&gt; &lt;a href="#part-1-pairwise-alignment" &gt;Part 1: Pairwise Alignment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.2&lt;/span&gt; &lt;a href="#part-2-low-density-minimizers" &gt;Part 2: Low Density Minimizers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.3&lt;/span&gt; &lt;a href="#part-3-high-throughput-bioinformatics" &gt;Part 3: High Throughput Bioinformatics&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#discussion" &gt;Discussion&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.1&lt;/span&gt; &lt;a href="#pairwise-alignment" &gt;Pairwise Alignment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.2&lt;/span&gt; &lt;a href="#low-density-minimizers" &gt;Low Density Minimizers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.3&lt;/span&gt; &lt;a href="#high-throughput-bioinformatics" &gt;High Throughput Bioinformatics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.4&lt;/span&gt; &lt;a href="#propositions" &gt;Propositions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;This post contains the abstract, introduction, and conclusion of my thesis (&lt;a href="#citeproc_bib_item_3"&gt;Groot Koerkamp 2025a&lt;/a&gt;).
The full PDF is &lt;a href="https://curiouscoding.nl/thesis.pdf" &gt;here&lt;/a&gt;.
Individual chapters are based on blog posts and/or papers and are described in
detail in the introduction below. In brief:&lt;/p&gt;</description></item><item><title>A History of Pairwise Alignment</title><link>https://curiouscoding.nl/posts/pairwise-alignment/</link><pubDate>Wed, 09 Apr 2025 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/pairwise-alignment/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#a-brief-history" &gt;A Brief History&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.1&lt;/span&gt; &lt;a href="#a-pa" &gt;A*PA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.2&lt;/span&gt; &lt;a href="#a-pa2" &gt;A*PA2&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.3&lt;/span&gt; &lt;a href="#overview" &gt;Overview&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#problem-statement" &gt;Problem Statement&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#alignment-types" &gt;Alignment types&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4&lt;/span&gt; &lt;a href="#cost-models" &gt;Cost Models&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.1&lt;/span&gt; &lt;a href="#minimizing-cost-versus-maximizing-score" &gt;Minimizing Cost versus Maximizing Score&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;5&lt;/span&gt; &lt;a href="#dp" &gt;The Classic DP Algorithms&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6&lt;/span&gt; &lt;a href="#linear-memory-using-divide-and-conquer" &gt;Linear Memory using Divide and Conquer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;7&lt;/span&gt; &lt;a href="#graphs" &gt;Dijkstra&amp;rsquo;s Algorithm and A*&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;8&lt;/span&gt; &lt;a href="#computational-volumes" &gt;Computational Volumes and Band Doubling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;9&lt;/span&gt; &lt;a href="#diagonal-transition" &gt;Diagonal Transition&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;10&lt;/span&gt; &lt;a href="#parallelism" &gt;Parallelism&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;11&lt;/span&gt; &lt;a href="#lcs-and-contours" &gt;LCS and Contours&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;12&lt;/span&gt; &lt;a href="#some-tools" &gt;Some Tools&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;13&lt;/span&gt; &lt;a href="#subquadratic-methods-and-lower-bounds" &gt;Subquadratic Methods and Lower Bounds&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;14&lt;/span&gt; &lt;a href="#summary" &gt;Summary&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;This is Chapter 2 of my &lt;a href="https://curiouscoding.nl/posts/thesis/" &gt;thesis&lt;/a&gt; (&lt;a href="#citeproc_bib_item_27"&gt;Groot Koerkamp 2025&lt;/a&gt;), which introduces the first part, on Pairwise Alignment.
Please cite the thesis instead of this post.&lt;/p&gt;</description></item><item><title>Low Density Minimizers</title><link>https://curiouscoding.nl/posts/minimizers/</link><pubDate>Tue, 08 Apr 2025 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/minimizers/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#theory-of-sampling-schemes" &gt;Theory of Sampling Schemes&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.1&lt;/span&gt; &lt;a href="#introduction" &gt;Introduction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.2&lt;/span&gt; &lt;a href="#overview" &gt;Overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.3&lt;/span&gt; &lt;a href="#theory-of-sampling-schemes" &gt;Theory of sampling schemes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.4&lt;/span&gt; &lt;a href="#notation" &gt;Notation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.5&lt;/span&gt; &lt;a href="#types-of-sampling-schemes" &gt;Types of sampling schemes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.6&lt;/span&gt; &lt;a href="#computing-the-density" &gt;Computing the density&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.7&lt;/span&gt; &lt;a href="#random-mini-density" &gt;The density of random minimizers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.8&lt;/span&gt; &lt;a href="#universal-hitting-sets" &gt;Universal hitting sets&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.9&lt;/span&gt; &lt;a href="#asymptotic-results" &gt;Asymptotic results&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.10&lt;/span&gt; &lt;a href="#variants" &gt;Variants&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#lower-bounds" &gt;Lower Bounds on Sampling Scheme Density&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.1&lt;/span&gt; &lt;a href="#schleimer-et-al-dot-s-bound" &gt;Schleimer et al.&amp;rsquo;s bound&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.2&lt;/span&gt; &lt;a href="#mar%c3%a7ais-et-al-dot-s-bound" &gt;Marçais et al.&amp;rsquo;s bound&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.3&lt;/span&gt; &lt;a href="#improving-and-extending-mar%c3%a7ais-et-al-dot-s-bound" &gt;Improving and extending Marçais et al.&amp;rsquo;s bound&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.4&lt;/span&gt; &lt;a href="#near-tight-lb" &gt;A near-tight lower bound on the density of forward sampling schemes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.5&lt;/span&gt; &lt;a href="#lower-bound-eval" &gt;Discussion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#sampling-schemes" &gt;Practical Sampling Schemes&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.1&lt;/span&gt; &lt;a href="#lexmin" &gt;Variants of lexicographic minimizers&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#lex-eval" &gt;Evaluation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.2&lt;/span&gt; &lt;a href="#uhs-inspired-schemes" &gt;UHS-inspired schemes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.3&lt;/span&gt; &lt;a href="#syncmer-based-schemes" &gt;Syncmer-based schemes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.4&lt;/span&gt; &lt;a href="#open-closed-minimizer" &gt;Open-closed minimizer&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#oc-eval" &gt;Evaluation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.5&lt;/span&gt; &lt;a href="#modmini" &gt;Mod-minimizer&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#theoretical-density" &gt;Theoretical density&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#modmini-eval" &gt;Evaluation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.6&lt;/span&gt; &lt;a href="#sampling-schemes-discussion" &gt;Discussion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4&lt;/span&gt; &lt;a href="#selection-schemes" &gt;Towards Optimal Selection Schemes&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.1&lt;/span&gt; &lt;a href="#bd-anchors" &gt;Bidirectional anchors&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.2&lt;/span&gt; &lt;a href="#sus-anchors" &gt;Sus-anchors&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#sus-anchor-eval" &gt;Evaluation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.3&lt;/span&gt; &lt;a href="#selection-schemes-discussion" &gt;Discussion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;This is Part 2 of my &lt;a href="https://curiouscoding.nl/posts/thesis/" &gt;thesis&lt;/a&gt; (&lt;a href="#citeproc_bib_item_15"&gt;Groot Koerkamp 2025&lt;/a&gt;), containing chapters 6 to 9 on Low Density Minimizers.
Please cite the thesis instead of this post.&lt;/p&gt;</description></item><item><title>Beyond Global Alignment</title><link>https://curiouscoding.nl/posts/mapping/</link><pubDate>Mon, 07 Apr 2025 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/mapping/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#semi-global-variants" &gt;Variants of semi-global alignment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#text-searching" &gt;Fast text searching&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.1&lt;/span&gt; &lt;a href="#skip-cost-for-overlap-alignments" &gt;Skip-cost for overlap alignments&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.2&lt;/span&gt; &lt;a href="#search-results" &gt;Results&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#mapping" &gt;Mapping using A*Map&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.1&lt;/span&gt; &lt;a href="#seeding" &gt;Seeding&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.2&lt;/span&gt; &lt;a href="#chaining" &gt;Chaining&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.3&lt;/span&gt; &lt;a href="#aligning" &gt;Aligning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.4&lt;/span&gt; &lt;a href="#a-map" &gt;A*Map&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.5&lt;/span&gt; &lt;a href="#results" &gt;Results&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;This is Chapter 5 of my &lt;a href="https://curiouscoding.nl/posts/thesis/" &gt;thesis&lt;/a&gt; (&lt;a href="#citeproc_bib_item_11"&gt;Groot Koerkamp 2025&lt;/a&gt;).
Please cite the thesis instead of this post.&lt;/p&gt;
&lt;hr&gt;
&lt;div class="notice summary"&gt;
 &lt;div class="notice-title"&gt;
 &lt;i class="fa-solid " aria-hidden="true"&gt;&lt;/i&gt;Summary
 &lt;/div&gt;
 &lt;div class="notice-content"&gt;
&lt;p&gt;So far, we have considered only algorithms for &lt;em&gt;global&lt;/em&gt; alignment.
In this chapter, we consider &lt;em&gt;semi-global&lt;/em&gt; alignment and its variants instead,
where a pattern (query) is searched for in a longer string (reference).
There are many flavours of semi-global alignment, depending on the
(relative) sizes of the inputs. We list these variants, and introduce
some common approaches to solve this problem.&lt;/p&gt;</description></item><item><title>High Throughput Bioinformatics</title><link>https://curiouscoding.nl/posts/throughput/</link><pubDate>Sun, 06 Apr 2025 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/throughput/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#introduction" &gt;Introduction&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.1&lt;/span&gt; &lt;a href="#overview" &gt;Overview&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#compute-bound" &gt;Optimizing Compute Bound Code: Random Minimizers&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.1&lt;/span&gt; &lt;a href="#avoiding-branch-misses" &gt;Avoiding Branch Misses&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.2&lt;/span&gt; &lt;a href="#simd-processing-in-parallel" &gt;SIMD: Processing In Parallel&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.3&lt;/span&gt; &lt;a href="#instruction-level-parallelism" &gt;Instruction Level Parallelism&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.4&lt;/span&gt; &lt;a href="#input-format" &gt;Input Format&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#memory-bound" &gt;Optimizing Memory Bound Code: Minimal Perfect Hashing&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.1&lt;/span&gt; &lt;a href="#using-less-memory" &gt;Using Less Memory&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.2&lt;/span&gt; &lt;a href="#reducing-memory-accesses" &gt;Reducing Memory Accesses&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.3&lt;/span&gt; &lt;a href="#interleaving-memory-accesses" &gt;Interleaving Memory Accesses&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.4&lt;/span&gt; &lt;a href="#batching-streaming-and-prefetching" &gt;Batching, Streaming, and Prefetching&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;This is Chapter 10 of my &lt;a href="https://curiouscoding.nl/posts/thesis/" &gt;thesis&lt;/a&gt; (&lt;a href="#citeproc_bib_item_12"&gt;Groot Koerkamp 2025a&lt;/a&gt;), which introduces the last part, on High Throughput Bioinformatics.&lt;/p&gt;</description></item></channel></rss>