<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Highlight on CuriousCoding</title><link>https://curiouscoding.nl/tags/highlight/</link><description>Recent content in Highlight on CuriousCoding</description><generator>Hugo</generator><language>en</language><lastBuildDate>Thu, 26 Feb 2026 00:00:00 +0100</lastBuildDate><atom:link href="https://curiouscoding.nl/tags/highlight/index.xml" rel="self" type="application/rss+xml"/><item><title>Wheeler graphs</title><link>https://curiouscoding.nl/posts/wheeler-graphs/</link><pubDate>Thu, 26 Feb 2026 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/wheeler-graphs/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#deterministic-finite-automaton--dfa" &gt;Deterministic Finite Automaton (DFA)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#wheeler-dfa" &gt;Wheeler-DFA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#linear-graphs-prefix-array" &gt;Linear graphs: Prefix array&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4&lt;/span&gt; &lt;a href="#de-bruijn-graphs-are-wheeler" &gt;De Bruijn graphs are Wheeler&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;5&lt;/span&gt; &lt;a href="#not-all-dfas-are-wheeler" &gt;Not all DFAs are Wheeler&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6&lt;/span&gt; &lt;a href="#locating-patterns-via-binary-search" &gt;Locating patterns via binary search&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;7&lt;/span&gt; &lt;a href="#locating-patterns-via-the-boss-table" &gt;Locating patterns via the BOSS table&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;These are some notes on Wheeler DFAs after chatting with Nicola
Prezza&lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt; and others
from the RAVEN lab at DSB 2026 in Venice.&lt;/p&gt;
&lt;h3 id="deterministic-finite-automaton--dfa"&gt;
 &lt;span class="section-num"&gt;1&lt;/span&gt; Deterministic Finite Automaton (DFA)
 &lt;a class="heading-link" href="#deterministic-finite-automaton--dfa"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h3&gt;
&lt;p&gt;A DFA is a graph where edges are labelled by characters.
Each node can have at most one outgoing edge with each label. (Otherwise it
would be &lt;em&gt;non-deterministic&lt;/em&gt;.)&lt;/p&gt;</description></item><item><title>DEFLATE, gzip, zlib, libz, et al.</title><link>https://curiouscoding.nl/posts/gzip/</link><pubDate>Mon, 09 Feb 2026 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/gzip/</guid><description>&lt;h3 id="file-formats"&gt;
 &lt;span class="section-num"&gt;1&lt;/span&gt; File formats
 &lt;a class="heading-link" href="#file-formats"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h3&gt;
&lt;p&gt;This stuff is all insanely confusing. My summary:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;DEFLATE&lt;/dt&gt;
&lt;dd&gt;The &amp;lsquo;original&amp;rsquo; compression method. Works in 32kB blocks, and for
each stores a small header with the compression mode and an optional Huffman-encoded
dictionary. It applies Lempel-Ziv '77 (LZ77) compression, replacing repeated text
by back-references. It uses a 32kiB context window for this, which may extend
beyond the start of the block.&lt;/dd&gt;
&lt;dt&gt;zlib&lt;/dt&gt;
&lt;dd&gt;An implementation of DEFLATE. The file format wraps the raw deflate
blocks in a header and footer.&lt;/dd&gt;
&lt;dt&gt;gzip (GNU zip)&lt;/dt&gt;
&lt;dd&gt;Another file format around DEFLATE, consisting of a small header containing
e.g. the original file name, then a list of DEFLATE blocks, and lastly a CRC32 checksum.&lt;/dd&gt;
&lt;dt&gt;blocked gzip (BGZF, blocked gzip format)&lt;/dt&gt;
&lt;dd&gt;A file format developed for bioinformatics that is
just multiple GZIP files concatenated. This allows faster compression and
decompression by parallelizing over independent blocks, as well as random
access via a small auxiliary index of block starts.
This is backwards compatible with plain gzip.&lt;/dd&gt;
&lt;/dl&gt;
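&lt;p&gt;To make the difference between the containers concrete, here is a minimal sketch using only Python&amp;rsquo;s standard &lt;code&gt;zlib&lt;/code&gt; and &lt;code&gt;gzip&lt;/code&gt; modules. The payload and the &lt;code&gt;summarize&lt;/code&gt; helper are made up for illustration, and the zlib slicing assumes the default 2-byte header without a preset dictionary. The last check only illustrates the concatenation idea behind blocked gzip; it does not produce real BGZF blocks.&lt;/p&gt;

```python
import gzip
import zlib

data = b"ACGT" * 1000  # hypothetical payload, chosen only for illustration

# Raw DEFLATE stream: a negative wbits value means no header and no trailer.
co = zlib.compressobj(wbits=-15)
raw = co.compress(data) + co.flush()

# zlib container: 2-byte header, DEFLATE stream, 4-byte Adler-32 (big-endian).
z = zlib.compress(data)

# gzip container: header, DEFLATE stream, then CRC32 and input size (little-endian).
gz = gzip.compress(data)


def summarize():
    """Return a dict of named checks on the three containers."""
    return {
        "raw_roundtrip": zlib.decompress(raw, wbits=-15) == data,
        # Stripping the zlib header and trailer leaves a plain DEFLATE stream.
        "zlib_payload_is_deflate": zlib.decompress(z[2:-4], wbits=-15) == data,
        "zlib_adler32": int.from_bytes(z[-4:], "big") == zlib.adler32(data),
        "gzip_magic": gz[:2] == b"\x1f\x8b",
        "gzip_crc32": int.from_bytes(gz[-8:-4], "little") == zlib.crc32(data),
        "gzip_isize": int.from_bytes(gz[-4:], "little") == len(data),
        # Concatenated gzip members decode as one stream.
        "concatenated": gzip.decompress(gz + gz) == data + data,
    }


for name, ok in summarize().items():
    print(name, ok)
```

&lt;p&gt;Each check compares bytes produced by one container against the same payload, so the sketch mostly shows where the headers and checksums sit rather than how the compression itself works.&lt;/p&gt;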
&lt;h3 id="implementations"&gt;
 &lt;span class="section-num"&gt;2&lt;/span&gt; Implementations
 &lt;a class="heading-link" href="#implementations"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h3&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;zlib&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;The original C library.&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;zlib-ng&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Modern C implementation of zlib using SIMD.&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;libz-sys&lt;/code&gt; crate&lt;/dt&gt;
&lt;dd&gt;Rust bindings to &lt;code&gt;zlib&lt;/code&gt; and &lt;code&gt;zlib-ng&lt;/code&gt;.&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;zlib-rs&lt;/code&gt; crate&lt;/dt&gt;
&lt;dd&gt;Pure Rust re-implementation.&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;zlib-rs-sys&lt;/code&gt; crate&lt;/dt&gt;
&lt;dd&gt;zlib-compatible C-API to &lt;code&gt;zlib-rs&lt;/code&gt;&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;flate2&lt;/code&gt; crate&lt;/dt&gt;
&lt;dd&gt;High level Rust crate with uniform bindings to multiple zlib implementations.&lt;/dd&gt;
&lt;/dl&gt;
&lt;h3 id="containers"&gt;
 &lt;span class="section-num"&gt;3&lt;/span&gt; Containers
 &lt;a class="heading-link" href="#containers"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h3&gt;
&lt;figure class="inset large"&gt;&lt;img src="https://curiouscoding.nl/ox-hugo/gzip.svg"
 alt="Figure 1: Overview of the containers in a (blocked) GZIP file. Mim stores checkpoints at the start of DEFLATE blocks, which do not coincide with the start of GZIP blocks!"&gt;&lt;figcaption&gt;
 &lt;p&gt;&lt;span class="figure-number"&gt;Figure 1: &lt;/span&gt;Overview of the containers in a (blocked) GZIP file. Mim stores checkpoints at the start of DEFLATE blocks, which do &lt;em&gt;not&lt;/em&gt; coincide with the start of GZIP blocks!&lt;/p&gt;</description></item><item><title>Releasing Rust SIMD binaries to GitHub, BioConda, and PyPI</title><link>https://curiouscoding.nl/posts/release-flow/</link><pubDate>Sun, 25 Jan 2026 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/release-flow/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#testing" &gt;Testing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#releasing-libraries" &gt;Releasing libraries&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.1&lt;/span&gt; &lt;a href="#changelog" &gt;Changelog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.2&lt;/span&gt; &lt;a href="#cargo-release" &gt;&lt;code&gt;cargo release&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.3&lt;/span&gt; &lt;a href="#cratesio" &gt;crates.io&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#releasing-binaries" &gt;Releasing binaries&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.1&lt;/span&gt; &lt;a href="#cratesio-bin" &gt;Crates.io&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.2&lt;/span&gt; &lt;a href="#avx2" &gt;The pain of AVX2&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.2.1&lt;/span&gt; &lt;a href="#ensure-simd" &gt;&lt;code&gt;ensure_simd&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.3&lt;/span&gt; &lt;a href="#profile-selection" &gt;Profile selection&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.4&lt;/span&gt; &lt;a href="#github" &gt;GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.5&lt;/span&gt; &lt;a href="#binstall" &gt;&lt;code&gt;cargo binstall&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.6&lt;/span&gt; &lt;a href="#pypi" &gt;PyPI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.7&lt;/span&gt; &lt;a href="#bioconda" &gt;Bioconda&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4&lt;/span&gt; &lt;a href="#conclusion" &gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;This post collects the GitHub and BioConda CI configurations I use for
maintaining and releasing Sassy. This includes releasing to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;crates.io (&lt;a href="#cratesio" &gt;libraries&lt;/a&gt;, &lt;a href="#cratesio-bin" &gt;binaries&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;a href="#github" &gt;GitHub releases&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#binstall" &gt;cargo binstall&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#pypi" &gt;PyPI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#bioconda" &gt;Bioconda&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We also have some specific settings to ensure the distributed binaries use &lt;a href="#avx2" &gt;AVX2
SIMD instructions&lt;/a&gt;, as well as some configuration to distribute x86-64 binaries
that transparently switch to an AVX-512-enabled implementation when possible.
(&lt;strong&gt;TODO&lt;/strong&gt;: Write on cargo-multivers for shipping AVX-512 compatible binaries.)&lt;/p&gt;</description></item><item><title>Trying to understand DDR memory</title><link>https://curiouscoding.nl/posts/ddr/</link><pubDate>Tue, 20 Jan 2026 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/ddr/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#questions" &gt;Questions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#a-load-of-articles-blogs-pages-to-read" &gt;A load of articles/blogs/pages to read&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.1&lt;/span&gt; &lt;a href="#wikipedia-articles" &gt;Wikipedia articles&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.2&lt;/span&gt; &lt;a href="#more-posts" &gt;More posts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.3&lt;/span&gt; &lt;a href="#notes" &gt;Notes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.4&lt;/span&gt; &lt;a href="#my-own-ram" &gt;My own RAM&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.5&lt;/span&gt; &lt;a href="#continued-notes" &gt;Continued notes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.6&lt;/span&gt; &lt;a href="#address-mapping-notation" &gt;Address mapping notation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.7&lt;/span&gt; &lt;a href="#intel-spec" &gt;Intel spec&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.8&lt;/span&gt; &lt;a href="#rank-interleaving" &gt;Rank interleaving&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.9&lt;/span&gt; &lt;a href="#nontemporal-reads-writes" &gt;Nontemporal reads/writes&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#remap-using-performance-counters" &gt;reMap: using Performance counters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4&lt;/span&gt; &lt;a href="#sudoku" &gt;Sudoku&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.1&lt;/span&gt; &lt;a href="#step-1-dram-addressing-functions" &gt;Step 1: DRAM addressing functions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.2&lt;/span&gt; &lt;a href="#step-2-row-column-bits" &gt;Step 2: row/column bits&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.3&lt;/span&gt; &lt;a href="#step-3-validation" &gt;Step 3: validation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.4&lt;/span&gt; &lt;a href="#step-4-which-function-is-what" &gt;Step 4: which function is what?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.5&lt;/span&gt; &lt;a href="#refreshes" &gt;Refreshes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.6&lt;/span&gt; &lt;a href="#consecutive-accesses" &gt;Consecutive Accesses&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;5&lt;/span&gt; &lt;a href="#sudoku-now-with-only-1-dimm" &gt;Sudoku, now with only 1 DIMM&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;5.1&lt;/span&gt; &lt;a href="#setup" &gt;setup&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;5.2&lt;/span&gt; &lt;a href="#1-dot-reverse-functions" &gt;1. reverse functions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;5.3&lt;/span&gt; &lt;a href="#2-dot-identify-bits" &gt;2. identify bits&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;5.4&lt;/span&gt; &lt;a href="#3-dot-validate-mapping" &gt;3. validate mapping&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;5.5&lt;/span&gt; &lt;a href="#4-dot-decompose-functions" &gt;4. decompose functions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6&lt;/span&gt; &lt;a href="#results" &gt;Final results&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;7&lt;/span&gt; &lt;a href="#decode-dimms" &gt;&lt;code&gt;decode-dimms&lt;/code&gt;&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;7.1&lt;/span&gt; &lt;a href="#bank-groups" &gt;Bank groups&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;7.2&lt;/span&gt; &lt;a href="#refresh" &gt;Refresh&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;7.3&lt;/span&gt; &lt;a href="#random-access-throughput" &gt;Random access throughput&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;8&lt;/span&gt; &lt;a href="#cpu-benchmarks" &gt;CPU benchmarks&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;8.1&lt;/span&gt; &lt;a href="#cpu-benchmarks" &gt;cpu-benchmarks&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;8.1.1&lt;/span&gt; &lt;a href="#random-access-throughput-1-dimm" &gt;random access throughput 1 DIMM&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;8.1.2&lt;/span&gt; &lt;a href="#random-access-throughput-2-dimm" &gt;random access throughput 2 DIMM&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;8.2&lt;/span&gt; &lt;a href="#memory-read-experiment" &gt;memory-read-experiment&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;8.2.1&lt;/span&gt; &lt;a href="#strided-reading-1-dimm" &gt;strided reading 1 DIMM&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;8.2.2&lt;/span&gt; &lt;a href="#strided-reading-2-dimm" &gt;strided reading 2 DIMM&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;9&lt;/span&gt; &lt;a href="#tinymembench" &gt;&lt;code&gt;tinymembench&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;10&lt;/span&gt; &lt;a href="#remaining-questions" &gt;Remaining questions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;These are chronological (and thus, only lightly organized) notes on my attempt to
understand how DDR4 and DDR5 RAM memory work.&lt;/p&gt;</description></item><item><title>Asymptotic elevators</title><link>https://curiouscoding.nl/posts/asymptotic-elevators/</link><pubDate>Mon, 22 Dec 2025 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/asymptotic-elevators/</guid><description>&lt;p&gt;I was listening to an episode of the &lt;em&gt;well there&amp;rsquo;s your problem&lt;/em&gt; podcast about
pencil towers (&lt;a href="https://www.youtube.com/watch?v=BvMYplJ59TE&amp;amp;t=11297s" class="external-link" target="_blank" rel="noopener"&gt;youtube&lt;/a&gt;), and it had a section on how elevators are a problem because they
require a lot of space. So here&amp;rsquo;s a mathematical version of that.&lt;/p&gt;
&lt;h3 id="problem-statement"&gt;
 Problem statement
 &lt;a class="heading-link" href="#problem-statement"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Given are \(n\) people that need to go to floors \(1, 2, \dots, n\).&lt;/li&gt;
&lt;li&gt;Elevators have constant acceleration, and must be standing still to
enter/exit.&lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;li&gt;Possible elevator configurations:
&lt;ol&gt;
&lt;li&gt;single elevator over entire height&lt;/li&gt;
&lt;li&gt;partition the height in disjoint intervals, and then one elevator per interval&lt;/li&gt;
&lt;li&gt;double-deck: a single elevator that is \(h\) floors high&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Not&lt;/em&gt; allowed: two free-moving elevators above each other that make sure to
never bump into each other.&lt;/li&gt;
&lt;li&gt;Elevators have infinite capacity.&lt;/li&gt;
&lt;li&gt;There are \(k\) elevator shafts.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Question:&lt;/strong&gt; How much total travel time do you need to get everyone home, if
everybody arrives at the same time?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Harder(?):&lt;/strong&gt; What if the people arrive in a random permutation (1 per time
step), and their clock starts ticking as soon as they arrive?&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="observations"&gt;
 Observations
 &lt;a class="heading-link" href="#observations"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Going \(n\) floors up takes at least \(\Omega(\sqrt n)\) time.&lt;/li&gt;
&lt;li&gt;Total travel time is at least \(\sum_{i=1}^n \Omega(\sqrt i) = \Omega(n \sqrt n)\)&lt;/li&gt;
&lt;li&gt;\(1\) elevator carrying everyone going 1 step at a time: \(O(n^2)\) total time&lt;/li&gt;
&lt;li&gt;$2$-elevator sqrt-decomposition: one elevator stops every \(\sqrt n\) floors,
and then a (set of) second elevators for the final up to \(\sqrt n\) floors.
&lt;ul&gt;
&lt;li&gt;first elevator: \(n\) people times \(n/(\sqrt n)/2\) &amp;lsquo;big steps&amp;rsquo; on average
times \(\sqrt{\sqrt n}\) time per big step is \(O(n^{7/4})\)&lt;/li&gt;
&lt;li&gt;second elevator: \(n\) people times \((\sqrt n)\) &amp;lsquo;small steps&amp;rsquo; on average
times \(1\) per small step is \(O(n^{3/2})\)&lt;/li&gt;
&lt;li&gt;so overall the big steps dominate&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;$2$-elevator, one that stops every \(B\) floors and then \(B\) that stop every
one floor:
&lt;ul&gt;
&lt;li&gt;the first one: \(n \cdot (n/B) \cdot \sqrt B = n^2/\sqrt B\)&lt;/li&gt;
&lt;li&gt;the second one: \(n \cdot B = nB\).&lt;/li&gt;
&lt;li&gt;solve for equality: \(B = n/\sqrt B\) =&amp;gt; \(B = n^{2/3}\)&lt;/li&gt;
&lt;li&gt;so \(n^{1+2/3}\) solution&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;\(\lg n\) elevator binary tree:
&lt;ul&gt;
&lt;li&gt;\(2^k \leq n \leq 2^{k+1}\)&lt;/li&gt;
&lt;li&gt;Take first elevator to \(2^k\) if needed: time \(\sqrt{2^k} = O(\sqrt n)\)&lt;/li&gt;
&lt;li&gt;There take second elevator that goes up \(2^{k-1}\) floors if needed.&lt;/li&gt;
&lt;li&gt;Then \(2^{k-2}\) levels up&lt;/li&gt;
&lt;li&gt;and so on until the last floor.&lt;/li&gt;
&lt;li&gt;Total time per person averages \(\sqrt{2^k}/2 + \sqrt{2^{k-1}}/2 + \dots + \sqrt{2}/2 + \sqrt{1}/2 =
O(\sqrt{2^k}) = O(\sqrt n)\), so up-to-a-constant optimal total travel
time \(O(n \sqrt n)\).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
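&lt;p&gt;The two-level balance and the binary-tree sum above can be checked numerically. This is a rough sketch that ignores constant factors, just like the estimates above; the function names are made up for illustration. Note that minimizing the sum \(n^2/\sqrt B + nB\) exactly gives \(B = (n/2)^{2/3}\), which differs from the equal-terms choice \(B = n^{2/3}\) only by a constant factor.&lt;/p&gt;

```python
import math


def two_level_cost(n, b):
    # First elevator: n people, about n/b big steps, sqrt(b) time per big step.
    # Second elevators: n people, about b small steps of unit time each.
    return n * (n / b) * math.sqrt(b) + n * b


def best_block_size(n):
    # Brute-force the integer block size that minimizes the two-level cost.
    return min(range(1, n + 1), key=lambda b: two_level_cost(n, b))


def binary_tree_time(k):
    # Average per-person time: sqrt(2^k)/2 + sqrt(2^(k-1))/2 + ... + sqrt(1)/2.
    return sum(math.sqrt(2 ** j) / 2 for j in range(k + 1))


n = 10 ** 4
b = best_block_size(n)
# Compare the brute-force optimum with the two closed-form candidates.
print(b, (n / 2) ** (2 / 3), n ** (2 / 3))
# The ratio to sqrt(2^k) stays bounded, so the per-person time is O(sqrt n).
print(binary_tree_time(20) / math.sqrt(2 ** 20))
```

&lt;p&gt;Since the cost is convex in \(B\), the brute-force optimum lands next to \((n/2)^{2/3}\), and the binary-tree ratio stays below a small constant as \(k\) grows.&lt;/p&gt;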
&lt;p&gt;&lt;strong&gt;Open questions:&lt;/strong&gt;&lt;/p&gt;</description></item><item><title>Overview of static data structures</title><link>https://curiouscoding.nl/posts/static-data-structures/</link><pubDate>Wed, 17 Dec 2025 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/static-data-structures/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#classification-of-static-data-structures" &gt;Classification of static data structures&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#space-lower-bounds-and-practical-approaches" &gt;Space lower bounds and practical approaches&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.1&lt;/span&gt; &lt;a href="#rank" &gt;Rank&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.2&lt;/span&gt; &lt;a href="#rank-plus-select" &gt;Rank + Select&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.3&lt;/span&gt; &lt;a href="#minimal-perfect-hash-function--mphf" &gt;Minimal perfect hash function (MPHF)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.4&lt;/span&gt; &lt;a href="#monotone-mphf" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Monotone MPHF&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.5&lt;/span&gt; &lt;a href="#order-preserving-mphf" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Order-preserving MPHF&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.6&lt;/span&gt; &lt;a href="#static-retrieval-static-function-with-static-values" &gt;Static retrieval: Static function with static values&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.7&lt;/span&gt; &lt;a href="#updatable-retrieval-static-function-with-mutable-values" &gt;Updatable retrieval: Static function with mutable values&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.8&lt;/span&gt; &lt;a href="#static-set--membership" &gt;Static set (membership)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.9&lt;/span&gt; &lt;a href="#static-ordered-set" &gt;Static ordered set&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.10&lt;/span&gt; &lt;a href="#static-dictionary-static-keys-and-values" &gt;Static dictionary: static keys and values&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.11&lt;/span&gt; &lt;a href="#updatable-dictionary-with-mutable-values" &gt;Updatable dictionary with mutable values&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.12&lt;/span&gt; &lt;a href="#dynamic-dictionary-with-mutable-keys-and-values" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Dynamic dictionary with mutable keys and values&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.13&lt;/span&gt; &lt;a href="#static-filter" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Static filter&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.14&lt;/span&gt; &lt;a href="#ordered-static-updatable-dynamic-dictionary" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Ordered static/updatable/dynamic dictionary?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#summary" &gt;Summary table&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;\[
\newcommand{\K}{\mathbb K}
\newcommand{\V}{\mathbb V}
\newcommand{\c}[1]{\mathbf{\mathsf{#1}}}
\]&lt;/p&gt;</description></item><item><title>Distributing Rust SIMD Binaries</title><link>https://curiouscoding.nl/posts/distributing-rust-simd-binaries/</link><pubDate>Thu, 20 Nov 2025 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/distributing-rust-simd-binaries/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#what-s-inside" &gt;What&amp;rsquo;s inside&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#compile-time-feature-detection" &gt;Compile-time feature detection&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.1&lt;/span&gt; &lt;a href="#other-solutions" &gt;Other solutions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#rust-s-default-target-cpu" &gt;Rust&amp;rsquo;s default target-cpu&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.1&lt;/span&gt; &lt;a href="#cargo-build-vs-cargo-install" &gt;&lt;code&gt;cargo build&lt;/code&gt; vs &lt;code&gt;cargo install&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4&lt;/span&gt; &lt;a href="#hardware-support" &gt;Hardware support&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;5&lt;/span&gt; &lt;a href="#distributing-binaries" &gt;Distributing binaries&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;5.1&lt;/span&gt; &lt;a href="#github-releases" &gt;GitHub Releases&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;5.2&lt;/span&gt; &lt;a href="#bioconda" &gt;Bioconda&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;5.3&lt;/span&gt; &lt;a href="#pypi" &gt;Pypi&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6&lt;/span&gt; &lt;a href="#open-questions" &gt;Open questions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;Many of my Rust crates, and the binaries built on them, use SIMD instructions. Notably:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;packed-seq&lt;/code&gt;, a library for 2-bit encoding DNA (&lt;a href="https://github.com/rust-seq/packed-seq" class="external-link" target="_blank" rel="noopener"&gt;gh&lt;/a&gt;, &lt;a href="https://docs.rs/packed-seq/latest/packed_seq/" class="external-link" target="_blank" rel="noopener"&gt;docs.rs&lt;/a&gt;),&lt;/li&gt;
&lt;li&gt;&lt;code&gt;simd-minimizers&lt;/code&gt;, a library for fast computation of &lt;em&gt;minimizers&lt;/em&gt; (&lt;a href="https://github.com/rust-seq/simd-minimizers" class="external-link" target="_blank" rel="noopener"&gt;gh&lt;/a&gt;, &lt;a href="https://docs.rs/simd-minimizers/latest/simd_minimizers/" class="external-link" target="_blank" rel="noopener"&gt;docs.rs&lt;/a&gt;, &lt;a href="https://doi.org/10.4230/LIPIcs.SEA.2025.20" class="external-link" target="_blank" rel="noopener"&gt;paper&lt;/a&gt;),&lt;/li&gt;
&lt;li&gt;&lt;code&gt;deacon&lt;/code&gt;, a tool for fast decontamination of reads (&lt;a href="https://github.com/bede/deacon" class="external-link" target="_blank" rel="noopener"&gt;gh&lt;/a&gt;, &lt;a href="https://doi.org/10.1101/2025.06.09.658732" class="external-link" target="_blank" rel="noopener"&gt;preprint&lt;/a&gt;),&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sassy&lt;/code&gt;, a SIMD-based approximate string searching library and binary (&lt;a href="https://github.com/RagnarGrootKoerkamp/sassy" class="external-link" target="_blank" rel="noopener"&gt;gh&lt;/a&gt;, &lt;a href="https://docs.rs/sassy/latest/sassy/" class="external-link" target="_blank" rel="noopener"&gt;docs.rs&lt;/a&gt;, &lt;a href="https://doi.org/10.1101/2025.07.22.666207" class="external-link" target="_blank" rel="noopener"&gt;preprint&lt;/a&gt;),&lt;/li&gt;
&lt;li&gt;&lt;code&gt;barbell&lt;/code&gt;, a demultiplexing tool based on sassy (&lt;a href="https://github.com/rickbeeloo/barbell" class="external-link" target="_blank" rel="noopener"&gt;gh&lt;/a&gt;, &lt;a href="https://doi.org/10.1101/2025.10.22.683865" class="external-link" target="_blank" rel="noopener"&gt;preprint&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;all support fast algorithms based on both x86-64 (x64, henceforth) AVX2 and
aarch64 NEON instructions. The question of this post is: how to effectively
distribute binaries using these libraries?&lt;/p&gt;</description></item><item><title>Three log scientist</title><link>https://curiouscoding.nl/posts/three-log-scientist/</link><pubDate>Tue, 12 Aug 2025 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/three-log-scientist/</guid><description>&lt;p&gt;A rating system for theoretical computer scientists.
The more logarithms there are (i.e. the more &amp;ldquo;\(\log\)&amp;rdquo; before your variables),
the higher your reputation will be.
No-log theoretical computer scientists are virtually non-existent, as virtually
all non-trivial algorithms require use of logarithms.
Most are one-log scientists.
In the old times (well, I&amp;rsquo;m young, so these look like old times to me at least), one would occasionally find a piece of code done by a three-log scientist and shiver with awe.&lt;/p&gt;</description></item><item><title>SimdSketch: a fast bucket sketch</title><link>https://curiouscoding.nl/posts/simd-sketch/</link><pubDate>Sun, 09 Mar 2025 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/simd-sketch/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#jaccard-similarity" &gt;Jaccard similarity&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#hash-schemes" &gt;Hash schemes&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.1&lt;/span&gt; &lt;a href="#minhash" &gt;MinHash&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.2&lt;/span&gt; &lt;a href="#s-mins-sketch" &gt;$s$-mins sketch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.3&lt;/span&gt; &lt;a href="#bottom-s" &gt;Bottom-\(s\) sketch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.4&lt;/span&gt; &lt;a href="#fracminhash" &gt;FracMinHash&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.5&lt;/span&gt; &lt;a href="#bucket-sketch" &gt;Bucket sketch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.6&lt;/span&gt; &lt;a href="#mod-bucket-hash--new" &gt;Mod-bucket hash (new?)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.7&lt;/span&gt; &lt;a href="#variants" &gt;Variants&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#compressing-sketches" &gt;Compressing sketches&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.1&lt;/span&gt; &lt;a href="#b-bit-hashing" &gt;$b$-bit hashing&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.1.1&lt;/span&gt; &lt;a href="#accounting-for-collisions" &gt;Accounting for collisions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.2&lt;/span&gt; &lt;a href="#hyperminhash" &gt;HyperMinHash&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4&lt;/span&gt; &lt;a href="#densification-strategies" &gt;Densification strategies&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;5&lt;/span&gt; &lt;a href="#simdsketch" &gt;SimdSketch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6&lt;/span&gt; &lt;a href="#evaluation" &gt;Evaluation&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6.1&lt;/span&gt; &lt;a href="#setup" &gt;Setup&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6.1.1&lt;/span&gt; &lt;a href="#tools" &gt;Tools&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6.1.2&lt;/span&gt; &lt;a href="#inputs" &gt;Inputs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6.1.3&lt;/span&gt; &lt;a href="#parameters" &gt;Parameters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6.1.4&lt;/span&gt; &lt;a href="#metrics" &gt;Metrics&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6.2&lt;/span&gt; &lt;a href="#raw-results" &gt;Raw results&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6.3&lt;/span&gt; &lt;a href="#correlation" &gt;Correlation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6.4&lt;/span&gt; &lt;a href="#comparison-speed" &gt;Comparison speed&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;6.5&lt;/span&gt; &lt;a href="#low-similarity-data" &gt;Low-similarity data&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;7&lt;/span&gt; &lt;a href="#discussion" &gt;Discussion&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;8&lt;/span&gt; &lt;a href="#future-work" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; / Future work&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;\[
\newcommand{\sketch}{\mathsf{sketch}}
\]&lt;/p&gt;</description></item><item><title>Thoughts on Consensus MPHF and tiny pointers</title><link>https://curiouscoding.nl/posts/consensus/</link><pubDate>Wed, 12 Feb 2025 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/consensus/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#consensus" &gt;Consensus&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.1&lt;/span&gt; &lt;a href="#consensus-recsplit" &gt;Consensus-RecSplit&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#idea-consensus-ptrhash" &gt;IDEA: Consensus-PtrHash&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#hashing" &gt;Tiny pointers and optimal open addressing hash tables&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;These are some thoughts on the Consensus-based MPHF presented in
Lehmann et al. (&lt;a href="#citeproc_bib_item_4"&gt;2025&lt;/a&gt;), and how this could be applied to PtrHash:&lt;/p&gt;
&lt;p&gt;Lehmann, Hans-Peter, Peter Sanders, Stefan Walzer, and Jonatan Ziegler. 2025. “Combined Search and Encoding for Seeds, with an Application to Minimal Perfect Hashing.” &lt;i&gt;arXiv&lt;/i&gt;. &lt;a href="https://doi.org/10.48550/ARXIV.2502.05613" class="external-link" target="_blank" rel="noopener"&gt;https://doi.org/10.48550/ARXIV.2502.05613&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Below are also some thoughts on the papers on tiny pointers, used to achieve
hash tables with load factors very close to 1: Bender et al. (&lt;a href="#citeproc_bib_item_1"&gt;2021&lt;/a&gt;) and Farach-Colton, Krapivin, and Kuszmaul (&lt;a href="#citeproc_bib_item_2"&gt;2024&lt;/a&gt;).&lt;/p&gt;</description></item><item><title>PtrHash: Notes on adapting PTHash in Rust</title><link>https://curiouscoding.nl/posts/ptrhash-log/</link><pubDate>Thu, 21 Sep 2023 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/ptrhash-log/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#questions-and-remarks-on-pthash-paper" &gt;Questions and remarks on PTHash paper&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#ideas-for-improvement" &gt;Ideas for improvement&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#parameters" &gt;Parameters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#align-packed-vectors-to-cachelines" &gt;Align packed vectors to cachelines&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#prefetching" &gt;Prefetching&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#faster-modulo-operations" &gt;Faster modulo operations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#store-dictionary-d-sorted-using-elias-fano-coding" &gt;Store dictionary \(D\) sorted using Elias-Fano coding&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#how-many-bits-of-n-and-hash-entropy-do-we-need" &gt;How many bits of \(n\) and hash entropy do we need?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#ideas-for-faster-construction" &gt;Ideas for faster construction&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#implementation-log" &gt;Implementation log&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#hashing-function" &gt;Hashing function&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#bitpacking-crates" &gt;Bitpacking crates&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#construction" &gt;Construction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#fastmod" &gt;Fastmod&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#try-out-fastdivide-and-reciprocal-crates" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Try out &lt;code&gt;fastdivide&lt;/code&gt; and &lt;code&gt;reciprocal&lt;/code&gt; crates&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#first-benchmark" &gt;First benchmark&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#faster-bucket-computation" &gt;Faster bucket computation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#branchless-for-real-now--aka-the-trick-of-thirds" &gt;Branchless, for real now! (aka the trick-of-thirds)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#compiling-and-benchmarking-pthash" &gt;Compiling and benchmarking PTHash&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#compact-encoding" &gt;Compact encoding&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#find-the-x-differences" &gt;Find the \(x\) differences&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#fastreduce-revisited" &gt;&lt;code&gt;FastReduce&lt;/code&gt; revisited&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#is-there-a-problem-if-gcd--m-n--is-large" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Is there a problem if \(\gcd(m, n)\) is large?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#faster-hashing" &gt;Faster hashing&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#try-xxhash" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Try xxhash&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#an-experiment" &gt;An experiment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#compiler-struggles" &gt;Compiler struggles&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#prefetching-at-last" &gt;Prefetching, at last&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#prefetching-with-vectorization" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Prefetching with vectorization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#inverting-hki" &gt;Inverting \(h(k_i)\)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#another-day-of-progress" &gt;Another day of progress&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#possible-sorting-algorithms" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Possible sorting algorithms&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#diving-into-the-inverse-hash-problem" &gt;Diving into the inverse hash problem&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#bringing-it-home" &gt;Bringing it home&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#hash-inversion-for-faster-pthash-construction" &gt;Hash-inversion for faster PTHash construction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#fast-path-for-small-buckets" &gt;Fast path for small buckets&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#dictionary-encoding" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Dictionary encoding&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#larger-buckets" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Larger buckets&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#prefetching-free-slots" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Prefetching free slots&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#filling-the-last-few-empty-slots-needs-very-high-k-i" &gt;Filling the last few empty slots needs very high \(k_i\)!&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#perfect-matching-for-the-tail" &gt;Perfect matching for the tail&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#peeling-for-size-1-buckets" &gt;Peeling for size-1 buckets&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#greedy-peeling-1-assigning-from-hard-to-easy" &gt;Greedy peeling 1: Assigning from hard to easy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#peeling-and-cuckoo-hashing-for-larger-buckets-dot" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Peeling and cuckoo hashing for larger buckets.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#sunday-morning-ideas" &gt;Sunday morning ideas&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#dinic" &gt;Dinic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#new-iterative-greedy-assignment-idea" &gt;New iterative greedy assignment idea&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#cuckoo-hashing-again" &gt;Cuckoo hashing, again&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#cuckoo-hashing-displacing-for-real-now" &gt;Cuckoo hashing / displacing, &lt;em&gt;for real now&lt;/em&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#displacing-globally" &gt;Displacing globally&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#running-it" &gt;Running it&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#limitations" &gt;Limitations&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#cleanup-and-revisiting-defaults" &gt;Cleanup and revisiting defaults&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#sum-instead-of-xor" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Sum instead of xor?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#revisiting-alpha-1" &gt;Revisiting \(\alpha &amp;lt; 1\)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#elias-fano-for-the-remap-dictionary" &gt;Elias-Fano for the remap-dictionary&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#global-iterative-prioritizing" &gt;Global iterative prioritizing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#cleanup-removing-peeling-and-suboptimal-displacing-code" &gt;Cleanup: removing peeling and suboptimal displacing code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#some-speedups-to-the-displacement-algorithm" &gt;Some speedups to the displacement algorithm&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#runtime-analysis-of-displacement-algorithm" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Runtime analysis of displacement algorithm&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#optimal-prefetching-strategy" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Optimal prefetching strategy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#are-we-close-to-the-memory-bandwidth" &gt;Are we close to the memory bandwidth?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#more-sorting-algorithm-resources" &gt;More sorting algorithm resources&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#and-some-resources-on-partitioning" &gt;And some resources on partitioning&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#partitioning-to-reduce-memory-latency" &gt;Partitioning to reduce memory latency&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#back-from-a-break" &gt;Back from a break!&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#speeding-up-the-search-for-pilots" &gt;Speeding up the search for pilots&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#multiplyreduce" &gt;&lt;code&gt;MultiplyReduce&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#linux-hugepages" &gt;Linux hugepages?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#dropping-the-bucket-split" &gt;Dropping the bucket split?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#build-performance" &gt;Build performance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#an-alternative" &gt;An alternative&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#query-performance" &gt;Query performance&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#query-memory-bandwidth" &gt;Query memory bandwidth&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#some-more-experiments" &gt;Some more experiments&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#multithreading-benchmark" &gt;Multithreading benchmark&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#multithreading-queries-satisfaction-at-last" &gt;Multithreading queries: satisfaction at last&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#packing-difference-from-expected-position" &gt;Packing difference from expected position&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#local-packing-ideas" &gt;Local packing ideas&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#query-times-for-different-remapping-structures" &gt;Query times for different remapping structures&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#sharding" &gt;Sharding&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#128bit-hashing" &gt;128bit hashing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#varying-the-partition-size" &gt;Varying the partition size&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#ptrhash-part-2" &gt;PtrHash, part 2&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#phobic" &gt;Phobic&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#for-ptrhash" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; for PtrHash&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;\[
%\newcommand{\mm}{\,\%\,}
\newcommand{\mm}{\bmod}
\newcommand{\lxor}{\oplus}
\newcommand{\K}{\mathcal K}
\]&lt;/p&gt;</description></item><item><title>String algorithm visualizations</title><link>https://curiouscoding.nl/posts/alg-viz/</link><pubDate>Tue, 08 Nov 2022 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/alg-viz/</guid><description>&lt;ol&gt;
&lt;li&gt;Select the algorithm to visualize&lt;/li&gt;
&lt;li&gt;Click the buttons, or click the canvas and use the indicated keys&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Suffix-array construction is explained &lt;a href="https://curiouscoding.nl/posts/suffix-array-construction/" &gt;here&lt;/a&gt; and BWT is explained &lt;a href="https://curiouscoding.nl/posts/bwt/" &gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Source code is &lt;a href="https://github.com/RagnarGrootKoerkamp/alg-viz" class="external-link" target="_blank" rel="noopener"&gt;on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;script defer src="https://curiouscoding.nl/js/alg-viz.js" type="module"&gt;&lt;/script&gt;
&lt;div class="controls"&gt;
&lt;label for="algorithm"&gt;Algorithm&lt;/label&gt;
&lt;select name="algorithm" id="algorithm"&gt;
 &lt;option value="suffix-array"&gt;Suffix Array Construction&lt;/option&gt;
 &lt;option value="bwt"&gt;Burrows-Wheeler Transform&lt;/option&gt;
 &lt;option value="bibwt"&gt;Bidirectional BWT&lt;/option&gt;
&lt;/select&gt;
&lt;br/&gt;
&lt;label for="string"&gt;String&lt;/label&gt; &lt;input type="string" name="string" id="string"/&gt;&lt;br/&gt;
&lt;label for="query"&gt;Query&lt;/label&gt; &lt;input type="string" name="query" id="query"/&gt;&lt;br/&gt;
&lt;button class="button-primary" id="prev"&gt;prev (←/backspace)&lt;/button&gt;
&lt;button class="button-primary" id="next"&gt;next (→/space)&lt;/button&gt;
&lt;br/&gt;
&lt;label for="delay"&gt;Delay (s)&lt;/label&gt; &lt;input type="number" name="delay" id="delay" value="0.8"/&gt;&lt;br/&gt;
&lt;button class="button-primary" id="faster"&gt;faster (↑/+/f)&lt;/button&gt;
&lt;button class="button-primary" id="slower"&gt;slower (↓/-/s)&lt;/button&gt;
&lt;button class="button-primary" id="pauseplay"&gt;pause/play (p/return)&lt;/button&gt;
&lt;/div&gt;
&lt;div class="canvas"&gt;
&lt;canvas id="canvas" tabindex='1' width="1600" height="1200"&gt;&lt;/canvas&gt;
&lt;/div&gt;</description></item><item><title>Revised Oxford Bioinformatics latex template</title><link>https://curiouscoding.nl/posts/bioinformatics-template/</link><pubDate>Thu, 22 Sep 2022 12:13:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/bioinformatics-template/</guid><description>&lt;p&gt;I made an improved version of the Oxford Bioinformatics LaTeX template. See the &lt;a href="https://github.com/RagnarGrootKoerkamp/oxford-bioinformatics-template" class="external-link" target="_blank" rel="noopener"&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt;</description></item><item><title>Benchmark attention points</title><link>https://curiouscoding.nl/posts/benchmarks/</link><pubDate>Thu, 28 Apr 2022 23:33:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/benchmarks/</guid><description>&lt;p&gt;&lt;em&gt;Benchmarking is harder than you think, even when taking into account this rule.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This post lists some lessons I learned while attempting to run benchmarks for
&lt;a href="https://github.com/RagnarGrootKoerkamp/astar-pairwise-aligner" class="external-link" target="_blank" rel="noopener"&gt;A* pairwise aligner&lt;/a&gt;. I was doing this on a laptop, which likely has different
characteristics from CPUs in a typical server rack. All the programs I run are
single-threaded.&lt;/p&gt;
&lt;h2 id="hardware"&gt;
 Hardware
 &lt;a class="heading-link" href="#hardware"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;Do not run while charging the laptop&lt;/dt&gt;
&lt;dd&gt;Charging makes the battery hot and causes throttling. Run either on
battery power or with a completely full battery to prevent this.&lt;/dd&gt;
&lt;dt&gt;Disable hyperthreading&lt;/dt&gt;
&lt;dd&gt;Completely disable hyperthreading in the BIOS.
Multiple programs running on the same core may fight for resources.&lt;/dd&gt;
&lt;/dl&gt;
&lt;h2 id="cpu-settings"&gt;
 CPU settings
 &lt;a class="heading-link" href="#cpu-settings"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;Pin CPU frequency&lt;/dt&gt;
&lt;dd&gt;CPUs, especially those in laptops, have turbo boost, (thermal) throttling, and
power-saving features. Make sure to pin the CPU core frequency low enough that it
can be sustained for long periods without throttling.
&lt;p&gt;In my case, the &lt;code&gt;performance&lt;/code&gt; governor can fix the CPU frequency. The base
frequency of my CPU is &lt;code&gt;2.6GHz&lt;/code&gt;, so that&amp;rsquo;s where I pinned it.&lt;/p&gt;</description></item><item><title>28000x speedup with Numba.CUDA</title><link>https://curiouscoding.nl/posts/numba-cuda-speedup/</link><pubDate>Mon, 24 May 2021 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/numba-cuda-speedup/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#cuda-overview" &gt;CUDA Overview&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#profiling" &gt;Profiling&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#optimizing-tensor-sketch" &gt;Optimizing Tensor Sketch&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#cpu-code" &gt;CPU code&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#v0-original-python-code" &gt;V0: Original python code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#v1-numba" &gt;V1: Numba&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#v2-multithreading" &gt;V2: Multithreading&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#gpu-code" &gt;GPU code&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#v3-a-first-gpu-version" &gt;V3: A first GPU version&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#v4-parallel-kernel-invocations" &gt;V4: Parallel kernel invocations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#v5-single-kernel-with-many-blocks" &gt;V5: Single kernel with many blocks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#v6-detailed-profiling-kernel-compute" &gt;V6: Detailed profiling: Kernel Compute&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#v7-detailed-profiling-kernel-latency" &gt;V7: Detailed profiling: Kernel Latency&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#v8-detailed-profiling-shared-memory-access-pattern" &gt;V8: Detailed profiling: Shared Memory Access Pattern&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#v9-more-work-per-thread" &gt;V9: More work per thread&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#v10-cache-seq-to-shared-memory" &gt;V10: Cache seq to shared memory&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#v11-hashes-and-signs-in-shared-memory" &gt;V11: Hashes and signs in shared memory&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#v12-revisiting-blocks-per-kernel" &gt;V12: Revisiting blocks per kernel&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#v13-passing-a-tuple-of-sequences" &gt;V13: Passing a tuple of sequences&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#v14-better-hardware" &gt;V14: Better hardware&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#v15-dynamic-shared-memory" &gt;V15: Dynamic shared memory&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#wrap-up" &gt;Wrap up&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;&lt;strong&gt;Backlinks&lt;/strong&gt;: &lt;a href="https://www.reddit.com/r/CUDA/comments/mq1yrm/28000x_speedup_with_numbacuda/" class="external-link" target="_blank" rel="noopener"&gt;r/CUDA&lt;/a&gt;, &lt;a href="https://numba.discourse.group/t/blog-28000x-speedup-with-numba-cuda/667" class="external-link" target="_blank" rel="noopener"&gt;Numba discourse&lt;/a&gt;&lt;/p&gt;</description></item></channel></rss>