<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Mphf on CuriousCoding</title><link>https://curiouscoding.nl/tags/mphf/</link><description>Recent content in Mphf on CuriousCoding</description><generator>Hugo</generator><language>en</language><lastBuildDate>Thu, 15 Jan 2026 00:00:00 +0100</lastBuildDate><atom:link href="https://curiouscoding.nl/tags/mphf/index.xml" rel="self" type="application/rss+xml"/><item><title>Overview of static data structures</title><link>https://curiouscoding.nl/posts/static-data-structures/</link><pubDate>Wed, 17 Dec 2025 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/static-data-structures/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#classification-of-static-data-structures" &gt;Classification of static data structures&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#space-lower-bounds-and-practical-approaches" &gt;Space lower bounds and practical approaches&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.1&lt;/span&gt; &lt;a href="#rank" &gt;Rank&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.2&lt;/span&gt; &lt;a href="#rank-plus-select" &gt;Rank + Select&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.3&lt;/span&gt; &lt;a href="#minimal-perfect-hash-function--mphf" &gt;Minimal perfect hash function (MPHF)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.4&lt;/span&gt; &lt;a href="#monotone-mphf" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Monotone MPHF&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.5&lt;/span&gt; &lt;a href="#order-preserving-mphf" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Order-preserving MPHF&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.6&lt;/span&gt; &lt;a href="#static-retrieval-static-function-with-static-values" &gt;Static retrieval: Static function with static values&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.7&lt;/span&gt; &lt;a href="#updatable-retrieval-static-function-with-mutable-values" &gt;Updatable retrieval: Static function with mutable values&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.8&lt;/span&gt; &lt;a href="#static-set--membership" &gt;Static set (membership)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.9&lt;/span&gt; &lt;a href="#static-ordered-set" &gt;Static ordered set&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.10&lt;/span&gt; &lt;a href="#static-dictionary-static-keys-and-values" &gt;Static dictionary: static keys and values&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.11&lt;/span&gt; &lt;a href="#updatable-dictionary-with-mutable-values" &gt;Updatable dictionary with mutable values&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.12&lt;/span&gt; &lt;a href="#dynamic-dictionary-with-mutable-keys-and-values" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Dynamic dictionary with mutable keys and values&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.13&lt;/span&gt; &lt;a href="#static-filter" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Static filter&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2.14&lt;/span&gt; &lt;a href="#ordered-static-updatable-dynamic-dictionary" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Ordered static/updatable/dynamic dictionary?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#summary" &gt;Summary table&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;\[
\newcommand{\K}{\mathbb K}
\newcommand{\V}{\mathbb V}
\newcommand{\c}[1]{\mathbf{\mathsf{#1}}}
\]&lt;/p&gt;</description></item><item><title>PtrHash</title><link>https://curiouscoding.nl/slides/ptrhash-text/</link><pubDate>Sun, 20 Jul 2025 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/slides/ptrhash-text/</guid><description>
&lt;h2 id="hashing"&gt;
 Minimal Perfect Hash Functions
 &lt;a class="heading-link" href="#hashing"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;h3 id="h-keys"&gt;
 A set of \(n\) keys
 &lt;a class="heading-link" href="#h-keys"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h3&gt;
&lt;figure class="large"&gt;&lt;img src="https://curiouscoding.nl/ox-hugo/h-keys.svg"&gt;
&lt;/figure&gt;

&lt;h3 id="h-hash"&gt;
 A hash function
 &lt;a class="heading-link" href="#h-hash"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h3&gt;
&lt;figure class="large"&gt;&lt;img src="https://curiouscoding.nl/ox-hugo/h-hash.svg"&gt;
&lt;/figure&gt;

&lt;h3 id="h-collision"&gt;
 A hash function: collisions!
 &lt;a class="heading-link" href="#h-collision"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h3&gt;
&lt;figure class="large"&gt;&lt;img src="https://curiouscoding.nl/ox-hugo/h-collision.svg"&gt;
&lt;/figure&gt;

&lt;h3 id="h-injection"&gt;
 A hash function: injective / &lt;em&gt;perfect&lt;/em&gt;
 &lt;a class="heading-link" href="#h-injection"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h3&gt;
&lt;figure class="large"&gt;&lt;img src="https://curiouscoding.nl/ox-hugo/h-injection.svg"&gt;
&lt;/figure&gt;

&lt;h3 id="h-bijection"&gt;
 A hash function: bijective / &lt;em&gt;minimal &amp;amp; perfect&lt;/em&gt;
 &lt;a class="heading-link" href="#h-bijection"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h3&gt;
&lt;figure class="large"&gt;&lt;img src="https://curiouscoding.nl/ox-hugo/h-bijection.svg"&gt;
&lt;/figure&gt;

&lt;h2 id="problem-statement"&gt;
 Problem statement
 &lt;a class="heading-link" href="#problem-statement"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;Given a set of \(n\) keys \(K\subseteq \mathbb K\), build a function \(h\) satisfying&lt;/p&gt;</description></item><item><title>Thoughts on Consensus MPHF and tiny pointers</title><link>https://curiouscoding.nl/posts/consensus/</link><pubDate>Wed, 12 Feb 2025 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/consensus/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#consensus" &gt;Consensus&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1.1&lt;/span&gt; &lt;a href="#consensus-recsplit" &gt;Consensus-RecSplit&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#idea-consensus-ptrhash" &gt;IDEA: Consensus-PtrHash&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#hashing" &gt;Tiny pointers and optimal open addressing hash tables&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;These are some thoughts on the Consensus-based MPHF presented in
Lehmann et al. (&lt;a href="#citeproc_bib_item_4"&gt;2025&lt;/a&gt;), and how this could be applied to PtrHash:&lt;/p&gt;
&lt;p&gt;Lehmann, Hans-Peter, Peter Sanders, Stefan Walzer, and Jonatan Ziegler. 2025. “Combined Search and Encoding for Seeds, with an Application to Minimal Perfect Hashing.” &lt;i&gt;arXiv&lt;/i&gt;. &lt;a href="https://doi.org/10.48550/ARXIV.2502.05613" class="external-link" target="_blank" rel="noopener"&gt;https://doi.org/10.48550/ARXIV.2502.05613&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Below are also some thoughts on the papers on tiny pointers, used to achieve
hash tables with load factors very close to 1: Bender et al. (&lt;a href="#citeproc_bib_item_1"&gt;2021&lt;/a&gt;), Farach-Colton, Krapivin, and Kuszmaul (&lt;a href="#citeproc_bib_item_2"&gt;2024&lt;/a&gt;).&lt;/p&gt;</description></item><item><title>PtrHash: Minimal Perfect Hashing at RAM Throughput</title><link>https://curiouscoding.nl/posts/ptrhash/</link><pubDate>Mon, 03 Feb 2025 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/ptrhash/</guid><description>&lt;div class="ox-hugo-toc toc has-section-numbers"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#abstract" &gt;Abstract&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;1&lt;/span&gt; &lt;a href="#sec:orgebb9721" &gt;Introduction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;2&lt;/span&gt; &lt;a href="#sec:orgfe4e2e9" &gt;Related work&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3&lt;/span&gt; &lt;a href="#sec:orgce4a522" &gt;PtrHash&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.1&lt;/span&gt; &lt;a href="#sec:org06ce748" &gt;Overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.2&lt;/span&gt; &lt;a href="#sec:construction" &gt;Construction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.3&lt;/span&gt; &lt;a href="#sec:bucket-fn" &gt;Bucket Assignment Functions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;3.4&lt;/span&gt; &lt;a href="#remapping" &gt;Remapping using CacheLineEF&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4&lt;/span&gt; &lt;a href="#sec:orgbf28892" &gt;Results&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.1&lt;/span&gt; &lt;a href="#construction-eval" &gt;Construction&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.1.1&lt;/span&gt; &lt;a href="#sec:orge11d60c" &gt;Bucket Functions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.1.2&lt;/span&gt; &lt;a href="#sec:org9f908d8" &gt;Tuning Parameters for Construction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.1.3&lt;/span&gt; &lt;a href="#sec:orgece074a" &gt;Remap&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;4.2&lt;/span&gt; &lt;a href="#sec:comparison" &gt;Comparison to Other Methods&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span class="section-num"&gt;5&lt;/span&gt; &lt;a href="#sec:org9f032dd" &gt;Conclusions and Future Work&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#sec:throughput" &gt;Appendix A: Query Throughput&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#sec:orgabb5dd4" &gt;Batching and Streaming&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#throughput-evaluation" &gt;Evaluation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#multi-threaded-throughput." &gt;Multi-threaded Throughput.&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#sec:sharding" &gt;Appendix B: Sharding&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#sec:sharding-eval" &gt;Evaluation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;This is the online version of my SEA 2025 paper on PtrHash (&lt;a href="https://doi.org/10.48550/arXiv.2502.15539" class="external-link" target="_blank" rel="noopener"&gt;DOI&lt;/a&gt;, &lt;a href="https://curiouscoding.nl/papers/ptrhash.pdf" &gt;PDF&lt;/a&gt;).
The original development-log can be found &lt;a href="../ptrhash-log" &gt;here&lt;/a&gt;.&lt;/p&gt;</description></item><item><title>Notes on SsHash</title><link>https://curiouscoding.nl/posts/sshash/</link><pubDate>Mon, 15 Jan 2024 00:00:00 +0100</pubDate><guid>https://curiouscoding.nl/posts/sshash/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#paper-summary" &gt;Paper summary&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#intro" &gt;Intro&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#prelims" &gt;Prelims&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#related-work" &gt;Related work&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#sparse-and-skew-hashing" &gt;Sparse and skew hashing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#remarks" &gt;Remarks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#ideas" &gt;Ideas&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;\[\newcommand{\S}{\mathcal{S}}\]&lt;/p&gt;
&lt;h2 id="paper-summary"&gt;
 Paper summary
 &lt;a class="heading-link" href="#paper-summary"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;h3 id="intro"&gt;
 Intro
 &lt;a class="heading-link" href="#intro"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h3&gt;
&lt;p&gt;SsHash (&lt;a href="#citeproc_bib_item_7"&gt;Pibiri 2022&lt;/a&gt;) is a data structure for indexing kmers.
Given a set of kmers \(\S\), it supports two operations:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;\(Lookup(g)\)&lt;/dt&gt;
&lt;dd&gt;return the unique id \(i\in [|\S|]\) of the kmer \(g\).&lt;/dd&gt;
&lt;dt&gt;\(Access(i)\)&lt;/dt&gt;
&lt;dd&gt;return the kmer corresponding to id \(i\).&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;It also supports &lt;em&gt;streaming&lt;/em&gt; queries, looking up all kmers from a longer string
consecutively, by exploiting the overlap between them.&lt;/p&gt;</description></item><item><title>PtrHash: Notes on adapting PTHash in Rust</title><link>https://curiouscoding.nl/posts/ptrhash-log/</link><pubDate>Thu, 21 Sep 2023 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/ptrhash-log/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#questions-and-remarks-on-pthash-paper" &gt;Questions and remarks on PTHash paper&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#ideas-for-improvement" &gt;Ideas for improvement&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#parameters" &gt;Parameters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#align-packed-vectors-to-cachelines" &gt;Align packed vectors to cachelines&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#prefetching" &gt;Prefetching&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#faster-modulo-operations" &gt;Faster modulo operations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#store-dictionary-d-sorted-using-elias-fano-coding" &gt;Store dictionary \(D\) sorted using Elias-Fano coding&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#how-many-bits-of-n-and-hash-entropy-do-we-need" &gt;How many bits of \(n\) and hash entropy do we need?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#ideas-for-faster-construction" &gt;Ideas for faster construction&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#implementation-log" &gt;Implementation log&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#hashing-function" &gt;Hashing function&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#bitpacking-crates" &gt;Bitpacking crates&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#construction" &gt;Construction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#fastmod" &gt;Fastmod&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#try-out-fastdivide-and-reciprocal-crates" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Try out &lt;code&gt;fastdivide&lt;/code&gt; and &lt;code&gt;reciprocal&lt;/code&gt; crates&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#first-benchmark" &gt;First benchmark&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#faster-bucket-computation" &gt;Faster bucket computation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#branchless-for-real-now--aka-the-trick-of-thirds" &gt;Branchless, for real now! (aka the trick-of-thirds)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#compiling-and-benchmarking-pthash" &gt;Compiling and benchmarking PTHash&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#compact-encoding" &gt;Compact encoding&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#find-the-x-differences" &gt;Find the \(x\) differences&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#fastreduce-revisited" &gt;&lt;code&gt;FastReduce&lt;/code&gt; revisited&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#is-there-a-problem-if-gcd--m-n--is-large" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Is there a problem if \(\gcd(m, n)\) is large?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#faster-hashing" &gt;Faster hashing&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#try-xxhash" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Try xxhash&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#an-experiment" &gt;An experiment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#compiler-struggles" &gt;Compiler struggles&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#prefetching-at-last" &gt;Prefetching, at last&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#prefetching-with-vectorization" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Prefetching with vectorization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#inverting-hki" &gt;Inverting \(h(k_i)\)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#another-day-of-progress" &gt;Another day of progress&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#possible-sorting-algorithms" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Possible sorting algorithms&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#diving-into-the-inverse-hash-problem" &gt;Diving into the inverse hash problem&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#bringing-it-home" &gt;Bringing it home&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#hash-inversion-for-faster-pthash-construction" &gt;Hash-inversion for faster PTHash construction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#fast-path-for-small-buckets" &gt;Fast path for small buckets&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#dictionary-encoding" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Dictionary encoding&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#larger-buckets" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Larger buckets&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#prefetching-free-slots" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Prefetching free slots&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#filling-the-last-few-empty-slots-needs-very-high-k-i" &gt;Filling the last few empty slots needs very high \(k_i\)!&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#perfect-matching-for-the-tail" &gt;Perfect matching for the tail&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#peeling-for-size-1-buckets" &gt;Peeling for size-1 buckets&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#greedy-peeling-1-assigning-from-hard-to-easy" &gt;Greedy peeling 1: Assigning from hard to easy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#peeling-and-cuckoo-hashing-for-larger-buckets-dot" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Peeling and cuckoo hashing for larger buckets.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#sunday-morning-ideas" &gt;Sunday morning ideas&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#dinic" &gt;Dinic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#new-iterative-greedy-assignment-idea" &gt;New iterative greedy assignment idea&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#cuckoo-hashing-again" &gt;Cuckoo hashing, again&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#cuckoo-hashing-displacing-for-real-now" &gt;Cuckoo hashing / displacing, &lt;em&gt;for real now&lt;/em&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#displacing-globally" &gt;Displacing globally&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#running-it" &gt;Running it&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#limitations" &gt;Limitations&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#cleanup-and-revisiting-defaults" &gt;Cleanup and revisiting defaults&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#sum-instead-of-xor" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Sum instead of xor?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#revisiting-alpha-1" &gt;Revisiting \(\alpha &amp;lt; 1\)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#elias-fano-for-the-remap-dictionary" &gt;Elias-Fano for the remap-dictionary&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#global-iterative-prioritizing" &gt;Global iterative prioritizing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#cleanup-removing-peeling-and-suboptimal-displacing-code" &gt;Cleanup: removing peeling and suboptimal displacing code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#some-speedups-to-the-displacement-algorithm" &gt;Some speedups to the displacement algorithm&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#runtime-analysis-of-displacement-algorithm" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Runtime analysis of displacement algorithm&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#optimal-prefetching-strategy" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; Optimal prefetching strategy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#are-we-close-to-the-memory-bandwidth" &gt;Are we close to the memory bandwidth?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#more-sorting-algorithm-resources" &gt;More sorting algorithm resources&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#and-some-resources-on-partitioning" &gt;And some resources on partitioning&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#partitioning-to-reduce-memory-latency" &gt;Partitioning to reduce memory latency&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#back-from-a-break" &gt;Back from a break!&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#speeding-up-the-search-for-pilots" &gt;Speeding up the search for pilots&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#multiplyreduce" &gt;&lt;code&gt;MultiplyReduce&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#linux-hugepages" &gt;Linux hugepages?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#dropping-the-bucket-split" &gt;Dropping the bucket split?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#build-performance" &gt;Build performance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#an-alternative" &gt;An alternative&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#query-performance" &gt;Query performance&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#query-memory-bandwidth" &gt;Query memory bandwidth&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#some-more-experiments" &gt;Some more experiments&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#multithreading-benchmark" &gt;Multithreading benchmark&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#multithreading-queries-satisfaction-at-last" &gt;Multithreading queries: satisfaction at last&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#packing-difference-from-expected-position" &gt;Packing difference from expected position&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#local-packing-ideas" &gt;Local packing ideas&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#query-times-for-different-remapping-structures" &gt;Query times for different remapping structures&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#sharding" &gt;Sharding&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#128bit-hashing" &gt;128bit hashing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#varying-the-partition-size" &gt;Varying the partition size&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#ptrhash-part-2" &gt;PtrHash, part 2&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#phobic" &gt;Phobic&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#for-ptrhash" &gt;&lt;span class="org-todo todo TODO"&gt;TODO&lt;/span&gt; for PtrHash&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;\[
%\newcommand{\mm}{\,\%\,}
\newcommand{\mm}{\bmod}
\newcommand{\lxor}{\oplus}
\newcommand{\K}{\mathcal K}
\]&lt;/p&gt;</description></item><item><title>BBHash: some ideas</title><link>https://curiouscoding.nl/posts/bbhash/</link><pubDate>Mon, 04 Sep 2023 00:00:00 +0200</pubDate><guid>https://curiouscoding.nl/posts/bbhash/</guid><description>&lt;div class="ox-hugo-toc toc"&gt;
&lt;div class="heading"&gt;Table of Contents&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#possible-speedup" &gt;Possible speedup?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!--endtoc--&gt;
&lt;p&gt;BBHash (&lt;a href="#citeproc_bib_item_1"&gt;Limasset et al. 2017&lt;/a&gt;) uses multiple &lt;em&gt;layers&lt;/em&gt; to create a minimal perfect
hash function (MPHF) that hashes some input set into \([n]\).&lt;/p&gt;
&lt;p&gt;(See also my &lt;a href="https://curiouscoding.nl/posts/ptrhash/" &gt;note on PTHash&lt;/a&gt; (&lt;a href="#citeproc_bib_item_2"&gt;Pibiri and Trani 2021&lt;/a&gt;).)&lt;/p&gt;
&lt;p&gt;Put simply, it maps the \(n\) elements into \([\gamma \cdot n]\) using a hash function \(h_0\).
The \(k_0\) elements that have collisions are mapped into \([\gamma \cdot k_0]\)
using \(h_1\).
Then, the \(k_1\) elements with collisions are mapped into \([\gamma \cdot k_1]\),
and so on.&lt;/p&gt;</description></item></channel></rss>