Nthash on CuriousCoding
https://curiouscoding.nl/tags/nthash/

Computing random minimizers, fast
Fri, 12 Jul 2024
https://curiouscoding.nl/posts/fast-minimizers/

Perfect NtHash for Robust Minimizers
Sun, 31 Dec 2023
https://curiouscoding.nl/posts/nthash/

NtHash

NtHash (Mohamadi et al. 2016) is a rolling hash that was originally designed for DNA but is suitable for hashing any kind of text. For a string \(x\) of length \(k\), it is a \(64\)-bit value computed as:
\begin{equation} h(x) = \bigoplus_{i=0}^{k-1} rot^i(h(x_i)) \end{equation}
where \(h(x_i)\) assigns a fixed random \(64\)-bit value to each character \(x_i\), \(rot^i\) rotates the bits by \(i\) positions, and \(\bigoplus\) denotes the xor of all terms.
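As a minimal sketch of the formula above, the following Python code computes the hash term by term and also rolls it along a text. Several things here are assumptions rather than part of the definition: the rotation direction (left), the character table (seeded random values over `ACGT`, not the constants used by the real NtHash), and the helper names `rotl`, `rotr1`, `make_table`, `nthash`, and `rolling_nthash`. The rolling update is derived from the fact that rotation distributes over xor: xor out the first character, xor in the new character rotated by \(k\), then rotate everything back by one position.

```python
import random

MASK = (1 << 64) - 1  # keep all values within 64 bits


def rotl(v, r):
    """Rotate the 64-bit value v left by r positions (assumed direction)."""
    r %= 64
    return ((v << r) | (v >> (64 - r))) & MASK


def rotr1(v):
    """Rotate the 64-bit value v right by one position."""
    return ((v >> 1) | (v << 63)) & MASK


def make_table(alphabet="ACGT", seed=0):
    """Hypothetical character table: one fixed random 64-bit value per character."""
    rng = random.Random(seed)
    return {c: rng.getrandbits(64) for c in alphabet}


def nthash(kmer, table):
    """Direct evaluation: xor over i of rot^i(h(x_i))."""
    h = 0
    for i, c in enumerate(kmer):
        h ^= rotl(table[c], i)
    return h


def rolling_nthash(text, k, table):
    """Yield the hash of every length-k window, updating in O(1) per step."""
    if len(text) < k:
        return
    h = nthash(text[:k], table)
    yield h
    for j in range(k, len(text)):
        # Drop the outgoing character (rotated by 0), add the incoming one
        # rotated by k, then shift every term down by one rotation.
        h = rotr1(h ^ table[text[j - k]] ^ rotl(table[text[j]], k))
        yield h


table = make_table()
text = "ACGTACGTTGCA"
k = 4
rolled = list(rolling_nthash(text, k, table))
direct = [nthash(text[i:i + k], table) for i in range(len(text) - k + 1)]
```

Here `rolled` and `direct` hold the same values, which is the property that makes the hash usable as a rolling hash: each new window costs a constant number of xors and rotations instead of \(k\) of them.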