Highlight on CuriousCoding
https://curiouscoding.nl/tags/highlight/
Recent content in Highlight on CuriousCoding (generated by Hugo, language: en; last build: Fri, 12 Jul 2024 00:00:00 +0200)

Computing random minimizers, fast
https://curiouscoding.nl/posts/fast-minimizers/
Fri, 12 Jul 2024 00:00:00 +0200
Table of Contents: 1 Introduction · 1.1 Results · 2 Random minimizers · 3 Algorithms · 3.1 Problem statement (Problem A: Only the set of minimizers; Problem B: The minimizer of each window; Problem C: Super-k-mers; Which problem to solve; Canonical k-mers) · 3.2 The naive algorithm (Performance characteristics) · 3.3 Rephrasing as sliding window minimum · 3.4 The queue (Performance characteristics) · 3.5 Jumping: Away with the queue (Performance characteristics) · 3.6 Re-scan (Performance characteristics) · 3.7 Split windows (Performance characteristics) · 4 Analysing what we have so far · …

A near-tight lower bound on minimizer density
https://curiouscoding.nl/posts/minimizer-lower-bound/
Tue, 25 Jun 2024 00:00:00 +0200
Table of Contents: Succinct background · Definitions · Lower bounds · A new lower bound · Discussion · Post scriptum · Acknowledgement
In this post I will prove a new lower bound on the density of any minimizer or forward sampling scheme: \[ d(f) \geq \frac{\lceil\frac{w+k}{w}\rceil}{w+k} = \frac{\lceil\frac{\ell+1}{w}\rceil}{\ell+1}. \]
In particular, this implies that when \(k=1\), any forward sampling scheme has density at least \(2/(w+1)\), and thus that random minimizers are optimal in this case.

A*PA2: Up to 19x faster exact global alignment
https://curiouscoding.nl/posts/astarpa2/
Sat, 23 Mar 2024 00:00:00 +0100
Table of Contents: Abstract · 1 Introduction · 1.1 Contributions · 1.2 Previous work (1.2.1 Needleman-Wunsch; 1.2.2 Graph algorithms; 1.2.3 Computational volumes; 1.2.4 Parallelism; 1.2.5 Tools) · 2 Preliminaries · 3 Methods · 3.1 Band-doubling · 3.2 Blocks · 3.3 Memory · 3.4 SIMD · 3.5 SIMD-friendly sequence profile · 3.6 Traceback · 3.7 A* (3.7.1 Bulk-contours update; 3.7.2 Pre-pruning) · 3.8 Determining the rows to compute (3.8.1 Sparse heuristic invocation) · 3.9 Incremental doubling · 4 Results · 4.1 Setup · 4.2 Comparison with other aligners · …

Mod-minimizers and other minimizers
https://curiouscoding.nl/posts/mod-minimizers/
Thu, 18 Jan 2024 00:00:00 +0100
Table of Contents: Applications · Background · Minimizers · Density bounds · Robust minimizers · PASHA · Miniception · Closed syncmers · Bd-anchors · New: Mod-minimizers · Experiments · Conclusion · Small k experiments · Search methods · Directed minimizer · \(k=1\), \(w=2\) · \(k=1\), \(w=4\) · \(k=1\), \(w=5\) · \(k=2\), \(w=2\) · \(k=2\), \(w=4\) · Notes · Reading list \[ \newcommand{\d}{\mathrm{d}} \newcommand{\L}{\mathcal{L}} \]
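The \(k=1\) consequence of the density lower bound stated earlier can be checked numerically: for every \(w\ge 1\), \(\lceil (w+1)/w\rceil = 2\), so the bound evaluates to \(2/(w+1)\), which is exactly the classic expected density of random minimizers. A small sketch (the function name is illustrative, not from the post):

```python
from math import ceil

def density_lower_bound(k: int, w: int) -> float:
    # The bound stated above: d(f) >= ceil((w + k) / w) / (w + k).
    return ceil((w + k) / w) / (w + k)

# For k = 1: ceil((w + 1) / w) = 2 for every w >= 1, so the bound equals
# 2 / (w + 1) -- the classic expected density of random minimizers,
# which is why random minimizers are optimal in this case.
for w in range(1, 100):
    assert density_lower_bound(1, w) == 2 / (w + 1)
```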
This post introduces some background on minimizers and some experiments with a new minimizer variant. That new variant is now called the mod-minimizer and is available as a preprint on bioRxiv (Groot Koerkamp and Pibiri 2024).

One Billion Row Challenge
https://curiouscoding.nl/posts/1brc/
Wed, 03 Jan 2024 00:00:00 +0100
Table of Contents: External links · The problem · Initial solution: 105s · First flamegraph · Bytes instead of strings: 72s · Manual parsing: 61s · Inline hash keys: 50s · Faster hash function: 41s · A new flame graph · Perf it is · Something simple: allocating the right size: 41s · memchr for scanning: 47s · memchr crate: 29s · get_unchecked: 28s · Manual SIMD: 29s · Profiling · Revisiting the key function: 23s · PtrHash perfect hash function: 17s · Larger masks: 15s · Reduce pattern matching: 14s · Memory map: 12s · Parallelization: 2.…

Perfect NtHash for Robust Minimizers
https://curiouscoding.nl/posts/nthash/
Sun, 31 Dec 2023 00:00:00 +0100
Table of Contents: NtHash · Minimizers · Robust minimizers · Is NtHash injective on kmers? · Searching for a collision · Proving perfection · Alternatives · SmHasher results · TODO benchmark NtHash, NtHash2, FxHash
NtHash (Mohamadi et al. 2016) is a rolling hash suitable for hashing any kind of text, but originally made for DNA. For a string of length \(k\) it is a \(64\)-bit value computed as:
\begin{equation} h(x) = \bigoplus_{i=0}^{k-1} rot^i(h(x_i)) \end{equation}
where \(h(x_i)\) assigns a fixed \(64\)-bit random value to each character, \(rot^i\) rotates the bits \(i\) places, and \(\bigoplus\) is the xor over all terms.

PTRHash: Notes on adapting PTHash in Rust
https://curiouscoding.nl/posts/ptrhash/
Thu, 21 Sep 2023 00:00:00 +0200
Table of Contents: Questions and remarks on the PTHash paper · Ideas for improvement (Parameters; Align packed vectors to cachelines; Prefetching; Faster modulo operations; Store dictionary \(D\) sorted using Elias-Fano coding; How many bits of \(n\) and hash entropy do we need?) · Ideas for faster construction · Implementation log (Hashing function; Bitpacking crates; Construction; Fastmod; TODO Try out fastdivide and reciprocal crates; First benchmark; Faster bucket computation; Branchless, for real now! (aka the trick-of-thirds); Compiling and benchmarking PTHash; Compact encoding; Find the \(x\) differences; FastReduce revisited; TODO Is there a problem if \(\gcd(m, n)\) is large?)

String algorithm visualizations
https://curiouscoding.nl/posts/alg-viz/
Tue, 08 Nov 2022 00:00:00 +0100
An interactive page: select the algorithm to visualize, then click the buttons, or click the canvas and use the indicated keys. Suffix-array construction is explained here and BWT is explained here.
Source code is on GitHub.
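Returning to the NtHash formula quoted earlier: \(h(x)=\bigoplus_{i=0}^{k-1} rot^i(h(x_i))\) admits both direct evaluation and an \(O(1)\) rolling update as the window slides by one character. A minimal sketch of that idea, using arbitrary placeholder constants rather than the published NtHash seed table, and taking \(rot\) to be a left rotation (the real implementation fixes its own direction and constants):

```python
# Sketch of the NtHash construction: h(x) = XOR_{i=0}^{k-1} rot^i(h(x_i)).
# CHAR_HASH holds hypothetical fixed 64-bit random values per character;
# these are NOT the published NtHash seeds.

MASK = (1 << 64) - 1

CHAR_HASH = {
    "A": 0x243F6A8885A308D3,
    "C": 0x13198A2E03707344,
    "G": 0xA4093822299F31D0,
    "T": 0x082EFA98EC4E6C89,
}

def rotl(v: int, r: int) -> int:
    # 64-bit left rotation.
    r %= 64
    return ((v << r) | (v >> (64 - r))) & MASK

def nthash(kmer: str) -> int:
    # Direct evaluation: xor of per-character values, each rotated
    # by its position in the k-mer.
    h = 0
    for i, c in enumerate(kmer):
        h ^= rotl(CHAR_HASH[c], i)
    return h

def roll(h: int, out_c: str, in_c: str, k: int) -> int:
    # Rolling update consistent with the formula above: xor out the
    # leaving character, rotate every remaining term down one place
    # (a right rotation, i.e. rotl by 63), then xor in the entering
    # character at position k-1.
    return rotl(h ^ CHAR_HASH[out_c], 63) ^ rotl(CHAR_HASH[in_c], k - 1)

# The rolling hash matches recomputation from scratch for every window.
seq, k = "ACGTTGCA", 4
h = nthash(seq[:k])
for i in range(1, len(seq) - k + 1):
    h = roll(h, seq[i - 1], seq[i + k - 1], k)
    assert h == nthash(seq[i:i + k])
```

This is what makes the hash "rolling": sliding the window costs two table lookups and a couple of rotations instead of \(O(k)\) work per position.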
(Interactive controls: algorithm selection — Suffix Array Construction, Burrows-Wheeler Transform, Bidirectional BWT, String Query — plus prev/next stepping, delay adjustment, and pause/play keys.)

A survey of exact global pairwise alignment
https://curiouscoding.nl/posts/pairwise-alignment-history/
Fri, 01 Apr 2022 00:00:00 +0200
Table of Contents: Variants of pairwise alignment (Cost models; Alignment types) · A chronological overview of global pairwise alignment · Algorithms in detail (Classic DP algorithms; Cubic algorithm of Needleman and Wunsch (1970); A quadratic DP; Local alignment; Affine costs; Minimizing vs. maximizing duality; Four Russians method; TODO \(O(ns)\) methods; TODO Exponential search on band; TODO LCS: thresholds, $k$-candidates and contours; TODO Diagonal transition: furthest reaching and wavefronts; TODO Suffix tree for \(O(n+s^2)\) expected runtime) · Using less memory (Computing the score in linear space; Divide-and-conquer) · TODO LCSk[++] algorithms · Theoretical lower bound · TODO A note on DP (toposort) vs Dijkstra vs A* · TODO Tools · TODO Notes for other posts (Semi-global alignment papers; Approximate pairwise aligners; Old vs new papers)
Note: This is a living document, and will likely remain so for a while.

28000x speedup with Numba.CUDA
https://curiouscoding.nl/posts/numba-cuda-speedup/
Mon, 24 May 2021 00:00:00 +0200
Table of Contents: CUDA · Overview · Profiling · Optimizing Tensor Sketch · CPU code (V0: Original python code; V1: Numba; V2: Multithreading) · GPU code (V3: A first GPU version; V4: Parallel kernel invocations; V5: Single kernel with many blocks; V6: Detailed profiling: Kernel Compute; V7: Detailed profiling: Kernel Latency; V8: Detailed profiling: Shared Memory Access Pattern; V9: More work per thread; V10: Cache seq to shared memory; V11: Hashes and signs in shared memory; V12: Revisiting blocks per kernel; V13: Passing a tuple of sequences; V14: Better hardware; V15: Dynamic shared memory) · Wrap up · Xrefs: r/CUDA, Numba discourse