Lablog on CuriousCoding
https://curiouscoding.nl/categories/lablog/
Recent content in Lablog on CuriousCoding
Generator: Hugo
Language: en
Last updated: Tue, 01 Oct 2024 00:00:00 +0000

[WIP] Progress on fast suffix array searching
https://curiouscoding.nl/posts/suffix-array-searching-log/
Tue, 01 Oct 2024 00:00:00 +0000
Here’s a lablog.
Background
Compare with "Suffix arrays with a twist": https://www.cai.sk/ojs/index.php/cai/article/view/2019_3_555
Compare with https://github.com/mranisz/sa, which is based on "Compact and hash based variants of the suffix array": https://journals.pan.pl/dlibra/publication/121376/edition/105762/content
Here’s a bike
A figure of a bike.
Binary searching
Eytzinger
Btrees
Multithreading

Mod-minimizers and other minimizers
https://curiouscoding.nl/posts/mod-minimizers/
Thu, 18 Jan 2024 00:00:00 +0100
Table of Contents: Applications; Background; Minimizers; Density bounds; Robust minimizers; PASHA; Miniception; Closed syncmers; Bd-anchors; New: Mod-minimizers; Experiments; Conclusion; Small k experiments; Search methods; Directed minimizer; \(k=1\), \(w=2\); \(k=1\), \(w=4\); \(k=1\), \(w=5\); \(k=2\), \(w=2\); \(k=2\), \(w=4\); Notes; Reading list
This post introduces some background on minimizers and some experiments with a new minimizer variant. That variant is now called the mod-minimizer and was published at WABI24 (Groot Koerkamp and Pibiri 2024).

One Billion Row Challenge
https://curiouscoding.nl/posts/1brc/
Wed, 03 Jan 2024 00:00:00 +0100
Table of Contents: External links; The problem; Initial solution: 105s; First flamegraph; Bytes instead of strings: 72s; Manual parsing: 61s; Inline hash keys: 50s; Faster hash function: 41s; A new flame graph; Perf it is; Something simple: allocating the right size: 41s; memchr for scanning: 47s; memchr crate: 29s; get_unchecked: 28s; Manual SIMD: 29s; Profiling; Revisiting the key function: 23s; PtrHash perfect hash function: 17s; Larger masks: 15s; Reduce pattern matching: 14s; Memory map: 12s; Parallelization: 2.

Notes on implementing Longest Common Repeat (LCR)
https://curiouscoding.nl/posts/longest-common-repeat/
Wed, 06 Dec 2023 00:00:00 +0100
Table of Contents: Notes; Coloured Tree Problem; Generic sparse suffix array; Sparse suffix array on minimizers; Discussion / TODOs; Evals
These are my running notes on implementing an algorithm for Longest Common Repeat using minimizers.
Notes
Coloured Tree Problem
See Lemma 3 here.
Generic sparse suffix array
paper: https://arxiv.org/pdf/2310.09023.pdf
code: https://github.com/lorrainea/SSA/blob/main/PA/ssa.cc
For random strings and \(b \leq n / \log n\), direct radix sort on \((2\log n + \log\log n)\)-bit prefixes is sufficient for \(O(n)\) runtime.

PTRHash: Notes on adapting PTHash in Rust
https://curiouscoding.nl/posts/ptrhash/
Thu, 21 Sep 2023 00:00:00 +0200
Table of Contents: Questions and remarks on PTHash paper; Ideas for improvement; Parameters; Align packed vectors to cachelines; Prefetching; Faster modulo operations; Store dictionary \(D\) sorted using Elias-Fano coding; How many bits of \(n\) and hash entropy do we need?; Ideas for faster construction; Implementation log; Hashing function; Bitpacking crates; Construction; Fastmod; TODO Try out fastdivide and reciprocal crates; First benchmark; Faster bucket computation; Branchless, for real now! (aka the trick-of-thirds); Compiling and benchmarking PTHash; Compact encoding; Find the \(x\) differences; FastReduce revisited; TODO Is there a problem if \(\gcd(m, n)\) is large?
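
The PTRHash entry above lists "Faster modulo operations", "Fastmod", and "FastReduce revisited" among its sections, but this summary does not say which variant the post settles on. As a rough illustration only, here is a minimal sketch of one well-known division-free alternative to `h % n`, the multiply-shift range reduction. The name `fast_reduce` is hypothetical and chosen for this example; it is not taken from the post's code.

```rust
/// Map a 64-bit hash `h` into the range `0..n` without a division.
///
/// Computes floor(h * n / 2^64) using a 128-bit product.
/// Assumption: `h` is roughly uniform over all 64-bit values, so the
/// result is roughly uniform over `0..n`.
/// Note: the result depends mostly on the HIGH bits of `h`,
/// whereas `h % n` depends on all of them.
pub fn fast_reduce(h: u64, n: u64) -> u64 {
    ((h as u128 * n as u128) >> 64) as u64
}
```

For `n > 0` the result is always strictly smaller than `n`, since `h < 2^64`. Because only the high bits matter, two reductions of the same hash (for example by `m` and by `n`) are correlated unless the hash is remixed in between, which is one reason to be careful when combining several reductions.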