Software
For all my projects, feel free to create issues and/or reach out for help in using them. My work focuses more on algorithm development than on direct bioinformatics applications, so I appreciate getting in contact with potential users :)
Tools
Sassy, with Rick Beeloo (Beeloo and Groot Koerkamp 2025):
SIMD-based approximate string matching
- Status: done.
- Searches for short patterns (length 10-100, or up to 1 kbp) in long texts.
- Supports ACTG, IUPAC, and ASCII.
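To make "approximate string matching" concrete, here is a minimal sketch of the classic semi-global dynamic program (Sellers' algorithm) that reports every text position where the pattern ends with at most `k` edits. It is a plain scalar illustration, not Sassy's SIMD implementation or API; the function name `approx_search` and its return format are assumptions made for this example.

```python
def approx_search(pattern: str, text: str, k: int) -> list[tuple[int, int]]:
    """Return (end_position, cost) for each text position where `pattern`
    ends with edit distance at most k."""
    m = len(pattern)
    # col[i] = minimal edit distance between pattern[:i] and any substring
    # of the text that ends at the current text position.
    col = list(range(m + 1))
    hits = []
    for j, c in enumerate(text):
        prev_diag = col[0]  # old value of col[i - 1]
        # col[0] stays 0: a match may start anywhere in the text.
        for i in range(1, m + 1):
            old = col[i]
            col[i] = min(
                prev_diag + (pattern[i - 1] != c),  # match or substitution
                old + 1,                            # extra character in the text
                col[i - 1] + 1,                     # pattern character skipped
            )
            prev_diag = old
        if col[m] <= k:
            hits.append((j + 1, col[m]))
    return hits
```

For example, `approx_search("ACGT", "TTACGTTT", 0)` reports a single exact occurrence ending at position 6. This sketch takes time proportional to pattern length times text length.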
A*PA2, with Pesho Ivanov (Groot Koerkamp and Ivanov 2024; Groot Koerkamp 2024):
Global pairwise alignment based on SIMD, bitpacking, and band-doubling
- Reliable, but only supports ACTG input.
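The band-doubling idea can be sketched in isolation: compute the edit distance restricted to a diagonal band of half-width `t`, and double `t` until the result fits inside the band, at which point it is exact. The code below is a scalar illustration of that one idea only, without SIMD, bitpacking, or any heuristic, and the function names are assumptions for this example rather than A*PA2's interface.

```python
def banded_edit_distance(a: str, b: str, t: int) -> float:
    """Edit distance using only cells with |i - j| <= t.
    Assumes t >= abs(len(a) - len(b)) so the final cell lies in the band."""
    n, m = len(a), len(b)
    inf = float("inf")
    # Row 0: only columns inside the band are reachable.
    prev = [j if j <= t else inf for j in range(m + 1)]
    for i in range(1, n + 1):
        # For simplicity a full row is allocated; a real implementation
        # would store only the band.
        cur = [inf] * (m + 1)
        for j in range(max(0, i - t), min(m, i + t) + 1):
            if j == 0:
                cur[j] = i
            else:
                cur[j] = min(
                    prev[j - 1] + (a[i - 1] != b[j - 1]),  # match or substitution
                    prev[j] + 1,                           # deletion
                    cur[j - 1] + 1,                        # insertion
                )
        prev = cur
    return prev[m]


def edit_distance(a: str, b: str) -> int:
    """Double the band until the banded result is at most the band width."""
    t = max(1, abs(len(a) - len(b)))
    while True:
        d = banded_edit_distance(a, b, t)
        if d <= t:  # an optimal path with cost d never leaves a band of width d
            return int(d)
        t *= 2
```

Because the band widths grow geometrically, the total work is dominated by the final band, so similar sequences are aligned in time roughly proportional to length times distance.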
SIMD-based bottom and bucket sketches.
- Status: basic version done; needs polishing.
- Builds on seq-hash.
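As a rough illustration of the two sketch types named above: a bottom sketch keeps the `s` smallest distinct k-mer hashes, while a bucket sketch splits the hashes into `s` buckets and keeps the smallest value in each. The code below is a scalar sketch under stated assumptions: it uses `zlib.crc32` as a stand-in hash, ignores reverse complements, and all function names are hypothetical.

```python
import zlib


def kmer_hashes(seq: str, k: int) -> list[int]:
    """Hash every k-mer of seq (crc32 is a stand-in for a real k-mer hash)."""
    return [zlib.crc32(seq[i:i + k].encode()) for i in range(len(seq) - k + 1)]


def bottom_sketch(seq: str, k: int, s: int) -> list[int]:
    """The s smallest distinct k-mer hashes, in sorted order."""
    return sorted(set(kmer_hashes(seq, k)))[:s]


def bucket_sketch(seq: str, k: int, s: int) -> list:
    """For each of s buckets, the smallest hash value falling in that bucket
    (None if the bucket stays empty)."""
    buckets = [None] * s
    for h in kmer_hashes(seq, k):
        b, v = h % s, h // s
        if buckets[b] is None or v < buckets[b]:
            buckets[b] = v
    return buckets


def jaccard_from_bottom(a: list[int], b: list[int], s: int) -> float:
    """Estimate Jaccard similarity from two bottom sketches: among the s
    smallest hashes of the union, count the fraction present in both."""
    merged = sorted(set(a) | set(b))[:s]
    if not merged:
        return 0.0
    shared = set(a) & set(b)
    return sum(h in shared for h in merged) / len(merged)
```

Two bucket sketches can be compared even more simply, by counting the fraction of buckets that hold the same value.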
mim, with Rob Patro (Patro et al. 2025):
A small auxiliary .mim index alongside .fastx.gz files for multi-threaded decompression.
Tools building on this
- Deacon, by Bede Constantinides (Constantinides, Lees, and Crook 2025): Fast read filtering/decontamination. Builds on simd-minimizers.
- Barbell, by Rick Beeloo (Beeloo et al. 2025): Fast demultiplexing. Builds on sassy.
Libraries
DNA bitpacking
Crates using 2-bit packed sequence representations and SIMD algorithms on top of them. All developed together with Igor Martayan as part of simd-minimizers (Groot Koerkamp and Martayan 2025).
- packed_seq: Slowly growing library for managing 2-bit encoded ACTG DNA sequences.
  - Also supports 2+1 bit encoding to indicate ambiguous (N) characters.
- seq-hash: Streams over all nt-hashes (or other hashes) of a packed sequence.
- simd-minimizers: Computes minimizers of a sequence.
  - Supports skipping over ambiguous windows.
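The two core ideas in this list, 2-bit packing and minimizers, can be sketched in a few lines. This is an illustration only and does not use the crates' APIs: the function names are hypothetical, the code mapping (A=0, C=1, T=2, G=3, obtained from the ASCII trick `(c >> 1) & 3`) is an assumption, `zlib.crc32` stands in for a proper k-mer hash, and ambiguous bases are not handled.

```python
import zlib


def pack(seq: str) -> bytes:
    """Pack an ACTG string into 2 bits per base, 4 bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, c in enumerate(seq):
        code = (ord(c) >> 1) & 3  # A -> 0, C -> 1, T -> 2, G -> 3
        out[i // 4] |= code << (2 * (i % 4))
    return bytes(out)


def unpack(packed: bytes, n: int) -> str:
    """Recover the first n bases from a packed sequence."""
    return "".join("ACTG"[(packed[i // 4] >> (2 * (i % 4))) & 3] for i in range(n))


def minimizer_positions(seq: str, k: int, w: int) -> list[int]:
    """For every window of w consecutive k-mers, select the position of the
    k-mer with the smallest hash (leftmost on ties); return the distinct
    selected positions in increasing order."""
    hashes = [zlib.crc32(seq[i:i + k].encode()) for i in range(len(seq) - k + 1)]
    positions = []
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        pos = start + window.index(min(window))
        if not positions or positions[-1] != pos:
            positions.append(pos)
    return positions
```

Packing stores a sequence in a quarter of the space of its ASCII form, and the minimizer scheme guarantees that consecutive selected positions are at most `w` apart, so every window is covered by a sampled k-mer.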
Dependency graph
[Figure: dependency graph of the crates listed above]
Data structures
- cacheline_ef: Elias-Fano encoding, one cacheline at a time. Part of PtrHash.
- PtrHash (Groot Koerkamp 2025): A fast minimal perfect hash function.
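For readers unfamiliar with Elias-Fano encoding, the following is a minimal sketch of the general scheme for a sorted list of integers: the low bits of each value are stored verbatim, and the high bits are stored in unary in a bit vector. It does not reproduce the cacheline layout of cacheline_ef; the function names and the use of a Python integer as the bit vector are assumptions for this example.

```python
def ef_encode(values: list[int], low_bits: int) -> tuple[list[int], int]:
    """Encode a non-decreasing list of non-negative integers.
    Returns (low parts, bit vector of high parts as a Python int)."""
    mask = (1 << low_bits) - 1
    lows = [v & mask for v in values]
    high_bitvec = 0
    for i, v in enumerate(values):
        # Element i with high part h sets bit h + i; these positions are
        # strictly increasing because the input is sorted.
        high_bitvec |= 1 << ((v >> low_bits) + i)
    return lows, high_bitvec


def ef_get(lows: list[int], high_bitvec: int, low_bits: int, i: int) -> int:
    """Decode the i-th value (0-based)."""
    if not 0 <= i < len(lows):
        raise IndexError(i)
    # select(i): find the position of the i-th set bit with a linear scan.
    pos, seen = 0, -1
    while True:
        if (high_bitvec >> pos) & 1:
            seen += 1
            if seen == i:
                break
        pos += 1
    return ((pos - i) << low_bits) | lows[i]
```

The linear scan keeps the sketch short; practical implementations answer the select query in constant time, which is where fitting everything in a single cacheline pays off.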
Experimental/incomplete
- Minimizers: reference implementations and experiments for minimizer and sampling schemes.
- sshash-rs: quick but incomplete reimplementation of SSHash.
- suffix array searching (blog): static search trees are faster than binary search.