Software
For all my projects, feel free to create issues or reach out for help with using them. My work focuses more on algorithm development than on direct bioinformatics applications, so I appreciate hearing from potential users :)
Tools
Maintained:
- Deacon by Bede Constantinides: Fast read filtering, building on simd-minimizers.
- Barbell by Rick Beeloo: Fast and accurate demultiplexing, building on Sassy.
- Sassy & Sassy 2 with Rick Beeloo: up to 10x faster SIMD-based approximate string matching.
- BAPCtools: CLI tool for ICPC-style problem development.
Inactive:
- A*PA2 with Pesho Ivanov: near-linear global pairwise alignment.
Proof-of-concept:
- simd-sketch: up to 100x faster SIMD-based bottom and bucket sketches.
- mim with Rob Patro: fast multi-threaded gzip decompression.
Libraries
Pairwise alignment:
- pa_types: utilities for pairwise alignment and CIGAR strings, used by A*PA and Sassy.
2-bit DNA packing with SIMD support, all developed with Igor Martayan:
- simd-minimizers: compute minimizers of a sequence up to 6x faster.
- seq-hash: streaming k-mer hashing of packed sequences.
- packed-seq: 2-bit encoded ACTG and 2+1-bit ACTGN DNA sequences.
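
To illustrate what 2-bit packing means, here is a minimal standalone sketch. It does not use the packed-seq API; the function names `encode_base`, `pack`, and `get` are hypothetical and chosen only for this example. It assumes the ACTG order (A=0, C=1, T=2, G=3), which is what `(byte >> 1) & 3` yields for ASCII bases, and stores four bases per byte, lowest bits first.

```rust
/// Map an ASCII base to 2 bits in ACTG order: A=0, C=1, T=2, G=3.
/// Returns None for any other character (e.g. N).
/// Assumption: this mirrors the ACTG order named above, not a specific library API.
fn encode_base(b: u8) -> Option<u8> {
    match b.to_ascii_uppercase() {
        // For these four letters (either case), bits 1..=2 of the ASCII code
        // are already distinct, so a shift and mask is enough.
        b'A' | b'C' | b'T' | b'G' => Some((b >> 1) & 3),
        _ => None,
    }
}

/// Pack a sequence into 2 bits per base, four bases per byte.
/// Base i is stored in bits 2*(i%4) .. 2*(i%4)+1 of byte i/4.
fn pack(seq: &[u8]) -> Option<Vec<u8>> {
    let mut out = vec![0u8; (seq.len() + 3) / 4];
    for (i, &b) in seq.iter().enumerate() {
        out[i / 4] |= encode_base(b)? << (2 * (i % 4));
    }
    Some(out)
}

/// Read base i back from a packed sequence as an uppercase ASCII character.
fn get(packed: &[u8], i: usize) -> u8 {
    b"ACTG"[((packed[i / 4] >> (2 * (i % 4))) & 3) as usize]
}
```

With this layout, `pack(b"ACTG")` fits in a single byte, and `pack` returns `None` for input containing `N`, which is the case the 2+1-bit ACTGN representation is meant to cover.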
Low-level:
- prefetch-index: small utility for cross-platform prefetching.
- ensure-simd: small utility to compile-time-check that SIMD is enabled.
Data structures
- kPHF-set: A fast static hash set.
- kPtrHash: A fast non-minimal k-PHF.
- SimdQuickHeap: The fastest priority queue.
- QuadRank: Single-cache-miss rank queries.
- PtrHash: A fast minimal perfect hash function.
- cacheline_ef: Elias-Fano encoding, one cacheline at a time. Part of PtrHash.
Misc
- oxford-bioinformatics template: Cleaned-up version of the Overleaf template with fixed defaults.
Experimental/incomplete
- STPD: In-progress STPD implementation with incremental construction.
- CCH: up to 2x faster SIMD-based implementation of customizable contraction hierarchies.
- amplicon-clustering: a small sassy-based experiment for amplicon clustering.
- b-select: re-implementation of B-tree based select queries (Pibiri and Kanda 2021).
- merge: 3-way git merge based on edit distance.
- Minimizers: reference implementations and experiments for minimizer and sampling schemes.
- sshash-rs: quick but incomplete re-implementation of SSHash (Pibiri 2022).
- suffix array searching: static search trees for suffix array searching.
- static-search-tree (blog): Code alongside the 40x faster binary search post.