Software
For all my projects, feel free to open issues and/or reach out for help using them. My work focuses more on algorithm development than on direct bioinformatics applications, so I appreciate hearing from potential users :)
Tools
- Sassy: SIMD-based approximate string matching
  - Status: done.
  - Searches for short patterns (10-100 bp, or up to 1 kbp) in long texts.
  - Supports `ACTG`, IUPAC, and ASCII alphabets.
- simd-sketch: SIMD-based bottom and bucket sketches.
  - Status: basic version done; could use polishing.
  - Builds on seq-hash.
- A*PA2: Global pairwise alignment based on SIMD, bitpacking, and band-doubling
  - Reliable, but only supports `ACTG` input.
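To make the task Sassy solves concrete, here is a minimal scalar sketch of approximate pattern search using the classic Sellers dynamic program: an occurrence of the pattern may start anywhere in the text, and every end position reachable with at most `k` edits is reported. This is not Sassy's SIMD algorithm or its API; the function name `approx_matches` is hypothetical, and characters are compared by exact byte equality (no IUPAC matching).

```rust
/// Semi-global edit-distance search (Sellers' algorithm): report every text
/// position where some occurrence of `pattern` ends with at most `k` edits.
/// Returns (end index in `text`, edit count) pairs.
pub fn approx_matches(pattern: &[u8], text: &[u8], k: usize) -> Vec<(usize, usize)> {
    let m = pattern.len();
    // col[i] = cheapest alignment of pattern[..i] ending at the current text position.
    // Before any text is read, covering i pattern characters costs i edits.
    let mut col: Vec<usize> = (0..=m).collect();
    let mut out = Vec::new();
    for (j, &c) in text.iter().enumerate() {
        // col[0] stays 0: an occurrence may start anywhere in the text for free.
        let mut diag = 0; // previous column's col[i - 1]
        for i in 1..=m {
            let substitute = diag + (pattern[i - 1] != c) as usize;
            diag = col[i];
            // col[i] still holds the previous column (text char c left unmatched),
            // col[i - 1] already holds the current column (pattern char left unmatched).
            col[i] = substitute.min(col[i] + 1).min(col[i - 1] + 1);
        }
        if col[m] <= k {
            out.push((j, col[m]));
        }
    }
    out
}
```

For example, `approx_matches(b"ACGT", b"TTACGTTT", 1)` returns `[(4, 1), (5, 0), (6, 1)]`: the exact hit ends at index 5, and its neighbours need one insertion or deletion. This scalar version costs O(|pattern| · |text|); SIMD and bitpacking are what make tools in this space fast.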
Bitpacked DNA crates
Crates using 2-bit packed sequence representations and SIMD algorithms on top of them.
- packed_seq: Slowly growing library for managing 2-bit encoded `ACTG` DNA sequences.
  - Also supports a 2+1-bit encoding to indicate ambiguous (`N`) characters.
- seq-hash: Streams over all nt-hashes (or other hashes) of a packed sequence.
- simd-minimizers: Computes minimizers of a sequence.
  - Supports skipping over ambiguous windows.
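As a rough picture of what these crates compute, the scalar sketch below packs DNA two bits per base (32 bases per `u64`) and then computes random minimizers naively. Several things here are assumptions for illustration: the `(c >> 1) & 3` mapping (which sends A, C, T, G to 0, 1, 2, 3, matching the ACTG order above), the multiplicative k-mer hash used as a stand-in for ntHash, and the function names `encode`, `pack`, `get`, and `minimizer_positions`, which are hypothetical and not the packed_seq, seq-hash, or simd-minimizers API. Ambiguous characters are not handled.

```rust
/// ASSUMPTION: map A/C/T/G (either case) to 0/1/2/3 using bits 1 and 2 of the
/// ASCII byte. Other characters are not rejected; they get some 2-bit value.
fn encode(c: u8) -> u64 {
    ((c >> 1) & 3) as u64
}

/// Pack bases 2 bits each, 32 per u64 word, first base in the lowest bits.
pub fn pack(seq: &[u8]) -> Vec<u64> {
    let mut words = vec![0u64; (seq.len() + 31) / 32];
    for (i, &c) in seq.iter().enumerate() {
        words[i / 32] |= encode(c) << (2 * (i % 32));
    }
    words
}

/// Read the 2-bit code of base `i` back out of the packed words.
pub fn get(words: &[u64], i: usize) -> u8 {
    ((words[i / 32] >> (2 * (i % 32))) & 3) as u8
}

/// Naive random minimizers: for each window of `w` consecutive k-mers, pick the
/// k-mer with the smallest hash (leftmost on ties) and report its start
/// position, skipping repeats of the previous window's choice.
pub fn minimizer_positions(seq: &[u8], k: usize, w: usize) -> Vec<usize> {
    assert!((1..=32).contains(&k) && w >= 1);
    if seq.len() < k + w - 1 {
        return Vec::new();
    }
    let mask = if k == 32 { u64::MAX } else { (1u64 << (2 * k)) - 1 };
    // Rolling 2-bit k-mer values, hashed with a multiplicative hash.
    // ASSUMPTION: this hash is a simple stand-in, not ntHash.
    let mut hashes = Vec::with_capacity(seq.len() - k + 1);
    let mut kmer = 0u64;
    for (i, &c) in seq.iter().enumerate() {
        kmer = ((kmer << 2) | encode(c)) & mask;
        if i + 1 >= k {
            hashes.push(kmer.wrapping_mul(0x9E37_79B9_7F4A_7C15));
        }
    }
    let mut out: Vec<usize> = Vec::new();
    for start in 0..=hashes.len() - w {
        // min_by_key returns the first minimum, i.e. the leftmost on ties.
        let (offset, _) = hashes[start..start + w]
            .iter()
            .enumerate()
            .min_by_key(|&(_, h)| *h)
            .unwrap();
        let pos = start + offset;
        if out.last() != Some(&pos) {
            out.push(pos);
        }
    }
    out
}
```

Because the chosen position never moves left as the window slides, deduplicating against the last pushed position is enough, and consecutive minimizers are at most `w` positions apart. Recomputing the minimum per window costs O(w) each time; fast implementations avoid exactly this.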
Further libraries
- cacheline_ef: Elias-Fano encoding, one cacheline at a time.
- PtrHash: A fast minimal perfect hash function.
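For readers unfamiliar with Elias-Fano, here is a textbook version: each value of a non-decreasing sequence is split into `l` low bits, stored verbatim, and a high part, stored in unary as a set bit at position `(v_i >> l) + i` of a bitvector; `get(i)` finds the i-th set bit to recover the high part. This sketch uses plain arrays and a linear-scan `select`, not the one-cacheline-at-a-time layout of cacheline_ef, and the type and method names are hypothetical.

```rust
/// Plain Elias-Fano encoding of a non-decreasing sequence of u64 values.
pub struct EliasFano {
    low_bits: usize,  // l: number of low bits stored explicitly per value
    lows: Vec<u64>,   // packed low parts, l bits per value
    highs: Vec<u64>,  // bitvector with bit (v_i >> l) + i set for each i
    len: usize,
}

fn set_bit(words: &mut [u64], pos: usize) {
    words[pos / 64] |= 1u64 << (pos % 64);
}

fn get_bit(words: &[u64], pos: usize) -> u64 {
    (words[pos / 64] >> (pos % 64)) & 1
}

impl EliasFano {
    pub fn new(values: &[u64]) -> Self {
        assert!(values.windows(2).all(|w| w[0] <= w[1]), "values must be sorted");
        let n = values.len();
        let universe = values.last().map_or(0, |&v| v + 1);
        // l = floor(log2(universe / n)), or 0 when universe < n.
        let ratio = if n == 0 { 0 } else { universe / n as u64 };
        let low_bits = if ratio == 0 { 0 } else { 63 - ratio.leading_zeros() as usize };
        let max_high = values.last().map_or(0, |&v| (v >> low_bits) as usize);
        let mut lows = vec![0u64; (n * low_bits) / 64 + 1];
        let mut highs = vec![0u64; (max_high + n) / 64 + 1];
        for (i, &v) in values.iter().enumerate() {
            for b in 0..low_bits {
                if (v >> b) & 1 == 1 {
                    set_bit(&mut lows, i * low_bits + b);
                }
            }
            set_bit(&mut highs, (v >> low_bits) as usize + i);
        }
        EliasFano { low_bits, lows, highs, len: n }
    }

    pub fn get(&self, i: usize) -> u64 {
        assert!(i < self.len);
        // High part: position of the i-th set bit, minus the i ones before it.
        let high = (self.select(i) - i) as u64;
        let mut low = 0u64;
        for b in 0..self.low_bits {
            low |= get_bit(&self.lows, i * self.low_bits + b) << b;
        }
        (high << self.low_bits) | low
    }

    /// Position of the i-th (0-based) set bit of `highs`, by a linear scan.
    fn select(&self, i: usize) -> usize {
        let mut remaining = i;
        for (w, &word) in self.highs.iter().enumerate() {
            let ones = word.count_ones() as usize;
            if remaining < ones {
                let mut word = word;
                for _ in 0..remaining {
                    word &= word - 1; // clear the lowest set bit
                }
                return w * 64 + word.trailing_zeros() as usize;
            }
            remaining -= ones;
        }
        unreachable!("i < len guarantees enough set bits")
    }
}
```

With `l = floor(log2(u/n))`, where `u` is one more than the largest value, the low parts take `n·l` bits and the high bitvector at most about `3n` bits. Practical implementations replace the linear `select` with a small index so that `get` runs in constant time.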
Experimental/incomplete
- Minimizers: reference implementations and experiments for minimizer and sampling schemes.
- sshash-rs: quick but incomplete reimplementation of SSHash.
- suffix array searching: static search trees are faster than binary search.
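The entry above only states that static search trees beat binary search. As a small, swapped-in relative of that idea (reorder a sorted array once so that each query walks memory in a predictable pattern), here is a sketch of the simpler Eytzinger (BFS) layout with a lower-bound search whose comparison picks the next child arithmetically instead of through an if/else. This is not the layout used in the suffix array project, and `eytzinger` and `lower_bound` are hypothetical names.

```rust
/// Lay out a sorted slice in Eytzinger (BFS) order, 1-indexed: the children of
/// slot k are 2k and 2k + 1. Slot 0 is unused padding.
pub fn eytzinger(sorted: &[u32]) -> Vec<u32> {
    // An in-order traversal of the implicit tree assigns sorted values to slots.
    fn fill(sorted: &[u32], out: &mut [u32], next: &mut usize, k: usize) {
        if k < out.len() {
            fill(sorted, out, next, 2 * k);
            out[k] = sorted[*next];
            *next += 1;
            fill(sorted, out, next, 2 * k + 1);
        }
    }
    let mut out = vec![0u32; sorted.len() + 1];
    let mut next = 0;
    fill(sorted, &mut out, &mut next, 1);
    out
}

/// Smallest stored element >= x, or None if every element is smaller.
pub fn lower_bound(tree: &[u32], x: u32) -> Option<u32> {
    let mut k = 1usize;
    while k < tree.len() {
        // Go to child 2k (left) if tree[k] >= x, else 2k + 1 (right).
        k = 2 * k + (tree[k] < x) as usize;
    }
    // The answer is the last node where the search went left: strip the
    // trailing right-turns (1 bits) and that final left-turn (one 0 bit).
    k >>= k.trailing_ones() + 1;
    if k == 0 { None } else { Some(tree[k]) }
}
```

For example, `eytzinger(&[1, 3, 5, 7, 9, 11])` gives `[0, 7, 3, 11, 1, 5, 9]`, and `lower_bound` on that array with `x = 6` returns `Some(7)`. Because the top levels of the tree sit next to each other in memory, they stay cached across queries, which is the same effect static search trees exploit more aggressively.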