Software

For all my projects, feel free to create issues or reach out for help with using them. My work focuses more on algorithm development than on direct bioinformatics applications, so I appreciate hearing from potential users :)

Tools

  • Sassy: SIMD-based approximate string matching
    • Status: done.
    • Searches for short patterns (10-100 bp, or up to 1 kbp) in long texts.
    • Supports ACTG, IUPAC, and ASCII.
  • simd-sketch: SIMD-based bottom and bucket sketches.
    • Status: basic version done; could use polishing.
    • Builds on seq-hash.
  • A*PA2: Global pairwise alignment based on SIMD, bitpacking, and band-doubling
    • Reliable, but only supports ACTG input.
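
To make the approximate string matching problem behind Sassy concrete, here is a minimal scalar sketch of the classic dynamic-programming formulation: report every position in the text where the pattern ends with at most `k` edits. This is not Sassy's API or its SIMD algorithm; the function name `search` and its signature are hypothetical and chosen only for illustration.

```rust
/// Return every end position `j` (exclusive, `j >= 1`) in `text` such that
/// `pattern` matches some substring of `text` ending at `j` with at most
/// `k` edits (substitutions, insertions, deletions).
/// Naive O(|pattern| * |text|) dynamic programming over one column.
fn search(pattern: &[u8], text: &[u8], k: usize) -> Vec<usize> {
    let m = pattern.len();
    // col[i] = minimal number of edits to align pattern[..i] with some
    // suffix of the text consumed so far. Initially the text is empty.
    let mut col: Vec<usize> = (0..=m).collect();
    let mut hits = Vec::new();
    for (j, &t) in text.iter().enumerate() {
        // col[0] stays 0 because a match may start anywhere in the text.
        let mut diag = col[0]; // previous column's value at row i - 1
        for i in 1..=m {
            let up_left = diag;
            diag = col[i];
            let cost = usize::from(pattern[i - 1] != t);
            col[i] = (up_left + cost) // match or substitution
                .min(col[i] + 1)      // extra text character
                .min(col[i - 1] + 1); // skipped pattern character
        }
        if col[m] <= k {
            hits.push(j + 1);
        }
    }
    hits
}
```

For example, `search(b"ACTG", b"TTACTGTT", 0)` reports the single exact occurrence ending at position 6, and raising `k` additionally reports nearby end positions that are reachable with a few edits.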

Bitpacked DNA crates

Crates using 2-bit packed sequence representations and SIMD algorithms on top of them.

  • packed_seq: Slowly growing library for managing 2-bit encoded ACTG DNA sequences.
    • Also supports 2+1 bit encoding to indicate ambiguous (N) characters.
  • seq-hash: Streams over all nt-hashes (or other hashes) of a packed sequence.
  • simd-minimizers: Computes minimizers of a sequence.
    • Supports skipping over ambiguous windows.
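
As an illustration of what a 2-bit packed representation looks like, here is a minimal sketch that stores four bases per byte. It is not the packed_seq API: the names `encode_base`, `pack`, and `unpack`, the byte layout, and the choice of the `(c >> 1) & 3` mapping (which sends A, C, T, G to 0, 1, 2, 3) are assumptions made for this example.

```rust
/// Map an ASCII base to 2 bits using `(c >> 1) & 3`:
/// A -> 0, C -> 1, T -> 2, G -> 3 (lowercase letters map the same way).
/// Any other byte is silently mapped to one of the four codes.
fn encode_base(c: u8) -> u8 {
    (c >> 1) & 3
}

/// Pack a sequence into bytes, 4 bases per byte,
/// with the first base in the lowest two bits.
fn pack(seq: &[u8]) -> Vec<u8> {
    let mut out = vec![0u8; (seq.len() + 3) / 4];
    for (i, &c) in seq.iter().enumerate() {
        out[i / 4] |= encode_base(c) << (2 * (i % 4));
    }
    out
}

/// Unpack the first `len` bases back to uppercase ASCII.
fn unpack(packed: &[u8], len: usize) -> Vec<u8> {
    (0..len)
        .map(|i| {
            let code = (packed[i / 4] >> (2 * (i % 4))) & 3;
            b"ACTG"[code as usize]
        })
        .collect()
}
```

With this layout, `pack(b"ACTG")` yields the single byte `0b11_10_01_00`, and `unpack` restores the original sequence. Ambiguous characters such as N cannot be represented in 2 bits, which is why a separate extra bit per base is needed for them.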

Further libraries

  • cacheline_ef: Elias-Fano encoding, one cacheline at a time.
  • PtrHash: A fast minimal perfect hash function.

Experimental/incomplete

  • Minimizers: reference implementations and experiments for minimizer and sampling schemes.
  • sshash-rs: quick but incomplete reimplementation of SSHash.
  • suffix array searching: static search trees are faster than binary search.