Software

For all my projects, feel free to open issues or reach out for help using them. My work focuses more on algorithm development than on direct bioinformatics applications, so I appreciate hearing from potential users :)

Foundational Crates:

  • packed_seq: Slowly growing library for managing 2-bit encoded ACTG DNA sequences.
  • simd-minimizers: SIMD-based implementation of random minimizers.
    • Builds on packed_seq.
    • No good support yet for non-ACTG characters.
  • cacheline_ef: Elias-Fano encoding, one cacheline at a time.
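To make "2-bit encoded" concrete, here is a minimal scalar sketch of packing ACTG into two bits per base. It is an illustration only, not the packed_seq API: the function names (`encode_base`, `pack`, `get`) and the byte layout (four bases per byte, lowest bits first) are assumptions made for this example.

```rust
/// Map an ASCII base to a 2-bit code (A=0, C=1, T=2, G=3).
/// Returns None for any other character. The code assignment is an
/// assumption for this sketch.
fn encode_base(b: u8) -> Option<u8> {
    match b.to_ascii_uppercase() {
        b'A' => Some(0),
        b'C' => Some(1),
        b'T' => Some(2),
        b'G' => Some(3),
        _ => None,
    }
}

/// Pack a DNA string into bytes, four bases per byte, lowest bits first.
/// Returns None if the input contains a non-ACTG character.
fn pack(seq: &[u8]) -> Option<Vec<u8>> {
    let mut out = vec![0u8; (seq.len() + 3) / 4];
    for (i, &b) in seq.iter().enumerate() {
        out[i / 4] |= encode_base(b)? << (2 * (i % 4));
    }
    Some(out)
}

/// Read base `i` back from the packed representation as an ASCII character.
/// Panics if `i` lies beyond the packed bytes.
fn get(packed: &[u8], i: usize) -> u8 {
    let code = (packed[i / 4] >> (2 * (i % 4))) & 3;
    b"ACTG"[code as usize]
}
```

A packed sequence uses a quarter of the memory of the ASCII text, which is also why characters outside ACTG need separate handling: there is no spare code left for them.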

Libraries & Tools:

  • A*PA2: Global pairwise alignment based on SIMD, bitpacking, and band-doubling.
    • Reliable, but only supports ACTG input.
  • PtrHash: A fast minimal perfect hash function.
    • Reliable, but randomized construction remains slightly annoying.
  • simd-sketch: SIMD-based bottom and bucket sketches.
    • Status: basic version done; could use polishing.
    • Builds on simd-minimizers.
  • Sassy: SIMD-based approximate string matching.
    • Status: in development.
    • Searches for short patterns (around 32 bp, or up to 1 kbp) in long texts.
    • Supports ACTG, IUPAC, and ASCII.
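For readers unfamiliar with the "bottom sketch" mentioned under simd-sketch, the following is a plain scalar sketch of the textbook idea: keep the s smallest distinct k-mer hashes of a sequence and compare two such sets to estimate Jaccard similarity. It is not the simd-sketch implementation or API. The FNV-1a hash and the names `hash_kmer`, `bottom_sketch`, and `jaccard_estimate` are assumptions chosen for illustration.

```rust
use std::collections::BTreeSet;

/// FNV-1a hash of a k-mer. An illustrative stand-in for whatever hash a
/// real library would use.
fn hash_kmer(kmer: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in kmer {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Bottom-s sketch: the s smallest distinct k-mer hashes, in increasing order.
fn bottom_sketch(seq: &[u8], k: usize, s: usize) -> Vec<u64> {
    if k == 0 {
        return Vec::new();
    }
    // The BTreeSet deduplicates the hashes and iterates in sorted order.
    let hashes: BTreeSet<u64> = seq.windows(k).map(hash_kmer).collect();
    hashes.into_iter().take(s).collect()
}

/// Estimate Jaccard similarity from two sorted bottom-s sketches: take the
/// s smallest hashes of the union and count the fraction present in both.
fn jaccard_estimate(a: &[u64], b: &[u64], s: usize) -> f64 {
    let union: BTreeSet<u64> = a.iter().chain(b).copied().collect();
    let smallest: Vec<u64> = union.into_iter().take(s).collect();
    if smallest.is_empty() {
        return 0.0;
    }
    let shared = smallest
        .iter()
        .filter(|&&h| a.binary_search(&h).is_ok() && b.binary_search(&h).is_ok())
        .count();
    shared as f64 / smallest.len() as f64
}
```

A bucket sketch differs in that it splits the hash space into s buckets and keeps the minimum of each, rather than the s smallest hashes overall.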

Experimental Research Projects:

  • Minimizers: reference implementations and experiments for minimizer and sampling schemes.
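As a rough picture of what a random minimizer scheme computes, here is a naive scalar sketch: hash every k-mer, and from each window of w consecutive k-mers select the position of the smallest hash (leftmost on ties). It is not taken from the Minimizers repository or from simd-minimizers. The FNV-1a hash stands in for a random order, and the names `hash_kmer` and `minimizer_positions` are hypothetical.

```rust
/// FNV-1a hash of a k-mer, used here as a stand-in for a random k-mer order.
fn hash_kmer(kmer: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in kmer {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Start positions of the selected minimizers, in increasing order and
/// without duplicates. Each window of `w` consecutive k-mers contributes
/// the position of its smallest hash, taking the leftmost on ties.
fn minimizer_positions(seq: &[u8], k: usize, w: usize) -> Vec<usize> {
    if k == 0 || w == 0 || seq.len() < k + w - 1 {
        return Vec::new();
    }
    let hashes: Vec<u64> = seq.windows(k).map(hash_kmer).collect();
    let mut out: Vec<usize> = Vec::new();
    for (start, win) in hashes.windows(w).enumerate() {
        // min_by_key returns the first of several equal minima.
        let (off, _) = win.iter().enumerate().min_by_key(|&(_, h)| *h).unwrap();
        let pos = start + off;
        // Consecutive windows often share a minimizer; record it only once.
        if out.last() != Some(&pos) {
            out.push(pos);
        }
    }
    out
}
```

This version takes O(w) time per window. Practical implementations avoid rescanning each window and process many windows at once with SIMD.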