Software

For all my projects, feel free to open issues and/or reach out for help with using them. My work focuses more on algorithm development than on direct bioinformatics applications, so I appreciate getting in contact with potential users :)

Tools

Maintained:

  • Deacon by Bede Constantinides: Fast read filtering, building on simd-minimizers.
  • Barbell by Rick Beeloo: Fast and accurate demultiplexing, building on Sassy.
  • Sassy & Sassy 2 with Rick Beeloo: up to 10x faster SIMD-based approximate string matching.
  • BAPCtools: CLI tool for ICPC-style problem development.

Inactive:

  • A*PA2 with Pesho Ivanov: near-linear global pairwise alignment.

Proof-of-concept:

  • simd-sketch: up to 100x faster SIMD-based bottom and bucket sketches.
  • mim with Rob Patro: fast multi-threaded gzip decompression.

Libraries

Pairwise alignment:

  • pa_types: utils for pairwise alignment and Cigar strings, used by A*PA and Sassy.

2-bit DNA packing with SIMD support, all developed with Igor Martayan:

  • simd-minimizers: compute minimizers of a sequence up to 6x faster.
  • seq-hash: streaming k-mer hashing of packed sequences.
  • packed-seq: 2-bit encoded ACTG and 2+1-bit ACTGN DNA sequences.
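
To make the idea of 2-bit packing concrete, here is a minimal Python sketch. It is not the packed-seq API: the function names `pack_2bit` and `unpack_2bit` are hypothetical, and the bit layout (first base in the lowest bits of each byte) is an assumption chosen for illustration. The ACTG order falls out of the ASCII trick `(c >> 1) & 3`, which maps A, C, T, G to 0, 1, 2, 3.

```python
def pack_2bit(seq: str) -> bytes:
    """Pack an uppercase ACGT string into 2 bits per base (4 bases per byte)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, ch in enumerate(seq):
        if ch not in "ACGT":
            raise ValueError(f"unsupported base {ch!r} at position {i}")
        # ASCII trick: 'A'=0x41, 'C'=0x43, 'T'=0x54, 'G'=0x47
        # give (c >> 1) & 3 == 0, 1, 2, 3 respectively.
        code = (ord(ch) >> 1) & 3
        # Assumed layout: base i occupies bits 2*(i % 4) .. 2*(i % 4) + 1.
        out[i // 4] |= code << (2 * (i % 4))
    return bytes(out)


def unpack_2bit(packed: bytes, n: int) -> str:
    """Recover the first n bases from a buffer produced by pack_2bit."""
    return "".join(
        "ACTG"[(packed[i // 4] >> (2 * (i % 4))) & 3] for i in range(n)
    )
```

The length has to be stored separately, since the last byte may be only partially filled. A 2+1-bit ACTGN encoding would additionally need one mask bit per base to mark N positions; that part is not sketched here.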

Low-level:

  • prefetch-index: small utility for cross-platform prefetching.
  • ensure-simd: small utility to compile-time-check that SIMD is enabled.
digraph {
node [fontname = "Helvetica"];
Barbell -> Sassy
Barbell -> pa_types
Sassy -> pa_types
Deacon -> simd_minimizers
Deacon -> packed_seq
simd_minimizers -> seq_hash
simd_minimizers -> packed_seq
seq_hash -> packed_seq
simd_sketch -> packed_seq
simd_sketch -> seq_hash
"A*PA2" -> pa_types
}

Data structures

Misc

Experimental/incomplete

  • STPD: In-progress STPD implementation with incremental construction.
  • CCH: up to 2x faster SIMD-based implementation of customizable contraction hierarchies.
  • amplicon-clustering: a small Sassy-based experiment for amplicon clustering.
  • b-select: re-implementation of B-tree based select queries (Pibiri and Kanda 2021).
  • merge: 3-way git merge based on edit distance.
  • Minimizers: reference implementations and experiments for minimizer and sampling schemes.
  • sshash-rs: quick but incomplete re-implementation of SSHash (Pibiri 2022).
  • suffix array searching: static search trees for suffix array searching.
  • static-search-tree (blog): Code alongside the 40x faster binary search post.