Projects
1 Pairwise alignment
1.1 A*PA
A pairwise aligner based on A*, in collaboration with Pesho Ivanov. A*PA uses A* with new heuristics to speed up global pairwise alignment. A*PA2 is much faster by using a DP-based method.
- Code, slides
- A*PA paper: PDF, bioRxiv, Bioinformatics (supplement separate)
- A*PA2 paper: PDF, (outdated) bioRxiv, WABI/LIPIcs
- Talk at CWI: 60 min recording, but unfortunately it does not show the blackboard well.
- Blogposts
- Pairwise alignment history
- Computational volumes
- A*PA2, the blogpost version of the paper.
- Local doubling, an idea that didn’t make it into the final paper.
1.2 PA-Bench
PA-Bench is a benchmarking framework for pairwise aligners, in collaboration with Daniel Liu.
2 Minimizers
2.1 Open-closed mod-minimizer
A minimizer scheme with near-optimal density when \(k\gg w\) and good density when \(k<w\), in collaboration with Giulio Ermanno Pibiri and Daniel Liu.
- Blogpost, Rust code, C++ code
- WABI24 Paper: PDF, bioRxiv, WABI/LIPIcs, slides
- Extended paper: The open-closed mod-minimizer: PDF, bioRxiv
2.2 Density lower bound
A near-tight lower bound for the density of forward sampling schemes, in collaboration with Bryce Kille.
- Blogpost, slides
- Paper: PDF, bioRxiv
- Code: minimizer implementations, ILP & analysis
2.3 Practical schemes for small \(k\)
While the mod-minimizer is practical and near-optimal for large \(k\), the best schemes for small \(k\) are somewhat slow to compute. Here the goal is to develop a simple and near-optimal schemes.
3 High throughput bioinformatics
3.1 PTRHash
Fast and small minimal perfect hash function:
3.2 Fast random minimizers
A 10x faster implementation of random minimizers.
- Blogpost, code,
- 90min invited talk at Johns Hopkins going over the post above, with the last 15min on low density minimizers.
3.3 One Billion Row Challenge
While not bioinformatics, this is a popular post: