High Throughput Bioinformatics: Theory to/from Practice
My background
- Mathematics
  - Undergrad, Masters, IMO (International Mathematical Olympiad, 2x)
- Physics
  - Undergrad
- Computer science
  - Masters, ICPC (International Collegiate Programming Contest, world finals)
- Software engineering
- Algorithm engineering
  - PhD, Postdoc
- Bioinformatics
  - PhD
International co-authors in bioinformatics
- Theory
  - KIT, Venice, Halifax
- Bioinformatics
  - Venice, Lille, Helsinki, Maryland
- Practice
  - Birmingham, Utrecht (de facto PhD “supervisor”)


Background
- DNA sequencing is cheap → data grows exponentially
- High pace of new, fast, heuristic tools
- Theoretical CS does not match modern hardware
- Need for new theory and engineered algorithms
Proposal: From space lower bounds on perfect hashing to curing cancer
Theoretical time & space lower bounds
Data structure
- Theoretical → practical → engineered (my core skill)
Software library
- Research-only → developer-friendly
Tools using library
- Academic → user-friendly
Scientists/doctors/…
- Run tool → analysis → … → cure cancer
“Push” theory to practice
- Both new and 20y old
“Pull” practical problems up into theory
- Design for throughput
Bridge the gap
Few engineers in the field
Concretely
- Pairwise alignment / edit distance
  - A*PA (theory) → A*PA2 (engineered) → Sassy (practical library) → Barbell (software for DNA sequencing)
- Minimizers
  - Density lower bound (theory) → SimdMinimizers (engineered, practical) → Deacon (software), …
- Sketching
  - MinHash (classic) → SimdSketch (engineered) → ??? → Sketchlib (software)
- Static hash sets
  - k-PHF lower bound (theory) → k-PHF-set (engineered) → ??? ← Deacon (software)
- Text indexing
  - STPD (theory) → ??? → ???
- Develop \(\sqrt n\)-complexity model
- Motivated by practice
Algorithm Engineering group @ KIT
- Strong theoreticians
- Strong in data structures
- Strong engineers
Bring math & theory to health.
Why I need YIG Prep Pro
- How to sell “doing everything”?
- How to sell “not doing anything”?
- Travel budget for networking
- Presenting at ~7 conferences this year
Outlook
- Highly interdisciplinary group
- Hire an engineer
- Close contact with users