High Throughput Bioinformatics: Theory to/from Practice

My background

  • Mathematics
    • Undergrad, Masters, IMO (International Mathematical Olympiad, 2x)
  • Physics
    • Undergrad
  • Computer science
    • Masters, ICPC (International Collegiate Programming Contest, world finals)
  • Software engineering
    • Google
  • Algorithm engineering
    • PhD, Postdoc
  • Bioinformatics
    • PhD

International co-authors in bioinformatics

  • Theory
    • KIT, Venice, Halifax
  • Bioinformatics
    • Venice, Lille, Helsinki, Maryland
  • Practice
    • Birmingham, Utrecht (de-facto PhD “supervisor”)

Background

  • DNA sequencing is cheap → data grows exponentially
    • High pace of new, fast, heuristic tools
  • Theoretical CS does not match modern hardware
    • Need for new theory and engineered algorithms

Proposal: From space lower bounds on perfect hashing to curing cancer

  • Theoretical time & space lower bounds
  • Data structure
    • Theoretical → practical → engineered (my core skill)
  • Software library
    • Research-only → developer-friendly
  • Tools using library
    • Academic → user-friendly
  • Scientists/doctors/…
    • Run tool → analysis → … → cure cancer
  • “Push” theory to practice
    • Both new and 20y old
  • “Pull” practical problems up into theory
    • Design for throughput
  • Bridge the gap
  • Few engineers in the field

Concretely

  • Pairwise alignment / edit distance:
    • A*PA (theory) → A*PA2 (engineered) → Sassy (practical library) → Barbell (software for DNA sequencing)
  • Minimizers:
    • Density lower bound (theory) → SimdMinimizers (engineered, practical) → Deacon (software), …
  • Sketching
    • MinHash (classic) → SimdSketch (engineered) → ??? → Sketchlib (software)
  • Static hash sets
    • k-PHF lower bound (theory) → k-PHF-set (engineered) → ??? ← Deacon (software)
  • Text indexing
    • STPD (theory) → ??? → ???
  • Develop \(\sqrt n\)-complexity model
    • Motivated by practice
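
To make the minimizer line above concrete, here is a minimal, unoptimized sketch of random minimizers and their density. It is not the SimdMinimizers API: the function names `kmer_hash`, `minimizer_positions`, and `density` are hypothetical, and the hash (BLAKE2b truncated to 64 bits) and the leftmost tie-breaking rule are assumptions chosen for illustration.

```python
import hashlib


def kmer_hash(kmer: str) -> int:
    """64-bit hash of a k-mer (assumption: any well-mixing hash would do)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizer_positions(seq: str, k: int, w: int) -> list[int]:
    """Start positions of the k-mers selected as minimizers.

    Every window of w consecutive k-mers selects its k-mer with the
    smallest hash; ties go to the leftmost k-mer.
    """
    n_kmers = len(seq) - k + 1
    if n_kmers < w:
        return []
    hashes = [kmer_hash(seq[i:i + k]) for i in range(n_kmers)]
    positions: list[int] = []
    for start in range(n_kmers - w + 1):
        # min() returns the first minimal element, i.e. the leftmost one.
        best = min(range(start, start + w), key=hashes.__getitem__)
        # Selected positions never move left, so comparing against the
        # last one is enough to avoid duplicates.
        if not positions or positions[-1] != best:
            positions.append(best)
    return positions


def density(seq: str, k: int, w: int) -> float:
    """Fraction of k-mers that are selected as minimizers."""
    n_kmers = len(seq) - k + 1
    if n_kmers <= 0:
        return 0.0
    return len(minimizer_positions(seq, k, w)) / n_kmers
```

Since every window of w k-mers must contain a selected k-mer, and a single k-mer lies in at most w windows, the density can never drop much below 1/w. Tighter lower bounds of the kind listed above narrow the gap between this trivial bound and what schemes achieve in practice.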

Algorithm Engineering group @ KIT

  • Strong theoreticians
  • Strong in data structures
  • Strong engineers

Bring math & theory to health.

Why I need YIG Prep Pro

  • How to sell “doing everything”?
  • How to sell “not doing anything”?
  • Travel budget for networking
    • Presenting at ~7 conferences this year

Outlook

  • Highly interdisciplinary group
    • Hire an engineer
    • Close contact with users