1 High performance computing bioinformatics
1.1 What is bioinformatics?
1.2 A*PA
1.3 A*PA2
2 High Optimal performance bioinformatics
Proposition
Optimal does not mean anything.
2.1 Optimal?
- Which metric?
- Space?
- Time?
- Theory? Practice?
- How optimal?
- Within some constant factor? (\(O(n)\))
- Sublinear space overhead? (\(o(n)\))
- Within \(2×\)?
2.2 Big-\(O\)
Proposition
Big-\(O\) was nice, but does not align with modern hardware.
- If you have the memory, use it!
- CPUs are not turing machines.
- SDSL is succinct in theory: \(o(n)\) overhead on theoretical minimum space.
- … but practice is hard,
- … and also up to \(1000\times\) slower to query!
- Using \(10\%\) extra space is fine, and can avoid all the slowdown.
- Not all \(O(1)\) are equal:
- one
100ns
cache miss per character? (FM-index, cough) - one
0.003ns
comparison per character? - \(O(\log n)\) or even \(O(\sqrt n)\) fast steps is better than \(O(1)\) slow steps!
- one
2.3 Minimizers
2.4 Minimizers: a lower bound
2.5 Mod-minimizers: a near-optimal scheme
2.6 Outlook:: future schemes
3 Optimal performance throughput bioinformatics
Proposition
Most bioinformatics applications need throughput, not latency.
3.1 Why throughput?
- Performance can be anything!
- Throughput, not latency!