Papers
- AStarix: Fast and Optimal Sequence-to-Graph Alignment
- Fast and Optimal Sequence-to-Graph Alignment Guided by Seeds
AStarix is a method for aligning sequences (reads) to graphs:
- Input
- A reference sequence or graph
- Alignment costs \((\Delta_{match}, \Delta_{subst}, \Delta_{del}, \Delta_{ins})\) for a match, substitution, insertion and deletion
- Sequence(s) to align
- Output
- An optimal alignment of each input sequence
The input is a reference graph (automaton really) \(G_r = (V_r, E_r)\) with edges \(E_r \subseteq V_r\times V_r\times \Sigma\) that indicate the transitions between states.
At the core of AStarix, there is an implicit alignment graph \(G_a^q\), whose states (nodes) \(\langle v, i\rangle\) are a pair of a position in \(G_r\) and the number of characters of the sequence \(q\) that has already been matched. Transitions (weighted edges) in \(G_a^q\) correspond to matching/substituting/deleting/inserting one character of the sequence \(q\) with their respective cost. Note that this is a generalization of the transition costs in the DP table for conventional quadratic time alignment algorithms.
The optimal alignment is a path from \(\langle u, 0\rangle\) to \(\langle v, |q|\rangle\) for some \(u,v\in V_r\) with minimal cost.
As the name suggests, the A* algorithm is used with a heuristic to find such a path. To make this work, two optimizations are done:
Seed heuristic: A* needs a heuristic function \(h\) that lower bounds the distance between vertices: \(d(u, v) \geq h(u, v)\).
[WIP]