**Papers**

- AStarix: Fast and Optimal Sequence-to-Graph Alignment
- Fast and Optimal Sequence-to-Graph Alignment Guided by Seeds

AStarix is a method for aligning sequences (reads) to graphs:

- Input
- A reference sequence or graph
- Alignment costs \((\Delta_{match}, \Delta_{subst}, \Delta_{del}, \Delta_{ins})\) for a match, substitution, insertion and deletion
- Sequence(s) to align

- Output
- An optimal alignment of each input sequence

The input is a reference graph (automaton really) \(G_r = (V_r, E_r)\) with edges \(E_r \subseteq V_r\times V_r\times \Sigma\) that indicate the transitions between states.

At the core of AStarix, there is an implicit *alignment graph* \(G_a^q\), whose
states (nodes) \(\langle v, i\rangle\) are a pair of a position in \(G_r\) and the
number of characters of the sequence \(q\) that has already been matched.
Transitions (weighted edges) in \(G_a^q\) correspond to
matching/substituting/deleting/inserting one character of the sequence \(q\) with
their respective cost.
Note that this is a generalization of the transition costs in the DP table for
conventional quadratic time alignment algorithms.

The *optimal alignment* is a path from \(\langle u, 0\rangle\) to \(\langle
v, |q|\rangle\) for some \(u,v\in V_r\) with minimal cost.

As the name suggests, the A* algorithm is used with a heuristic to find such a path. To make this work, two optimizations are done:

**Seed heuristic:** A* needs a heuristic function \(h\) that lower bounds the
distance between vertices: \(d(u, v) \geq h(u, v)\).

[WIP]