# AStarix

Papers

AStarix is a method for aligning sequences (reads) to graphs:

Input
• A reference sequence or graph
• Alignment costs $$(\Delta_{match}, \Delta_{subst}, \Delta_{del}, \Delta_{ins})$$ for a match, substitution, insertion and deletion
• Sequence(s) to align
Output
An optimal alignment of each input sequence

The input is a reference graph (automaton really) $$G_r = (V_r, E_r)$$ with edges $$E_r \subseteq V_r\times V_r\times \Sigma$$ that indicate the transitions between states.

At the core of AStarix, there is an implicit alignment graph $$G_a^q$$, whose states (nodes) $$\langle v, i\rangle$$ are a pair of a position in $$G_r$$ and the number of characters of the sequence $$q$$ that has already been matched. Transitions (weighted edges) in $$G_a^q$$ correspond to matching/substituting/deleting/inserting one character of the sequence $$q$$ with their respective cost. Note that this is a generalization of the transition costs in the DP table for conventional quadratic time alignment algorithms.

The optimal alignment is a path from $$\langle u, 0\rangle$$ to $$\langle v, |q|\rangle$$ for some $$u,v\in V_r$$ with minimal cost.

As the name suggests, the A* algorithm is used with a heuristic to find such a path. To make this work, two optimizations are done:

Seed heuristic: A* needs a heuristic function $$h$$ that lower bounds the distance between vertices: $$d(u, v) \geq h(u, v)$$.

[WIP]