This is a quick post summarizing the idea of “Singletrack: An Algorithm for Improving Memory Consumption and Performance of Gap-Affine Sequence Alignment” by López-Villellas, Iñiguez, Jiménez-Blanco, Aguado-Puig, Moretó, Alastruey-Benedé, Ibáñez, and Marco-Sola, which reduces the memory usage of affine-cost alignment by removing the need to store the affine layers of the DP matrix.
Affine-cost alignment uses a gap-open cost \(o>0\) and a gap-extend cost \(e\), so that a gap of length \(\ell\) has cost \(o + \ell \cdot e\). The classic DP solution for this is Gotoh’s method (Gotoh 1982), which uses two additional DP matrices \(I\) and \(D\) alongside the main \(M\) matrix: one storing the best cost to reach state \((i,j)\) while ending in an insertion, and one while ending in a deletion.
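For concreteness, the recurrences can be written as follows. (This is one common convention, not necessarily the paper’s exact notation: sequences \(a\) and \(b\), deletions consuming a character of \(a\), insertions a character of \(b\), and \(\delta(a_i,b_j)\) being \(0\) for a match and \(x\) for a mismatch.)

\[
\begin{aligned}
D[i][j] &= \min\bigl(M[i-1][j] + o + e,\ D[i-1][j] + e\bigr),\\
I[i][j] &= \min\bigl(M[i][j-1] + o + e,\ I[i][j-1] + e\bigr),\\
M[i][j] &= \min\bigl(M[i-1][j-1] + \delta(a_i,b_j),\ D[i][j],\ I[i][j]\bigr).
\end{aligned}
\]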
Normally, all 3 (or 5, in the case of dual-affine costs) matrices are stored so they can be accessed during traceback. Singletrack provides a way to run the traceback while only needing the main \(M\) matrix, so that only \(1/3\) (or \(1/5\)) of the memory is needed.
For reference, the usual traceback goes roughly like this (a code sketch follows the list):
- Start in state \((i,j) \gets (m, n)\).
- If \(a_i = b_j\) and \(M[i][j] = M[i-1][j-1]\): there is a match; update \((i,j)\gets (i-1, j-1)\).
- If \(a_i \neq b_j\) and \(M[i][j] = M[i-1][j-1]+x\): there is a mismatch; update \((i,j)\gets (i-1, j-1)\).
- If \(M[i][j] = D[i][j]\): a deletion ends here. Update \((i,j)\gets(i{-}1,j)\) as long as \(D[i][j] = D[i-1][j] + e\).
- If \(M[i][j] = I[i][j]\): an insertion ends here. Update \((i,j)\gets(i,j{-}1)\) as long as \(I[i][j] = I[i][j-1] + e\).
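As a minimal sketch, this loop could look as follows in Rust. All names here are hypothetical (not from the paper), the three matrices are assumed fully stored from the forward pass, and row/column \(0\) is assumed initialized as \(M[i][0] = D[i][0] = o + i \cdot e\) (and symmetrically for column \(j\)):

```rust
/// Hypothetical edit operations for the reported alignment.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op { Match, Mismatch, Del, Ins }

/// Classic traceback over the three stored matrices `m`, `d` (deletions),
/// and `ins` (insertions), each of size (|a|+1) x (|b|+1).
/// `x` is the mismatch cost, `e` the gap-extend cost.
fn traceback(
    a: &[u8], b: &[u8],
    m: &[Vec<u32>], d: &[Vec<u32>], ins: &[Vec<u32>],
    x: u32, e: u32,
) -> Vec<Op> {
    let (mut i, mut j) = (a.len(), b.len());
    let mut ops = Vec::new();
    while i > 0 || j > 0 {
        if i > 0 && j > 0 && a[i - 1] == b[j - 1] && m[i][j] == m[i - 1][j - 1] {
            ops.push(Op::Match);
            i -= 1; j -= 1;
        } else if i > 0 && j > 0 && m[i][j] == m[i - 1][j - 1] + x {
            ops.push(Op::Mismatch);
            i -= 1; j -= 1;
        } else if i > 0 && m[i][j] == d[i][j] {
            // A deletion ends here; walk up while the gap keeps extending.
            loop {
                ops.push(Op::Del);
                let still_open = i > 1 && d[i][j] == d[i - 1][j] + e;
                i -= 1;
                if !still_open { break; }
            }
        } else {
            // Otherwise an insertion ends here; walk left analogously.
            loop {
                ops.push(Op::Ins);
                let still_open = j > 1 && ins[i][j] == ins[i][j - 1] + e;
                j -= 1;
                if !still_open { break; }
            }
        }
    }
    ops.reverse();
    ops
}
```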
The main observation of Singletrack is that \(I\) and \(D\) here are not necessary! Instead, once we have ruled out a match and a mismatch, we know that there is some \(\ell \geq 1\) such that \(M[i][j] = M[i-\ell][j] + o + \ell \cdot e\) or \(M[i][j] = M[i][j-\ell] + o + \ell \cdot e\). Thus, we can simply iterate \(\ell\) from \(1\) to \(\max(i,j)\) until we find one that works, without ever needing \(I\) or \(D\), as sketched below.
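A minimal sketch of just this gap-recovery step, reusing the hypothetical naming from above and keeping only \(M\):

```rust
/// Singletrack-style gap recovery: once a (mis)match is ruled out at (i, j),
/// scan gap lengths l until one explains m[i][j]; only `m` is needed.
/// Returns (gap length, is_deletion). `o` is the gap-open cost, `e` the
/// gap-extend cost; names are hypothetical, not from the paper.
fn find_gap(m: &[Vec<u32>], i: usize, j: usize, o: u32, e: u32) -> (usize, bool) {
    for l in 1..=i.max(j) {
        let gap = o + l as u32 * e;
        // Deletion of length l ending at (i, j)?
        if l <= i && m[i][j] == m[i - l][j] + gap {
            return (l, true);
        }
        // Insertion of length l ending at (i, j)?
        if l <= j && m[i][j] == m[i][j - l] + gap {
            return (l, false);
        }
    }
    unreachable!("m[i][j] must be explained by a match, mismatch, or gap");
}
```

Note that it is safe to take the first \(\ell\) for which the equality holds: since \(M\) stores optimal costs, the equality itself certifies an optimal alignment passing through \((i-\ell, j)\) (or \((i, j-\ell)\)).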
This technique applies not only to the classic \(O(n^2)\) DP formulation, but also to the Suzuki-Kasahara difference-recurrence relation (Suzuki and Kasahara 2018), for up to 1.4\(\times\) speedup, and to the WFA algorithm (Marco-Sola et al. 2021), for up to 2.1\(\times\) speedup.