The BiWFA meeting condition

11 July 2022 2-minute read

cross references: BiWFA GitHub issue

It seems that getting the meeting/overlap condition of BiWFA (Marco-Sola et al. (2023), Algorithm 1 and Lemma 2.1) correct is tricky.

Let \(p := \max(x, o+e)\) be the maximal cost of any edge in the edit graph. As in the BiWFA paper, let \(s_f\) and \(s_r\) be the distances of the forward and reverse fronts computed so far.

We prove the following lemma:

Lemma Once BiWFA has expanded the forward and reverse fronts up to \(s_f\) and \(s_r\) and has found some path of cost \(s \leq s_f + s_r\), expanding the fronts until \(s’_f + s’_r \geq s+p+o\) is guaranteed to find a shortest path.

Proof

Increment \(s_f\) and \(s_r\) up to the point (but excluding) where \(s’_f + s’_r = s + p + o\).
Let \(\pi\) be a shortest path in the edit graph of cost \(s_\pi\).
Let \(I\) be the interval of distances on the path \(\pi\) from the start that are covered by both the forward and reverse fronts. (This probably needs a more formal definition.)
\(I = [s_\pi -s’_r+o, s’_f)\) or \(I = (s_\pi - s’_r+o, s’_f]\) is covered both in the forward and reverse direction for all (affine) layers (\(M\), \(I\), \(D\)). The side on which \(I\) is open depends on whether \(s’_f\) or \(s’_r\) was incremented last.
Since \(s_\pi \leq s\), we have
\begin{align} |I| &= s’_f - (s_\pi - s’_r + o)\\ &= s’_f + s’_r - s_\pi - o \\ &= s+p+o - s_\pi - o \\ &= s-s_\pi +p \\ &\geq p. \end{align}
Each edge in the edit graph has cost at most \(p = \max(x, o+e)\).
\(\pi\) contains at least one node every \(p\) fronts, as long as \(s_f \leq s_\pi\) and \(s_r \leq s_\pi\).
Since the interval \(I\) has size \(p\) it contains at least one node of \(\pi\).
Thus, the forward and reverse wavefronts overlap sufficiently to find a shortest path.

References

Marco-Sola, Santiago, Jordan M Eizenga, Andrea Guarracino, Benedict Paten, Erik Garrison, and Miquel Moreto. 2023. “Optimal Gap-Affine Alignment in O(s) Space.” Edited by Pier Luigi Martelli. Bioinformatics 39 (2). https://doi.org/10.1093/bioinformatics/btad074.