This is Chapter 2 of my thesis, introducing the first part, on pairwise alignment.
Summary
In this chapter, we survey methods for global pairwise alignment, which is
the problem of finding the minimal number of mutations between two genomic sequences.
There is a vast amount of literature on algorithms for pairwise alignment and
its variants. We start with the classic DP
algorithm by Needleman and Wunsch and go over a number of improvements, both
algorithmic and in the implementation.
We focus on those methods that A*PA and A*PA2 build on.
Attribution
This chapter is based on the introductions of the A*PA and A*PA2 papers
(Groot Koerkamp and Ivanov 2024; Groot Koerkamp 2024), and also on unpublished notes on pairwise alignment.
The A*PA paper is joint first-authorship work with Pesho Ivanov, and the
unpublished notes are also coauthored with him.
The problem of pairwise alignment was originally introduced in genomics in 1970 by
Needleman and Wunsch (Needleman and Wunsch 1970).
In their paper, they use their method to align proteins of around 100 amino acids.
This way, they find the mutations between the two sequences, which can be
insertions, deletions, or substitutions of single amino acids.
Specifically, the alignment corresponds to the minimal number of such
mutations that is needed to transform one sequence into the other.
Since this first paper, the sequences being aligned have grown significantly in length.
For example, the mutations between different strains of the roughly 30 kbp long
SARS-CoV-2 (COVID) virus can be found this way. Even more, full-genome alignment
of 3 Gbp long human genomes is entering the picture, although
here approximate rather than exact methods tend to be used.
Algorithmic improvements. Along with the increasing length and amount of genomic sequences, alignment
algorithms have significantly improved since their introduction over 50 years
ago.
This started with a significant number of algorithmic improvements from 1970
to 1990.
These first brought the runtime down from cubic ($O(n^2m)$ for sequences of
lengths $m\leq n$) to near-linear when the sequences are similar, with
the band-doubling method of Ukkonen (Ukkonen 1985) with complexity $O(ns)$, where $s$ is the edit distance.
This is still used by the popular tool Edlib (Šošić and Šikić 2017).
At the same time, that complexity was further improved to $O(n+s^2)$ in
expectation by Ukkonen and Myers (Ukkonen 1985; Myers 1986) with the diagonal
transition method, which is also used in the popular BiWFA (Marco-Sola et al. 2020, 2023) aligner.
BiWFA further incorporates the divide-and-conquer technique of Hirschberg
(Hirschberg 1975) to reduce its memory usage.
All this is to say: there was a large number of early algorithmic contributions,
and at the core, the best methods conceptually haven’t changed much since then (Medvedev 2023).
Implementation improvements. Nevertheless, the amount of genomic data has significantly increased since then, and
algorithms had to speed up at the same time to keep up.
As the algorithmic speedups seemed exhausted, starting in roughly 1990, the
focus shifted towards more efficient implementations of these methods.
By 1999, this resulted in the bitpacking algorithm of Myers (Myers 1999),
which provides up to $64\times$ speedup by packing 64 adjacent states of the
internal DP matrix into two 64-bit computer words. With more recent advances
in computer hardware, this has also extended to SIMD instructions that operate on up
to 512 bits at a time, providing another up to $8\times$ speedup (Suzuki and Kasahara 2018).
Approximate alignment.
At the same time, there also has been a rise in popularity of approximate
alignment algorithms and tools. Unlike all methods considered so far, these are
not guaranteed to find the minimal number of mutations between the two
sequences. Rather, they sacrifice this optimality guarantee to achieve a
speedup over exact methods.
Two common techniques are x-drop (Zhang et al. 2000) and static banding,
which both significantly reduce the region of the DP
matrix (Figure 1) that needs to be computed to find an alignment.
Before proceeding with the survey, we briefly highlight the contributions A*PA
and A*PA2 make.
Despite all the algorithmic contributions so far, in a
retrospective on pairwise alignment (Medvedev 2023), Medvedev observed that
a major open problem is to implement an algorithm with linear-like empirical
scaling on inputs where the edit distance is linear in $n$.
Indeed, the best algorithms so far scale as $O(n+s^2)$ when the edit distance is
$s$, and this is still quadratic in $n$ when the edit distance is, say, $s = n/100$.
A*PA attempts to break this boundary.
As we will see soon, pairwise alignment corresponds to finding the shortest path
in a graph. The classic algorithm for this is Dijkstra’s algorithm
(Dijkstra 1959). A faster version of it that can use domain-specific
information is the A* algorithm (Hart, Nilsson, and Raphael 1968). It achieves this by using
a heuristic that estimates the cost of the (remaining) alignment.
In A*PA, we take the seed heuristic of A*ix (Ivanov et al. 2020; Ivanov, Bichsel, and Vechev 2022) as a
starting point and improve it to the gap-chaining seed heuristic, similar to
(Myers and Miller 1995), and extend it with inexact matches. For inputs
with uniform error rate up to $10\%$, this can estimate the remaining distance
very accurately. By additionally adding pruning, A*PA ends up being near-linear
in practice when errors are uniformly distributed, improving on the quadratic
behaviour of previous exact methods.
A drawback of A*PA is that it is based on plain A*, which, like Dijkstra’s
algorithm, is a relatively cache-inefficient graph algorithm.
On the other hand, some of the fastest aligners use DP (dynamic programming) with bitpacking to
very efficiently compute the edit distance between sequences, even though they
do not have the near-linear scaling of A*PA.
In A*PA2, we
build a highly optimized aligner.
It merges the ideas of band-doubling and bit-packing (as already used by Edlib
(Šošić and Šikić 2017)) with SIMD and the heuristics developed for A*PA.
This results in significant speedups over both A*PA and Edlib.
In particular, A*PA2 is significantly faster per visited state.
As Fickett stated 40 years ago (Fickett 1984, 1), and it is still true today:
at present one must choose between an algorithm which gives the best alignment
but is expensive, and an algorithm which is fast but may not give the best
alignment.
A*PA2 narrows this gap, and is nearly as fast as approximate methods.
The remainder of this chapter reviews the history of exact global pairwise
alignment in more detail.
In particular, we focus on those methods that A*PA and A*PA2 build on, including
algorithmic improvements and implementation techniques.
Rather than presenting all work strictly chronologically, we treat them topic by topic.
At times, we include formal notation for the concepts we introduce, which will
be useful in later chapters.
We start our survey with a formal problem statement for pairwise alignment
(Section 2). Then, we list a number of variants of global alignment
(Sections 3 and 4).
While these are not our focus, they can help to contextualize other existing methods.
Then we move on to the classic DP algorithms in Section 5 and the algorithmic improvements in later sections.
These are also covered in the surveys by Kruskal (Kruskal 1983) and Navarro (Navarro 2001).
In Section 13 we present some theoretical results on
the complexity of the pairwise alignment problem and the methods with the best
worst-case runtime (although these are not practical).
We also introduce some methods for the strongly related longest common
subsequence (LCS) problem (Section 11).
Then, in Section 12, we briefly explain the methods used in some common tools that are the
main baselines for the comparison of A*PA and A*PA2.
We end with a table summarizing the papers discussed in this chapter (Section 14, Table 1).
Scope.
There is also a vast literature on text searching, where all (approximate) occurrences of a
short pattern in a long text must be found. This field has been very active since
around 1990, and again includes a large number of papers.
We consider these mostly out of scope and refer the reader to Navarro’s survey (Navarro 2001).
More recently, read mapping has become a crucial part of bioinformatics, and
indeed, there is also a plethora of different tools for aligning and mapping
reads. This is a generalization of text searching where patterns tend to be
significantly longer (100 to 10 000 bases, rather than tens of characters).
Due to the amounts of data involved, most solutions to this problem are approximate, with
the notable exception of A*ix (Ivanov et al. 2020; Ivanov, Bichsel, and Vechev 2022), which is the precursor for the work on A*PA
presented in subsequent chapters, and POASTA (van Dijk et al. 2024).
We refer the reader to the survey by Alser et al. (Alser et al. 2021) for a thorough
overview of many tools and methods used for read alignment.
Lastly, we again note that most modern read mappers and alignment tools are
approximate, in that they are not guaranteed to return an alignment with
provably minimal cost. A*PA and A*PA2 are both exact methods, and thus we will
focus on these. We again refer the reader to (Alser et al. 2021).
Given two sequences $A$ and $B$ of lengths $n$ and $m$, compute the edit
distance between them.
Before looking into solutions to this problem, we first cover some theory to precisely define it.
Input sequences.
As input, we take two sequences and of lengths and over an alphabet that is typically of
size . We usually assume that .
We refer substrings
as to a prefix as
and to a suffix as .
Edit distance.
The edit distance $s$ is the minimum number of
insertions, deletions, and substitutions needed to convert $A$ into $B$.
In practice, we also consider the divergence $d = s/n$, which is the
average number of errors per character. This is
different from the error rate $e$, which we consider to be the (relative)
number of errors applied to a pair of sequences. The error rate is typically
higher than the divergence, since random errors can cancel each other.
Figure 1: An example of an edit graph (left) corresponding to the alignment of strings ABCA and ACBBA, adapted from (Sellers 1974). Bold edges indicate matches of cost 0, while all other edges have cost 1. All edges are directed from the top-left to the bottom-right. The shortest path of cost 2 is highlighted. The middle shows the corresponding dynamic programming (DP) matrix containing the distance $g(u)$ to each state. The right shows the final alignment corresponding to the shortest path through the graph, where a C is inserted between the first A and B, and the initial C is substituted for a B.
Dynamic programming.
Pairwise alignment has classically been approached as a dynamic programming (DP)
problem. For input strings of lengths $n$ and $m$, this method creates an
$(n+1)\times(m+1)$ table that is filled cell by cell using a recursive formula.
There are many algorithms based on DP, as we will see in Section 5.
Edit graph.
The alignment graph or edit graph (Figure 1) is a way to formalize edit distance
(Vintsyuk 1968; Ukkonen 1985).
It contains states $\langle i,j\rangle$ (for $0\leq i\leq n$ and $0\leq j\leq m$) as vertices.
It further contains edges, such that an edge of cost 0 corresponds to a pair
of matching characters, and an edge of cost 1 corresponds to an insertion,
deletion, or substitution.
The vertical insertion edges $\langle i,j\rangle\to\langle i,j{+}1\rangle$ and
horizontal deletion edges $\langle i,j\rangle\to\langle i{+}1,j\rangle$ have cost 1.
Diagonal edges $\langle i,j\rangle\to\langle i{+}1,j{+}1\rangle$ have cost 0 when $a_i = b_j$ and
substitution cost 1 otherwise. A shortest path from $v_s = \langle 0,0\rangle$ to $v_t = \langle n,m\rangle$ in the edit graph corresponds to an alignment of $A$ and $B$.
The distance $d(u,v)$ from $u$ to $v$ is the length of the shortest (minimal
cost) path from $u$ to $v$, and we use edit distance, distance, length, and cost interchangeably.
Further we write
$g(u) := d(v_s, u)$ for the distance from the start to $u$,
$h(u) := d(u, v_t)$ for the distance from $u$ to the end, and $f(u) := g(u) + h(u)$ for the minimal cost
of a path through $u$.
In figures, we draw sequence $A$ at the top and sequence $B$ on the left. Index
$i$ will always be used for $A$ and indicates a column, while index $j$ is used
for $B$ and indicates a row.
Shortest path algorithms.
Using this graph, the problem of pairwise alignment reduces to finding a
shortest path in a graph. There are many shortest path algorithms for graphs,
and indeed, many of them are used for pairwise alignment.
Since the graph is acyclic,
the simplest method is to greedily process the states in any topologically
sorted order such as row-wise, column-wise, or anti-diagonal by anti-diagonal.
We then start by setting $g(\langle 0,0\rangle) = 0$, and
find the distance to any other state as the minimum distance to an incoming
neighbour plus the cost of the final edge. As we will see soon, this is often
implemented using dynamic programming (DP).
Dijkstra’s
shortest path algorithm,
which visits states in order of increasing distance, can also be applied here (Dijkstra 1959). This
does require that all edges have non-negative weights.
An extension of Dijkstra's algorithm is A* (Hart, Nilsson, and Raphael 1968), which visits
states in order of increasing “anticipated total distance”.
Figure 2: Different types of pairwise alignment. Bold vertices and edges indicate where each type of alignment may start and end. Local-extension alignment can end anywhere, whereas local alignment can also start anywhere. Like clipping, these modes require a score for matching characters, while in other cases, a cost model (with cost-0 matches) suffices. This figure is based on an earlier version that was made in collaboration with Pesho Ivanov.
There are a few variants of pairwise alignments and edit distance. While the
focus of this chapter is (unit cost) edit distance, it is helpful to first have
an overview of the different variants since most papers each assume a slightly
different context.
In global pairwise alignment, the two sequences must be fully matched against
each other. In practice though, there are a number of
different settings, see Figure 2.
Global: Align both sequences fully, end-to-end.
Ends-free: Ends-free alignment allows one of the sequences on
each end to have a (small) number of unmatched characters.
This way, the alignment is still mostly from the top-left to the bottom-right,
and global-alignment algorithms still work.
Semi-global: Align a full sequence to a substring of a reference, e.g., when
mapping a read onto a larger genome.
Extension: Align one sequence to a prefix of the other.
Overlap: Align two partially overlapping reads against each other.
Local-extension: Align a prefix of $A$ to a prefix of $B$. Unlike all previous
methods, this alignment can end anywhere in the edit graph.
This requires a cost model that trades off matching aligned characters with
not matching characters at all, and thus gives a positive score to the
matching bases.
Local: Local alignment takes this a step further and aligns a substring of
to a substring of .
This is somewhat like ends-free, but now we may skip the start/end of both
sequences at the same time.
The increased freedom in the location of the alignment prevents the faster
global-alignment algorithms from working well.
Clipping: When a sequence (read) is aligned onto a longer sequence
(reference), it can happen that the read has, say, around 100 bp of noise at
the ends that can not be aligned. When doing a local alignment, these
characters will simply not be matched. It can be desirable to add an
additional fixed-cost clipping penalty that is applied whenever the start or
end of the read indeed has unaligned bases, so that effectively, an affine
cost is paid for this.
In this chapter, we only consider global alignment and corresponding algorithms.
These methods also work for ends-free alignment when the number of characters
that may be skipped is relatively small.
Later, in Chapter 5, we look deeper into semi-global alignment and its variants.
Figure 3: Overview of different cost models. The highlighted edge indicates the cost of matching characters, while all remaining edges indicate the cost of mismatch or indel edges. The longest common subsequence (LCS) is a score rather than a cost, and counts the maximal number of matching characters. This figure is based on an earlier version that was made in collaboration with Pesho Ivanov.
There are different models to specify the cost of each edit operation
(Figure 3) (Spouge 1991). In particular, in a biological setting the probability of various
types of mutations may not be equal, and thus, the associated costs should be different.
We list some of them here, from simple to more complicated.
Hamming distance: The Hamming distance (Hamming 1950) between two
sequences is the number
of substitutions required to transform one into the other, where insertions or
deletions are not allowed. This is simple to compute in linear time.
LCS: The longest common subsequence maximizes the number of matches, or
equivalently, minimizes the number of indels (insertions or deletions) while
not allowing substitutions.
Unit cost edit distance / Levenshtein distance:
The classic edit distance counts the minimum number of indels and/or
substitutions needed, where each has a cost of 1.
Edit distance:
In general, the edit distance allows for arbitrary indel and substitution costs.
Matches/mismatches between characters $a$ and $b$ have cost $\delta(a,b)$.
Inserting/deleting a character $c$ has cost $\delta(-,c)$ and $\delta(c,-)$ respectively.
Usually the cost of a match is 0 or negative ($\delta(a,a)\leq 0$) and the
cost of a mismatch is positive ($\delta(a,b) > 0$ for $a\neq b$).
In this chapter, when we use edit distance, we usually mean the unit-cost version.
Affine cost:
Insertions and deletions in DNA sequences are somewhat rare,
but once there is an indel, it is relatively common for it to be longer
than a single character. This is modelled using affine costs (Smith, Waterman, and Fitch 1981; Gotoh 1982), where there is
a cost $o$ to open a gap, and a cost $e$ to extend a gap, so that the cost
of a gap of length $k$ is $w_k = o + k\cdot e$.
It is also possible to have different parameters $(o_{\mathrm{ins}}, e_{\mathrm{ins}})$ and $(o_{\mathrm{del}}, e_{\mathrm{del}})$ for insertions
and deletions.
Dual affine:
Affine costs are not sufficient to capture all biological
processes: the gap cost $o + k\cdot e$ can give a too large penalty to very long indels of
length 100 to 1000. To fix this, a second gap cost can be introduced
with separate parameters $(o_2, e_2)$, with for example a larger open cost
$o_2 > o$ and a cheaper extend cost $e_2 < e$.
The cost of a gap of length $k$ is now given by $w_k = \min(o + k\cdot e,\ o_2 + k\cdot e_2)$.
More generally, a piecewise linear cost can be considered as well (Gotoh 1990).
Concave: Even more generally, we can give gaps of length $k$ a cost $w_k$, where $w_k$ is a
concave function of $k$, so that longer gaps become relatively
less expensive. Dual-affine costs are an example of a concave gap cost.
Arbitrary: Even more generally, we can merge the concave gap costs with
arbitrary substitution costs for (mis)matches.
In practice, most methods for global alignment use a match cost of 0, a fixed mismatch
cost $\delta(a,b) = X > 0$ for $a\neq b$, and a fixed indel cost
$\delta(a,-) = \delta(-,b) = I > 0$.
In the chapter on semi-global alignment and mapping (Chapter 5), we discuss
additional cost models used when the alignment mode is not global.
So far, most of the cost models we considered are just that: cost models. They focus
on minimizing the cost of the edits between two sequences, and usually assume
that all costs are non-negative, so that in particular matching two characters has a
cost of 0.
In some settings, scores are considered instead, which are simply the negative
of the cost. In this setting, matching characters usually give a positive score,
so that this is explicitly rewarded. This is for example the case when finding
the longest common subsequence, where each pair of matching characters gives a
score of 1, and everything else has a score of 0.
Both approaches have their benefits. When using non-negative costs, all edges in the
alignment graph have non-negative weights. This significantly simplifies the
shortest path problem, since this is, for example, a requirement for Dijkstra’s algorithm.
Scores, on the other hand, work better for overlap, extension, and local
alignment: in these cases, the empty alignment is usually a solution, and thus,
we must give some bonus to the matching of characters to compensate for the
inevitable mismatches that will also occur.
Unfortunately, this more general setting usually means that algorithms have to
explore a larger part of the alignment graph.
The ratio between the match bonus
(score $>0$) and mismatch penalty (score $<0$) influences the trade-off between
how many additional characters must be matched for each additional mismatch.
Cost-vs-score duality.
For the problem of longest common subsequence there is a duality
between scores and costs. When $p$ is the
length of the LCS, and $s$ is the cost of aligning the two sequences via
the LCS cost model where indels cost 1 and mismatches are not allowed, we have
$$s = n + m - 2p.$$
Thus, maximizing the number of matched characters is equivalent to minimizing
the number of insertions and deletions.
A similar duality holds for global alignment: there is a direct correspondence between
maximizing score and minimizing cost (Smith, Waterman, and Fitch 1981; Eizenga and Paten 2022):
given a scoring model with a fixed match bonus, mismatch penalty, and affine gap
penalties, there is a cost model (with match cost 0) that
yields the same optimal alignment.
We are now ready to look into the first algorithms.
We start with DP algorithms, which process the graph one column at a time. Note
that we present all algorithms as similarly as possible: they go from the top-left
to the bottom-right, and always minimize the cost. We write $D(i,j) := g(\langle i,j\rangle)$ for the
cost to state $\langle i,j\rangle$.
Smith et al. (Smith, Waterman, and Fitch 1981) provide a nice overview of the similarities and
differences between the early approaches.
Note that for the sake of exposition, we start with the paper of Needleman and
Wunsch (Needleman and Wunsch 1970), even though Vintsyuk (Vintsyuk 1968) already discovered a very similar method
a few years before, although in a different domain.
Figure 4: The cubic algorithm as introduced by Needleman and Wunsch (Needleman and Wunsch 1970). Consider the highlighted cell. In the cubic algorithm, we first compute the cost between the two preceding characters, which are both B and thus create a match. Then, we consider all earlier cells in the preceding row and column, and consider all gaps of arbitrary length $k$ and cost $w_k$. The quadratic algorithm does not support arbitrary gap costs, but relies on $w_k = k$. This allows it to only consider three neighbouring states.
Needleman-Wunsch’ cubic algorithm.
The problem of pairwise alignment of biological sequences was first formalized
by Needleman and Wunsch (Needleman and Wunsch 1970). They provide a cubic recurrence
that assumes a (mis)match between $a_{i-1}$ and $b_{j-1}$ of cost $\delta(a_{i-1}, b_{j-1})$
and an arbitrary gap cost $w_k$ for a gap of length $k$.
The recursion uses that before matching $a_{i-1}$ and $b_{j-1}$,
either $a_{i-2}$ and $b_{j-2}$ are matched to each other, or one of them is
matched to some other character:
$$D(i,j) = \delta(a_{i-1}, b_{j-1}) + \min\Big(D(i{-}1,j{-}1),\ \min_{k\geq 1} D(i{-}1{-}k, j{-}1) + w_k,\ \min_{k\geq 1} D(i{-}1, j{-}1{-}k) + w_k\Big).$$
The value of $D(n,m)$ is the final cost of the alignment.
The total runtime is $O(nm(n+m))$ since each of the $nm$ cells requires $O(n+m)$ work.
A quadratic DP.
The cubic DP was improved into a quadratic DP by Sellers (Sellers 1974) and
Wagner and Fischer (Wagner and Fischer 1974).
The improvement comes from dropping the arbitrary gap cost $w_k$, so that
instead of trying all $O(n+m)$ indels in each position, only one insertion and
one deletion is tried:
$$D(i,j) = \min\big(D(i{-}1,j{-}1) + \delta(a_{i-1}, b_{j-1}),\ D(i{-}1,j) + 1,\ D(i,j{-}1) + 1\big),$$
with base cases $D(i,0) = i$ and $D(0,j) = j$.
This algorithm takes $O(nm)$ time since it now does constant work per DP cell.
This quadratic DP is now called the Needleman-Wunsch (NW) algorithm (Figure 5a, Figure 6a).
Gotoh (Gotoh 1982) refers to it as Needleman-Wunsch-Sellers’ algorithm, to
highlight the speedup that Sellers contributed (Sellers 1974).
Apparently Gotoh was not aware of the identical formulation of Wagner and Fischer (Wagner and Fischer 1974).
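To make the recurrence concrete, here is a minimal Rust sketch (the function name is my own, in the style of the snippets later in this chapter) of the quadratic DP for unit-cost edit distance. It fills the matrix column by column and keeps only the previous column, so it uses $O(nm)$ time and $O(m)$ memory.

/// Unit-cost edit distance, computed column by column.
fn edit_distance(a: &[u8], b: &[u8]) -> usize {
    let m = b.len();
    // Column i=0: D(0, j) = j, by inserting the first j characters of b.
    let mut prev: Vec<usize> = (0..=m).collect();
    let mut cur: Vec<usize> = vec![0; m + 1];
    for (i, &ca) in a.iter().enumerate() {
        cur[0] = i + 1; // D(i+1, 0): delete the first i+1 characters of a.
        for (j, &cb) in b.iter().enumerate() {
            let sub = prev[j] + (ca != cb) as usize; // diagonal (mis)match edge
            let del = prev[j + 1] + 1; // horizontal deletion edge
            let ins = cur[j] + 1; // vertical insertion edge
            cur[j + 1] = sub.min(del).min(ins);
        }
        std::mem::swap(&mut prev, &mut cur);
    }
    prev[m]
}

For the example of Figure 1, edit_distance(b"ABCA", b"ACBBA") returns 2.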
Vintsyuk published a quadratic algorithm already before
Needleman and Wunsch (Vintsyuk 1968), but in the context of speech
recognition.
Instead of a cost of matching characters, there is some cost $c(i,j)$ associated
to matching two states, and it does not allow deletions:
$$D(i,j) = c(i,j) + \min\big(D(i{-}1,j{-}1),\ D(i,j{-}1)\big).$$
Sankoff also gives a quadratic recursion (Sankoff 1972), similar to the one by
Sellers (Sellers 1974), but specifically for LCS. This leads to the recursion
$$S(i,j) = \max\big(S(i{-}1,j{-}1) + [a_{i-1} = b_{j-1}],\ S(i{-}1,j),\ S(i,j{-}1)\big),$$
where we use $S$ to indicate that the score is maximized.
Local alignment.
Smith and Waterman (Smith and Waterman 1981) introduce a DP for local alignment.
The structure of their algorithm is similar to the cubic DP of
Needleman and Wunsch and allows for arbitrary gap costs $w_k$.
While introduced as a maximization of score, here we present it as minimizing
cost (with a negative match cost) for consistency. The new addition is a 0 term
in the minimization, which can
reset the alignment whenever the cost goes above 0.
The best local alignment ends in the smallest value of $D$ in the table.
The quadratic algorithm for local alignment is now usually referred to as the
Smith-Waterman-Gotoh (SWG) algorithm, since the ideas introduced by Gotoh (Gotoh 1982) can
be used to reduce the runtime from cubic to quadratic by assuming affine costs,
just like how Sellers (Sellers 1974) sped up the Needleman-Wunsch algorithm (Needleman and Wunsch 1970) for global alignment
by assuming linear gap costs.
Note though that Gotoh only mentions this speedup in passing, and
that Smith and Waterman (Smith and Waterman 1981) could have directly based their idea on the quadratic
algorithm of Sellers (Sellers 1974) instead.
Affine costs.
To my knowledge, the first mention of affine costs of the form $w_k = o + k\cdot e$ is
by Smith, Waterman, and Fitch (Smith, Waterman, and Fitch 1981).
Gotoh (Gotoh 1982) generalized the quadratic recursion to these affine costs,
to circumvent the cubic runtime needed for the arbitrary
gap costs of Waterman (Waterman, Smith, and Beyer 1976).
This is done by introducing two additional matrices $P$
and $Q$ that contain the minimal cost to get to $\langle i,j\rangle$ where the
last step is required to be an insertion or deletion respectively:
$$P(i,j) = \min\big(D(i,j{-}1) + o + e,\ P(i,j{-}1) + e\big),$$
$$Q(i,j) = \min\big(D(i{-}1,j) + o + e,\ Q(i{-}1,j) + e\big),$$
$$D(i,j) = \min\big(D(i{-}1,j{-}1) + \delta(a_{i-1}, b_{j-1}),\ P(i,j),\ Q(i,j)\big).$$
This algorithm runs in $O(nm)$ time.
Gotoh also mentions that this method can be modified to solve the local
alignment of Smith and Waterman (Smith and Waterman 1981) in quadratic time.
Later, Gotoh further extended the method to support dual affine costs and more
general piecewise linear gap costs (Gotoh 1990).
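As an illustration, these recurrences can be implemented in linear memory as follows (a minimal sketch with my own function name, not Gotoh's original implementation): one column of $D$ and $Q$ is stored in arrays, while $P$ is carried as a scalar down the current column.

/// Global alignment cost with affine gaps: match 0, mismatch x,
/// and cost o + k*e for a gap of length k. O(nm) time, O(m) memory.
fn affine_cost(a: &[u8], b: &[u8], x: u32, o: u32, e: u32) -> u32 {
    const INF: u32 = u32::MAX / 4;
    let m = b.len();
    // d[j] = D(i, j); q[j] = Q(i, j), the best cost ending in a deletion.
    let mut d: Vec<u32> = (0..=m as u32).map(|j| if j == 0 { 0 } else { o + j * e }).collect();
    let mut q: Vec<u32> = vec![INF; m + 1];
    for (i, &ca) in a.iter().enumerate() {
        let mut diag = d[0]; // D(i, j) for the current j.
        d[0] = o + (i as u32 + 1) * e;
        let mut p = INF; // P(i+1, j), the best cost ending in an insertion.
        for (j, &cb) in b.iter().enumerate() {
            // Open or extend a horizontal (deletion) gap; d[j+1] is still D(i, j+1).
            q[j + 1] = (d[j + 1] + o + e).min(q[j + 1] + e);
            // Open or extend a vertical (insertion) gap; d[j] is already D(i+1, j).
            p = (d[j] + o + e).min(p + e);
            let sub = diag + if ca == cb { 0 } else { x };
            diag = d[j + 1];
            d[j + 1] = sub.min(q[j + 1]).min(p);
        }
    }
    d[m]
}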
Traceback.
To compute the final alignment, we can follow the trace of the DP matrix:
starting at the end $\langle n,m\rangle$, we can repeatedly determine which of the preceding DP-states
was optimal as predecessor and store these states. This takes linear time, but
requires quadratic memory since all states could be on the optimal path, and
thus we need to keep the entire matrix in memory. Gotoh
notes (Gotoh 1982) that if only the final score is required,
only the last two columns of the DP matrix (and of $P$ and $Q$) are needed at
any time, so that linear memory suffices.
Hirschberg (Hirschberg 1975) introduces a divide-and-conquer algorithm to
compute the LCS of two sequences in linear space.
Instead of computing the full alignment from
$\langle 0,0\rangle$ to $\langle n,m\rangle$, we first fix a column halfway, $i^* = \lceil n/2\rceil$.
This splits the problem
into two halves: we compute the forward DP matrix $D(i,j)$ for all $i\leq i^*$, and introduce a backward DP $D'(i,j)$ that is computed for all
$i\geq i^*$. Here, $D'(i,j)$ is the minimal cost for aligning the suffixes
of length $n-i$ and $m-j$ of $A$ and $B$. It is shown that
there must exist a $j^*$ such that $D(i^*, j^*) + D'(i^*, j^*) = D(n,m)$, and we can find this $j^*$ as the $j$ that minimizes
$D(i^*, j) + D'(i^*, j)$.
At this point, we know that the state $\langle i^*, j^*\rangle$ is part of an optimal alignment.
The two resulting subproblems of aligning $A_{<i^*}$ to $B_{<j^*}$ and
$A_{\geq i^*}$ to $B_{\geq j^*}$ can now be solved recursively using the same
technique, where again we find the midpoint of the alignment. This recursive
process is shown in Figure 5e and Figure 6e.
The recursion stops as soon as the alignment problem becomes trivial, or, in
practice, small enough to solve with the usual quadratic-memory approach.
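In Rust, one divide step could look as follows (a sketch for the unit-cost model; both helper names are my own):

/// Score-only DP pass: returns the column D(|a|, j) for all j,
/// in O(|a|*|b|) time and O(|b|) memory.
fn nw_last_column(a: &[u8], b: &[u8]) -> Vec<usize> {
    let mut col: Vec<usize> = (0..=b.len()).collect();
    for (i, &ca) in a.iter().enumerate() {
        let mut diag = col[0];
        col[0] = i + 1;
        for (j, &cb) in b.iter().enumerate() {
            let new = (diag + (ca != cb) as usize)
                .min(col[j + 1] + 1)
                .min(col[j] + 1);
            diag = col[j + 1];
            col[j + 1] = new;
        }
    }
    col
}

/// One divide step of Hirschberg: find a state <i*, j*> on an optimal alignment.
fn hirschberg_midpoint(a: &[u8], b: &[u8]) -> (usize, usize) {
    let i_mid = a.len() / 2;
    // Forward pass: D(i_mid, j) for all j.
    let fwd = nw_last_column(&a[..i_mid], b);
    // Backward pass on the reversed suffixes: D'(i_mid, j) for all j.
    let ar: Vec<u8> = a[i_mid..].iter().rev().copied().collect();
    let br: Vec<u8> = b.iter().rev().copied().collect();
    let mut bwd = nw_last_column(&ar, &br);
    bwd.reverse(); // bwd[j] = cost of aligning a[i_mid..] to b[j..].
    let j_mid = (0..=b.len()).min_by_key(|&j| fwd[j] + bwd[j]).unwrap();
    (i_mid, j_mid)
}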
Space complexity.
The benefit of this method is that it only uses linear memory: each forward or
reverse DP is only needed to compute the scores in the final column, and thus
can be done in linear memory. After the midpoint is
found, the results of the left and right subproblem can be discarded before
recursing. Additionally, the space for the solution itself is linear.
Time complexity.
We analyse the time complexity following (Myers and Miller 1988).
The first step takes $O(nm)$ time.
We are then left with two subproblems of size $i^*\times j^*$ and
$(n-i^*)\times(m-j^*)$. Since $i^* = \lceil n/2\rceil$, their total size is at most $nm/2$. Thus, the total time in the first layer
of the recursion is $O(nm/2)$. Extending this, we see that the total number of states
halves with each level of the recursion. Thus, the total time is bounded by
$$nm + nm/2 + nm/4 + \dots \leq 2nm = O(nm).$$
Indeed, in practice this algorithm takes around twice as long to find an
alignment as the non-recursive algorithm takes to find just the score.
Applications.
Hirschberg introduced this algorithm for computing the longest common
subsequence (Hirschberg 1975).
It was then applied multiple times to reduce the space complexity of other
variants as well:
Myers first applied it to the LCS algorithm (Myers 1986),
and also improved the algorithm by Gotoh (Gotoh 1982) to
linear memory (Myers and Miller 1988).
Similarly, BiWFA (Marco-Sola et al. 2023) improves the space complexity of WFA from
$O(s^2)$ to $O(s)$
working memory, where $s$ is the cost of the alignment.
Dijkstra’s algorithm.
Both Ukkonen (Ukkonen 1985) and Myers (Myers 1986)
remarked that pairwise alignment can be solved using Dijkstra’s algorithm
(Dijkstra 1959) (Figure 5b, Figure 6b),
which visits states by increasing distance.
Ukkonen gave a bound with an additional logarithmic factor for the priority queue, whereas Myers' analysis uses the fact
that there are only $O(ns)$ states at distance at most $s$ (see Section 8)
and that a discrete priority queue that avoids the $\log$ factor is sufficient, and thus concludes that the
algorithm runs in $O(ns)$.
the resulting algorithm involves a relatively complex discrete priority queue
and this queue may contain as many as $O(s^2)$ entries even in the case where just
the length of the […] shortest edit script is being computed.
And indeed, I am not aware of any tool that practically implemented Dijkstra’s algorithm to
compute the edit distance.
A* and the gap cost heuristic.
Hadlock realized (Hadlock 1988) that Dijkstra's algorithm can be improved
upon by using A* (Hart, Nilsson, and Raphael 1968, 1972; Pearl 1984), a more informed algorithm that uses a
heuristic function $h(u)$ that gives a lower bound on the remaining edit distance
between two suffixes. He proposes two heuristics, one based on character
frequencies, and also the widely
used gap cost heuristic
(Ukkonen 1985; Hadlock 1988; Spouge 1989, 1991; Myers and Miller 1995).
This uses the difference in length between the two remaining suffixes as a lower bound on
their edit distance (Figure 5f, Figure 6f):
$$h_{\mathrm{gap}}(\langle i,j\rangle) = |(n-i) - (m-j)|.$$
We specifically highlight the papers by Wu et al. (Wu et al. 1990) and Papamichail and Papamichail
(Papamichail and Papamichail 2009), where the authors’ method exactly matches the A* algorithm
with the gap-heuristic, in combination with diagonal transition (Section
9, Figure 5g, Figure 6g).
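The following self-contained sketch (my own, not the implementation of any of the cited papers, and without their diagonal transition component) runs A* on the edit graph with the gap heuristic; a heuristic that always returns 0 turns it back into Dijkstra's algorithm.

use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// A* on the edit graph with the gap heuristic h(<i,j>) = |(n-i) - (m-j)|.
fn astar_edit_distance(a: &[u8], b: &[u8]) -> usize {
    let (n, m) = (a.len(), b.len());
    // Gap heuristic: the indels needed to even out the suffix lengths.
    let h = |i: usize, j: usize| ((n - i) as i64 - (m - j) as i64).unsigned_abs() as usize;
    let mut g = vec![vec![usize::MAX; m + 1]; n + 1];
    g[0][0] = 0;
    // Min-heap of (f, i, j) with f = g + h.
    let mut queue = BinaryHeap::new();
    queue.push(Reverse((h(0, 0), 0usize, 0usize)));
    while let Some(Reverse((f, i, j))) = queue.pop() {
        if f > g[i][j] + h(i, j) {
            continue; // Outdated queue entry.
        }
        if (i, j) == (n, m) {
            return g[i][j];
        }
        // Expand: relax the up-to-three outgoing edges.
        let mut edges = Vec::with_capacity(3);
        if i < n && j < m {
            edges.push((i + 1, j + 1, (a[i] != b[j]) as usize)); // (mis)match
        }
        if i < n {
            edges.push((i + 1, j, 1)); // deletion
        }
        if j < m {
            edges.push((i, j + 1, 1)); // insertion
        }
        for (qi, qj, w) in edges {
            let tentative = g[i][j] + w;
            if tentative < g[qi][qj] {
                g[qi][qj] = tentative;
                queue.push(Reverse((tentative + h(qi, qj), qi, qj)));
            }
        }
    }
    unreachable!("the target state is always reachable");
}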
Seed heuristic.
Much more recently, A*ix (Ivanov et al. 2020; Ivanov, Bichsel, and Vechev 2022) introduced the much stronger seed heuristic
for the problem of sequence-to-graph alignment. This heuristic
splits the sequence $A$ into disjoint k-mers, and uses that at least one edit is
needed for each remaining k-mer that is not present in sequence $B$.
In A*PA (Groot Koerkamp and Ivanov 2024) we will improve this into the
gap-chaining seed heuristic and add pruning, which results in near-linear
alignment when the divergence is sufficiently low.
Notation.
To prepare for the theory on A*PA, we now introduce some formal terminology and
notation for Dijkstra’s algorithm and A*.
Dijkstra's algorithm finds a shortest path from
$v_s$ to $v_t$ by expanding (generating all successors of) vertices in order of
increasing distance from the start.
The next vertex to be expanded is chosen from a set of open vertices.
The A* algorithm, instead, directs the
search towards a target by expanding vertices in order of increasing $f(u) = g(u) + h(u)$, where $h(u)$ is a heuristic function that estimates the remaining distance
$h^*(u) = d(u, v_t)$ to the end and $g(u)$ is the shortest length of a path from $v_s$ to $u$
found so far. We say that $u$ is fixed when the distance to $u$ has been
found, i.e., $g(u) = d(v_s, u)$. A heuristic is admissible if it is a lower bound on the
remaining distance ($h(u)\leq h^*(u)$), which guarantees that A* has found a
shortest path as soon as it expands $v_t$. A heuristic $h_1$ dominates (is
more accurate than) another heuristic $h_2$ when $h_1(u)\geq h_2(u)$ for
all vertices $u$. A dominant heuristic will usually (but not
always (Holte 2010)) expand fewer vertices. Note that Dijkstra's
algorithm is equivalent to A* using a heuristic that is always 0, and that
both algorithms require non-negative edge costs.
We end our discussion of graph algorithms with the following observation,
as Spouge states (Spouge 1991, 3):
algorithms exploiting the lattice structure of an alignment graph are usually faster.
This suggests a radical approach to A* search complexities: dispense with the
lists [of open states] if there is a natural order for vertex expansion.
In A*PA2 (Groot Koerkamp 2024),
we follow this advice and replace the plain A* search in A*PA with a much
more efficient approach based on computational volumes that merges DP and A*.
Figure 5: Schematic overview of global pairwise alignment methods. The shaded areas indicate states computed by each method, and darker shades indicate states that are computed multiple times. In practice, diagonal transition only computes a very sparse set of states, as indicated by lines rather than an area.
Figure 6: A detailed example of each method shown in Figure 5. Shaded areas indicate computed values, and darker shades indicate states that are computed more than once. The yellow path indicates the optimal alignment. For diagonal transition (DT), the wavefronts are indicated by black lines, and the grey lines indicate a best path to each state. The top right (j) shows contours for the longest common subsequence.
So far, Dijkstra's algorithm is the only method we've seen that is faster than
$O(nm)$. But as remarked by Spouge, unfortunately it tends to be slow.
To our knowledge, Wilbur and Lipman (Wilbur and Lipman 1983; Wilbur and Lipman 1984) are the first to
give a sub-quadratic algorithm, by only considering states near diagonals with many
k-mer matches, not unlike the approach taken by modern seed-and-extend
algorithms.
However, this method is not exact, i.e., it could return a
suboptimal alignment. Nevertheless, they raise the question whether the
alignments found by their method are closer to biological truth than the true
minimal cost alignments found by exact algorithms.
Reordering the matrix computation.
The main reason the methods so far are quadratic is that they compute the entire
matrix. But, especially when the two sequences are similar, the
optimal alignment is likely to be close to the main diagonal.
Thus, Fickett (Fickett 1984) proposes to compute the entries of the DP matrix
in a new order: instead of column by column, we can first compute all entries at
distance up to some threshold $t$, and if this does not yet result in a path to the end (i.e., $D(n,m) > t$), we can expand the computed area to a
larger area with distance up to some $t' > t$, and so on, until we try a $t\geq s$.
In fact, when $t$ is increased by 1 at a time, this is equivalent to Dijkstra's
algorithm (Figure 5b, Figure 6b).
Vertices at distance $\leq t$ can never be more than $t$ diagonals away
from the main diagonal, and hence, computing them can be done in $O(nt)$ time.
This can be much faster than $O(nm)$ when $s$ and hence $t$ are both small, and works
especially well when $t$ is not too much larger than $s$.
For example, $t$ can be set as a known upper bound for the
data being aligned, or as the length of some known suboptimal alignment.
Gap heuristic.
In parallel, Ukkonen introduced a very similar idea (Ukkonen 1985), statically bounding the
computation to only those states that can be contained in a path of length at most
$t$ from the start to the end of the graph. In particular, it uses the gap
heuristic, so that the minimal cost of an alignment
containing $\langle i,j\rangle$ is
$$f_{\mathrm{gap}}(\langle i,j\rangle) = |i-j| + |(n-i)-(m-j)|.$$
Ukkonen's algorithm then only considers those states for which $f_{\mathrm{gap}}(u)\leq t$ (Figure 5h, Figure 6h).
Thus, instead of requiring that the actual
distance to a state is at most $t$ ($g(u)\leq t$), it requires that
the best possible cost of a path containing $u$ is sufficiently low.
Band doubling.
Ukkonen also introduces band doubling (Ukkonen 1985).
The method of Fickett computes all states with distance up to some threshold $t$.
The idea of band doubling is that
if it turns out that $t < s$,
then $t$ can be doubled to $2t$, then $4t$, and so on, until a $t\geq s$ is found.
As we already saw, testing a threshold $t$ takes $O(nt)$ time.
Now suppose we test $t = 1$,
$2$, $4$, $8$, up to $2^{\lceil\log_2 s\rceil} < 2s$. Then the total
cost of this is
$$O(n\cdot 1) + O(n\cdot 2) + O(n\cdot 4) + \dots + O(n\cdot 2s) = O(ns).$$
Thus, band doubling finds an optimal alignment in $O(ns)$ time.
Note that computed values are not reused between iterations, so that each state
is computed twice on average.
Two tools implementing this band doubling are Edlib and KSW2.
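A minimal sketch of band doubling over the simple static band $|i-j|\leq t$ (Edlib instead uses the stronger Ukkonen condition of the previous paragraph; the function names here are my own):

/// Band doubling: try t = 1, 2, 4, ... until the banded DP proves s <= t.
fn band_doubling(a: &[u8], b: &[u8]) -> usize {
    let mut t = 1;
    loop {
        if let Some(s) = banded_cost(a, b, t) {
            return s;
        }
        t *= 2;
    }
}

/// Unit-cost DP restricted to the static band |i - j| <= t, in O(nt) time.
/// Returns the distance if it is at most t; any alignment of cost <= t stays
/// within t diagonals of the main diagonal, so the result is then exact.
fn banded_cost(a: &[u8], b: &[u8], t: usize) -> Option<usize> {
    let (n, m) = (a.len(), b.len());
    if n.abs_diff(m) > t {
        return None; // The indels alone already cost more than t.
    }
    let inf = usize::MAX / 2;
    let mut prev = vec![inf; m + 1];
    let mut cur = vec![inf; m + 1];
    for j in 0..=m.min(t) {
        prev[j] = j;
    }
    for i in 1..=n {
        let (lo, hi) = (i.saturating_sub(t), (i + t).min(m));
        // Sentinels, so that stale values outside the band are never read.
        if lo > 0 {
            cur[lo - 1] = inf;
        }
        if hi < m {
            cur[hi + 1] = inf;
        }
        if lo == 0 {
            cur[0] = i;
        }
        for j in lo.max(1)..=hi {
            cur[j] = (prev[j - 1] + (a[i - 1] != b[j - 1]) as usize)
                .min(prev[j] + 1)
                .min(cur[j - 1] + 1);
        }
        std::mem::swap(&mut prev, &mut cur);
    }
    (prev[m] <= t).then_some(prev[m])
}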
Computational volumes.
Spouge unifies the methods of Fickett and Ukkonen in computational volumes
(Spouge 1989, 1991), which are subgraphs of the full edit graph that are guaranteed
to contain all shortest paths.
Thus, to find an alignment, it is sufficient to only consider the states in such
a computational volume.
Given a bound $t\geq s$, some examples of
computational volumes are the static set of states $\{\langle i,j\rangle : |i-j| + |(n-i)-(m-j)|\leq t\}$ of Ukkonen,
the states $\{u : g(u)\leq t\}$ at distance at most $t$ as computed by Fickett and by Dijkstra's algorithm,
and the stronger $\{u : g(u) + h_{\mathrm{gap}}(u)\leq t\}$, which combines the computed distance with the gap heuristic.
Around 1985, the diagonal transition (Figure 5c, Figure 6c) algorithm was independently discovered by
Ukkonen (Ukkonen 1983, 1985) (for edit distance) and Myers
(Myers 1986) (for LCS). It hinges on the
observation that along diagonals of the edit graph (or DP matrix), the value of
$D(i,j)$ never decreases (Ukkonen 1985, Lemma 3), as can be seen in Figure 1.
We already observed before that when the edit distance is $s$, only the $s$
diagonals above and below the main diagonal are needed, and on these diagonals,
we are only interested in the values up to $s$. Thus, on each diagonal, there
are at most $s$ transitions from a distance $g$ to a distance $g+1$.
We call the farthest state along a diagonal with a given distance a farthest
reaching state. Specifically, given a diagonal $k = i - j$, we consider
the farthest state $\langle i,j\rangle$ on this diagonal (i.e., with $i - j = k$) at distance $g$ ($D(i,j) = g$).
Then we write $F_{g,k} := i + j$ to indicate the antidiagonal of this farthest
reaching state. (Note that more commonly (Ukkonen 1985; Marco-Sola et al. 2020), just the column $i$ is used to
indicate how far along diagonal $k$ the farthest reaching state can be
found.
Using the antidiagonal $i+j$ leads to more symmetric formulas.)
In order to write the recursive formula on the $F_{g,k}$, we need a helper
function: $\mathrm{LCP}(i, j)$ returns the length of the longest
common prefix between $A_{\geq i}$ and $B_{\geq j}$, which indicates how far we can walk along the diagonal
for free starting at $\langle i,j\rangle$. We call this extending from $\langle i,j\rangle$.
The recursion then starts with $F_{0,0} = 2\cdot\mathrm{LCP}(0,0)$ as the farthest state along
the main diagonal with cost 0.
A wavefront is the set of farthest reaching states for a given distance, as
shown by black lines in Figure 6c.
To compute wavefront $g$ in terms of wavefront $g-1$,
we first find the farthest state at distance $g$ on diagonal $k$ that is adjacent to a state
at distance $g-1$:
$$F'_{g,k} = \max\big(F_{g-1,k-1} + 1,\ F_{g-1,k} + 2,\ F_{g-1,k+1} + 1\big).$$
From this state, with coordinates $i = (F'_{g,k} + k)/2$ and $j = (F'_{g,k} - k)/2$, we can possibly walk further along the diagonal for free to
obtain the farthest reaching point:
$$F_{g,k} = F'_{g,k} + 2\cdot\mathrm{LCP}(i, j).$$
The edit distance between two sequences is then the smallest $g$ such that
$F_{g,n-m} = n + m$.
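In code, the wavefront updates can be written as follows (a minimal sketch of the unit-cost version, with my own names, and without the careful wavefront trimming that real implementations such as WFA perform):

/// Diagonal transition for unit-cost edit distance.
/// fr[k + off] stores the antidiagonal i+j of the farthest reaching state
/// at the current distance g on diagonal k = i - j.
fn diagonal_transition(a: &[u8], b: &[u8]) -> usize {
    let (n, m) = (a.len() as isize, b.len() as isize);
    // Extend: the length of the longest common prefix of a[i..] and b[j..].
    let lcp = |i: isize, j: isize| {
        let mut l = 0;
        while i + l < n && j + l < m && a[(i + l) as usize] == b[(j + l) as usize] {
            l += 1;
        }
        l
    };
    const UNREACHED: isize = isize::MIN / 2;
    let off = m + 1;
    let size = (n + m + 3) as usize;
    let mut fr = vec![UNREACHED; size];
    fr[off as usize] = 2 * lcp(0, 0); // Distance 0: extend along the main diagonal.
    if n == m && fr[off as usize] == n + m {
        return 0;
    }
    for g in 1..=(n + m) {
        let mut next = vec![UNREACHED; size];
        for k in -m.min(g)..=n.min(g) {
            let idx = (k + off) as usize;
            // Discard transitions that fall outside the edit graph.
            let valid = |f: isize| {
                if f < k.abs() || (f + k) / 2 > n || (f - k) / 2 > m { UNREACHED } else { f }
            };
            // Deletion from k-1, substitution on k, insertion from k+1.
            let f = valid(fr[idx - 1] + 1)
                .max(valid(fr[idx] + 2))
                .max(valid(fr[idx + 1] + 1));
            if f == UNREACHED {
                continue;
            }
            // Extend along the diagonal for free.
            let f = f + 2 * lcp((f + k) / 2, (f - k) / 2);
            next[idx] = f;
            if k == n - m && f == n + m {
                return g as usize; // Reached <n, m>.
            }
        }
        fr = next;
    }
    unreachable!()
}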
Time complexity.
The total number of farthest reaching states is $O(s^2)$, since there are
$2s+1$ diagonals within distance $s$, and each has at most $s+1$ farthest reaching
states.
The total time spent on $\mathrm{LCP}$ is at most $O(ns)$, since on each of the
$O(s)$ diagonals, the calls cover at most $O(n)$ characters in total.
Thus, the worst case of this method is $O(ns)$. Nevertheless, Ukkonen observes (Ukkonen 1985)
that in practice the total time needed for $\mathrm{LCP}$ can be small, and Myers proves
(Myers 1986) that the LCS-version of the algorithm does run in expected $O(n + s^2)$ time when we assume that the
input is a random pair of sequences with distance $s$.
Myers also notes that the $\mathrm{LCP}$ queries can be answered in $O(1)$ by first building (in
$O(n)$ time) a suffix tree on the input strings and then using an auxiliary
data structure to answer lowest-common-ancestor queries, leading to a worst-case
$O(n + s^2)$ algorithm. However, this does not perform well in practice.
We remark here that when the divergence is fixed at, say, $5\%$, the distance $s = 0.05n$
grows linearly in $n$, and thus $s^2$ still grows quadratically, so that in practice this method still
becomes slow when the inputs become too long.
Space complexity. A naive implementation of the method requires $O(s^2)$
memory to store all values of $F_{g,k}$ (on top of the input sequences).
If only the distance is needed, only the last front has to be stored and
$O(s)$ additional memory suffices.
To reduce the memory for the full alignment, Hirschberg's divide-and-conquer technique can
also be applied here (Myers 1986): we can run two instances of the search in
parallel, from the start and end of the alignment graph, until they meet. Then,
this meeting point must be on the optimal alignment, and we can recurse into the
two sub-problems. These now have distance roughly $s/2$ each, so that overall, the cost is
$$O(s^2) + 2\cdot O((s/2)^2) + 4\cdot O((s/4)^2) + \dots = O(s^2),$$
while the memory usage drops to $O(s)$.
Applications.
Wu et al. (Wu et al. 1990) and Papamichail and Papamichail (Papamichail and Papamichail 2009) apply diagonal transition to align
sequences of different lengths, by incorporating the gap-heuristic (Figure 5g, Figure 6g).
Diagonal transition has also been extended to linear and affine costs in the
wavefront alignment algorithm (WFA) (Marco-Sola et al. 2020) in a way similar
to (Gotoh 1982), by introducing multiple layers to the graph.
Similar to Myers (Myers 1986), BiWFA (Marco-Sola et al. 2023) applies Hirschberg's
divide-and-conquer approach (Hirschberg 1975) to obtain $O(s)$ memory usage
(on top of the input).
So far we have mostly focused on the theoretical time complexity of methods.
However, since the introduction of diagonal transition around 1985,
no further significant breakthroughs in theoretical complexity have been found.
Thus, since then, the focus has shifted away from reducing the number of
computed states and towards computing states faster through more efficient
implementations and more modern hardware. Most of the developments in this area
were first developed for either semi-global or local alignment, but they just as
much apply to global alignment.
As Spouge notes (Spouge 1989) in the context of computational volumes:
The order of computation (row major, column major or antidiagonal) is just a
minor detail in most algorithms.
But this decision lies exactly at the core of most efficient implementations.
Anti-diagonals.
Hughey (Hughey 1996) notes that values along anti-diagonals of the DP
matrix are not dependent on each other, and thus can be computed in parallel.
Wozniak (Wozniak 1997) applied SIMD (single
instruction, multiple data) instructions for this purpose, which are special CPU instructions
that operate on multiple (currently up to 512 bits, for AVX-512) computer words at a time.
Vertical packing.
Rognes and Seeberg (Rognes and Seeberg 2000, 702) also use microparallelism, but use vertical
instead of anti-diagonal vectors:
The advantage of this approach is the much-simplified and faster loading of the
vector of substitution scores from memory. The disadvantage is that data
dependencies within the vector must be handled.
Indeed, when using vertical vectors a sequence profile (see below) can be used
to quickly determine the (mis)match score of each of the characters in the
vector. However, the DP cells now depend on each other, and it may be
necessary to (slowly) iterate through the values in the vector to handle
insertions corresponding to vertical edges in the edit graph.
Striped SIMD.
To work around the dependencies between adjacent states in each vector, Farrar
(Farrar 2006) introduces an alternative striped SIMD scheme where lanes are
interleaved with each other. Thus, the query is split into, say, 8 segments
that are aligned in parallel (each in one lane of the SIMD vector).
In this case, there are still dependencies between adjacent segments, and these
are resolved using a separate while loop, for as long as needed.
This is used by for example BSAlign (Shao and Ruan 2024).
Bitpacking.
An observation that we have not used so far is that for unit cost edit
distance specifically, it can be shown that the difference between distances to adjacent states
is always in $\{-1, 0, +1\}$.
Myers (Myers 1999) uses this fact to encode $w = 64$ adjacent differences into
two $w$-bit words: one word in which bit $j$ indicates that the $j$'th difference
is $+1$, and one word in which bit $j$ indicates that the $j$'th difference is $-1$.
If we additionally know the difference along the top edge, Myers' method can
efficiently compute the output differences of a $64\times 1$ rectangle of cells in only 20 instructions.
We call each consecutive non-overlapping chunk of 64 rows a lane, so that
there are $\lceil m/64\rceil$ lanes, where the last lane may be padded.
As presented originally, for text searching, this method only uses 17 instructions, but some additional
instructions are needed to carry the horizontal difference to the next lane when $m > 64$.
Currently, Edlib (Šošić and Šikić 2017) is the most popular tool that implements
bitpacking, alongside band doubling and divide-and-conquer, so that it has a
complexity of $O(ns/w)$.
// Requires nightly Rust: #![feature(portable_simd)]
use std::simd::Simd;

pub fn compute_block_simd_myers(
    // 0 or 1. Indicates +1 difference on top.
    hp0: &mut Simd<u64, 4>,
    // 0 or 1. Indicates -1 difference on top.
    hm0: &mut Simd<u64, 4>,
    // 64-bit indicator of +1 differences on left.
    vp: &mut Simd<u64, 4>,
    // 64-bit indicator of -1 differences on left.
    vm: &mut Simd<u64, 4>,
    // 64-bit indicator of chars equal to top char.
    eq: Simd<u64, 4>,
) {
    let vx = eq | *vm;
    let eq = eq | *hm0;
    // The addition carries information between rows.
    let hx = (((eq & *vp) + *vp) ^ *vp) | eq;
    let hp = *vm | !(hx | *vp);
    let hm = *vp & hx;
    // Extract the high bit as bottom difference.
    let right_shift = Simd::<u64, 4>::splat(63);
    let hpw = hp >> right_shift;
    let hmw = hm >> right_shift;
    // Insert the top horizontal difference.
    let left_shift = Simd::<u64, 4>::splat(1);
    let hp = (hp << left_shift) | *hp0;
    let hm = (hm << left_shift) | *hm0;
    // Update the input-output parameters.
    *hp0 = hpw;
    *hm0 = hmw;
    *vp = hm | !(vx | hp);
    *vm = hp & vx;
}
Code Snippet 1:
Rust code for SIMD version of Myers' bitpacking algorithm, taking 20 instructions.
// Requires nightly Rust: #![feature(portable_simd)]
use std::simd::Simd;

pub fn compute_block_simd_bitpal(
    // 0 or 1. Indicates 0 difference on top.
    hz0: &mut Simd<u64, 4>,
    // 0 or 1. Indicates +1 difference on top.
    hp0: &mut Simd<u64, 4>,
    // 64-bit indicator of -1 differences on left.
    vm: &mut Simd<u64, 4>,
    // 64-bit indicator of -1 and 0 differences on left.
    vmz: &mut Simd<u64, 4>,
    // 64-bit indicator of chars equal to top char.
    eq: Simd<u64, 4>,
) {
    let eq = eq | *vm;
    let ris = !eq;
    let notmi = ris | *vmz;
    let carry = *hp0 | *hz0;
    // The addition carries info between rows.
    let masksum = (notmi + *vmz + carry) & ris;
    let hz = masksum ^ notmi ^ *vm;
    let hp = *vm | (masksum & *vmz);
    // Extract the high bit as bottom difference.
    let right_shift = Simd::<u64, 4>::splat(63);
    let hzw = hz >> right_shift;
    let hpw = hp >> right_shift;
    // Insert the top horizontal difference.
    let left_shift = Simd::<u64, 4>::splat(1);
    let hz = (hz << left_shift) | *hz0;
    let hp = (hp << left_shift) | *hp0;
    // Update the input-output parameters.
    *hz0 = hzw;
    *hp0 = hpw;
    *vm = eq & hp;
    *vmz = hp | (eq & hz);
}
Code Snippet 2:
Rust code for SIMD version of Bitpal's bitpacking algorithm, taking 20 instructions.
Profile.
The DP recurrence relation depends on the sequences $A$ and $B$ via
$\delta(a_i, b_j)$, which is 1 when $a_i\neq b_j$. When using a
vectorized method, we would like to query this information efficiently for
multiple $(i, j)$ pairs at once. When using vertical vectors, this can be done
efficiently using a profile (Rognes and Seeberg 2000).
For Myers' bitpacking, this looks as follows.
For each character $c$ in the alphabet, the bitvector $\mathrm{Eq}[c]$ stores for each character of
$B$ whether it equals $c$ as a single bit. Then, when the lane for rows $64k$ to
$64k+63$ is processed in column $i$, we can simply read the indicator word corresponding to
these rows from the bitvector for $a_i$ ($\mathrm{Eq}[a_i]$) and directly use it in the bitwise algorithm.
For SIMD and SWAR methods that use packed integer values (rather than single
bits), the same can be done, where we can simply write out the values of all $\delta(a_i, b_j)$.
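For example, for DNA encoded with characters in 0..4, building the profile could look like this (a sketch; names are my own):

/// Build the profile: eq[c][k] has bit j set iff b[64*k + j] == c.
/// Assumes b is encoded with characters in 0..4 (e.g., ACGT -> 0123).
fn build_profile(b: &[u8]) -> Vec<Vec<u64>> {
    let lanes = b.len().div_ceil(64);
    let mut eq = vec![vec![0u64; lanes]; 4];
    for (j, &c) in b.iter().enumerate() {
        eq[c as usize][j / 64] |= 1u64 << (j % 64);
    }
    eq
}

When processing column i, the indicator word for lane k is then simply eq[a[i] as usize][k].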
Difference recurrence relations.
For more general cost models, such as affine costs, direct bitpacking does not work,
since differences between adjacent states can be larger than 1.
Still, it is beneficial to consider differences between adjacent states rather
than absolute distances: these are typically smaller, so that they require fewer
bits to store and more of them can be processed in parallel
(Suzuki and Kasahara 2018). Suzuki and Kasahara introduce this technique for affine-cost
alignment, and this has subsequently been used by KSW2 and BSAlign (Shao and Ruan 2024).
Blocks.
A separate improvement is made by
Block aligner (Liu and Steinegger 2023), an approximate aligner that can also handle
position-specific scoring matrices. Its main novelty is to divide the
computation into large blocks. This results in highly predictable code, and
benefits the execution speed, even though some more (redundant) states may be computed.
So far, all pairwise alignment methods we looked at are based on the alignment graph. The
longest common subsequence problem also admits different solutions. See e.g.
(Bergroth, Hakonen, and Raita, n.d.) for a survey.
Sparse dynamic programming.
Instead of finding a minimal-cost path through a graph, we can search for the
longest chain of matches (Hirschberg 1975, 1977; Hunt and Szymanski 1977). Suppose there are $r$ matches in
total, where each match is a pair $(i, j)$ such that $a_i = b_j$. We can then
process these matches from left to right (by increasing $i$ and $j$), and for
each of them determine the longest chain of matches ending in them (Figure 6j).
By extension, we determine for each state $\langle i,j\rangle$ the length $L(i,j)$ of the
LCS of $A_{<i}$ and $B_{<j}$.
Such methods that only consider a subset of vertices of the graph or DP-matrix
are using sparse dynamic programming, and are reviewed and extended in
(Eppstein et al. 1992a, 1992b).
Note that $L(i,j)$ can never decrease as we move right or
down the matrix, and this allows us to efficiently store the values of a column via
a list of thresholds of rows where the LCS jumps from $\ell$ to $\ell+1$. Then, the
value of a cell can be found using binary search, so that the overall algorithm
runs in $O((r + n)\log n)$.
While this is slow in general, when there are only few ($r\ll nm$) matches,
as may be the case when comparing lines of code, this algorithm is much faster
than previous quadratic methods.
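This threshold idea can be implemented compactly (a sketch in the spirit of Hunt and Szymanski, with my own names, maintaining for each chain length the smallest row where a chain of that length can end):

/// Sparse LCS: process match points column by column and keep, for each
/// chain length l+1, the smallest row j where such a chain can end.
fn sparse_lcs(a: &[u8], b: &[u8]) -> usize {
    // occ[c] = rows j with b[j] == c, in increasing order.
    let mut occ: Vec<Vec<usize>> = vec![Vec::new(); 256];
    for (j, &c) in b.iter().enumerate() {
        occ[c as usize].push(j);
    }
    // Strictly increasing list of thresholds, so binary search applies.
    let mut thresholds: Vec<usize> = Vec::new();
    for &c in a {
        // Process matches within one column bottom-up, so that they cannot
        // chain onto each other.
        for &j in occ[c as usize].iter().rev() {
            match thresholds.binary_search(&j) {
                Ok(_) => {} // A chain of this length already ends at row j.
                Err(l) if l == thresholds.len() => thresholds.push(j),
                Err(l) => thresholds[l] = j,
            }
        }
    }
    thresholds.len()
}

For ABCA and ACBBA from Figure 1, this returns 3, matching e.g. the common subsequence ABA.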
Contours. The regions of equal $L(i,j)$ create a set of contours (Figure 6j), where contour $\ell$ is the
boundary between the regions with $L\geq\ell$ and $L<\ell$.
Each contour is determined by a set of dominant matches $(i,j)$
for which $L(i,j)$ is larger than both $L(i-1, j)$ and $L(i, j-1)$.
LCSk. We also mention here the LCSk variant, where the task is to maximize the number
of length-$k$ matches between the two input strings.
This was first introduced around 1982 by Wilbur and Lipman
(Wilbur and Lipman 1983; Wilbur and Lipman 1984), and rediscovered in 2014
(Benson, Levy, and Shalom 2014; Pavetić, Žužić, and Šikić 2014; Pavetić et al. 2017; Deorowicz and Grabowski 2013). When choosing
$k$ sufficiently larger than $\log_\sigma n$, this has the benefit that the number $r$
of $k$-mer matches between the two strings is typically much smaller than the number of single-character matches,
so that the $O((r+n)\log n)$ runtime becomes feasible. The drawback, however, is
that this does not provide an exact solution to the original LCS problem.
Chaining k-mers. A solution to the LCSk problem consists of a sequence of matching
$k$-mers. Together, these form a chain, which is formally defined as a
sequence of vertices $u_1, u_2, \dots, u_\ell$ in a partially ordered set (whose
transitive closure is a directed acyclic graph), such that $u_1\preceq u_2\preceq\dots\preceq u_\ell$. Then, LCSk is equivalent to finding the longest chain in the poset of
k-mer matches. In this formulation, a score (the length) is maximized. Myers and
Miller (Myers and Miller 1995) instead consider a version where the cost of a
chain is minimized, by using the gap cost over the gap between consecutive
k-mer matches in the chain.
Dedicated implementations of exact global alignment are much
rarer than read mappers, and we list some recent ones here. For more, we refer to the survey of Alser et al. (Alser et al. 2021).
KSW2 (Li 2016) implements banded alignment using the difference recurrence
(Suzuki and Kasahara 2018) with SIMD, and supports (dual) affine costs.
Edlib (Šošić and Šikić 2017) implements band doubling (Ukkonen 1985) using the $g(u) + h_{\mathrm{gap}}(u)\leq t$ computational
volume, similar to A* with the gap-heuristic.
It uses Myers’ bitpacking (Myers 1999). For traceback, it uses Hirschberg’s divide-and-conquer
approach (Hirschberg 1975): once the distance is found, the alignment is started over from both
sides towards the middle column, where a state on the shortest path is
determined. This is recursively applied to the left and right halves until the
sequences are short enough that quadratic memory can be used.
We end this chapter with a discussion of some more theoretical methods that have
a worst case that is slightly better than quadratic.
Lower bounds.
Backurs and Indyk (Backurs and Indyk 2018) have shown that unit cost edit distance can not be solved in
$O(n^{2-\varepsilon})$ time for any $\varepsilon > 0$, on the condition that the Strong
Exponential Time Hypothesis (SETH) is true. Soon after, it was also shown that
SETH implies that
LCS also can not be solved in $O(n^{2-\varepsilon})$ time for any $\varepsilon > 0$
(Abboud, Backurs, and Williams 2015).
Figure 7: In the four Russians method, the $n\times m$ grid is divided into blocks of size $r\times r$. For each block, the differences between DP table cells along the top row ($R$) and left column ($S$) are the input, together with the corresponding substrings of $A$ and $B$. The output are the differences along the bottom row ($R'$) and right column ($S'$). For each possible input of a block, the corresponding output is precomputed, so that the DP table can be filled using lookups only. Red shaded states are not visited.
Four Russians method.
The so-called four Russians method was introduced by (Arlazarov et al. 1970).
It is a general method to speed up DP algorithms by a factor $\log n$ or even $\log^2 n$,
provided that entries are small integers and all dependencies are 'local'.
This idea was applied to pairwise alignment by Masek and Paterson (Masek and Paterson 1980),
resulting
in the first subquadratic worst-case algorithm for edit distance.
It works by partitioning
the matrix into blocks of size $r\times r$, for some $r$, as
shown in Figure 7. Consider the differences $R$ and $S$ between
adjacent DP cells along the top row ($R$) and left column ($S$) of
a block. The core observation is that the differences $R'$ and $S'$ along
the bottom row and right column of the block only depend on $R$, $S$, and
the corresponding substrings of $A$ and $B$. This means that for
some value of $r$ depending on the alphabet size $\sigma$, $r$ is small enough that we can precompute the
values of $R'$ and $S'$ for all possible inputs in $O(n)$ time. In practice, $r$ needs to be quite small.
Using these precomputed values, the DP can be sped up by doing a single
lookup for each of the $(n/r)\cdot(m/r)$ blocks, for a total runtime of $O(nm/r^2)$. The runtime was originally reported as $O(nm/r)$, but subsequent
papers realized that the differences along each block boundary fit in a
single machine word, so that lookups are indeed $O(1)$ instead of $O(r)$.
While this is the only known subquadratic worst-case algorithm, it
does not break the $O(n^{2-\varepsilon})$ lower bound, since $\log^2 n$ grows subpolynomially.
Masek's method requires a constant size alphabet.
A first extension to general alphabets increased the time to $O(n^2(\log\log n)^2/\log^2 n)$ (Bille and Farach-Colton 2008), and this was later
improved to $O(n^2\log\log n/\log^2 n)$ (Grabowski 2016). An algorithm
with similar complexity also works for LCS.
Applications.
Wu et al. provide an implementation of this
method for approximate string matching (Wu, Manber, and Myers 1996). They suggest a small block size and provide efficient ways of transitioning from one
block to the next.
Nowadays, the bit-parallel technique of Myers (Myers 1999) has
replaced four Russians, since it can compute up to 64 cells in a single step,
while not having to wait for (comparatively) slow lookups of the precomputed data.
Table 1:
Chronological overview of papers related to exact global pairwise alignment. Parameters are sequence lengths $n$ and $m$ with $m\leq n$. The (edit) distance is $s$. The number of matching characters or k-mers is $r$. The length of the LCS is $p$. $w = 64$ is the word size, and lastly we assume a fixed-size alphabet $\Sigma$. Time is worst-case unless noted otherwise, and space usage is to obtain the full alignment. Methods in bold are newly introduced or combined.
Abboud, Amir, Arturs Backurs, and Virginia Vassilevska Williams. 2015. “Quadratic-Time Hardness of LCS and Other Sequence Similarity Measures.” arXiv. https://doi.org/10.48550/ARXIV.1501.07053.
Alpern, Bowen, Larry Carter, and Kang Su Gatlin. 1995. “Microparallelism and High-Performance Protein Matching.” Proceedings of the 1995 ACM/IEEE Conference on Supercomputing (CDROM) - Supercomputing ’95. https://doi.org/10.1145/224170.224222.
Alser, Mohammed, Jeremy Rotman, Dhrithi Deshpande, Kodi Taraszka, Huwenbo Shi, Pelin Icer Baykal, Harry Taegyun Yang, et al. 2021. “Technology Dictates Algorithms: Recent Developments in Read Alignment.” Genome Biology 22 (1). https://doi.org/10.1186/s13059-021-02443-7.
Altschul, Stephen F., and Bruce W. Erickson. 1986. “Optimal Sequence Alignment Using Affine Gap Costs.” Bulletin of Mathematical Biology 48 (5–6): 603–16. https://doi.org/10.1007/bf02462326.
Arlazarov, V. L., Y. A. Dinitz, M. A. Kronrod, and I. A. Faradzhev. 1970. “On Economical Construction of the Transitive Closure of an Oriented Graph.” Dokl. Akad. Nauk SSSR 194 (3): 487–88. http://www.ams.org/mathscinet-getitem?mr=0269546.
Backurs, Arturs, and Piotr Indyk. 2018. “Edit Distance Cannot Be Computed in Strongly Subquadratic Time (Unless SETH Is False).” SIAM Journal on Computing 47 (3): 1087–97. https://doi.org/10.1137/15m1053128.
Baeza-Yates, Ricardo, and Gaston H. Gonnet. 1992. “A New Approach to Text Searching.” Communications of the ACM 35 (10): 74–82. https://doi.org/10.1145/135239.135243.
Benson, Gary, Yozen Hernandez, and Joshua Loving. 2013. “A Bit-Parallel, General Integer-Scoring Sequence Alignment Algorithm.” Lecture Notes in Computer Science, 50–61. https://doi.org/10.1007/978-3-642-38905-4_7.
Bergroth, L., H. Hakonen, and T. Raita. 2000. “A Survey of Longest Common Subsequence Algorithms.” In Proceedings Seventh International Symposium on String Processing and Information Retrieval. SPIRE 2000, 39–48. SPIRE ’00. IEEE Computer Society. https://doi.org/10.1109/spire.2000.878178.
Bille, Philip, and Martin Farach-Colton. 2008. “Fast and Compact Regular Expression Matching.” Theoretical Computer Science 409 (3): 486–96. https://doi.org/10.1016/j.tcs.2008.08.042.
Daily, Jeff. 2016. “Parasail: SIMD C Library for Global, Semi-Global, and Local Pairwise Sequence Alignments.” BMC Bioinformatics 17 (1). https://doi.org/10.1186/s12859-016-0930-z.
Deorowicz, Sebastian, and Szymon Grabowski. 2013. “Efficient Algorithms for the Longest Common Subsequence in $k$-Length Substrings.” arXiv. https://doi.org/10.48550/ARXIV.1311.4552.
Dijk, Lucas R. van, Abigail L. Manson, Ashlee M. Earl, Kiran V Garimella, and Thomas Abeel. 2024. “Fast and Exact Gap-Affine Partial Order Alignment with Poasta.” bioRxiv. https://doi.org/10.1101/2024.05.23.595521.
Dijkstra, E. W. 1959. “A Note on Two Problems in Connexion with Graphs.” Numerische Mathematik 1 (1): 269–71. https://doi.org/10.1007/bf01386390.
Döring, Andreas, David Weese, Tobias Rausch, and Knut Reinert. 2008. “SeqAn: An Efficient, Generic C++ Library for Sequence Analysis.” BMC Bioinformatics 9 (1). https://doi.org/10.1186/1471-2105-9-11.
Eizenga, Jordan M., and Benedict Paten. 2022. “Improving the Time and Space Complexity of the WFA Algorithm and Generalizing Its Scoring,” January. https://doi.org/10.1101/2022.01.12.476087.
Eppstein, David, Zvi Galil, Raffaele Giancarlo, and Giuseppe F. Italiano. 1992a. “Sparse Dynamic Programming I: Linear Cost Functions.” Journal of the ACM 39 (3): 519–45. https://doi.org/10.1145/146637.146650.
———. 1992b. “Sparse Dynamic Programming II: Convex and Concave Cost Functions.” Journal of the ACM 39 (3): 546–67. https://doi.org/10.1145/146637.146656.
Farrar, Michael. 2006. “Striped Smith–Waterman Speeds Database Searches Six Times over Other SIMD Implementations.” Bioinformatics 23 (2): 156–61. https://doi.org/10.1093/bioinformatics/btl582.
Grabowski, Szymon. 2016. “New Tabulation and Sparse Dynamic Programming Based Techniques for Sequence Similarity Problems.” Discrete Applied Mathematics 212 (October): 96–103. https://doi.org/10.1016/j.dam.2015.10.040.
Groot Koerkamp, Ragnar. 2024. “A*PA2: Up to 19× Faster Exact Global Alignment.” In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024), edited by Solon P. Pissis and Wing-Kin Sung, 312:17:1–17:25. Leibniz International Proceedings in Informatics (LIPIcs). Dagstuhl, Germany: Schloss Dagstuhl – Leibniz-Zentrum für Informatik. https://doi.org/10.4230/LIPIcs.WABI.2024.17.
Groot Koerkamp, Ragnar, and Pesho Ivanov. 2024. “Exact Global Alignment Using A* with Chaining Seed Heuristic and Match Pruning.” Edited by Tobias Marschall. Bioinformatics 40 (3). https://doi.org/10.1093/bioinformatics/btae032.
Hadlock, Frank O. 1988. “Minimum Detour Methods for String or Sequence Comparison.” Congressus Numerantium 61: 263–74.
Hart, Peter E., Nils J. Nilsson, and Bertram Raphael. 1972. “Correction to ‘A Formal Basis for the Heuristic Determination of Minimum Cost Paths’.” ACM SIGART Bulletin, no. 37 (December): 28–29. https://doi.org/10.1145/1056777.1056779.
Hart, Peter, Nils Nilsson, and Bertram Raphael. 1968. “A Formal Basis for the Heuristic Determination of Minimum Cost Paths.” IEEE Transactions on Systems Science and Cybernetics 4 (2): 100–107. https://doi.org/10.1109/tssc.1968.300136.
Hirschberg, D. S. 1975. “A Linear Space Algorithm for Computing Maximal Common Subsequences.” Communications of the ACM 18 (6): 341–43. https://doi.org/10.1145/360825.360861.
Hirschberg, Daniel S. 1977. “Algorithms for the Longest Common Subsequence Problem.” Journal of the ACM 24 (4): 664–75. https://doi.org/10.1145/322033.322044.
Holte, Robert C. 2010. “Common Misconceptions Concerning Heuristic Search.” In Proceedings of the Third Annual Symposium on Combinatorial Search, SOCS 2010, Stone Mountain, Atlanta, Georgia, USA, July 8-10, 2010, edited by Ariel Felner and Nathan R. Sturtevant. AAAI Press. http://aaai.org/ocs/index.php/SOCS/SOCS10/paper/view/2073.
Hunt, James W., and Thomas G. Szymanski. 1977. “A Fast Algorithm for Computing Longest Common Subsequences.” Communications of the ACM 20 (5): 350–53. https://doi.org/10.1145/359581.359603.
Hyyrö, Heikki, Kimmo Fredriksson, and Gonzalo Navarro. 2005. “Increased Bit-Parallelism for Approximate and Multiple String Matching.” ACM Journal of Experimental Algorithmics 10 (December). https://doi.org/10.1145/1064546.1180617.
Ivanov, Pesho, Benjamin Bichsel, Harun Mustafa, André Kahles, Gunnar Rätsch, and Martin Vechev. 2020. “AStarix: Fast and Optimal Sequence-to-Graph Alignment.” In Research in Computational Molecular Biology, 104–19. Springer International Publishing. https://doi.org/10.1007/978-3-030-45257-5_7.
Ivanov, Pesho, Benjamin Bichsel, and Martin Vechev. 2022. “Fast and Optimal Sequence-to-Graph Alignment Guided by Seeds.” In Research in Computational Molecular Biology, 306–25. Springer International Publishing. https://doi.org/10.1007/978-3-031-04749-7_22.
Kruskal, Joseph B. 1983. “An Overview of Sequence Comparison: Time Warps, String Edits, and Macromolecules.” SIAM Review 25 (2): 201–37. https://doi.org/10.1137/1025045.
Liu, Daniel, and Martin Steinegger. 2023. “Block Aligner: an adaptive SIMD-accelerated aligner for sequences and position-specific scoring matrices.” Bioinformatics, btad487. https://doi.org/10.1093/bioinformatics/btad487.
Loving, Joshua, Yozen Hernandez, and Gary Benson. 2014. “BitPAl: A Bit-Parallel, General Integer-Scoring Sequence Alignment Algorithm.” Bioinformatics 30 (22): 3166–73. https://doi.org/10.1093/bioinformatics/btu507.
Marco-Sola, Santiago, Jordan M Eizenga, Andrea Guarracino, Benedict Paten, Erik Garrison, and Miquel Moreto. 2023. “Optimal Gap-Affine Alignment in O(s) Space.” Edited by Pier Luigi Martelli. Bioinformatics 39 (2). https://doi.org/10.1093/bioinformatics/btad074.
Marco-Sola, Santiago, Juan Carlos Moure, Miquel Moreto, and Antonio Espinosa. 2020. “Fast gap-affine pairwise alignment using the wavefront algorithm.” Bioinformatics 37 (4): 456–63. https://doi.org/10.1093/bioinformatics/btaa777.
Masek, William J., and Michael S. Paterson. 1980. “A Faster Algorithm Computing String Edit Distances.” Journal of Computer and System Sciences 20 (1): 18–31. https://doi.org/10.1016/0022-0000(80)90002-1.
Medvedev, Paul. 2023. “Theoretical Analysis of Edit Distance Algorithms.” Communications of the ACM 66 (12): 64–71. https://doi.org/10.1145/3582490.
Myers, Eugene W. 1986. “An O(ND) Difference Algorithm and Its Variations.” Algorithmica 1 (1–4): 251–66. https://doi.org/10.1007/bf01840446.
Myers, Gene. 1999. “A Fast Bit-Vector Algorithm for Approximate String Matching Based on Dynamic Programming.” Journal of the ACM 46 (3): 395–415. https://doi.org/10.1145/316542.316550.
Myers, Eugene W., and Webb Miller. 1995. “Chaining Multiple-Alignment Fragments in Sub-Quadratic Time.” In Proceedings of the Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, 38–47. SODA ’95. San Francisco, California, USA: Society for Industrial and Applied Mathematics.
Nakatsu, Narao, Yahiko Kambayashi, and Shuzo Yajima. 1982. “A Longest Common Subsequence Algorithm Suitable for Similar Text Strings.” Acta Informatica 18 (2). https://doi.org/10.1007/bf00264437.
Needleman, Saul B., and Christian D. Wunsch. 1970. “A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins.” Journal of Molecular Biology 48 (3): 443–53. https://doi.org/10.1016/0022-2836(70)90057-4.
Papamichail, Dimitris, and Georgios Papamichail. 2009. “Improved Algorithms for Approximate String Matching (Extended Abstract).” BMC Bioinformatics 10 (S1). https://doi.org/10.1186/1471-2105-10-s1-s10.
Pavetić, Filip, Ivan Katanić, Gustav Matula, Goran Žužić, and Mile Šikić. 2017. “Fast and Simple Algorithms for Computing Both $LCS_k$ and $LCS_{k+}$.” arXiv. https://doi.org/10.48550/ARXIV.1705.07279.
Rognes, Torbjørn, and Erling Seeberg. 2000. “Six-Fold Speed-up of Smith–Waterman Sequence Database Searches Using Parallel Processing on Common Microprocessors.” Bioinformatics 16 (8): 699–706. https://doi.org/10.1093/bioinformatics/16.8.699.
Sankoff, David. 1972. “Matching Sequences under Deletion/Insertion Constraints.” Proceedings of the National Academy of Sciences 69 (1): 4–6. https://doi.org/10.1073/pnas.69.1.4.
Sellers, Peter H. 1974. “On the Theory and Computation of Evolutionary Distances.” SIAM Journal on Applied Mathematics 26 (4): 787–93. https://doi.org/10.1137/0126070.
Shao, Haojing, and Jue Ruan. 2024. “Bsalign: A Library for Nucleotide Sequence Alignment.” Genomics, Proteomics & Bioinformatics, March. https://doi.org/10.1093/gpbjnl/qzae025.
Smith, T. F., M. S. Waterman, and W. M. Fitch. 1981. “Comparative Biosequence Metrics.” Journal of Molecular Evolution 18 (1): 38–46. https://doi.org/10.1007/bf01733210.
Smith, T.F., and M.S. Waterman. 1981. “Identification of Common Molecular Subsequences.” Journal of Molecular Biology 147 (1): 195–97. https://doi.org/10.1016/0022-2836(81)90087-5.
Spouge, John L. 1989. “Speeding up Dynamic Programming Algorithms for Finding Optimal Lattice Paths.” SIAM Journal on Applied Mathematics 49 (5): 1552–66. https://doi.org/10.1137/0149094.
Suzuki, Hajime, and Masahiro Kasahara. 2018. “Introducing Difference Recurrence Relations for Faster Semi-Global Alignment of Long Sequences.” BMC Bioinformatics 19 (S1). https://doi.org/10.1186/s12859-018-2014-8.
Szalkowski, Adam, Christian Ledergerber, Philipp Krähenbühl, and Christophe Dessimoz. 2008. “SWPS3 – Fast Multi-Threaded Vectorized Smith-Waterman for IBM Cell/B.E. and ×86/SSE2.” BMC Research Notes 1 (1): 107. https://doi.org/10.1186/1756-0500-1-107.
Wagner, Robert A., and Michael J. Fischer. 1974. “The String-to-String Correction Problem.” Journal of the ACM 21 (1): 168–73. https://doi.org/10.1145/321796.321811.
Walia, Sumit, Cheng Ye, Arkid Bera, Dhruvi Lodhavia, and Yatish Turakhia. 2024. “TALCO: Tiling Genome Sequence Alignment Using Convergence of Traceback Pointers.” In 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE. https://doi.org/10.1109/hpca57654.2024.00044.
Wilbur, W J, and D J Lipman. 1983. “Rapid Similarity Searches of Nucleic Acid and Protein Data Banks.” Proceedings of the National Academy of Sciences 80 (3): 726–30. https://doi.org/10.1073/pnas.80.3.726.
Wilbur, W. John, and David J. Lipman. 1984. “The Context Dependent Comparison of Biological Sequences.” SIAM Journal on Applied Mathematics 44 (3): 557–67. https://doi.org/10.1137/0144038.
Wu, Sun, Udi Manber, and Gene Myers. 1996. “A Subquadratic Algorithm for Approximate Limited Expression Matching.” Algorithmica 15 (1): 50–67.
Zhang, Zheng, Scott Schwartz, Lukas Wagner, and Webb Miller. 2000. “A Greedy Algorithm for Aligning DNA Sequences.” Journal of Computational Biology 7 (1–2): 203–14. https://doi.org/10.1089/10665270050081478.
Zhao, Mengyao, Wan-Ping Lee, Erik P. Garrison, and Gabor T. Marth. 2013. “SSW Library: An SIMD Smith-Waterman C/C++ Library for Use in Genomic Applications.” Edited by Leonardo Mariño-Ramírez. PLoS ONE 8 (12): e82138. https://doi.org/10.1371/journal.pone.0082138.
Šošić, Martin, and Mile Šikić. 2017. “Edlib: A C/C++ Library for Fast, Exact Sequence Alignment Using Edit Distance.” Edited by John Hancock. Bioinformatics 33 (9): 1394–95. https://doi.org/10.1093/bioinformatics/btw753.