Shortest paths, bucket queues, and A* on the edit graph

This note summarizes some papers I was reading while investigating the history of A* for pairwise alignment, and related to that the first usage of a bucket queue. Schrijver (2012) provides a nice overview of general shortest path methods.

Shortest path algorithms ..

.. in general

  • Moore (1959) was already presented in 1957. I did not find a PDF of this paper but Schrijver (2012) summarizes it well: For unit-cost graphs it presents an \(O(m)\) BFS algorithm, and for general weighted graphs an \(O(mn)\) algorithm.

  • Dijkstra (1959) introduces the classic \(O(n^2)\) algorithm of repeatedly expanding the open vertex with shortest distance from the start (using more modern terminology), and recommends storing the shortest distance to each node together with the node.

    This was later improved to \(O(n\lg n)\) using e.g. a priority queue.

.. for circuit design

  • Lee (1961) introduces an algorithm for layout our electrical wires on a chip. It is very similar to Dijkstra’s classical algorithm (Dijkstra 1959), but stores distances together with the queue/set of open states, rather than a separate dictionary. It’s also slightly more general by using a vector of cost functions \((f_1, \dots, f_r)\), and allowing path-dependent costs.
  • Hart, Nilsson, and Raphael (1968; Hart, Nilsson, and Raphael 1972) introduces A*.
  • Rubin (1974, 1976) builds on Lee (1961), and among others:
    • It notes that Dijkstra’s algorithm (storing costs in states) can be used to speed up of Lee’s algorithm.
    • It uses A* with Manhattan distance heuristic.
    • It uses DFS on states with the same $f$-value.
    • Somewhat confusingly, this paper lists Hitchner (1968) in its references, but doesn’t actually mention it in the text.

Bucket queues

  • Dial (1969) is listed as the first introduction of a bucket queue on Wikipedia and uses linked lists of nodes with the same distance, again wrapping around at the maximum arc length.

  • Hitchner (1968) writes about storing open states keyed by distance (aka a bucket queue), and attributes it to Peter S. Loubal:

    Rather than storing temporary label values in the storage locations for each node number, it stores node numbers in a table in which the row number of the table is the value of the label.

    Further, he writes

    Also, it is possible to save space by using a “wrap-around” table which has as many rows as the number of units in the longest arc length plus 1.

    I tried finding some paper by Loubal that explains it, but unsuccessfully.

  • Hoel (1976) applies the bucket queue of Hitchner (1968) to Lee (1961):

    by storing frontier cells in an array of stacks rather than a single list, costly searching operations can be eliminated without significantly increasing storage requirements.

    It uses an array of stacks, again wrapping around at the maximum arc length. It also discusses using linked lists instead of arrays for the stacks.

Shortest path algorithms by Hadlock

It was only this week that I became aware of Hadlock’s papers, apparently having missed citations to his work in (Spouge 1989, 1991). Sadly, some of his papers are hard to find online.

Grid graphs

  • Hadlock (1977) introduces a detour based algorithm that is essentially equivalent to A*. He uses the Manhattan distance as a heuristic, and defines the detour of an edge as the detour cost1 \(d(u, v) = f(v) - f(u) = edgecost(u,v) + h(v) - h(u)\). Like normal A*, this is equivalent to using the heuristic as a potential and updating edge costs accordingly.

    On grid graphs, the detour cost of any path is integer and all \(d(u,v)\) are either \(0\) (when moving towards the target) or \(2\) (when moving away). Hence it suffices to do a BFS, starting with vertices with \(0\) detour, than \(2\) detour, and so on. This is easily implemented using two stacks or a double ended queue.

  • Hadlock (1979) extends the Minimum Detour Algorithm to non-unit costs. As an application, it introduces a heuristic for paths with a minimal number of turns.

Strings

  • Hadlock (1988b): This was presented at a conference. I have not found in online, but requested the ETH library to scan the physical proceedings.

    According to papers that cite it, this seems to be the first application of A* to the alignment graph. The heuristics used are the gap-cost and a heuristic based on character frequencies.

  • Hadlock (1989): This is a preprint/submitted paper cited by Hadlock (1988a) and Spouge (1989), but I have not been able to find any mention of it elsewhere at all.

  • Hadlock (1988a) uses different substitution costs \(s(a,b) = |ord(a)-ord(b)|\) on the edit graph, where the cost of substituting a character depends on the distance between the letters. It again presents the minimum detour algorithm, this time with an upper bound \(\tau\) on the cost of paths searched. It then introduces a specific new heuristic based on character frequencies for the new setting of substitution costs.

Spouge’s computational volumes

  • Spouge (1989) introduces computational volumes: a subset of states of the edit graph that is guaranteed to contain all shortest paths. The main observation is that A* is slow because of bookkeeping (both the distance to each explored state and the frontier) and maintaining of a queue. Computing the states of a computational volume can be done much faster, since there is a natural order to expand the vertices (by row, column, or anti-diagonal).
  • Spouge (1991) provides a detailed algorithm implementation of the computational volume method and benchmark it.

References

Dial, Robert B. 1969. “Algorithm 360: Shortest-Path Forest with Topological Ordering [H].” Communications of the Acm 12 (11): 632–33. https://doi.org/10.1145/363269.363610.
Dijkstra, E. W. 1959. “A Note on Two Problems in Connexion with Graphs.” Numerische Mathematik 1 (1): 269–71. https://doi.org/10.1007/bf01386390.
Hadlock, Frank O. 1977. “A Shortest Path Algorithm for Grid Graphs.” Networks 7 (4): 323–34. https://doi.org/10.1002/net.3230070404.
———. 1979. “Optimum Paths in Lattice Graphs.” In Information Linkage between Applied Mathematics and Industry, 421–30. Elsevier.
———. 1988a. “An Efficient Algorithm for Pattern Detection and Classification.” Proceedings of the First International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems - Iea/Aie ’88. https://doi.org/10.1145/55674.55676.
———. 1988b. “Minimum Detour Methods for String or Sequence Comparison.” Congressus Numerantium 61: 263–74.
———. 1989. “Minimum Detour Methods for Levenshtein Based String Classification.” submitted.
Hart, Peter E., Nils J. Nilsson, and Bertram Raphael. 1972. “Correction to ``a Formal Basis for the Heuristic Determination of Minimum Cost Paths’’.” Acm Sigart Bulletin, no. 37 (December): 28–29. https://doi.org/10.1145/1056777.1056779.
Hart, Peter, Nils Nilsson, and Bertram Raphael. 1968. “A Formal Basis for the Heuristic Determination of Minimum Cost Paths.” Ieee Transactions on Systems Science and Cybernetics 4 (2): 100–107. https://doi.org/10.1109/tssc.1968.300136.
Hitchner, L. E. 1968. “A Comparative Investigation of the Computational Efficiency of Shortest Path Algorithms,” November.
Hoel, Jeffrey H. 1976. “Some Variations of Lee’s Algorithm.” Ieee Transactions on Computers C–25 (1): 19–24. https://doi.org/10.1109/tc.1976.5009200.
Lee, C. Y. 1961. “An Algorithm for Path Connections and Its Applications.” Ieee Transactions on Electronic Computers EC-10 (3): 346–65. https://doi.org/10.1109/tec.1961.5219222.
Moore, E.F. 1959. The Shortest Path through a Maze. Bell Telephone System. Technical Publications. Monograph. Bell Telephone System.
Rubin, F. 1974. “The Lee Path Connection Algorithm.” Ieee Transactions on Computers C–23 (9): 907–14. https://doi.org/10.1109/t-c.1974.224054.
Rubin, Frank. 1976. “Correction to ``the Lee Path Connection Algorithm’’.” Ieee Transactions on Computers C–25 (2): 208. https://doi.org/10.1109/tc.1976.5009239.
Schrijver, Alexander. 2012. “On the History of the Shortest Path Problem.” Optimization Stories, January, 155–67. https://doi.org/10.4171/dms/6/19.
Spouge, John L. 1989. “Speeding up Dynamic Programming Algorithms for Finding Optimal Lattice Paths.” Siam Journal on Applied Mathematics 49 (5): 1552–66. https://doi.org/10.1137/0149094.
———. 1991. “Fast Optimal Alignment.” Bioinformatics 7 (1): 1–7. https://doi.org/10.1093/bioinformatics/7.1.1.

  1. I’m leaving out a factor \(2\) here. ↩︎