Table of Contents
Here I’ll list some variants of the FM-index and r-index, and techniques used in making faster indices.
This post was written with a lot of input by Simon Gene Gottlieb, as well as discussions with Rob Patro.
Applications of FM-index Link to heading
- approximate pattern matching of short (150bp short reads, 23bp primers) patterns with some number of errors
- matching statistics
- MEM
- SMEM
- Pseudo matching length (PML, in Movi and Movi2)
- not yet implemented in ropebwt
Data structure variants Link to heading
- wavelet tree (Grossi, Gupta, and Vitter 2003)
- multi-ary wavelet trees (Bowe 2010)
- wavelet matrix (Claude, Navarro, and Ordóñez 2015)
- n-step (Chacón et al. 2013)
- only useful for exact applications?
- compressed multiple steps: (Langarita et al. 2022)
- b-move (Depuydt et al. 2024)
- run-block BWT (Song and Langmead 2024)
r-index variants:
- r-index (Gagie, Navarro, and Prezza 2020)
- spumoni (Ahmed et al. 2021)
- spumoni2 (Ahmed et al. 2023)
- movi (Zakeri et al. 2024)
- movi2 (Zakeri et al. 2025)
- ropebwt (Li 2024)
- flattened bitvectors (Gottlieb and Reinert 2025b)
- pfp-fm (Hong et al. 2024, 2023)
New idea for large (kmer/minimizer/pfp) alphabets Link to heading
- Elias-Fano coded occurrence list per character. 2bits/token overhead for uniform random text.
Search schemes Link to heading
Approximate matching against an FM-index is slow if you start at the start. Indeed, seed-chain-extend methods start by seeking exact $k$-mer matches. Search schemes extend that idea to an exact setting. In the most basic form: a length-\(m\) pattern with \(k\) errors contains at least one exact match of a substring of length \(\lfloor m/k\rfloor\). Thus, we can first search those, and then extends left and right using a bidirectional index.
Papers in this line of work:
- Columba 1.1 (Renders, Depuydt, and Fostier 2022)
- hato: automated search schemes: (Renders et al. 2024)
- Columba (Renders et al. 2025)
- Sahara (Gottlieb and Reinert 2025a): weighted model to better predict cost
Open questions [to me]? Link to heading
- Start at any point, instead of exactly at ‘block’ boundaries?
- Can we dynamically go either left or right, choosing the one resulting in less branching? Or the one that gives the largest reduction in number of matches?
References Link to heading
Ahmed, Omar Y., Massimiliano Rossi, Travis Gagie, Christina Boucher, and Ben Langmead. 2023. “Spumoni 2: Improved Classification Using a Pangenome Index of Minimizer Digests.” Genome Biology 24 (1). https://doi.org/10.1186/s13059-023-02958-1.
Ahmed, Omar, Massimiliano Rossi, Sam Kovaka, Michael C. Schatz, Travis Gagie, Christina Boucher, and Ben Langmead. 2021. “Pan-Genomic Matching Statistics for Targeted Nanopore Sequencing.” Iscience 24 (6): 102696. https://doi.org/10.1016/j.isci.2021.102696.
Bowe, Alex. 2010. “mutiary Wavelet Trees in Practice.” School of Computer Science and Information Technology RMIT University, Melbourne, Australia. https://raw.githubusercontent.com/alexbowe/wavelet-paper/thesis/thesis.pdf.
Chacón, Alejandro, Juan Carlos Moure, Antonio Espinosa, and Porfidio Hernández. 2013. “N-Step Fm-Index for Faster Pattern Matching.” Procedia Computer Science 18: 70–79. https://doi.org/10.1016/j.procs.2013.05.170.
Claude, Francisco, Gonzalo Navarro, and Alberto Ordóñez. 2015. “The Wavelet Matrix: An Efficient Wavelet Tree for Large Alphabets.” Information Systems 47 (January): 15–32. https://doi.org/10.1016/j.is.2014.06.002.
Depuydt, Lore, Luca Renders, Simon Van de Vyver, Lennart Veys, Travis Gagie, and Jan Fostier. 2024. “B-Move: Faster Bidirectional Character Extensions in a Run-Length Compressed Index.” In Lipics, Volume 312, Wabi 2024, 312:10:1–10:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. https://doi.org/10.4230/LIPICS.WABI.2024.10.
Gagie, Travis, Gonzalo Navarro, and Nicola Prezza. 2020. “Fully Functional Suffix Trees and Optimal Text Searching in Bwt-Runs Bounded Space.” Journal of the Acm 67 (1): 1–54. https://doi.org/10.1145/3375890.
Gottlieb, Simon Gene, and Knut Reinert. 2025a. “Search Schemes for Approximate String Matching.” Nar Genomics and Bioinformatics 7 (1). https://doi.org/10.1093/nargab/lqaf025.
———. 2025b. “Engineering Rank Queries on Bit Vectors and Strings.” Algorithms for Molecular Biology 20 (1). https://doi.org/10.1186/s13015-025-00291-9.
Grossi, Roberto, Ankur Gupta, and Jeffrey Scott Vitter. 2003. “High-Order Entropy-Compressed Text Indexes.” In Proceedings of the Fourteenth Annual Acm-Siam Symposium on Discrete Algorithms, 841–50. Soda ’03. Baltimore, Maryland: Society for Industrial and Applied Mathematics.
Hong, Aaron, Marco Oliva, Dominik Köppl, Hideo Bannai, Christina Boucher, and Travis Gagie. 2023. “Acceleration of Fm-Index Queries through Prefix-Free Parsing.” In Lipics, Volume 273, Wabi 2023, 273:13:1–13:16. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. https://doi.org/10.4230/LIPICS.WABI.2023.13.
———. 2024. “Pfp-Fm: An Accelerated Fm-Index.” Algorithms for Molecular Biology 19 (1). https://doi.org/10.1186/s13015-024-00260-8.
Langarita, Ruben, Adria Armejach, Javier Setoain, Pablo Ibanez-Marin, Jesus Alastruey-Benede, and Miquel Moreto. 2022. “Compressed Sparse Fm-Index: Fast Sequence Alignment Using Large K-Steps.” Ieee/Acm Transactions on Computational Biology and Bioinformatics 19 (1): 355–68. https://doi.org/10.1109/tcbb.2020.3000253.
Li, Heng. 2024. “Bwt Construction and Search at the Terabase Scale.” Edited by Lenore Cowen. Bioinformatics 40 (12). https://doi.org/10.1093/bioinformatics/btae717.
Renders, Luca, Lore Depuydt, Travis Gagie, and Jan Fostier. 2025. “Columba: Fast Approximate Pattern Matching with Optimized Search Schemes.” Edited by Peter Robinson. Bioinformatics 41 (12). https://doi.org/10.1093/bioinformatics/btaf652.
Renders, Luca, Lore Depuydt, Sven Rahmann, and Jan Fostier. 2024. “Automated Design of Efficient Search Schemes for Lossless Approximate Pattern Matching.” In Research in Computational Molecular Biology, 164–84. Springer Nature Switzerland. https://doi.org/10.1007/978-1-0716-3989-4_11.
Renders, Luca, Lore Depuydt, and Jan Fostier. 2022. “Approximate Pattern Matching Using Search Schemes and in-Text Verification.” In Bioinformatics and Biomedical Engineering, 419–35. Springer International Publishing. https://doi.org/10.1007/978-3-031-07802-6_36.
Song, Li, and Ben Langmead. 2024. “Centrifuger: Lossless Compression of Microbial Genomes for Efficient and Accurate Metagenomic Sequence Classification.” Genome Biology 25 (1). https://doi.org/10.1186/s13059-024-03244-4.
Zakeri, Mohsen, Nathaniel K. Brown, Omar Y. Ahmed, Travis Gagie, and Ben Langmead. 2024. “Movi: A Fast and Cache-Efficient Full-Text Pangenome Index.” Iscience 27 (12): 111464. https://doi.org/10.1016/j.isci.2024.111464.
Zakeri, Mohsen, Nathaniel K. Brown, Travis Gagie, and Ben Langmead. 2025. “Movi 2: Fast and Space-Efficient Queries on Pangenomes,” October. https://doi.org/10.1101/2025.10.16.682873.