Table of Contents
Here I’ll briefly list some FM-index and related implementations around the web. Implementations seem relatively inconsistent, mostly because the FM-index is more of a ‘wrapper’ type around a given Burrows-Wheeler-transform and an occurrences list implementation. Both can be implemented in various ways. In particular occurrences should be stored using a wavelet tree for optimal compression.
- The nucleic-acid repo contains a completely unoptimised version.
- The Rust-bio crate contains a generic FM-index. It stores a sampled occurrences array, so that space is relatively small but lookups take \(O(k)\) time for sampling factor \(k\).
- SDSL-lite contains a wavelet tree and compressed suffix array implementation based on it, that provides the same functionality as an FM-index.
- There is the Quad Wavelet Tree (QWT) Rust crate (Ceregini, Kurpicz, and Venturini 2023). This uses a 4-ary tree instead of the usual binary wavelet tree, and improves latency by around a factor 2 over SDSL wavelet trees.
- Dominik Kempa has the Faster-Minuter index (Gog et al. 2019) that contains an improved wavelet tree as well.
- GEM-Cutter contain a GPU implementation of the FM-index (Chacon et al. 2015).
- There is also RopeBWT3 (Li 2024), which is basically a run-length compressed BWT with a B+ tree on top for fast queries.
A note on SDSL versions
- github:simongog/sdsl is the original, with last commit in 2013.
- github:simongog/sdsl-lite is v2, with last commit in 2019, and seems the most used currently.
- github:xxsds/sdsl-lite is v3 and seems to be actively maintained at the time of writing (Jan 2025), and is recommended by the original developers. From a quick glance, I think it’s somewhat restructured and truly a v3, not just a v2.1. However, it seems to be much less popular.
- github:vgteam/sdsl-lite is a fork of the original
sdsl-lite
, with, I think, a number of small bug fixes and some updates for recent compiler versions.
Then there are also some rust versions:
- github:sdsl-rs/sdsl-rs: Rust bindings for SDSL. Only partially done and seems to have stalled.
- github:jltsiren/simple-sds: A complete reimplementation in Rust. Seems to roughly follow the original API and naming, but only a reasonable subset. (Hence probably significantly easier to use.)