FM-index implementations

Here I’ll briefly list some FM-index and related implementations around the web. Implementations seem relatively inconsistent, mostly because the FM-index is more of a ‘wrapper’ type around a given Burrows-Wheeler-transform and an occurrences list. Both can be implemented in various ways. In particular occurrences should be stored using a wavelet tree for optimal compressing.

  • The nucleic-acid repo contains a completely unoptimised version.
  • The Rust-bio crate contains a generic FM-index. It stores a sampled occurrences array, so that space is relatively small but lookups take \(O(k)\) time for sampling factor \(k\).
  • SDSL contains a wavelet tree and compressed suffix array implementation based on it, that provides the same functionality as an FM-index.
  • There is the Quad Wavelet Tree (QWT) Rust crate (Ceregini, Kurpicz, and Venturini 2023). This uses a 4-ary tree instead of the usual binary wavelet tree, and improves latency by around a factor 2 over SDSL wavelet trees.
  • Dominik Kempa has the Faster-Minuter index (Gog et al. 2019) that contains an improved wavelet tree as well.
  • GEM-Cutter contain a GPU implementation of the FM-index (Chacon et al. 2015).
  • There is also RopeBWT3 (Li 2024), which is basically a run-length compressed BWT with a B+ tree on top for fast queries.