FM-index implementations

Table of Contents

Here I’ll briefly list some FM-index and related implementations around the web. Implementations seem relatively inconsistent, mostly because the FM-index is more of a ‘wrapper’ type around a given Burrows-Wheeler-transform and an occurrences list implementation. Both can be implemented in various ways. In particular occurrences should be stored using a wavelet tree for optimal compression.

  • The nucleic-acid repo contains a completely unoptimised version.
  • The Rust-bio crate contains a generic FM-index. It stores a sampled occurrences array, so that space is relatively small but lookups take \(O(k)\) time for sampling factor \(k\).
  • SDSL-lite contains a wavelet tree and compressed suffix array implementation based on it, that provides the same functionality as an FM-index.
  • There is the Quad Wavelet Tree (QWT) Rust crate (Ceregini, Kurpicz, and Venturini 2023). This uses a 4-ary tree instead of the usual binary wavelet tree, and improves latency by around a factor 2 over SDSL wavelet trees.
  • Dominik Kempa has the Faster-Minuter index (Gog et al. 2019) that contains an improved wavelet tree as well.
  • GEM-Cutter contain a GPU implementation of the FM-index (Chacon et al. 2015).
  • There is also RopeBWT3 (Li 2024), which is basically a run-length compressed BWT with a B+ tree on top for fast queries.

A note on SDSL versions

  • github:simongog/sdsl is the original, with last commit in 2013.
  • github:simongog/sdsl-lite is v2, with last commit in 2019, and seems the most used currently.
  • github:xxsds/sdsl-lite is v3 and seems to be actively maintained at the time of writing (Jan 2025), and is recommended by the original developers. From a quick glance, I think it’s somewhat restructured and truly a v3, not just a v2.1. However, it seems to be much less popular.
  • github:vgteam/sdsl-lite is a fork of the original sdsl-lite, with, I think, a number of small bug fixes and some updates for recent compiler versions.

Then there are also some rust versions:

  • github:sdsl-rs/sdsl-rs: Rust bindings for SDSL. Only partially done and seems to have stalled.
  • github:jltsiren/simple-sds: A complete reimplementation in Rust. Seems to roughly follow the original API and naming, but only a reasonable subset. (Hence probably significantly easier to use.)