Types of tigs

1 De Bruijn graph

Consider an edge-centric De Bruijn graph, where each edge corresponds to a k-mer, and nodes are the \(k-1\) overlaps between adjacent k-mers. In the figures, all edges are directed towards the right.

2 k-mers

The goal is now to store all edges / k-mers of the graph efficiently. A spectrum preserving string set (SPSS) is a set of strings whose k-mers are the k-mers of the input graph, that does not contain duplicate k-mers (Rahman and Medvedev 2020).

The most simple solution is to just store all k-mers individually.

3 Unitigs

The non-branching paths in a the graph. These can be stored as compressed strings, but are shown as k-mer-by-k-mer paths in the graph.

4 Simplitigs

Adjacent unitigs overlap in \(k-1\) characters, so it’s more efficient to store them adjacently (Břinda, Baym, and Kucherov 2021). One way to construct simplitigs is by simply greedily extending paths, and the resulting strings are called greedy simplitigs.

It turns out that an optimal decomposition into simplitigs can be found in linear time by considering an Eulerian circuit. The resulting optimal simplitigs are called eulertigs (Schmidt and Alanko 2022).

5 Matchtigs

We can achieve more compression by joining unitigs ‘at a distance’, and duplicating some in-between k-mers (Schmidt et al. 2021).

6 Masked superstrings

We can achieve even more compression by introducing some new k-mers during the overlapping process, and then mask these out (shown as dashed) (Sladký, Veselý, and Břinda 2023).

References

Břinda, Karel, Michael Baym, and Gregory Kucherov. 2021. “Simplitigs as an Efficient and Scalable Representation of de Bruijn Graphs.” Genome Biology 22 (1). https://doi.org/10.1186/s13059-021-02297-z.
Rahman, Amatur, and Paul Medvedev. 2020. “Representation of \$$k$\$-Mer Sets Using Spectrum-Preserving String Sets.” In Research in Computational Molecular Biology, 152–68. Springer International Publishing. https://doi.org/10.1007/978-3-030-45257-5_10.
Schmidt, Sebastian, Shahbaz Khan, Jarno Alanko, Giulio E. Pibiri, and Alexandru I. Tomescu. 2021. “Matchtigs: Minimum Plain Text Representation of Kmer Sets,” December. https://doi.org/10.1101/2021.12.15.472871.
Schmidt, Sebastian, and Jarno N. Alanko. 2022. “Eulertigs: Minimum Plain Text Representation of K-Mer Sets without Repetitions in Linear Time.” Schloss Dagstuhl – Leibniz-Zentrum für Informatik. https://doi.org/10.4230/LIPICS.WABI.2022.2.
Sladký, Ondřej, Pavel Veselý, and Karel Břinda. 2023. “Masked Superstrings as a Unified Framework for Textualk-Mer Set Representations,” February. https://doi.org/10.1101/2023.02.01.526717.