Created: 2025-03-05 Wed 02:08
CABAACBD
ACBDCABA
CAB[A]ACBD
ACBDCAB[A]
CABAACBD[X]
ACBDXCAB[A]
CAB[A]ACBD
ACBDXCAB
🤔
CABAACBDX
rotations:
CAB[A]ACBD........
.AB[A]ACBDX.......
..B[A]ACBDXC......
...[A]ACBDXCA.....
....ACBDXCAB....  <- the A is not here
.....CBDXCAB[A]...
......BDXCAB[A]A..
.......DXCAB[A]AC.
........XCAB[A]ACB
In the w+1 rotations, we need at least 2 samples.
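A quick brute-force check of this, as a sketch only. It assumes k = 1, so each of the w+1 rotations of a cycle of length w+1 is a window of w characters. Every cyclic position is then missing from exactly one window, so a single sampled position cannot cover all rotations. The names `cyclic_windows` and `min_samples` are made up for illustration.

```python
from itertools import combinations


def cyclic_windows(n, w):
    """Window i covers cyclic positions i, i+1, ..., i+w-1 (mod n)."""
    return [frozenset((i + j) % n for j in range(w)) for i in range(n)]


def min_samples(n, w):
    """Smallest number of cyclic positions that hits every window (brute force)."""
    windows = cyclic_windows(n, w)
    for size in range(1, n + 1):
        for cand in combinations(range(n), size):
            chosen = set(cand)
            if all(win & chosen for win in windows):
                return size
    return n  # not reached for 1 <= w <= n


w = 8  # the CABAACBDX cycle has length w + 1 = 9
print(min_samples(w + 1, w))  # 2
```

For the CABAACBDX cycle this matches the rotation where the sampled A falls outside the window.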
C[A]BCA

C[AB]CAC.....
.[AB]CACC....
..BC[AC]CX...
...C[AC]CXY..
....[AC]CXYZ.
.....[CC]XYZX

C[AB]C[ACC]XYZX
Density of minimizer scheme is ≥ 1/σ^k:
sample exactly every AAA k-mer, and nothing else.
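One way to read the bound, hedged: the smallest k-mer (AAA for k = 3) is the minimum of the window that starts at it, so, assuming ties go to the leftmost occurrence, it gets sampled wherever it occurs. In a uniform random string a fixed k-mer sits at a 1/σ^k fraction of positions. The sketch below checks that fraction exactly by enumerating all strings of a small length; `fraction_equal` and its parameters are illustrative names.

```python
from fractions import Fraction
from itertools import product


def fraction_equal(alphabet, k, n, target):
    """Exact fraction of k-mer positions equal to `target`,
    taken over all strings of length n over `alphabet`."""
    total = 0
    hits = 0
    for chars in product(alphabet, repeat=n):
        s = "".join(chars)
        for i in range(n - k + 1):
            total += 1
            hits += s[i:i + k] == target
    return Fraction(hits, total)


print(fraction_equal("AB", 2, 5, "AA"))   # 1/4  = 1/sigma^k with sigma=2, k=2
print(fraction_equal("ABC", 2, 4, "AA"))  # 1/9  = 1/sigma^k with sigma=3, k=2
```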
E[A]DCAE......
.[A]DCAEB.....
..DC[A]EBE....
...C[A]EBEC...
....[A]EBECD..
.....E[B]ECDC.
......[B]ECDCD

[A]AAABCD...
.[A]AABCDE..
..[A]ABCDEF.
...[A]BCDEFG

[A]ABACD..
.ABACD[A].
..B[A]CDAE
CABA: is ABA or A smaller?
ABA smaller for stability.
AB is the smallest unique substring.
[AA]BACD..
.[AB]ACDA.
..B[AC]DAE
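A brute-force sketch of the smallest unique substring, reading it as the lexicographically smallest substring that occurs exactly once in the window. That reading is an assumption, but it reproduces AB for CABA and the picks AA, AB, AC for the windows AABACD, ABACDA, BACDAE. The function name is made up for illustration, and the cubic enumeration is only meant for short windows.

```python
def smallest_unique_substring(window):
    """Lexicographically smallest substring occurring exactly once in `window`."""
    counts = {}
    n = len(window)
    for i in range(n):
        for j in range(i + 1, n + 1):
            sub = window[i:j]
            # Counting by start position also counts overlapping occurrences.
            counts[sub] = counts.get(sub, 0) + 1
    # The whole window always occurs exactly once, so this is never empty.
    return min(sub for sub, c in counts.items() if c == 1)


for win in ["CABA", "AABACD", "ABACDA", "BACDAE", "AAAA"]:
    print(win, smallest_unique_substring(win))
```

On AAAA the only unique substring is the whole window.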
AAAA is BAD:
ABB order: A followed by many non-A is smallest: ABBBBBBBBB
Anti-lexicographic order: A small, followed by largest possible suffix: AZZZZZ is minimal
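Both orders can be sketched as sort keys. This is an assumed formalization of the two lines above, not a fixed definition: the first character is compared normally, and the remaining characters are either collapsed to "A vs non-A" (ABB order) or compared in reverse (anti-lexicographic order). `abb_key` and `anti_lex_key` are illustrative names.

```python
def abb_key(kmer, smallest="A"):
    """ABB order: first character as usual; after that, any character
    other than `smallest` sorts before `smallest`."""
    return (ord(kmer[0]), tuple(1 if c == smallest else 0 for c in kmer[1:]))


def anti_lex_key(kmer):
    """Anti-lexicographic order: first character as usual,
    remaining characters in reversed alphabet order."""
    return (ord(kmer[0]), tuple(-ord(c) for c in kmer[1:]))


kmers = ["AAAA", "ABBB", "AZZA", "AZZZ", "BAAA"]
print(min(kmers, key=anti_lex_key))      # AZZZ
print(sorted(kmers, key=anti_lex_key))   # AAAA ends up last among the A... k-mers
print(abb_key("ABBB") < abb_key("AAAA"))  # True
```

Under the ABB key, ABBB and AZZZ compare equal, since all non-A characters are treated alike; a tie-break would be needed on top.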
0010101 cycle:
[00]1010......
.[01010]1.....
..1[0101]0....
...0101[00]...
....101[00]1..
.....01[00]10.
......1[00]101
01010 sus is not overlap free
AAA is not overlap free
Goal: find two non-overlapping substrings.
Goal: For every w+1 window, find two non-overlapping small strings.
011...11, search 00...0011...11
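A small check for overlap-freeness, assuming "overlap free" here means that a string cannot overlap a shifted copy of itself, i.e. no proper prefix equals a suffix. That interpretation is a guess from context. Under it, 01010 and AAA fail, while strings of the form 011...11 and 00...0011...11 pass. The function names are illustrative.

```python
def self_overlaps(s):
    """Lengths b with 0 < b < len(s) such that the prefix of length b
    equals the suffix of length b (the ways s can overlap itself)."""
    return [b for b in range(1, len(s)) if s[:b] == s[-b:]]


def is_overlap_free(s):
    """True if two occurrences of s can never overlap."""
    return not self_overlaps(s)


for s in ["01010", "AAA", "0111", "00001111"]:
    print(s, self_overlaps(s), is_overlap_free(s))
```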