Created: 2025-04-23 Wed 15:27
CABAACBDACBDCABACABAACBDACBDCABACABAACBDXACBDXCABACABAACBDACBDXCAB 🤔
CABAACBDX rotations:
CABAACBD........
.ABAACBDX.......
..BAACBDXC......
...AACBDXCA.....
....ACBDXCAB.... <-- the A is not here
.....CBDXCABA...
......BDXCABAA..
.......DXCABAAC.
........XCABAACB

In the \(w+1\) rotations, we need at least 2 samples.
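The rotations above can be regenerated mechanically. A minimal sketch, assuming each rotation is a length-8 window read cyclically from CABAACBDX; `cyclic_windows` is a hypothetical helper name, not something from these notes:

```python
def cyclic_windows(s, length):
    """Return the len(s) windows of `length` characters read cyclically from s.

    Assumes length <= len(s), so doubling the string covers every wrap-around.
    """
    doubled = s + s
    return [doubled[i:i + length] for i in range(len(s))]


windows = cyclic_windows("CABAACBDX", 8)

# Print the staircase layout: window i is shifted i columns to the right.
for i, window in enumerate(windows):
    print("." * i + window + "." * (len(windows) - 1 - i))
```

The padding width is chosen so that every printed row has the same length, which keeps the columns aligned.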
\[\newcommand{\order}{\mathcal{O}}\]
CABCA
CABCAC.....
.ABCACC....
..BCACCX...
...CACCXY..
....ACCXYZ.
.....CCXYZX
CABCACCXYZX
Density of minimizer scheme is \(\geq 1/\sigma^k\):
sample exactly every AAA k-mer, and nothing else.
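One way to see where the bound comes from: the smallest k-mer in the order is sampled wherever it occurs, because the window that starts at that occurrence has it as its leftmost minimum. A minimal sketch, assuming the plain lexicographic order with leftmost tie-breaking; `minimizer_positions` is a hypothetical helper, and the example string is reused from the rotations above:

```python
def minimizer_positions(text, k, w):
    """Positions sampled by a lexicographic minimizer scheme.

    In every window of w consecutive k-mers, the leftmost smallest k-mer
    is sampled. Assumption: plain lexicographic order on k-mers.
    """
    num_kmers = len(text) - k + 1
    sampled = set()
    for start in range(num_kmers - w + 1):
        window = range(start, start + w)
        # Tie-break on the position so the leftmost minimum wins.
        sampled.add(min(window, key=lambda i: (text[i:i + k], i)))
    return sorted(sampled)


positions = minimizer_positions("CABAACBDX", k=2, w=3)
```

Here the smallest 2-mer AA occurs at position 3 and is among the sampled positions. Occurrences within the last w-1 k-mers of a linear text have no window starting at them, so the argument applies to cyclic or long texts.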
EADCAE......
.ADCAEB.....
..DCAEBE....
...CAEBEC...
....AEBECD..
.....EBECDC.
......BECDCD

AAAABCD...
.AAABCDE..
..AABCDEF.
...ABCDEFG

AABACD..
.ABACDA.
..BACDAE
CABA: is ABA or A smaller?
ABA smaller for stability.
AB is the smallest unique substring.

AABACD..
.ABACDA.
..BACDAE
AAAA is BAD:
ABB order:
A followed by many non-A is smallest: ABBBBBBBBB
Anti-lexicographic order:
A small, followed by largest possible suffix: AZZZZZ is minimal
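Both orders can be written as sort keys. A minimal sketch, under my reading of the two descriptions above: in the ABB order only "A versus non-A" matters, with A smallest in the first position and non-A smallest afterwards; in the anti-lexicographic order the first character is compared normally and the rest in reverse. `abb_key` and `anti_lex_key` are hypothetical names:

```python
def abb_key(kmer):
    """Sort key for the ABB order (assumed reading of the note above).

    First character: A is smaller than non-A.
    Remaining characters: non-A is smaller than A.
    """
    first = 0 if kmer[0] == "A" else 1
    rest = tuple(1 if c == "A" else 0 for c in kmer[1:])
    return (first, rest)


def anti_lex_key(kmer):
    """Sort key for the anti-lexicographic order (assumed reading).

    First character: normal order. Remaining characters: reversed order,
    so the largest possible suffix sorts first.
    """
    return (ord(kmer[0]), tuple(-ord(c) for c in kmer[1:]))


smallest_abb = min(["AAB", "ABB", "BAA", "ABA"], key=abb_key)
smallest_anti = min(["AAZ", "AZZ", "AZA", "BZZ"], key=anti_lex_key)
```

With these keys, ABB is the minimum of the first list and AZZ the minimum of the second, matching the ABBBBBBBBB and AZZZZZ examples.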
0010101 cycle:
001010......
.010101.....
..101010....
...010100...
....101001..
.....010010.
......100101
01010 sus is not overlap free
AAA is not overlap free
Goal: find two non-overlapping substrings.
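The overlap-free condition on the strings 01010 and AAA above can be checked directly. A minimal sketch, assuming "overlap free" means that no proper non-empty prefix of the string equals a suffix, so two occurrences can never overlap; `is_overlap_free` is a hypothetical name:

```python
def is_overlap_free(s):
    """True if no proper non-empty prefix of s is also a suffix of s.

    Assumption: this is the meaning of "overlap free" in these notes.
    """
    return not any(s[:n] == s[-n:] for n in range(1, len(s)))


checks = {s: is_overlap_free(s) for s in ["01010", "AAA", "AB", "ABB", "AZZZZZ"]}
```

Under this definition 01010 fails because of the shared prefix and suffix 010, and AAA fails because of A, while AB, ABB, and AZZZZZ pass.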
Goal:
For every \(w+1\) window, find two non-overlapping small strings.
011...11, search 00...0011...11