Sassy 1 & 2: Fuzzy searching DNA using SIMD

Previously Link to heading

Previously Link to heading

Long-read global alignment is just not a problem people have… Link to heading

… Sassy: 100bp semi-global alignment Link to heading

Takeaway 1: Math yay, Bio nay Link to heading

Takeaway 2: Do simple stuff fast Link to heading

Approximate String Matching Link to heading

  • Find pattern \(P\) in text \(T\).
    • \(m := |P|\), \(n := |T|\)
  • Allow up to \(k\) unit-cost errors.
  • Find all matches.

 

  • IUPAC support: Handle ACTG, N, YR...

 

  • Not semi-global alignment
  • Not \(\Theta(nm/w)\)
  • Not new: Lots and LOTS of old literature

Intricacies of ASM: What is a “match”? Link to heading

  • The end position?
  • A start and end position?
  • An alignment?

Intricacies of ASM: Which “matches” to report? Link to heading

  • The end position?

    • One endpoint? (Semi-global)
    • All endpoints? (Navarro'01)
    • Local minima! (Sassy)
  • A start and end position?

    • One pair per endpoint?
    • All pairs?
  • An alignment?

    • All alignments?
    • One per endpoint?
    • But which one??????
    • In-progress PR by Fulcrum Genomics:
      iterate over all “reasonable” alignments

ASM is not reverse-complement invariant! Link to heading

Overhang: when \(P\) extends beyond \(T\) Link to heading

ASM in \(O(\lceil n/w\rceil k)\) Link to heading

  • Myers’ bitpacking on \(w=64\)-bit blocks.
  • “Early break” when all states in block have cost \(>k\), after \(\approx 2k\) rows.
  • Horizontal tiling because \(k\ll w\) makes \(O(n \lceil k/w\rceil)\) inefficient.

SIMD: Chunking the text Link to heading

  • Split the text in 4 chunks.
  • Process 1 chunk per SIMD-lane.
    • The blue blocks are all evaluated at the same time!

Sassy 1: search long text at \({>}1\) Gbp/s Link to heading

  • Params:
    • \(20 \leq |P| \leq 1000\), \(n=10^5\)
    • 1 thread, fwd search only
  • Results:
    • k=3: 1.1 Gbp/s
    • k=20: 600 Mbp/s
    • 10\(\times\) Edlib, 128\(\times\) parasail

Applications:

  • Crispr off-target searching
    • Many short patterns, fixed reference → Columba (Renders+ 2025)
  • Barcode detection for demultiplexing
    • Many short patterns, reads → Batching!

CLI: sassy grep -p <pattern> -k 3 <hg>.fa Link to heading

Sassy 2: batching \(\gg\) chunking Link to heading

  • Text chunking requires gathering and transposing
  • Take pattern suffix of len \(w’\in\{16,32\}\)
    • Random DNA has edit distance \(\approx 45\%\).
    • \(w’>2k+\varepsilon\) ensures few spurious matches.
  • One suffix per SIMD lane.
  • Post-process if suffix matches with cost \(\leq k\).

Sassy 2: search at 8 Gbp/s Link to heading

  • A: 128 patterns of len 23, 1 thread, \(k\in \{0,3,4\}\). B: \(n=10^5\), \(k=3\).

Sassy 2 Application: Crispr off-target & Barbell Link to heading

  • Search a 23bp guide RNA in a human genome with \(k=3\) in 30ms or amortized >100 Gbp/s on 16 threads.
    • 3.7\(×\) Sassy1, 35\(×\) Edlib
  • Scan Nanopore reads for 96 ONT rapid barcodes with \(k=3\) at 1.2 Gbp/s!
    • 4.6\(×\) Sassy1, 45\(×\) Edlib
  • Barbell (Beeloo+) is a fast & accurate demultiplexer using Sassy2.
    • Poster @ RECOMB tomorrow evening!