Past, ongoing, and future research plans
Some lists of ongoing work to keep things organised for myself.
Upcoming journal papers
- Sassy (Beeloo and Groot Koerkamp 2025): submitted to Bioinformatics
- Deacon (Constantinides, Lees, and Crook 2025): to be submitted
- Barbell (Beeloo et al. 2025): to be submitted
- Mim (Patro et al. 2025): to be submitted
- Sassy v2: to be finished and submitted (maybe WABI/Recomb-Seq?)
Upcoming conference papers
- QuadRank: SEA (Jan 23)
- QuickHeap: SEA (Jan 23)? ESA B (April)?
- Chaining: WABI (May)? Recomb-Seq (March)?
- SimdSketch: WABI (May)? Recomb-Seq (March)?
- Small-k minimizers: WABI (May)?
- k-PHF hashmap: ESA B (April)
Upcoming conference dates & deadlines
Sorted by CfP deadline.
- Recomb:
- CfP: Nov 7 2025 (full paper: Nov 14)
- May 26-29 2026, Thessaloniki, Greece
- DSB:
- CfP: Jan 9 2026
- Feb 18-19 2026, Venice, Italy
- HiTSeq:
- CfP: proceedings: Jan 20, 2026, abstract: April 9, 2026, poster: May 7, 2026
- ISMB: July 12-16, Washington DC
- SEA:
- CfP: January 23, 2026
- June 22-24 2026, Copenhagen, Denmark
- papers: QuadRank, (QuickHeap)
- Recomb-Seq:
- CfP: Proceedings: March 12 2026, Short talks / posters: ~March 2026
- May 24-25 2026, Thessaloniki, Greece
- ESA:
- CfP: ~April
- September 0-4 2026, L’Aquila, Italy
- WABI:
- CfP: ~May
- September 0-4 2026, L’Aquila, Italy
Dormant
- Minimizers:
- publish the anti-lex scheme?
- continue investigation of greedymini to develop ‘understandable’ equivalent schemes.
- Find exact optimal schemes for all \(k\equiv 1\pmod w\)
- lower bound for local schemes
- Code:
- sshash-rs (gh)
- suffix array searching (gh, post) (Bahne et al. 2019)
Long term plans
If anything here interests you, feel free to reach out for collaborations.
- engineer STPD (Becker et al. 2025): write a highly optimized implementation.
- minimizer space?
- fuzzy version by only using minimizer start positions?
- Optimal CPU/memory performance given 3D latency and cooling constraints.
- Pairwise alignment review paper based on thesis chapter 2 (this post).
- Review on approximate string matching, this post (will never happen).
- Minimizers review paper based on thesis part 2 (this post).
Abandoned
- Affine A*PA2: too much of an overhaul.
- Spaced \(k\)-mer similarity: see this post.
- Expected linear time A*PA: see a draft in this post.
- Linear memory WFA: Instead of storing furthest reaching points for all wavefronts, it is sufficient to only store critical points where paths split/merge. This should lower memory usage of WFA to close to linear, without needing BiWFA. See this post. This has similar vibes to TALCO (Walia et al. 2024).
- Local doubling: see this post.
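For context on the linear-memory WFA item above, here is a minimal Python sketch of plain unit-cost WFA that keeps the furthest-reaching point of every diagonal for every wavefront, which is the baseline whose memory the idea aimed to reduce. It does not implement the critical-point idea itself, and it is not the API of any WFA library: the function name `wfa_edit_distance` and the returned count of stored points are assumptions made purely for illustration.

```python
def wfa_edit_distance(a: str, b: str) -> tuple[int, int]:
    """Unit-cost WFA sketch.

    Returns (edit distance, number of furthest-reaching points stored).
    Diagonal k = j - i, where i indexes `a` and j indexes `b`.
    """
    n, m = len(a), len(b)

    def extend(i: int, k: int) -> int:
        # Slide along diagonal k while characters match.
        while i < n and i + k < m and a[i] == b[i + k]:
            i += 1
        return i

    # wavefronts[s][k] = furthest i reachable on diagonal k with cost s.
    wavefronts = [{0: extend(0, 0)}]
    target = m - n  # diagonal that contains the end point (n, m)

    while wavefronts[-1].get(target) != n:
        prev = wavefronts[-1]
        s = len(wavefronts)
        cur = {}
        for k in range(-s, s + 1):
            candidates = []
            if k in prev:
                candidates.append(prev[k] + 1)      # substitution
            if k + 1 in prev:
                candidates.append(prev[k + 1] + 1)  # skip a character of a
            if k - 1 in prev:
                candidates.append(prev[k - 1])      # skip a character of b
            # Drop candidates that fall outside the alignment grid.
            valid = [i for i in candidates if i <= n and i + k <= m]
            if valid:
                cur[k] = extend(max(valid), k)
        wavefronts.append(cur)

    # Keeping every wavefront is what makes traceback possible, and also
    # what makes memory grow quadratically in the distance.
    stored = sum(len(w) for w in wavefronts)
    return len(wavefronts) - 1, stored
```

Calling `wfa_edit_distance("kitten", "sitting")` returns the distance together with the number of points kept. Since wavefront \(s\) holds up to \(2s+1\) diagonals, that count grows quadratically with the distance, which is the cost that storing only split/merge points was meant to avoid.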
References
Bahne, Johannes, Nico Bertram, Marvin Böcker, Jonas Bode, Johannes Fischer, Hermann Foot, Florian Grieskamp, et al. 2019. “Sacabench: Benchmarking Suffix Array Construction.” In String Processing and Information Retrieval, 407–16. Springer International Publishing. https://doi.org/10.1007/978-3-030-32686-9_29.
Becker, Ruben, Davide Cenzato, Travis Gagie, Ragnar Groot Koerkamp, Sung-Hwan Kim, Giovanni Manzini, and Nicola Prezza. 2025. “Compressing Suffix Trees by Path Decompositions.” arXiv. https://doi.org/10.48550/ARXIV.2506.14734.
Beeloo, Rick, Ragnar Groot Koerkamp, Xiu Jia, Marian J. Broekhuizen-Stins, Lieke van IJken, Els M. Broens, Aldert Zomer, and Bas E. Dutilh. 2025. “Barbell Resolves Demultiplexing and Trimming Issues in Nanopore Data,” October. https://doi.org/10.1101/2025.10.22.683865.
Beeloo, Rick, and Ragnar Groot Koerkamp. 2025. “Sassy: Searching Short DNA Strings in the 2020s,” July. https://doi.org/10.1101/2025.07.22.666207.
Constantinides, Bede, John Lees, and Derrick W Crook. 2025. “Deacon: Fast Sequence Filtering and Contaminant Depletion,” June. https://doi.org/10.1101/2025.06.09.658732.
Patro, Rob, Siddhant Bharti, Prajwal Singhania, Rakrish Dhakal, Thomas J. Dahlstrom, and Ragnar Groot Koerkamp. 2025. “Mim: A Lightweight Auxiliary Index to Enable Fast, Parallel, Gzipped Fastq Parsing,” November. https://doi.org/10.1101/2025.11.24.690271.
Walia, Sumit, Cheng Ye, Arkid Bera, Dhruvi Lodhavia, and Yatish Turakhia. 2024. “TALCO: Tiling Genome Sequence Alignment Using Convergence of Traceback Pointers.” In 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE. https://doi.org/10.1109/hpca57654.2024.00044.