[Preprint]. 2024 Aug 30:2024.08.29.610342. [Version 1] doi: 10.1101/2024.08.29.610342

Figure 4. Diverse training data expands the representation space, thus making the basecaller generalizable to novel modifications.


(A) Performance of individually and jointly trained basecallers on ac4C reads, visualized as a genome-viewer graph showing per-nucleotide CIGAR fractions. All denotes the basecaller jointly trained on all oligo types except ac4C; other acronyms denote individually trained basecallers. (B, C) For individually trained (B) and jointly trained (C) basecallers, read fragments mapped to the boxed region were first converted to representation vectors with the different basecaller encoders and then visualized in a UMAP plot. Train denotes the reads used to train the corresponding basecaller. (D) Spatial distributions of the different oligo types in the UMAP space shown in (C). The black-to-green and red palettes denote ac4C and training reads, respectively.