Genome Res. 2024 May;34(5):778–783. doi: 10.1101/gr.278730.123

© 2024 Gamaarachchi et al.; Published by Cold Spring Harbor Laboratory Press

This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.

Figure 2. Comparison of experimental and simulated NA12878 signal data sets. (A–C) Frequency histograms show distributions of raw signal values (A), basecalled read lengths (B), and Phred quality scores (C) in experimental data (orange) and simulated data sets from Squigulator (red) or DeepSimulator (purple), based on the reference individual NA12878. A Guppy HAC basecalling model was used. (D,E) For the same data sets, bar charts show the relative frequencies of each possible base substitution (D), and line plots show the relative frequencies of insertions and deletions of different sizes (E). Substitution and indel errors are determined relative to the GRCh38 reference genome after alignment with minimap2. (F) Guppy basecalling accuracy (HAC model), as measured by read:reference identity score distributions, for experimental (upper) and simulated (lower) data sets. Simulated data are from Squigulator (red) or DeepSimulator with context-independent (purple) or context-dependent (blue) settings. (G) ROC curves evaluate accuracy of SNV detection with Clair3 on the same data sets (colors as above). (H) ROC curves evaluate concordance of SNVs detected with the real experimental NA12878 data set versus simulated data from Squigulator or DeepSimulator (colors as above). SUP basecalling was used to maximize accuracy of SNV detection. In the ROC curves, the left vertical axes show absolute numbers of detected SNVs, and the right vertical axes show the fraction of true positives detected (i.e., recall or sensitivity).
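
The two vertical axes of the ROC curves in (G,H) are related by a simple ratio: recall is the number of true-positive SNVs divided by the total number of SNVs in the truth set. The sketch below illustrates that relationship only; it is not the evaluation tool used by the authors or by Clair3. The function names, the `(variant_key, quality)` call format, and the choice of false-positive count as the horizontal axis are all assumptions made for illustration.

```python
from typing import Hashable, Iterable, List, Set, Tuple


def recall(true_positives: int, total_truth: int) -> float:
    """Fraction of truth-set SNVs that were detected (sensitivity)."""
    if total_truth <= 0:
        raise ValueError("total_truth must be positive")
    return true_positives / total_truth


def roc_points(
    calls: Iterable[Tuple[Hashable, float]],
    truth: Set[Hashable],
) -> List[Tuple[int, int, float]]:
    """Build ROC-style points by lowering a quality threshold one call at a time.

    calls: hypothetical (variant_key, quality) pairs, e.g. ("chr1:12345:A>G", 30.0).
    truth: set of variant keys considered correct.
    Returns (false_positives, true_positives, recall) after each call is admitted.
    """
    # Highest-quality calls are admitted first, as when sweeping a threshold down.
    ordered = sorted(calls, key=lambda call: call[1], reverse=True)
    tp = 0
    fp = 0
    points = []
    for key, _quality in ordered:
        if key in truth:
            tp += 1
        else:
            fp += 1
        # tp corresponds to the left axis, recall to the right axis.
        points.append((fp, tp, recall(tp, len(truth))))
    return points


# Toy data (invented for illustration, not taken from the paper).
example_calls = [("a", 30.0), ("b", 20.0), ("x", 10.0), ("c", 5.0)]
example_truth = {"a", "b", "c", "d"}
example_points = roc_points(example_calls, example_truth)
```

With this toy input, three of the four truth-set variants are recovered, so the final point has a true-positive count of 3 and a recall of 0.75.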