Author manuscript; available in PMC: 2021 Nov 29.
Published in final edited form as: Nat Genet. 2021 May 6;53(6):869–880. doi: 10.1038/s41588-021-00861-8

Extended Data Fig. 7. Functional noncoding sequences and SNVs associated with HbF levels in patients with SCD.

(a) The ratio of the normalized mutation burden (see Methods) in patients with SCD with high RBC HbF levels to that in patients with SCD with normal HbF levels, at the 200 genomic loci with the highest BPRSHbF. The x-axis shows the minor allele frequency (MAF) threshold used to filter variants. The y-axis shows the size of the window centered on each high-BPRSHbF locus. The number in each cell is the burden ratio for that combination of MAF threshold and window size.
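The grid in panel (a) can be illustrated with a minimal sketch. It is not the authors' pipeline: the normalization is defined in the Methods, so the sketch assumes the simplest possible one (variant count per patient), assumes the MAF filter keeps variants at or below the threshold, and uses a hypothetical function name, `burden_ratio`, with positions on a single chromosome.

```python
from bisect import bisect_left, bisect_right


def burden_ratio(loci, variants_high, variants_normal,
                 n_high, n_normal, window, maf_max):
    """Ratio of per-patient variant counts near the loci (high HbF / normal HbF).

    loci: window-center positions (hypothetical, one chromosome).
    variants_*: (position, maf) tuples, one per variant allele seen in that group.
    ASSUMPTION: burden is normalized by the number of patients in the group.
    ASSUMPTION: a variant passes the filter when maf <= maf_max.
    """
    half = window // 2

    def burden(variants, n_patients):
        positions = sorted(pos for pos, maf in variants if maf <= maf_max)
        count = 0
        for locus in loci:
            # Count variants inside [locus - half, locus + half].
            lo = bisect_left(positions, locus - half)
            hi = bisect_right(positions, locus + half)
            count += hi - lo
        return count / n_patients

    normal = burden(variants_normal, n_normal)
    if normal == 0:
        return float("nan")  # ratio undefined without variants in the reference group
    return burden(variants_high, n_high) / normal


# Toy data, invented purely for illustration.
loci = [1000, 5000]
high = [(950, 0.01), (1050, 0.02), (5100, 0.005), (3000, 0.01), (1000, 0.2)]
normal = [(1010, 0.01), (4000, 0.01), (4950, 0.3)]

# One cell per (window size, MAF threshold) pair, as in the heat map.
grid = {
    (window, maf_max): burden_ratio(loci, high, normal, 2, 4, window, maf_max)
    for window in (200, 2000)
    for maf_max in (0.05, 0.5)
}
```

Windows that overlap would count a variant more than once in this sketch; a real analysis would need to decide how to handle that case.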

(b) Precision-recall curves showing the performance of random forest models that predict HbF levels from the mutation burden within two groups of genomic loci. The green curve represents the model that includes only 18 common GWAS variants, and the red curve represents the model that includes the common GWAS variants plus 56 variants with high BPRSHbF. Dashed lines mark the precision at a recall of 75%.
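The dashed-line readout in panel (b), precision at a fixed recall, can be computed from any set of model scores. The sketch below is a generic illustration rather than the authors' code: `precision_at_recall` is a hypothetical helper, labels are assumed to be 1 for high HbF and 0 for normal HbF, and it adopts one common convention (the best precision among thresholds whose recall reaches the target).

```python
def precision_at_recall(labels, scores, target_recall=0.75):
    """Best precision among score thresholds whose recall is >= target_recall.

    labels: 1 for the positive class, 0 otherwise (assumed encoding).
    scores: model scores, higher meaning more likely positive.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    # Rank samples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    best = 0.0
    true_pos = 0
    for i, (score, label) in enumerate(ranked):
        true_pos += label
        # Evaluate only at the end of a run of tied scores, since a
        # threshold cannot separate samples that share a score.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie and true_pos / total_pos >= target_recall:
            best = max(best, true_pos / (i + 1))
    return best


# Toy scores, invented for illustration.
labels = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
p75 = precision_at_recall(labels, scores, target_recall=0.75)
```

Comparing this value between two models at the same recall is what the dashed lines make visible.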

(c) A box plot showing a pair-wise performance comparison of the two models. n = 400 random samplings. The P value was determined using a paired two-tailed t-test. The box depicts the interquartile range; the central line indicates the median, and the whiskers indicate the minimum and maximum values.
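For reference, the statistics named in panel (c) can be sketched with the Python standard library. The helper names `paired_t` and `box_summary` and the toy numbers are hypothetical. The sketch returns the t statistic and degrees of freedom only; a two-tailed P value would come from the t distribution, for example via `scipy.stats.ttest_rel`.

```python
from math import sqrt
from statistics import mean, quantiles, stdev


def paired_t(a, b):
    """t statistic and degrees of freedom for a paired t-test on a versus b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # Mean difference divided by its standard error (sample SD / sqrt(n)).
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


def box_summary(values):
    """Five numbers drawn by a box plot whose whiskers span min to max."""
    q1, med, q3 = quantiles(values, n=4)  # quartile convention: 'exclusive'
    return {"min": min(values), "q1": q1, "median": med,
            "q3": q3, "max": max(values)}


# Toy paired scores for two models over the same resamplings (invented).
model_a = [5, 7, 6, 9]
model_b = [3, 6, 4, 5]
t_stat, dof = paired_t(model_a, model_b)
summary = box_summary([1, 2, 3, 4, 5])
```

Pairing matters here because both models are scored on the same random samplings, so the test operates on per-sampling differences rather than on two independent groups.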