2021 Feb 8;38(6):2428–2445. doi: 10.1093/molbev/msab036

Table 3.

CpG Drive and Analysis of syn-SNV Changing CpG along the SARS-CoV-2 Genome.

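| Region | L | CpG | CpG Drive | SNV | CpG↓SNV: Tot | CpG↓SNV: /SNV (%) | CpG↓SNV: /CpGSNV (%) | CpG↑SNV: Tot | CpG↑SNV: /SNV (%) | CpG↑SNV: /CpGSNV (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| 3′UTR | 162 | 5 | −0.87 | 81 | 18 | 22 | 90 | 2 | 2 | 10 |
| With counts | | | | 6,020 | 1,597 | 27 | 99 | 20 | 0.3 | 1 |
| 5′UTR | 211 | 13 | −1.67 | 56 | 19 | 34 | 95 | 1 | 2 | 5 |
| With counts | | | | 48,806 | 47,446 | 97 | 99 | 328 | 1 | 1 |
| N | 1,260 | 39 | −0.95 | 115 | 24 | 21 | 75 | 8 | 7 | 25 |
| With counts | | | | 4,745 | 2,239 | 47 | 94 | 146 | 3 | 6 |
| M | 669 | 20 | −1.00 | 58 | 12 | 21 | 86 | 2 | 4 | 14 |
| With counts | | | | 5,508 | 245 | 5 | 45 | 302 | 6 | 55 |
| ORF10 | 117 | 5 | −2.01 | 6 | 2 | 33 | 100 | 0 | 0 | 0 |
| With counts | | | | 233 | 14 | 60 | 100 | 0 | 0 | 0 |
| ORF7a | 366 | 7 | −0.61 | 23 | 5 | 22 | 56 | 4 | 17 | 44 |
| With counts | | | | 544 | 243 | 45 | 88 | 33 | 6 | 12 |
| ORF8 | 366 | 8 | −0.82 | 25 | 5 | 20 | 63 | 3 | 12 | 38 |
| With counts | | | | 2,146 | 107 | 5 | 86 | 17 | 1 | 14 |
| ORF3a | 828 | 17 | −0.62 | 54 | 10 | 19 | 67 | 5 | 9 | 33 |
| With counts | | | | 1,530 | 244 | 16 | 87 | 36 | 2 | 13 |
| ORF1a | 13,203 | 160 | 0.05 | 848 | 86 | 10 | 52 | 81 | 10 | 49 |
| With counts | | | | 79,937 | 3,930 | 5 | 72 | 1,560 | 2 | 28 |
| ORF1b | 8,088 | 115 | 0.003 | 432 | 47 | 11 | 48 | 52 | 12 | 53 |
| With counts | | | | 33,788 | 7,342 | 22 | 84 | 1,434 | 4 | 26 |
| E | 228 | 11 | −1.84 | 10 | 1 | 10 | 100 | 0 | 0 | 0 |
| With counts | | | | 311 | 69 | 22 | 100 | 0 | 0 | 0 |
| S | 3,822 | 29 | 0.61 | 223 | 16 | 7 | 47 | 18 | 8 | 53 |
| With counts | | | | 14,540 | 571 | 4 | 44 | 729 | 5 | 56 |
| ORF6 | 186 | 1 | 0.18 | 10 | 0 | 0 | 0 | 2 | 20 | 100 |
| With counts | | | | 275 | 0 | 0 | 0 | 40 | 15 | 100 |
| ORF7b | 132 | 1 | −0.28 | 9 | 0 | 0 | 0 | 2 | 22 | 100 |
| With counts | | | | 406 | 0 | 0 | 0 | 14 | 4 | 100 |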

Note: The table gives, for all the ORFs and the 5′ and 3′UTRs of the SARS-CoV-2 ancestral genome, the length of the region (L), the number of CpG motifs (CpG), the CpG drive (f_eq − f), the number of syn-SNV (SNV), and the total numbers and percentages of syn-SNV removing a CpG motif (CG↓) or adding one (CG↑), relative to the total number of syn-SNV (/SNV) or to the total number of syn-SNV affecting CpG (/CpGSNV). For the noncoding 5′ and 3′UTRs, all SNV are taken into account, with no restriction to syn-SNV, and the noncoding forces are used; releasing this constraint, the equilibrium force is −1.16 (not −1.79 as for ORFs). UTRs and ORFs are sorted according to the density of CpG-removing SNV (CG↓ SNV/L). The regions underlined are the most reliable for statistical analysis, as they present at least 20 syn-SNV. Numbers and percentages of SNV are given with and without taking into account SNV counts. Data are from GISAID (Elbe and Buckland-Merrett 2017); see Materials and Methods for details on data analysis (last update October 05, 2020). Ancestral genome GISAID ID: EPI_ISL_406798. SNV with <5 counts are excluded from the data. ROC: Receiver Operating Characteristic; AUROC: Area Under the Receiver Operating Characteristic.
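The derived columns described in the note can be recomputed from the raw totals. The following is a minimal sketch, not the authors' analysis code. The `Region` class, the function names, and rounding to the nearest integer are assumptions made here for illustration. The three example rows (N, ORF3a, S) use the values without SNV counts from the table.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """One table row (values without SNV counts). Field names are hypothetical."""
    name: str
    length: int    # L: length of the region
    snv: int       # SNV: number of syn-SNV in the region
    cpg_down: int  # Tot under CpG-removing SNV
    cpg_up: int    # Tot under CpG-adding SNV


def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest integer (assumed rounding rule); 0 if whole is 0."""
    return 0 if whole == 0 else round(100 * part / whole)


def summarize(r: Region) -> dict:
    """Recompute the /SNV (%) and /CpGSNV (%) columns and the sorting density."""
    cpg_snv = r.cpg_down + r.cpg_up  # all syn-SNV that change a CpG motif
    return {
        "down_per_snv": pct(r.cpg_down, r.snv),
        "down_per_cpg_snv": pct(r.cpg_down, cpg_snv),
        "up_per_snv": pct(r.cpg_up, r.snv),
        "up_per_cpg_snv": pct(r.cpg_up, cpg_snv),
        "down_density": r.cpg_down / r.length,  # CpG-removing SNV per site
    }


# Rows copied from the table (values without counts).
regions = [
    Region("S", 3822, 223, 16, 18),
    Region("N", 1260, 115, 24, 8),
    Region("ORF3a", 828, 54, 10, 5),
]

# Sort by decreasing density of CpG-removing SNV, as the note describes.
ordered = sorted(regions, key=lambda r: summarize(r)["down_density"], reverse=True)

if __name__ == "__main__":
    for r in ordered:
        print(r.name, summarize(r))
```

For these three rows the recomputed percentages agree with the table, and sorting by density gives the same relative order (N, ORF3a, S). Some other rows may differ by one percentage point under this rounding rule.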