Table 3.
CpG Drive and Analysis of syn-SNV Changing CpG along the SARS-CoV-2 Genome.
Region | L | CpG | CpG Drive | SNV | CpG↓SNV Tot | CpG↓SNV /SNV (%) | CpG↓SNV /CpGSNV (%) | CpG↑SNV Tot | CpG↑SNV /SNV (%) | CpG↑SNV /CpGSNV (%) |
---|---|---|---|---|---|---|---|---|---|---|
3′UTR | 162 | 5 | −0.87 | 81 | 18 | 22 | 90 | 2 | 2 | 10 |
With counts | | | | 6,020 | 1,597 | 27 | 99 | 20 | 0.3 | 1 |
5′UTR | 211 | 13 | −1.67 | 56 | 19 | 34 | 95 | 1 | 2 | 5 |
With counts | | | | 48,806 | 47,446 | 97 | 99 | 328 | 1 | 1 |
N | 1,260 | 39 | −0.95 | 115 | 24 | 21 | 75 | 8 | 7 | 25 |
With counts | | | | 4,745 | 2,239 | 47 | 94 | 146 | 3 | 6 |
M | 669 | 20 | −1.00 | 58 | 12 | 21 | 86 | 2 | 4 | 14 |
With counts | | | | 5,508 | 245 | 5 | 45 | 302 | 6 | 55 |
ORF10 | 117 | 5 | −2.01 | 6 | 2 | 33 | 100 | 0 | 0 | 0 |
With counts | | | | 233 | 14 | 60 | 100 | 0 | 0 | 0 |
ORF7a | 366 | 7 | −0.61 | 23 | 5 | 22 | 56 | 4 | 17 | 44 |
With counts | | | | 544 | 243 | 45 | 88 | 33 | 6 | 12 |
ORF8 | 366 | 8 | −0.82 | 25 | 5 | 20 | 63 | 3 | 12 | 38 |
With counts | | | | 2,146 | 107 | 5 | 86 | 17 | 1 | 14 |
ORF3a | 828 | 17 | −0.62 | 54 | 10 | 19 | 67 | 5 | 9 | 33 |
With counts | | | | 1,530 | 244 | 16 | 87 | 36 | 2 | 13 |
ORF1a | 13,203 | 160 | 0.05 | 848 | 86 | 10 | 52 | 81 | 10 | 49 |
With counts | | | | 79,937 | 3,930 | 5 | 72 | 1,560 | 2 | 28 |
ORF1b | 8,088 | 115 | 0.003 | 432 | 47 | 11 | 48 | 52 | 12 | 53 |
With counts | | | | 33,788 | 7,342 | 22 | 84 | 1,434 | 4 | 26 |
E | 228 | 11 | −1.84 | 10 | 1 | 10 | 100 | 0 | 0 | 0 |
With counts | | | | 311 | 69 | 22 | 100 | 0 | 0 | 0 |
S | 3,822 | 29 | 0.61 | 223 | 16 | 7 | 47 | 18 | 8 | 53 |
With counts | | | | 14,540 | 571 | 4 | 44 | 729 | 5 | 56 |
ORF6 | 186 | 1 | 0.18 | 10 | 0 | 0 | 0 | 2 | 20 | 100 |
With counts | | | | 275 | 0 | 0 | 0 | 40 | 15 | 100 |
ORF7b | 132 | 1 | −0.28 | 9 | 0 | 0 | 0 | 2 | 22 | 100 |
With counts | | | | 406 | 0 | 0 | 0 | 14 | 4 | 100 |
Note. The table gives, for all the ORFs and the 5′ and 3′UTRs of the SARS-CoV-2 ancestral genome, the length of the region (L), the number of CpG motifs (CpG), the CpG drive, the number of syn-SNV (SNV), and the total numbers and percentages of syn-SNV removing a CpG motif (CpG↓) or adding one (CpG↑), with respect to the total number of syn-SNV (/SNV) or to the total number of syn-SNV affecting CpG (/CpGSNV). For the noncoding 5′ and 3′UTRs, all SNV are taken into account, with no restriction to syn-SNV, and the noncoding forces are used: once this constraint is released, the equilibrium force is −1.16 (and not −1.79 as for ORFs). UTRs and ORFs are sorted according to the density of CpG-removing SNV (CpG↓ SNV/L). The regions underlined are the most reliable for statistical analysis, as they present at least 20 syn-SNV. Numbers and percentages of SNV are given with and without taking into account SNV counts. Data are from GISAID (Elbe and Buckland-Merrett 2017); see Materials and Methods for details on data analysis (last update October 05, 2020). Ancestral genome GISAID ID: EPI_ISL_406798. SNV with <5 counts are excluded from the data. ROC: Receiver Operating Characteristic; AUROC: Area Under the Receiver Operating Characteristic.
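The percentage columns and the sorting key described in the note can be recomputed from the raw totals. The following is a minimal Python sketch, not the original analysis code. It assumes that the /CpGSNV denominator is the sum of CpG-removing and CpG-adding SNV, which is consistent with the tabulated values (for example, 18 of 20 CpG-changing SNV in the 3′UTR gives 90%). The names `RegionRow`, `pct`, and `summarize` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RegionRow:
    """Raw totals for one region (row without SNV counts)."""
    name: str
    length: int    # L, length of the region
    snv: int       # total number of (syn-)SNV in the region
    cpg_down: int  # SNV removing a CpG motif
    cpg_up: int    # SNV adding a CpG motif


def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`; 0 when the denominator is 0."""
    return 0.0 if whole == 0 else 100.0 * part / whole


def summarize(row: RegionRow) -> dict:
    """Recompute the derived columns for one region."""
    # Assumption: /CpGSNV is taken over all SNV that change a CpG.
    cpg_changing = row.cpg_down + row.cpg_up
    return {
        "down_per_snv": pct(row.cpg_down, row.snv),
        "down_per_cpgsnv": pct(row.cpg_down, cpg_changing),
        "up_per_snv": pct(row.cpg_up, row.snv),
        "up_per_cpgsnv": pct(row.cpg_up, cpg_changing),
        # Sorting key used in the table: CpG-removing SNV per unit length.
        "down_density": row.cpg_down / row.length,
    }


# Three rows copied from the table above.
rows = [
    RegionRow("S", 3822, 223, 16, 18),
    RegionRow("3'UTR", 162, 81, 18, 2),
    RegionRow("N", 1260, 115, 24, 8),
]

# Order regions by decreasing density of CpG-removing SNV.
ordered = sorted(rows, key=lambda r: summarize(r)["down_density"], reverse=True)
for r in ordered:
    s = summarize(r)
    print(r.name, round(s["down_per_snv"]), round(s["down_per_cpgsnv"]))
```

The same functions apply to the "With counts" rows if the count-weighted totals are passed in place of the unweighted ones; the region length is not needed for those rows unless a count-weighted density is wanted.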