2021 Feb 8;38(6):2428–2445. doi: 10.1093/molbev/msab036

Table 4.

Analysis of Nucleotidic Motifs Preceding CpG in the N ORF and Their ZAS.

| Motif | n | Position of CpG | CpG↓ syn-SNV (with counts) | ZAS |
|---|---|---|---|---|
| CAUUGGCCG | 4 | 905 | 509 | 3.03 |
| CGGAAUGUCG | 5 | 953 | 607 | 2.19 |
| CAUAUUGACG | 5 | 1,074 | 13 | 2.04 |
| CGCAGUGGGGCG | 7 | 104 | 54 | 8.48 |
| CUAACAAAGACG | 7 | 384 | 385 | 8.33 |
| CUGGCAAUGGCG | 7 | 642 | 9 | 8.33 |
| CGAGGACAAGGCG | 8 | 213 | 10 | 1.71 |
| CCCCGCAUUACG | | 47 | 35 | 0.30 |
| AAUAAUACUGCG | | 149 | 5 | 0.15 |
| CUUGGUUCACCG | | 162 | 17 | 0.15 |
| AUGCUGCAAUCG | | 471 | 34 | 0.15 |
| AGAAGGGAGGCG | | 534 | 6 | 0.15 |
| CACAAGCUUUCG | | 822 | 194 | 0.15 |
| UUGCCCCCAGCG | | 930 | 15 | 0.15 |
| AGCGCUUCAGCG | | 938 | 21 | 0.30 |
| CAGCGUUCUUCG | | 945 | 45 | 0.30 |
| GUCACACCUUCG | | 980 | 104 | 0.15 |
| CCUUCGGGAACG | | 986 | 55 | 0.30 |
| CAAGCCUUACCG | | 1,148 | 121 | 0.15 |
| CGGCAGACG | 4 | 829 | 0 | 3.18 |
| CUACCAGACG | 5 | 277 | 0 | 2.04 |
| CACGUAGUCG | 5 | 571 | 0 | 2.19 |
| CAAAACAACGUCG | 8 | 121 | 0 | 1.71 |
| CGUGGUGGUGACG | 8 | 294 | 0 | 1.71 |

Note: The top seven lines show subsequences of the N ORF (of the Wuhan ancestral strain, GISAID ID: EPI_ISL_406798) of the type C(n_x)GxCG, where the spacer n_x consists of n = 4, 5, 7, or 8 nucleotides, and for which the CpG dinucleotide was lost in one or more of the syn-SNVs. These motifs were shown to be binding patterns for the ZAP protein in Luo et al. (2020); the dissociation constants were measured for spacers made of repeated A, with values (in μM) Kd(4) = 0.33 ± 0.05, Kd(5) = 0.49 ± 0.10, Kd(7) = 0.12 ± 0.04, and Kd(8) = 0.64 ± 0.14 (Luo et al. 2020). The next 12 lines show the other CpGs lost through mutations, together with their ten preceding nucleotides; these do not correspond to motifs tested in Luo et al. (2020). The last five lines show other subsequences in the N protein that correspond to ZAP-binding motifs (Luo et al. 2020), but for which no loss of CpG is observed in the sequence data. The ZAS column gives the score associated with the subsequence considered, computed from the above dissociation constants (see Materials and Methods for technical details). Data are from GISAID (Elbe and Buckland-Merrett 2017); see Materials and Methods for details on the data analysis (last update October 05, 2020).
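
The motif pattern and the ZAS column can be checked with a short script. The Python sketch below is illustrative only. It first tests whether a subsequence has the form C, spacer of n nucleotides, G, one nucleotide, CG, as described in the note. It then applies a scoring rule inferred from the tabulated values: the inverse of the quoted Kd for a matching motif, plus 0.15 for each additional CpG in the subsequence. This rule is an assumption that happens to agree with the ZAS column; the actual definition is the one given in Materials and Methods. The names `KD_UM`, `BASELINE`, `spacer_length`, and `inferred_zas` are hypothetical.

```python
import re

# Dissociation constants (in μM) quoted in the table note (Luo et al. 2020),
# keyed by spacer length n.
KD_UM = {4: 0.33, 5: 0.49, 7: 0.12, 8: 0.64}

# ASSUMPTION: contribution of a CpG that is not the terminal CpG of a tested
# motif, read off the 0.15 entries of the ZAS column (not stated in the note).
BASELINE = 0.15


def spacer_length(seq):
    """Return n if seq is C + n-nucleotide spacer + G + x + CG with a tested n, else None."""
    match = re.fullmatch(r"C([ACGU]+)G[ACGU]CG", seq)
    if match and len(match.group(1)) in KD_UM:
        return len(match.group(1))
    return None


def inferred_zas(seq):
    """Score inferred from the table (an assumption, not the published formula)."""
    n = spacer_length(seq)
    n_cpg = seq.count("CG")
    if n is None:
        # No tested motif: every CpG contributes the baseline value.
        return BASELINE * n_cpg
    # Tested motif: the terminal CpG contributes 1/Kd, other CpGs the baseline.
    return 1.0 / KD_UM[n] + BASELINE * (n_cpg - 1)


if __name__ == "__main__":
    for motif in ("CAUUGGCCG", "CGCAGUGGGGCG", "CCCCGCAUUACG", "CGGCAGACG"):
        print(motif, spacer_length(motif), f"{inferred_zas(motif):.2f}")
```

Under these assumptions, a motif such as CGCAGUGGGGCG (n = 7, with a second CpG at its start) would score 1/0.12 + 0.15, which matches the tabulated 8.48 to two decimals, while a 12-nucleotide subsequence that fits no tested spacer length scores 0.15 per CpG.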