J Med Virol. 2021 May 15;93(9):5638–5643. doi: 10.1002/jmv.27062

Preliminary report on severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) Spike mutation T478K

Simone Di Giacomo, Daniele Mercatelli, Amir Rakhimov, Federico M Giorgi
PMCID: PMC8242375  PMID: 33951211

Abstract

Several severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) variants have emerged, posing a renewed threat to coronavirus disease 2019 containment and to vaccine and drug efficacy. In this study, we analyzed more than 1,000,000 SARS‐CoV‐2 genomic sequences deposited up to April 27, 2021, on the GISAID public repository, and identified a novel T478K mutation located on the SARS‐CoV‐2 Spike protein. The mutation is structurally located in the region of interaction with human receptor ACE2 and was detected in 11,435 distinct cases. We show that T478K has appeared and risen in frequency since January 2021, predominantly in Mexico and the United States, but we could also detect it in several European countries.

Keywords: COVID‐19, genomic surveillance, SARS‐CoV‐2, Spike, Spike:T478K, S:T478K, T478K

Highlights

  • We analyzed 1,180,571 SARS‐CoV‐2 samples from the public repository GISAID (updated to April 27, 2021).

  • We detected a mutation in SARS‐CoV‐2 Spike (S) protein amino acid 478, S:T478K, which has been growing in prevalence in North America (especially Mexico) since January 2021.

  • S:T478K is one of the characterizing mutations of lineage B.1.1.519, which is currently independent from B.1.1.7 and B.1.351.

  • S:T478K affects the Spike domain that binds the human receptor ACE2, increasing the electrostatic potential at the interface.

  • Previous experiments show that S:T478K is a possible genetic route for SARS‐CoV‐2 to escape immune recognition.

1. INTRODUCTION

Severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2), the etiological cause of coronavirus disease 2019 (COVID‐19), is responsible for the most severe pandemic outbreak of the current century. 1 Naturally, it is the object of unprecedented scientific scrutiny, with more than one million SARS‐CoV‐2 genomic sequences having been generated and publicly shared since December 2019. This avalanche of data was made possible by the efforts of thousands of contributing laboratories across the world and was collected in the GISAID initiative database. 2 It currently allows nearly real‐time genomic surveillance, tracking the evolution of the virus over time and geography. 3 In the first 17 months since the appearance of SARS‐CoV‐2, genomic surveillance has proven fundamental in tracking viral outbreaks 4 and in identifying potential new variants of clinical concern. One of these is the variant B.1.1.7, 5 characterized by 18 mutations over the reference genomic sequence (NCBI entry NC_045512.2), most notably a mutation A23063T, causing an amino acid change N501Y in the viral Spike protein interaction domain with human receptor angiotensin‐converting enzyme 2 (ACE2). 6 The interaction with ACE2, a surface protein expressed in human respiratory epithelial cells, is one of the key mechanisms for viral entry in the host, and it is a molecular mechanism directly connected with host specificity, early transmissibility, 7 and higher viral infectivity. 8

N501Y is only one of the 9 Spike mutations of variant B.1.1.7, which is also characterized by mutations in the open reading frame 1a (ORF1a) polyprotein and in proteins ORF8 and nucleocapsid (N). 9 Another mutation in the Spike protein, D614G, became prevalent in early 2020 and is currently present in more than 90% of all circulating SARS‐CoV‐2s; this mutation is not located in the interaction domain with ACE2, but it has been associated with increased entry efficiency into human host cells. 10 The US Centers for Disease Control and Prevention defined as “variants of concern” all those mutations and lineages that have been associated with an increase in transmissibility and virulence, a decrease in the effectiveness of social and public health measures targeting the virus, and in general all mutations potentially affecting COVID‐19 epidemiology. 11 Virtually all variants of concern contain mutations in the SARS‐CoV‐2 Spike protein, such as variant B.1.351 (Spike mutations K417N, E484K, N501Y, D614G, and A701V) and variants B.1.427/B.1.429 (Spike mutations S13I, W152C, L452R, and D614G), and many of these reside on the receptor‐binding domain (RBD), a region located between residues 350–550 of Spike and directly binding to human ACE2. 6, 12 Recombinant vaccines comprising the Spike RBD induced a potent antibody response in immunized animal models, such as mice, rabbits, and primates. 13 On the other hand, mutations in the RBD can improve viral affinity to ACE2 and the evasion of neutralizing antibodies. 8, 14, 15, 16 A recently published clinical study 17 has also shown a decreased vaccine efficacy against the lineage B.1.351 (carrying Spike mutations E484K and N501Y), underscoring the need to track and monitor all SARS‐CoV‐2 mutations, with particular emphasis on those affecting the Spike RBD.

In this short communication, we report a novel SARS‐CoV‐2 Spike mutation, T478K, which is also located at the interface of the Spike/ACE2 interaction and is worryingly rising in prevalence among SARS‐CoV‐2 sequences collected since the beginning of 2021.

2. MATERIALS AND METHODS

We downloaded all publicly available SARS‐CoV‐2 genomic sequences from the GISAID database on April 27, 2021. This yielded 1,180,571 samples, annotated with features such as collection date, region of origin, age, and sex of the infected patient. Only viruses collected from human hosts were kept for further processing, discarding, for example, environmental samples or viruses obtained from other mammals. We compared all these sequences with the SARS‐CoV‐2 Wuhan genome NC_045512.2, using a gene annotation file in GFF3 format available as Supplementary File 1. This provided 27,388,937 mutations when compared with the reference. These nucleotide mutations were then converted into their corresponding cumulative effects on protein sequence using the Coronapp pipeline. 18 The 3D rendering of the location of S:T478K in the SARS‐CoV‐2 Spike/Human ACE2 complex was based on the crystal structure from Shang et al., 19 deposited in the Protein Data Bank 20 as entry 6VW1. Comparison between wild‐type and mutant Spike was performed using the PyMOL suite 21 with the Adaptive Poisson–Boltzmann Solver (APBS) plugin. 22 All statistical analyses, algorithms, and plotting were implemented with the R software. 23
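The conversion from a nucleotide substitution to an amino acid change can be illustrated with a minimal sketch. This is not the Coronapp implementation, and it is written in Python rather than the R used for this study. The Spike coding start (21563 in NC_045512.2), the reference codons passed in, the nucleotide label C22995A, and all function names are assumptions introduced for this example only.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Assumed 1-based start of the Spike coding sequence in NC_045512.2.
SPIKE_CDS_START = 21563


def locate_in_cds(genome_pos, cds_start=SPIKE_CDS_START):
    """Return (1-based residue number, 0-based offset within the codon)."""
    offset = genome_pos - cds_start
    if offset < 0:
        raise ValueError("position lies upstream of the coding sequence")
    return offset // 3 + 1, offset % 3


def amino_acid_change(ref_codon, genome_pos, alt_base, cds_start=SPIKE_CDS_START):
    """Describe the protein-level effect of a single-nucleotide substitution."""
    residue, codon_offset = locate_in_cds(genome_pos, cds_start)
    alt_codon = ref_codon[:codon_offset] + alt_base + ref_codon[codon_offset + 1:]
    return f"{CODON_TABLE[ref_codon]}{residue}{CODON_TABLE[alt_codon]}"


# A23063T with an assumed reference codon AAT for residue 501.
print(amino_acid_change("AAT", 23063, "T"))  # prints N501Y
# Assumed nucleotide change C22995A with an assumed reference codon ACA.
print(amino_acid_change("ACA", 22995, "A"))  # prints T478K
```

A full pipeline would read the reference codon from the genome sequence and the coding coordinates from the GFF3 annotation instead of taking them as arguments.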

3. RESULTS

In total, we could detect the Spike:T478K (S:T478K) mutation in 11,435 distinct patients as of April 27, 2021, more than twice the number observed one month before, on March 26, 2021 (4214). The majority of these mutations (10,205 samples, Figure 1A and Figure S1) are associated with PANGOLIN lineage B.1.1.519; S:T478K is present in 97.0% of B.1.1.519 cases. The remaining S:T478K events are distributed in small numbers (N < 250) in other lineages phylogenetically not derived from B.1.1.519, supporting the hypothesis that this mutation has arisen more than once in distinct events. S:T478K is also present in 68 out of 85 (80%) reported samples from lineage B.1.214.3; however, the low total number of cases for this lineage does not make it a variant of concern yet.

Figure 1.

(A) Prevalence of Spike mutation T478K in PANGO lineages. The top 10 lineages are reported, sorted by number of S:T478K samples over the total number of lineage samples. The discrete number of S:T478K samples is reported on top of each bar. Most S:T478K‐carrying samples are classified in the B.1.1.519 lineage. (B) Frequency of sequenced severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) genomes carrying the S:T478K mutation, divided into 10‐year age ranges. The total number of S:T478K patients for the specified age range is reported on top of the bars. (C) Pie chart showing the distribution of S:T478K by patient sex. (D) Number of S:T478K samples over total samples sequenced from each country. The 10 countries with the highest frequency (in percentage) are shown. Discrete numbers of S:T478K samples are reported on top of each bar. (E) Geographic global projection of S:T478K cases detected in each country. The color scale indicates the number of SARS‐CoV‐2 genomes carrying the S:T478K mutation, in logarithm‐10 scale.

S:T478K does not seem to be significantly associated with patient age (one‐way analysis of variance test, p > .1; Figure 1B and Figure S2) or with patient sex (Figure 1C). The geographic distribution of S:T478K (Figure 1D and Figure S3) shows a noticeable prevalence in Mexico, where it constitutes 52.8% (3202 distinct cases) of all sequenced SARS‐CoV‐2 genomes. We could detect S:T478K mutations in 7133 samples from the United States of America, totaling 2.7% of all genomes generated in the country. S:T478K is therefore primarily present in North America (Figure 1D and Figure S4). S:T478K has also been detected in European countries, such as Germany, Sweden, and Switzerland (Figure 1E and Table S1).

One reason for concern about S:T478K is that it is rapidly growing over time, both in the number of detected samples (Figure 2A) and in prevalence, calculated as the number of cases over the total number of sequenced genomes (Figure 2B). We detected this growth starting at the beginning of 2021, and S:T478K, at the time of writing (April 27, 2021), characterizes more than 2.0% of all sequenced SARS‐CoV‐2 genomes. As a comparison, we show the growth observed for Spike mutations S:N501Y, which rose in November 2020 (Figure 2C), and S:D614G, which grew exponentially in frequency starting from February 2020 (Figure 2D).
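The prevalence measure described above (mutation‐carrying genomes over all sequenced genomes, binned by week) can be sketched as follows. This is an illustrative Python sketch, not the R code used for the study; the function name, the use of ISO calendar weeks for binning, and the toy records are assumptions and do not come from GISAID.

```python
from collections import Counter
from datetime import date


def weekly_prevalence(samples):
    """Map (ISO year, ISO week) -> (carriers, total, carriers / total).

    `samples` is an iterable of (collection_date, carries_mutation) pairs.
    """
    totals = Counter()
    carriers = Counter()
    for collected, has_mutation in samples:
        iso = collected.isocalendar()
        week = (iso[0], iso[1])
        totals[week] += 1
        if has_mutation:
            carriers[week] += 1
    return {
        week: (carriers[week], totals[week], carriers[week] / totals[week])
        for week in sorted(totals)
    }


# Hypothetical toy records, not GISAID data.
toy_samples = [
    (date(2021, 1, 4), False),
    (date(2021, 1, 5), True),
    (date(2021, 1, 6), False),
    (date(2021, 1, 7), False),
    (date(2021, 1, 11), True),
    (date(2021, 1, 12), False),
]
for week, (n_mut, n_all, fraction) in weekly_prevalence(toy_samples).items():
    print(week, n_mut, n_all, f"{fraction:.1%}")
```

Reporting the raw counts next to the fraction keeps weeks with few sequenced genomes from being over‐interpreted.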

Figure 2.

(A) Number of sequenced SARS‐CoV‐2 genomes carrying S:T478K mutation over time, measured weekly. (B) Prevalence over time of S:T478K in the SARS‐CoV‐2 population, measured as the number of S:T478K genomes over the total number of sequenced genomes. (C) Prevalence over time of S:N501Y in the SARS‐CoV‐2 population. (D) Prevalence over time of S:D614G in the SARS‐CoV‐2 population. SARS‐CoV‐2, severe acute respiratory syndrome coronavirus 2

The location of S:T478K is within the interaction domain with the human receptor ACE2, roughly encompassing amino acids 350 to 550 of the SARS‐CoV‐2 Spike protein. In particular, the position of S:T478K is on the interface with ACE2, as shown by crystal structures of the complex (Figure 3A). The amino acid change from the polar but uncharged threonine (T) to a basic, charged lysine (K) is predicted to increase the electrostatic potential of Spike to a more positive surface, in a region directly contacting ACE2 (Figure 3B). Also, the larger side chain of lysine is predicted to increase the steric hindrance of the mutant, possibly further affecting the Spike/ACE2 interaction (Figure 3C).

Figure 3.

(A) 3D representation of the SARS‐CoV‐2 Spike/Human ACE2 interacting complex, derived from the crystal structure from Shang et al. 19 (B) Representation of the SARS‐CoV‐2 Spike electrostatic potential calculated with the Adaptive Poisson–Boltzmann Solver (APBS) program 22 implemented in PyMOL. 21 The molecular surface was colored according to the molecular electrostatic potential (ranging from −2.0 [red] to 2.0 [blue]) in T478 (reference, above) and in K478 (more recent mutation, below). (C) 3D detail of structural superposition of WT SARS‐CoV‐2 RBD and S:T478K. T478 side chains are colored in cyan, while K478 side chains are colored in red. ACE2, angiotensin‐converting enzyme 2; RBD, receptor‐binding domain; SARS‐CoV‐2, severe acute respiratory syndrome coronavirus 2; WT, wild‐type

S:T478K frequently co‐occurs with three other Spike mutations located outside the canonical ACE2 interaction region. One is D614G (99.83% co‐occurrence), one of the founding events of SARS‐CoV‐2 lineage B, currently the most widespread worldwide (Table 1). The other two are P681H and T732A, with 93.8% and 88.7% co‐occurrence with S:T478K, respectively (Table 1). We could detect S:T478K together with other Spike mutations as well, but currently all at much lower frequencies (<4%). S:T478K also frequently co‐exists with mutations in other proteins, such as the widespread two‐amino‐acid nucleocapsid mutation N:RG203KR, and mutations in nonstructural proteins (NSPs) derived from the polyprotein encoded by ORF1, which include, for example, the viral RNA‐dependent RNA polymerase NSP12 (Table 2).

Table 1.

Ten SARS‐CoV‐2 Spike mutations most frequently co‐occurring with S:T478K

Mutation    Number of cases    Percentage
S:D614G     11428              99.83
S:P681H     10739              93.81
S:T732A     10157              88.73
S:L5F       439                3.84
S:L452R     241                2.11
S:P681R     224                1.96
S:D950N     222                1.94
S:T19R      215                1.88
S:E156      174                1.52
S:E154A     149                1.3

Note: We report the number of genomes where both mutations are present, and the percentage over the total number of samples where S:T478K is reported.

Abbreviation: SARS‐CoV‐2, severe acute respiratory syndrome coronavirus 2.

Table 2.

Ten SARS‐CoV‐2 non‐Spike mutations most frequently co‐occurring with S:T478K

Mutation        Number of cases    Percentage
NSP12b:P314L    11310              98.8
NSP4:T492I      10716              93.61
NSP6:I49V       10692              93.4
NSP3:P141S      10687              93.36
N:RG203KR       10622              92.79
NSP9:T35I       10150              88.67
ORF8:L4P        1617               14.13
N:Q418H         1052               9.19
NSP2:T44I       1044               9.12
ORF8:A65V       616                5.38

Note: We report the number of genomes where both mutations are present, and the percentage over the total number of samples where S:T478K is reported.

Abbreviation: N, nucleocapsid protein; NSP, nonstructural protein; ORF, open reading frame; SARS‐CoV‐2, severe acute respiratory syndrome coronavirus 2.
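The co‐occurrence figures in Tables 1 and 2 are, per the table notes, counts of genomes carrying both mutations, expressed as a percentage of the genomes carrying S:T478K. A minimal Python sketch of that calculation is given below. It is not the authors' R code; the function name, the per‐genome set representation, and the toy genomes are assumptions made for illustration.

```python
from collections import Counter


def cooccurrence(samples, target):
    """Count mutations found together with `target`.

    `samples` is an iterable of sets of mutation labels, one set per genome.
    Returns (number of genomes carrying `target`, list of
    (mutation, count, percentage) sorted by decreasing count).
    """
    carriers = [muts for muts in samples if target in muts]
    counts = Counter(m for muts in carriers for m in muts if m != target)
    n = len(carriers)
    table = [(m, c, round(100 * c / n, 2)) for m, c in counts.most_common()]
    return n, table


# Hypothetical toy genomes, not GISAID data.
toy_genomes = [
    {"S:T478K", "S:D614G", "S:P681H"},
    {"S:T478K", "S:D614G"},
    {"S:T478K", "S:D614G", "S:P681H", "S:T732A"},
    {"S:D614G", "S:N501Y"},
    {"S:T478K", "S:D614G", "S:P681H"},
]
n_carriers, table = cooccurrence(toy_genomes, "S:T478K")
print(n_carriers)
for mutation, count, percentage in table:
    print(mutation, count, percentage)
```

Genomes without the target mutation (the fourth toy genome here) contribute to neither the counts nor the denominator.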

4. DISCUSSION

In this short communication, we report the distribution of the Spike mutation S:T478K and its recent growth in prevalence in the SARS‐CoV‐2 population. While there is currently no report of association of this variant with clinical features, S:T478K's rapid growth may indicate an increased adaptation of SARS‐CoV‐2 variants carrying it, particularly lineage B.1.1.519. This mutation, which emerged from the B.1 lineage carrying S:D614G but is independent of the S:N501Y mutation, is most frequent in North America, 24 but we could also detect it in several European countries. T478K has also been detected in lineages that are not phylogenetically derived from B.1.1.519, supporting the hypothesis that this mutation arose more than once in distinct events. Since the highest abundance of this mutation seems to be in Mexico and the USA, we hypothesize a founder effect in which a chance founder event was followed by natural selection, since the frequency of the mutation has slowly but steadily increased in the first months of 2021.

The location of S:T478K in the interaction complex with human ACE2 may affect the affinity with human cells and therefore influence viral infectivity. An in silico molecular dynamics study on the protein structure of Spike has predicted that the T478K mutation, substituting a non‐charged amino acid (threonine) with a positively charged one (lysine), may significantly alter the electrostatic surface of the protein (Figure 3), and therefore the interaction with ACE2, drugs, or antibodies, 25 and that the effect can be increased when combined with other co‐occurring Spike mutations (see Table 1). Another experiment showed that T478K and T478R mutants were enriched when SARS‐CoV‐2 viral cultures were tested against weak neutralizing antibodies, 26 highlighting, at least in vitro, a possible genetic route the virus can follow to escape immune recognition. Everything considered, we believe that the continued genetic and clinical monitoring of S:T478K and other Spike mutations is of paramount importance to better understand COVID‐19 and to better counteract its future developments.

CONFLICT OF INTERESTS

The authors declare that there are no conflicts of interest.

AUTHOR CONTRIBUTIONS

Simone Di Giacomo and Federico M. Giorgi contributed to conceptualization. Federico M. Giorgi contributed to funding acquisition and writing—original draft preparation. Daniele Mercatelli and Simone Di Giacomo contributed to writing—review and editing. Amir Rakhimov and Federico M. Giorgi contributed to Methodology. Daniele Mercatelli, Simone Di Giacomo, and Federico M. Giorgi contributed to validation. Amir Rakhimov and Federico M. Giorgi provided software. All authors contributed to the study and approved the final version of the manuscript.

Supporting information

Supporting information.

ACKNOWLEDGMENTS

The authors are very grateful to the GISAID Initiative and all its data contributors, that is, the authors from the originating laboratories responsible for obtaining the specimens and the Submitting laboratories where genetic sequence data were generated and shared via the GISAID Initiative, on which this study is based. This study was funded by the Italian Ministry of University and Research, Montalcini grant.

Di Giacomo S, Mercatelli D, Rakhimov A and Giorgi FM. Preliminary report on severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) Spike mutation T478K. J Med Virol. 2021;93:5638–5643. 10.1002/jmv.27062

Simone Di Giacomo and Daniele Mercatelli contributed equally to this study.

DATA AVAILABILITY STATEMENT

All data supporting this study is available on the GISAID portal https://www.gisaid.org/

REFERENCES

  • 1. Fontanet A, Autran B, Lina B, Kieny MP, Karim SSA, Sridhar D. SARS‐CoV‐2 variants and ending the COVID‐19 pandemic. Lancet. 2021;397:952‐954. 10.1016/S0140-6736(21)00370-6
  • 2. Shu Y, McCauley J. GISAID: global initiative on sharing all influenza data—from vision to reality. Euro Surveill. 2017:22. 10.2807/1560-7917.ES.2017.22.13.30494
  • 3. Mercatelli D, Giorgi FM. Geographic and genomic distribution of SARS‐CoV‐2 mutations. Front Microbiol. 2020;11:1800. 10.3389/fmicb.2020.01800
  • 4. Mercatelli D, Holding AN, Giorgi FM. Web tools to fight pandemics: the COVID‐19 experience. Brief Bioinform. 2021;22:690‐700. 10.1093/bib/bbaa261
  • 5. Rambaut A, Holmes EC, O'toole Á, et al. A dynamic nomenclature proposal for SARS‐CoV‐2 lineages to assist genomic epidemiology. Nat Microbiol. 2020;5:1403‐1407. 10.1038/s41564-020-0770-5
  • 6. Ortuso F, Mercatelli D, Guzzi PH, Giorgi FM. Structural genetics of circulating variants affecting the SARS‐CoV‐2 Spike/human ACE2 complex. J Biomol Struct Dyn. 2021:1‐11. 10.1080/07391102.2021.1886175
  • 7. Leung K, Shum MH, Leung GM, Lam TT, Wu JT. Early transmissibility assessment of the N501Y mutant strains of SARS‐CoV‐2 in the United Kingdom, October to November 2020. Euro Surveill. 2021:26. 10.2807/1560-7917.ES.2020.26.1.2002106
  • 8. Khan A, Zia T, Suleman M, et al. Higher infectivity of the SARS‐CoV‐2 new variants is associated with K417N/T, E484K, and N501Y mutants: an insight from structural data. J Cell Physiol. 2021:jcp.30367. 10.1002/jcp.30367
  • 9. Galloway SE, Paul P, MacCannell DR, et al. Emergence of SARS‐CoV‐2 B.1.1.7 lineage—United States, December 29, 2020‐January 12, 2021. MMWR Morb Mortal Wkly Rep. 2021;70:95‐99. 10.15585/mmwr.mm7003e2
  • 10. Ozono S, Zhang Y, Ode H, et al. SARS‐CoV‐2 D614G Spike mutation increases entry efficiency with enhanced ACE2‐binding affinity. Nat Commun. 2021;12:848. 10.1038/s41467-021-21118-2
  • 11. CDC. SARS‐CoV‐2 variants of concern. 2021. https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/variant-surveillance/variant-info.html. Accessed April 30, 2021.
  • 12. Chakraborty S. Evolutionary and structural analysis elucidates mutations on SARS‐CoV2 Spike protein with altered human ACE2 binding affinity. Biochem Biophys Res Commun. 2021;538:97‐103. 10.1016/j.bbrc.2021.01.035
  • 13. Yang J, Wang W, Chen Z, et al. A vaccine targeting the RBD of the S protein of SARS‐CoV‐2 induces protective immunity. Nature. 2020;586:572‐577. 10.1038/s41586-020-2599-8
  • 14. McCallum M, Bassi J, Marco AD, et al. SARS‐CoV‐2 immune evasion by variant B.1.427/B.1.429. Immunology. 2021.
  • 15. Wang WB, Liang Y, Jin YQ, Zhang J, Su JG, Li QM. E484K mutation in SARS‐CoV‐2 RBD enhances binding affinity with HACE2 but reduces interactions with neutralizing antibodies and nanobodies: binding free energy calculation studies. Bioinformatics. 2021.
  • 16. Xie X, Liu Y, Liu J, et al. Neutralization of SARS‐CoV‐2 Spike 69/70 deletion, E484K and N501Y variants by BNT162b2 vaccine‐elicited Sera. Nature Med. 2021;27:620‐621. 10.1038/s41591-021-01270-4
  • 17. Madhi SA, Baillie V, Cutland CL, et al. Efficacy of the ChAdOx1 NCoV‐19 Covid‐19 vaccine against the B.1.351 variant. N Engl J Med. 2021. 10.1056/NEJMoa2102214
  • 18. Mercatelli D, Triboli L, Fornasari E, Ray F, Giorgi FM. Coronapp: a web application to annotate and monitor SARS‐CoV‐2 mutations. J Med Virol. 2020;93:3238‐3245. 10.1002/jmv.26678
  • 19. Shang J, Ye G, Shi K, et al. Structural basis of receptor recognition by SARS‐CoV‐2. Nature. 2020;581:221‐224. 10.1038/s41586-020-2179-y
  • 20. Berman HM, Westbrook J, Feng Z, et al. The protein data bank. Nucleic Acids Res. 2000;28:235‐242. 10.1093/nar/28.1.235
  • 21. Yuan S, HC Stephen C, Slawomir F, Horst V. PyMOL and Inkscape bridge the data and the data visualization. Structure. 2016;24:2041‐2042. 10.1016/j.str.2016.11.012
  • 22. Jurrus E, Engel D, Star K, et al. Improvements to the APBS biomolecular solvation software suite. Prot Sci. 2018;27:112‐128. 10.1002/pro.3280
  • 23. Crawley MJ. The R Book. John Wiley & Sons; 2012.
  • 24. Lasek‐Nesselquist E, Pata J, Schneider E, George KS. A tale of three SARS‐CoV‐2 variants with independently acquired P681H mutations in New York State. medRxiv. 2021. 10.1101/2021.03.10.21253285
  • 25. Rezaei S, Sefidbakht Y, Uskoković V. Comparative molecular dynamics study of the receptor‐binding domains in SARS‐CoV‐2 and SARS‐CoV and the effects of mutations on the binding affinity. J Biomol Struct Dyn. 2020:1‐20. 10.1080/07391102.2020.1860829
  • 26. Muecksch F, Weisblum Y, Barnes CO, et al. Development of potency, breadth and resilience to viral escape mutations in SARS‐CoV‐2 neutralizing antibodies. bioRxiv. 2021. 10.1101/2021.03.07.434227

Articles from Journal of Medical Virology are provided here courtesy of Wiley
