LETTER
The coronavirus disease 2019 pandemic, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), remains a global public health concern in 2022 (1). A recent study demonstrated that more than 30 sites with insertions and/or deletions (indels) existed between the genomes of SARS-CoV in 2003 and SARS-CoV-2 in 2019 (2). The majority of indels involved 10 or more consecutive bases, and 15 of the 17 sites had deletion-with-insertions, resulting in the complete exchange of 10 to 330 consecutive bases. This type of mutation is not widely understood, and its prevalence among SARS-related coronaviruses other than SARS-CoV-2 is uncertain. Moreover, the exact origin and timing of the development of long indels in SARS-related coronaviruses remain unknown. Therefore, in this study, we compared the genome of SARS-CoV in 2003 with that of the SARS-related coronavirus Rc-o319, which was sampled in 2013 from bats (Rhinolophus cornutus) in a cave in northern Japan (3), to determine whether this type of mutation occurs in the natural environment. The genome sequences of SARS-CoV and Rc-o319 were obtained from the GenBank database at the National Institutes of Health (https://www.ncbi.nlm.nih.gov/genbank/). The GenBank accession ID for the SARS-CoV genome was AY345986 (4), and the GenBank accession ID for the Rc-o319 genome was LC556375 (3).
The Rc-o319 genome was 77.8% identical to that of SARS-CoV. Of the 6,590 mutated bases, 5,871 bases (89.1%) were point mutations, and 719 bases (10.9%) from 33 sites were indels. Of the 33 indel sites, 15 sites had deletion-with-insertions, 12 sites had sole deletions, and 6 sites had sole insertions. These mutations were concentrated in the open reading frame 1a (ORF1a) and S1 genes. The base lengths involved in the deletion-with-insertions ranged from 10 to 151 consecutive bases. The base lengths involved in the sole deletions ranged from 1 to 9 consecutive bases, and those involved in the sole insertions ranged from 1 to 29 consecutive bases. There were 14 medium-sized indels (13 deletion-with-insertions and 1 sole insertion) involving 10 to 100 consecutive bases (Table 1). A line graph of the rolling average (±50 bases) of the point mutation rate at each base position, together with the distribution of indels across the Rc-o319 genome, is shown in Fig. 1. The actual sequences involved in the deletion-with-insertions are shown in the lower panels of Fig. 1. This type of mutation differs from classical mutations, such as sole insertions, sole deletions, inversions, duplications, translocations, or variable number tandem repeats. No parts of the long indels could be identified elsewhere in the genomes of 2003 SARS-CoV or 2013 Rc-o319, in either the normal or the inverse direction. Furthermore, most of the identified long indels differed from those identified between the genomes of 2003 SARS-CoV and 2019 SARS-CoV-2.
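The search for parts of an indel sequence elsewhere in a genome can be illustrated with a minimal Python sketch. This is not the authors' procedure, which the letter does not describe in detail. The function names, the window size `k`, and the decision to test three orientations (as written, reversed, and reverse-complemented) are all assumptions made for illustration, since "inverse direction" could mean either of the latter two.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (a, c, g, t)."""
    table = str.maketrans("acgtACGT", "tgcaTGCA")
    return seq.translate(table)[::-1]


def find_fragment_hits(segment: str, genome: str, k: int = 12):
    """Report every k-base window of `segment` that occurs in `genome`.

    Each window is tried in three orientations: as written, reversed,
    and reverse-complemented. Returns a list of
    (orientation, offset_in_segment, 1-based_genome_position) tuples.
    An empty list means that no window matched in any orientation.
    The window size k is a hypothetical parameter, not from the letter.
    """
    segment, genome = segment.lower(), genome.lower()
    hits = []
    for offset in range(len(segment) - k + 1):
        window = segment[offset:offset + k]
        candidates = {
            "forward": window,
            "reversed": window[::-1],
            "reverse_complement": reverse_complement(window),
        }
        for orientation, query in candidates.items():
            pos = genome.find(query)
            while pos != -1:
                hits.append((orientation, offset, pos + 1))
                pos = genome.find(query, pos + 1)
    return hits


# Toy example: a 10-base segment hidden as its reverse complement
# inside an artificial "genome" (not real viral sequence context).
segment = "aaagatttaa"
toy_genome = "ggggg" + reverse_complement(segment) + "ccccc"
hits = find_fragment_hits(segment, toy_genome, k=10)
```

With real data, `genome` would be the full AY345986 or LC556375 sequence with the indel site itself masked out, so that a segment does not trivially match its own location.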
TABLE 1.
List of medium-sized indels involving 10 to 100 consecutive bases in the SARS-CoV and Rc-o319 genomes^a
| Base position (SARS-CoV) | Base sequences in the SARS-CoV genome (genes; length) | Base position (Rc-o319) | Replaced bases in the corresponding gene positions of Rc-o319 (genes; length) |
|---|---|---|---|
| 3086–3129 b | 5′-cctgtgaacatgagtacggtacagaggatgattatcaaggtctc-3′ (ORF1a; 44 b) | 3086–3118 b | 5′-aagccagtgctgacttagactatgatggtcaat-3′ (ORF1a; 33 b) |
| 3151–3169 b | 5′-gctgaaacagttcgagttg-3′ (ORF1a; 19 b) | 3139–3148 b | 5′-cagacacctc-3′ (ORF1a; 10 b) |
| 3238–3263 b | 5′-cctacacctgaagaaccagttaatca-3′ (ORF1a; 26 b) | 3217–3254 b | 5′-gtagtgacaacaccaacaccagaagttgagactgacat-3′ (ORF1a; 38 b) |
| 4798–4809 b | 5′-agccccgtcgag-3′ (ORF1a; 12 b) | 4834–4841 b | 5′-ccttttac-3′ (ORF1a; 8 b) |
| 4831–4840 b | 5′-ctttcacttg-3′ (ORF1a; 10 b) | 4864–4873 b | 5′-aaagatttaa-3′ (ORF1a; 10 b) |
| 5977–5985 b | 5′-tctaacaca-3′ (ORF1a; 9 b) | 6010–6021 b | 5′-gatgtaaattct-3′ (ORF1a; 12 b) |
| 22077–22096 b | 5′-caacctatagatgtagttcg-3′ (S1; 20 b) | 22069–22100 b | 5′-acgtcctatgtgttggccataggtgccacctc-3′ (S1; 32 b) |
| 22117–22179 b | 5′-acactttgaaacctatttttaagttgcctcttggtattaacattacaaattttagagccattc-3′ (S1; 63 b) | 22121–22171 b | 5′-cacctcttgtaccattatggaaaatacccataggcttaaatataaccaact-3′ (S1; 51 b) |
| 22190–22223 b | 5′-ttcacctgctcaagacatttggggcacgtcagct-3′ (S1; 34 b) | 22182–22215 b | 5′-ggtgtaccttcgatctgataataccccactacag-3′ (S1; 34 b) |
| 22841–22914 b | 5′-atctaatgtgcctttctcccctgatggcaaaccttgcaccccacctgctcttaattgttattggccattaaatg-3′ (S1; 74 b) | 22833–22882 b | 5′-tgctcactatgattatcaagtgggtactcaatttaagtcatctcttaaga-3′ (S1; 50 b) |
| 23492–23537 b | 5′-atctattgtggcttatactatgtctttaggtgctgatagttcaatt-3′ (S1; 46 b) | 23460–23508 b | 5′-taagagaattgttgcttacgttatgtctcttggtgctgaaaactctgta-3′ (S1; 49 b) |
| 25964–25976 b | 5′-aaagacccaccga-3′ (ORF3a; 13 b) | 25936–25954 b | 5′-gatgaaagagaacacgaac-3′ (ORF3a; 19 b) |
| 27244–27249 b | 5′-tccata-3′ (noncoding region; 6 b) | 27219–27229 b | 5′-atctgttctct-3′ (noncoding region; 11 b) |
| 27867/27868 b | Insertion; no corresponding bases (ORF8) | 27845–27873 b | 5′-cccctctagctacgcgccagagtggaata-3′ (ORF8; 29 b) |
^a Three long indels involving ≥100 consecutive bases (one site in the ORF1a gene and two sites in the S1 gene) and 16 small indels involving <10 consecutive bases were excluded. b, base; ORF, open reading frame; Rc-o319, bat SARS-related coronavirus; S, spike; SARS-CoV, severe acute respiratory syndrome coronavirus.
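Because each row of Table 1 gives an inclusive base range alongside the sequence itself, the stated length can be cross-checked as end − start + 1. The short Python sketch below does this for three rows copied from the table. The helper names `span_length` and `inconsistent_rows` are hypothetical and serve only to illustrate the arithmetic.

```python
def span_length(position_range: str) -> int:
    """Length in bases of an inclusive range such as '3086–3129'."""
    start, end = (int(x) for x in position_range.replace("–", "-").split("-"))
    return end - start + 1


def inconsistent_rows(rows):
    """Return (range, span_length, sequence_length) for rows that disagree."""
    return [
        (rng, span_length(rng), len(seq))
        for rng, seq in rows
        if span_length(rng) != len(seq)
    ]


# Three SARS-CoV entries copied from Table 1 (ORF1a, ORF1a, S1).
rows = [
    ("3086–3129", "cctgtgaacatgagtacggtacagaggatgattatcaaggtctc"),
    ("3151–3169", "gctgaaacagttcgagttg"),
    ("22190–22223", "ttcacctgctcaagacatttggggcacgtcagct"),
]
problems = inconsistent_rows(rows)  # empty when ranges and sequences agree
```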
FIG 1.
Distribution of point mutations and indels in the genome of bat Rc-o319. A line graph of the point mutation rate across the genome of Rc-o319, represented by the rolling average (±50 bases) of the point mutation rate at each base position, with the 95% confidence interval range, is shown. Solid blue (insertions), red (deletions), and black (deletion-with-insertions) bars show the locations and sizes of the 33 indel sites. The width of each solid bar represents the base size of the indel. Point mutations and relatively long indels were concentrated in nsp3 and S1. The lower two panels show the actual sequences at two indel sites, presumed to be deletion-with-insertions, in nsp3 of ORF1a. Approximately 10 consecutive bases were exchanged between the genomes of SARS-CoV in 2003 (China) and Rc-o319 in 2013 (Japan). E, envelope; indels, insertion and/or deletion mutations; M, membrane; N, nucleocapsid; nsp, non-structural protein; ORF, open reading frame; Rc-o319, bat SARS-related coronavirus; S, spike; SARS-CoV, severe acute respiratory syndrome coronavirus.
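A rolling point mutation rate of the kind plotted in Fig. 1 could be computed along the following lines. This is a sketch under stated assumptions, not the authors' code: it assumes the two sequences are already aligned and of equal length (indel columns removed), truncates the ±50-base window at the genome ends, and uses a normal-approximation (Wald) interval for the 95% confidence range, because the letter does not say how the interval was derived. The function name `rolling_mismatch_rate` is hypothetical.

```python
import math


def rolling_mismatch_rate(ref: str, query: str, half_window: int = 50):
    """Rolling point mutation rate between two pre-aligned sequences.

    For each position i, the rate is the fraction of mismatched bases in
    the window [i - half_window, i + half_window], truncated at the ends.
    Returns one (rate, ci_low, ci_high) tuple per position, where the
    95% interval is a Wald approximation (an assumption, see lead-in).
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = [int(a != b) for a, b in zip(ref.lower(), query.lower())]

    # Prefix sums let each window be summed in constant time.
    prefix = [0]
    for m in mismatches:
        prefix.append(prefix[-1] + m)

    total = len(mismatches)
    result = []
    for i in range(total):
        lo = max(0, i - half_window)
        hi = min(total, i + half_window + 1)
        n = hi - lo
        p = (prefix[hi] - prefix[lo]) / n
        half_width = 1.96 * math.sqrt(p * (1 - p) / n)
        result.append((p, max(0.0, p - half_width), min(1.0, p + half_width)))
    return result


# Toy example: a single mismatch at index 5, window of ±1 base.
rates = rolling_mismatch_rate("a" * 10, "a" * 5 + "c" + "a" * 4, half_window=1)
```

The Wald interval collapses to zero width when a window contains no mismatches or only mismatches. A Wilson interval would behave better in those windows and may be preferable for regions with very low mutation rates.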
The results of the present study suggest that a novel type of mutation based on deletion-with-insertions exists broadly among SARS-like coronaviruses. Although relatively long sole insertions or deletions, such as the Δ382 variant (5), have been reported in SARS-CoV-2 isolated from humans, a deletion-with-insertion has not been reported in humans during the 2 years of the pandemic to date. This raises the possibility that some unknown host cell machinery in animal reservoirs, such as bats or pangolins, contributed to the development of deletion-with-insertions in SARS-related coronavirus genomes. Moreover, the finding that no parts of the long indels could be found elsewhere in the virus genome suggests that the long indels are not derived from the virus genome itself. RNAs from other viruses or from host cells are candidates for the origin of long indels in SARS-related coronaviruses. Further studies of the molecular mechanisms and the prevalence of this type of mutation in the biosphere are warranted.
ACKNOWLEDGMENTS
We declare no competing interests.
The concept and design were completed by Akaishi and Horii. The acquisition, analysis, and interpretation were performed by Akaishi and Horii. Akaishi drafted the manuscript, and Horii and Ishii performed a critical revision of the manuscript for important intellectual content. Horii and Ishii provided supervision.
Contributor Information
Tetsuya Akaishi, Email: t-akaishi@med.tohoku.ac.jp.
Tom Gallagher, Loyola University Chicago.
REFERENCES
1. Gao Y, Cai C, Grifoni A, Müller TR, Niessl J, Olofsson A, Humbert M, Hansson L, Österborg A, Bergman P, Chen P, Olsson A, Sandberg JK, Weiskopf D, Price DA, Ljunggren H-G, Karlsson AC, Sette A, Aleman S, Buggert M. 2022. Ancestral SARS-CoV-2-specific T cells cross-recognize the Omicron variant. Nat Med 28:472–476. doi:10.1038/s41591-022-01700-x.
2. Akaishi T. 2022. Insertion-and-Deletion Mutations between the Genomes of SARS-CoV, SARS-CoV-2, and Bat Coronavirus RaTG13. Microbiol Spectr 10:e0071622. doi:10.1128/spectrum.00716-22.
3. Murakami S, Kitamura T, Suzuki J, Sato R, Aoi T, Fujii M, Matsugo H, Kamiki H, Ishida H, Takenaka-Uema A, Shimojima M, Horimoto T. 2020. Detection and Characterization of Bat Sarbecovirus Phylogenetically Related to SARS-CoV-2, Japan. Emerg Infect Dis 26:3025–3029. doi:10.3201/eid2612.203386.
4. Chim SSC, Tsui SKW, Chan KCA, Au TCC, Hung ECW, Tong YK, Chiu RWK, Ng EKO, Chan PKS, Chu CM, Sung JJY, Tam JS, Fung KP, Waye MMY, Lee CY, Yuen KY, Lo YMD. 2003. Genomic characterisation of the severe acute respiratory syndrome coronavirus of Amoy Gardens outbreak in Hong Kong. Lancet 362:1807–1808. doi:10.1016/S0140-6736(03)14901-X.
5. Young BE, Fong S-W, Chan Y-H, Mak T-M, Ang LW, Anderson DE, Lee CY-P, Amrun SN, Lee B, Goh YS, Su YCF, Wei WE, Kalimuddin S, Chai LYA, Pada S, Tan SY, Sun L, Parthasarathy P, Chen YYC, Barkham T, Lin RTP, Maurer-Stroh S, Leo Y-S, Wang L-F, Renia L, Lee VJ, Smith GJD, Lye DC, Ng LFP. 2020. Effects of a major deletion in the SARS-CoV-2 genome on the severity of infection and the inflammatory response: an observational cohort study. Lancet 396:603–611. doi:10.1016/S0140-6736(20)31757-8.

