Genome sequence diversity of SARS-CoV-2 obtained from clinical samples in Uzbekistan

Alisher Abdullaev; Abrorjon Abdurakhimov; Zebinisa Mirakbarova; Shakhnoza Ibragimova; Vladimir Tsoy; Sharofiddin Nuriddinov; Dilbar Dalimova; Shahlo Turdikulova; Ibrokhim Abdurakhmonov

doi:10.1371/journal.pone.0270314

. 2022 Jun 27;17(6):e0270314. doi: 10.1371/journal.pone.0270314

Genome sequence diversity of SARS-CoV-2 obtained from clinical samples in Uzbekistan

Alisher Abdullaev ^1,^*, Abrorjon Abdurakhimov ¹, Zebinisa Mirakbarova ¹, Shakhnoza Ibragimova ¹, Vladimir Tsoy ¹, Sharofiddin Nuriddinov ¹, Dilbar Dalimova ¹, Shahlo Turdikulova ¹, Ibrokhim Abdurakhmonov ^1,²

Editor: Vladimir Makarenkov³

PMCID: PMC9236271 PMID: 35759503

Abstract

Tracking temporal and spatial genomic changes and evolution of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) are among the most urgent research topics worldwide, which help to elucidate the coronavirus disease 2019 (COVID-19) pathogenesis and the effect of deleterious variants. Our current study concentrates genetic diversity of SARS-CoV-2 variants in Uzbekistan and their associations with COVID-19 severity. Thirty-nine whole genome sequences (WGS) of SARS-CoV-2 isolated from PCR-positive patients from Tashkent, Uzbekistan for the period of July-August 2021, were generated and further subjected to further genomic analysis. Genome-wide annotations of clinical isolates from our study have revealed a total of 223 nucleotide-level variations including SNPs and 34 deletions at different positions throughout the entire genome of SARS-CoV-2. These changes included two novel mutations at the Nonstructural protein (Nsp) 13: A85P and Nsp12: Y479N, which were unreported previously. There were two groups of co-occurred substitution patterns: the missense mutations in the Spike (S): D614G, Open Reading Frame (ORF) 1b: P314L, Nsp3: F924, 5`UTR:C241T; Nsp3:P2046L and Nsp3:P2287S, and the synonymous mutations in the Nsp4:D2907 (C8986T), Nsp6:T3646A and Nsp14:A1918V regions, respectively. The “Nextstrain” clustered the largest number of SARS-CoV-2 strains into the Delta clade (n = 32; 82%), followed by two Alpha-originated (n = 4; 10,3%) and 20A (n = 3; 7,7%) clades. Geographically the Delta clade sample sequences were grouped into several clusters with the SARS-CoV genotypes from Russia, Denmark, USA, Egypt and Bangladesh. Phylogenetically, the Delta isolates in our study belong to the two main subclades 21A (56%) and 21J (44%). We found that females were more affected by 21A, whereas males by 21J variant (χ2 = 4.57; p ≤ 0.05, n = 32). The amino acid substitution ORF7a:P45L in the Delta isolates found to be significantly associated with disease severity. In conclusion, this study evidenced that Identified novel substitutions Nsp13: A85P and Nsp12: Y479N, have a destabilizing effect, while missense substitution ORF7a: P45L significantly associated with disease severity.

Introduction

A novel coronavirus, SARS-CoV-2, was first identified in December 2019 (Wu et al., 2020) and rapidly spread around the world. According to Johns Hopkins Coronavirus Resource Center (CRC) SARS-CoV-2 infected hundreds of millions and killed more than 5 million of people until the middle of 2021 [1]. Origin of this dangerous virus is still controversial and has been a subject of interest for many researchers. Recently, analysis of genetic structure of the SARS-CoV-2 with different coronavirus genomes revealed that it has a mosaic genome and might be obtained via intragenic recombination of various virus strains [2] (Makarenkov, 2021). Scientists continue to study the SARS-CoV-2 genome and to date numerous whole genome sequence (WGS) analyses found mutational variations in the viral genome [3–6].

Currently, there a need for increased global sequencing efforts to identify the spread of new variants as quickly as possible, since analyzing genome sequencing and single-nucleotide polymorphism (SNP) calling have been a hotspot for a wide variety of epidemiological, clinical, and therapeutic efforts [3, 4, 7–9]. Although most mutations in the SARS-CoV-2 genome are expected to be either deleterious or relatively neutral, a small proportion will affect functional properties and may alter infectivity, disease severity in even after the host immunity background [3, 10, 11].

Genomic variation is informative for tracking the distribution of the virus and identifying major clades related to the various variants of SARS-CoV-2 with different epidemiological outcome [12]. The genomic variability of SARS-CoV-2 specimens scattered across the globe can underly geographically specific etiological effects. Previous studies revealed the prevalence of single nucleotide transitions as the major mutational type characterized by geographic and genomic specificity across the world [3].

Country of origin and time since the start of the pandemics were the most influential metadata associated with genomic variation. It was highlighted that some geographic regions (populations) have unusually high (many new variants) while others have low (isolated) viral phylogenetic diversity. Such studies provided a direction to prioritize genes associated with outcome predictors (e.g., health, therapeutic, and vaccine outcomes) and to improve DNA tests for predicting disease status [13]. Current study is aimed at identifying SARS-CoV-2 variants and mutation profile in the coronavirus genome presented during the second wave of coronavirus pandemic in Uzbekistan and assessing their impact on the disease severity.

Overall pattern of the COVID-19 pandemic in Uzbekistan, since the first confirmed case reported on the 15^th of March, 2020, has accompanied two distinct disease waves until October of 2021. Until the 20^th of October, 2021, a total of 182 060 cases and 1292 deaths were reported, of that the first peak occurred in July-August 2020, when within the two months, the total number of cases increased rapidly from 12 295 to 43 476 with two months increase of 31 181. The second waive has occurred in July to August of 2021, following the increase of infected people from 113 559 to 160 589 within the two (n = 47 030) months period (Fig 1; https://www.worldometers.info). During these two diseases waives, a number of deaths was 299 and 367, respectively, and accounted for 30.5% of total deaths (n = 2179) since pandemics started in Uzbekistan (Fig 2; https://www.worldometers.info). The lowest number of detected CVOID-19 cases in Uzbekistan (n = 1078) was observed in February 2021 (WHO, https://covid19.who.int/region/euro/country/uz).

Previous efforts [14], have characterized 18 high-quality WGS reads for SARS-COV-2 from the very early symptomatic COVID-19 patients, sampled from Tashkent region clinics for the period of October and beginning of December, 2020. After the first wave, WGS of 18 SARS-CoV-2 genomes, distributed in Uzbekistan at that period, have revealed a total of 128 SNPs, consisting of 45 shared and 83 unique mutations, phylogenetically suggesting their origin and spread from European and Near East countries because of international travels [14]. However, a periodic genome sequencing of SARS-CoV-2 should be carried out to identify the degree of mutations and novel variants of concern (VOC), potentially useful for disease diagnostics, monitoring and treatment.

Therefore, the main purpose of current study was to initiate another large-sample, WGS and genetic diversity evaluation study of the SARS-CoV-2, distributed during the second infection waive period in Uzbekistan, using virus samples isolated from COVID-19 positive patients with mild, moderate, and severe symptoms. Here we report that in the second COVID-19 disease-waive, a SARS-CoV-2 has changed into several new variants with majority (82%) distribution of Delta variant in the country. We have identified 223 nucleotide-level variations and 34 deletions at different positions with the two co-occurred genomic substitution patterns and two novel mutations (Nsp13: A85P and Nsp12: Y479N). Results showed that one of missense substitutions namely ORF7a: P45L was significantly associated with disease severity.

Materials and methods

Samples collection

The SARS-CoV-2 strains were isolated from nasopharyngeal swab samples (Huachenyang Technology, Shenzhen, China) of SARS-CoV-2 qPCR positive (Ct≤28) patients with mild, moderate, and severe symptoms who were being treated at the State Hospital Zangiota-1 located in Tashkent, Uzbekistan, during July 2021. The study included 20 male and 19 female patients, a total of 39 samples with average ages of 55,21 (SD = 14,9; Table 1). Ethical approval for this study was obtained from Ethics Committee of Center for Advanced Technologies under Ministry of Innovative development (Approval Date: May 5, 2021; Approval Number CAT-EC-2021/05-1). Samples collected were anonymized and distinct ID numbers were assigned to each, only keeping age and biological sex for the downstream analysis and reporting purposes. Because of anonymity of the collected data, a voluntary participation condition and non-invasiveness of sequencing experiment, with clear explanation of the purpose of sample collection to each participant, we received their verbal consent. All the experiments were carried out in accordance with the relevant guidelines and regulations.

Table 1. The average number of nucleotide and amino acid substitutions, and deletions per the SARS-CoV-2 genome among 39 samples identified in Uzbekistan (with respect to the reference genome NC_045512.2).

Mutation type	Average	SD	max	min
nucleotide substitutions	31.61	9.234	45	9
nucleotide deletions	16.46	4.381	25	13
amino acid substitutions	25.46	7.350	34	7
amino acid deletions	5.150	1.463	8	4

Open in a new tab

Data collection

Epidemiological, clinical, and disease severity data were extracted from electronic medical records using a standardized data collection form, kindly provided by State Hospital Zangiota-1 located in Tashkent, Uzbekistan. All data records were matched with previously assigned sample IDs and used as anonymized dataset for downstream analyses.

SARS-CoV-2 nucleic acid isolation

Total nucleic acid was extracted using the QIAseq DIRECT SARS-CoV-2 (Qiagen GmbH, Hilden, Germany). RNA quantity was evaluated through the Qubit RNA HS assays kits (Life Technologies, Carlsbad, California, USA) on Qubit® 2.0 Fluorometer (Life Technologies, Carlsbad, California, USA) according to manufacturer’s manual. The presence of SARS-CoV-2 in the purified RNA samples for the downstream steps was estimated by fluorescence detection of RdRp and N genes using Biotest SARS-CoV-2 RT-qPCR Kit (Biotest Lab LLC, Tashkent, Uzbekistan) according to manufacturer’s protocol on the QuantStudio™ 5 Real-Time PCR System (Applied Biosystems, Foster City, USA).

Next-generation sequencing

Whole genome amplification of the SARS-CoV-2 was performed using the CleanPlex® SARS-CoV-2 Research and Surveillance NGS Panel (Paragon Genomics Inc., Hayward, CA, USA). Briefly, cDNA was generated from previously extracted RNA using RT Primer Mix DP followed by RT reaction purification from RNA using CleanMag® Magnetic Beads (Paragon Genomics Inc., Hayward, CA, USA). Then, a multiplex PCR (mPCR) reaction using target-specific primers to amplify the entire SARS-CoV-2 genome was performed, using a 2-pool design, followed by digestion and post-digestion purification, and second mPCR reaction. This was to amplify and add sample-level i5 and i7 primers into the generated libraries for the Illumina sequencing platforms (Illumina, San Diego, CA, USA). Finally, the library was purified using CleanMag® Magnetic Beads (Paragon Genomics Inc., Hayward, CA, USA). Libraries were evaluated by gel-electrophoresis and considered for sequencing when a fragment size ~ 275 bp was obtained and the final concentration was above 2.0 ng/μl, measured on Qubit^TM with dsDNA HS Assay Kit (Thermo Fisher Scientific, Waltham, MA, USA).

After confirmation of the library quality, libraries were normalized to 10 nM and samples with unique index combinations were pooled in equimolar ratios to reach the recommended final concentration of 4 nM for sequencing. After a further quantification with Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, Waltham, MA, USA), pooled libraries were prepared for the sequencing, following the Standard Normalization protocol on MiSeq System Denature and Dilute Libraries Guide (Illumina, San Diego, CA, USA) with a final denaturation and dilution to 11 pM. Sequencing was carried-out on a MiSeq instrument (Illumina, San Diego, CA, USA) with Reagent Kit v3, using 20 pM PhiX control spike-in of 5% for low-diversity libraries, setting 2 x 150 cycles, and generating paired-end reads.

Genomic analyses, variants assessments and phylogeny

NGS raw data (FASTQ files) were generated from MiSeq Local Run Manager (Illumina, San Diego, CA, USA) and uploaded on the SOPHiA DDM platform (SOPHiA Genetics, Lausanne, Switzerland) for the external quality check, trimming of adaptors, variant call review, re-alignment of indels, quality measurements, and determination of the consensus genome by mapping to reference sequence MN908947 (NC 045512.2). For this, we have used a proprietary design pipeline to cover the entire genome.

The public database GISAID [15] was used for the BLAST searches and for mutation analysis. The Nextclade Web tool v.1.11.1 [16] was used to compare study sequences to SARS-CoV-2 reference sequences, assign them to clades, and determine their position within the SARS-CoV-2 phylogenetic tree.

Variant calling, and mutation identification was also additionally performed by using Genome Detective Coronavirus Typing Tool [17], CoV-GLUE v.1.1.108 [18] and COVID-19 genome annotator (http://giorgilab.unibo.it/coronannotator/).

The full-length genomic sequences of 39 coronaviruses were aligned using the L-INS-i method of MAFFT v7.310 [19] https://www.nature.com/articles/s41598-020-79484-8-ref-CR23. Phylogenetic tree of 39 samples, reference genome and genomes of variants constructed by the Neighbor-Joining method [20] was performed using MEGA X software [21]. The bootstrap consensus tree inferred from 10000 replicates is taken to represent the evolutionary history of the taxa analyzed. Branches corresponding to partitions reproduced in less than 50% bootstrap replicates are collapsed. The evolutionary distances were computed using the Maximum Composite Likelihood method. Codon positions included were 1st+2nd+3rd+Noncoding. All ambiguous positions were removed for each sequence pair (pairwise deletion option). The phylogenetic trees were visualized by MEGA X [21] and FigTree v1.4.4 software (http://tree.bio.ed.ac.uk/software/figtree/).

Data availability

Nucleotide sequences of SARS-CoV-2 isolates from Uzbekistan were submitted to GISAID on 2.08.2021 and available for registered users at https://www.epicov.org/, accession IDs from EPI_ISL_3188963 to EPI_ISL_3189001. The full sequence reads and genome annotation data used and/or analyzed in this study are also available in the S1 and S2 Files.

Statistical analysis

Summary statistics and the distribution of the primary data were visualized using BoxPlotR [22]. For association analysis, patients with mild and moderate disease severity were pooled into one "non-severe" category and compared with the "severe" (severe-critical) patient group. The association between viral genotypes and disease severity was investigated using the Fisher exact test and odds ratio (OR) calculation via a 2×2 contingency table. Comparisons were made between our phylogenetic clades [e.g., a number of patients, infected with a SARS-CoV-2 from the clade 21A (or defined SNPs) versus the number of patients, not infected with a SARS-CoV-2 from the clade 21J (or defined SNPs] and the different categories of disease severity (e.g., a number of patients with severe disease versus a number of patients without severe disease).

Protein stability prediction

Prediction the increased or decreased stability of protein upon amino-acid substitution was done by the estimation of difference in Gibbs free energy of unfolding (ΔΔG value in kcal/mol) between the mutated and wild-type proteins (ΔΔG = ΔG_mutant- ΔG_wild-type) [23, 24]. Although many prediction tools are available, the conflicting prediction results from different tools could cause confusion to users. In order to avoid such bias in predicting proteins stability, several programs have been used in our analyses, which exploited the diverse prediction models from machine-learning to energy-based force-fields [25], and the average ΔΔG value was calculated for each protein.

Results

Genome-wide characterization of SARS-CoV-2 isolates from Uzbekistan

Forty-eight samples collected from the COVID-19 positive patients after qPCR screening were selected for further WGS. Out of 48 samples isolated from SARS-CoV-2 positive patients, we generated 39 high-quality sequences. The average genome read length of 39 samples was 29 886.5 (σ = 4.3) nucleotides, the average content of nucleotide was 29.93% of Adenine (A), 32.11% of Thymidine (T), 19.60% of Guanine (G) and 18.32% of Cytosine (C). The GC content was 37.92 percent.

Genome-wide annotations of SARS-Cov-2 positive sample isolates in our study revealed a total of 179 nucleotide-level variations, including two novel mutations, which were unreported previously, and 34 deletions at different positions throughout the entire genome of SARS-CoV-2. Comparative sequence alignment of 179 mutations to the reference genome determined that 114 (63.7%) were missense (nonsynonymous) mutations and 65 (36.3%) were silent (synonymous) mutations. The average number of SNVs per isolate was 31.6 (σ = 9.23) with minimum and maximum values of 9 and 45 respectively (Table 1; Fig 3).

Fig 3 — The dots mark the median values, the bold bar marks interquartile range; nt–nucleotide; aa–amino acid.

Analysis of nucleotide substitutions rate in the SARS-CoV-2 genome showed prevalence of transitions of 68.1% with the most common C>T substitution of 42% (Figs 4 and 5; Table 2). The average number of transitions per isolate was 21. 1 (σ = 6.68), with a minimum and maximum number of 6 and 33, respectively (Fig 5). Among transversion events a G>T substitutions had the highest rate of 16% (Table 2; Fig 4).

Fig 4 — The cross marks the mean values, the bold line marks median.

Fig 5 — The cross marks the mean values, the bold line marks median.

Table 2. Single nucleotide substitutions frequency in the genome of SARS-CoV-2 isolates (n = 39) from Uzbekistan.

Nucleotide substitution	Type	Average per genome	Frequency
C>T	Transition	13.10	41.96%
G>T	Transversion	4.90	15.63%
A>G	transition	3.23	10.13%
T>C	transition	2.49	8.14%
G>A	transition	2.33	8.11%
C>G	transversion	1.69	5.12%
C>A	transversion	1.18	3.74%
T>G	transversion	0.97	3.12%
G>C	transversion	0.38	1.13%
A>T	transversion	0.31	1.12%
A>C	transversion	0.26	1.032%
T>A	transversion	0.21	0.768%
		Total:	100%

Open in a new tab

Detailed analysis on genomic distribution of SNPs and amino acid substitutions revealed that the most variable regions are the ORF1a and ORF1b with 59 and 43 SNPs, respectively (Table 3). Interestingly, the silent SNPs were more abundant in the Nsp3 and Nsp12 (RdRp), which were nine and 10, respectively. The highest number of amino acid substitutions was found in the ORF1a (n = 34), ORF1b (n = 24) and S (24) genes (Table 3). The only one amino acid substitution I82T was present in the Membrane (M) protein (n = 29; 74% of isolates) and no mutation found in the Envelope (E) protein. Among all genes with the highest rate of amino acid substitutions was the Spike (S) gene (6 mutations with the frequency of occurrence from 79% to 100%) and the ORF1b (3 mutations with the frequency of occurrence from 77% to 97%). In the S protein, the SARS-CoV-2 mutation D614G was identified in all isolates (n = 39; 100%) from our study, followed by R158G (n = 35; 90%), T19R (n = 34; 87%), D950N (n = 33; 85%), L452R (n = 33; 85%), T478K (n = 32; 82%) and P681R (n = 31; 79%). Frequencies of all identified missense mutation in SARS-CoV-2 isolates from our study have been presented in the Table 4.

Table 3. Genome distribution of the total number of identified nucleotide and amino acid substitutions among the SARS-CoV-2 (n = 39) samples from Uzbekistan.

Gene/region	Total number of SNVs (silent)	Total number of amino acid substitutions
5`UTR	3	-
3`UTR	2	-
ORF1a	59 (25)	34
ORF1b	43 (19)	24
ORF3a	5 (0)	5
ORF7a	6 (1)	5
ORF7b	2 (0)	2
ORF8	6 (2)	4
ORF9b	4 (1)	3
M	3 (2)	1
N	12 (1)	11
S	32 (8)	24

Open in a new tab

Table 4. The frequency of missense mutation in the SARS-CoV-2 genotypes identified in Uzbekistan (nucleotide and amino acid positions are indicated relative to the reference genome NC_045512.2).

Gene	Gene product/region	SNP (refseq position)	AA substitution	Number of samples	Frequency
ORF1a	Nsp2	C884T	R207C	22	0.56
	Nsp3	C6402T	P2046L	16	0.41
	Nsp3	C7124T	P2287S	16	0.41
	Nsp4	C10029T	T3255I	16	0.41
	Nsp6	A11201G	T3646A	15	0.38
	Nsp3	G4181T	A1306S	14	0.36
	Nsp4	G9053T	V2930L	14	0.36
	Nsp4	C9891T	A3209V	11	0.28
	Nsp3	C5184T	P1640L	9	0.23
	Nsp6	T11418C	V3718A	9	0.23
	Nsp3	C6539T	H2092Y	8	0.21
	Nsp6	C11195T	L3644F	7	0.18
	Nsp3	C3096T	S944L	7	0.18
	Nsp2	G1048T	K261N (K81N)	6	0.15
	Nsp2	C1191T	P309L	6	0.15
	Nsp3	T6954C	I2230T	4	0.10
	Nsp3	C3267T	T1001I	3	0.08
	Nsp3	C5388A	A1708D	2	0.05
	Nsp3	C5031T	T1589I	2	0.05
	Nsp3	C4455T	A1397V	1	0.03
	Nsp6	T11073C	F3603S	1	0.03
	Nsp2	G1820A	G519S	1	0.03
	Nsp10	T13188C	I4308T	1	0.03
	3C-like proteinase	A10323G	K3353R	1	0.03
	Nsp3	C4613T	L1450F	1	0.03
	Nsp2	T1227G	M321R	1	0.03
	Nsp8	T12667G	N4134K	1	0.03
	Nsp2	G1161A	R299K	1	0.03
	Nsp3	C5826T	T1854I	1	0.03
	Nsp9	C12756T	T4164I	1	0.03
	Nsp3	G6446T	V2061F	1	0.03
	Nsp6	G11417A	V3718I	1	0.03
	Nsp8	G12569T	V4102L	1	0.03
	Nsp2	T3026A	Y921N	1	0.03
ORF1b	ORF1ab polyprotein	C14408T	P314L	39	1
		C16466T	P1000L	33	0.85
		G15451A	G662S	30	0.77
		C19220T	A1918V	18	0.46
		C20320T	H2285Y	6	0.15
		T19420C	S1985P	4	0.10
		C16349T	S961L	4	0.10
		A21137G	K2557R	3	0.08
		A17431G	I1322V	2	0.05
		C18176T	P1570L	2	0.05
		G16489C^†	A1008P^† (NSP13: A85P)	1	0.03
		G18040T	A1525S	1	0.03
		G18469A	D1668N	1	0.03
		G19788C	E2107D	1	0.03
		T16307C	F947S	1	0.03
		G16852T	G1129C	1	0.03
		G21204T	K2579N	1	0.03
		C17125T	L1220F	1	0.03
		C14120T	P218L	1	0.03
		A20059G	S2198G	1	0.03
		G16741T	V1092F	1	0.03
		G20578T	V2371L	1	0.03
		G13726A	V87I	1	0.03
		T14875A^†	Y470N^† (NSP12: Y479N)	1	0.03
3a	ORF3a protein	C25469T	S26L	31	0.79
		A25439C	K16T	3	0.08
		G25947C	Q185H	2	0.05
		T25520G	F43C	1	0.03
		G25996T	V202L	1	0.03
7a	ORF7a protein	C27752T	ORF7a:T120I	31	0.79
		T27638C	ORF7a:V82A	23	0.59
		C27739T	ORF7a:L116F	11	0.28
		C27527T	ORF7a:P45L	7	0.18
		G27478T	ORF7a:V29L	1	0.03
7b	ORF7b protein	C27874T	ORF7b:T40I	14	0.35
7b	ORF7b protein	T27835C	ORF7b:I27T	1	0.03
8	ORF8 protein	A28095T	ORF8:K68^*	4	0.10
		A28111G	ORF8:Y73C	4	0.10
		C27972T	ORF8:Q27^*	3	0.08
		G28048T	ORF8:R52I	1	0.03
9b	ORF9b protein	A28461G	ORF9b:T60A	33
		C28291T	ORF9b:P3L	4	0.10
		G28396A	ORF9b:G38D	1	0.03
M	Membrane glycoprotein	T26767C	I82T	29	0.74
N	Nucleocapsid phosphoprotein
	NTD: RBD	A28461G	D63G	33	0.85
	C-tail	G29402T	D377Y	32	0.82
	SR-R	G28881T	R203M	30	0.77
	LQ-R	G28916T	G215C	15	0.38
	C-tail	G29427A	R385K	9	0.23
	N-tail	G28280C A28281T T28282A	D3L	4	0.10
	LQ-R	C28977T	S235F	4	0.10
	CTD	C29358T	T362I	4	0.10
	SR-R	G28883C	G204R	3	0.08
	SR-R	G28881A, G28882A	R203K	3	0.08
	NTD	G28739T	A156S	1	0.03
	LQ-R	G28975A	M234I	1	0.03
S	Spike protein
	S1: SD2	A23403G	D614G	39	1.00
	S1: NTD	del22029-22034	R158G	35	0.90
	S1: NTD	C21618G	T19R	34	0.87
	S2: HR1	G24410A	D950N	33	0.85
	S1: RBD: RBM	T22917G	L452R	33	0.85
	S1: RBD: RBM	C22995A	T478K	32	0.82
	S1: SD2	C23604G	P681R	31	0.79
	S1: RBD: RBM	A23063T	N501Y	7	0.18
	S1: NTD	C21846T	T95I	7	0.18
	S1: NTD	G21987A	G142D	5	0.13
	S1: SD1	C23271A	A570D	4	0.10
	S2	G24914C	D1118H	4	0.10
	S2	A24110C	I850L	4	0.10
	S1: SD2	C23604A	P681H	3	0.08
	S2: HR1	T24506G	S982A	3	0.08
	S2	C23709T	T716I	3	0.08
	S1	A22320C	D253A	1	0.03
	S2: TM	G25229A	G1223S	1	0.03
	S2	T24903C	I1114T	1	0.03
	S1: RBD	A22814G	I418V	1	0.03
	S2	G24751T	L1063F	1	0.03
	S1: NTD	C22323T	S254F	1	0.03
	S1: SD2	T23600C	S680P	1	0.03
	S2: TM	G25250T	V1230L	1	0.03

Open in a new tab

Nsp–nonstructural protein; ORF–open reading frame; S1, S2—Spike subunits 1 and 2; NTD—N-terminal domain; RBD—Receptor Binding Domain; RBM–Receptor Binding Motif; SD1, SD2—the subdomains 1 and 2; HR1- heptad repeats; TM—transmembrane domain; SR-R–Serine/Arginine-rich region; LQ-R—Leucin/Glutamine-rich region; CTD—C-terminal domain

*–nonsense mutation

†-novel, unreported substitutions

The other high frequency mutations (identified in 100% of samples) were the P314L amino acid change mutation in the ORF1b, the synonymous F924 (C3037T) change in the Nsp3 and the nucleotide substitution C241T in the 5`UTR (Table 4). Thus, in all isolates, regardless of their phylogenetic position, the substitution of D614G co-occurred within the ORF1b: P314L, Nsp3: F924 and 5`UTR: C241T. Further co-occurrence analysis revealed of the following 5 mutations: Nsp3: P2046L, Nsp3: P2287S, synonymous in the Nsp4: D2907 (C8986T), Nsp6: T3646A and Nsp14: A1918V.

Analysis of mutation profile among the symptomatic isolates from our study have revealed the two novel nucleotide substitutions G16489C and T14875A, which resulted in amino acid changes in the Nsp13: A85P (ORF1b: A1008P) and Nsp12: Y479N (ORF1b: Y470N), respectively (Table 4; GISAID IDs: EPI_ISL_3188967, EPI_ISL_3188979).

Genome-wide characterization of SARS-CoV-2 isolates has identified 34 nucleotide deletions resulted in 11 amino acid deletions (at the Spike, ORF8, Nsp6 and Nsp1 regions) and non-coding deletion g.a28271 at the upstream of the N gene (Table 5). This non-coding deletion g.a28271 has been identified in all SARS-CoV-2 isolates (n = 39; 100%). The average number of nucleotide deletions was 16.5 (σ = 4.38) per isolate with the minimum and maximum of 13 and 25, respectively. The average number of amino acid deletions was 5.15 (σ = 1.46) per isolate with the minimum and maximum of 4 and 8, respectively (Tables 1 and 5; Fig 3).

Table 5. The frequency of deletions identified among the SARS-CoV-2 isolates (n = 39) in Uzbekistan.

nt deletion	Genomic region	aa deletion	Samples	Freq
g.a28271-	noncoding, upstream N gene	-	39	1.00
22029–22034	S	EF156-157Δ	35	0.90
28248–28253	ORF8	DF119-120Δ	33	0.85
21992–21994	S	Y144Δ	19	0.49
11288–11296	NSP6	SGF 3675–3677Δ	11	0.28
21765–21770	S	HV69-70Δ	5	0.13
516–518	NSP1	M85Δ	1	0.03

Open in a new tab

Positions are indicated relative to the reference genome (NC_045512.2)

The frequency of amino acid deletions in non-structural proteins were 85% in the ORF8: D119/F120Δ (n = 33), 28% in the Nsp6: SGF3675-3677Δ (n = 11) and 3% in the Nsp1: M85Δ (n = 1). In the S protein there were three known amino acid deletions with the following occurrence frequencies:90% in for the E156/F157Δ (n = 35; 90%), 49% for the Y144 Δ (n = 19) and 13% for the HV69-70Δ (n = 5) (Table 5).

Phylogeny of SARS-CoV-2 isolates from Uzbekistan

The Nextclade-based phylogenetic analysis of the SARS-CoV-2 genome samples sequenced in our study has revealed that the most currently distributed SARS-CoV-2 variants in our dataset belong to the tree clades: 20A (Alpha, V1), 20I (Alpha, V1), and 21A (Delta) in consonance with “Nextstrain”‘s nomenclature system [16]. Neighbor Joining analysis and the “Nextstrain” classification has grouped 82% of our SARS-CoV-2 sample genome sequences into the Delta variant clade (n = 32; 82%), followed by Alpha (n = 4; 10.3%) and 20A (n = 3; 7.7%) clades (Figs 6 and 7).

Fig 6 — Sequences are placed on a reference tree, clades were assigned to the nearest neighbor, and private mutations analyzed. Brunches with colored circle represents variants from Uzbekistan.

Fig 7 — Samples are colored by taxonomic affiliation to clades or subclades. Clades are assigned according to Nextclade nomenclature. Delta variant divided to subclades 21A (green) and 21J (orange). Blue color—clade 20A, Red color–clade 20I (Alpha variant). Bootstrap values were shown.

The Nexstclade phylogenetic analysis based on the WGS showed that Alpha clade variants from our dataset, namely EPI_ISL_3188994 and EPI_ISL_3188999, have clustered with the SARS-COv-2 genome sequences from France and Armenia, whereas EPI_ISL_3188965 and EPI_ISL_3188992 from our study have clustered with England coronavirus sample sequences. The clade 20A sequences from our dataset such as EPI_ISL_3188963, EPI_ISL_3188985 and EPI_ISL_3188990 clustered with the SARS-CoV-2 variants sequenced from USA and England. Our Delta clade grouped sequence reads were clustered with coronavirus sample sequence reads from Russia, Denmark, USA, Egypt and Bangladesh.

Phylogeny and genomic characterization of the Delta isolates from Uzbekistan

Since most of the isolates were classified to the Delta clade, further step was to differentiate them by variations in the S protein sequence and other genomic regions. Analysis of the S gene mutation profile among Delta variants from our study has revealed a total 31 mutations, including five amino acid deletions and eight synonymous SNP mutations. In the Spike protein, high frequency mutations, covering the 94–100% of samples sequenced, were T19R, EFR156-158G, L452R, T478K, D614G, P681R and D950N (Table 6). Along with above variants, there were several genotypes that also have contained Y144Δ (38%), T95I (22%), G142D (16%) and I850L (13%) variations in the Spike protein (Table 6). The Delta genotypes sampled from Uzbekistan in our study were also characterized by occurrence of the high frequency (91%-100%) of nonsynonymous changes in other genomic regions, namely at the ORF1b: P314L, G662S and P1000L, ORF3a: S26L, ORF7a: T120I ORF9b: T60A, N: D63G, N: D377Y, N: R203M, and M: I82T (Table 7).

Table 6. Frequency of the identified Spike protein mutations (missense/synonymous) and deletions in the Delta isolates from Uzbekistan.

aa^* position	Reference	Variant	nt position	mutation	total	Freq	Type
19	T	R	21618	T>G	32	1.00	missense
69	H	ΔH	21765–21768	Δ	1	0.03	deletion
70	V	ΔV	21769–21770	Δ	1	0.03	deletion
95	T	I	21618	C>G	7	0.22	missense
142	G	D	21987	G>A	5	0.16	missense
144	Y	ΔY	21992–21994	Δ	12	0.38	deletion
156	E	ΔE	22029–22030	Δ	32	1.00	deletion
157	F	ΔF	22031–22033	Δ	32	1.00	deletion
158	R	G	22034	Δ	32	1.00	missense
253	D	A	22320	A>C	1	0.03	missense
297	S	S	22453	A>C	1	0.03	synonymous
302	T	T	22468	G>T	1	0.03	synonymous
312	I	I	22498	C>T	1	0.03	synonymous
375	S	S	22687	C>A	2	0.06	synonymous
418	I	V	22814	A>G	1	0.03	missense
452	L	R	22917	T>G	30	0.94	missense
478	T	K	22995	C>A	31	0.97	missense
501	N	Y	23063	A>T	1	0.03	missense
614	D	G	23403	A>G	32	1.00	missense
665	P	P	23557	A>T	1	0.03	synonymous
680	S	P	23600	T>C	1	0.03	missense
681	P	R	23604	C>G	31	0.97	missense
808	D	D	23986	T>C	1	0.03	synonymous
850	I	L	24110	A>C	4	0.13	missense
950	D	N	24410	G>A	32	1.00	missense
982	S	A	24506	T>G	1	0.03	missense
1061	V	V	24745	C>T	8	0.25	synonymous
1063	L	F	24751	G>T	1	0.03	missense
1223	G	S	25229	G>A	1	0.03	missense
1230	V	L	25250	G>T	1	0.03	missense
1238	T	L	25276	C>T	1	0.03	synonymous

Open in a new tab

*-aa–amino acid, positions are indicated relative to the reference genome (NC_045512.2)

Table 7. Mutations in the Delta isolates (21A and 21J) with the frequency of occurrence higher than 20%.

nt position^*	variant	total	frequency
A23403G	S: D614G	32	1.00
C14408T	ORF1b: P314L	32	1.00
del22029-22034	S: R158G	32	1.00
C21618G	S: T19R	32	1.00
A28461G	N: D63G	32	1.00
A28461G	ORF9b: T60A	32	1.00
G24410A	S: D950N	32	1.00
G29402T	N: D377Y	32	1.00
C16466T	ORF1b: P1000L	31	0.97
C22995A	S: T478K	31	0.97
C23604G	S: P681R	31	0.97
C25469T	ORF3a: S26L	31	0.97
C27752T	ORF7a: T120I	31	0.97
T22917G	S: L452R	30	0.94
G28881T	N: R203M	29	0.91
G15451A	ORF1b: G662S	29	0.91
T26767C	M: I82T	29	0.91
T27638C	ORF7a: V82A	23	0.72
C884T	ORF1a: R207C	18	0.56
C19220T	ORF1b: A1918V	18	0.56
C6402T	ORF1a: P2046L	16	0.50
C7124T	ORF1a: P2287S	16	0.50
C10029T	ORF1a: T3255I	16	0.50
G28916T	N: G215C	16	0.50
A11201G	ORF1a: T3646A	16	0.50
G4181T	ORF1a: A1306S	14	0.44
G9053T	ORF1a: V2930L	14	0.44
C27874T	ORF7b: T40I	14	0.44
C9891T	ORF1a: A3209V	11	0.34
C27739T	ORF7a: L116F	11	0.34
G29427A	N: R385K	9	0.28
C5184T	ORF1a: P1640L	9	0.28
T11418C	ORF1a: V3718A	9	0.28
C6539T	ORF1a: H2092Y	8	0.25
C11195T	ORF1a: L3644F	7	0.22
C3096T	ORF1a: S944L	7	0.22
C27527T	ORF7a: P45L	7	0.22
C21846T	S: T95I	7	0.22

Open in a new tab

*-nt–nucleotide, positions are indicated relative to the reference genome

According to the Nexststrain phylogenetic grouping based on the amino acid and several nucleotide substitutions, the Delta isolates from our study were divided into the two main subclades 21A (n = 18; 56.2%) and 21J (n = 14; 43.8 (Figs 6 and 7). All isolates in the subclade 21J were mainly defined by a frequently observed synonymous nucleotide substitution C8986T, and the amino acid substitutions at the ORF1a: A1306S (G4181T), P2046L (C6402T), P2287S (C7124T), V2930L (G9053T), T3255I (C10029T), T3646A (A11201G), ORF1b: A1918V (C19220T), and ORF7b: T40I (C27874T). Three of them (ORF1a: A1306S, V2930L, and ORF1b: A1918V) were unique to the 21J subclade, while the rest were present of subclade 21A, formed by two (11%) of sample genomes sequenced. In the subclade 21J, there were also 50% of isolates bearing the S protein—specific mutation T95I (C27874T).

Isolates in the subclade 21J were characterized by a higher number of average nucleotide and amino acids substitutions, but lower nucleotide and amino acids deletions per isolate compared to the subclade 21A (Table 8, Fig 8). The broader variation range of mutations events was observed among isolates in the subclade 21A (Table 8, Fig 8) than the subclade 21J. It was found that the substitutions N: R385K, ORF1a: S944L, H2092Y, L3644F, and ORF1b: H2285Y were always co-occurred and present in 50% of isolates in the subclade 21A of our study. The Nextclade based analysis of geographical distribution indicated that all samples in the subclade 21A were mostly clustered close to the SARS-CoV-2 genotypes from Bangladesh, whereas all samples in the subclade 21J were mostly clustered with the genome sequence samples of coronavirus from Egypt and Russia.

Table 8. Sample mean values of nucleotide and amino acid mutation type between the subclades 21A and 21J.

Types	21J (n = 14)	σ²	21A (n = 18)	σ²
nt substitutions	36.57 (SD = 2.31)	4.96	30.56 (SD = 8.94)	75.58
nt deletions	14.29 (SD = 32)	9.92	17.00 (SD = 4.93)	23.48
aa substitutions	31.36 (SD = 1.86)	3.23	24.11 (SD = 5.69)	30.65
aa deletions	4.43(SD = 1.08)	1.10	5.30 (SD = 1.64)	2.55

Open in a new tab

Fig 8 — The dots mark the median values, the bold bar marks interquartile range. nt–nucleotide; aa–amino acid.

Analysis of disease severity among patients infected with the Delta variant

The gender distribution of selected patients in our study was 51.3% (n = 20) of males, and 48.7% (n = 19) of females, with age distribution of 50 (SD = 14.7) and 60.7 (SD = 13.2) years respectively, the mean age of all 39 patients in both gender groups was 55.2 (SD = 14.9) (Table 9; Fig 9).

Table 9. Age and gender distribution of studied patients.

	Mean Age (SD)	Number of patients
Male	50 (14,7)	20
Female	60.7 (13,2)	19
Total (M+F)	55.21 (14,9)	39

Open in a new tab

A number of patients with mild, severe, and critical manifestation of COVID-19 were 46.2% (n = 18), 33.3% (n = 13), and 20.5% (n = 8) respectively. No statistically significant associations were found between disease severity vs. clades/subclades (p = 0.32), and between disease severity vs. gender (p = 0.4).

Clinical symptoms among patients affected by the Delta variant (n = 32) were distributed by the disease severity to three groups: mild 46.9% (n = 15), severe 31.2% (n = 10), critical 21.9% (n = 7), and death (2.5% of all studied). Almost all hospitalized patients in three groups had at least two chronic comorbidities, including obesity, cardiovascular diseases, and diabetes. Examining the effect of comorbidities on a clinical outcome, there were no significant associations found among the above-mentioned three groups related to comorbidities and disease severity. Interestingly, a distribution analysis of the Delta subclades among gender groups revealed that females were more affected by the 21A, whereas males by the 21J variants (OR = 4.73; 95% CI 1.05–24.44; χ2 = 4.57; p = 0.032; Fig 10). There were no significant differences found between gender and disease severity (p = 0.4).

Association analysis of different amino acid substitutions, including the two novel Nsp13:A85P and Nsp12:Y479N mutations, no mutation occurred within 39 SARS-CoV-2 genomes have revealed a significant association with disease severity, except for P45L (C27527T) in the ORF7a region. Patients, infected with a SARS-CoV-2 bearing the ORF7a 45P variant found in our study, have developed severe and critical form of the COVID-19. The SARS-CoV-2 bearing 45L amino acid change in the ORF7a was predominantly found in patients with mild symptoms. To determine whether the ORF7a:P45L substitution associated with mild or severe and critical disease symptoms wused logistic regression analysis and found that the ORF7a: 45L has been significantly associated with causing the mild disease symptoms (CMILE OR = 0.1; 95% CI 0.01–0.67; p<0,01), while the ORF7a:45P has been significantly associated with the severe and critical COVID-19 symptoms (CMILE OR = 9.1; 95% CI 1.38–84.98; p<0,01) (Table 10).

Table 10. Logistic regression analysis of viral ORF7a:P45L mutation and disease severity.

Mutation	CMLE OR	Lower 95% CI	Upper 95% CI	p-value	p-value (F)
ORF7a: 45L	0.1^*	0.01	0.67	0.004	0.03^**
ORF7a: 45P	9.1^*	1.38	84.98	0.004	0.03^**

Open in a new tab

*Conditional maximum likelihood estimates of Odds Ratio.

** Fisher`s exact test two-tailed p-value

Effect of amino acid substitution in the ORF7a: P45L as well as novel variants of the Nsp13:A85P, and Nsp12: Y479N on protein stability

To measure protein stability of mutated variants of SARAS-CoV-2 sequenced in our study, we have analyzed the variability of Gibbs free energy of unfolding (ΔΔG) values. We used the diverse prediction models of machine-learning to energy-based force-fields with different software programs (Fig 11). Analysis of average ΔΔG values, in terms of increased or decreased stability of the ORF7a: P45L and novel variants of the Nsp13: A85P, and Nsp12: Y479N, has revealed a significant decrease of protein stability with an average ΔΔG values of -0.48 (s = 0.51), -0.56 (s = 1.08), and -1.5; (s = 1.26), respectively (Fig 11). The P45L substitution revealed that non-synonymous Pro to Leu substitution has a deleterious effect, leading to a highly decreased stability of theORF7a protein (Fig 12). Thus, our finding suggests that non—synonymous substitution of amino acid proline to leucine in the position 45 of the ORF7a protein strongly affects to the functional activity of protein and weakening disease symptoms in comparison with the original coronavirus protein.

Fig 11 — ΔΔG—free energy of unfolding (kcal/mol). The negative and positive predicted ΔΔG values mean the destabilizing and stabilizing effect, respectively [24, 26].

Fig 12 — Dotted line represents hydrogen bonds. Substitution of Pro to Leu (left) results in destabilizing effect due to loss of hydrogen bond.

Discussion

The of the number of individuals affected by COVID-19 depends on several factors which have had an impact on whether new COVID-19 cases are increasing or declining in particular locations. These factors include the infection prevention policies, human behavior, effectiveness of vaccines over time, changes to the coronavirus itself, and the number of people in population who are vulnerable because of many reasons, including the age, genetic and immune status of hosts, and other social aspects of epidemy.

The study of the global situation showed that the COVID-19 disease pandemic in different countries had a various number of upward periods (incidence peaks), which varied from zero to five waives/incidence, among them the largest number of countries fell in two incidence peaks [27]. Our study also shows that in Uzbekistan, for the entire period of the COVID_19 disease pandemic, two clear waves were identified (Fig 1). The first wave was thought to be provoked by the spread of the original Wuhan variant [28] in middle of 2020. However, a previous SARS-CoV-2 sample sequencing effort from Uzbekistan, covering the first waive period (November to December of 2020), concluded that the early SARS-CoV-2 infections in our country were distributed from European and Near East countries as a result of international travelling [14] and represented by clades 20B (77,7%) and 19B (22,3%), whereas during the second wave dominant variant was Delta (82%), followed by Alpha (10.3%) and 20A (7.7%) clades (Figs 6 and 7), comparative phylogenetic tree between SARS-CoV-2 isolates from two waves presented in Fig 13.

The comparative study of the first wave virus genotypes (Fig 13) implied that the SARS-CoV-2 was distributed to Uzbekistan and not happened directly from China or Wuhan variant but has been spread through other countries after, at least, the first cycle of infection from entered countries. Sequencing based phylogeographic analysis in this study also suggested that there were the multiple independent viral introductions into Uzbekistan from North America, Europe, Africa and Asia [14], which were supported by the evidence of clustered outbreak/community transmission. The second peak occurred in middle of 2021 with the spread of the Delta variant, which eventually became dominant in the world [29] and later in Uzbekistan as shown in our current study (Figs 6 and 7). We suggest that the second wave of the SARS-CoV-2 genotype introduction have been mostly occurred through Russia and Turkey due to the strong socio-economic relationships between Uzbekistan and these countries.

The WGS of isolates our efforts have revealed that an average content of nucleotides was mainly consistent with reference SARS-CoV-2 genome (NC_045512) with slight reduction of G (0.012%) and C (0.051%) nucleotides in comparison to the NC_045512 where G and C accounted for 19.61% and 18.37% respectively [8]. It was found that RNA strand with a high number of C and G bases would form more stable stem-loops than that of a high number of T and A bases. This suggested that the SARS-CoV-2 is more efficient in reproduction than other coronaviruses because less energy is consumed in disrupting the secondary structures formed by its genomic RNA [8]. Because the Delta variant has a higher transmission rate compared to the previous variants, our data suggest that slight reduction of G and C content rate is an evolutionary adaptation of Delta variants for the rapid replication and reproduction in host population.

In our study, among the 12 classes of base substitutions, the C→T transition was dominant (42%) (Table 2; Fig 4). Abundance of C→T transition (40.6%) was observed among the first-wave SARAS-CoV-2 sequencing study from Uzbekistan [14]. Interestingly, we found asymmetry in base changes. For example, the rate of the C→T transition was much higher than that of its reverse T→C substitution (42% and 8%, respectively) (Table 2; Fig 4). Likewise, the rate of G→T transversion was five times higher than that of T→G substitution (16% and 3%, respectively) (Table 2; Fig 4). In this context, our results slightly less percentage from those reported previously by Yi et al. [30], but we observed similar pattern. Linear changes in the base composition over the time of spread were also observed in Ebola and influenza viruses [31]. Thus, the C→T and G→T asymmetry in the SARS-CoV-2 mutation spectra may be a characteristic of zoonotic RNA viruses recently introduced to human tissues [30].

Each of novel substitutions in the Nsp13: A85P (G16489C) and Nsp12: Y479N (T14875A) (Table 4), found in samples belonging to Delta variant`s subclade 21A (GISAID sample IDs: EPI_ISL_3188967, EPI_ISL_3188979), has been reported for the first time in this study. Both substitutions are located in nonstructural proteins encoded by the ORF1b of a SARS-CoV-2 genome. Nsp13 is a superfamily 1 helicase which acts as motor protein that unwind a double-stranded nucleic acid into the two single-stranded nucleic acid. Nsp12 of coronaviruses encodes RNA-dependent RNA polymerase (RdRp) involved in replication of their genome and for the transcription of their genes [32]. Therefore, novel mutations found in these regions of the SARS-Cov-2 may play an important role in understanding of the COVID-19 disease epidemic.

In the beginning of 2021, the Alpha variant of SARS-CoV-2 was spread rapidly and became dominant in Uzbekistan (unpublished). The situation has changed after the rapid spread of the Delta variant worldwide. In current study, we observed that the Delta variant accounted for 82% among analyzed isolates and became dominant in Uzbekistan in the middle of 2021 (Fig 7). Recently, Nextstrain updated its clade designation, partitioning the 21A into the two subclades of 21J and 21I. Clade 21J includes the Delta variant, but it possesses additional ORF1a mutations such as A1306S, P2046L, P2287S, V2930L, T3255I as well as the ORF7b mutation T40I, and the N gene mutation G215C. Among analyzed Delta isolates in our dataset, 44% was designated to 21J subclade, whereas the rest belonged to 21A (56%) (Fig 7). There were no samples grouped into 21I subclade in our samples. According to Nextstrain database the Delta 21J variant became predominant worldwide (https://nextstrain.org/ncov/gisaid/global, accessed in December 2021).

The co-occurrence of substitutions in the N: R385K, ORF1a: S944L, H2092Y, L3644F, and ORF1b: H2285Y, identified in our study probably a sign of evolutionary divergence within the subclade 21A; therefore, these mutations require an attention and should be tracked in further studies. In the S gene, the T95I has occurred in 22% of Delta samples. This finding is in full concordance with previously reported study based on the analysis of 1276 Delta isolates [33], on the other hand, in the same study a distinct substitution G215C in the N genes was found as the Delta plus variants in 58% sequences samples, whereas in current study this substitution was identified in 46.8% of Delta isolates.

Analysis of predicted Nsp12 and Nsp13 stability due to the respective Tyr479Asn and Ala85Pro substitutions resulted in decrease of an average ΔΔG values of these proteins for minus 1.5; and minus 0.56, respectively (Fig 11). Decreased stability and low substitution frequency in these core proteins of the SARS-CoV-2, found in our study, suggested that the Nsp13:A85P and Nsp12:Y479N mutations may not have an evolutionary advantage for SARS-CoV-2. Nevertheless, it was reported that the NSP13 downregulates interferon production and signaling as well as NF-κB promoter signaling by limiting the TBK1 and IRF3 activation [34, 35], whereas the Nsp12 attenuates type I interferon production by inhibiting IRF3 nuclear translocation [36]. These implies that the Nsp13:A85P and Nsp12:Y479N substitutions may have an impact on primary interferon suppression in the host cells and can antagonize host antiviral innate immunity. Therefore, further research and monitoring of the spread of these mutations is required.

The statistically significant gender distribution bias among the subclades 21A and 21J (Fig 10), found in sequenced WGS in our study, could be a population specific gender susceptibility to the SARS-CoV-2 clades and underlies a need for further comparative studies in other populations. Previous global GISAID-derived metadata analysis also reported statistically significant gender bias among several SARS-CoV-2 clades [37]. Thus, there is a gender specific susceptibility to SARS-CoV-2 variants worldwide and further studies needed to be conducted to elucidate the possible genetic mechanisms of this phenomenon.

The association of ORF7a: P45L mutation with disease severity in Uzbekistan, reported for the first time herein (Table 10), is in concordance with previous research, that the stabilizing mutation in the ORF7a of a SARS-CoV-2 was associated with the increased severity and lethality in a group of Romanian patients, despite a lower viral copy number and a lower number of associated comorbidities [38].

The ORF7a protein is thought to be a type I transmembrane protein. The structure of the SARS-CoV protein ORF7a, shows similarities to the immunoglobulin-like (Ig-like) fold with some features resembling those of the Dl domain of ICAM-1 and suggests a binding activity to integrin I domains [39]. It is known that Ig-like domain-containing proteins play vital roles in mediating macromolecular interactions in the immune system. Since SARS-CoV-2 ORF7a, similar to SARS-CoV ORF7a, it is predicted to be a member of the Ig-like domain superfamily [40–42]. ORF7a may play a significant role in the clinical severity of COVID-19 [43]. Recent study demonstrated that SARS-CoV-2 ORF7a coincubation with CD14+ monocytes ex vivo triggered a decrease in HLA-DR/DP/DQ expression levels and upregulated significant production of proinflammatory cytokines, including IL-6, IL-1β, IL-8, and TNF-α. Thus, it demonstrates that the SARS-CoV-2 ORF7a is an immunomodulating factor for immune cell binding and triggers dramatic inflammatory responses [42]. This forms the basis of a likely mechanism through which ORF7a mediates the potentially fatal cytokine storm progression in COVID-19 patients, indicating that ORF7a may be a key viral factor for disease severity.

The ORF7a: P45L mutation of the Delta became dominant in Russia [44]. It was also found in the Delta variants isolates from India [45], and in the Omicron samples from South Africa (EPI_ISL_6647958). We suggest that the 45L mutation in the ORF7a is the next step in the evolution of the coronavirus towards a decrease in the severity of the disease, since the main guarantee of the existence and spread of the virus is not the death of the host, but the evolution towards an increase in infectivity.

The study was limited by a relatively small number of samples that were subjected to WGS, since during the pandemic there were logistical issues with the timely supply of reagents in the required quantities. Another limitation of the study was the lack of SARS-CoV-2 samples from patients with asymptomatic disease, since such people do not usually seek medical help. Despite the limitations, the data obtained are valuable for understanding the spread and evolutionary features of SARS-CoV-2 specifically in Central Asian region. Further studies aimed at monitoring of SARS-CoV-2 variants distribution and their genetic diversity in the region are needed.

Supporting information

S1 File. Genome sequences of SARS-CoV-2 samples from Uzbekistan collected during the second pandemic wave.

(FASTA)

Click here for additional data file.^{(1.1MB, fasta)}

S2 File. Annotated genome variants of SARS-CoV-2 samples from Uzbekistan collected during second the pandemic wave.

Nucleotide and amino acid positions are indicated relative to the reference genome NC_045512.2 (MN908947).

(CSV)

Click here for additional data file.^{(87KB, csv)}

Acknowledgments

We thank the doctors and nurses of the COVID-19 clinics at the State Hospital Zangiota-1 Tashkent, Uzbekistan for their help in collection of the samples and disease data from symptomatic patients.

Data Availability

The data underlying the results presented in the study are available from FlagShare Public Repository https://doi.org/10.6084/m9.figshare.19221276.v1.

Funding Statement

This study has been supported by the research grant from the Ministry of Innovative Development, Republic of Uzbekistan (Research Grant number: А-ИРВ-2021-125). There was no additional external or internal funding received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1.COVID-19 Map—Johns Hopkins Coronavirus Resource Center. [cited 16 Sep 2021]. Available: https://coronavirus.jhu.edu/map.html
2.Makarenkov V, Mazoure B, Rabusseau G, Legendre P. Horizontal gene transfer and recombination analysis of SARS-CoV-2 genes helps discover its close relatives and shed light on its origin. BMC Ecol Evol. 2021. Jan 21;21(1):5. doi: 10.1186/s12862-020-01732-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Mercatelli D, Giorgi FM. Geographic and Genomic Distribution of SARS-CoV-2 Mutations. Front Microbiol. 2020;11: 1800. doi: 10.3389/fmicb.2020.01800 [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Wang M, Fu A, Hu B, Tong Y, Liu R, Liu Z, et al. Virus Detection: Nanopore Targeted Sequencing for the Accurate and Comprehensive Detection of SARS‐CoV‐2 and Other Respiratory Viruses (Small 32/2020). Small. 2020;16. doi: 10.1002/smll.202002169 [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Lauring AS, Hodcroft EB. Genetic Variants of SARS-CoV-2—What Do They Mean? JAMA. 2021;325: 529–531. doi: 10.1001/jama.2020.27124 [DOI] [PubMed] [Google Scholar]
6.Lau BT, Pavlichin D, Hooker AC, Almeda A, Shin G, Chen J, et al. Profiling SARS-CoV-2 mutation fingerprints that range from the viral pangenome to individual infection quasispecies. Genome Med. 2021;13(1):62. doi: 10.1186/s13073-021-00882-2 ; PMCID: PMC8054698. [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Mousavizadeh L, Ghasemi S. Genotype and phenotype of COVID-19: Their roles in pathogenesis. J Microbiol Immunol Infect. 2021;54: 159–163. doi: 10.1016/j.jmii.2020.03.022 [DOI] [PMC free article] [PubMed] [Google Scholar]
8.Yin C. Genotyping coronavirus SARS-CoV-2: methods and implications. Genomics. 2020;112: 3588–3596. doi: 10.1016/j.ygeno.2020.04.016 [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Wang Y, Mao J-M, Wang G-D, Luo Z-P, Yang L, Yao Q, et al. Human SARS-CoV-2 has evolved to reduce CG dinucleotide in its open reading frames. Sci Rep. 2020;10: 12331. doi: 10.1038/s41598-020-69342-y [DOI] [PMC free article] [PubMed] [Google Scholar]
10.Harvey WT, Carabelli AM, Jackson B, Gupta RK, Thomson EC, Harrison EM, et al. SARS-CoV-2 variants, spike mutations and immune escape. Nat Rev Microbiol. 2021;19: 409–424. doi: 10.1038/s41579-021-00573-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
11.Dao TL, Hoang VT, Colson P, Lagier JC, Million M, Raoult D, et al. SARS-CoV-2 Infectivity and Severity of COVID-19 According to SARS-CoV-2 Variants: Current Evidence. J Clin Med. 2021;10: 2635. doi: 10.3390/jcm10122635 [DOI] [PMC free article] [PubMed] [Google Scholar]
12.Lu R, Zhao X, Li J, Niu P, Yang B, Wu H, et al. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. The Lancet. 2020;395: 565–574. doi: 10.1016/S0140-6736(20)30251-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
13.Rahnavard A, Dawson T, Clement R, Stearrett N, Pérez-Losada M, Crandall KA. Epidemiological associations with genomic variation in SARS-CoV-2. Sci Rep. 2021;11: 23023. doi: 10.1038/s41598-021-02548-w [DOI] [PMC free article] [PubMed] [Google Scholar]
14.Ayubov MS, Buriev ZT, Mirzakhmedov MK, Yusupov AN, Usmanov DE, Shermatov SE, et al. Profiling of the most reliable mutations from sequenced SARS-CoV-2 genomes scattered in Uzbekistan. PLoS One. 2022;17(3):e0266417. doi: 10.1371/journal.pone.0266417 [DOI] [PMC free article] [PubMed] [Google Scholar]
15.Shu Y, McCauley J. GISAID: Global initiative on sharing all influenza data–from vision to reality. Eurosurveillance. 2017;22: 30494. doi: 10.2807/1560-7917.ES.2017.22.13.30494 [DOI] [PMC free article] [PubMed] [Google Scholar]
16.Hadfield J, Megill C, Bell SM, Huddleston J, Potter B, Callender C, et al. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics. 2018;34: 4121–4123. doi: 10.1093/bioinformatics/bty407 [DOI] [PMC free article] [PubMed] [Google Scholar]
17.Cleemput S, Dumon W, Fonseca V, Abdool Karim W, Giovanetti M, Alcantara LC, et al. Genome Detective Coronavirus Typing Tool for rapid identification and characterization of novel coronavirus genomes. Bioinformatics. 2020;36: 3552–3555. doi: 10.1093/bioinformatics/btaa145 [DOI] [PMC free article] [PubMed] [Google Scholar]
18.Singer J, Gifford R, Cotten M, Robertson D. CoV-GLUE: A Web Application for Tracking SARS-CoV-2 Genomic Variation. 2020. doi: 10.20944/preprints202006.0225.v1 [DOI] [Google Scholar]
19.Katoh K, Standley DM. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol Biol Evol. 2013;30: 772–780. doi: 10.1093/molbev/mst010 [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987;4: 406–425. doi: 10.1093/oxfordjournals.molbev.a040454 [DOI] [PubMed] [Google Scholar]
21.Kumar S, Stecher G, Li M, Knyaz C, Tamura K. MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms. Mol Biol Evol. 2018;35: 1547–1549. doi: 10.1093/molbev/msy096 [DOI] [PMC free article] [PubMed] [Google Scholar]
22.Spitzer M, Wildenhain J, Rappsilber J, Tyers M. BoxPlotR: a web tool for generation of box plots. Nat Methods. 2014;11: 121–122. doi: 10.1038/nmeth.2811 [DOI] [PMC free article] [PubMed] [Google Scholar]
23.Tokuriki N, Stricher F, Serrano L, Tawfik DS. How Protein Stability and New Functions Trade Off. PLOS Comput Biol. 2008;4: e1000002. doi: 10.1371/journal.pcbi.1000002 [DOI] [PMC free article] [PubMed] [Google Scholar]
24.Sanavia T, Birolo G, Montanucci L, Turina P, Capriotti E, Fariselli P. Limitations and challenges in protein stability prediction upon genome variations: towards future applications in precision medicine. Comput Struct Biotechnol J. 2020;18: 1968–1979. doi: 10.1016/j.csbj.2020.07.011 [DOI] [PMC free article] [PubMed] [Google Scholar]
25.Caldararu O, Blundell TL, Kepp KP. A base measure of precision for protein stability predictors: structural sensitivity. BMC Bioinformatics. 2021;22: 88. doi: 10.1186/s12859-021-04030-w [DOI] [PMC free article] [PubMed] [Google Scholar]
26.Parthiban V, Gromiha MM, Schomburg D. CUPSAT: prediction of protein stability upon point mutations. Nucleic Acids Res. 2006;34: W239–W242. doi: 10.1093/nar/gkl190 [DOI] [PMC free article] [PubMed] [Google Scholar]
27.Zhang SX, Arroyo Marioli F, Gao R, Wang S. A Second Wave? What Do People Mean by COVID Waves?–A Working Definition of Epidemic Waves. Risk Manag Healthc Policy. 2021;14: 3775–3782. doi: 10.2147/RMHP.S326051 [DOI] [PMC free article] [PubMed] [Google Scholar]
28.Wu F, Zhao S, Yu B, Chen Y-M, Wang W, Song Z-G, et al. A new coronavirus associated with human respiratory disease in China. Nature. 2020;579: 265–269. doi: 10.1038/s41586-020-2008-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
29.Cherian S, Potdar V, Jadhav S, Yadav P, Gupta N, Das M, et al. SARS-CoV-2 Spike Mutations, L452R, T478K, E484Q and P681R, in the Second Wave of COVID-19 in Maharashtra, India. Microorganisms. 2021;9: 1542. doi: 10.3390/microorganisms9071542 [DOI] [PMC free article] [PubMed] [Google Scholar]
30.Yi K, Kim SY, Bleazard T, Kim T, Youk J, Ju YS. Mutational spectrum of SARS-CoV-2 during the global pandemic. Exp Mol Med. 2021;53: 1229–1237. doi: 10.1038/s12276-021-00658-z [DOI] [PMC free article] [PubMed] [Google Scholar]
31.Wada Y, Wada K, Iwasaki Y, Kanaya S, Ikemura T. Directional and reoccurring sequence change in zoonotic RNA virus genomes visualized by time-series word count. Sci Rep. 2016;6: 36197. doi: 10.1038/srep36197 [DOI] [PMC free article] [PubMed] [Google Scholar]
32.Snijder EJ, Decroly E, Ziebuhr J. Chapter Three—The Nonstructural Proteins Directing Coronavirus RNA Synthesis and Processing. In: Ziebuhr J, editor. Advances in Virus Research. Academic Press; 2016. pp. 59–126. doi: 10.1016/bs.aivir.2016.08.008 [DOI] [PMC free article] [PubMed] [Google Scholar]
33.Kannan SR, Spratt AN, Cohen AR, Naqvi SH, Chand HS, Quinn TP, et al. Evolutionary analysis of the Delta and Delta Plus variants of the SARS-CoV-2 viruses. J Autoimmun. 2021;124: 102715. doi: 10.1016/j.jaut.2021.102715 [DOI] [PMC free article] [PubMed] [Google Scholar]
34.Yuen C-K, Lam J-Y, Wong W-M, Mak L-F, Wang X, Chu H, et al. SARS-CoV-2 nsp13, nsp14, nsp15 and orf6 function as potent interferon antagonists. Emerg Microbes Infect. 2020;9: 1418–1428. doi: 10.1080/22221751.2020.1780953 [DOI] [PMC free article] [PubMed] [Google Scholar]
35.Vazquez C, Swanson SE, Negatu SG, Dittmar M, Miller J, Ramage HR, et al. SARS-CoV-2 viral proteins NSP1 and NSP13 inhibit interferon activation through distinct mechanisms. PLOS ONE. 2021;16: e0253089. doi: 10.1371/journal.pone.0253089 [DOI] [PMC free article] [PubMed] [Google Scholar]
36.Wang W, Zhou Z, Xiao X, Tian Z, Dong X, Wang C, et al. SARS-CoV-2 nsp12 attenuates type I interferon production by inhibiting IRF3 nuclear translocation. Cell Mol Immunol. 2021;18: 945–953. doi: 10.1038/s41423-020-00619-y [DOI] [PMC free article] [PubMed] [Google Scholar]
37.Hamed SM, Elkhatib WF, Khairalla AS, Noreddin AM. Global dynamics of SARS-CoV-2 clades and their relation to COVID-19 epidemiology. Sci Rep. 2021;11: 8435. doi: 10.1038/s41598-021-87713-x [DOI] [PMC free article] [PubMed] [Google Scholar]
38.Lobiuc A, Șterbuleac D, Sturdza O, Dimian M, Covasa M. A Conservative Replacement in the Transmembrane Domain of SARS-CoV-2 ORF7a as a Putative Risk Factor in COVID-19. Biology. 2021;10: 1276. doi: 10.3390/biology10121276 [DOI] [PMC free article] [PubMed] [Google Scholar]
39.Hänel K, Stangler T, Stoldt M, Willbold D. Solution structure of the X4 protein coded by the SARS related coronavirus reveals an immunoglobulin like fold and suggests a binding activity to integrin I domains. J Biomed Sci. 2006;13: 281–293. doi: 10.1007/s11373-005-9043-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
40.Nelson CA, Pekosz A, Lee CA, Diamond MS, Fremont DH. Structure and Intracellular Targeting of the SARS-Coronavirus Orf7a Accessory Protein. Structure. 2005;13: 75–85. doi: 10.1016/j.str.2004.10.010 [DOI] [PMC free article] [PubMed] [Google Scholar]
41.Tan Y, Schneider T, Leong M, Aravind L, Zhang D. Novel Immunoglobulin Domain Proteins Provide Insights into Evolution and Pathogenesis of SARS-CoV-2-Related Viruses. mBio. 2020;11(3):e00760–20. doi: 10.1128/mBio.00760-20 [DOI] [PMC free article] [PubMed] [Google Scholar]
42.Zhou Z, Huang C, Zhou Z, Huang Z, Su L, Kang S, et al. Structural insight reveals SARS-CoV-2 ORF7a as an immunomodulating factor for human CD14+ monocytes. iScience. 2021;24: 102187. doi: 10.1016/j.isci.2021.102187 [DOI] [PMC free article] [PubMed] [Google Scholar]
43.Su C-M, Wang L, Yoo D. Activation of NF-κB and induction of proinflammatory cytokine expressions mediated by ORF7a protein of SARS-CoV-2. Sci Rep. 2021;11: 13464. doi: 10.1038/s41598-021-92941-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
44.Klink GV, Safina KR, Nabieva E, Shvyrev N, Garushyants S, Alekseeva E, et al. The rise and spread of the SARS-CoV-2 AY.122 lineage in Russia. Virus Evol. 2022. Mar 5;8(1):veac017. doi: 10.1093/ve/veac017 [DOI] [PMC free article] [PubMed] [Google Scholar]
45.Das JK, Sengupta A, Choudhury PP, Roy S. Characterizing genomic variants and mutations in SARS-CoV-2 proteins from Indian isolates. Gene Rep. 2021;25: 101044. doi: 10.1016/j.genrep.2021.101044 [DOI] [PMC free article] [PubMed] [Google Scholar]

PLoS One. doi: 10.1371/journal.pone.0270314.r001

Decision Letter 0

Vladimir Makarenkov

Transfer Alert

This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present.

3 May 2022

PONE-D-22-07071Genome sequence diversity of SARS-CoV-2 obtained from clinical samples in UzbekistanPLOS ONE

Dear Dr. Abdullaev,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jun 17 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Vladimir Makarenkov

Academic Editor

PLOS ONE

Journal Requirements:

1. When submitting your revision, we need you to address these additional requirements.

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We noted in your submission details that a portion of your manuscript may have been presented or published elsewhere.

(PLOS Genetics, Manuscript ID: PGENETICS-D-22-00251)

Please clarify whether this publication was peer-reviewed and formally published. If this work was previously peer-reviewed and published, in the cover letter please provide the reason that this work does not constitute dual publication and should be included in the current manuscript."

3. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

4. Please amend your authorship list in your manuscript file to include author Abrorjon Abdurakhimov.

5. Thank you for stating in your Funding Statement:

(This study has been supported by the research grant from the Ministry of Innovative Development, Republic of Uzbekistan (Research Grant number: А-ИРВ-2021-125). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.)

Please provide an amended statement that declares *all* the funding or sources of support (whether external or internal to your organization) received during this study, as detailed online in our guide for authors at http://journals.plos.org/plosone/s/submit-now. Please also include the statement “There was no additional external funding received for this study.” in your updated Funding Statement.

Please include your amended Funding Statement within your cover letter. We will change the online submission form on your behalf.

6. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments:

This paper is relevant and well written.

I think the authors could add a short discussion (preferably to the Introduction section) about possible origins of SARS-Cov-2. The references that could be cited are as follows:

Boni, Maciej F., et al. "Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic." Nature microbiology 5.11 (2020): 1408-1417.

Makarenkov, V., Mazoure, B., Rabusseau, G. et al. Horizontal gene transfer and recombination analysis of SARS-CoV-2 genes helps discover its close relatives and shed light on its origin. BMC Ecol Evo 21, 5 (2021).

Domingo JL. What we know and what we need to know about the origin of SARS-CoV-2. Environ Res. 2021;200:111785.

Moreover, the authors can also use a new SimPlot++ tool designed to detect recombination and visualize data using sequence similarity networks:

Samson, S. et al. SimPlot ++: a Python application for representing sequence similarity and detecting recombination, Bioinformatics, 2022; btac287 (https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btac287/6572334?guestAccessKey=d079b57c-5b8e-4bf4-a1d6-06274bd89169).

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript describes genetic diversity of SARS-CoV-2 during the second wave of infection in Uzbekistan.

Despite, limited number of samples, authors found unique and consistent mutations that were related to major mutations at that time.

Manuscript shows interesting findings, e.g.,effect of amino acid substitution on the protein stability and disease severity. it would be an advantage if authors could elaborate little bit more on the relationship between amino acid substitution and disease severity.

There are some spelling an types which must be corrected before the final publication. Maybe the figure resolution is better in original file, but in the pdf I read it was impossible to read x/y axes labels, and color codes. please correct n=? in abstract.

should be improved.

It would be great if authors refer to their findings/figures, when they draw concluding statements in discussion (like in lines 435-438).

A figure providing an overview of similarity or dissimilarity of WGS of novel mutations Nsp13 and Nsp12 with other mutations reported in the databases would be useful to include.

Manuscript can be accepted after minor revision.

Reviewer #2: The manuscript from Abdullaev et al. describes the SARS-Cov-2 variability in Uzbek population from Tashkent. The samples were collected in the second wave of pandemic, when the Delta clade variants represented the majority. The whole-genome sequencing of the 35 viral samples provides information about virus variability in Uzbekistan at that time. Additional analyses of the patients affected by the virus provide an insight into relation of such variants to disease severity and sex ratio of affected individuals. The authors provide phylogenetic and variant analyses of the sequences with their additional structural characterisation and prediction of function for the novel ones.

Overall, the proposed study is well-done and presents a thoughtful investigation of the SARS-Cov-2 genetic variability and its impact on the COVID-19 infection on a sample from Uzbek population.

The manuscript would benefit from some additional English proofing to improve clarity of the messages. I have made only a few suggestions on the Pdf file with the text to help authors with minor pitfalls (see the attachment).

The abstract would need a conclusion phrase.

There should be more clarity about the comparison groups for the statistical association tests, especially those which appear to show significant difference, to make sure, what feature has been compared to what.

Since the Materials section is after the results section, some information about tools and methods used to obtain results would be helpful to the readers, such as:

“Using text X (or using method KKK, or software VVV) we compared groups NNN and MMM and detected YYY”.

It would make sense to have a final one / two phrase(s) of discussion being conclusive of the manuscript. The final paragraph of discussion finishes the manuscript abruptly.

It would be important to mention the limitation of the study closer to the end of discussion.

Minor comments:

Table 1 – there should probably be a median of the observed number of substitutions. It seems that this table could look better if transposed, so lines would become columns and vice versa.

Table 2 – the last column should better be in % and have two non-zero decimal points, like 42.12% or 42.0095% There should be the bottom line with the Total, where the % would sum up to 100% - is it like that now already? If the presented do not sum up to 100%, authors could another row showing the remaining %-ge and explaining, where it belongs to.

What is “aa position”? Amino acid? Please, spell out. What is “nt position”? Nucleotide?

Discussion

Page 29, line386 – Avoid such start of a part “So far, during the pandemic, several factors…”. Instead, authors may use “The growth of the number of individuals affected by COVID-19 has been affected…”

All numbers smaller than 10 should be spelled out.

No new figures should ne presented in the discussion. Fig 13 should first be presented in the intro.

Figure 1 Please, add the axis name, such as “Date”. Y axis, there should be the text of axis name, as proposed but with addition of “, n individuals”

Figure 2 same as F1.

Rest of figures – all axes should have a name, where there are groupings, the authpors should explain, what they mean in the legends, like “nt_substitutions” – spell out the meaning. Figures 3+ for Y axis, the name should have “events” in plural. Figures with Mutation events might benefit from a violin plot style presentation as compared to the box plot, given the relatively small N of events and genomes in the study.

The phylogeny figure 6 is of insufficient image quality. Impossible to read the detail. Could authors add the corona plot to the supplementary, if any.

Figure 9 should be also presenting the age by sex.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: PONE-D-22-07071_IPnotes.pdf

Click here for additional data file.^{(1.1MB, pdf)}

PLoS One. 2022 Jun 27;17(6):e0270314. doi: 10.1371/journal.pone.0270314.r002

Author response to Decision Letter 0

7 Jun 2022

Response to Academis Editor:

Dear editor, thank you for valuable comments.

1. We have formatted the manuscript according to PLOS ONE's style requirements, we also have changed the order of sections, materials and methods have been moved after the introduction (see revised manuscript with track changes).

The figure files have been corrected by PACE to meet PLOS ONE requirements

2. The manuscript initially was submitted to PLOS Genetics (Manuscript ID: PGENETICS-D-22-00251) but not considered for publication by editors. Although the editors at that journal were not able to consider manuscript for publication, they encouraged us to transfer the manuscript to another PLOS journal. They have made this recommendation based on their assessment of manuscript, their knowledge of the other journal, and after consultation with other journal's editors. So, we state that the manuscript was not peer-reviewed and/or formally published elsewhere and has not been submitted simultaneously for publication elsewhere.

3. We provided repository information during submission, please hold it until acceptance. We do not wish to make changes to our Data Availability statement.

4. Thank you, we amended authorship list.

5. We will amend updated Funding Statement as follows: “This study has been supported by the research grant from the Ministry of Innovative Development, Republic of Uzbekistan (Research Grant number: А-ИРВ-2021-125). There was no additional external or internal funding received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript”. We included our amended Funding Statement within second cover letter.

6. We have updated preprints to published versions (if available) and corrected the order of references cited in the manuscript. We added one new reference and removed one (see revised manuscript with track changes)

7. According to your recommendation, a short information about possible origin of SARS-CoV-2 (citing Makarenkov et al., 2021) was added in the Introduction.

8. Thank you for suggesting a new SimPlot++ tool. We installed Windows version of this software and performed several runs. It is very useful program, but we need time to figure out which parameters suits better for analysis and how to interpret the results. For sure, we will use it to present further results, as work on collecting genomic information of SARS-CoV-2 in Uzbekistan still in progress.

Responce to Reviewer #1:

Dear reviewer, we appreciate the time you spent to review our manuscript and gave valuable recommendations

1. Amino acid substitution might change protein stability, thus might increase or decrease its functional properties. In the discussion section we tried to explain possible mechanisms of amino acid substitution in ORF7a which could be a mediator of proinflammatory cytokines production and triggers dramatic inflammatory responses resulted in disease severity.

2. N= corrected, missing value added

3. Original pictures submitted to the Journal are in good quality, after compression to pdf. the resolution is drastically reduced. I think the editor can send you the original image files by request.

4. Thank you for this suggestion, since we are continuously getting more genomic data from new samples, in our next study we will provide a detailed overview of dissimilarity of WGS of novel mutations with other mutations reported in the databases.

5. We have added references to our findings/figures in еру Discussion section as you suggested

Responce to Reviewer #2:

Dear reviewer, we appreciate the time you spent to review our manuscript and gave valuable recommendations

1. We have corrected the manuscript according to your suggestions on the Pdf file

2. Conclusion phrase have been added to abstract.

3. We have made corrections in groups comparison, according to your recommendations

4. We have updated information in the Results where it was appropriate

5. The limitation has been added closer to the end of discussions as you suggested

6. The Table 1 has been transposed

7. The Table 2 has been corrected

8. The abbreviations “aa” (amino acid) and “nt” (nucleotide) have been spelled out in the text

9. The phrase “So far, during the pandemic, several factors…”. has been changed as you recommended

10. Corrected. All number smaller than has been 10 spelled out

11. We believe that Fig 13 is more suitable for discussion in its content

12. Figure 1, suggested axis names have been added

13. Figure 2, suggested axis names have been added

14. Rest figures have been corrected according to your recommendations

15. Figures with Mutation events have been changed to Violin plot style instead of Box plot

16. The image has a highest possible resolution generated by the Nexclade. The quality is better in original tif., but reduced in pdf.

17. We do not have the corona plot

Attachment

Submitted filename: Response to reviewers.docx

Click here for additional data file.^{(19KB, docx)}

PLoS One. doi: 10.1371/journal.pone.0270314.r003

Decision Letter 1

Vladimir Makarenkov

8 Jun 2022

Genome sequence diversity of SARS-CoV-2 obtained from clinical samples in Uzbekistan

PONE-D-22-07071R1

Dear Dr. Abdullaev,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Vladimir Makarenkov

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

PLoS One. doi: 10.1371/journal.pone.0270314.r004

Acceptance letter

Vladimir Makarenkov

14 Jun 2022

PONE-D-22-07071R1

Genome sequence diversity of SARS-CoV-2 obtained from clinical samples in Uzbekistan

Dear Dr. Abdullaev:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Vladimir Makarenkov

Academic Editor

PLOS ONE

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 File. Genome sequences of SARS-CoV-2 samples from Uzbekistan collected during the second pandemic wave.

(FASTA)

Click here for additional data file.^{(1.1MB, fasta)}

S2 File. Annotated genome variants of SARS-CoV-2 samples from Uzbekistan collected during second the pandemic wave.

Nucleotide and amino acid positions are indicated relative to the reference genome NC_045512.2 (MN908947).

(CSV)

Click here for additional data file.^{(87KB, csv)}

Attachment

Submitted filename: PONE-D-22-07071_IPnotes.pdf

Click here for additional data file.^{(1.1MB, pdf)}

Attachment

Submitted filename: Response to reviewers.docx

Click here for additional data file.^{(19KB, docx)}

Data Availability Statement

The data underlying the results presented in the study are available from FlagShare Public Repository https://doi.org/10.6084/m9.figshare.19221276.v1.

[pone.0270314.ref001] 1.COVID-19 Map—Johns Hopkins Coronavirus Resource Center. [cited 16 Sep 2021]. Available: https://coronavirus.jhu.edu/map.html

[pone.0270314.ref002] 2.Makarenkov V, Mazoure B, Rabusseau G, Legendre P. Horizontal gene transfer and recombination analysis of SARS-CoV-2 genes helps discover its close relatives and shed light on its origin. BMC Ecol Evol. 2021. Jan 21;21(1):5. doi: 10.1186/s12862-020-01732-2 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref003] 3.Mercatelli D, Giorgi FM. Geographic and Genomic Distribution of SARS-CoV-2 Mutations. Front Microbiol. 2020;11: 1800. doi: 10.3389/fmicb.2020.01800 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref004] 4.Wang M, Fu A, Hu B, Tong Y, Liu R, Liu Z, et al. Virus Detection: Nanopore Targeted Sequencing for the Accurate and Comprehensive Detection of SARS‐CoV‐2 and Other Respiratory Viruses (Small 32/2020). Small. 2020;16. doi: 10.1002/smll.202002169 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref005] 5.Lauring AS, Hodcroft EB. Genetic Variants of SARS-CoV-2—What Do They Mean? JAMA. 2021;325: 529–531. doi: 10.1001/jama.2020.27124 [DOI] [PubMed] [Google Scholar]

[pone.0270314.ref006] 6.Lau BT, Pavlichin D, Hooker AC, Almeda A, Shin G, Chen J, et al. Profiling SARS-CoV-2 mutation fingerprints that range from the viral pangenome to individual infection quasispecies. Genome Med. 2021;13(1):62. doi: 10.1186/s13073-021-00882-2 ; PMCID: PMC8054698. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref007] 7.Mousavizadeh L, Ghasemi S. Genotype and phenotype of COVID-19: Their roles in pathogenesis. J Microbiol Immunol Infect. 2021;54: 159–163. doi: 10.1016/j.jmii.2020.03.022 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref008] 8.Yin C. Genotyping coronavirus SARS-CoV-2: methods and implications. Genomics. 2020;112: 3588–3596. doi: 10.1016/j.ygeno.2020.04.016 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref009] 9.Wang Y, Mao J-M, Wang G-D, Luo Z-P, Yang L, Yao Q, et al. Human SARS-CoV-2 has evolved to reduce CG dinucleotide in its open reading frames. Sci Rep. 2020;10: 12331. doi: 10.1038/s41598-020-69342-y [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref010] 10.Harvey WT, Carabelli AM, Jackson B, Gupta RK, Thomson EC, Harrison EM, et al. SARS-CoV-2 variants, spike mutations and immune escape. Nat Rev Microbiol. 2021;19: 409–424. doi: 10.1038/s41579-021-00573-0 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref011] 11.Dao TL, Hoang VT, Colson P, Lagier JC, Million M, Raoult D, et al. SARS-CoV-2 Infectivity and Severity of COVID-19 According to SARS-CoV-2 Variants: Current Evidence. J Clin Med. 2021;10: 2635. doi: 10.3390/jcm10122635 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref012] 12.Lu R, Zhao X, Li J, Niu P, Yang B, Wu H, et al. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. The Lancet. 2020;395: 565–574. doi: 10.1016/S0140-6736(20)30251-8 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref013] 13.Rahnavard A, Dawson T, Clement R, Stearrett N, Pérez-Losada M, Crandall KA. Epidemiological associations with genomic variation in SARS-CoV-2. Sci Rep. 2021;11: 23023. doi: 10.1038/s41598-021-02548-w [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref014] 14.Ayubov MS, Buriev ZT, Mirzakhmedov MK, Yusupov AN, Usmanov DE, Shermatov SE, et al. Profiling of the most reliable mutations from sequenced SARS-CoV-2 genomes scattered in Uzbekistan. PLoS One. 2022;17(3):e0266417. doi: 10.1371/journal.pone.0266417 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref015] 15.Shu Y, McCauley J. GISAID: Global initiative on sharing all influenza data–from vision to reality. Eurosurveillance. 2017;22: 30494. doi: 10.2807/1560-7917.ES.2017.22.13.30494 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref016] 16.Hadfield J, Megill C, Bell SM, Huddleston J, Potter B, Callender C, et al. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics. 2018;34: 4121–4123. doi: 10.1093/bioinformatics/bty407 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref017] 17.Cleemput S, Dumon W, Fonseca V, Abdool Karim W, Giovanetti M, Alcantara LC, et al. Genome Detective Coronavirus Typing Tool for rapid identification and characterization of novel coronavirus genomes. Bioinformatics. 2020;36: 3552–3555. doi: 10.1093/bioinformatics/btaa145 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref018] 18.Singer J, Gifford R, Cotten M, Robertson D. CoV-GLUE: A Web Application for Tracking SARS-CoV-2 Genomic Variation. 2020. doi: 10.20944/preprints202006.0225.v1 [DOI] [Google Scholar]

[pone.0270314.ref019] 19.Katoh K, Standley DM. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol Biol Evol. 2013;30: 772–780. doi: 10.1093/molbev/mst010 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref020] 20.Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987;4: 406–425. doi: 10.1093/oxfordjournals.molbev.a040454 [DOI] [PubMed] [Google Scholar]

[pone.0270314.ref021] 21.Kumar S, Stecher G, Li M, Knyaz C, Tamura K. MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms. Mol Biol Evol. 2018;35: 1547–1549. doi: 10.1093/molbev/msy096 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref022] 22.Spitzer M, Wildenhain J, Rappsilber J, Tyers M. BoxPlotR: a web tool for generation of box plots. Nat Methods. 2014;11: 121–122. doi: 10.1038/nmeth.2811 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref023] 23.Tokuriki N, Stricher F, Serrano L, Tawfik DS. How Protein Stability and New Functions Trade Off. PLOS Comput Biol. 2008;4: e1000002. doi: 10.1371/journal.pcbi.1000002 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref024] 24.Sanavia T, Birolo G, Montanucci L, Turina P, Capriotti E, Fariselli P. Limitations and challenges in protein stability prediction upon genome variations: towards future applications in precision medicine. Comput Struct Biotechnol J. 2020;18: 1968–1979. doi: 10.1016/j.csbj.2020.07.011 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref025] 25.Caldararu O, Blundell TL, Kepp KP. A base measure of precision for protein stability predictors: structural sensitivity. BMC Bioinformatics. 2021;22: 88. doi: 10.1186/s12859-021-04030-w [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref026] 26.Parthiban V, Gromiha MM, Schomburg D. CUPSAT: prediction of protein stability upon point mutations. Nucleic Acids Res. 2006;34: W239–W242. doi: 10.1093/nar/gkl190 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref027] 27.Zhang SX, Arroyo Marioli F, Gao R, Wang S. A Second Wave? What Do People Mean by COVID Waves?–A Working Definition of Epidemic Waves. Risk Manag Healthc Policy. 2021;14: 3775–3782. doi: 10.2147/RMHP.S326051 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref028] 28.Wu F, Zhao S, Yu B, Chen Y-M, Wang W, Song Z-G, et al. A new coronavirus associated with human respiratory disease in China. Nature. 2020;579: 265–269. doi: 10.1038/s41586-020-2008-3 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref029] 29.Cherian S, Potdar V, Jadhav S, Yadav P, Gupta N, Das M, et al. SARS-CoV-2 Spike Mutations, L452R, T478K, E484Q and P681R, in the Second Wave of COVID-19 in Maharashtra, India. Microorganisms. 2021;9: 1542. doi: 10.3390/microorganisms9071542 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref030] 30.Yi K, Kim SY, Bleazard T, Kim T, Youk J, Ju YS. Mutational spectrum of SARS-CoV-2 during the global pandemic. Exp Mol Med. 2021;53: 1229–1237. doi: 10.1038/s12276-021-00658-z [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref031] 31.Wada Y, Wada K, Iwasaki Y, Kanaya S, Ikemura T. Directional and reoccurring sequence change in zoonotic RNA virus genomes visualized by time-series word count. Sci Rep. 2016;6: 36197. doi: 10.1038/srep36197 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref032] 32.Snijder EJ, Decroly E, Ziebuhr J. Chapter Three—The Nonstructural Proteins Directing Coronavirus RNA Synthesis and Processing. In: Ziebuhr J, editor. Advances in Virus Research. Academic Press; 2016. pp. 59–126. doi: 10.1016/bs.aivir.2016.08.008 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref033] 33.Kannan SR, Spratt AN, Cohen AR, Naqvi SH, Chand HS, Quinn TP, et al. Evolutionary analysis of the Delta and Delta Plus variants of the SARS-CoV-2 viruses. J Autoimmun. 2021;124: 102715. doi: 10.1016/j.jaut.2021.102715 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref034] 34.Yuen C-K, Lam J-Y, Wong W-M, Mak L-F, Wang X, Chu H, et al. SARS-CoV-2 nsp13, nsp14, nsp15 and orf6 function as potent interferon antagonists. Emerg Microbes Infect. 2020;9: 1418–1428. doi: 10.1080/22221751.2020.1780953 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref035] 35.Vazquez C, Swanson SE, Negatu SG, Dittmar M, Miller J, Ramage HR, et al. SARS-CoV-2 viral proteins NSP1 and NSP13 inhibit interferon activation through distinct mechanisms. PLOS ONE. 2021;16: e0253089. doi: 10.1371/journal.pone.0253089 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref036] 36.Wang W, Zhou Z, Xiao X, Tian Z, Dong X, Wang C, et al. SARS-CoV-2 nsp12 attenuates type I interferon production by inhibiting IRF3 nuclear translocation. Cell Mol Immunol. 2021;18: 945–953. doi: 10.1038/s41423-020-00619-y [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref037] 37.Hamed SM, Elkhatib WF, Khairalla AS, Noreddin AM. Global dynamics of SARS-CoV-2 clades and their relation to COVID-19 epidemiology. Sci Rep. 2021;11: 8435. doi: 10.1038/s41598-021-87713-x [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref038] 38.Lobiuc A, Șterbuleac D, Sturdza O, Dimian M, Covasa M. A Conservative Replacement in the Transmembrane Domain of SARS-CoV-2 ORF7a as a Putative Risk Factor in COVID-19. Biology. 2021;10: 1276. doi: 10.3390/biology10121276 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref039] 39.Hänel K, Stangler T, Stoldt M, Willbold D. Solution structure of the X4 protein coded by the SARS related coronavirus reveals an immunoglobulin like fold and suggests a binding activity to integrin I domains. J Biomed Sci. 2006;13: 281–293. doi: 10.1007/s11373-005-9043-9 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref040] 40.Nelson CA, Pekosz A, Lee CA, Diamond MS, Fremont DH. Structure and Intracellular Targeting of the SARS-Coronavirus Orf7a Accessory Protein. Structure. 2005;13: 75–85. doi: 10.1016/j.str.2004.10.010 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref041] 41.Tan Y, Schneider T, Leong M, Aravind L, Zhang D. Novel Immunoglobulin Domain Proteins Provide Insights into Evolution and Pathogenesis of SARS-CoV-2-Related Viruses. mBio. 2020;11(3):e00760–20. doi: 10.1128/mBio.00760-20 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref042] 42.Zhou Z, Huang C, Zhou Z, Huang Z, Su L, Kang S, et al. Structural insight reveals SARS-CoV-2 ORF7a as an immunomodulating factor for human CD14+ monocytes. iScience. 2021;24: 102187. doi: 10.1016/j.isci.2021.102187 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref043] 43.Su C-M, Wang L, Yoo D. Activation of NF-κB and induction of proinflammatory cytokine expressions mediated by ORF7a protein of SARS-CoV-2. Sci Rep. 2021;11: 13464. doi: 10.1038/s41598-021-92941-2 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref044] 44.Klink GV, Safina KR, Nabieva E, Shvyrev N, Garushyants S, Alekseeva E, et al. The rise and spread of the SARS-CoV-2 AY.122 lineage in Russia. Virus Evol. 2022. Mar 5;8(1):veac017. doi: 10.1093/ve/veac017 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0270314.ref045] 45.Das JK, Sengupta A, Choudhury PP, Roy S. Characterizing genomic variants and mutations in SARS-CoV-2 proteins from Indian isolates. Gene Rep. 2021;25: 101044. doi: 10.1016/j.genrep.2021.101044 [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

Genome sequence diversity of SARS-CoV-2 obtained from clinical samples in Uzbekistan

Alisher Abdullaev

Abrorjon Abdurakhimov

Zebinisa Mirakbarova

Shakhnoza Ibragimova

Vladimir Tsoy

Sharofiddin Nuriddinov

Dilbar Dalimova

Shahlo Turdikulova

Ibrokhim Abdurakhmonov

Roles

Abstract

Introduction

Fig 1. COVID-19 cases in Uzbekistan since March 2020 to September 2021 (according to Worldometer, https://www.worldometers.info).

Fig 2. Deaths caused by COVID-19 in Uzbekistan since March 2020 to September 2021 (according to Worldometer, https://www.worldometers.info).

Materials and methods

Samples collection

Table 1. The average number of nucleotide and amino acid substitutions, and deletions per the SARS-CoV-2 genome among 39 samples identified in Uzbekistan (with respect to the reference genome NC_045512.2).

Data collection

SARS-CoV-2 nucleic acid isolation

Next-generation sequencing

Genomic analyses, variants assessments and phylogeny

Data availability

Statistical analysis

Protein stability prediction

Results

Genome-wide characterization of SARS-CoV-2 isolates from Uzbekistan

Fig 3. Violin plot of nucleotide and amino acid substitutions, and deletions per the SARS-CoV-2 genome among 39 samples identified in Uzbekistan (with respect to the reference genome NC_045512.2).

Fig 4. Box plot of distribution of single nucleotide substitutions per genome among the SARS-CoV-2 isolates (n = 39) in Uzbekistan.

Fig 5. Box plot of distribution of transitions and transversions per genome among the SARS-CoV-2 isolates (n = 39) in Uzbekistan.

Table 2. Single nucleotide substitutions frequency in the genome of SARS-CoV-2 isolates (n = 39) from Uzbekistan.

Table 3. Genome distribution of the total number of identified nucleotide and amino acid substitutions among the SARS-CoV-2 (n = 39) samples from Uzbekistan.

Table 4. The frequency of missense mutation in the SARS-CoV-2 genotypes identified in Uzbekistan (nucleotide and amino acid positions are indicated relative to the reference genome NC_045512.2).

Table 5. The frequency of deletions identified among the SARS-CoV-2 isolates (n = 39) in Uzbekistan.

Phylogeny of SARS-CoV-2 isolates from Uzbekistan

Fig 6. The Nextclade based phylogenetic tree of the SARS-CoV-2 variants isolated in Uzbekistan.

Fig 7. Neighbor-joining phylogenetic tree of the SARS-CoV-2 isolates from Uzbekistan.

Phylogeny and genomic characterization of the Delta isolates from Uzbekistan

Table 6. Frequency of the identified Spike protein mutations (missense/synonymous) and deletions in the Delta isolates from Uzbekistan.

Table 7. Mutations in the Delta isolates (21A and 21J) with the frequency of occurrence higher than 20%.

Table 8. Sample mean values of nucleotide and amino acid mutation type between the subclades 21A and 21J.

Fig 8. Distribution of mutation types in the Delta variant subclades 21A (n = 18) and 21J (n = 14).

Analysis of disease severity among patients infected with the Delta variant

Table 9. Age and gender distribution of studied patients.

Fig 9. Age and gender distribution among COVID-19 patients.

Fig 10. Distribution of SARS-CoV-2 Delta subclades between male and female patients in Uzbekistan.

Table 10. Logistic regression analysis of viral ORF7a:P45L mutation and disease severity.

Effect of amino acid substitution in the ORF7a: P45L as well as novel variants of the Nsp13:A85P, and Nsp12: Y479N on protein stability

Fig 11. Analysis of different prediction models of the NSP12: Y479N, NSP13: A85P and ORF7a: P45L amino-acid substitution on protein stability.

Fig 12. 3D structure of the ORF7a: P45L substitution effect on protein stability.

Discussion

Fig 13. Phylogenetic tree of the SARS-CoV-2 isolates distributed in Uzbekistan during the 1-st (2020) and 2-nd (2021) coronavirus pandemic waves.

Supporting information

Acknowledgments

Data Availability

Funding Statement

References

Decision Letter 0

Vladimir Makarenkov

Roles

Transfer Alert

Author response to Decision Letter 0

Decision Letter 1

Vladimir Makarenkov

Roles

Acceptance letter

Vladimir Makarenkov

Roles

Associated Data

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases