Genomic Characterization of SARS-CoV2 from Peshawar Pakistan Using Next-Generation Sequencing

Ome Kalsoom Afridi; Nousheen Bibi; Syed Adnan Haider; Bibi Sabiha; Hanifullah Jan; Abid Ali Khan; Shireen Akhter; Valeed Khan; Johar Ali

2022 Jan 4;79(2):48. doi: 10.1007/s00284-021-02743-y

PMCID: PMC8750362 PMID: 34982246

Abstract

This study aimed to characterize the whole genome of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV2) isolated from an oropharyngeal swab specimen of a Pashtun Pakistani patient using next-generation sequencing. Comparison of the sequenced genome with the reference genome identified a total of 10 genetic variants. Among these, 1 missense mutation (c.1139A > G, p.Lys292Glu) in open reading frame 1ab (ORF1ab), located at position 112 of non-structural protein 2 (NSP2), was found to be unique. Phylogenetic analysis (n = 84) revealed that the current SARS-CoV2 genome clustered closely with 8 Pakistani strains from Punjab, Federal Capital, Azad Jammu and Kashmir (AJK), and Khyber Pakhtunkhwa (KP). In addition, the current SARS-CoV2 genome was very similar to SARS-CoV2 genomes reported from Guam, Taiwan, India, the USA, and France. Overall, this study reports a minor deviation from the reference genome, namely a single unique missense mutation, while the sequenced genome otherwise clusters closely with 8 other Pakistani strains.

Supplementary Information

The online version contains supplementary material available at 10.1007/s00284-021-02743-y.

Introduction

Coronaviruses infect a variety of hosts, including bats, snakes, birds, mice, other wild animals, and humans [1–3]. Severe Acute Respiratory Syndrome coronavirus (SARS-CoV) emerged in late 2002 in Guangdong, China, infecting 8098 people worldwide and killing 774. A second highly pathogenic coronavirus, Middle East Respiratory Syndrome coronavirus (MERS-CoV), emerged in Saudi Arabia in 2012; it caused 2494 infections and 858 deaths [3]. At the end of 2019, another coronavirus causing pneumonia-like symptoms appeared in Wuhan, China. This virus is called SARS-CoV2, and the disease it causes is referred to as COVID19. To date, 199 million people worldwide have been infected with SARS-CoV2 and 4.24 million have died. SARS-CoV2 is the seventh member of the family Coronaviridae known to infect humans; members of this family cause serious infections in various hosts, such as birds and mammals, including humans [4, 5]. The six previously known human coronaviruses cause infections with differing symptoms, ranging from mild cold-like illness, often caused by alpha (229E, NL63) and beta coronaviruses (OC43, HKU1), to severe respiratory illness caused by SARS-CoV and MERS-CoV [6]. Coronaviruses are enveloped viruses with a large positive-sense single-stranded RNA genome (26–32 kilobases in length). Under the electron microscope, the spherical virions show a crown-like appearance, from which the name coronavirus is derived.

Whole-genome sequencing (WGS) has played an important role in understanding outbreaks of various emerging viruses, such as Zika, Ebola, Usutu, and yellow fever [6]. Owing to the powerful role of WGS during these outbreaks, large-scale next-generation sequencing (NGS)-based efforts were initiated globally to gain genomic insights into SARS-CoV2 following the first outbreak of COVID19. On January 5, 2020, the first whole-genome sequence of SARS-CoV2 was published. Subsequently, several countries submitted complete SARS-CoV2 genome sequences to the Global Initiative on Sharing All Influenza Data (GISAID) database [7, 8]. Pakistan submitted its first SARS-CoV2 whole-genome sequence (SARS-CoV2/Gilgit1/human/2020/PAK; accession number: MT240479) to GenBank on March 25, 2020. Further WGS-based efforts followed, and the resulting sequencing data were also submitted to GenBank (SARS-CoV2/Manga1/human/2020/PAK; accession number: MT262993) [9]. The first two sequenced Pakistani strains (accession numbers: MT240479 and MT262993) exhibited 99.98% and 100% sequence similarity to the Chinese SARS-CoV2 isolate [9]. During the COVID19 pandemic, the mortality pattern of SARS-CoV2 in Pakistan was found to be considerably different from that of the rest of the world. For instance, neighboring India has so far recorded the second-highest number of COVID-19-positive cases in the world, after the USA [10]. The difference in COVID-19-positive cases between Pakistan and its neighboring countries warrants extensive WGS-based investigation to identify the key variants associated with Pakistani SARS-CoV2 strains and to determine whether such mutations exist in other countries with similar prevalence patterns.
Therefore, we sequenced and characterized the complete genome of SARS-CoV2 isolated from a patient of Pashtun ethnicity in the Khyber Pakhtunkhwa (KP) region of Pakistan. In addition to WGS, phylogenetic analysis was performed to compare the genome of the current SARS-CoV2 strain with publicly available genomes.

Materials and Methods

Extraction of RNA and Quality Control

An oropharyngeal swab specimen was collected from a symptomatic patient registered at Rehman Medical Institute (RMI) in Peshawar in June 2020 by following World Health Organization (WHO) guidelines [11]. Viral RNA was extracted from the swab specimen using the QIAsymphony SP/AS instruments as per the manufacturer’s instructions (QIAGEN). SARS-CoV2 was quantified using Rotor-Gene Q real-time PCR and a Novel Coronavirus Nucleic Acid Diagnostic Kit (Sansure Biotech, Inc. China). Quality control (QC) of the extracted RNA was performed by quantification using the Qubit™ RNA Broad Range (BR) assay kit (Cat. No. Q10211).

Primer Designing and PCR Amplification

Extracted RNA was subjected to cDNA synthesis using the RevertAid First Strand cDNA Synthesis Kit (Cat. No. K1622) following the manufacturer’s guidelines (Thermo Fisher Scientific). Using Primer3 software (v 0.4.0) [12], seven sets of primers were designed to cover the genome of SARS-CoV2 (GenBank accession number: NC_045512). In addition, two sets of overlapping primers were designed for PCR troubleshooting (Supplementary Table S1). PCR (Bio-Rad) was then performed according to the recommended guidelines using the Phusion Flash High-Fidelity PCR Master Mix (Cat. No. F548S, Thermo Fisher Scientific). The PCR conditions were optimized as follows: the initial denaturation (95 °C for 10 s), the denaturation step of each of the 35 cycles (95 °C for 3 s), and the final extension (72 °C for 4 min) were the same for all primer sets, whereas the annealing temperature and extension time differed between primer sets (Supplementary Table S1). After amplification, the PCR products were run on an agarose gel and then quantified using the Qubit dsDNA High Sensitivity (HS) Assay Kit (Cat. No. Q32851; Invitrogen).

NGS Library Preparation and Whole-Genome Sequencing

The quantified PCR product was normalized to 0.2 ng/μL. The paired-end sequencing library was prepared using the Illumina Nextera XT DNA Library Preparation Kit (Cat. No. FC131-1096), which involves the following main steps: (i) tagmentation (enzymatic fragmentation and tagging of the dsDNA), (ii) PCR amplification, (iii) PCR clean-up, (iv) bead-based normalization of the NGS library, and (v) library pooling. The pooled library was then loaded onto the MiSeq (Illumina) for paired-end sequencing using the MiSeq Reagent Micro Kit v2, 300-cycles sequencing reagent cartridge (Cat. No. MS-103-1002).
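The normalization of the PCR product to 0.2 ng/μL follows the usual dilution relation C1·V1 = C2·V2. The sketch below is only an illustration of that arithmetic: the function name, the 10 μL final volume, and the 4.0 ng/μL example stock concentration are assumptions chosen for the example and are not values reported in this study.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul=0.2, final_volume_ul=10.0):
    """Return (sample volume, diluent volume) in microlitres.

    Uses C1 * V1 = C2 * V2, where C1 is the measured stock concentration
    and C2 is the target concentration (0.2 ng/uL in the text above).
    The default final volume of 10 uL is a hypothetical choice.
    """
    if stock_ng_per_ul <= 0 or target_ng_per_ul <= 0 or final_volume_ul <= 0:
        raise ValueError("concentrations and volume must be positive")
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already more dilute than the target")
    sample_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    diluent_ul = final_volume_ul - sample_ul
    return sample_ul, diluent_ul


if __name__ == "__main__":
    # Hypothetical amplicon measured at 4.0 ng/uL by Qubit.
    sample, diluent = dilution_volumes(4.0)
    print(f"sample: {sample:.2f} uL, diluent: {diluent:.2f} uL")
```

In practice the minimum pipettable volume limits how concentrated a stock can be diluted in a single step; very concentrated products may need a serial dilution.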

NGS Bioinformatics Analysis

Paired-end NGS data were analyzed using publicly available bioinformatics software. First, the read quality of the FASTQ files was checked using FastQC (v0.11.8) [13]. Trimmomatic (v0.39) was used to remove low-quality base calls (Q < 30) and index adapter sequences from both ends of the sequenced reads [14]. The filtered reads were aligned to the Wuhan reference genome (GenBank accession number: NC_045512) using the Burrows–Wheeler Aligner (BWA, v0.7.17) with default settings [15]. The same reference sequence (accession number: NC_045512) served as the control for the viral genome alignment [16]. Genome annotation and variant calling were performed using the resources of China’s National Genomics Data Center (NGDC) [17]. The identified variants were cross-validated and analyzed using the Genome Detective Coronavirus Typing Tool [18] and BioEdit [19].
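To make the quality-trimming step concrete, the following sketch removes bases with Phred quality below 30 from both ends of a single read. It is a simplified illustration written for this text, not Trimmomatic's implementation; the function names are hypothetical, and it assumes the Phred+33 quality encoding used in standard Illumina FASTQ files.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_low_quality_ends(sequence, quality_string, min_quality=30, offset=33):
    """Strip bases with quality < min_quality from the 5' and 3' ends of a read.

    Internal low-quality bases are left untouched; only the ends are trimmed.
    Returns the trimmed (sequence, quality_string) pair.
    """
    if len(sequence) != len(quality_string):
        raise ValueError("sequence and quality string must have equal length")
    scores = phred_scores(quality_string, offset)
    start, end = 0, len(scores)
    while start < end and scores[start] < min_quality:
        start += 1
    while end > start and scores[end - 1] < min_quality:
        end -= 1
    return sequence[start:end], quality_string[start:end]


if __name__ == "__main__":
    # '#' encodes Q2, '5' encodes Q20, 'I' encodes Q40 under Phred+33.
    print(trim_low_quality_ends("ACGTACGT", "#5IIII5#"))
```

A read whose bases all fall below the threshold is reduced to an empty string; a real pipeline would typically discard such reads, along with any that fall below a minimum length after trimming.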

Phylogenetic Analysis

FASTQ files of various publicly available SARS-CoV2 genomes were retrieved from the GISAID and NCBI databases. A total of 85 SARS-CoV2 genomes, including the genome sequenced in this study (accession number: MW242667), were clustered using Augur, the Nextstrain phylodynamic pipeline (Supplementary Material) [20]. The sequences were aligned against the Wuhan reference genome (accession number: NC_045512.2) using MAFFT [21]. A phylogenetic tree was constructed using IQ-TREE [22] and visualized using FigTree v1.4.4 [23].
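As a rough illustration of what sequence divergence means once genomes have been aligned, the sketch below computes the proportion of differing sites (the p-distance) between two aligned sequences. This is not how IQ-TREE builds a tree, since IQ-TREE estimates maximum-likelihood phylogenies under a substitution model; the function name and the toy sequences are hypothetical and serve only to show the underlying idea that closely clustered genomes differ at few aligned positions.

```python
def p_distance(aligned_a, aligned_b):
    """Proportion of differing sites between two aligned nucleotide sequences.

    Columns containing a gap or an ambiguous base (anything other than
    A, C, G, T) in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differences = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


if __name__ == "__main__":
    # Toy alignment: 8 comparable columns, 1 mismatch.
    print(p_distance("ACGT-ACGTN", "ACGTTACCTA"))
```

Percent identity, as quoted for pairwise genome comparisons, is simply (1 − p-distance) × 100 over the compared sites.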

Results and Discussion

The whole genome of SARS-CoV2 isolated from a patient from the KP region of Pakistan was sequenced using NGS. A total of 166,007 reads (22,292,523 bases) were aligned against the reference genome (accession number: NC_045512). Statistics of the sequenced reads are listed in Supplementary Table S2. The sequenced SARS-CoV2 genome revealed 10 genomic variants located in various genes: open reading frame 1ab (ORF1ab), spike glycoprotein (S), ORF8, nucleocapsid (N), and ORF10. The 10 identified variants consisted of 8 missense and 2 synonymous mutations. Five of the 10 variants were identified in the ORF1ab region: 1139A > G (codon AAG to GAG, missense), 2144G > T (codon GTC to TTC, missense), 11083G > T (codon TTG to TTT, missense), 13730C > T (codon GCT to GTT, missense), and 6312C > A (codon ACA to AAA, missense). A single mutation was identified in the S gene, 23929C > T (codon TAC to TAT, synonymous). Similarly, a single variant was detected in the ORF8 gene at genomic position 28253 (C > T, codon TTC to TTT, synonymous). The remaining three variants were located in the N gene (28311C > T and 28887C > T) and ORF10 (29645G > T) (Table 1). Of the 10 mutations, 1139A > G, p.Lys292Glu (position 112 of the NSP2 protein) was identified as a unique genetic variant. The NSP2 protein of SARS-CoV2 contains 61 amino acids that differ from those of SARS-CoV [24]. In coronavirus infection, double-membrane vesicles filled with replication–transcription complexes (RTCs) are formed in infected cells, and the NSP2 protein is essential for RTC formation [25]. The other variants identified in this study have also been detected in SARS-CoV2 genomes isolated in different countries around the world (Table 1).
The relatively high number of variants identified at different positions in the SARS-CoV2 replicase polyprotein is supported by a previous study, which showed that the replicase polyprotein of 13 SARS-CoV2 isolates from different countries harbored mutations at different amino acid positions [26]. The missense mutations in the replicase polyprotein detected in this study are consistent with previous reports. For instance, a missense mutation in the replicase polyprotein at position 3606 (L to F) was reported previously [27]. Mutations in the replicase polyprotein (ORF1ab) have been associated with different mechanisms of SARS-CoV2 pathogenesis [28]. The low mortality rate in Pakistan may be attributable to its diverse climatic conditions. A recent study compared SARS-CoV2 death rates across 45 countries according to country-specific climatic conditions and found that temperate countries such as Italy, Spain, the Netherlands, France, and England exhibited higher mortality rates than countries with a diverse climate, such as Brazil, Australia, and Pakistan [29].
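The relationship between the genomic positions of the ORF1ab variants, their codon numbers, and the missense or synonymous labels can be checked with a short calculation. The sketch below is an illustration added for this text and not the authors' annotation pipeline; the function names are hypothetical. It assumes the NC_045512.2 annotation in which ORF1ab spans positions 266 to 21555 with a ribosomal frameshift at position 13468 (that base is read twice), and in which NSP1 occupies the first 180 residues of the polyprotein so that NSP2 begins at residue 181.

```python
# Standard genetic code, built from the compact TCAG-ordered string.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Assumed NC_045512.2 annotation values (see lead-in).
ORF1AB_START = 266
ORF1AB_END = 21555
FRAMESHIFT_SITE = 13468
NSP1_LENGTH_AA = 180


def orf1ab_codon(genomic_pos):
    """Map a 1-based genomic position to (codon number, position within codon).

    Positions after the frameshift site are shifted by one because base
    13468 is read twice; the site itself is treated here as the last base
    of the upstream codon.
    """
    if not ORF1AB_START <= genomic_pos <= ORF1AB_END:
        raise ValueError("position lies outside ORF1ab")
    offset = genomic_pos - ORF1AB_START
    if genomic_pos > FRAMESHIFT_SITE:
        offset += 1
    return offset // 3 + 1, offset % 3 + 1


def classify_codon_change(ref_codon, alt_codon):
    """Return (reference amino acid, variant amino acid, mutation type)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return ref_aa, alt_aa, kind


if __name__ == "__main__":
    codon, _ = orf1ab_codon(1139)
    print(codon, codon - NSP1_LENGTH_AA, classify_codon_change("AAG", "GAG"))
```

Under these assumptions, position 1139 falls in codon 292 of ORF1ab, which corresponds to residue 112 of NSP2, and position 11083 falls in codon 3606, matching the positions quoted in the text. Note that 1139 is therefore a genomic coordinate rather than a coordinate within the coding sequence.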

Table 1.

List of mutations detected in the genome of SARS-CoV2 from other geographic regions besides Pakistan

Gene	Mutation (genomic position)	Region	Sample ID (GISAID accession)
ORF1ab	2144 (G > T)	England	EPI_ISL_425449
		Australia	EPI_ISL_427753
	11083 (G > T)	Australia	EPI_ISL_419793
	13730 (C > T)	Saudi Arabia	EPI_ISL_416432
		Australia	EPI_ISL_419761
		USA	EPI_ISL_434297
		England	EPI_ISL_433944
		India	EPI_ISL_437438
		Brunei	EPI_ISL_435674
	6312 (C > A)	Saudi Arabia	EPI_ISL_416432
		Australia	EPI_ISL_419761
		USA	EPI_ISL_434297
		India	EPI_ISL_437438
		Brunei	EPI_ISL_435674
S gene	23929 (C > T)	Saudi Arabia	EPI_ISL_416432
		Iceland	EPI_ISL_417752
		Australia	EPI_ISL_419761
		USA	EPI_ISL_434297
		India	EPI_ISL_437438
		Brunei	EPI_ISL_435674
ORF8	28253 (C > T)	India	EPI_ISL_436456
		Denmark	EPI_ISL_437041
		England	EPI_ISL_453633
		Switzerland	EPI_ISL_476097
		Netherlands	EPI_ISL_461144
		England	EPI_ISL_461916
		USA	EPI_ISL_414483
		Wales	EPI_ISL_418137
		France	EPI_ISL_420064
N	28311 (C > T)	Australia	EPI_ISL_419728
		Saudi Arabia	EPI_ISL_416432
		USA	EPI_ISL_434297
		India	EPI_ISL_437438
	28887 (C > T)	Beijing	EPI_ISL_430722
		Singapore	EPI_ISL_435687
		Wales	EPI_ISL_432250
		USA	EPI_ISL_454682
		England	EPI_ISL_452867
		Australia	EPI_ISL_419961
ORF10 protein	29645 (G > T)	DRC*	EPI_ISL_437343

Based on phylogenetic analysis, the SARS-CoV2 genome sequenced in the current study clustered with isolates from Guam (accession numbers: MT459985.1, MT459986.1, and MT459987.1), Taiwan (accession numbers: MT517436.1 and MT517437.1), India (accession numbers: MT477885.1, MT457403.1, and MT415322.1), the USA (accession numbers: MT499206.1 and MT344946.1), and France (accession number: MT470111.1), and with 8 other Pakistani strains from Punjab (accession number: MW422100.1), Federal Capital (accession numbers: MW422012.1, MW422082.1, MW422089.1, MW422099.1, and MW421988.1), Azad Jammu and Kashmir (AJK; accession number: MW422086.1), and KP (accession number: MW422088.1) (Fig. 1). Details of all annotated SARS-CoV2 genomic sequences from Pakistan and other countries used for phylogenetic analysis are listed in the Supplementary Material. The close association between the current SARS-CoV2 genome and local Pakistani strains is inconsistent with a recent study that reported differences between local Pakistani strains [30]. This close association may be attributable to the “Smart lockdown” in Pakistan, during which no restrictions were imposed on travel within the country. The close clustering of the current genome with SARS-CoV2 genomes reported from Guam, Taiwan, the USA, India, and France is in agreement with a recent study [31]. Because the current genome was sequenced during the first wave, its close clustering with genomes from various countries may be attributable to flexible international travel, which has been considered one of the potential risk factors for the transmission and circulation of different SARS-CoV2 variants in Pakistan [31–34].
The initially sequenced local SARS-CoV2 genomes (accession numbers: MT240479 and MT262993) are closely related to the Chinese SARS-CoV2 isolates [9], whereas the isolate sequenced in this study differs from the Chinese isolates (Fig. 1). This implies that the currently sequenced SARS-CoV2 genome has diverged from the Wuhan reference genome (accession number: NC_045512) by acquiring genetic variants.

Fig. 1. Phylogenetic tree of 85 genomes, comprising genomes obtained from GISAID and the genome sequenced in the current study. The selected genomes were clustered using Augur, the Nextstrain phylogenetic pipeline. The SARS-CoV2 genome sequenced in the current study (accession number: MW242667) is highlighted in red.

In conclusion, this NGS-based study characterized the full genome sequence of a SARS-CoV2 isolate from Peshawar, Pakistan. The missense mutation (1139A > G) detected in the NSP2-coding region of SARS-CoV2 is a novel genetic variant. Its identification supports previous findings that SARS-CoV2 genomes are not identical around the world and underlines the need for SARS-CoV2 genome sequencing in all regions. Region-specific sequencing and reporting of identified variants could contribute to the development of vaccines and diagnostic kits, since a SARS-CoV2 diagnostic kit developed in one part of the world may misdiagnose patients in another. Genome sequencing of local SARS-CoV2 strains could therefore provide crucial information for improving diagnostic, prognostic, and therapeutic interventions. Overall, this study reports a minor deviation in the SARS-CoV2 genome, namely the presence of 1 unique missense mutation, while phylogenetic analysis showed that the current SARS-CoV2 genome clustered closely with 8 other Pakistani strains.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX, 89.8 KB)

Supplementary file2 (CSV, 46.1 KB)

Acknowledgements

This research did not receive any specific grant from funding agencies in the public, commercial, or non-profit sectors.

Author Contribution

OKA wrote the manuscript and assisted the team in experimental and bioinformatics analysis, NB helped in data analysis and manuscript preparation, SAH performed bioinformatics analysis and reviewed the manuscript, BS and HJ performed the experiments, AAK supported the lab work, reviewed, and helped in the manuscript preparation, SA reviewed the manuscript and assisted in editing the first draft, VK helped with the sampling, and JA developed the idea and led the execution of the experiment, supervised, and advised on the data analysis and manuscript.

Data Availability

SARS-CoV2 full genome sequence generated in this study has been deposited in GenBank under the accession number: MW242667 (available at URL: https://www.ncbi.nlm.nih.gov/nuccore/MW242667.1/).

Declarations

Conflict of interest

The authors declare that they have no competing interests.

Ethical Approval

The present study, which involved human participants, was carried out in accordance with the ethical principles of the Declaration of Helsinki.

Informed Consent

This study was approved by our institutional review committee (Rehman Medical Institute-Research Ethics Committee, Peshawar, Pakistan; RMI/RMI-REC/Article Approval/48). Informed consent was obtained from the enrolled patient.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Hirschey M. Lab life-rebuild it better after coronavirus lockdowns ease. Nature. 2020;582(7811):184. doi: 10.1038/d41586-020-01708-8.
2. Corman VM, Muth D, Niemeyer D, Drosten C. Hosts and sources of endemic human coronaviruses. Adv Virus Res. 2018;100:163–188. doi: 10.1016/bs.aivir.2018.01.001.
3. Guo D. Old weapon for new enemy: drug repurposing for treatment of newly emerging viral diseases. Virol Sin. 2020;35:253–255. doi: 10.1007/s12250-020-00204-7.
4. Cui J, Li F, Shi Z-L. Origin and evolution of pathogenic coronaviruses. Nat Rev Microbiol. 2019;17(3):181–192. doi: 10.1038/s41579-018-0118-9.
5. Monchatre-Leroy E, Boué F, Boucher J-M, Renault C, Moutou F, Ar Gouilh M, et al. Identification of alpha and beta coronavirus in wildlife species in France: bats, rodents, rabbits, and hedgehogs. Viruses. 2017;9(12):364. doi: 10.3390/v9120364.
6. Munnink BBO, Nieuwenhuijse DF, Stein M, O’Toole Á, Haverkate M, Mollers M, et al. Rapid SARS-CoV-2 whole-genome sequencing and analysis for informed public health decision-making in the Netherlands. Nat Med. 2020;26(9):1405–1410. doi: 10.1038/s41591-020-0997-y.
7. Elbe S, Buckland-Merrett G. Data, disease and diplomacy: GISAID's innovative contribution to global health. Global Chall. 2017;1(1):33–46. doi: 10.1002/gch2.1018.
8. Shu Y, McCauley J. GISAID: global initiative on sharing all influenza data–from vision to reality. Eurosurveillance. 2017;22(13):30494. doi: 10.2807/1560-7917.ES.2017.22.13.30494.
9. Saif R, Mahmood T, Ejaz A. Whole genome comparison of Pakistani corona virus with Chinese and US strains along with its predictive severity of COVID-19. bioRxiv. 2020. doi: 10.1016/j.genrep.2021.101139.
10. Worldometers (2020) COVID-19 coronavirus pandemic.
11. World Health Organization (2020) Laboratory testing for coronavirus disease 2019 (COVID-19) in suspected human cases: interim guidance, 2 March 2020. World Health Organization, Geneva.
12. Untergasser A, Cutcutache I, Koressaar T, Ye J, Faircloth BC, Remm M, et al. Primer3-new capabilities and interfaces. Nucleic Acids Res. 2012;40(15):e115. doi: 10.1093/nar/gks596.
13. Andrews S. FastQC: a quality control tool for high throughput sequence data. Cambridge: Babraham Bioinformatics, Babraham Institute; 2010.
14. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–2120. doi: 10.1093/bioinformatics/btu170.
15. Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25(14):1754–1760. doi: 10.1093/bioinformatics/btp324.
16. Wu F, Zhao S, Yu B, Chen Y, Wang W, Song Z, et al. Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome. Nature. 2020;579(7798):265–269. doi: 10.1038/s41586-020-2008-3.
17. National Genomics Data Center Members and Partners. Database resources of the national genomics data center in 2020. Nucleic Acids Res. 2020;48:D24–D33. doi: 10.1093/nar/gkz1210.
18. Cleemput S, Dumon W, Fonseca V, Abdool Karim W, Giovanetti M, Alcantara LC, et al. Genome detective coronavirus typing tool for rapid identification and characterization of novel coronavirus genomes. Bioinformatics. 2020;36(11):3552–3555. doi: 10.1093/bioinformatics/btaa145.
19. Hall T, Biosciences I, Carlsbad C. BioEdit: an important software for molecular biology. GERF Bull Biosci. 2011;2(1):60–61.
20. Hadfield J, Megill C, Bell SM, Huddleston J, Potter B, Callender C, et al. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics. 2018;34(23):4121–4123. doi: 10.1093/bioinformatics/bty407.
21. Katoh K, Asimenos G, Toh H. Multiple alignment of DNA sequences with MAFFT. Bioinformatics for DNA sequence analysis. New York: Springer; 2009. pp. 39–64.
22. Nguyen L-T, Schmidt HA, Von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2015;32(1):268–274. doi: 10.1093/molbev/msu300.
23. Rambaut A, Holmes EC, O’Toole Á, Hill V, McCrone JT, Ruis C, et al. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat Microbiol. 2020;5(11):1403–1407. doi: 10.1038/s41564-020-0770-5.
24. Wu A, Peng Y, Huang B, Ding X, Wang X, Niu P, et al. Genome composition and divergence of the novel coronavirus (2019-nCoV) originating in China. Cell Host Microbe. 2020;27:325–328. doi: 10.1016/j.chom.2020.02.001.
25. Hagemeijer MC, Verheije MH, Ulasli M, Shaltiël IA, De Vries LA, Reggiori F, et al. Dynamics of coronavirus replication-transcription complexes. J Virol. 2010;84(4):2134–2149. doi: 10.1128/JVI.01716-09.
26. Khan MI, Khan ZA, Baig MH, Ahmad I, Farouk A-E, Song YG, et al. Comparative genome analysis of novel coronavirus (SARS-CoV-2) from different geographical locations and the effect of mutations on major target proteins: an in silico insight. PLoS ONE. 2020;15(9):e0238344. doi: 10.1371/journal.pone.0238344.
27. Wang C, Liu Z, Chen Z, Huang X, Xu M, He T, et al. The establishment of reference sequence for SARS-CoV-2 and variation analysis. J Med Virol. 2020;92(6):667–674. doi: 10.1002/jmv.25762.
28. Angeletti S, Benvenuto D, Bianchi M, Giovanetti M, Pascarella S, Ciccozzi M. COVID-2019: the role of the nsp2 and nsp3 in its pathogenesis. J Med Virol. 2020;92(6):584–588. doi: 10.1002/jmv.25719.
29. Islam MR, Hoque MN, Rahman MS, Alam ARU, Akther M, Puspo JA, et al. Genome-wide analysis of SARS-CoV-2 virus strains circulating worldwide implicates heterogeneity. Sci Rep. 2020;10(1):1–9. doi: 10.1038/s41598-019-56847-4.
30. Khan MT, Ali S, Khan AS, Muhammad N, Khalil F, Ishfaq M, et al. SARS-CoV-2 Genome from the Khyber Pakhtunkhwa Province of Pakistan. ACS Omega. 2021;6(10):6588–6599. doi: 10.1021/acsomega.0c05163.
31. Umair M, Salman M, Rehman Z, Badar N, Ali Q, Ahad A, et al. Proliferation of SARS-CoV-2 B.1.1.7 variant in Pakistan-a short surveillance account. Frontiers Public Health. 2021;9:702. doi: 10.3389/fpubh.2021.683378.
32. Atif M, Malik I. Why is Pakistan vulnerable to COVID-19 associated morbidity and mortality? A scoping review. Int J Health Plann Manage. 2020;35(5):1041–1054. doi: 10.1002/hpm.3016.
33. Saqlain M, Munir MM, Ahmed A, Tahir AH, Kamran S. Is Pakistan prepared to tackle the coronavirus epidemic? Drugs Therapy Perspect. 2020;36:213–214. doi: 10.1007/s40267-020-00721-1.
34. Nafees M, Khan F. Pakistan’s response to COVID-19 pandemic and efficacy of quarantine and partial lockdown: a review. Electron J Gen Med. 2020;17(2):em240. doi: 10.29333/ejgm/7951.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary file1 (DOCX 89 kb)^{(89.8KB, docx)}

Supplementary file2 (CSV 46 kb)^{(46.1KB, csv)}

Data Availability Statement

SARS-CoV2 full genome sequence generated in this study has been deposited in GenBank under the accession number: MW242667 (available at URL: https://www.ncbi.nlm.nih.gov/nuccore/MW242667.1/).

[CR1] 1.Hirschey M. Lab life-rebuild it better after coronavirus lockdowns ease. Nature. 2020;582(7811):184. doi: 10.1038/d41586-020-01708-8. [DOI] [PubMed] [Google Scholar]

[CR2] 2.Corman VM, Muth D, Niemeyer D, Drosten C. Hosts and sources of endemic human coronaviruses. Adv Virus Res. 2018;100:163–188. doi: 10.1016/bs.aivir.2018.01.001. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR3] 3.Guo D. Old weapon for new enemy: drug repurposing for treatment of newly emerging viral diseases. Virol Sin. 2020;35:253–255. doi: 10.1007/s12250-020-00204-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR4] 4.Cui J, Li F, Shi Z-L. Origin and evolution of pathogenic coronaviruses. Nat Rev Microbiol. 2019;17(3):181–192. doi: 10.1038/s41579-018-0118-9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR5] 5.Monchatre-Leroy E, Boué F, Boucher J-M, Renault C, Moutou F, Ar Gouilh M, et al. Identification of alpha and beta coronavirus in wildlife species in France: bats, rodents, rabbits, and hedgehogs. Viruses. 2017;9(12):364. doi: 10.3390/v9120364. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR6] 6.Munnink BBO, Nieuwenhuijse DF, Stein M, O’Toole Á, Haverkate M, Mollers M, et al. Rapid SARS-CoV-2 whole-genome sequencing and analysis for informed public health decision-making in the Netherlands. Nat Med. 2020;26(9):1405–1410. doi: 10.1038/s41591-020-0997-y. [DOI] [PubMed] [Google Scholar]

[CR7] 7.Elbe S, Buckland-Merrett G. Data, disease and diplomacy: GISAID's innovative contribution to global health. Global Chall. 2017;1(1):33–46. doi: 10.1002/gch2.1018. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR8] 8.Shu Y, McCauley J. GISAID: global initiative on sharing all influenza data–from vision to reality. Eurosurveillance. 2017;22(13):30494. doi: 10.2807/1560-7917.ES.2017.22.13.30494. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR9] 9.Saif R, Mahmood T, Ejaz A. Whole genome comparison of Pakistani corona virus with Chinese and US strains along with its predictive severity of COVID-19. bioRxiv. 2020 doi: 10.1016/j.genrep.2021.101139. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR10] 10.Worldometers (2020) COVID-19 coronavirus pandemic

[CR11] 11.World Health Organization (2020) Laboratory testing for coronavirus disease 2019 (COVID-19) in suspected human cases: interim guidance, 2 March 2020. World Health Organization, Geneva.

[CR12] 12.Untergasser A, Cutcutache I, Koressaar T, Ye J, Faircloth BC, Remm M, et al. Primer3—new capabilities and interfaces. Nucleic Acids Res. 2012;40(15):e115. doi: 10.1093/nar/gks596. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR13] 13.Andrews S. FastQC: a quality control tool for high throughput sequence data. Cambridge: Babraham Bioinformatics, Babraham Institute; 2010. [Google Scholar]

[CR14] 14.Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–2120. doi: 10.1093/bioinformatics/btu170. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR15] 15.Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25(14):1754–1760. doi: 10.1093/bioinformatics/btp324. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR16] 16.Wu F, Zhao S, Yu B, Chen Y, Wang W, Song Z, et al. Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome. Nature. 2020;579(7798):265–269. doi: 10.1038/s41586-020-2008-3. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR17] 17.National Genomics Data Center Members and Partners Database resources of the national genomics data center in 2020. Nucleic Acids Res. 2020;48:D24–D33. doi: 10.1093/nar/gkz1210. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR18] 18.Cleemput S, Dumon W, Fonseca V, Abdool Karim W, Giovanetti M, Alcantara LC, et al. Genome detective coronavirus typing tool for rapid identification and characterization of novel coronavirus genomes. Bioinformatics. 2020;36(11):3552–3555. doi: 10.1093/bioinformatics/btaa145. [DOI] [PMC free article] [PubMed] [Google Scholar]

19. Hall T, Biosciences I, Carlsbad C. BioEdit: an important software for molecular biology. GERF Bull Biosci. 2011;2(1):60–61.

20. Hadfield J, Megill C, Bell SM, Huddleston J, Potter B, Callender C, et al. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics. 2018;34(23):4121–4123. doi: 10.1093/bioinformatics/bty407.

21. Katoh K, Asimenos G, Toh H. Multiple alignment of DNA sequences with MAFFT. Bioinformatics for DNA sequence analysis. New York: Springer; 2009. pp. 39–64.

22. Nguyen L-T, Schmidt HA, Von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2015;32(1):268–274. doi: 10.1093/molbev/msu300.

23. Rambaut A, Holmes EC, O’Toole Á, Hill V, McCrone JT, Ruis C, et al. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat Microbiol. 2020;5(11):1403–1407. doi: 10.1038/s41564-020-0770-5.

24. Wu A, Peng Y, Huang B, Ding X, Wang X, Niu P, et al. Genome composition and divergence of the novel coronavirus (2019-nCoV) originating in China. Cell Host Microbe. 2020;27:325–328. doi: 10.1016/j.chom.2020.02.001.

25. Hagemeijer MC, Verheije MH, Ulasli M, Shaltiël IA, De Vries LA, Reggiori F, et al. Dynamics of coronavirus replication-transcription complexes. J Virol. 2010;84(4):2134–2149. doi: 10.1128/JVI.01716-09.

26. Khan MI, Khan ZA, Baig MH, Ahmad I, Farouk A-E, Song YG, et al. Comparative genome analysis of novel coronavirus (SARS-CoV-2) from different geographical locations and the effect of mutations on major target proteins: an in silico insight. PLoS ONE. 2020;15(9):e0238344. doi: 10.1371/journal.pone.0238344.

27. Wang C, Liu Z, Chen Z, Huang X, Xu M, He T, et al. The establishment of reference sequence for SARS-CoV-2 and variation analysis. J Med Virol. 2020;92(6):667–674. doi: 10.1002/jmv.25762.

28. Angeletti S, Benvenuto D, Bianchi M, Giovanetti M, Pascarella S, Ciccozzi M. COVID-2019: the role of the nsp2 and nsp3 in its pathogenesis. J Med Virol. 2020;92(6):584–588. doi: 10.1002/jmv.25719.

29. Islam MR, Hoque MN, Rahman MS, Alam ARU, Akther M, Puspo JA, et al. Genome-wide analysis of SARS-CoV-2 virus strains circulating worldwide implicates heterogeneity. Sci Rep. 2020;10(1):1–9. doi: 10.1038/s41598-019-56847-4.

30. Khan MT, Ali S, Khan AS, Muhammad N, Khalil F, Ishfaq M, et al. SARS-CoV-2 Genome from the Khyber Pakhtunkhwa Province of Pakistan. ACS Omega. 2021;6(10):6588–6599. doi: 10.1021/acsomega.0c05163.

31. Umair M, Salman M, Rehman Z, Badar N, Ali Q, Ahad A, et al. Proliferation of SARS-CoV-2 B.1.1.7 variant in Pakistan-a short surveillance account. Frontiers Public Health. 2021;9:702. doi: 10.3389/fpubh.2021.683378.

32. Atif M, Malik I. Why is Pakistan vulnerable to COVID-19 associated morbidity and mortality? A scoping review. Int J Health Plann Manage. 2020;35(5):1041–1054. doi: 10.1002/hpm.3016.

33. Saqlain M, Munir MM, Ahmed A, Tahir AH, Kamran S. Is Pakistan prepared to tackle the coronavirus epidemic? Drugs Therapy Perspect. 2020;36:213–214. doi: 10.1007/s40267-020-00721-1.

34. Nafees M, Khan F. Pakistan’s response to COVID-19 pandemic and efficacy of quarantine and partial lockdown: a review. Electron J Gen Med. 2020;17(2):em240. doi: 10.29333/ejgm/7951.
