Abstract
The novel coronavirus has dominated attention this year, disrupting the entire world with the new disease it causes, COVID-19. Because the virus is entirely new, diagnosing and treating patients is a challenge. It is therefore necessary to understand the genomics of the virus in detail and, from that, to infer its evolutionary relationship to other known organisms. This can provide insight for the development of diagnostic and treatment methods. The present work was undertaken in view of the importance of understanding the genetic makeup of NCoV. The chapter deals in detail with the genomic sequence, important regions, conservation and variation in the genetic makeup of the novel coronavirus.
The National Center for Biotechnology Information (NCBI) hosts one of the world's premier database resources [1], giving direct access to gene and protein data for most known organisms. NCBI was used to retrieve the complete genome of SARS coronavirus 2. The details of the retrieved record are as follows:
Genome length: 29903 bp, single-stranded RNA (ss-RNA)
Description of the sequence: Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome
Accession number of the whole genome sequence: MN908947.3
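The retrieval step can also be scripted. The following is a minimal sketch, assuming the record is requested in FASTA format through the NCBI E-utilities `efetch` endpoint; the helper names `efetch_url`, `parse_fasta` and `fetch_genome` are illustrative and are not part of the original work, which may equally have used the NCBI web interface.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities endpoint for fetching records.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, rettype="fasta"):
    """Build an efetch URL for a nucleotide record (hypothetical helper)."""
    params = {
        "db": "nuccore",      # NCBI nucleotide database
        "id": accession,      # e.g. MN908947.3
        "rettype": rettype,   # FASTA-formatted sequence
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def parse_fasta(text):
    """Split a single-record FASTA string into (header, sequence)."""
    lines = text.strip().splitlines()
    header = lines[0].lstrip(">")
    sequence = "".join(line.strip() for line in lines[1:])
    return header, sequence


def fetch_genome(accession):
    """Download and parse one record; requires network access."""
    with urlopen(efetch_url(accession)) as response:
        return parse_fasta(response.read().decode("utf-8"))


if __name__ == "__main__":
    header, genome = fetch_genome("MN908947.3")
    print(header)
    print(len(genome))  # compare with the genome length reported in the record
```

Checking the length of the downloaded sequence against the genome length stated in the record is a quick way to confirm that the complete genome was retrieved.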
Important Regions Within the Novel Corona Virus Genome [2]
From the NCBI data it can be inferred that the major proteins involved in virus structure and function are those encoded by the open reading frames (ORFs), together with the structural proteins M, S, E and N. The longest gene region in the viral genome codes for ORF1ab, followed by the gene coding for the S protein. The following regions of the genome are identified:
The complete genome spans 1 to 29903 bp.
The initial region, 1–265, is the 5' untranslated region (UTR), or leader sequence.
The region 266–21555 codes for the ORF1ab polyprotein of the virus.
The region 21563–25384 codes for the surface glycoprotein (S), one of the structural proteins of the virus.
The region 25393–26220 codes for ORF3a.
The envelope protein (E) is coded by the region 26245–26472.
The region 26523–27191 codes for the membrane glycoprotein (M), a structural protein.
ORF6 is coded by the region 27202–27387.
The region 27394–27759 codes for ORF7a.
The region 27894–28259 codes for ORF8.
The gene coding for the nucleocapsid phosphoprotein (N) lies in the region 28274–29533.
The region 29558–29674 codes for ORF10.
Finally, the region 29675–29903 forms the 3' untranslated region (3' UTR).
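The coordinates listed above can be collected in a small table to compute region lengths and to slice a region out of the genome. This is a minimal sketch, assuming the coordinates are 1-based and inclusive as in the NCBI annotation; the names `REGIONS`, `region_length`, `lengths_by_size` and `extract_region` are illustrative only.

```python
# (name, start, end) with 1-based inclusive coordinates, as listed in the text.
REGIONS = [
    ("5' UTR", 1, 265),
    ("ORF1ab", 266, 21555),
    ("S", 21563, 25384),
    ("ORF3a", 25393, 26220),
    ("E", 26245, 26472),
    ("M", 26523, 27191),
    ("ORF6", 27202, 27387),
    ("ORF7a", 27394, 27759),
    ("ORF8", 27894, 28259),
    ("N", 28274, 29533),
    ("ORF10", 29558, 29674),
    ("3' UTR", 29675, 29903),
]


def region_length(start, end):
    """Length of a 1-based inclusive interval."""
    return end - start + 1


def lengths_by_size(regions):
    """Return (name, length) pairs sorted from longest to shortest."""
    pairs = [(name, region_length(start, end)) for name, start, end in regions]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def extract_region(genome, start, end):
    """Slice a 1-based inclusive region out of a 0-based Python string."""
    return genome[start - 1:end]


if __name__ == "__main__":
    for name, length in lengths_by_size(REGIONS):
        print(f"{name:8s} {length:6d} bp")
```

Sorting by length places ORF1ab (21290 bp) first and the S gene (3822 bp) second, in agreement with the ordering stated in the text.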
Linking NCoV with SARS and Determining Its Genomic Conservation
The genomic sequence of the novel coronavirus (NCoV) was retrieved from NCBI and subjected to a BLAST search to detect organisms sharing sequence similarity. The search showed that the organism belongs to the SARS family of coronaviruses, which are known to cause severe acute respiratory syndrome (Fig. 2.1).
Fig. 2.1. BLAST analysis for the evolutionary study of NCoV
Inference: The above BLAST analysis of the complete NCoV genome reveals its association with the other SARS coronavirus 2 sequences, with most of which it shares 99% to 100% identity.
Furthermore, to understand the conservation pattern among the functional protein-coding genes of the NCoV genome, their nucleotide sequences were retrieved from NCBI and a similarity search was performed using BLAST. The gene covering the largest part of the genome is that of the ORF1ab polyprotein (gene region: 266–21555 bp). The other gene sequences analyzed were those of the surface glycoprotein (S), nucleocapsid protein (N), ORF3a, membrane glycoprotein (M), envelope protein (E), ORF6, ORF7a, ORF8 and ORF10 (Fig. 2.2).
Fig. 2.2. BLAST analysis of ORF1ab polyprotein coding gene
Inference: From the above BLAST analysis of the ORF1ab polyprotein gene sequence, it can be inferred that the SARS coronavirus 2 isolates share an identical sequence for this gene. ORF1ab is thus evolutionarily conserved and is unlikely to be the source of variation among the isolates.
A similar comparison was performed for the genes coding for the surface glycoprotein (S), nucleocapsid protein (N), ORF3a, membrane glycoprotein (M), envelope protein (E), ORF6, ORF7a, ORF8 and ORF10. These sequences were also found to be conserved, with no variable regions detected among the SARS coronavirus 2 isolates. Overall, the conservation study shows that all 10 of these genes share 99 to 100% sequence identity with the other SARS coronavirus 2 isolates. Thus, no variable regions were found in these genes, and they do not appear to be the reason for the evolution of any novel strain.
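To make the identity figures concrete, the following is a minimal sketch of how percent identity can be computed for two sequences that have already been aligned to the same length. It is a simplification for illustration, not the BLAST algorithm itself, and the function names `percent_identity` and `variable_positions` are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns where both sequences carry the same base.

    Both inputs must already be aligned (equal length, '-' for gaps).
    Gap columns count toward the alignment length but never as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def variable_positions(aligned_a, aligned_b):
    """1-based alignment columns where the two sequences differ."""
    return [
        i + 1
        for i, (a, b) in enumerate(zip(aligned_a, aligned_b))
        if a != b
    ]


if __name__ == "__main__":
    # Toy sequences, not real viral data.
    reference = "ACGTACGTAC"
    isolate = "ACGTACGTAT"
    print(percent_identity(reference, isolate))
    print(variable_positions(reference, isolate))
```

A gene that is fully conserved between two isolates gives 100% identity and an empty list of variable positions, while a single substitution in a long gene still leaves the identity above 99%.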
Contributor Information
Amit Kumar, Email: amit.kumar@dnares.in.
Ajit Kumar Saxena, Email: draksaxena1@rediffmail.com.
Gwo Giun (Chris) Lee, Email: clee@mail.ncku.edu.tw.
Amita Kashyap, Email: amita@dnares.in.
G. Jyothsna, Email: jyothsna@dnares.in.
References
- 1. Sayers EW, Agarwala R, Bolton EE, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2019;47(D1):D23–D28. doi: 10.1093/nar/gky1069.
- 2. Wu F, et al. A new coronavirus associated with human respiratory disease in China. Nature. 2020;579(7798):265–269. doi: 10.1038/s41586-020-2008-3.


