Abstract
The novel coronavirus, SARS‐CoV‐2, has caused the most unfathomable pandemic in the history of humankind. Bangladesh is also a victim of this critical situation. To investigate the genomic features of the pathogen from Bangladesh, the first complete genome of the virus has very recently been published. Therefore, long‐awaited questions regarding the possible origin and typing of the strain(s) can now be answered. Here, we mainly discuss the published reports and online‐accessed data (results) regarding those issues and present a comprehensive picture of the typing of the virus alongside the probable origin of the subclade containing the Bangladeshi strain. Our observations suggest that this strain might have originated from the United Kingdom or from other European countries epidemiologically linked to the United Kingdom. According to different genotyping classification schemes, this strain belongs to the A2a clade under the G major clade, is of B and/or L type, and is a SARS‐CoV‐2a substrain. In the future, randomized genomic data from Bangladesh will certainly increase; however, because of globalization and migrant movement, we urgently need a mass regional sequencing approach targeting the partial or complete genome that can be linked to the epidemiological data and may help in further clinical intervention.
Keywords: Evolution, Bangladesh, Genome, Unique Mutation, SARS‐CoV‐2, Virus Classification
Highlights
This commentary comprehensively explains the genotyping and possible origin of the first Bangladeshi SARS‐CoV‐2 strain, which will also help in identifying similar strains sequenced in the future. Moreover, we stress the importance of a broad‐scale, rational, and nationwide genomic data analysis correlated with clinical data.
The very first complete genome of SARS‐CoV‐2 from Bangladesh (hCoV‐19/Bangladesh/CHRF_nCOV19_0001/2020, GISAID Accession no: EPI_ISL_437912) was published by the Child Health Research Foundation (CHRF) on 12 May 2020. 1 Since the publication of the sequence, which was met with great appreciation from the national bioscience community, many researchers have been conducting bioinformatics analyses to gain insight into the characteristics of the viruses circulating in Bangladesh. This commentary first sets out our scientific view of this genome and then, point by point, reviews the related bioinformatic analyses published or openly accessible elsewhere, together with future prospects.
First, Bangladesh reported its first case of COVID‐19 on 8 March 2020, in a person with a travel history from Italy. 2 The first complete sequence is of the virus strain sampled from a 22‐year‐old female patient on 18 April 2020; although it is not stated, the patient possibly had no recent travel history. 1 Phylodynamic data from Nextstrain (https://nextstrain.org/) showed that the time to most recent common ancestor (tMRCA) of this Bangladeshi strain was 14 March 2020, with an interval spanning 7 March to 23 March 2020 (Figure 1). 3 Evolutionary analysis shown in the maximum clade credibility (MCC) tree identified Wales/PHWC‐277D1/2020 as the closest relative of hCoV‐19/Bangladesh/CHRF_nCOV19_0001/2020 until 14th April. The other closely related strains within the respective subclade are also mostly from European countries (Figure 1).
Figure 1.

MCC tree showing the tMRCA and possible origin of the first Bangladeshi strain. Image taken from Nextstrain 3 (https://nextstrain.org/ncov/global?f_country=Bangladesh)
It should be noted that the nearest origin and tMRCA of the Bangladeshi strain may change, because this is a dynamically evolving tree that is continuously updated as sequence numbers increase and that is subsampled for a more equitable global sequence distribution, hiding some samples from regions that are generating large amounts of genomic data. 1 , 3 Nevertheless, the origin of this subclade points toward the United Kingdom, indicating that the Bangladeshi strain may have come from what was once the epicenter of this pandemic. 3 Very likely, the sequence is not directly related to the first three cases reported in Bangladesh, as two of the infected persons were men who had returned from Italy and the third, a female, had close contact with them. 2 It can also be noted that on 14th March, the Bangladesh Government banned flights carrying passengers from all European countries except the United Kingdom. 4 Even on 10th April, the Civil Aviation Authority reported that all passenger flights on domestic and international routes remained suspended except a few with China and the United Kingdom. 5
Secondly, the sequence shows 99.99% identity to strains from European, Arabian, and Asian countries (Table 1 lists the representative countries with the strain IDs), considering a good number of the aligned sequences in the NCBI and GISAID databases. 1 On this point, we should keep in mind that distinct phylogenetic analyses using different parameters within the selected models can give ambiguous monophyletic results, because a considerable number of sequences share a high level of identity with the Bangladeshi strain. For example, one neighbor‐joining tree may predict a close relation of the virus to strain(s) from Greece, while another may cluster it with other European, or even Asian or American, strains. Besides this, the maximum likelihood method can give a misleading tree, as the same mutations at identical genomic positions are present among multiple closely related viruses. As Nextstrain and GISAID deal with a large data set (25,246 as of 14th April 2020) and generate an updated picture of the evolutionary timeframe using the maximum clade credibility (MCC) tree, these results can be considered more reliable at this stage. However, research studies based on interesting hypotheses, focused targets, and solid rationales are highly encouraged.
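To make the 99.99% figure above concrete, the following is a minimal sketch of how pairwise nucleotide identity might be computed from two sequences that have already been aligned. The function name `percent_identity`, the choice to skip gaps and ambiguous bases, and the toy sequences are assumptions made for illustration and are not taken from the cited analyses; the genome length of 29,903 nt is that of the Wuhan‐Hu‐1 reference (NC_045512.2).

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over aligned columns where both sequences carry A, C, G or T.

    Assumption: the two sequences come from the same alignment, so they have
    equal length; gaps and ambiguous bases (e.g. '-', 'N') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in valid or b not in valid:
            continue  # ignore gaps and ambiguous calls
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Toy example (hypothetical sequences): one mismatch in ten columns.
print(percent_identity("ACGTACGTAC", "ACGTACGTAT"))  # 90.0

# At genome scale, 99.99% identity leaves only a handful of differences.
genome_length = 29903  # Wuhan-Hu-1 reference length
print(round(genome_length * (1 - 0.9999)))  # 3
```

With so few differing sites, many sequences are equally close to the Bangladeshi strain, which is one way to see why different tree‐building runs can place it next to different neighbors.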
Table 1.
Representative sequence information for the countries
| Middle‐East | Asia | Europe | North America |
|---|---|---|---|
| United Arab Emirates (EPI_ISL_443182) | Russia (EPI_ISL_428913) | Latvia (EPI_ISL_437090) | USA (EPI_ISL_444740) |
| Saudi Arabia (EPI_ISL_435132) | India (MT415323) | Sweden (EPI_ISL_429119) | Mexico (EPI_ISL_412972) |
| | Sri Lanka (EPI_ISL_428671) | Greece (MT328035) | |
| | Taiwan (EPI_ISL_426631) | Belgium (EPI_ISL_420433) | |
Many researchers have modeled the evolutionary landscape based on the genome‐wide pattern of mutational variation. 6 , 7 , 8 Considering this literature and online resources, we found that the Bangladeshi strain falls within the clade A2a, which carries a mutation at the 614th position of the spike protein changing the amino acid aspartate to glycine (synonymous with the G clade in the GISAID phylogenetic tree) and another mutation at the 323rd position of NSP12 (proline to leucine). 1 , 3 In addition, Brufsky 9 suggested that this type of mutation can alter the heavily glycosylated spike protein, with a possible impact on membrane fusion in tissues and a resulting change in virulence. Another related study showed that this mutation generates a serine protease cleavage site in the S1‐S2 junction of the spike protein that may facilitate the entry of SARS‐CoV‐2 into the host cell. 10
According to Forster et al, 7 the strain can be characterized as a B type virus originating from type A subtype 29095C, which has a linked outward branch towards Europe. This derived B type virus has particular bases within the genome (NSP4: 8,782C; ORF3a: 26,144G; ORF8: 28,144T) that separate the B type from the other two (A and C), and its spread is thought to meet immunological or environmental resistance outside Asia, especially outside East Asia. Concomitantly, the Bangladeshi strain belongs to the L type, which is probably more aggressive and can spread quickly, although human interventions have been decreasing the relative frequency of the L type. 6
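The marker positions listed above lend themselves to a simple lookup. The sketch below checks whether a genome carries the B type bases quoted in the text (8,782C; 26,144G; 28,144T) and, separately, assigns the L or S type. The L/S rule used here (8,782C with 28,144T for L, 8,782T with 28,144C for S) is taken from Tang et al 6 rather than from the text above, so it should be read as an assumption; the helper names and the toy genome are hypothetical, and positions are assumed to be 1‐based coordinates on a sequence aligned to the reference without indels.

```python
# Marker bases for the B type as quoted in the text (1-based reference coordinates).
B_TYPE_MARKERS = {8782: "C", 26144: "G", 28144: "T"}


def base_at(genome, pos):
    """Return the base at a 1-based position of a reference-aligned genome."""
    return genome[pos - 1].upper()


def is_b_type(genome):
    """True if all three B type marker bases are present."""
    return all(base_at(genome, pos) == base for pos, base in B_TYPE_MARKERS.items())


def l_or_s_type(genome):
    """Assign L or S from positions 8,782 and 28,144 (assumed rule, see lead-in)."""
    pair = (base_at(genome, 8782), base_at(genome, 28144))
    if pair == ("C", "T"):
        return "L"
    if pair == ("T", "C"):
        return "S"
    return "unassigned"


def toy_genome(overrides, length=29903):
    """Build a placeholder genome of 'A's with chosen bases set; for illustration only."""
    bases = ["A"] * length
    for pos, base in overrides.items():
        bases[pos - 1] = base
    return "".join(bases)


example = toy_genome({8782: "C", 26144: "G", 28144: "T"})
print(is_b_type(example), l_or_s_type(example))  # True L
```

Because both schemes read positions 8,782 and 28,144, a genome that matches the B type markers is also L type under this rule, which is consistent with the strain being described as "B and/or L type".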
Notably, based on another classification scheme, this is a SARS‐CoV‐2a substrain owing to a unique trinucleotide‐bloc mutation in the nucleoprotein (N) coding region (28,881‐28,883: AAC), 8 and the translated G204R (glycine to arginine) mutation places the strain within the GR clade under the major G clade. 1 The resulting amino acid substitutions (lysine and arginine) might reduce the pathogenicity of SARS‐CoV‐2. With respect to ORF3a, the Bangladeshi strain has an identical nucleotide (base “G”) at two mutually exclusive sites (25,563 and 26,144), compared to the “wild‐type” 2g substrain (base “T”), wherein G25563T and G26144T are linked with the emergence of the GH and V (or synonymously type C 7 ) clades, 1 respectively. The resulting nonsynonymous mutations of the 3a protein (Q57H: glutamine→histidine and G251V: glycine→valine), albeit not within any of its six functional domains (I to VI), 11 might have a positive 12 or negative 8 impact on structural stability and binding affinity, possibly influencing disease pathogenesis (i.e. modulating the immunological reaction, notably the 'cytokine storm' in the host 8 ) and drug resistance capacity. 11 It was also claimed in the report of Ayub 8 as of 9th April 2020 that the predominant presence of the SARS‐CoV‐2a substrain in a particular city or country, such as the United Kingdom (26%), Belgium (31%), the Netherlands (50%), and Portugal (60%), can be a cause of reduced cases of COVID‐19.
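The clade labels discussed above can be summarized as a small decision rule over amino acid marker mutations. The following is a hedged sketch only: it uses the markers named in the text (spike D614G for G, N G204R for GR, ORF3a Q57H for GH, ORF3a G251V for V), treats GH as a subgroup of G as an assumption, lumps every clade not mentioned here into "other", and is not the procedure used by GISAID itself. The function name and the `GENE:mutation` string format are hypothetical.

```python
def assign_clade(mutations):
    """Assign a coarse clade label from 'GENE:substitution' strings.

    Simplified rule based on the markers named in the text; clades that the
    text does not discuss are returned as 'other'.
    """
    muts = set(mutations)
    if "S:D614G" in muts:
        if "N:G204R" in muts:
            return "GR"
        if "ORF3a:Q57H" in muts:
            return "GH"
        return "G"
    if "ORF3a:G251V" in muts:
        return "V"
    return "other"


# Mutations reported in the text for the first Bangladeshi genome.
bangladeshi = ["S:D614G", "NSP12:P323L", "N:G204R", "NSP13:E261D", "NSP2:I120F"]
print(assign_clade(bangladeshi))  # GR
```

The ORF3a markers are absent from the list because the strain keeps base "G" at both 25,563 and 26,144, so neither Q57H nor G251V applies.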
Strikingly, the Bangladeshi strain also has two unique mutations in the ORF1ab region. Of these, mutation E261D (glutamic acid to aspartic acid) in the NSP13 protein (RNA helicase and/or 5′ triphosphatase activity) was found in only one other strain, from Austria (hCoV‐19/Austria/CeMM0004/2020), collected on 3rd March 2020. 1 , 13 Remarkably, the other mutation (I120F: isoleucine to phenylalanine) in NSP2 (predicted role in viral pathogenicity) 14 is not found in any other sequence available in the GISAID database. 1 Overall, neither of the unique mutations may be very significant, considering the similar chemical properties of the amino acids involved, and further molecular investigation is needed to clarify their effect.
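The remark about similar chemical properties can be illustrated with a coarse side‐chain grouping. The sketch below is an assumption‐laden simplification: the four groups follow one common textbook classification (other schemes, for example those that treat aromatic residues separately, would draw the boundaries differently), the function names are hypothetical, and sharing a group says nothing definitive about the functional effect of a substitution.

```python
import re

# Coarse side-chain groups (one common textbook grouping; an assumption here).
GROUPS = {
    "acidic": set("DE"),
    "basic": set("KRH"),
    "polar": set("STCYNQ"),
    "nonpolar": set("GAVLIMFWP"),
}


def group_of(residue):
    """Return the group name for a one-letter amino acid code."""
    for name, members in GROUPS.items():
        if residue in members:
            return name
    raise ValueError(f"unknown residue: {residue}")


def is_conservative(substitution):
    """True if the reference and mutant residues fall in the same group.

    Expects notation such as 'E261D' (reference, position, mutant).
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", substitution)
    if match is None:
        raise ValueError(f"cannot parse substitution: {substitution}")
    ref, _position, alt = match.groups()
    return group_of(ref) == group_of(alt)


for sub in ("E261D", "I120F", "D614G"):
    print(sub, is_conservative(sub))
# E261D True
# I120F True
# D614G False
```

Under this grouping both ORF1ab substitutions stay within their group (acidic to acidic, nonpolar to nonpolar), whereas the spike D614G change crosses groups.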
Overall, it is speculated that Bangladesh can be an important source of mixed virus strains. This may be explained by the return of many migrants to Bangladesh from other countries, some of which were declared COVID‐19 epicenters. 15 In this context, extensive sequencing across representative districts or zones will give a concrete and comprehensive basis for extracting significant information, as expected in an ongoing outbreak. Moreover, the sequence data should also be correlated with patient history. It is important to note here that we may not need to go for complete/partial genome sequencing, because scientists might soon identify the crucial epidemiological marker(s) linked to clinical importance from the existing sequence information, which can also be useful for tracking the presence of viral types alongside important mutations. Targeted sequencing of only those genomic region(s), or part of the segment(s), will then be sufficient for obtaining the relevant information and will make it easier to match the clinical history and spreading pattern. To conclude, a comprehensive and rational analysis that considers the related virus strains or relevant countries, rather than including all strains in the computational run, will give a clearer picture, although more genomic data along with patient history from Bangladesh is a prerequisite.
Alam ASMRU, Islam MR, Rahman MS, Islam OK, Hossain MA. Understanding the possible origin and genotyping of the first Bangladeshi SARS‐CoV‐2 strain. J Med Virol. 2021;93:1–4. 10.1002/jmv.26115
Present address M. Anwar Hossain, Vice‐Chancellor, Jashore University of Science and Technology, Jashore‐7408, Bangladesh.
REFERENCES
- 1. Shu Y, McCauley J. GISAID: Global initiative on sharing all influenza data—from vision to reality. Euro Surveill Bull Eur Sur Mal Transm Eur Commun Dis Bull. 2017;22(13):30494. 10.2807/1560-7917.ES.2017.22.13.30494
- 2. IEDCR. https://www.iedcr.gov.bd/. Accessed May 16, 2020.
- 3. Hadfield J, Megill C, Bell SM, et al. Nextstrain: real‐time tracking of pathogen evolution. Bioinformatics. 2018;34(23):4121‐4123. 10.1093/bioinformatics/bty407
- 4. Educational institutions to remain closed till March 31. The Business Standard. Published March 16, 2020. https://tbsnews.net/bangladesh/education/govt-orders-closure-all-educational-institutions-march-17-56947. Accessed May 16, 2020.
- 5. Airline industry counting losses amid COVID‐19 pandemic. https://thefinancialexpress.com.bd/trade/airline-industry-counting-losses-amid-covid-19-pandemic-1586520644. Accessed May 16, 2020.
- 6. Tang X, Wu C, Li X, et al. On the origin and continuing evolution of SARS‐CoV‐2. Natl Sci Rev. 2020;(nwaa036). 10.1093/nsr/nwaa036
- 7. Forster P, Forster L, Renfrew C, Forster M. Phylogenetic network analysis of SARS‐CoV‐2 genomes. Proc Natl Acad Sci. 2020;117(17):9241‐9243. 10.1073/pnas.2004999117
- 8. Ayub MI. Reporting two SARS‐CoV‐2 strains based on a unique trinucleotide‐bloc mutation and their potential pathogenic difference. 19 April 2020. 10.20944/preprints202004.0337.v1
- 9. Brufsky A. Distinct viral clades of SARS‐CoV‐2: implications for modeling of viral spread. J Med Virol. 2020. 10.1002/jmv.25902
- 10. Bhattacharyya C, Das C, Ghosh A, et al. Global spread of SARS‐CoV‐2 subtype with spike protein mutation D614G is shaped by human genomic variations that regulate expression of TMPRSS2 and MX1 genes. bioRxiv. 2020. 10.1101/2020.05.04.075911
- 11. Issa E, Merhi G, Panossian B, Salloum T, Tokajian S. SARS‐CoV‐2 and ORF3a: nonsynonymous mutations, functional domains, and viral pathogenesis. mSystems. 2020;5(3):e00266‐20. 10.1128/mSystems.00266-20
- 12. Kim J‐S, Jang J‐H, Kim J‐M, Chung Y‐S, Yoo C‐K, Han M‐G. Genome‐wide identification and characterization of point mutations in the SARS‐CoV‐2 genome. Osong Public Health Res Perspect. 2020. 10.24171/j.phrp.2020.11.3.05
- 13. Jia Z, Yan L, Ren Z, et al. Delicate structural coordination of the severe acute respiratory syndrome coronavirus Nsp13 upon ATP hydrolysis. Nucleic Acids Res. 2019;47(12):6538‐6550. 10.1093/nar/gkz409
- 14. Angeletti S, Benvenuto D, Bianchi M, Giovanetti M, Pascarella S, Ciccozzi M. COVID‐2019: the role of the nsp2 and nsp3 in its pathogenesis. J Med Virol. 2020;92(6):584‐588. 10.1002/jmv.25719
- 15. Khosrawipour V, Lau H, Khosrawipour T, et al. Failure in initial stage containment of global COVID‐19 epicenters. J Med Virol. 2020. 10.1002/jmv.25883
