GigaByte. 2023 Nov 20;2023:gigabyte99. doi: 10.46471/gigabyte.99

The genome assembly and annotation of the Chinese cobra, Naja atra

Jiangang Wang 1,2, Yuxin Wu 1, Shiqing Wang 2,3, Weiwu Mu 1, Wenmei Zeng 1, Xi Chen 1, Kangfeng Jiang 1, Liangyu Yang 1, Guohai Hu 4,*, Fengping He 1,*
PMCID: PMC10682346  PMID: 38033372

Abstract

China is home to 65 species of venomous snakes. Among them, the Chinese cobra, Naja atra, is prominent and a major cause of snakebites in humans. N. atra is also a protected animal in some areas, as it is listed as Vulnerable by the International Union for Conservation of Nature. Owing to the medical value of snake venoms, venomics has attracted growing research interest, and genomic resources are crucial for understanding the molecular mechanisms of venom production. Here, we report a highly contiguous genome assembly of N. atra, based on a snake sample from Huangshan, Anhui, China. The assembly is 1.67 Gb in size, and repeats constitute 37.8% of the genome. A total of 26,432 functional genes were annotated. These data provide an essential resource for studying venom production in N. atra and may also guide the conservation of this species.

Introduction

Elapidae is a family of snakes divided into three subfamilies (Bungarinae, Elapinae and Notechinae), comprising 44 genera and around 186 described species with a wide distribution [1]. Elapids are distinguished by permanently erect fangs at the front of the mouth. The family includes both terrestrial and sea snakes. Terrestrial elapids occur in tropical and subtropical regions across the globe, with most species inhabiting the Southern Hemisphere, whereas elapid sea snakes are mainly distributed in the Indian Ocean and the Southwest Pacific Ocean [2].

The Chinese cobra, Naja atra (NCBI: txid8656) (Figure 1), is a cobra species of the family Elapidae. Chinese cobras are usually 1.2–1.5 m long [3] and are among the most prevalent cobra species in China. They inhabit plains, hills and low mountains [4]. Humans often encounter Chinese cobras, although the snakes usually flee to avoid confrontation. Chinese cobras can be observed hunting during daylight hours from March to October, and up to 2–3 hours after sunset at temperatures of 20–32 °C [5]. Their diet is varied and includes rodents, frogs, toads and other snakes.

Figure 1.

Head of a Chinese cobra (N. atra) on alert in Tainan City. Source: Boris Smokrovic, Unsplash, CC0

The Chinese cobra is highly venomous, and its venom consists mainly of postsynaptic neurotoxins and cardiotoxins [6]. Although the venom offers some protection from predators, populations of Chinese cobra have declined by 30% to 50% due to habitat loss and hunting by humans. Chinese cobra venom is also used to produce antivenom for treating cobra bites. The Chinese cobra is currently listed as Vulnerable on the International Union for Conservation of Nature Red List [7], but continued hunting has reduced its wild populations to levels more consistent with Endangered status.

Main Content

Context

Snakebite is a serious threat to human life, killing around 100,000 people annually. Genome-enabled research on toxin genes may facilitate the development of effective antivenoms. Here, we present a highly contiguous reference genome assembly of N. atra. Although a reference genome is available for the Indian cobra (Naja naja) [8], this is the first for the Chinese cobra. This resource may also provide valuable information for the conservation of this vulnerable species, supporting targeted protection and breeding.

Methods

The detailed methods used in this study are available via a protocol collection hosted in protocols.io [9] (Figure 2).

Figure 2.

A protocols.io collection of the standard protocols for sequencing snake genomes [9].

Sample collection and sequencing

The N. atra sample used in this study was captured in Huangshan, Anhui, China, in 2021. After collection, the specimen was quickly frozen to −80 °C using dry ice for storage and transport. Methods for DNA extraction, library construction and sequencing were identical to those used by Liu et al. in a previous study [10].

Sample collection, experiments and research design were all authorized by the Institutional Review Board of BGI (BGI-IRB E22001), and all procedures strictly followed the BGI-IRB guidelines.

Genome survey, assembly, annotation and assessment

The single-tube long fragment read (stLFR) sequencing data were assembled using Supernova (v2.1.1, RRID:SCR_016756) [11]. NextPolish (v1.0.5) [12] was then used to perform a second round of correction and a third round of polishing of this assembly using the whole-genome sequencing (WGS) data. To obtain a haploid representation of the genome, haplotypic duplicates were purged using the purge_dups pipeline (RRID:SCR_021173) [13]. Genome completeness was evaluated using BUSCO (v5.2.2, RRID:SCR_015008) [14] in genome mode with the vertebrata_odb10 lineage dataset [15].
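BUSCO reports completeness as a one-line summary in which C (complete) is the sum of S (single-copy) and D (duplicated), and C, F (fragmented) and M (missing) together account for all n lineage groups searched. As an illustration only, the following Python sketch parses that line, assuming the BUSCO v5 short-summary notation; the helper name `parse_busco_line` and the example values are hypothetical and are not this assembly's results.

```python
import re
from dataclasses import dataclass


@dataclass
class BuscoSummary:
    complete: float    # C: complete BUSCOs (%), equal to S + D
    single: float      # S: complete and single-copy (%)
    duplicated: float  # D: complete and duplicated (%)
    fragmented: float  # F: fragmented (%)
    missing: float     # M: missing (%)
    n: int             # number of BUSCO groups in the lineage dataset


# One-line notation, e.g. "C:90.0%[S:88.5%,D:1.5%],F:3.0%,M:7.0%,n:1000"
_BUSCO_LINE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_line(line: str) -> BuscoSummary:
    """Parse a BUSCO one-line summary (hypothetical helper)."""
    match = _BUSCO_LINE.search(line)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    return BuscoSummary(
        complete=float(match["C"]),
        single=float(match["S"]),
        duplicated=float(match["D"]),
        fragmented=float(match["F"]),
        missing=float(match["M"]),
        n=int(match["n"]),
    )


if __name__ == "__main__":
    # Illustrative values only, not the N. atra result.
    s = parse_busco_line("C:90.0%[S:88.5%,D:1.5%],F:3.0%,M:7.0%,n:1000")
    print(s.complete, s.single + s.duplicated)
```

Using `search` rather than `match` lets the function accept the whole summary line, including any leading whitespace or trailing fields.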

To identify repetitive elements in the N. atra genome, we used Tandem Repeats Finder [16], LTR_Finder (RRID:SCR_015247) [17] and RepeatModeler (v2.0.1, RRID:SCR_015027) [18]. RepeatMasker (v3.3.0, RRID:SCR_012954) [19] and RepeatProteinMask v3.3.0 [20] were used to search the genome sequences for known repeat elements.

Gene prediction was performed with the BRAKER2 pipeline (RRID:SCR_018964) [21]. For functional annotation, the predicted gene sets were then aligned against several databases, including SwissProt, TrEMBL [22], the Kyoto Encyclopedia of Genes and Genomes (KEGG) [23], Gene Ontology (GO) and the Non-Redundant Protein Sequence Database [24].
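The text does not define when a predicted gene counts as "functional". A common convention, assumed here rather than confirmed by the source, is that a gene is functionally annotated if it has at least one hit in any of the searched databases. A minimal Python sketch of that bookkeeping, with a hypothetical helper name, gene IDs and database labels:

```python
from typing import Dict, Set


def summarize_functional_annotation(
    predicted: Set[str], hits: Dict[str, Set[str]]
) -> Dict[str, int]:
    """Count annotated genes per database and overall (hypothetical helper).

    predicted: IDs of all predicted genes, e.g. from a gene-prediction GFF3
    hits: database label -> IDs of genes with at least one hit in that database
    """
    summary: Dict[str, int] = {}
    annotated: Set[str] = set()
    for db, ids in hits.items():
        in_set = ids & predicted  # ignore IDs that are not predicted genes
        summary[db] = len(in_set)
        annotated |= in_set  # a gene counts once, however many databases hit it
    summary["annotated_any"] = len(annotated)
    summary["unannotated"] = len(predicted - annotated)
    return summary


if __name__ == "__main__":
    genes = {"g1", "g2", "g3", "g4"}
    hits = {"SwissProt": {"g1", "g2"}, "KEGG": {"g2", "g3"}, "GO": {"g1"}}
    print(summarize_functional_annotation(genes, hits))
```

Taking the union means that a gene with hits in several databases contributes once to the overall count, which is why the overall count can be smaller than the sum of the per-database counts.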

Results

We present a draft genome sequence of N. atra. The assembly is 1.67 Gb in size (Table 1), similar to the previously published 1.79 Gb genome of N. naja [8]. The scaffold N50 is 234.17 kb, the contig N50 is 33.08 kb and the scaffold GC content is 37.8%. The longest scaffold is 2,929,773 bp. BUSCO assessment estimated the completeness of the genome at 84.1% (Figure 3).

Table 1.

Summary of the features of our N. atra genome.

Statistic | Contig | Scaffold
Maximal length (bp) | 271,789 | 2,929,773
N90 (bp) | 4,371 | 7,368
N50 (bp) | 33,081 | 234,173
Number (≥ 100 bp) | 194,909 | 106,418
Number (≥ 2 kb) | 113,570 | 54,157
GC content (%) | 40.3 | 37.8
Genome size (bp) | 1,671,178,062
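The contiguity statistics in Table 1 follow standard definitions: Nx is the length L such that sequences of length L or longer together cover at least x% of the assembly, and contigs are obtained by breaking scaffolds at runs of N. The Python sketch below computes these values from in-memory sequences. It is an illustrative reimplementation under those assumptions, not the script used to produce Table 1, and the function names and the `min_gap` parameter are hypothetical.

```python
import re
from typing import Dict, List


def nx(lengths: List[int], x: float) -> int:
    """Nx: largest length L such that sequences >= L cover at least x% of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 100 >= total * x:
            return length
    return 0  # only reached for an empty list


def split_contigs(scaffold: str, min_gap: int = 1) -> List[str]:
    """Break a scaffold into contigs at runs of at least `min_gap` N bases."""
    pattern = "[Nn]{%d,}" % min_gap
    return [c for c in re.split(pattern, scaffold) if c]


def assembly_stats(seqs: List[str]) -> Dict[str, float]:
    """Table 1-style statistics for a list of sequences (contigs or scaffolds)."""
    lengths = [len(s) for s in seqs]
    upper = [s.upper() for s in seqs]
    gc = sum(s.count("G") + s.count("C") for s in upper)
    acgt = sum(s.count("A") + s.count("C") + s.count("G") + s.count("T") for s in upper)
    return {
        "max_length": max(lengths),
        "N50": nx(lengths, 50),
        "N90": nx(lengths, 90),
        "count_ge_100bp": sum(n >= 100 for n in lengths),
        "count_ge_2kb": sum(n >= 2000 for n in lengths),
        "gc_percent": 100.0 * gc / acgt if acgt else 0.0,  # N bases excluded
    }


if __name__ == "__main__":
    scaffolds = ["ACGTNNNNGGCC", "ATATGCGC"]
    contigs = [c for s in scaffolds for c in split_contigs(s)]
    print(assembly_stats(scaffolds))
    print(assembly_stats(contigs))
```

The choice of GC denominator matters for gapped scaffolds: this sketch counts only A, C, G and T, whereas dividing by the full length including N bases would give a lower percentage.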

Figure 3.

BUSCO assessment result of our N. atra genome.

In our N. atra genome, repetitive elements make up 40.26% of the assembly, with a total length of 672 Mb (Tables 2 and 3). Among all repeat elements, long interspersed nuclear elements (LINEs) accounted for 30.63%, long terminal repeats (LTRs) for 14.03% and DNA transposons for 4.27% (Figure 4).

Table 2.

Statistics for repetitive sequences identified in our N. atra genome.

Type | Length (bp) | % in genome
DNA | 37,917,702 | 2.269170
LINE | 449,338,074 | 26.890460
SINE | 2,779,035 | 0.166310
LTR | 224,765,038 | 13.450975
Other | 0 | 0
Satellite | 632,498 | 0.037852
Simple_repeat | 5,080,994 | 0.304070
Unknown | 7,924,824 | 0.474258
Total | 672,795,525 | 40.263183

Table 3.

Summary of the TEs in our N. atra genome.

Type | Repbase TEs: length (bp) | Repbase TEs: % in genome | TE proteins: length (bp) | TE proteins: % in genome | De novo: length (bp) | De novo: % in genome | Combined TEs: length (bp) | Combined TEs: % in genome
DNA | 44,907,141 | 2.57 | 3,638,477 | 0.20 | 41,761,899 | 2.39 | 81,259,555 | 4.66
LINE | 170,663,721 | 9.79 | 140,023,530 | 8.03 | 581,624,764 | 33.36 | 619,156,475 | 35.51
SINE | 25,759,131 | 1.47 | 0 | 0 | 8,061,060 | 0.46 | 32,226,226 | 1.84
LTR | 22,468,876 | 1.28 | 30,088,483 | 1.72 | 149,994,747 | 8.60 | 159,624,403 | 9.15
Other | 23,680 | 0.001 | 0 | 0 | 0 | 0 | 23,680 | 0.001
Unknown | 0 | 0 | 0 | 0 | 5,653,213 | 0.32 | 5,653,213 | 0.32
Total | 251,569,212 | 14.43 | 173,669,200 | 9.96 | 722,435,038 | 41.44 | 752,340,302 | 43.15
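In Table 3, the combined TE length (752,340,302 bp) is much smaller than the sum of the Repbase, TE-protein and de novo totals (1,147,673,450 bp), because the same bases are often detected by more than one method. Assuming the combined figure is the length of the union of all annotated intervals, a minimal Python sketch of that calculation follows. The interval tuples and function names are hypothetical; in practice the intervals would be parsed from the repeat-annotation output.

```python
from typing import Dict, Iterable, List, Tuple

# (sequence ID, 0-based start, exclusive end); hypothetical input format
Interval = Tuple[str, int, int]


def merged_length(intervals: Iterable[Interval]) -> int:
    """Bases covered by the union of intervals, counting overlaps only once."""
    by_seq: Dict[str, List[Tuple[int, int]]] = {}
    for seq_id, start, end in intervals:
        by_seq.setdefault(seq_id, []).append((start, end))
    total = 0
    for spans in by_seq.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:  # overlaps or touches the current block: extend it
                cur_end = max(cur_end, end)
            else:  # disjoint: close the current block and start a new one
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


def percent_of_genome(length: int, genome_size: int) -> float:
    """Share of the assembly covered, in percent."""
    return 100.0 * length / genome_size


if __name__ == "__main__":
    repbase = [("s1", 0, 100), ("s1", 200, 300)]
    de_novo = [("s1", 50, 150), ("s2", 0, 10)]
    print(merged_length(repbase) + merged_length(de_novo))  # per-method sum: 310
    print(merged_length(repbase + de_novo))  # union, overlap counted once: 260
```

As a check against the tables, `percent_of_genome(672_795_525, 1_671_178_062)` gives about 40.26%, which agrees with the Table 2 total to two decimal places.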

Figure 4.

Distribution of transposable elements (TEs) in our N. atra genome. The TEs include DNA transposons (DNA) and RNA transposons (retrotransposons: LINEs, LTRs and short interspersed nuclear elements (SINEs)). (a) Divergence rates of TEs identified from known sequences. (b) Divergence rates of TEs identified de novo.
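The caption does not state which divergence measure was plotted. TE landscapes are commonly built from Kimura two-parameter (K2P) distances between each repeat copy and its consensus, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of transitions and transversions. Assuming that measure, a minimal Python sketch for one pairwise alignment is shown below. RepeatMasker's own utility scripts can additionally adjust for CpG sites, which this sketch omits, and the function name is hypothetical.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def kimura_2p(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned sequences.

    Columns where either base is not A, C, G or T (gaps, N) are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    w1 = 1.0 - 2.0 * p - q
    w2 = 1.0 - 2.0 * q
    if w1 <= 0.0 or w2 <= 0.0:
        return math.inf  # saturated: the formula is undefined
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


if __name__ == "__main__":
    print(kimura_2p("ACGTACGTAC", "GCGTACGTAT"))  # two transitions in ten sites
```

Separating transitions from transversions is the point of the two-parameter model: transitions occur more often, so a plain mismatch proportion would underestimate older divergence.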

Finally, 29,063 functional genes were annotated. KEGG annotation showed that signal transduction is a prominent functional category among N. atra genes (Figure 5). In the pathway enrichment analysis, the Human Diseases category contained the largest number of pathways, and Environmental Information Processing and Organismal Systems also accounted for relatively large proportions. In the GO annotation, 6,292 genes were assigned to the term "cellular process" and 6,734 to "binding".
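The text does not specify the statistical test behind the KEGG and GO enrichment. A common choice, assumed here, is a one-sided hypergeometric test of whether a gene set contains more genes annotated to a term than expected from the background. A minimal Python sketch using only the standard library follows, with a hypothetical function name. In practice, the resulting p-values are usually also corrected for multiple testing, for example with the Benjamini-Hochberg procedure.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided enrichment p-value P(X >= k) for a hypergeometric X.

    N: genes in the annotated background
    K: background genes annotated to the term or pathway
    n: genes in the query set (drawn from the background)
    k: query genes annotated to the term or pathway
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # math.comb returns 0 when asked for more items than available, so
    # impossible outcomes contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


if __name__ == "__main__":
    print(enrichment_p_value(3, 3, 3, 10))
```

For example, when 3 genes are drawn from a background of 10 that contains 3 term-annotated genes, the probability that all 3 are annotated is 1/120. Exact integer arithmetic avoids overflow for genome-scale backgrounds.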

Figure 5.

Gene annotation information of N. atra. (a) KEGG enrichment of N. atra. (b) GO enrichment of N. atra.

Acknowledgements

Yunnan Agricultural University collected the samples.

Funding Statement

Our project was financially supported by the Guangdong Provincial Key Laboratory of Genome Read and Write (grant no. 2017B030301011). This work was also supported by China National GeneBank (CNGB).

Data availability

The data supporting the findings of this study have been deposited into the CNGB Sequence Archive (CNSA) of China National GeneBank DataBase (CNGBdb) with the accession number CNP0004141. Raw reads are available in the SRA via BioProject PRJNA955401. Additional data are available in the GigaDB repository [25].

Editor’s Note

This paper is part of a series of Data Release papers presenting the genomes of different snake species [26].

Abbreviations

GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; LINE, long interspersed nuclear element; LTR, long terminal repeat; SINE, short interspersed nuclear element; TE, transposable element.

Declarations

Ethics approval and consent to participate

The authors declare that ethical approval was not required for this type of research.

Competing interests

The authors declare no competing financial interests.

Authors’ contributions

GH and FH designed and initiated the project. YW performed DNA extraction, library construction and data analysis. JW wrote the manuscript. All authors read and approved the final manuscript.

References

  • 1. Slowinski JB, Keogh JS. Phylogenetic relationships of elapid snakes based on cytochrome b mtDNA sequences. Mol. Phylogenet. Evol., 2000; 15(1): 157–164. doi: 10.1006/mpev.1999.0725.
  • 2. Chan S. A Field Guide to the Venomous Land Snakes of Hong Kong. Hong Kong: Cosmos Books Ltd., 2006; ISBN 988-211-326-5.
  • 3. Gopalkrishnakone P, Chou LM. Snakes of Medical Importance. Singapore: Venom and Toxins Research Group, 1990; ISBN 9971-62-217-3.
  • 4. Naja atra – General details, taxonomy and biology, venom, clinical effects, treatment, first aid, antivenoms. WCH Clinical Toxinology Resource. University of Adelaide. http://www.toxinology.com/fusebox.cfm?fuseaction=main.snakes.display&id=SN0039. Accessed 18 May 2023.
  • 5. Zhao EM, Adler K. Herpetology of China. United States: Society for the Study of Amphibians and Reptiles, 1993; ISBN 0-916984-28-1.
  • 6. Wang AH, Yang CC. Crystallographic studies of snake venom proteins from Taiwan cobra (Naja naja atra). Cardiotoxin-analogue III and phospholipase A2. J. Biol. Chem., 1981; 256(17): 9279–9282. doi: 10.1016/S0021-9258(19)52542-X.
  • 7. Naja atra. IUCN Red List of Threatened Species, 2014; e.T192109A2040894. doi: 10.2305/IUCN.UK.2014-3.RLTS.T192109A2040894.en.
  • 8. Suryamohan K, Krishnankutty SP, Guillory J et al. The Indian cobra reference genome and transcriptome enables comprehensive identification of venom toxins. Nat. Genet., 2020; 52(1): 106–117. doi: 10.1038/s41588-019-0559-8.
  • 9. Liu B, Cui L, Deng Z et al. Protocols for the assembly and annotation of snake genomes. Protocols.io, 2023. doi: 10.17504/protocols.io.5jyl8j6e9g2w/v2.
  • 10. Liu B, Cui L, Deng Z et al. The genome assembly and annotation of the many-banded krait, Bungarus multicinctus. GigaByte, 2023; 2023: gigabyte82. doi: 10.46471/gigabyte.82.
  • 11. Weisenfeld NI, Kumar V, Shah P et al. Direct determination of diploid genome sequences. Genome Res., 2017; 27(5): 757–767. doi: 10.1101/gr.214874.116.
  • 12. Hu J, Fan J, Sun Z et al. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics, 2020; 36(7): 2253–2255. doi: 10.1093/bioinformatics/btz891.
  • 13. Guan D, McCarthy SA, Wood J et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics, 2020; 36(9): 2896–2898. doi: 10.1093/bioinformatics/btaa025.
  • 14. Simão FA, Waterhouse RM, Ioannidis P et al. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics, 2015; 31(19): 3210–3212. doi: 10.1093/bioinformatics/btv351.
  • 15. Wick RR, Holt KE. Benchmarking of long-read assemblers for prokaryote whole genome sequencing. F1000Research, 2019; 8: 2138. doi: 10.12688/f1000research.21782.4.
  • 16. Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res., 1999; 27(2): 573–580.
  • 17. Xu Z, Wang H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res., 2007; 35(suppl_2): W265–W268.
  • 18. Smit A, Hubley R, Green P. RepeatModeler Open-1.0. 2008–2015. Seattle, USA: Institute for Systems Biology, 2018. Available from: https://www.repeatmasker.org. Last accessed May 2015.
  • 19. Tarailo-Graovac M, Chen N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics, 2009; 25: 4.10.1–4.10.14.
  • 20. Tempel S. Using and understanding RepeatMasker. In: Mobile Genetic Elements. Springer, 2012; pp. 29–51. doi: 10.1007/978-1-61779-603-6_2.
  • 21. Bruna T, Hoff KJ, Lomsadze A et al. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom. Bioinform., 2021; 3(1): lqaa108.
  • 22. Bairoch A, Apweiler R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 2000; 28(1): 45–48. doi: 10.1093/nar/28.1.45.
  • 23. Kanehisa M. KEGG database. Novartis Found. Symp., 2006; 247: 91–101; discussion 101–103, 119–128, 244–252. doi: 10.1002/0470857897.ch8.
  • 24. Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res., 2007; 35(Database issue): D61–D65. doi: 10.1093/nar/gkl842.
  • 25. Wang J, Wu Y, Wang S et al. Supporting data for “The genome assembly and annotation of the Chinese cobra, Naja atra”. GigaScience Database, 2023. doi: 10.5524/102476.
  • 26. Snake Genomes. GigaByte, 2023. doi: 10.46471/GIGABYTE_SERIES_0004.

Article Submission

Jiangang Wang

Assign Handling Editor

Editor: Scott Edmunds

Editor Assess MS

Editor: Hongfang Zhang

Curator Assess MS

Editor: Chris Armit

Review MS

Editor: Somasekar Seshagiri

Reviewer name and names of any other individuals who aided in review: Kushal Suryamohan
Do you understand and agree to our policy of having open and named reviews, and having your review included with the published papers. (If no, please inform the editor that you cannot review this manuscript.) Yes
Is the language of sufficient quality? Yes
Please add additional comments on language quality to clarify if needed
Are all data available and do they match the descriptions in the paper? No
Additional Comments I was not able to access all the data - but I assume it's available under the accession they provide in the paper - it will also be nice to have the gene annotation table as part of a supplementary section in the manuscript
Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples: http://gigadb.org/site/guide Yes
Additional Comments
Is the data acquisition clear, complete and methodologically sound? No
Additional Comments A generic protocol is referenced - I suggest the data collected specific to this snake be stated in the MS
Is there sufficient detail in the methods and data-processing steps to allow reproduction? Yes
Additional Comments
Is there sufficient data validation and statistical analyses of data quality? Not my area of expertise
Additional Comments There is no statistical analysis applicable in the work
Is the validation suitable for this type of data? No
Additional Comments not sure this is applicable
Is there sufficient information for others to reuse this dataset or integrate it with other data? No
Additional Comments not clear - maybe it is - urge the authors to make the gene annotation available as part of the manuscript
Any Additional Overall Comments to the Author The authors report the assembly for Naja atra. They report a genome of 1.34Gb coding for 26,432 functional genes. The work is suitable for the journal. Some suggestions to improve the work to make it more than just a genome announcement paper are noted below. 1. Authors point to a protocols paper to figure out the data generated – it will be useful for them to state what data type and how much was collected 2. Suggest adding numbers that indicate how many scaffolds greater than 5 or 10Mb are present in the assembly and whether that number roughly corresponds to the number of chromosomes seen in elapids 3. They say they identified 26,432 functional genes – are all of these full length protein coding genes? Is there an annotation for toxin genes? Can they state how many toxins they find in the genome given it’s a medically important snake as stated in their introduction.
Recommendation Minor Revision

Review MS

Editor: Peng Zhang

Reviewer name and names of any other individuals who aided in review: Peng Zhang
Do you understand and agree to our policy of having open and named reviews, and having your review included with the published papers. (If no, please inform the editor that you cannot review this manuscript.) Yes
Is the language of sufficient quality? No
Please add additional comments on language quality to clarify if needed
Are all data available and do they match the descriptions in the paper? No
Additional Comments
Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples: http://gigadb.org/site/guide No
Additional Comments
Is the data acquisition clear, complete and methodologically sound? No
Additional Comments
Is there sufficient detail in the methods and data-processing steps to allow reproduction? No
Additional Comments
Is there sufficient data validation and statistical analyses of data quality? No
Additional Comments
Is the validation suitable for this type of data? No
Additional Comments
Is there sufficient information for others to reuse this dataset or integrate it with other data? No
Additional Comments
Any Additional Overall Comments to the Author The article does not meet the requirements of GigaByte for the following reasons, which require improvements: The English writing is of poor quality and the expressions used are colloquial. The data presented in the article are inconsistent. For instance, the genome size is given as 1.34G in the abstract but as 1.67G in the results section. This inconsistency raises concerns among reviewers regarding the reliability of the article. The quality of the genome is very low, as indicated by a high number of contigs and a low BUSCO score, suggesting that the genome is incomplete. The study appears to be unfinished with respect to the N. atra genome. The previously published cobra genome, N. naja, exhibits higher quality: it has a contig N50 of 302.5 kb and is a chromosome-level genome. The use of screenshots as primary figures is inappropriate, and the figure legends are too brief. I recommend using the experimental subject as Figure 1 instead. The methods section lacks a description of the annotation method employed and requires additional details. In summary, my suggestion is to reject the article.
Recommendation Reject (Unsound or Unusable)

Editor Decision

Editor: Hongfang Zhang

Major Revision

Jiangang Wang

Assess Revision

Editor: Hongfang Zhang

Re-Review MS

Editor: Peng Zhang

Comments on revised manuscript The authors have improved their manuscript a little bit. The paper is now more readable. Although I think the paper does not reach my standard, the new cobra genome data presented here is a contribution to the herp community. Since GigaByte focuses on less-complex, stand-alone datasets, the paper may be acceptable.

Editor Decision

Editor: Hongfang Zhang

Final Data Preparation

Editor: Mary-Ann Tuli

Editor Decision

Editor: Hongfang Zhang

Accept

Editor: Scott Edmunds

Editor’s Assessment The Chinese cobra Naja atra is a highly venomous snake and one of the most prevalent cobra species in China. To help better understand the evolution and venom of cobra species, a 1.67Gb reference genome was sequenced, assembled and described in this work. During review, some inconsistencies in the data quality were fixed. With other cobra species already published, these data can be combined with other existing and upcoming snake genome data to reconstruct the evolutionary history of snakes and other reptiles, as well as the genetic basis of venom.

Export to Production

Editor: Scott Edmunds

    Articles from GigaByte are provided here courtesy of Gigascience Press
