GigaByte. 2023 Nov 7;2023:gigabyte97. doi: 10.46471/gigabyte.97

Genome assembly and annotation of the Brown-Spotted Pit viper Protobothrops mucrosquamatus

Xiaotong Niu 1,2, Haorong Lu 3,4, Minhui Shi 1,5, Shiqing Wang 1,5, Yajie Zhou 1,4,*, Huan Liu 1,4,*
PMCID: PMC10644238  PMID: 38023064

Abstract

The Brown-Spotted Pit viper (Protobothrops mucrosquamatus), also known as the Chinese habu, is a widespread and highly venomous snake distributed from northeastern India to eastern China. Genomics research can improve our understanding of venom components and natural selection in vipers. Here, we collected a male P. mucrosquamatus individual in China and sequenced and assembled its genome. The resulting highly contiguous reference genome is 1.53 Gb long and has a repeat content of 41.18%. We identified 24,799 protein-coding genes, 97.97% of which could be functionally annotated. To verify our genome assembly and annotation, we reconstructed a phylogenetic tree from nuclear single-copy genes shared with 11 other vertebrate species, including six reptiles. These results will contribute to future studies on Protobothrops biology and the genetic basis of snake venom.

Introduction

Protobothrops mucrosquamatus, commonly known as the brown-spotted pit viper or Chinese habu, is a member of the family Viperidae. The species is widely distributed in northern Vietnam, Laos, northern Myanmar and northeastern India, as well as in southwestern and eastern China (Figure 1) [1]. P. mucrosquamatus is a venomous snake with tubular venom-conducting fangs and loreal pits. Envenomation by this species impairs the function of the circulatory system of its prey [2]. The maximum venom yield of a single discharge in P. mucrosquamatus is higher than that of other terrestrial venomous snakes such as Trimeresurus stejnegeri, Gloydius blomhoffii and Bungarus multicinctus [3], and its toxicity per unit dose is also higher than that of Deinagkistrodon acutus and T. stejnegeri [3].

Figure 1.

A Brown-Spotted Pit viper (P. mucrosquamatus) individual, photographed by Diancheng Yang in Guilin, Guangxi Province.

Although snake venom can seriously harm health [1, 2, 4–6], it also has biomedical applications [5, 7–9], particularly in antivenom development and disease treatment, among many other fields [10]. High-quality reference genomes and transcriptomes are required to detect venom genes, gain insight into toxin-production mechanisms, and design safe and effective antivenoms and other drugs [11, 12]. Moreover, venom proteins generally evolve rapidly under selective pressures such as predation [13, 14]. Hence, genes encoding proteinaceous venom components provide an excellent model system for studying adaptation and natural selection [15].

Main Content

Context

Although snake venoms are dangerous to human health, they are also a potential gold mine of bioactive proteins that can be harnessed for drug discovery [16]. Snake genomics also has great potential for studying venom evolution and toxicology. Here, we assembled a highly contiguous genome of a male P. mucrosquamatus individual collected from Guilin, Guangxi, China, using single-tube long fragment read (stLFR) [17] and whole genome sequencing (WGS) data. The assembled genome is 1.53 Gb long, and 41.18% of it consists of repeat elements. These data provide new material for future research on the Protobothrops genome and the genetic basis of its venom.

Methods

Detailed step-by-step protocols are available in a protocols.io collection [18] (Figure 2); minor adaptations are described below.

Figure 2.

Protocols.io collection of the standard protocols for sequencing snake genomes [18]. https://dx.doi.org/10.17504/protocols.io.4r3l27ez4g1y/v1

Sample collection and sequencing

A male P. mucrosquamatus individual was captured in Guilin, Guangxi, China. After collection and identification, the specimen was quickly frozen on Drikold dry ice at −80 °C for storage and transport to preserve its DNA and RNA. Heart, stomach, liver and kidney samples were used for RNA sequencing, and a muscle sample was used for stLFR and WGS sequencing. DNA extraction, library construction and sequencing followed the protocols.io protocols [18].

The Institutional Review Board of BGI (BGI-IRB E22017) approved the sample collection, experiments and research design of this study, and all procedures strictly followed the BGI-IRB guidelines.

Genome assembly, annotation and assessment

Supernova (v2.1.1; RRID:SCR_016756) was used to assemble the stLFR sequencing data. Using the WGS data, gaps in this assembly were then filled with GapCloser [19] (v1.12-r6; RRID:SCR_015026) and redundant sequences were removed with redundans (v0.14a) [20], respectively.
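
Gap filling acts on the runs of N that mark unresolved gaps inside scaffolds, and contig-level statistics such as those in Table 2 are commonly derived by breaking scaffolds at these runs. The Python sketch below illustrates that convention; it is not the authors' code, and the minimum gap length (`min_gap`) is a hypothetical parameter, since the threshold used for this assembly is not stated.

```python
import re


def split_into_contigs(scaffold: str, min_gap: int = 1) -> list[str]:
    """Break a scaffold sequence into contigs at runs of N/n.

    Assumption: any run of at least `min_gap` Ns is treated as a gap;
    shorter runs stay inside the contig.
    """
    gap = re.compile(r"[Nn]{%d,}" % min_gap)
    # re.split leaves empty strings when a scaffold starts or ends with a gap
    return [piece for piece in gap.split(scaffold) if piece]


print(split_into_contigs("ACGTNNNNGGCCNAT"))             # ['ACGT', 'GGCC', 'AT']
print(split_into_contigs("ACGTNNNNGGCCNAT", min_gap=2))  # ['ACGT', 'GGCCNAT']
```

Under this convention a scaffold with k internal gaps yields k + 1 contigs, which would explain why the contig counts exceed the scaffold counts in the original columns of Table 2.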

To identify repeat elements in the genome sequence, we combined several tools: Tandem Repeats Finder [21] (v4.09), LTR_Finder (RRID:SCR_015247) [22], RepeatModeler [23] (v1.0.8; RRID:SCR_015027), RepeatMasker [24] (v3.3.0; RRID:SCR_012954) and RepeatProteinMask (v3.3.0) [25]. Protein-coding genes were predicted using several complementary approaches. De novo gene prediction was performed using Augustus (v3.0.3; RRID:SCR_008417) [26]. For transcript evidence, the RNA-seq data were filtered with Trimmomatic (v0.30; RRID:SCR_011848) [27], and transcripts were then assembled from the clean RNA-seq data using Trinity (v2.13.2; RRID:SCR_013048) [28]. The assembled transcripts were aligned to the genome to obtain gene structures using the Program to Assemble Spliced Alignments (PASA; v2.0.2; RRID:SCR_014656) [29]. For homology-based prediction, protein sequences of Pseudonaja textilis, Thamnophis elegans and Notechis scutatus from the UniProt database (release 2020_05) were mapped to the genome using Blastall (v2.2.26) [30] with an E-value cut-off of 1 × 10⁻⁵, and gene models were predicted from the alignments with GeneWise [31] (v2.4.1; RRID:SCR_015054). Finally, the RNA-seq-based, homology-based and de novo predictions were integrated into the final gene set using the MAKER pipeline (v3.01.03; RRID:SCR_005309) [32].
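
Trimmomatic's quality trimming is often run with a sliding-window step that cuts a read once the mean Phred quality within a window falls too low. The exact Trimmomatic settings used here are not reported, so the sketch below is only a simplified illustration of the idea with hypothetical defaults (`window=4`, `threshold=15`); it is not a reimplementation of Trimmomatic, which may place the cut point differently.

```python
def phred33(quality: str) -> list[int]:
    """Decode a FASTQ quality string using the Phred+33 offset."""
    return [ord(ch) - 33 for ch in quality]


def sliding_window_keep(quals: list[int], window: int = 4, threshold: float = 15.0) -> int:
    """Return how many leading bases to keep.

    Scans windows from the 5' end and cuts at the start of the first window
    whose mean quality is below `threshold` (simplified rule).
    """
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < threshold:
            return start
    return len(quals)


quals = phred33("?????#####")      # '?' encodes Q30, '#' encodes Q2
print(sliding_window_keep(quals))  # 4
```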

To annotate gene functions, we searched the predicted proteins against the SwissProt, TrEMBL (RRID:SCR_004426) and Kyoto Encyclopedia of Genes and Genomes (KEGG; RRID:SCR_012773) databases using BLAST, with an E-value cut-off of 1 × 10⁻⁵. Motifs and domains were predicted using InterProScan (v5.52-86.0; RRID:SCR_005829) [37], and gene ontology (GO; RRID:SCR_002811) terms were also assigned to the genes. Together, these analyses indicate the likely roles of the genes in biological processes.
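
The functional annotation step keeps database matches with E-values at or below 1 × 10⁻⁵. A common way to turn BLAST results into one annotation per gene is to keep the highest-scoring passing hit; the sketch below shows this on BLAST's 12-column tabular output (blastall `-m 8`, BLAST+ `-outfmt 6`). The best-hit-by-bit-score rule is an assumption, as the text does not say how multiple hits per gene were resolved, and the gene and protein identifiers are invented.

```python
def best_hits(tabular_lines, max_evalue=1e-5):
    """Map each query to its best subject among hits with E-value <= max_evalue.

    Columns follow the standard 12-column BLAST tabular layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        if evalue > max_evalue:
            continue  # fails the E-value cut-off
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


hits = [
    "gene1\tprotA\t80.0\t100\t5\t0\t1\t100\t1\t100\t1e-30\t200",
    "gene1\tprotB\t60.0\t100\t9\t1\t1\t100\t1\t100\t1e-10\t120",
    "gene2\tprotC\t30.0\t50\t20\t2\t1\t50\t1\t50\t0.01\t40",
]
print(best_hits(hits))  # {'gene1': 'protA'}
```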

Genome completeness was evaluated with Benchmarking Universal Single-Copy Orthologs (BUSCO; v5.2.2; RRID:SCR_015008) in genome mode, using the vertebrata_odb10 lineage dataset [33]. To reconstruct the phylogenetic tree, we used OrthoFinder (v2.3.7; RRID:SCR_017118) [34] to identify single-copy orthologs among the protein sequences of P. mucrosquamatus and Anolis carolinensis (GCA_000090745.2), Chelonia mydas (GCA_015237465.2), Danio rerio (GCA_000002035.4), Deinagkistrodon acutus [35], Gallus gallus (GCA_016699485.1), Homo sapiens (GCA_000001405.29), Mus musculus (GCA_000001635.9), Ophiophagus hannah (GCA_000516915.1), Python bivittatus (GCA_000186305.2), Xenopus tropicalis (GCA_000004195.4) and Alligator mississippiensis (GCA_000281125.4).
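
An orthogroup counts as single-copy when every species contributes exactly one gene to it. OrthoFinder reports per-species gene counts in a tab-separated table (Orthogroups.GeneCount.tsv in the versions we are aware of, with an Orthogroup column, one column per species and a Total column); the sketch below filters such a table. The file layout is stated from our understanding of OrthoFinder's output rather than from this paper, and the species column names are hypothetical.

```python
import csv
import io


def single_copy_orthogroups(gene_count_tsv: str) -> list[str]:
    """Return orthogroups in which every species has exactly one gene."""
    reader = csv.DictReader(io.StringIO(gene_count_tsv), delimiter="\t")
    species = [name for name in reader.fieldnames if name not in ("Orthogroup", "Total")]
    return [
        row["Orthogroup"]
        for row in reader
        if all(int(row[sp]) == 1 for sp in species)
    ]


table = (
    "Orthogroup\tPmuc\tAcar\tGgal\tTotal\n"
    "OG0000001\t1\t1\t1\t3\n"
    "OG0000002\t2\t1\t1\t4\n"
    "OG0000003\t1\t0\t1\t2\n"
)
print(single_copy_orthogroups(table))  # ['OG0000001']
```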

Results

In total, we obtained 224.27 Gb of stLFR linked-read data and 96.93 Gb of WGS short-read data, 321.20 Gb altogether (Table 1).

Table 1.

Summary statistics of P. mucrosquamatus sequenced reads.

Library | Read file | Base number (bp) | GC content (%) | Q20 (%) | Q30 (%)
WGS | fq1 | 52,036,970,400 | 40.30 | 97.58 | 92.48
WGS | fq2 | 52,036,970,400 | 40.23 | 97.98 | 92.71
stLFR | fq1 | 104,698,910,600 | 38.89 | 96.9 | 90.75
stLFR | fq2 | 136,108,583,780 | 41.72 | 97.79 | 91.85
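
Q20 and Q30 in Table 1 are the percentages of bases whose Phred quality is at least 20 or 30, that is, an estimated base-call error rate of at most 1% or 0.1%. A minimal sketch of the calculation from FASTQ quality strings follows, assuming the Phred+33 encoding; the quality strings are made up.

```python
def q20_q30(quality_strings, offset=33):
    """Return (Q20 %, Q30 %) over all bases in the given FASTQ quality strings."""
    total = q20 = q30 = 0
    for quality in quality_strings:
        for ch in quality:
            q = ord(ch) - offset  # Phred score of this base
            total += 1
            q20 += q >= 20
            q30 += q >= 30
    if total == 0:
        raise ValueError("no bases supplied")
    return 100.0 * q20 / total, 100.0 * q30 / total


print(q20_q30(["II5+", "#I"]))  # 'I' = Q40, '5' = Q20, '+' = Q10, '#' = Q2
```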

We produced a highly contiguous P. mucrosquamatus genome assembly with a total size of 1.53 Gb, a GC content of 39.86% and a scaffold N50 of 362.40 kb (Table 2). The longest scaffold is 5.31 Mb, and the 149,173 scaffolds longer than 500 bp have a total length of 1.51 Gb, 98.82% of the entire assembly. We expect this resource to provide new perspectives for the study of viper genomics.

Table 2.

Summary of the features of the P. mucrosquamatus genome.

Statistic | Original: scaffolds | Original: contigs | Contigs >500 bp | Scaffolds >500 bp: scaffolds | Scaffolds >500 bp: contigs
Total number | 203,555 | 287,462 | 192,124 | 149,173 | 232,200
Total length (bp) | 1,530,648,812 | 1,481,196,605 | 1,457,896,424 | 1,512,499,815 | 1,463,075,630
Average length (bp) | 7,519.58 | 5,152.67 | 7,588.31 | 10,139.23 | 6,300.93
N50 length (bp) | 380,005 | 36,547 | 37,585 | 390,274 | 37,334
N90 length (bp) | 2,960 | 2,304 | 2,773 | 3,453 | 2,667
Maximum length (bp) | 5,566,463 | 488,153 | 488,153 | 5,566,463 | 488,153
GC content (%) | 39.86 | 39.86 | 39.79 | 39.8 | 39.8
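
N50 and N90 in Table 2 are the lengths at which the longest sequences, taken in decreasing order of length, first cover 50% and 90% of the total assembly length. The sketch below computes these and the other Table 2 row statistics from a list of sequence lengths; the lengths are toy values, and the strict `> min_length` filter mirrors the "> 500 bp" column headings on the assumption that sequences of exactly 500 bp were excluded.

```python
def nx(lengths, fraction):
    """Length L such that sequences of length >= L cover `fraction` of the total."""
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    covered = 0
    for length in ordered:
        covered += length
        if covered >= target:
            return length
    raise ValueError("no sequences supplied")


def summarize(lengths, min_length=0):
    """Table 2-style summary for sequences longer than min_length."""
    kept = [length for length in lengths if length > min_length]
    return {
        "total_number": len(kept),
        "total_length": sum(kept),
        "average_length": sum(kept) / len(kept),
        "n50": nx(kept, 0.5),
        "n90": nx(kept, 0.9),
        "maximum_length": max(kept),
    }


print(summarize([100, 80, 60, 40, 20]))
print(summarize([100, 80, 60, 40, 20], min_length=50))
```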

Repetitive elements make up 41.18% of our P. mucrosquamatus genome, a proportion very similar to that reported for the previously sequenced Thamnophis elegans (42.02%; accession No. PRJNA561996) and Crotalus tigris (42.31%) [36] genomes. Long interspersed nuclear elements (LINEs) are the largest class, accounting for 32.33% of the assembly (494.92 Mb). The other major transposable element (TE) classes, LTRs, DNA transposons and SINEs, account for 11.50%, 4.94% and 0.80% of the genome, respectively (Figure 3; Tables 3 and 4).
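
The repeat percentages are element lengths divided by the assembly size. As a worked check using the LINE length from Table 4 (494,919,112 bp) and the total scaffold length from Table 2 (1,530,648,812 bp), the short snippet below also shows that expressing the same length in megabases depends on the divisor: 10⁶ gives the decimal megabases conventionally used for genome sizes, while 2²⁰ gives 471.99.

```python
LINE_BP = 494_919_112        # combined LINE length, Table 4
GENOME_BP = 1_530_648_812    # total scaffold length, Table 2

percent = 100 * LINE_BP / GENOME_BP
decimal_mb = LINE_BP / 1e6   # megabases (10^6 bases)
binary_mb = LINE_BP / 2**20  # 2^20 bases, a binary "mega"

print(f"{percent:.2f}% {decimal_mb:.2f} {binary_mb:.2f}")  # 32.33% 494.92 471.99
```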

Figure 3.

Distribution of TEs in our P. mucrosquamatus genome. The TEs include DNA transposons (DNA) and retrotransposons (LINEs, LTRs and SINEs). (a) Distribution of sequence divergence rates for TEs identified de novo. (b) Distribution of sequence divergence rates for known TEs.

Table 3.

Statistics for the repetitive sequences identified in our P. mucrosquamatus genome.

Type | Repeat size (bp) | % of genome
TRF | 48,630,912 | 3.177144
RepeatMasker | 248,960,159 | 16.265008
RepeatProteinMask | 178,699,911 | 11.674782
De novo | 591,205,406 | 38.624497
Total | 630,311,866 | 41.179391

Table 4.

Summary of the TEs in our P. mucrosquamatus genome.

Type | Repbase TEs (bp) | Repbase TEs (% of genome) | TE proteins (bp) | TE proteins (% of genome) | De novo (bp) | De novo (% of genome) | Combined TEs (bp) | Combined TEs (% of genome)
DNA | 54,802,686 | 3.580357 | 2,721,607 | 0.177807 | 23,812,202 | 1.555693 | 75,566,775 | 4.936911
LINE | 173,499,745 | 11.335046 | 145,892,994 | 9.531448 | 446,008,208 | 29.138507 | 494,919,112 | 32.333943
SINE | 11,128,833 | 0.727066 | 0 | 0 | 1,414,004 | 0.092379 | 12,299,674 | 0.80356
LTR | 27,382,417 | 1.788942 | 30,199,813 | 1.973007 | 165,177,572 | 10.791344 | 175,979,322 | 11.497041
Other | 95,860 | 0.006263 | 0 | 0 | 0 | 0 | 95,860 | 0.006263
Total | 248,960,159 | 16.265008 | 178,699,911 | 11.674782 | 588,493,585 | 38.447329 | 618,611,286 | 40.414972
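
In Table 4 the Combined TEs column is smaller than the sum of the Repbase, TE protein and de novo columns, which is consistent with overlapping annotations from different methods being merged so that each base is counted once. We assume that is how the combined values were obtained; the sketch below shows such an interval union on toy coordinates (half-open `[start, end)` intervals grouped by sequence name).

```python
from collections import defaultdict


def union_length(records):
    """Bases covered by (seqid, start, end) half-open intervals, counting overlaps once."""
    by_seq = defaultdict(list)
    for seqid, start, end in records:
        by_seq[seqid].append((start, end))
    covered = 0
    for intervals in by_seq.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start > cur_end:          # disjoint: close the current block
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
            else:                        # overlapping or adjacent: extend it
                cur_end = max(cur_end, end)
        covered += cur_end - cur_start
    return covered


repbase = [("scaf1", 0, 100)]
te_protein = [("scaf1", 50, 150)]
de_novo = [("scaf1", 140, 200), ("scaf2", 0, 50)]
print(union_length(repbase + te_protein + de_novo))  # 250, versus 310 if summed
```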

Combining homology-based, de novo and RNA-seq-based prediction, we identified 24,799 protein-coding genes in our P. mucrosquamatus genome assembly. The average gene is 1.53 bp long and contains 8.96 exons. In addition, 387 miRNAs, 319 tRNAs, 75 rRNAs and 289 snRNAs were predicted (Table 5).
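
Per-gene averages such as mean gene length and exon count are typically computed from the final annotation in GFF3 format. The sketch below counts exon features per transcript through their Parent attribute and averages transcript spans; it assumes one transcript per gene and a MAKER-like GFF3 layout, and the records shown are invented.

```python
def mean_span_and_exons(gff_lines):
    """Return (mean mRNA span in bp, mean exons per mRNA) from GFF3 lines."""
    spans, exons = {}, {}
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        feature, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = dict(kv.split("=", 1) for kv in cols[8].strip(";").split(";") if "=" in kv)
        if feature == "mRNA":
            spans[attrs["ID"]] = end - start + 1  # GFF3 coordinates are 1-based, inclusive
            exons.setdefault(attrs["ID"], 0)
        elif feature == "exon":
            for parent in attrs["Parent"].split(","):
                exons[parent] = exons.get(parent, 0) + 1
    if not spans:
        raise ValueError("no mRNA features found")
    return sum(spans.values()) / len(spans), sum(exons.values()) / len(exons)


gff = [
    "##gff-version 3",
    "scaf1\tmaker\tmRNA\t1\t1000\t.\t+\t.\tID=m1;Parent=g1",
    "scaf1\tmaker\texon\t1\t100\t.\t+\t.\tID=m1.e1;Parent=m1",
    "scaf1\tmaker\texon\t500\t600\t.\t+\t.\tID=m1.e2;Parent=m1",
    "scaf1\tmaker\texon\t900\t1000\t.\t+\t.\tID=m1.e3;Parent=m1",
    "scaf2\tmaker\tmRNA\t1\t500\t.\t-\t.\tID=m2;Parent=g2",
    "scaf2\tmaker\texon\t1\t500\t.\t-\t.\tID=m2.e1;Parent=m2",
]
print(mean_span_and_exons(gff))  # (750.0, 2.0)
```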

Table 5.

Statistics for the miRNA, tRNA, rRNA and snRNA predicted in our P. mucrosquamatus genome.

Type | Copy number | Average length (bp) | Total length (bp) | % of genome
miRNA | 387 | 115.35 | 44,642 | 0.002917
tRNA | 319 | 76.38 | 24,366 | 0.001592
rRNA (all) | 75 | 111.83 | 8,387 | 0.000548
rRNA: 18S | 18 | 141.56 | 2,548 | 0.000166
rRNA: 28S | 52 | 104.33 | 5,425 | 0.000354
snRNA (all) | 289 | 115.70 | 33,436 | 0.002184
snRNA: CD-box | 110 | 90.20 | 9,922 | 0.000648
snRNA: HACA-box | 66 | 144.76 | 9,554 | 0.000624
snRNA: splicing | 98 | 112.17 | 10,993 | 0.000718

Through comparisons with public databases, including InterPro [37], KEGG [38], SwissProt [39], TrEMBL [39] and GO, 24,296 genes (97.97%) were functionally annotated (Table 6).

Table 6.

Results of gene functional annotation.

Annotation source | Number of genes | Percentage
Total | 24,799 | 100%
SwissProt | 21,141 | 85.25%
KEGG | 21,203 | 85.50%
TrEMBL | 23,741 | 95.73%
InterPro | 23,579 | 95.08%
GO | 15,322 | 61.78%
Overall | 24,296 | 97.97%
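
Each percentage in Table 6 is the number of annotated genes divided by the 24,799 predicted genes. The short check below reproduces the reported values from the counts; it only restates the table's arithmetic.

```python
TOTAL_GENES = 24_799
annotated = {
    "SwissProt": 21_141,
    "KEGG": 21_203,
    "TrEMBL": 23_741,
    "InterPro": 23_579,
    "GO": 15_322,
    "Overall": 24_296,
}

# Percentage of all predicted genes with at least one hit in each source
percentages = {source: round(100 * n / TOTAL_GENES, 2) for source, n in annotated.items()}
for source, pct in percentages.items():
    print(f"{source}: {pct:.2f}%")
```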

According to our KEGG analysis, the Environmental Information Processing, Organismal Systems and Metabolism categories account for a large proportion of the annotated pathways, with Signal Transduction being the largest single pathway class. Within Organismal Systems, the immune system (2,445 genes) and endocrine system (2,033 genes) pathways contain the most genes (Figure 4a). In our GO analysis, 7,900 genes were assigned to binding and 7,740 to cellular processes (Figure 4b). Figure 4c shows the overlap between the InterPro, KEGG and SwissProt annotations.

Figure 4.

Gene annotation information of P. mucrosquamatus. (a) KEGG enrichment of P. mucrosquamatus. (b) GO enrichment of P. mucrosquamatus. (c) Venn diagram of InterPro, KEGG and SwissProt annotation results.

Data validation and quality control

We used BUSCO v5.2.2 [40] to evaluate the completeness of our assembly. Against the vertebrata_odb10 dataset, the assembly has a BUSCO completeness of 83.6% (Figure 5).
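
BUSCO condenses its result into a one-line summary of complete (single-copy and duplicated), fragmented and missing percentages out of n searched genes. A minimal parser for that line is sketched below; the example line is invented (only its n = 3,354, to our understanding the size of vertebrata_odb10, is meant to be realistic), and newer BUSCO releases may append further fields, which this regular expression ignores.

```python
import re

SUMMARY = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line: str) -> dict:
    """Parse a BUSCO one-line summary into percentages and the total n."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    result = {key: float(value) for key, value in match.groupdict().items() if key != "n"}
    result["n"] = int(match.group("n"))
    return result


summary = parse_busco_summary("C:90.0%[S:88.5%,D:1.5%],F:4.0%,M:6.0%,n:3354")
complete_genes = round(summary["C"] * summary["n"] / 100)
print(summary["C"], complete_genes)  # 90.0 3019
```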

Figure 5.

BUSCO assessment result of our P. mucrosquamatus genome.

To check the quality of our assembly, we constructed a phylogenetic tree using protein sequences obtained from NCBI and CNGB for seven other amphibian and reptile species (Anolis carolinensis, Chelonia mydas, Deinagkistrodon acutus, Ophiophagus hannah, Python bivittatus, Xenopus tropicalis and Alligator mississippiensis), as well as Gallus gallus, Homo sapiens, Mus musculus and Danio rerio. In total, 1,177 single-copy loci were identified. The relationships among these species shown by the phylogenetic tree are consistent with previous research, supporting the reliability of our data (Figure 6).

Figure 6.

Phylogenetic tree reconstructed using single-copy genes from nuclear genomes. The numbers on the branches of the phylogenetic tree represent the branch length obtained in OrthoFinder.

Reuse potential

These genomic data provide new resources for further studies of viper biology and evolution, as well as the genetic basis of viper venom.

Acknowledgement

Anhui Normal University collected the samples.

Funding Statement

Our project was supported by the China National GeneBank (or CNGB) and the Guangdong Provincial Key Laboratory of Genome Read and Write (grant no. 2017B030301011). This work was also supported by BGI-Shenzhen.

Data availability

The data that support the findings of this study have been deposited into the CNGB Sequence Archive (or CNSA) [41] of China National GeneBank DataBase (or CNGBdb) [42] with the accession number CNP0004048. Raw reads are available in the Sequence Read Archive under the BioProject ID PRJNA943598, and additional data are available in the GigaDB repository [43].

Editor’s note

This paper is part of a series of Data Release papers presenting the reference genomes of different snake species [44].

Abbreviations

BGI-IRB, Institutional Review Board of BGI; BUSCO, Benchmarking Universal Single-Copy Orthologs; GO, gene ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; LINE, long interspersed nuclear element; LTR, long terminal repeat; SINE, short interspersed nuclear element; stLFR, single-tube long fragment read; TE, transposable element; WGS, whole genome sequencing.

Declarations

Ethics approval and consent to participate

The authors declare that ethical approval was not required for this type of research.

Competing interests

The authors declare no competing financial interests.

Authors’ contributions

H Liu designed and initiated the project. H Lu, YZ and MS performed the DNA extraction and the library construction. XN and SW performed the data analysis and wrote the manuscript. All authors read and approved the final manuscript.

References

  • 1. Liu C-C, Wu C-J, Hsiao Y-C et al. Snake venom proteome of Protobothrops mucrosquamatus in Taiwan: delaying venom-induced lethality in a rodent model by inhibition of phospholipase A2 activity with varespladib. J. Proteomics, 2021; 234: 104084. doi: 10.1016/j.jprot.2020.104084.
  • 2. Valenta J, Stach Z, Otahal M. Protobothrops mangshanensis bite: first clinical report of envenoming and its treatment. Biomed. Pap. Med. Fac. Univ. Palacký Olomouc Czech. Repub., 2012; 156(2): 183–185. doi: 10.5507/bp.2012.021.
  • 3. He Y, Sun W-Q, Li J-L et al. Yuanmaotoufu Shedu Duxing Ji Huifu Guilv [Toxicity and recovery of Protobothrops mucrosquamatus Venoms]. Xiandai Shengwu Yixue Jinzhan, 2016; 16(9): 1623–1626.
  • 4. Mao Y-C, Liu P-Y, Chiang L-C et al. Clinical manifestations and treatments of Protobothrops mucrosquamatus bite and associated factors for wound necrosis and subsequent debridement and finger or toe amputation surgery. Clin. Toxicol., 2021; 59(1): 28–37. doi: 10.1080/15563650.2020.1762892.
  • 5. Zeng F, Chen C, Chen X et al. Small incisions combined with negative-pressure wound therapy for treatment of Protobothrops mucrosquamatus bite envenomation: a new treatment strategy. Med. Sci. Monit., 2019; 25: 4495–4502. doi: 10.12659/MSM.913579.
  • 6. Lin C-C, Chen Y-C, Goh ZNL et al. Wound infections of snakebites from the venomous Protobothrops mucrosquamatus and Viridovipera stejnegeri in Taiwan: bacteriology, antibiotic susceptibility, and predicting the need for antibiotics—A BITE study. Toxins, 2020; 12(9): 575. doi: 10.3390/toxins12090575.
  • 7. Offor BC, Muller B, Piater LA. A review of the proteomic profiling of African Viperidae and elapidae snake venoms and their antivenom neutralisation. Toxins, 2022; 14(11): 723. doi: 10.3390/toxins14110723.
  • 8. Whitaker R, Whitaker S. Venom, antivenom production and the medically important snakes of India. Curr. Sci., 2012; 103: 635–643. http://www.jstor.org/stable/24088795.
  • 9. Chippaux J-P, Goyffon M. Venoms, antivenoms and immunotherapy. Toxicon, 1998; 36(6): 823–846. doi: 10.1016/S0041-0101(97)00160-8.
  • 10. Vogel G, Pan H, Chettri B et al. A new species of the genus Protobothrops (Squamata: Viperidae) from Southern Tibet, China and Sikkim, India. Asian Herpetol. Res., 2013; 4(2): 109–115. doi: 10.3724/SP.J.1245.2013.00109.
  • 11. Xin H, Tao P, Demin H et al. A new species of the genus Protobothrops (Squamata: Viperidae: Crotalinae) from the Dabie Mountains, Anhui, China. Asian Herpetol. Res., 2012; 3(3): 213–218. doi: 10.3724/SP.J.1245.2012.00213.
  • 12. Tan CH. Snake venomics: fundamentals, recent updates, and a look to the next decade. Toxins, 2022; 14(4): 247. doi: 10.3390/toxins14040247.
  • 13. Aird SD, Aggarwal S, Villar-Briones A et al. Snake venoms are integrated systems, but abundant venom proteins evolve more rapidly. BMC Genom., 2015; 16: 647. doi: 10.1186/s12864-015-1832-6.
  • 14. Casewell NR, Wüster W, Vonk FJ et al. Complex cocktails: the evolutionary novelty of venoms. Trends Ecol. Evol., 2013; 28(4): 219–229. doi: 10.1016/j.tree.2012.10.020.
  • 15. Aird SD, Arora J, Barua A et al. Population genomic analysis of a pitviper reveals microevolutionary forces underlying venom chemistry. Genome Biol. Evol., 2017; 9(10): 2640–2649. doi: 10.1093/gbe/evx199.
  • 16. Rao W, Kalogeropoulos K, Allentoft ME et al. The rise of genomics in snake venom research: recent advances and future perspectives. GigaScience, 2022; 11: giac024. doi: 10.1093/gigascience/giac024.
  • 17. Wang O, Chin R, Cheng X et al. Efficient and unique cobarcoding of second-generation sequencing reads from long DNA molecules enabling cost-effective and accurate sequencing, haplotyping, and de novo assembly. Genome Res., 2019; 29(5): 798–808. doi: 10.1101/gr.245126.118.
  • 18. Liu B, Cui L, Deng Z et al. The annotation pipeline for the genome of a snake. protocols.io, 2023. doi: 10.17504/protocols.io.4r3l27ez4g1y/v1.
  • 19. Luo R, Liu B, Xie Y et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience, 2012; 1(1): 18. doi: 10.1186/2047-217X-1-18.
  • 20. Pryszcz LP, Gabaldón T. Redundans: an assembly pipeline for highly heterozygous genomes. Nucleic Acids Res., 2016; 44(12): e113. doi: 10.1093/nar/gkw294.
  • 21. Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res., 1999; 27(2): 573–580. doi: 10.1093/nar/27.2.573.
  • 22. Xu Z, Wang H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res., 2007; 35(suppl_2): W265–W268. doi: 10.1093/nar/gkm286.
  • 23. Smit AFA, Hubley R, Green P. RepeatModeler Open-1.0. Seattle, USA: Institute for Systems Biology, 2008–2015; http://www.repeatmasker.org, Last Accessed 01 May 2023.
  • 24. Tarailo-Graovac M, Chen N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics, 2009; 25: 4.10.1–4.10.14. doi: 10.1002/0471250953.bi0410s25.
  • 25. Tempel S. Using and understanding RepeatMasker. In: Mobile Genetic Elements: Protocols and Genomic Applications. Springer, 2012; pp. 29–51. doi: 10.1007/978-1-61779-603-6_2.
  • 26. Stanke M, Steinkamp R, Waack S et al. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res., 2004; 32: W309–W312. doi: 10.1093/nar/gkh379.
  • 27. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 2014; 30(15): 2114–2120. doi: 10.1093/bioinformatics/btu170.
  • 28. Haas BJ, Papanicolaou A, Yassour M et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc., 2013; 8(8): 1494–1512. doi: 10.1038/nprot.2013.084.
  • 29. Haas BJ, Salzberg SL, Zhu W et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol., 2008; 9: R7. doi: 10.1186/gb-2008-9-1-r7.
  • 30. Mount DW. Using the Basic Local Alignment Search Tool (BLAST). Cold Spring Harb. Protoc., 2007; 2007: pdb.top17. doi: 10.1101/pdb.top17.
  • 31. Birney E, Clamp M, Durbin R. GeneWise and Genomewise. Genome Res., 2004; 14(5): 988–995. doi: 10.1101/gr.1865504.
  • 32. Campbell MS, Holt C, Moore B et al. Genome annotation and curation using MAKER and MAKER-P. Curr. Protoc. Bioinformatics, 2014; 48: 4.11.1–4.11.39. doi: 10.1002/0471250953.bi0411s48.
  • 33. Wick RR, Holt KE. Benchmarking of long-read assemblers for prokaryote whole genome sequencing. F1000Research, 2021; 8: 2138. doi: 10.12688/f1000research.21782.4.
  • 34. Emms DM, Kelly S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol., 2015; 16(1): 157. doi: 10.1186/s13059-015-0721-2.
  • 35. Yin W, Wang Z, Li Q et al. Supporting data for “Evolutionary trajectories of snake genes and genomes revealed by comparative analyses of five-pacer viper”. GigaScience Database, 2016. doi: 10.5524/100196.
  • 36. Margres MJ, Rautsaw RM, Strickland JL et al. The Tiger Rattlesnake genome reveals a complex genotype underlying a simple venom phenotype. Proc. Natl. Acad. Sci. USA, 2021; 118(4): e2014634118. doi: 10.1073/pnas.2014634118.
  • 37. Jones P, Binns D, Chang H-Y et al. InterProScan 5: genome-scale protein function classification. Bioinformatics, 2014; 30(9): 1236–1240. doi: 10.1093/bioinformatics/btu031.
  • 38. Kanehisa M. KEGG database. Novartis Found. Symp., 2002; 247: 91–101; discussion 101–3, 119–28, 244–52.
  • 39. Bairoch A, Apweiler R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 2000; 28(1): 45–48. doi: 10.1093/nar/28.1.45.
  • 40. Simão FA, Waterhouse RM, Ioannidis P et al. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics, 2015; 31(19): 3210–3212. doi: 10.1093/bioinformatics/btv351.
  • 41. Guo X, Chen F, Gao F et al. CNSA: a data repository for archiving omics data. Database, 2020; 2020: baaa055. doi: 10.1093/database/baaa055.
  • 42. Chen FZ, You LJ, Yang F et al. CNGBdb: China National GeneBank DataBase. Yi Chuan, 2020; 42(8): 799–809. doi: 10.16288/j.yczz.20-080.
  • 43. Niu X, Zhou Y, Lu H et al. Supporting data for “The genome assembly and annotation of the Brown-Spotted Pit viper Protobothrops mucrosquamatus”. GigaScience Database, 2023. doi: 10.5524/102470.
  • 44. Snake Genomes. GigaByte, 2023. doi: 10.46471/GIGABYTE_SERIES_0004.
GigaByte. 2023 Nov 7;2023:gigabyte97.

Article Submission

Xiaotong Niu

Assign Handling Editor

Editor: Scott Edmunds

Editor Assess MS

Editor: Hongfang Zhang

Curator Assess MS

Editor: Chris Armit

Review MS

Editor: Yasuhiro Go

Reviewer name and names of any other individuals who aided in the review: Yasuhiro Go
Do you understand and agree to our policy of having open and named reviews, and having your review included with the published papers. (If no, please inform the editor that you cannot review this manuscript.) Yes
Is the language of sufficient quality? Yes
Please add additional comments on language quality to clarify if needed
Are all data available and do they match the descriptions in the paper? Yes
Additional Comments
Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples http://gigadb.org/site/guide Yes
Additional Comments
Is the data acquisition clear, complete and methodologically sound? Yes
Additional Comments
Is there sufficient detail in the methods and data-processing steps to allow reproduction? Yes
Additional Comments
Is there sufficient data validation and statistical analyses of data quality? Yes
Additional Comments
Is the validation suitable for this type of data? Yes
Additional Comments
Is there sufficient information for others to reuse this dataset or integrate it with other data? Yes
Additional Comments
Any Additional Overall Comments to the Author
1. The value of repeat element content is 41.18% in the Abstract, but the Main Content value is 38.62%, which is inconsistent with the Abstract value. I would like to see the values be unified into one (Total value?).
2. Figure 1 should show not only a picture of the snake but also its distribution area (habitat).
3. The first sentence of the Result states "224.27 Gb long reads data," but single-tube long fragment read (stLFR) is not a true long read. The term "linked-read" is better.
4. Tables 3 and 4 do not have specific descriptions of "De novo," so please provide more details.
5. The authors use BUSCO to evaluate gene completeness, but I recommend trying compleasm (https://github.com/huangnengCSU/compleasm), a recently improved version of BUSCO.
6. The animals in the parentheses after "For the purpose of checking the quality of our assembly, six other kinds of amphibians and reptiles" in the "Data validation and quality control" section also use animals other than amphibians and reptiles, so please correct the sentence appropriately.
7. Figure 5C needs to be explained in the text.
8. There is no explanation of the meaning of the numbers in the branches of the phylogenetic tree in Figure 7. There needs to be an explanation of how they were obtained.
Recommendation Minor Revision

Review MS

Editor: Chaochao Yan

Reviewer name and names of any other individuals who aided in the review: Yan Chaochao
Do you understand and agree to our policy of having open and named reviews, and having your review included with the published papers. (If no, please inform the editor that you cannot review this manuscript.) Yes
Is the language of sufficient quality? No
Please add additional comments on language quality to clarify if needed A complete English revision is required
Are all data available and do they match the descriptions in the paper? Yes
Additional Comments I did not find description of method for RNA-Seq in manuscript.
Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples http://gigadb.org/site/guide Yes
Additional Comments
Is the data acquisition clear, complete and methodologically sound? Yes
Additional Comments
Is there sufficient detail in the methods and data-processing steps to allow reproduction? Yes
Additional Comments
Is there sufficient data validation and statistical analyses of data quality? Yes
Additional Comments
Is the validation suitable for this type of data? Yes
Additional Comments
Is there sufficient information for others to reuse this dataset or integrate it with other data? Yes
Additional Comments Careful language modifications are needed, particularly in the abstract and introduction sections. Here are some detailed suggestions.
1. "In recent years, P. mucrosquamatus has generated a number of poisonous snake bite cases in southeastern China.". This does not serve as an optimal starting point for the research. To enhance its impact, consider incorporating more specific statistical data, such as the annual number of people bitten or killed by the snake species.
2. "Genomics research can provide much insight in understanding toxin-production mechanisms and natural selection in vipers. ". The term 'toxin-production mechanisms' suggests an intricate process that the genome typically does not reveal directly; instead, it usually provides information about venom components rather than their underlying mechanisms. Additionally, the scope of the following study does not encompass toxin genes and natural selection. The author may consider prioritizing data publication or incorporating relevant analyses to strengthen the research.
3. " Here, we collected a male P.... 41.18% repeat element content." . This sentence is excessively lengthy, requiring revision for improved readability.
4. "97.97% genes could be annotated based on function". The statement 'annotated based on function' is not suitable in this context.
5. "venom‐conducting fangs and cheek fossa", 'cheek fossa' should be 'loreal pit'.
6. "Compared with other terrestrial vipers ...Gloydius blomhoffii and Bungarus multicinctus" need references.
7. " making the study of proteinaceous-venoms coding genes an excellent model system for the study of adaptation and nature selection". model system refer to what? study or venoms?
8. Figure 2: Including a screenshot here is unnecessary; providing the reference should suffice.
9. "After collection and identification, the specimen was quickly frozen in -80°C drikold dry ice during storage and transport in order to maintain high quality for further use." I am somewhat confused about the tissue used here. Additionally, the author should provide information about the Animal Ethics Committee agreement or approval.
10. "the RNAseq data underwent filtration with Trimmomatic". However, there is a lack of information on how the RNAseq was performed. Please include details about the RNAseq methodology.
11. "We produced a high-continuity P. mucrosquamatus genome assembly, with 1.53Gb total genome size,". A scaffold N50 below 500k indicates a genome with lower continuity. Further efforts are needed to improve the genome's quality. Additionally, consider providing information on the genome size calculated by the kmer method.
12. "For the purpose of checking the quality of our assembly, " I would recommend comparing quality indicators, such as N50 and Gene Number, with those of other genomes.
Any Additional Overall Comments to the Author This study presents the genome of the venomous snake Protobothrops mucrosquamatus, utilizing stLFR and WGS data. The research further assesses the genome's quality and performs repeat and gene annotation. As this journal prioritizes data publication, an evaluation of innovativeness is not conducted. However, the current genome quality does not meet modern standards, further efforts are needed to improve the genome's quality. Moreover, It would be beneficial to conduct an analysis of the toxin genes detected in the genome.
Recommendation Accept

Editor Decision

Editor: Hongfang Zhang
GigaByte. 2023 Nov 7;2023:gigabyte97.

Minor Revision

Xiaotong Niu

Assess Revision

Editor: Hongfang Zhang

Final Data Preparation

Editor: Mary-Ann Tuli

Editor Decision

Editor: Hongfang Zhang

Accept

Editor: Scott Edmunds

Editor’s Assessment The Brown-Spotted Pit viper Protobothrops mucrosquamatus, also known as the Chinese habu, is a widespread and highly venomous snake distributed from NE India to Eastern China. To help better understand the evolution of pit vipers, a 1.53 Gb reference genome was sequenced, assembled and described in this work. During review, some inconsistencies in the metrics were fixed. These data can be combined with already published and upcoming snake genome data to reconstruct the evolutionary history of snakes and other reptiles as well as the genetic basis of snake venom.

Export to Production

Editor: Scott Edmunds
