Version Changes
Revised. Amendments from Version 1
In this revision, we have addressed all the issues raised by the referee. In addition, we have made a few grammatical corrections. We revised the manuscript following a reanalysis of the data, and we have improved the clarity of the Methods regarding genome assembly. Further, in response to the referee's suggestions and as required, we have incorporated six additional references.
Abstract
An Oryza coarctata plant collected from the Sundarban delta of West Bengal, India, was used in the present study to generate a draft genome sequence by hybrid genome assembly of Illumina reads and third-generation Oxford Nanopore reads. We report, for the first time, a draft genome with 85.71% genome coverage, and we have deposited the raw data in NCBI SRA under BioProject ID PRJNA396417.
Keywords: Abiotic stress, Genome assembly, Halophyte, Nanopore, NGS, Salt stress, Wild Oryza, Whole genome sequencing
Introduction
Soil salinity is a major abiotic stress affecting rice cultivation globally (Molla et al., 2015), and the area of rice cultivation under soil salinity stress is gradually increasing. The genetic potential for salt tolerance that exists in natural rice populations has been largely exploited, and alternative useful alleles may further enhance salinity tolerance. Wild species are a potential source of many useful genes and QTLs that may not be present in the primary gene pool of the domesticated species.
Oryza coarctata, known as Asian wild rice, grows naturally in the coastal regions of South-East Asian countries. It flowers and sets seed in saline soils with an electrical conductivity (ECe) as high as 40 dS m−1 (Bal & Dutt, 1986). It is the only halophyte species in the genus Oryza. However, apart from one transcriptomic study (Garg et al., 2014) and one miRNA study (Mondal et al., 2015), no large-scale genomic resources are available for this important species, although several pinitol biosynthesis pathway genes have been cloned for functional genomic studies (Sengupta & Majumder, 2009).
Methods
The plant was collected from its native habitat in the Sundarban delta of West Bengal, India (21°36′N, 88°15′E), and established in our institute net house through clonal propagation.

To determine the genome size, 20 mg of young leaf tissue from net-house-grown plants was chopped into small pieces and stained with propidium iodide (50 μg/ml) containing RNase (BD Science, India), following the protocol of Dolezel et al. (2007). The samples were filtered through a 40-μm mesh sieve (Corning, USA) before analysis on a BD FACSCalibur flow cytometer (BD Biosciences, San Jose, CA, USA). Pisum sativum leaf tissue was used as the reference standard for calculating the genome size.

High-quality genomic DNA was extracted from 100 mg of young leaf tissue of a single plant using the CTAB method (Ganie et al., 2016) and used to prepare the genomic DNA libraries. We used the standard Illumina HiSeq 4000 platform (San Diego, CA, USA) to sequence 151-bp paired-end libraries and four mate-pair libraries with average insert sizes of 2, 4, 6 and 8 kb. To improve the assembly, we also used third-generation sequencing (Oxford Nanopore). Sequencing was performed on a MinION Mk1B (Oxford Nanopore Technologies, Oxford, UK) using a SpotON flow cell (R9.4) with a 48 h sequencing protocol in MinKNOW 1.4.32. Base calling was performed using Albacore, and the base-called reads were processed using poRe version 0.24 (Watson et al., 2015) and poretools version 0.6.0 (Loman & Quinlan, 2014).

The high-quality reads were assembled using PLATANUS v1.2.4 (Kajitani et al., 2014) and SSPACE v3.0 (Boetzer et al., 2011) with default parameters. Simple sequence repeats (SSRs) in each scaffold were identified using the MISA Perl script (Thiel et al., 2003). Gene models were predicted using the ab initio gene predictor AUGUSTUS 3.1 (Stanke & Waack, 2003) and the evidence-based annotation pipeline MAKER v2.31.8 (Campbell et al., 2014), with O. sativa ssp. japonica as the reference gene model.
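The flow-cytometric genome size estimate rests on a simple ratio: the sample's nuclear DNA content is the standard's known DNA content scaled by the ratio of the G0/G1 peak positions. The sketch below illustrates that arithmetic only. The peak values are hypothetical, the 2C value of 9.09 pg is the one listed by Dolezel et al. (2007) for Pisum sativum 'Ctirad' (the pea cultivar used here is not stated), 978 Mbp per pg is the widely used conversion factor, and the sketch assumes the reported genome size is a 1C value, which the text does not specify.

```python
# Ratio method for flow-cytometric genome size estimation (Dolezel et al., 2007).
# Assumptions: peak positions are hypothetical; 9.09 pg is the 2C value listed
# for Pisum sativum 'Ctirad'; 978 Mbp per pg is the usual conversion factor.

PG_TO_MBP = 978.0
PISUM_CTIRAD_2C_PG = 9.09


def sample_2c_pg(sample_peak, standard_peak, standard_2c_pg=PISUM_CTIRAD_2C_PG):
    """2C DNA content (pg) = standard 2C x (sample G0/G1 peak / standard G0/G1 peak)."""
    if sample_peak <= 0 or standard_peak <= 0:
        raise ValueError("peak positions must be positive")
    return standard_2c_pg * sample_peak / standard_peak


def one_c_mbp(two_c_pg):
    """Convert a 2C nuclear DNA content in pg to a 1C genome size in Mbp."""
    return two_c_pg / 2 * PG_TO_MBP


if __name__ == "__main__":
    # Hypothetical peak positions, for illustration only.
    two_c = sample_2c_pg(sample_peak=50.0, standard_peak=100.0)
    print(f"2C = {two_c:.3f} pg; 1C = {one_c_mbp(two_c):.0f} Mbp")
```

Because sample and standard are usually run together in one suspension, only the peak ratio matters; absolute fluorescence units cancel out.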
The protein-coding genes were annotated using a BLAST-based approach against a database of functional plant genes downloaded from NCBI, with Blast2GO (version 4.01) (Conesa & Götz, 2008). Genes with significant hits were assigned Gene Ontology (GO) terms and Enzyme Commission (EC) numbers. InterProScan searches and KEGG pathway analyses were also performed using Blast2GO. Non-coding RNAs, such as miRNAs, tRNAs, rRNAs, snoRNAs and snRNAs, were identified using Infernal v1.1.2 (Nawrocki & Eddy, 2013) with the Rfam database (release 9.1) (Nawrocki et al., 2015) and the snoscan distribution. Transfer RNAs were predicted using tRNAscan-SE v1.23 (Lowe & Eddy, 1997).
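The MISA-based SSR search described in the Methods scans each scaffold for short motifs (1 to 6 bp) repeated in tandem at least a minimum number of times. The following is a minimal sketch of that idea, not MISA itself. It assumes the thresholds commonly cited as MISA's defaults (unit size 1 with 10 repeats, 2 with 6, and 3 to 6 with 5), since the settings used for this assembly are not stated, and it reports only perfect repeats without merging neighbouring repeats into compound SSRs, so its counts would not match MISA exactly.

```python
# Minimal perfect-SSR scanner in the spirit of MISA (Thiel et al., 2003).
# Assumption: thresholds mirror the commonly used MISA defaults
# (1-10, 2-6, 3-5, 4-5, 5-5, 6-5); compound SSRs are not merged.

MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
DNA = set("ACGT")


def is_reducible(motif):
    """True if the motif is a repeat of a shorter unit, e.g. 'ATAT' = 'AT' * 2."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, copies) for perfect tandem repeats, left to right."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for k in sorted(min_repeats):
            motif = seq[i:i + k]
            # Skip truncated motifs, ambiguous bases (e.g. N) and reducible units.
            if len(motif) < k or not set(motif) <= DNA or is_reducible(motif):
                continue
            copies = 1
            # Extend while the next k-mer matches the motif exactly.
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            span = copies * k
            # Keep the qualifying repeat that covers the longest stretch.
            if copies >= min_repeats[k] and (best is None or span > best[2] * len(best[1])):
                best = (i, motif, copies)
        if best is not None:
            hits.append(best)
            i += best[2] * len(best[1])  # jump past the reported repeat
        else:
            i += 1
    return hits


if __name__ == "__main__":
    scaffold = "GG" + "AT" * 6 + "CC" + "A" * 10 + "G"  # hypothetical toy scaffold
    print(find_ssrs(scaffold))  # [(2, 'AT', 6), (16, 'A', 10)]
```

Rejecting reducible motifs ensures that a run such as ATATATAT is reported once as a dinucleotide repeat rather than again as a tetranucleotide or hexanucleotide repeat.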
Discussion
O. coarctata (2n = 4x = 48; KKLL genome; Sanchez et al., 2013) is a self-pollinated (Sarkar et al., 1993) tetraploid plant, with a genome size of approximately 665 Mb as estimated by flow cytometry. The Illumina HiSeq 4000 paired-end libraries generated 123.78 Gb of data, the four mate-pair libraries together generated 36.54 Gb, and Nanopore sequencing generated 6.35 Gb. Hence, we achieved 250.66× depth of the O. coarctata genome. The final assembly comprised 58,362 scaffolds ranging in length from 200 bp to 7,855,609 bp, with an N50 of 1,858,627 bp and a total scaffold length of 569,994,164 bp (around 570 Mb), corresponding to 85.71% of the estimated genome size. The assembly contains only a very small proportion of non-ATGC characters. We also found that 19.89% of the assembled genome is repetitive. In addition, we identified approximately 5,512 non-coding RNAs and around 230,968 SSRs. Gene ontology analysis identified several salt-responsive genes.
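The summary figures above follow from a few standard definitions: average depth is total sequenced bases divided by the estimated genome size, coverage is assembly length as a percentage of that size, and N50 is the scaffold length at which scaffolds of that length or longer contain at least half of the assembly. The sketch below encodes these definitions; the function names are illustrative and are not part of PLATANUS, SSPACE or any other tool used in the study. Because the published data volumes and genome size are rounded, the depth computed from them (about 250.6×) may differ slightly from the reported value.

```python
# Illustrative assembly summary statistics (not taken from any assembler).


def sequencing_depth(total_bases_gb, genome_size_gb):
    """Average depth = total sequenced bases / estimated genome size."""
    return total_bases_gb / genome_size_gb


def genome_coverage_pct(assembly_length_bp, genome_size_bp):
    """Percentage of the estimated genome size represented by the assembly."""
    return 100.0 * assembly_length_bp / genome_size_bp


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("empty length list")


def non_acgt_fraction(seq):
    """Fraction of characters that are not A, C, G or T (e.g. N gaps)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base not in "ACGT") / len(seq)


if __name__ == "__main__":
    reads_gb = [123.78, 36.54, 6.35]  # paired-end, mate-pair, Nanopore (as reported)
    print(f"depth ~ {sequencing_depth(sum(reads_gb), 0.665):.2f}x")
    print(f"coverage ~ {genome_coverage_pct(569_994_164, 665_000_000):.2f}%")
```

In practice the scaffold lengths passed to `n50` and the sequences passed to `non_acgt_fraction` would be read from the assembly FASTA file.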
Data availability
Raw sequence data are available at NCBI SRA under the BioProject ID: PRJNA396417.
Acknowledgements
TKM is grateful to Mr Sukdev Nath, who provided the planting material. TRS is thankful to the DST, Govt. of India, for the JC Bose National Fellowship. The authors are thankful to M/S Genotypic Technology Private Limited, Bengaluru, India, for the sequencing work and to M/S BD Biosciences, India, for the flow cytometry work.
Funding Statement
The author(s) declared that no grants were involved in supporting this work.
[version 2; referees: 3 approved]
References
- Bal AR, Dutt SK: Mechanism of salt tolerance in wild rice (Oryza coarctata Roxb). Plant Soil. 1986;92(3):399–404. 10.1007/BF02372487
- Boetzer M, Henkel CV, Jansen HJ, et al.: Scaffolding pre-assembled contigs using SSPACE. Bioinformatics. 2011;27(4):578–579. 10.1093/bioinformatics/btq683
- Campbell MS, Law M, Holt C, et al.: MAKER-P: a tool kit for the rapid creation, management, and quality control of plant genome annotations. Plant Physiol. 2014;164(2):513–524. 10.1104/pp.113.230144
- Conesa A, Götz S: Blast2GO: A comprehensive suite for functional analysis in plant genomics. Int J Plant Genomics. 2008;619832. 10.1155/2008/619832
- Dolezel J, Greilhuber J, Suda J: Estimation of nuclear DNA content in plants using flow cytometry. Nat Protoc. 2007;2(9):2233–2244. 10.1038/nprot.2007.310
- Ganie SA, Borgohain MJ, Kritika K, et al.: Assessment of genetic diversity of Saltol QTL among the rice (Oryza sativa L.) genotypes. Physiol Mol Biol Plants. 2016;22(1):107–114. 10.1007/s12298-016-0342-6
- Garg R, Verma M, Agrawal S, et al.: Deep transcriptome sequencing of wild halophyte rice, Porteresia coarctata, provides novel insights into the salinity and submergence tolerance factors. DNA Res. 2014;21(1):69–84. 10.1093/dnares/dst042
- Kajitani R, Toshimoto K, Noguchi H, et al.: Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads. Genome Res. 2014;24(8):1384–1395. 10.1101/gr.170720.113
- Loman NJ, Quinlan AR: Poretools: a toolkit for analyzing nanopore sequence data. Bioinformatics. 2014;30(23):3399–3401. 10.1093/bioinformatics/btu555
- Lowe TM, Eddy SR: tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25(5):955–964.
- Molla KA, Debnath AB, Ganie SA, et al.: Identification and analysis of novel salt responsive candidate gene based SSRs (cgSSRs) from rice (Oryza sativa L.). BMC Plant Biol. 2015;15:122. 10.1186/s12870-015-0498-1
- Mondal TK, Ganie SA, Debnath AB: Identification of novel and conserved miRNAs from extreme halophyte, Oryza coarctata, a wild relative of rice. PLoS One. 2015;10(10):e0140675. 10.1371/journal.pone.0140675
- Nawrocki EP, Eddy SR: Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 2013;29(22):2933–2935. 10.1093/bioinformatics/btt509
- Nawrocki EP, Burge SW, Bateman A, et al.: Rfam 12.0: updates to the RNA families database. Nucleic Acids Res. 2015;43(Database issue):D130–7. 10.1093/nar/gku1063
- Sanchez PL, Wing RA, Brar DS: The wild relative of rice: genomes and genomics. In: Zhang Q, Wing RA (eds.), Genetics and Genomics of Rice. Plant Genetics and Genomics: Crops and Models. Springer Science+Business Media, New York. 2013;9–25. 10.1007/978-1-4614-7903-1_2
- Sarkar RH, Samad MA, Seraj ZI, et al.: Pollen tube growth in crosses between Porteresia coarctata and Oryza sativa. Euphytica. 1993;69:129–134. 10.1007/BF00021736
- Sengupta S, Majumder AL: Insight into the salt tolerance factors of a wild halophytic rice, Porteresia coarctata: a physiological and proteomic approach. Planta. 2009;229(4):911–929. 10.1007/s00425-008-0878-y
- Stanke M, Waack S: Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics. 2003;19(Suppl 2):ii215–225. 10.1093/bioinformatics/btg1080
- Thiel T, Michalek W, Varshney RK, et al.: Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor Appl Genet. 2003;106(3):411–422. 10.1007/s00122-002-1031-0
- Watson M, Thomson M, Risse J, et al.: poRe: an R package for the visualization and analysis of nanopore sequencing data. Bioinformatics. 2015;31(1):114–115. 10.1093/bioinformatics/btu590
