Abstract
The white-sands forests or varillales of the Peruvian Amazon are characterized by their distinct physical characteristics, patchy distribution, and endemism [1, 2]. Much research has been conducted on the specialized plant and animal communities that inhabit these ecosystems, yet their soil microbiomes have yet to be studied. Here we provide metagenomic 16S rDNA amplicon data of soil microbiomes from three types of varillales in Allpahuayo-Mishana National Reserve near Iquitos, Peru. Composite soil samples were collected from very low varillal, high-dry varillal, and high-wet varillal. Purified metagenomic DNA was used to prepare and sequence 16S rDNA metagenomic libraries on the Illumina MiqSeq platform. Raw paired-endsequences were analyzed using the Metagenomics RAST server (MG-RAST) and Parallel-Meta3 software and revealed the existence of a high percentage of undiscovered sequences, potentially indicating specialized bacterial communities in these forests. Also, were predicted several metabolic functions in this dataset. The raw sequence data in fastq format is available in the public repository Discover Mendeley Data (https://data.mendeley.com/datasets/syktzxcnp6/2). Also, is available at NCBI's Sequence Read Archive (SRA) with accession numbers SRX7891206 (very low varillal), SRX7891207 (high-dry varillal), and SRX7891208 (high-wet varillal).
Keywords: Metagenomics, 16S rRNA, Peruvian amazon, Soil microbiome, Tropical forest, Varillales, White-sand forests
Specifications Table
| Subject | Genetics, Genomics and Molecular Biology |
| Specific subject area | Soil Metagenomics |
| Type of data | Figures and 16S rDNA amplicon sequencing data |
| How data were acquired | Soil samples were collected from three varillal forest types of Allpahuayo-Mishana National Reserve. The metagenomic DNA was isolated using standardized protocols, and sequenced on Illumina Miseq platform |
| Data format | Raw data in fastq format were deposited in the public repository Discover Mendeley Data (https://data.mendeley.com/datasets/syktzxcnp6/2). Also, raw data is available in NCBI (https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA611870&o=acc_s%3Aa) |
| Parameters for data collection | Metagenomic DNA isolated from soil samples were prepared by amplifying the V3–V4 region of the 16S rDNA gene paired-end sequenced on an Illumina MiSeq platform. |
| Description of data collection | Filtered sequence reads were analysed using bioinformatics tools (i.e., MG-RAST analysis, Parallel-Meta3 software) of the NGS data. |
| Data source location | Institution: Universidad Nacional de la Amazonia Peruana City/Town/Region: Iquitos/Maynas/Loreto Region Country: Peru Latitude and longitude (and GPS coordinates) for collected samples/data: 1. very low varillal (3°57′54.293"S, 73°26′10.110"W) 2. high-dry varillal (3°58′33.185"S, 73°25′37.165"W) 3. high-wet varillal (3°58′21.535"S, 73°25′54.369"W) |
| Data accessibility | Raw sequencing data are hosted in the public repository Discover Mendeley Data with direct URL to data: https://data.mendeley.com/datasets/syktzxcnp6/2 Also, raw sequencing data is available at NCBI under the BioProject No. PRJNA611870 (https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA611870&o=acc_s%3Aa). SRA accession numbers: SRX7891206 (very low varillal): https://www.ncbi.nlm.nih.gov/sra/SRX7891206 SRX7891207 (high-dry varillal): https://www.ncbi.nlm.nih.gov/sra/SRX7891207 SRX7891208 (high-wet varillal): https://www.ncbi.nlm.nih.gov/sra/SRX7891208 |
Value of the data
-
•
This is the first metagenomic 16S rDNA amplicon data on bacterial profiling and its predicted metabolic functions of varillales in Allpahuayo-Mishana National Reserve of the Peruvian Amazon.
-
•
These data provide valuable information on the bacterial diversity and their metabolic functions of varillales in Allpahuayo-Mishana National Reserve of the Peruvian Amazon.
-
•
Metagenomic 16S rDNA amplicon data revealed a high percentage of undiscovered sequences which may indicate varillales contain specialized bacterial communities.
1. Data Description
The dataset contains raw paired-end sequencing data acquired through the V3–V4 region of the 16S rDNA gene of metagenomic DNA isolated from three type of white-sand forests or varillales. The raw sequencing data contain 297,864 sequences totalling 5,966,319 base pairs with an average length of 200 bp. The data files (reads in FASTQ format) were deposited at the public repository Discover Mendeley Data (https://data.mendeley.com/datasets/syktzxcnp6/2) and the NCBI database (https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA611870&o=acc_s%3Aa) under the BioProject No. PRJNA611870, BioSample accession numbers: SAMN14351537, SAMN14351538, and SAMN14351539; and SRA accession numbers: SRX7891206 (very low varillal), SRX7891207 (high-dry varillal), and SRX7891208 (high-wet varillal). MG-RAST analysis showed that a considerable proportion of sequences were unknown (≈20%). Among the identified sequences, Bacteria (98.4%) and Archaea (0.26%) comprised the majority of the representative kingdoms. The dataset includes data at phylum levels, rarefaction curves and α-diversity results from the very low varillal (Fig. 1), high-dry varillal (Fig. 2), and high-wet varillal (Fig. 3). Additionally, in this dataset were predicted several metabolic functions, such as genetic information processing, carbohydrate metabolism, energy metabolism, etc. (Fig. 4).
Fig. 1.
Phylum levels, rarefaction curves and α-diversity of a very low varillal in Allpahuayo-Mishana National Reserve.
Fig. 2.
Phylum levels, rarefaction curves and α-diversity of a high-dry varillal in Allpahuayo-Mishana National Reserve.
Fig. 3.
Phylum levels, rarefaction curves and α-diversity of a high-wet varillal in Allpahuayo-Mishana National Reserve.
Fig. 4.
Predicted metabolic functions from three types of varillales in Allpahuayo-Mishana National Reserve.
2. Experimental Design, Materials, and Methods
2.1. Sample collection
In this dataset, soil samples were collected from varialles of Allpahuayo-Mishana National Reserve (Supplementary Fig. S1), which is located in a lowland tropical rain forest of the Peruvian Amazon between 130 and 153 m.a.s.l. Soil samples were obtained from three types of varialles as classified by [1]: 1) very low varillal (3°57′54.293``S, 73°26′10.110''W), which is characterized by a high density of small forest trees (height < 5 m) and an organic soil horizon thickness >11 cm ; 2) high-dry varillal (3°58′33.185``S, 73°25′37.165''W), which is characterized by larger forest trees (height >15 m) and an organic soil horizon thickness ≤11 cm; and 3) high-wet varillal (3°58′21.535``S, 73°25′54.369''W), which is also characterized by larger forest trees (height >15 m) but is differentiated by an organic soil horizon thickness >11 cm. Samples were obtained in October 2018 during the high water level season. In order to obtain a representative sample of soil bacterial diversity, thirteen soil cores (10 cm in diameter and 10 cm in depth) were collected in each varillal. The first soil core was designated the reference point for geographic coordinates. The remaining soil cores were sampled at five meter intervals in each cardinal direction with three soil cores obtained in each direction. All thirteen samples from a given reference point were pooled together, homogenized into a composite soil sample per varillal forest type and then passed through a 2 mm meshed sieve (Supplementary Fig. S2). The meshed soil samples were preserved temporarily at −20°C for further studies.
2.2. Metagenomic DNA isolation
Metagenomic DNA was isolated from composite soil samples following the protocol of Devi et al., [3]. In addition, to remove humic and fulvic acids contamination and exclude smaller fragments, partially purified metagenomic DNA was subjected to agarose gel (0.6%) electrophoresis for 30 min at 100 V and DNA fragments >20,000 bp were cut away using a sterile scalpel, placed in 2 mL microtubes, and purified with PureLink™ Quick Gel Extraction Kit (Invitrogen™, Catalog: K210012) following the manufacturer's instructions. Quality and quantity of the purified metagenomic DNA (size approximately to 10,000 bp) were verified by both electrophoretic and spectrophotometric analysis using a NanoDrop 2000 (Thermo Scientific).
2.3. Library preparation and next-generation DNA sequencing
Amplicon libraries were prepared following the 16S Metagenomics Sequencing Library preparation protocol (Part # 15044223 B). First, metagenomic DNA was amplified using primers designed to target 16S rDNA V3 and V4 regions [4]: 16S rDNA Amplicon PCR Forward Primer = 5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG-3′, 16S rDNA Amplicon PCR Reverse Primer = 5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC-3′. These locus-specific primers were synthesized with overhanging Illumina adapter sequences. A second PCR was performed to incorporate multiplexing indices and Illumina sequencing adapters. Amplicon libraries were then purified using 0.8x AMPure XP beads (Beckman Coulter) and size verified on a Bioanalyzer 2100 (Agilent Technologies) using an Agilent High Sensitivity DNA Kit. Libraries were quantified using the Qubit™ dsDNA HS Assay Kit (Thermo Fisher Scientific), normalized, pooled, and paired-end sequenced using the MiSeq Illumina Platform.
2.4. Sequence analysis
Raw paired sequences were uploaded as FASTQ files and analysed using the MG-RAST server v 4.0.3 [5], [6], [7]. Reads obtained after quality control were subjected to taxonomic analysis by comparing with different ribosomal RNA databases using the open and closed-reference Operational Taxonomic Unit (OTU) picking strategy. The OTUs were classified using the Greengene 13_8 16S reference database [8]. Taxonomy assignments were made to each OTU using the RDP classifier [9] and Silvangs [10]. Finally, the sequence coverage by rarefaction analysis and the alpha diversity of species in each varillal was produced by the MG-RAST pipeline. The microbial metabolic pathways were determined based on the 16S rDNA gene data using Parallel-Meta3 software v 3.5.3 [11,12]
Acknowledgments
Authors received financial support from the Universidad Nacional de la Amazonia Peruana through the grants “Isolation of Environmental DNA from Soil and Water Samples of the Loreto Region” (grant ID: RR N° 0189-2018-UNAP) and “Bioprospection of Genes with Biotechnological Potential in Metagenomes of the Loreto Region” (grant ID: RR N° 0421-2019-UNAP). Additionally, we thank the Laboratorio de Biotecnología y Bioenergética, Universidad Científica del Perú (UCP) and the Pritzker Laboratory for Molecular Systematics and Evolution, Field Museum of Natural History for providing necessary infrastructure and equipment. Finally, we thank the Servicio Nacional de Áreas Naturales Protegidas por el Estado (SERNANP) and the High Direction of the Reserva Natural Allpahuayo-Mishana for providing the authorization to collect the soil samples.
Conflict of Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Footnotes
Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.dib.2020.105625.
Appendix. Supplementary materials
References
- 1.Adeney J.M., Christensen N.L., Vicentini A., Cohn‐Haft M. White-sand Ecosystems in Amazonia. Biotropica. 2016;48:7–23. doi: 10.1111/btp.12293. [DOI] [Google Scholar]
- 2.García-Villacorta R., Ahuite-Reátegui M., Olortegui-Zumaeta M. Clasificación de Bosques Sobre Arena Blanca de la Zona Reservada Allpahuayo-Mishana. Folia Amaz. 2003;14:17–33. doi: 10.24841/fa.v14i1.151. [DOI] [Google Scholar]
- 3.Devi S.G., Fathima A.A., Radha S., Arunraj R., Curtis W.R., Ramya M. A Rapid and Economical Method for Efficient DNA Extraction from Diverse Soils Suitable for Metagenomic Applications. PLoS ONE. 2015:10. doi: 10.1371/journal.pone.0132441. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Klindworth A., Pruesse E., Schweer T., Peplies J., Quast C., Horn M., Glöckner F.O. Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencing-based diversity studies. Nucleic Acids Res. 2013;41:e1. doi: 10.1093/nar/gks808. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.MG-RAST, (n.d.). https://www.mg-rast.org/index.html?stay=1 (accessed March 14, 2020).
- 6.Meyer F., Bagchi S., Chaterji S., Gerlach W., Grama A., Harrison T., Paczian T., Trimble W.L., Wilke A. MG-RAST version 4-lessons learned from a decade of low-budget ultra-high-throughput metagenome analysis. Brief. Bioinform. 2019;20:1151–1159. doi: 10.1093/bib/bbx105. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Meyer F., Paarmann D., D'Souza M., Olson R., Glass E., Kubal M., Paczian T., Rodriguez A., Stevens R., Wilke A., Wilkening J., Edwards R. The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics. 2008;9:386. doi: 10.1186/1471-2105-9-386. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.DeSantis T.Z., Hugenholtz P., Larsen N., Rojas M., Brodie E.L., Keller K., Huber T., Dalevi D., Hu P., Andersen G.L. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl. Environ. Microbiol. 2006;72:5069–5072. doi: 10.1128/AEM.03006-05. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Pruesse E., Quast C., Knittel K., Fuchs B.M., Ludwig W., Peplies J., Glöckner F.O. SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res. 2007;35:7188–7196. doi: 10.1093/nar/gkm864. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Wang Q., Garrity G.M., Tiedje J.M., Cole J.R. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl. Environ. Microbiol. 2007;73:5261–5267. doi: 10.1128/AEM.00062-07. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Parallel-META, (n.d.). http://bioinfo.single-cell.cn/parallel-meta.html (accessed March 14, 2020).
- 12.Jing G., Sun Z., Wang H., Gong Y., Huang S., Ning K., Xu J., Su X. Parallel-META 3: Comprehensive taxonomical and functional analysis platform for efficient comparison of microbial communities. Sci. Rep. 2017;7:1–11. doi: 10.1038/srep40371. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.




