Abstract
Environmental DNA (eDNA) metabarcoding is becoming a core tool in ecology and conservation biology, and is being used in a growing number of education, biodiversity monitoring, and public outreach programs in which professional research scientists engage community partners in primary research. Results from eDNA analyses can engage and educate natural resource managers, students, community scientists, and naturalists, but without significant training in bioinformatics, it can be difficult for this diverse audience to interact with eDNA results. Here we present the R package ranacapa, at the core of which is a Shiny web app that helps perform exploratory biodiversity analyses and visualizations of eDNA results. The app requires a taxonomy-by-sample matrix and a simple metadata file with descriptive information about each sample. The app enables users to explore the data with interactive figures and presents results from simple community ecology analyses. We demonstrate the value of ranacapa to two groups of community partners engaging with eDNA metabarcoding results.
Keywords: environmental DNA, data visualization, citizen science, community science, shiny, metabarcoding, education, community ecology
Introduction
The targeted amplification and sequencing of DNA that living organisms shed into their physical environment, termed “environmental DNA (eDNA) metabarcoding,” is revolutionizing microbiology, ecology, and conservation research (Deiner et al., 2017; Taberlet et al., 2012). Sequencing of eDNA extracted from field-collected soil, water, or sediment samples can yield insight into a range of questions, from profiling the composition of ancient plant and animal communities (Pedersen et al., 2015) to monitoring populations of rare or endangered species (Balasingham et al., 2018). As the cost of eDNA metabarcoding declines and sample collection techniques become more streamlined (e.g. Thomas et al., 2018), professional research scientists are increasingly using eDNA metabarcoding as a platform to engage a diversity of community partners, including natural resource managers, undergraduate students, and citizen scientists, in primary research. However, developing robust and impactful community science programs that engage community partners in all steps of the research process remains a challenge.
eDNA metabarcoding-based projects work well for programs that partner researchers with community scientists because non-experts can be quickly trained to collect samples in the field, and because eDNA metabarcoding is an exciting framework for research pertinent to disciplines such as medicine, agriculture, ecology, and geography (Deiner et al., 2017). Community partners in such programs can have heterogeneous backgrounds, ranging from curious members of the public for whom collecting samples in the field is their first scientific research experience (e.g. University of California’s CALeDNA program), to professional natural resource managers who regularly collaborate with research scientists (e.g. Center for Ocean Solutions’ eDNA project). A key ingredient for the sustained success of such programs is that community partners should be able to engage across multiple stages of the research project, not only in sample collection (European Citizen Science Association, 2015; Pandya, 2012). This can be a challenge for community science programs because although it is relatively easy to train community partners to collect eDNA samples, it is far more challenging to train them to independently visualize and analyze results from these studies. Indeed, learning the bioinformatic tools necessary for managing the large, multidimensional datasets generated in these studies can be difficult for professional researchers (Carey & Papin, 2018), let alone for the non-technical audience of some community science programs.
To address this challenge, we created the R package “ranacapa”, at the core of which is a Shiny web app that can be used to visualize results from eDNA sequencing studies and perform simple community ecology analyses. ranacapa complements existing visualization platforms (e.g. Phinch (Bik & Phinch Interactive, 2014), Phyloseq-Shiny (McMurdie & Holmes, 2015), QIIME2 Viewer), because in addition to interactive visualizations, ranacapa includes brief explanations of several core analyses used in eDNA studies and includes links to additional educational resources. ranacapa works with community matrices generated via QIIME (Caporaso et al., 2010) or the Anacapa sequence analysis pipeline, the latter being used extensively by the CALeDNA program.
Here, we describe the package and how it is used by two community science partnerships based at the University of California, Los Angeles (UCLA): first, a collaboration between eDNA researchers and resource managers at the National Park Service, and second, a partnership between community ecology researchers and an undergraduate microbiology course at UCLA. As we show in the Use cases, empowering community partners to interact with the data and perform simple but insightful community ecology analyses can help make these collaborations more enriching and valuable to both parties.
Implementation
At the core of ranacapa is a Shiny web app (Chang et al., 2018), which is available at http://gauravsk.shinyapps.io/ranacapa or with ranacapa::runRanacapaApp(). The package also includes two categories of helper functions (Table 1) that transform user-uploaded taxonomy and metadata tables into R objects that can be visualized and analyzed using the Phyloseq (McMurdie & Holmes, 2013) and Vegan (Oksanen et al., 2018) packages. ranacapa is available for installation from GitHub or CRAN:
devtools::install_github("gauravsk/ranacapa")
install.packages("ranacapa")
Table 1. Functions included within the ranacapa package.
Name | Description |
---|---|
scrub_seqNum_column | Removes any "xxx_seq_number" columns from the input taxonomy file if present (depends on which version of Anacapa was used to assign taxonomy) |
scrub_taxon_paths | Replaces empty cells in input taxonomy tables with “Unknown” |
validate_input_files | Verifies that the input taxonomy file and input mapping file meet specifications |
convert_biom_to_taxon_table | Converts a phyloseq-imported biom table into an Anacapa-formatted taxonomy table |
group_anacapa_by_taxonomy | Summarizes a site-abundance table from the Anacapa pipeline to each unique taxon |
categorize_continuous_vector | Categorizes a continuous vector into low, medium, and high |
convert_anacapa_to_phyloseq | Converts a site-abundance table from the Anacapa pipeline and the associated metadata file into a phyloseq object |
vegan_otu | Creates a community matrix in the vegan package style using a phyloseq object and an otu_table object |
custom_rarefaction | Rarefies a phyloseq object to a custom sample depth and with a given number of replicates |
pairwise_adonis 1 | Wrapper function for multilevel pairwise comparison |
ggrare 2 | Makes a rarefaction curve using ggplot2 |
runRanacapaApp | Runs the ranacapa Shiny app with tabs for interactive visualizations and statistical analyses |
1 adapted from https://github.com/pmartinezarbizu/pairwiseAdonis (GPL-3 License)
2 adapted from https://github.com/mahendra-mariadassou/phyloseq-extended (GPL-3 License)
The ranacapa Shiny app allows users to interact with eDNA results through statistical summaries and interactive plots, displayed in the following tabs:
• Sequencing depth: Introduces the potential for variation in sequencing depth among samples and explains the basic logic behind rarefying samples in metagenomics studies (Figure 1). Users can rarefy the dataset to a chosen sampling depth, or can proceed through the rest of the app without rarefying samples. The documentation acknowledges recent disagreement regarding the value of rarefying in metabarcoding and eDNA sequencing studies (McMurdie & Holmes, 2014).
• Taxonomy heatmap: Shows the taxon-by-sample matrix as an interactive heatmap made using heatmaply::heatmaply() (Galili et al., 2018), where the color of each cell represents the number of times a given taxon was sequenced in a sample (Figure 2). Users can filter the taxon list by selecting or deselecting specific taxa.
• Taxonomy barplot: Shows the taxon-by-sample matrix as an interactive barplot (Figure 3).
• Alpha diversity plots: Introduces the concept of alpha diversity as the local diversity measured in a single habitat or sample. Users can plot alpha diversity as observed taxon richness or as Shannon diversity per sample, or can group samples according to a variable in the metadata file (Figure 4).
• Alpha diversity statistics: Allows users to choose a variable from the metadata, and generates an alpha diversity ANOVA table according to the user-selected variable. The tab also shows the output from a post-hoc Tukey test.
• Beta diversity plots: Introduces the concept of beta diversity as the turnover in species composition across habitats (or samples). The tab includes an ordination plot generated by phyloseq::plot_ordination(), which in turn uses an ordination object made with phyloseq::ordinate(., method = "PCoA"). Points on the PCoA plot are colored according to a user-selected metadata variable (Figure 5).
The beta diversity plots tab also includes a dendrogram that groups sites based on Ward’s cluster analysis (stats::hclust(distance_object, method = "ward.D2")), where distance_object is made using phyloseq::distance(). For both figures, users can toggle between using Jaccard and Bray-Curtis dissimilarity.
• Beta diversity statistics: Shows results from two statistical tests of species turnover across sites. The first test is a permutational multivariate analysis of variance (PERMANOVA) of taxon turnover across sites, implemented with vegan::adonis(). The second test, implemented with vegan::betadisper(), assesses the homogeneity of multivariate dispersions (variances) among groups: it compares the degree of sample-to-sample variation within habitats (or within other user-selected groups).
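The quantities behind the sequencing depth, alpha diversity, and beta diversity tabs can be illustrated with a small base-R sketch. This is not the code used by the app, which relies on phyloseq and vegan; the toy counts, the helper names (rarefy_counts, pairwise_dissimilarity, bray_curtis, jaccard), and the presence/absence form of the Jaccard index are assumptions made for illustration only.

```r
# Toy taxon-by-sample count matrix (rows = taxa, columns = samples).
# The counts are illustrative only.
counts <- matrix(
  c( 0,  0, 43,  87,
    24, 36, 30,  16,
     0,  0, 16, 177,
     5,  9,  0,   0),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    paste0("taxon_", 1:4),
    c("Arch_point_1", "Arch_point_2",
      "Black_seabass_reef_1", "Black_seabass_reef_2")
  )
)

# Rarefying: subsample each sample, without replacement, to a common depth.
# (Hypothetical helper; depth must not exceed the smallest sample total.)
rarefy_counts <- function(x, depth, seed = 1) {
  set.seed(seed)
  out <- apply(x, 2, function(n) {
    reads <- rep(seq_along(n), times = n)          # one entry per sequenced read
    kept <- reads[sample.int(length(reads), depth)]
    tabulate(kept, nbins = length(n))              # back to per-taxon counts
  })
  rownames(out) <- rownames(x)
  out
}
rarefied <- rarefy_counts(counts, depth = min(colSums(counts)))

# Alpha diversity per sample: observed richness and Shannon diversity.
richness <- colSums(counts > 0)
shannon <- apply(counts, 2, function(n) {
  p <- n[n > 0] / sum(n)
  -sum(p * log(p))
})

# Beta diversity: pairwise dissimilarity between samples.
pairwise_dissimilarity <- function(x, f) {
  n <- ncol(x)
  d <- matrix(0, n, n, dimnames = list(colnames(x), colnames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) d[i, j] <- f(x[, i], x[, j])
  }
  as.dist(d)
}
bray_curtis <- function(a, b) sum(abs(a - b)) / sum(a + b)
# Presence/absence Jaccard (an assumption; abundance-based variants exist).
jaccard <- function(a, b) 1 - sum(a > 0 & b > 0) / sum(a > 0 | b > 0)

bray <- pairwise_dissimilarity(counts, bray_curtis)
jac <- pairwise_dissimilarity(counts, jaccard)

# Ward clustering of samples from a dissimilarity object.
tree <- hclust(jac, method = "ward.D2")
groups <- cutree(tree, k = 2)
```

Printing richness, shannon, bray, and groups shows, for this toy matrix, how the two Arch_point samples separate from the two Black_seabass_reef samples. Real analyses would normally use the phyloseq and vegan functions named in the tab descriptions rather than these hand-written helpers.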
Operation
ranacapa depends on Bioconductor v 3.7, which in turn relies on R v 3.5.0. The Shiny app has been tested on Chrome and Firefox on Windows, Mac-OSX, and Ubuntu.
Input file structure
The ranacapa Shiny app requires two input files. The first requirement is a taxon-by-sample matrix, uploaded either as a rich, dense .biom table, or as a tab-separated .txt file. QIIME2-generated .qza files are not immediately suitable for ranacapa, as they do not contain full taxonomy information. If the taxon-by-sample matrix is uploaded as a .txt file, the file should match the specifications of the output files from the Anacapa eDNA sequence analysis pipeline. In Anacapa output, each row represents a taxonomic identification, and each column (save one) represents the number of times that taxon appears in each sequenced sample. One column, named sum.taxonomy, must contain the taxonomic identification, with taxonomic ranks separated by semicolons, e.g. “Chordata;Actinopteri;Chaetodontiformes;Chaetodontidae;Chaetodon;Chaetodon reticulatus”. A valid input file is structured as follows:
sum.taxonomy Arch_point_1 Arch_point_2 Black_seabass_reef_1 Black_seabass_reef_2
<full path> 0 0 0 0
<full path> 0 0 43 87
<full path> 0 0 0 0
<full path> 0 0 0 0
<full path> 24 36 30 16
<full path> 0 0 0 0
<full path> 0 0 0 0
<full path> 0 0 16 177
<full path> 0 0 0 0
<full path> 0 0 0 0
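As a minimal sketch of how such a table could be read and its sum.taxonomy column split into ranks in base R: the six rank names, the second (partially empty) taxonomy path, and the helper split_taxonomy() are assumptions for illustration and are not taken from the package. Filling empty ranks with "Unknown" mirrors the behaviour described for scrub_taxon_paths, without claiming to reproduce its implementation.

```r
# Read a small tab-separated taxonomy table from inline text.
# The second path is hypothetical and has empty lower ranks.
taxon_table <- read.delim(
  text = paste(
    "sum.taxonomy\tArch_point_1\tBlack_seabass_reef_1",
    "Chordata;Actinopteri;Chaetodontiformes;Chaetodontidae;Chaetodon;Chaetodon reticulatus\t0\t43",
    "Chordata;Actinopteri;;;;\t24\t30",
    sep = "\n"
  ),
  stringsAsFactors = FALSE, check.names = FALSE
)

# Assumed rank order, inferred from the six-part example path in the text.
rank_names <- c("Phylum", "Class", "Order", "Family", "Genus", "Species")

# Split semicolon-separated paths into one column per rank.
# Paths longer than the rank list are truncated; missing or empty
# ranks are filled with "Unknown".
split_taxonomy <- function(paths, ranks = rank_names) {
  parts <- strsplit(paths, ";", fixed = TRUE)
  padded <- lapply(parts, function(p) {
    # strsplit drops a trailing empty field, so pad before trimming
    p <- c(p, rep("", length(ranks)))[seq_along(ranks)]
    p[p == ""] <- "Unknown"
    p
  })
  out <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
  names(out) <- ranks
  out
}

ranks <- split_taxonomy(taxon_table$sum.taxonomy)
```

The resulting ranks data frame has one row per taxonomy path and can be bound to the count columns with cbind() for filtering or grouping at a chosen rank.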
The second requirement is a tab-separated .txt file that contains sample metadata. The first column in the metadata file should match the sample names in the taxonomy table; the remaining columns contain sample information for each of the samples in the taxon-by-site matrix. The metadata should contain categorical variables with two or more categories per variable. A valid metadata file for the taxonomy table above is structured as follows:
Sample Sample_or_Control Island Protection Locality
Black_seabass_reef_1 Sample Anacapa MPA Black_seabass_reef
Arch_point_1 Sample Santa Barbara non-MPA Arch_point
Arch_point_2 Sample Santa Barbara non-MPA Arch_point
Black_seabass_reef_2 Sample Anacapa MPA Black_seabass_reef
The ranacapa function validate_input_files() verifies that both the taxonomy table and the metadata files match structural requirements, which are documented in the function help files.
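The kind of structural checks described here can be sketched in base R as follows. The function check_inputs() and its messages are hypothetical and do not reproduce validate_input_files(); the sketch only tests the requirements stated in the text (a sum.taxonomy column, numeric count columns, and sample names that match between the two files, in any order).

```r
# Hypothetical validation helper; returns a character vector of problems
# (empty when the two tables are structurally consistent).
check_inputs <- function(taxon_table, metadata) {
  problems <- character(0)

  if (!"sum.taxonomy" %in% names(taxon_table)) {
    problems <- c(problems, "taxonomy table has no 'sum.taxonomy' column")
  }

  # Every column other than sum.taxonomy should hold read counts.
  sample_cols <- setdiff(names(taxon_table), "sum.taxonomy")
  is_num <- vapply(taxon_table[sample_cols], is.numeric, logical(1))
  if (any(!is_num)) {
    problems <- c(problems, paste(
      "non-numeric sample columns:",
      paste(sample_cols[!is_num], collapse = ", ")
    ))
  }

  # The first metadata column should list the same samples (order-free).
  meta_samples <- as.character(metadata[[1]])
  missing_meta <- setdiff(sample_cols, meta_samples)
  missing_taxa <- setdiff(meta_samples, sample_cols)
  if (length(missing_meta) > 0) {
    problems <- c(problems, paste(
      "samples without metadata:", paste(missing_meta, collapse = ", ")
    ))
  }
  if (length(missing_taxa) > 0) {
    problems <- c(problems, paste(
      "metadata rows without counts:", paste(missing_taxa, collapse = ", ")
    ))
  }
  problems
}

# Small example tables (placeholder taxonomy paths).
taxon_table <- data.frame(
  sum.taxonomy = c("path_a", "path_b"),
  Arch_point_1 = c(0, 24),
  Black_seabass_reef_1 = c(43, 30),
  stringsAsFactors = FALSE
)
metadata <- data.frame(
  Sample = c("Black_seabass_reef_1", "Arch_point_1"),
  Island = c("Anacapa", "Santa Barbara"),
  stringsAsFactors = FALSE
)

ok <- check_inputs(taxon_table, metadata)                 # no problems
bad <- check_inputs(taxon_table, metadata[1, , drop = FALSE])
```

Returning a vector of messages rather than stopping at the first failure lets a non-technical user see every issue with their files at once.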
Use cases
We expect that researchers with expertise in bioinformatics will use the sequence analysis pipeline of their choice to assign taxonomy to eDNA datasets, and generate clean taxonomy and metadata files that can be visualized in ranacapa. Researchers can share these files with their partners, and emphasize the analyses or visualizations most appropriate to their use case. We now show how ranacapa can facilitate authentic communication between researchers and community partners in two settings.
Use case 1: Partnership between eDNA researchers and natural resource managers
A team of UCLA researchers partnered with resource managers at the Channel Islands National Park Service to assess the potential for eDNA as a biodiversity monitoring tool to supplement time-intensive visual biodiversity surveys in the Southern California Channel Islands (Deiner et al., 2017; Lessios, 1996; Usseglio, 2015). For this partnership, resource managers collected and filtered 30 unique one-liter water samples for eDNA analysis at permanent monitoring sites inside and adjacent to protected areas, and research scientists at UCLA performed eDNA sequencing of the mitochondrial 12S (Miya et al., 2015) and CO1 (Leray et al., 2013) genes, targeting bony fishes, elasmobranchs, and invertebrate taxa. The researchers processed sequences and assigned taxonomy using the Anacapa pipeline, and shared results with the resource managers using the ranacapa Shiny app.
The taxonomy heatmap of species detected using the 12S and CO1 metabarcodes (Figure 2) was the most valuable visualization to this collaboration, because it allowed the resource managers to filter the large observed species list down to a particular set of key taxa that they regularly monitor. The heatmap showed that this pilot study detected 36 of the 70 key metazoans at the species level, and the remaining 34 at the genus, family, or order level. This indicates that eDNA-based studies can provide critical information for ongoing management efforts and provide new insights into the spatial and temporal distributions of these species. The value of ranacapa in this scenario was to quickly sort through long species lists generated by eDNA sequencing to highlight the strengths and weaknesses in using eDNA to monitor diversity in the Channel Islands. The data from this study are packaged as the demo dataset for the ranacapa Shiny app and are available online (Kandlikar et al., 2018a).
Use case 2: Partnership between eDNA researchers and an undergraduate microbiology course
A team of community ecology and environmental DNA researchers in the CALeDNA program collaborated with instructors of a research-based environmental microbiology course at UCLA (Shapiro et al., 2015), in which students used eDNA metabarcoding to study the impact of a local wildfire on the plant and soil microbial community. The goal of this twenty-week course was to provide undergraduate students an authentic experience in basic microbiology and microbial community ecology research. Over the first ten weeks, eDNA researchers on the instructional team sequenced the ITS2 (Gu et al., 2013) and 16S SSU RNA (Caporaso et al., 2012) metabarcoding regions from student-collected soil samples and used the Anacapa pipeline to generate taxon-by-sample tables.
The course instructors used the ranacapa Shiny app to introduce students to the structure of eDNA sequencing results. The students were encouraged to explore data and perform the statistical analyses most pertinent to the hypotheses they had formed at the beginning of the course. A key benefit of using ranacapa was that despite having no prior bioinformatics experience, students could begin exploring the biodiversity in their samples in a matter of minutes by using the online instance of the Shiny app. This allowed the instructors to focus classroom time on biological questions rather than on troubleshooting bioinformatics problems, as had been the case in previous sessions of the course. The course instructors noted that visualizing eDNA data in ranacapa helped students understand the relationships between taxon-by-site matrices and the various metadata they had collected in the field. By significantly reducing the time and difficulty in visualizing basic biodiversity patterns, ranacapa helped students develop and pursue more sophisticated analyses during the remainder of the course, using tools such as STAMP (Parks et al., 2014) and PICRUSt (Langille et al., 2013). The taxonomy tables and metadata files used in this course are available online (Kandlikar et al., 2018b).
Summary and future directions
Metabarcode sequencing of environmental DNA is becoming a key tool in a wide variety of ecological studies, and results from these studies are of interest to a broad audience. Our R package and Shiny app ranacapa helps users conduct exploratory analyses and visualizations on eDNA datasets, and is a step toward more fully engaging participants in all phases of eDNA sequencing-based community science projects.
We propose three avenues for future work with ranacapa. First, we plan to use ranacapa as the primary tool to present eDNA results from hundreds of samples sequenced by the CALeDNA community science program. Second, ranacapa is being integrated into the upcoming undergraduate curriculum module “Pipeline for Undergraduate Microbiome Analysis”, which will be an open-source, comprehensive suite of analysis and data visualization tools for undergraduate researchers. Finally, in the long term, we believe there is great promise in linking ranacapa with packages that connect with APIs of online biodiversity databases (e.g. taxize (Chamberlain & Szöcs, 2013), rinat (Barve & Hart, 2017)). This will help users explore a much wider range of biodiversity questions, for example, by programmatically asking whether their samples include invasive species that are absent from other nearby sites. In sum, tools like ranacapa that allow non-technical audiences to easily interact with results from eDNA sequencing studies have great potential to engage community partners with a wide range of backgrounds and interests in primary research.
Software availability
• A Shiny app, including a dataset generated for demonstrations, is available at https://gauravsk.shinyapps.io/ranacapa
• Source code is available from GitHub: https://github.com/gauravsk/ranacapa
• Archived source code at time of publication: http://dx.doi.org/10.5281/zenodo.1464285 (Kandlikar & Cowen, 2018)
• Software license (GPL-3)
Data availability
Datasets used for the Use cases are available from Figshare:
Dataset 1: Taxon table and metadata file for Channel Islands eDNA samples (mitochondrial 12S and CO1 metabarcodes sequenced) https://doi.org/10.6084/m9.figshare.7199477.v1 (Kandlikar et al., 2018a)
Dataset 2: Taxon table and metadata file for Santa Monica Mountains eDNA samples (16S and plant-ITS metabarcodes sequenced) https://doi.org/10.6084/m9.figshare.7199510.v1 (Kandlikar et al., 2018b)
Both datasets are available under a CC-BY 4.0 license.
Acknowledgments
We thank Sabrina Shirazi, Rachel Turba, Chris Dao, and Keith Mitchell for providing feedback on developmental versions of this package. We also thank Mahendra Mariadassou and Pedro Martinez Arbizu for making the phyloseq-extended and pairwiseAdonis packages openly available with a GPL-3 License.
Funding Statement
GSK and ZJG were supported by the US-NSF Graduate Research Fellowship [DGE No. 1650604]. NJBK was supported by the National Science Foundation [DEB-1644641]. EEC, RSM, and the CALeDNA program are supported by the University of California President’s Research Catalyst Award [CA-16-376437].
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
[version 1; referees: 1 approved
References
- Balasingham KD, Walter RP, Mandrak NE, et al.: Environmental DNA detection of rare and invasive fish species in two Great Lakes tributaries. Mol Ecol. 2018;27(1):112–127. 10.1111/mec.14395
- Barve V, Hart E: Rinat: Access iNaturalist data through apis. 2017.
- Bik, Phinch Interactive: Phinch: An interactive, exploratory data visualization framework for –Omic datasets. bioRxiv. 2014. 10.1101/009944
- Caporaso JG, Kuczynski J, Stombaugh J, et al.: QIIME allows analysis of high-throughput community sequencing data. Nat Methods. 2010;7(5):335–336. 10.1038/nmeth.f.303
- Caporaso JG, Lauber CL, Walters WA, et al.: Ultra-high-throughput microbial community analysis on the Illumina HiSeq and MiSeq platforms. ISME J. 2012;6(8):1621–1624. 10.1038/ismej.2012.8
- Carey MA, Papin JA: Ten simple rules for biologists learning to program. PLoS Comput Biol. 2018;14(1):e1005871. 10.1371/journal.pcbi.1005871
- Chamberlain SA, Szöcs E: taxize: taxonomic search and retrieval in R [version 1; referees: 3 approved]. F1000Res. 2013;2:191. 10.12688/f1000research.2-191.v2
- Chang W, Cheng J, Allaire J, et al.: Shiny: Web application framework for R. 2018.
- Deiner K, Bik HM, Mächler E, et al.: Environmental DNA metabarcoding: Transforming how we survey animal and plant communities. Mol Ecol. 2017;26(21):5872–5895. 10.1111/mec.14350
- European Citizen Science Association: Ten principles of citizen science. 2015.
- Galili T, O’Callaghan A, Sidi J, et al.: heatmaply: an R package for creating interactive cluster heatmaps for online publishing. Bioinformatics. 2018;34(9):1600–1602. 10.1093/bioinformatics/btx657
- Gu W, Song J, Cao Y, et al.: Application of the ITS2 Region for Barcoding Medicinal Plants of Selaginellaceae in Pteridophyta. PLoS One. 2013;8(6):e67818. 10.1371/journal.pone.0067818
- Kandlikar G, Cowen M: gauravsk/ranacapa: First release of ranacapa (Version v1.0.0). Zenodo. 2018. 10.5281/zenodo.1464285
- Kandlikar GS, Gold ZJ, Cowen MC, et al.: Taxon table and metadata file for Channel Islands eDNA samples (mitochondrial 12S and CO1 metabarcodes sequenced). Figshare. 2018a. 10.6084/m9.figshare.7199477.v1
- Kandlikar GS, Gold ZJ, Cowen MC, et al.: Taxon table and metadata file for Santa Monica Mountains eDNA samples (16S and plant-ITS metabarcodes sequenced). Figshare. 2018b. 10.6084/m9.figshare.7199510.v1
- Langille MG, Zaneveld J, Caporaso JG, et al.: Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nat Biotechnol. 2013;31(9):814–821. 10.1038/nbt.2676
- Leray M, Yang JY, Meyer CP, et al.: A new versatile primer set targeting a short fragment of the mitochondrial COI region for metabarcoding metazoan diversity: application for characterizing coral reef fish gut contents. Front Zool. 2013;10:34. 10.1186/1742-9994-10-34
- Lessios HA: Methods for quantifying abundance of marine organisms. In: Methods and techniques of underwater research. (eds. Lang, M. & Baldwin, C.). American Academy of Underwater Sciences (AAUS), 1996;149–157.
- McMurdie PJ, Holmes S: phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data. PLoS One. 2013;8(4):e61217. 10.1371/journal.pone.0061217
- McMurdie PJ, Holmes S: Waste not, want not: why rarefying microbiome data is inadmissible. PLoS Comput Biol. 2014;10(4):e1003531. 10.1371/journal.pcbi.1003531
- McMurdie PJ, Holmes S: Shiny-phyloseq: Web application for interactive microbiome analysis with provenance tracking. Bioinformatics. 2015;31(2):282–283. 10.1093/bioinformatics/btu616
- Miya M, Sato Y, Fukunaga T, et al.: MiFish, a set of universal PCR primers for metabarcoding environmental DNA from fishes: detection of more than 230 subtropical marine species. R Soc Open Sci. 2015;2(7):150088. 10.1098/rsos.150088
- Oksanen J, Blanchet FG, Friendly M, et al.: Vegan: Community ecology package. 2018.
- Pandya RE: A framework for engaging diverse communities in citizen science in the US. Front Ecol Environ. 2012;10(6):314–317. 10.1890/120007
- Parks DH, Tyson GW, Hugenholtz P, et al.: STAMP: statistical analysis of taxonomic and functional profiles. Bioinformatics. 2014;30(21):3123–3124. 10.1093/bioinformatics/btu494
- Pedersen MW, Overballe-Petersen S, Ermini L, et al.: Ancient and modern environmental DNA. Philos Trans R Soc Lond B Biol Sci. 2015;370(1660):20130383. 10.1098/rstb.2013.0383
- Shapiro C, Moberg-Parker J, Toma S, et al.: Comparing the Impact of Course-Based and Apprentice-Based Research Experiences in a Life Science Laboratory Curriculum. J Microbiol Biol Educ. 2015;16(2):186–197. 10.1128/jmbe.v16i2.1045
- Taberlet P, Coissac E, Hajibabaei M, et al.: Environmental DNA. Mol Ecol. 2012;21(8):1789–1793. 10.1111/j.1365-294X.2012.05542.x
- Thomas AC, Howard J, Nguyen PL, et al.: ANDe: A fully integrated environmental DNA sampling system. Methods Ecol Evol. 2018;9(6):1379–1385. 10.1111/2041-210X.12994
- Usseglio P: Quantifying reef fishes: Bias in observational approaches. In: Ecology of fishes on coral reefs. (ed. Mora, C.). Cambridge University Press, 2015;270–273. 10.1017/CBO9781316105412.035