Author manuscript; available in PMC: 2016 May 15.
Published in final edited form as: Genesis. 2015 Jul 17;53(8):535–546. doi: 10.1002/dvg.22872

SmedGD 2.0: The Schmidtea mediterranea genome database

Sofia MC Robb 1, Kirsten Gotting 1, Eric Ross 1, Alejandro Sánchez Alvarado 1,*
PMCID: PMC4867232  NIHMSID: NIHMS777829  PMID: 26138588

Abstract

Planarians have emerged as excellent models for the study of key biological processes such as stem cell function and regulation, axial polarity specification, regeneration, and tissue homeostasis, among others. The most widely used organism for these studies is the free-living flatworm Schmidtea mediterranea. In 2007, the Schmidtea mediterranea Genome Database (SmedGD) was first released to provide a much-needed resource for the small but growing planarian community. SmedGD 1.0 has been a depository for genome sequence, a draft assembly, and related experimental data (e.g., RNAi phenotypes, in situ hybridization images, and differential gene expression results). We report here a comprehensive update to SmedGD (SmedGD 2.0) that aims to expand its role as an interactive community resource. The new database includes more recent transcription data and provides tools that enhance interconnectivity between different genome assemblies and transcriptomes, including next-generation assemblies for both the sexual and asexual biotypes of S. mediterranea. SmedGD 2.0 (http://smedgd.stowers.org) provides not only significantly improved gene annotations, but also tools for data sharing, attributes that will help both the planarian and biomedical communities to more efficiently mine the genomics and transcriptomics of S. mediterranea.

Keywords: Schmidtea mediterranea, planaria genome, GMOD, GBrowse

Introduction

Schmidtea mediterranea is a free-living member of the invertebrate Platyhelminthes phylum. Possessing both sexual and asexual biotypes, the diploid, free-living S. mediterranea has emerged as a particularly influential organism due to its remarkable capacity to regenerate complete animals from small body fragments in as little as seven days after amputation. Moreover, S. mediterranea possesses a large and experimentally accessible population of pluripotent stem cells. These attributes, combined with modest maintenance costs, have transformed S. mediterranea into a model system to study a host of fundamental problems such as stem cell biology (Eisenhoffer et al., 2008; Rossi et al., 2014; Rouhana et al., 2012; Rouhana et al., 2014; van Wolfswinkel et al., 2014), organogenesis (Adler et al., 2014; Chen et al., 2013; Forsthoefel et al., 2012; Forsthoefel et al., 2011; Lapan and Reddien, 2012), developmental plasticity (Gavino et al., 2013; Gurley et al., 2010; Roberts-Galbraith and Newmark, 2013), tissue homeostasis (Bender et al., 2012; Gurley et al., 2010; Pearson and Sánchez Alvarado, 2010; Pellettieri et al., 2010; Reddien et al., 2005), germ line formation (Collins et al., 2010; Newmark et al., 2008; Wang et al., 2010; Wang et al., 2007; Xiang et al., 2014), and regeneration (Gavino et al., 2013; Guedelhoefer and Sánchez Alvarado, 2012; Liu et al., 2013; Roberts-Galbraith and Newmark, 2013; Sánchez Alvarado, 2013a; Sikes and Newmark, 2013; Srivastava et al., 2014; Umesono et al., 2013).

As is the case for all flatworms, S. mediterranea belongs to a larger and diverse group of animals known as the Lophotrochozoans or Spiralians, a sister group to the better studied Ecdysozoa (e.g., Drosophila and C. elegans), and Deuterostomes (e.g., non-mammalian and mammalian vertebrates). Even though the flatworms encompass the most biomedically relevant group of animals within this major bilaterian clade (Chen et al., 2013; Collins and Newmark, 2013; Laumer et al., 2015; Sánchez Alvarado, 2015), much of the molecular and developmental biology of the Lophotrochozoa remains unexplored. Additionally, of all invertebrates, flatworms are among the most diverse. Consisting of nearly 50,000 different species (Littlewood and Bray, 2001), their diversity is only matched by the niches they occupy (e.g., fresh and salt water, as well as terrestrial environments), and lifestyles that range from free-living to commensal and parasitic forms (Brusca and Brusca, 2003). Hence, the phylogenetic position of planarians within the superclade of the Lophotrochozoa makes S. mediterranea one of the few experimentally tractable organisms within this remarkably diverse group of animals with which to dissect many fundamental aspects of their remarkable, but generally understudied, biology (Laumer et al., 2015; Sánchez Alvarado, 2015). For instance, S. mediterranea has begun to be used quite effectively as a model for understanding parasitic flatworms. Although marked differences exist between parasitic and free-living Platyhelminthes, they also share many common features. By exploiting the experimental accessibility of S. mediterranea, it has been possible to mechanistically dissect key fundamental aspects of parasitic flatworms (Collins and Newmark, 2013; Collins et al., 2013; Sánchez Alvarado, 2013b; Wang et al., 2013).

The growth of the flatworm community, and the many biological processes being interrogated using planarians as a model system, require the development of tools to access genomic and transcriptomic data. The original Schmidtea mediterranea Genome Database (SmedGD), established in 2007, provided a searchable genome browser for the planarian community (Robb et al., 2008). This initial resource represented a draft genome assembly consisting of 43,295 scaffolds (N50=40,862) and a large number of bioinformatically predicted gene models (31,179), but relatively little transcript evidence supporting such models. Since SmedGD’s inception, not only has the community using SmedGD expanded, with an average of almost 1,000 unique visitors each week, but there has also been significant growth of work in parasitic flatworm genomics and other planarian species (Liu et al., 2013; Sikes and Newmark, 2013). To meet the needs of this expanding community, and to provide a useful resource, we have updated SmedGD.

SmedGD 2.0 not only encompasses the base functions of a searchable genome sequence, but also includes new genome assemblies for both sexual (Sxl) and asexual (Asxl) biotypes, as well as extensive, biotype-specific transcriptome data. A commonly used measure for comparing genome assemblies is N50, the length of the scaffold such that all scaffolds of that size or larger together contain at least half of the total assembly length. By this measure the new Sxl biotype genome assembly is a significant improvement over the original draft assembly (15,334 scaffolds; N50=80,447). Additionally, the Asxl draft genome assembly consists of 15,046 scaffolds (N50=77,506). In both cases, and unlike SmedGD 1.0, only gene models supported by transcription data were retained. The comprehensive nature of the transcriptome data used to help annotate the new genome assemblies allowed us to generate biotype-specific non-redundant gene sets we term Smed Unigenes (Sxl = 26,008; Asxl = 27,813). In addition to more complete transcriptome information and better annotation of gene models, SmedGD 2.0 also incorporates tools for sharing data, and the integration of genomics and transcriptomics data for several Sxl genome assemblies and the Asxl assembly. In its present form SmedGD 2.0 provides a state-of-the-art genome database that will prove essential to the global efforts aimed at understanding the remarkable biology of S. mediterranea in particular, and flatworms in general.
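The N50 measure described above can be illustrated with a minimal Python sketch. The function name `n50` and the toy scaffold lengths are illustrative assumptions; this is not the code used to compute the values reported for the SmedGD assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    account for at least half of the summed length of all scaffolds.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first scaffold where the running sum reaches half the total.
        if 2 * running >= total:
            return length


# Toy example (made-up lengths): the total is 32, and the two
# longest scaffolds (8 + 8 = 16) already cover half of it.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_lengths))  # prints 8
```

Because N50 depends only on the distribution of scaffold lengths, a higher value indicates a more contiguous assembly but says nothing by itself about correctness.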

SmedGD Architecture

The Schmidtea mediterranea Genome Database version 2.0 (SmedGD 2.0) was constructed with components from the Generic Model Organism Database (GMOD) project. The GBrowse 2.0 (Stein, 2013) genome browser is utilized and the majority of the data is stored in a MySQL database (MySQL, 2015) which uses the Bio::SeqFeature::Store schema and data adaptors. The RNASeq data is managed in BAM format and is accessed by the browser using the Bio::DB::Sam adaptor. Bio::DB::Sam and Bio::SeqFeature::Store are part of the Bioperl project (Stajich, 2007).

WordPress (2015), free, open-source software, is used to construct the front and entry page to SmedGD 2.0, PubMed publication pages, and all other non-browser or search interface pages. A free WordPress plugin, PubMed Posts (sydcode, 2014), is used to auto-generate the publication pages from a supplied list of PubMed IDs. Custom Perl scripts were created to assist in searching the databases and consolidating information to create the Smed Unigene Gene Pages. The webserver and databases are hosted on a CentOS Linux virtual machine, allowing for easy expansion of resources as demand dictates. Finally, to help navigate the features and contents of the database, we have prepared a table of acronyms listing the names of file formats, biotype-specific reagents (e.g., genome database, biotypes), and database and software tools (Table 1).

Table 1.

List of acronyms used. The names of file formats, biotype specific reagents (e.g., genome database, biotypes) as well as database and software tools are shown.

Term | Description | URL
GFF3 | Generic Feature Format Version 3 | http://www.sequenceontology.org/gff3.shtml
BED | Flexible way to define the data lines that are displayed in an annotation track | http://genome.ucsc.edu/FAQ/FAQformat#format1
WIG | Wiggle Track Format | http://genome.ucsc.edu/goldenpath/help/wiggle.html
bigWig | Binary format of Wiggle Track Format | http://genome.ucsc.edu/goldenpath/help/bigWig.html
SAM | Sequence Alignment Map Format | https://samtools.github.io/hts-specs/SAMv1.pdf
BAM | Binary format of Sequence Alignment Map Format | https://samtools.github.io/hts-specs/SAMv1.pdf
SmedGD | S. mediterranea Genome Database |
SmedGD 1.0 | S. mediterranea Genome Database version 1 |
SmedGD 2.0 | S. mediterranea Genome Database version 2 |
Smed Unigenes | A consistent S. mediterranea gene set that can be used in multiple genome assemblies |
Smednr | Non-redundant set of putative coding nucleotide sequences from S. mediterranea |
Sxl | S. mediterranea Sexual Biotype (S2F2) |
Asxl | S. mediterranea Asexual Biotype (CIW4) |
GMOD (Generic Model Organism Database) | Open source software tools for managing, visualising, storing, and disseminating genetic and genomic data. | http://gmod.org/wiki/Main_Page
Swissprot/Uniprot | Collection of functional information on proteins, with accurate, consistent and rich annotation. | http://www.uniprot.org/uniprot/
GO (The Gene Ontology) | Collaborative effort to address the need for consistent descriptions of gene products across databases. | http://geneontology.org/
PFAM (Protein Families Database) | Collection of protein families, each represented by multiple sequence alignments and hidden Markov models. | http://pfam.xfam.org/
GBrowse 2.0 | Combination of database and interactive web pages for manipulating and displaying annotations on genomes. | http://gmod.org/wiki/GBrowse
MAKER | Genome annotation pipeline. | http://www.yandell-lab.org/software/maker.html
Bio::SeqFeature::Store | Tool for storage and retrieval of sequence annotation data. | http://www.bioperl.org/wiki/Main_Page
Bioperl | Community effort to produce Perl code which is useful in biology. | http://www.bioperl.org/wiki/Main_Page
WordPress | Free and open-source tool and a content management system based on PHP and MySQL. | https://wordpress.org/
Trinity | Tool for RNA-Seq de novo Assembly. | http://trinityrnaseq.github.io/
Transdecoder | Tool to find candidate coding regions within transcripts. | https://transdecoder.github.io/
CD-HIT | Program for clustering and comparing protein or nucleotide sequences. | http://weizhongli-lab.org/cd-hit/
Seqclean | Tool for validation and trimming of DNA sequences from a flat file database. | http://sourceforge.net/projects/seqclean/files/
BLAST (Basic Local Alignment Search Tool) | Tool to find regions of local similarity between sequences. | http://blast.ncbi.nlm.nih.gov/Blast.cgi
hmmscan version 3.1b1 | Tool for searching protein sequence vs profile-HMM database; a part of HMMER. | http://www.ebi.ac.uk/Tools/hmmer/search/hmmscan
tmhmm version 2.0c | Prediction of transmembrane helices in proteins. | http://www.cbs.dtu.dk/services/TMHMM/
HMMER | Used for searching sequence databases for homologs of protein domains. | http://hmmer.janelia.org/
signalP | Predicts the presence of signal peptide cleavage sites in amino acid sequences. | http://www.cbs.dtu.dk/Services/SignalP/
ncoils | Prediction of coiled-coil secondary structure elements. | http://www.ch.embnet.org/software/coils/COILS_doc.html
MySQL | A freely available Relational Database Management System that uses Structured Query Language (SQL). | https://www.mysql.com/
Bio::DB::Sam | This module provides a Perl interface to the libbam library for SAM/BAM sequence alignment databases. | http://www.bioperl.org/wiki/Main_Page

SmedGD 2.0 contents

Besides the genome assembly and corresponding annotation found in the original SmedGD 1.0, SmedGD 2.0 contains a collection of next-generation sequence genome assemblies for both the sexual and asexual biotypes. To provide continuity between the browsers, much of the same data has been aligned to each of the assemblies now found in SmedGD 2.0. Also new to SmedGD 2.0 are Smed Unigene sequences, as well as extensive RNAseq data.

Pubmed publication pages

To strengthen the sense of community and to demonstrate the impact SmedGD has had on flatworm research, all abstracts of publications that cite SmedGD manuscripts are accessible from the home page. The five most recent publications appear in the scrolling banner, and all publication pages are searchable from the side bar and link to PubMed.

Assembly and annotations

SmedGD 2.0 contains multiple S. mediterranea assemblies and their respective MAKER gene annotations (Campbell et al., 2014). MAKER annotations were generated and are supported with transcriptome and protein homology data (SwissProt/UniProt (UniProt, 2015)). The original and most widely used S. mediterranea assembly, Sexual version 3.1, is still available and now an additional assembly of the Sxl biotype v4.0 and the first assembly of the Asxl biotype v1.1 are also available. SmedGD 2.0 is designed to be a storehouse for future assemblies as well.

Transcriptomes

A variety of assembled transcriptome sequences from both intact and regenerating Asxl and Sxl S. mediterranea animals have been aligned to each genome assembly (see Table 2). These aligned transcripts provide biological relevance and support to the available gene annotations.

RNAseq

RNAseq reads from both Sxl and Asxl animals have been aligned to the various genome assemblies and are useful in assessing the validity and completeness of the gene annotations. RNA expression data from both intact and regenerating animals (Xiang et al., 2014) can be visually compared to identify transcripts that may only be expressed during homeostasis or regeneration, or those that may be specific to one of the two biotypes.

Smed Unigenes

To produce a consistent gene set that can be used in multiple genome assemblies, we created a set of non-redundant sequences we call Smed Unigenes (with identifiers that begin with ‘SMU’). We began with four Trinity (Grabherr, 2011) assembled transcriptomes from S. mediterranea: Intact Whole Animal Asxl worms (GCZZ00000000); Intact Whole Animal Sxl worms (GDAG00000000); pooled samples from an Asxl regeneration time course (unpublished data); and pooled samples from a Sxl regeneration time course (Xiang et al., 2014). We translated each transcriptome assembly into putative coding sequences with Transdecoder, provided by the Trinity suite, and combined the sequences into a single redundant set. To remove the redundancy of the set we used CD-HIT (Fu et al., 2012) to cluster all sequences with an identity of greater than or equal to 95%. The resultant sequence set was then filtered for contaminants with Seqclean (seqclean, 2011). To maximize the utility of the Smed Unigenes, sequences were annotated with the best BLASTx (Camacho et al., 2009) hit to SwissProt (UniProt, 2015) and NCBI’s NR database, using an e-value cutoff of 0.001. We also identified PFAM (Finn et al., 2014) domains present in Smed Unigene sequences using hmmscan, version 3.1b1, from the HMMER package (Finn et al., 2011) with an e-value cutoff of 0.01. We also annotated these transcripts with tmhmm, version 2.0c (Krogh et al., 2001), signalP, version 4.1 (Petersen et al., 2011), and ncoils (Lupas, 1996). Any sequence below 300 base pairs in length that did not have any annotations attached was discarded. The final Smed Unigene set consists of 32,615 sequences with an average nucleotide length of 1,061. Additionally, tools have been generated in SmedGD 2.0 to download the amino acid sequence and the constituent transcript nucleotide sequences.
Besides having been aligned to all available genome assemblies, the Smed Unigene sets have searchable protein domain features along with homology information and a comprehensive gene page.
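Two of the rules in the Smed Unigene procedure, keeping the best BLASTx hit under an e-value cutoff and discarding short unannotated sequences, can be sketched in Python as follows. This is a hedged illustration, not the authors' pipeline: it assumes the BLAST results are in the standard 12-column tabular layout (`-outfmt 6`), it treats "best" as "lowest e-value" and the cutoff as inclusive, and the function names and sequence identifiers are hypothetical.

```python
import csv
import io

EVALUE_CUTOFF = 0.001  # BLASTx e-value cutoff stated in the text
MIN_LENGTH = 300       # base pairs; length threshold stated in the text


def best_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Map each query ID to its lowest-e-value hit at or below the cutoff.

    Assumes 12-column tabular BLAST output:
    column 1 = query ID, column 2 = subject ID, column 11 = e-value.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > cutoff:
            continue  # hit is too weak to annotate with
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def keep_sequence(length, annotations):
    """Discard only sequences that are both short and unannotated."""
    return length >= MIN_LENGTH or bool(annotations)


# Toy input with made-up identifiers and scores.
toy_blast = "\n".join([
    "seqA\thit1\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-50\t200",
    "seqA\thit2\t60.0\t100\t40\t0\t1\t300\t1\t100\t1e-10\t90",
    "seqB\thit3\t30.0\t50\t35\t0\t1\t150\t1\t50\t0.5\t20",
])
hits = best_hits(toy_blast)
print(hits)                                    # only seqA has a hit under the cutoff
print(keep_sequence(250, hits.get("seqB")))    # short and unannotated
print(keep_sequence(250, hits.get("seqA")))    # short but annotated
```

Note that the length filter only removes a sequence when both conditions hold, so a short sequence with any annotation is retained.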

Smednr

Smednr (version: smed_20140614) is a non-redundant set of probable coding nucleotide sequences (with identifiers that begin with ‘SMED’), which were made by clustering a collection of both Sxl and Asxl assembled transcriptomes (See Table 2). The longest sequences from the clusters were chosen as the representative sequences for the clusters, with UTR included. These sequences have also been aligned to each of the three genome assemblies and can be viewed in each of the genome assembly GBrowse instances. This set of sequences is distinct from the Smed Unigenes in that not all Smednr entries are putative coding sequences and some have been included for compatibility with previous unpublished experiments.
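The choice of the longest member as the representative of each cluster can be sketched as below. The data structures (a dictionary of cluster members and a dictionary of sequences) and all identifiers are hypothetical; the clustering itself is not shown, and ties are resolved here by taking the first longest member encountered, which is an assumption.

```python
def pick_representatives(clusters, sequences):
    """Return, for each cluster ID, the ID of its longest member sequence.

    clusters:  dict mapping cluster ID -> list of member sequence IDs
    sequences: dict mapping sequence ID -> nucleotide string
    """
    representatives = {}
    for cluster_id, members in clusters.items():
        # max() keeps the first member seen when several share the top length.
        representatives[cluster_id] = max(
            members, key=lambda seq_id: len(sequences[seq_id])
        )
    return representatives


# Toy data with made-up identifiers and sequences.
toy_sequences = {
    "tx1": "ATGAAATTT",
    "tx2": "ATGAAATTTGGGCCC",
    "tx3": "ATGCCC",
}
toy_clusters = {"cluster1": ["tx1", "tx2"], "cluster2": ["tx3"]}
print(pick_representatives(toy_clusters, toy_sequences))
```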

Table 2.

Descriptions of S. mediterranea transcriptome assemblies aligned to genome assemblies.

Name | Biotype | Description | Alignment Tool
033226_uniq Transcriptome | Sxl & Asxl | Sánchez Alvarado Lab transcriptome | map2assembly
SmedAsxl Intact Transcriptome | Asxl | Trinity assembled transcriptome from SmedAsxl Intacts. Poly-A selected directional reads | map2assembly
SmedSxl Regeneration Transcriptome | Sxl | Trinity assembled transcriptome from SmedSxl pooled regenerating animals. Poly-A selected non-directional reads (Xiang, 2014) | map2assembly
BIMSB Transcripts | Asxl | (Adamidi, 2011) | Tophat
SmedAsxl Regeneration Transcriptome | Asxl | Trinity assembled transcriptome from SmedAsxl pooled regenerating animals. Poly-A selected non-directional reads | map2assembly
SmedSxl Intact Transcriptome | Sxl | Trinity assembled transcriptome from SmedSxl Intacts. Poly-A selected directional reads | map2assembly
ESTs aligned with BLAT | Asxl | dbEST S. mediterranea sequences | BLAT

Protein homology and domains

Protein homology and domain predictions are very informative when assessing the potential function of gene products. Part of the MAKER pipeline for generating gene annotations is to run protein searches against the genome sequence. Each of our genome assemblies has been analyzed in this fashion, and best hits to the UniProt/SwissProt database (UniProt, 2015) have been determined and are displayed on the three genome assembly browsers in SmedGD 2.0.

The amino acid Smed Unigene sequences were used to identify homologs in Homo sapiens, Drosophila melanogaster, Caenorhabditis elegans, and Saccharomyces cerevisiae using Ensembl databases (Cunningham et al., 2015). In addition to homologs, these sequences were used to predict protein domains such as coiled-coil and transmembrane motifs as well as signal peptides and PFAM domains.

Gene ontology

Gene ontology (GO) terms provide a structured controlled vocabulary to describe the molecular function, biological process and cellular localization of gene products (Gene Ontology, 2015). Each Smed Unigene has a collection of associated GO terms that are derived from the GO terms of the best hit to UniProt/SwissProt.
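The transfer of GO terms from the best UniProt/SwissProt hit to each Smed Unigene amounts to a lookup through two mappings. The following Python sketch illustrates the idea under stated assumptions: the function name, the dictionary layout, and every identifier except GO:0000578 (which appears elsewhere in this article) are placeholders rather than real SmedGD records.

```python
def transfer_go_terms(best_hit_by_unigene, go_terms_by_protein):
    """Assign each unigene the GO terms annotated on its best protein hit.

    best_hit_by_unigene: dict mapping unigene ID -> protein accession
    go_terms_by_protein: dict mapping protein accession -> iterable of GO IDs
    Unigenes whose best hit carries no GO annotation get an empty list.
    """
    return {
        unigene: sorted(go_terms_by_protein.get(protein, []))
        for unigene, protein in best_hit_by_unigene.items()
    }


# Toy mappings with placeholder identifiers.
toy_best_hits = {"unigene_1": "PROT_A", "unigene_2": "PROT_B"}
toy_go = {"PROT_A": {"GO:0000578", "GO:TOY0001"}}
print(transfer_go_terms(toy_best_hits, toy_go))
```

One consequence of this design is that the GO annotation of a unigene is only as good as its single best hit; a unigene without a hit under the cutoff receives no terms.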

SmedGD 1.0 Data retained in SmedGD 2.0

A collection of data that was available in SmedGD 1.0 remains accessible in SmedGD 2.0. This data includes RNAi phenotypes (Reddien et al., 2005), microarray expression data (Eisenhoffer et al., 2008), in situ hybridization images, and aligned microRNAs and piRNAs (Friedlander et al., 2009).

Using SmedGD 2.0

Overview

SmedGD 2.0 can be navigated from the top menu bar on the home page. The home page contains a dropdown menu with links to each of the browsers (Smed Unigenes and the three genome assemblies), the search interfaces, and links to the BLAST (Camacho et al., 2009), downloads, help and contact pages.

Genome browser

Separate GBrowse instances for each assembly (S. mediterranea Sexual v3.1 and v4.0, S. mediterranea Asexual v1.1) and for the Smed Unigenes are available for viewing. The genome assembly browsers all have a ‘Smed Unigenes’ track, in which the Smed Unigenes have been aligned to the genome sequence and which provides a link to the Smed Unigene gene page. The Smed Unigene pages, described below, also link back to the genome browser, allowing intuitive crosslinking between the genome assemblies (Figure 1a).

Figure 1.

Smed Unigene page

Panel a: In the Genome browser a pop-up balloon provides a link from each Smed Unigene alignment to the Smed Unigene Gene Page.

Panel b: An example of a Smed Unigene “Gene page”. The header region contains a summary of the Smed Unigene, which includes its identifier, SMU15012603 (links back to the Smed Unigene Browser), the bioinformatically determined putative function, Histone Deacetylase 1, and counts of the transcripts (4) and of the libraries, or RNASeq experiments, (4) by which this Smed Unigene is represented.

Panel c: Smed Unigene Gene Pages provide genome browser links to alignments for that Smed Unigene in each genome assembly.

Panel d: An example of the web page for retrieving the amino acid sequence. Clicking on the button labeled “Get AA Sequence” (green arrow in panel a) will provide the amino acid sequence of the Smed Unigene. Clicking on the button labeled “Get NT Sequence” (panel a) will provide the nucleotide sequences of the transcripts that were clustered to generate this Smed Unigene.

Smed Unigenes Gene Page

The Smed Unigene gene page (Figure 1b) can be accessed from any of the browsers through the links of the individual features of the Smed Unigene alignment track, or from the results of the Smed Unigene Search pages. Each Smed Unigene Gene Page has five sections. The first section (Figure 1a), the header, lists the identifier of the Smed Unigene, which also serves as its link to the Smed Unigene GBrowse instance, the computationally determined putative function, a count of the transcripts that were clustered to generate the Smed Unigene sequence, and links to retrieve the amino acid Smed Unigene sequence (Figure 1a and Figure 1d) and the associated nucleotide transcript sequences. The second section (Figure 1b), “Selected Protein Similarities”, contains a summary of the best hits from a variety of database searches. The third section, “Selected Motif Similarities”, provides a visual overview of the location in the amino acid Smed Unigene sequence of several protein motifs, including coiled-coil and transmembrane domains, signal peptides, and best hits to PFAM domains. The fourth section (Figure 1c), “Transcripts”, lists the transcripts that were used in the clustering algorithm to generate the Smed Unigene and provides a link to alignments of each transcript in the three other assemblies. The fifth and final section, “Associated Gene Ontology Terms”, lists the GO terms that are associated with the protein similarity hits.

Tools

An assortment of custom tools, an intuitive BLAST interface, and standard GBrowse tools have been provided to create an environment in which users can do more than just view the genome. Searches with sequences, terms, or publicly provided data can be performed and saved for later use or to share with colleagues.

BLAST

BLAST is a tool used to identify similar sequences based on amino acid or nucleotide identity. This is particularly helpful in the prediction of homologs. Users are provided with an interface to BLAST (Figure 2) from the menu bar to search nucleotide or protein sequences against each individual genome assembly or against the Smed Unigene amino acid or nucleotide databases, and to link to the matching regions in the appropriate browser. When using BLAST to search the Smed Unigene sequence databases, a link to both the Smed Unigene GBrowse instance and the Smed Unigene Gene Page is provided in the BLAST results table (Figure 2b and Figure 2c).

Figure 2.

BLAST interface

Panel a: BLAST form. The form allows for alignment to sequence databases available in SmedGD 2.0.

Panel b: BLAST result table. When blasting to the Smed Unigenes database, BLAST results are formatted in a table with links to the Smed Unigenes GBrowse, the Smed Unigene Gene Page (green arrow), and to downloads of the amino acid (AA) and nucleotide sequences (NT).

Panel c: An example of the genome browser displaying Smed Unigene BLAST results.

Smed Unigene Protein Search

This search feature, accessible from the main menu, identifies Smed Unigenes that are predicted to contain coiled-coil, transmembrane, and/or signal peptide domains, as well as those that contain specific terms in the descriptions and IDs, or names, of putative homologs and domains (e.g., Description: HDAC, ID: PF00850). The search (Figure 3a) can be limited by amino acid length. All resulting Smed Unigene identifiers (Figure 3b) can be used to link to the Smed Unigene Gene Page.
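The logic of such a search can be illustrated with a small Python sketch. It is not the SmedGD implementation (which, per the architecture section, relies on custom Perl scripts): the record layout, the function name, and the toy entries are hypothetical, and the sketch assumes that every selected motif must be present and that the description match is a case-insensitive substring test.

```python
from dataclasses import dataclass


@dataclass
class UnigeneRecord:
    """Hypothetical flattened view of one unigene's annotations."""
    identifier: str
    description: str
    aa_length: int
    coiled_coil: bool = False
    transmembrane: bool = False
    signal_peptide: bool = False


def search_unigenes(records, text=None, motifs=(), min_length=None, max_length=None):
    """Return identifiers of records matching all supplied criteria."""
    hits = []
    for rec in records:
        # Optional case-insensitive text match against the description.
        if text and text.lower() not in rec.description.lower():
            continue
        # Assumption: every selected motif must be predicted for the record.
        if any(not getattr(rec, motif) for motif in motifs):
            continue
        # Optional bounds on amino acid length.
        if min_length is not None and rec.aa_length < min_length:
            continue
        if max_length is not None and rec.aa_length > max_length:
            continue
        hits.append(rec.identifier)
    return hits


# Toy records with made-up identifiers, descriptions, and lengths.
toy_records = [
    UnigeneRecord("toy_1", "Histone deacetylase (HDAC) domain", 480),
    UnigeneRecord("toy_2", "Membrane transporter", 620, transmembrane=True),
    UnigeneRecord("toy_3", "Secreted HDAC-like protein", 150, signal_peptide=True),
]
print(search_unigenes(toy_records, text="hdac"))
print(search_unigenes(toy_records, text="hdac", min_length=200))
print(search_unigenes(toy_records, motifs=("transmembrane",)))
```

As in the search form described in Figure 3a, a text-only query works without selecting any motif, because an empty `motifs` tuple imposes no constraint.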

Figure 3.

Smed Unigene Protein Search Page

Panel a: Smed Unigene Protein Search. Protein motifs (coiled-coil, transmembrane, and/or signal peptides) can be selected and limited by adding additional search criteria such as protein descriptions, protein names, or amino acid length. A search with just text provided in the Search Protein Descriptions field can be performed without any additional protein domains selected.

Panel b: Smed Unigene Protein Search results page. The search returns a list of Smed Unigenes with their putative function and a link to the Smed Unigene Gene Page.

Smed Unigene GO terms Search

This search can be used to produce a list of all the Smed Unigenes that have an associated GO term (e.g., GO:0000578). Also, text searches (e.g., Embryo) of all GO term names, definitions and synonyms can be utilized to generate a collection of GO terms that are then used to identify all associated Smed Unigenes. All resulting Smed Unigene identifiers can be used to link to the Smed Unigene Gene Page.
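The two-step text search, first collecting GO terms whose name, definition, or synonyms match the query and then collecting the unigenes annotated with any of those terms, can be sketched in Python as follows. The function names, the dictionary layout, and the toy records are assumptions made for illustration; only the identifier GO:0000578 is taken from the text, and its toy name and definition here are not copied from the ontology.

```python
def search_go_terms(query, go_terms):
    """Return the IDs of GO terms whose name, definition, or synonyms contain the query."""
    needle = query.lower()
    matches = set()
    for go_id, record in go_terms.items():
        fields = [record["name"], record["definition"], *record["synonyms"]]
        if any(needle in field.lower() for field in fields):
            matches.add(go_id)
    return matches


def unigenes_for_terms(go_ids, terms_by_unigene):
    """Return the sorted IDs of unigenes annotated with at least one of the GO IDs."""
    return sorted(
        unigene
        for unigene, terms in terms_by_unigene.items()
        if go_ids & set(terms)
    )


# Toy ontology and annotations; names and definitions are placeholders.
toy_go_terms = {
    "GO:0000578": {
        "name": "toy embryonic axis term",
        "definition": "placeholder definition",
        "synonyms": ["embryo axis"],
    },
    "GO:TOY0002": {
        "name": "toy transport term",
        "definition": "placeholder definition",
        "synonyms": [],
    },
}
toy_annotations = {
    "unigene_1": ["GO:0000578"],
    "unigene_2": ["GO:TOY0002"],
    "unigene_3": ["GO:0000578", "GO:TOY0002"],
}
matched = search_go_terms("Embryo", toy_go_terms)
print(matched)
print(unigenes_for_terms(matched, toy_annotations))
```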

Easy retrieval of Smed Unigene (amino acid or nucleotide) sequences

This feature is found in the results page of every Smed Unigene search (Figure 3b), including the BLAST results table (Figure 2b) against the Smed Unigene databases. It is possible to download the resulting Smed Unigene sequences with a single click. Either the amino acid sequence (Figure 1d) or the nucleotide transcript sequences that were utilized to construct the Smed Unigene can be downloaded.

Custom and Community tracks

Custom and Community tracks are built-in features of the GBrowse 2.0 software. Data such as RNAseq, protein homology, mapped transcriptomes, etc. can be uploaded and viewed privately, shared with a select group, or shared with everyone. Making these tools available in SmedGD 2.0 is intended to foster a cooperative community environment. These data, in a variety of formats (GFF3, BED, WIG, bigWig, BAM, SAM, as well as others), can be uploaded or linked to from a remote server through the “Custom Tracks” tab found on each of the SmedGD 2.0 GBrowse instances. When the tracks are made “Public” they will appear in the “Community Tracks” tab. Creation of a user account enables users to return to their uploaded tracks.
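As an illustration of what a small custom track file might look like, the following Python sketch writes features in GFF3, one of the accepted formats listed above. GFF3 lines consist of nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes) with 1-based inclusive coordinates. The helper name, scaffold name, source label, and feature values are hypothetical, and the sketch does not escape reserved characters in attribute values.

```python
def gff3_line(seqid, source, feature_type, start, end,
              score=".", strand=".", phase=".", attributes=None):
    """Format one GFF3 feature line (nine tab-separated columns)."""
    if start < 1 or end < start:
        raise ValueError("GFF3 coordinates are 1-based with start <= end")
    # Attributes are written as semicolon-separated tag=value pairs.
    attr_text = ";".join(f"{key}={value}" for key, value in (attributes or {}).items())
    columns = [seqid, source, feature_type, str(start), str(end),
               score, strand, phase, attr_text or "."]
    return "\t".join(columns)


def gff3_document(lines):
    """Prepend the required version pragma to a list of feature lines."""
    return "\n".join(["##gff-version 3", *lines]) + "\n"


# Toy feature on a made-up scaffold; not a real SmedGD coordinate.
toy_features = [
    gff3_line("scaffold_1", "my_lab", "region", 100, 250,
              strand="+", attributes={"ID": "toy_feature_1", "Name": "toy"}),
]
print(gff3_document(toy_features), end="")
```

The resulting text could be saved to a file and uploaded through the “Custom Tracks” tab; the sequence identifiers in the first column would need to match the scaffold names of the chosen assembly.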

Snapshots

Another built-in feature of the GBrowse 2.0 software is Snapshots, which allows users to save their viewing session so that it can be returned to at a later time or shared with others. The session includes the landmark region, any custom tracks, and all selected tracks. To share a session with another individual, the snapshot must contain the fully resolved URL of each of the visible tracks. The full URL can be acquired by selecting “Bookmark this” from the GBrowse “File” menu. When “Save Snapshot” is then selected, the snapshot will contain the full URL information and can be successfully shared with others.

RNAi Search

The RNAi phenotype search, which was first used in SmedGD 1.0, is still available in SmedGD 2.0. RNAi phenotypes are searchable with terms defined in Reddien et al., 2005.

In situ images

In situ images which illustrate the qualitative mRNA expression pattern of transcripts are available in the SmedSxl v3.1 genome browser by selecting the in situ track. This data is carried over from SmedGD 1.0.

Curation and Future Expansions of SmedGD 2.0

With the accessibility and significant drop in cost of nucleic acid sequencing methods, we anticipate that multiple planarian resources will become available in the near future. For instance, new S. mediterranea assemblies will become available which can be readily integrated into the present architecture of SmedGD 2.0. In addition, independently generated and curated resources are likely to be produced by the growing flatworm community in the near future. Rather than serving as a central depository for these up-and-coming resources, we envision SmedGD 2.0 serving as a hub from which such resources can be accessed and integrated by the community of users.

We also aim to improve the content and utility of SmedGD 2.0 on an ongoing basis. Local planarian communities hold meetings every two years in Asia, North America and Europe, and the community at large also organizes an international planarian meeting every two years. Feedback and input are sought at these meetings, and proposals for introducing new features to SmedGD are considered and decided upon. Currently under development, for instance, are the incorporation of an anatomical ontology-based whole-mount in situ database of gene expression patterns, the integration of chromatin immunoprecipitation sequencing data (ChIP-seq), and variant calling features (e.g., single nucleotide polymorphisms, copy number variations, genome rearrangements). In addition, users can post feedback and make suggestions directly in the SmedGD 2.0 portal. In due course, we expect that once these tools are developed, users will be able to submit data for posting on SmedGD 2.0. In sum, SmedGD 2.0 (http://smedgd.stowers.org) will prove useful not only to the growing community of planarian researchers, but also to those investigators engaged in developmental and evolutionary biology, parasitology, comparative genomics, stem cell research and regeneration.

Acknowledgments

The authors would like to acknowledge past and present members of the Sánchez Alvarado laboratory for beta-testing SmedGD 2.0 and for their comments and suggestions to improve and enhance its usability. We would also like to thank Chris Seidel for his valuable contributions to the Smednr dataset. ASA is a Howard Hughes Medical Institute Investigator and an Investigator of the Stowers Institute for Medical Research.

References

  1. Adler CE, Seidel CW, McKinney SA, Sánchez Alvarado A. Selective amputation of the pharynx identifies a FoxA-dependent regeneration program in planaria. Elife. 2014;3:e02238. doi: 10.7554/eLife.02238. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Bender CE, Fitzgerald P, Tait SW, Llambi F, McStay GP, Tupper DO, Pellettieri J, Sánchez Alvarado A, Salvesen GS, Green DR. Mitochondrial pathway of apoptosis is ancestral in metazoans. Proc Natl Acad Sci U S A. 2012;109:4904–4909. doi: 10.1073/pnas.1120680109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Brusca RC, Brusca GJ. Invertebrates. 2nd. Sunderland, Mass.: Sinauer Associatespxix; 2003. p. 936. [Google Scholar]
  4. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421. doi: 10.1186/1471-2105-10-421.
  5. Campbell MS, Holt C, Moore B, Yandell M. Genome Annotation and Curation Using MAKER and MAKER-P. Curr Protoc Bioinformatics. 2014;48:4.11.1–4.11.39. doi: 10.1002/0471250953.bi0411s48.
  6. Chen CC, Wang IE, Reddien PW. pbx is required for pole and eye regeneration in planarians. Development. 2013;140:719–729. doi: 10.1242/dev.083741.
  7. Chen Y, Yu P, Luo J, Jiang Y. Secreted protein prediction system combining CJ-SPHMM, TMHMM, and PSORT. Mamm Genome. 2003;14:859–865. doi: 10.1007/s00335-003-2296-6.
  8. Collins JJ, 3rd, Hou X, Romanova EV, Lambrus BG, Miller CM, Saberi A, Sweedler JV, Newmark PA. Genome-wide analyses reveal a role for peptide hormones in planarian germline development. PLoS Biol. 2010;8:e1000509. doi: 10.1371/journal.pbio.1000509.
  9. Collins JJ, 3rd, Newmark PA. It's no fluke: the planarian as a model for understanding schistosomes. PLoS Pathog. 2013;9:e1003396. doi: 10.1371/journal.ppat.1003396.
  10. Collins JJ, 3rd, Wang B, Lambrus BG, Tharp ME, Iyer H, Newmark PA. Adult somatic stem cells in the human parasite Schistosoma mansoni. Nature. 2013;494:476–479. doi: 10.1038/nature11924.
  11. Cunningham F, Amode MR, Barrell D, Beal K, Billis K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fitzgerald S, Gil L, Giron CG, Gordon L, Hourlier T, Hunt SE, Janacek SH, Johnson N, Juettemann T, Kahari AK, Keenan S, Martin FJ, Maurel T, McLaren W, Murphy DN, Nag R, Overduin B, Parker A, Patricio M, Perry E, Pignatelli M, Riat HS, Sheppard D, Taylor K, Thormann A, Vullo A, Wilder SP, Zadissa A, Aken BL, Birney E, Harrow J, Kinsella R, Muffato M, Ruffier M, Searle SM, Spudich G, Trevanion SJ, Yates A, Zerbino DR, Flicek P. Ensembl 2015. Nucleic Acids Res. 2015;43:D662–D669. doi: 10.1093/nar/gku1010.
  12. Eisenhoffer GT, Kang H, Sánchez Alvarado A. Molecular analysis of stem cells and their descendants during cell turnover and regeneration in the planarian Schmidtea mediterranea. Cell Stem Cell. 2008;3:327–339. doi: 10.1016/j.stem.2008.07.002.
  13. Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, Heger A, Hetherington K, Holm L, Mistry J, Sonnhammer EL, Tate J, Punta M. Pfam: the protein families database. Nucleic Acids Res. 2014;42:D222–D230. doi: 10.1093/nar/gkt1223.
  14. Finn RD, Clements J, Eddy SR. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 2011;39:W29–W37. doi: 10.1093/nar/gkr367.
  15. Forsthoefel DJ, James NP, Escobar DJ, Stary JM, Vieira AP, Waters FA, Newmark PA. An RNAi screen reveals intestinal regulators of branching morphogenesis, differentiation, and stem cell proliferation in planarians. Dev Cell. 2012;23:691–704. doi: 10.1016/j.devcel.2012.09.008.
  16. Forsthoefel DJ, Park AE, Newmark PA. Stem cell-based growth, regeneration, and remodeling of the planarian intestine. Dev Biol. 2011;356:445–459. doi: 10.1016/j.ydbio.2011.05.669.
  17. Friedlander MR, Adamidi C, Han T, Lebedeva S, Isenbarger TA, Hirst M, Marra M, Nusbaum C, Lee WL, Jenkin JC, Sánchez Alvarado A, Kim JK, Rajewsky N. High-resolution profiling and discovery of planarian small RNAs. Proc Natl Acad Sci U S A. 2009;106:11546–11551. doi: 10.1073/pnas.0905222106.
  18. Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012;28:3150–3152. doi: 10.1093/bioinformatics/bts565.
  19. Gavino MA, Wenemoser D, Wang IE, Reddien PW. Tissue absence initiates regeneration through Follistatin-mediated inhibition of Activin signaling. Elife. 2013;2:e00247. doi: 10.7554/eLife.00247.
  20. Gene Ontology Consortium. Gene Ontology Consortium: going forward. Nucleic Acids Res. 2015;43:D1049–D1056. doi: 10.1093/nar/gku1179.
  21. Guedelhoefer OC, Sánchez Alvarado A. Amputation induces stem cell mobilization to sites of injury during planarian regeneration. Development. 2012;139:3510–3520. doi: 10.1242/dev.082099.
  22. Gurley KA, Elliott SA, Simakov O, Schmidt HA, Holstein TW, Sánchez Alvarado A. Expression of secreted Wnt pathway components reveals unexpected complexity of the planarian amputation response. Dev Biol. 2010;347:24–39. doi: 10.1016/j.ydbio.2010.08.007.
  23. Krogh A, Larsson B, von Heijne G, Sonnhammer EL. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol. 2001;305:567–580. doi: 10.1006/jmbi.2000.4315.
  24. Lapan SW, Reddien PW. Transcriptome analysis of the planarian eye identifies ovo as a specific regulator of eye regeneration. Cell Rep. 2012;2:294–307. doi: 10.1016/j.celrep.2012.06.018.
  25. Laumer CE, Hejnol A, Giribet G. Nuclear genomic signals of the 'microturbellarian' roots of platyhelminth evolutionary innovation. Elife. 2015;4 doi: 10.7554/eLife.05503.
  26. Littlewood TJ, Bray RA. Interrelationships of the Platyhelminthes. London, New York: Taylor and Francis; 2001.
  27. Liu SY, Selck C, Friedrich B, Lutz R, Vila-Farre M, Dahl A, Brandl H, Lakshmanaperumal N, Henry I, Rink JC. Reactivating head regrowth in a regeneration-deficient planarian species. Nature. 2013;500:81–84. doi: 10.1038/nature12414.
  28. Lupas A. Prediction and analysis of coiled-coil structures. Methods Enzymol. 1996;266:513–525. doi: 10.1016/s0076-6879(96)66032-7.
  29. MySQL. 2015 http://dev.mysql.com.
  30. Newmark PA, Wang Y, Chong T. Germ cell specification and regeneration in planarians. Cold Spring Harb Symp Quant Biol. 2008;73:573–581. doi: 10.1101/sqb.2008.73.022.
  31. Pearson BJ, Sánchez Alvarado A. A planarian p53 homolog regulates proliferation and self-renewal in adult stem cell lineages. Development. 2010;137:213–221. doi: 10.1242/dev.044297.
  32. Pellettieri J, Fitzgerald P, Watanabe S, Mancuso J, Green DR, Sánchez Alvarado A. Cell death and tissue remodeling in planarian regeneration. Dev Biol. 2010;338:76–85. doi: 10.1016/j.ydbio.2009.09.015.
  33. Petersen TN, Brunak S, von Heijne G, Nielsen H. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Methods. 2011;8:785–786. doi: 10.1038/nmeth.1701.
  34. Reddien PW, Bermange AL, Murfitt KJ, Jennings JR, Sánchez Alvarado A. Identification of genes needed for regeneration, stem cell function, and tissue homeostasis by systematic gene perturbation in planaria. Dev Cell. 2005;8:635–649. doi: 10.1016/j.devcel.2005.02.014.
  35. Robb SM, Ross E, Sánchez Alvarado A. SmedGD: the Schmidtea mediterranea genome database. Nucleic Acids Res. 2008;36:D599–D606. doi: 10.1093/nar/gkm684.
  36. Roberts-Galbraith RH, Newmark PA. Follistatin antagonizes activin signaling and acts with notum to direct planarian head regeneration. Proc Natl Acad Sci U S A. 2013;110:1363–1368. doi: 10.1073/pnas.1214053110.
  37. Rossi A, Ross EJ, Jack A, Sánchez Alvarado A. Molecular cloning and characterization of SL3: a stem cell-specific SL RNA from the planarian Schmidtea mediterranea. Gene. 2014;533:156–167. doi: 10.1016/j.gene.2013.09.101.
  38. Rouhana L, Vieira AP, Roberts-Galbraith RH, Newmark PA. PRMT5 and the role of symmetrical dimethylarginine in chromatoid bodies of planarian stem cells. Development. 2012;139:1083–1094. doi: 10.1242/dev.076182.
  39. Rouhana L, Weiss JA, King RS, Newmark PA. PIWI homologs mediate histone H4 mRNA localization to planarian chromatoid bodies. Development. 2014;141:2592–2601. doi: 10.1242/dev.101618.
  40. Sánchez Alvarado A. Learning about loss. Elife. 2013a;2:e00533. doi: 10.7554/eLife.00533.
  41. Sánchez Alvarado A. On the trail of a tropical disease. Elife. 2013b;2:e01115. doi: 10.7554/eLife.01115.
  42. Sánchez Alvarado A. Unravelling a can of worms. Elife. 2015;4 doi: 10.7554/eLife.07431.
  43. seqclean. 2011 http://sourceforge.net/projects/seqclean/
  44. Sikes JM, Newmark PA. Restoration of anterior regeneration in a planarian with limited regenerative ability. Nature. 2013;500:77–80. doi: 10.1038/nature12403.
  45. Srivastava M, Mazza-Curll KL, van Wolfswinkel JC, Reddien PW. Whole-body acoel regeneration is controlled by Wnt and Bmp-Admp signaling. Curr Biol. 2014;24:1107–1113. doi: 10.1016/j.cub.2014.03.042.
  46. Stajich JE. An Introduction to BioPerl. Methods Mol Biol. 2007;406:535–548. doi: 10.1007/978-1-59745-535-0_26.
  47. Stein LD. Using GBrowse 2.0 to visualize and share next-generation sequence data. Brief Bioinform. 2013;14:162–171. doi: 10.1093/bib/bbt001.
  48. sydcode. 2014 pubmed-posts https://wordpress.org/plugins/pubmed-posts/
  49. Umesono Y, Tasaki J, Nishimura Y, Hrouda M, Kawaguchi E, Yazawa S, Nishimura O, Hosoda K, Inoue T, Agata K. The molecular logic for planarian regeneration along the anterior-posterior axis. Nature. 2013;500:73–76. doi: 10.1038/nature12359.
  50. UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res. 2015;43:D204–D212. doi: 10.1093/nar/gku989.
  51. van Wolfswinkel JC, Wagner DE, Reddien PW. Single-cell analysis reveals functionally distinct classes within the planarian stem cell compartment. Cell Stem Cell. 2014;15:326–339. doi: 10.1016/j.stem.2014.06.007.
  52. Wang B, Collins JJ, 3rd, Newmark PA. Functional genomic characterization of neoblast-like stem cells in larval Schistosoma mansoni. Elife. 2013;2:e00768. doi: 10.7554/eLife.00768.
  53. Wang Y, Stary JM, Wilhelm JE, Newmark PA. A functional genomic screen in planarians identifies novel regulators of germ cell development. Genes Dev. 2010;24:2081–2092. doi: 10.1101/gad.1951010.
  54. Wang Y, Zayas RM, Guo T, Newmark PA. nanos function is essential for development and regeneration of planarian germ cells. Proc Natl Acad Sci U S A. 2007;104:5901–5906. doi: 10.1073/pnas.0609708104.
  55. WordPress. 2015 https://wordpress.org/
  56. Xiang Y, Miller DE, Ross EJ, Sánchez Alvarado A, Hawley RS. Synaptonemal complex extension from clustered telomeres mediates full-length chromosome pairing in Schmidtea mediterranea. Proc Natl Acad Sci U S A. 2014;111:E5159–E5168. doi: 10.1073/pnas.1420287111.
