To the Editor
Peptidome (available at http://www.ncbi.nlm.nih.gov/peptidome/) is a new public resource that archives and freely distributes tandem mass spectrometry (MS) peptide and protein identification data generated by the scientific community (Fig. 1). The database is administered by the National Center for Biotechnology Information (NCBI; Bethesda, MD, USA) at the National Institutes of Health. The core structure of the Peptidome database is based on NCBI's Gene Expression Omnibus (GEO1,2) and uses much of the same source code for handling submissions and accessing the database. Peptidome, like GEO, was constructed to promote sharing and dissemination of experimental data at a level of detail that is useful both to the specialized community and to the general biological community.
Figure 1. The homepage of Peptidome, NCBI's peptide data resource.
Great emphasis has been placed on making the deposition procedure as simple as possible while still supporting a high level of experimental annotation. Although data from all stages of the experiment are captured, the burden of submission is minimized by accepting standard file formats and using spreadsheet templates for metadata. This method was chosen to allow submissions to be generated either programmatically or by hand, using familiar tools, without the delay of waiting for a perfect file format and a polished graphical user interface to be developed. As the field matures, so will the data submission process. A Peptidome submission includes the following components:
Biological and methodological metadata. This information gives context to the experimental data and facilitates understanding of the goals of the study, the material under investigation, the instrumentation employed and the protocols used to generate and analyze the data.
The original raw spectral data, converted to an open format. Raw data include all of the original MS1 and MS2 spectra before any post-processing, regardless of whether they led to an identification. The availability of raw data is critical for independent verification and validation of the reported study results. Furthermore, collections of raw data greatly benefit the MS community in wider applications, such as preparing custom spectral libraries, conducting large-scale statistical studies of peptide fragmentation and developing new peptide-matching algorithms.
Peptide identification output files. These files contain matched peptide sequence information derived from the observed spectra using any search algorithm.
Conclusion-level results. These include the final, filtered list of identified proteins, and the peptides leading to those assignments, as determined by the submitter. This finalized list has much flexibility in that it allows inclusion of identifications from new algorithms or de novo identifications. It also allows the submitter to provide complete results independent of the methodology used, for example, cutoff scores, reverse database searches for false positives or agreement between multiple search engines. Peptidome processing procedures augment these lists with auxiliary information extracted from the peptide identification files, including scores and post-translational modifications. This augmented and enriched list is a convenient mechanism to access the relevant data discussed in an associated manuscript.
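The four components listed above can be pictured as a single record per sample. The following is a minimal sketch, not Peptidome's actual submission schema: the class name, field names and file names are all hypothetical and are chosen only to mirror the components described in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SampleSubmission:
    """Hypothetical container for the four components of one submitted sample."""

    # Biological and methodological context (organism, instrument, protocol, ...)
    metadata: Dict[str, str] = field(default_factory=dict)
    # Original MS1/MS2 spectra, converted to an open format
    raw_spectra_files: List[str] = field(default_factory=list)
    # Output files from whichever search algorithm was used
    identification_files: List[str] = field(default_factory=list)
    # Final, filtered list of proteins as determined by the submitter
    conclusion_proteins: List[str] = field(default_factory=list)

    def missing_components(self) -> List[str]:
        """Return the names of the components that are still empty."""
        components = {
            "metadata": self.metadata,
            "raw_spectra_files": self.raw_spectra_files,
            "identification_files": self.identification_files,
            "conclusion_proteins": self.conclusion_proteins,
        }
        return [name for name, value in components.items() if not value]


# Illustrative draft with made-up values: two components supplied, two missing.
draft = SampleSubmission(
    metadata={"organism": "Homo sapiens"},
    raw_spectra_files=["run01_open_format.xml"],
)
print(draft.missing_components())
# -> ['identification_files', 'conclusion_proteins']
```

A check of this kind is only one way to flag an incomplete deposit before it is sent; it does not replace the curation described elsewhere in this letter.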
Submissions are organized into studies and samples. A sample includes all results from the same biological source, regardless of the number of instrument runs required to collect the data. A study is a collection of related samples; for example, one sample may be the control and another the diseased state, or each sample may be one slice of a time series. Each sample has its own list of identified proteins, peptides and spectra. All studies and samples are assigned unique and stable accession numbers that may be used to cite and retrieve the records. Data may remain private for a limited time, typically pending manuscript publication, and during this period submitters may grant journal reviewers anonymous access to their data. Public data may be browsed by study or by sample, and each sample may be viewed in either a protein-centric or a peptide-centric layout. The evidence for each peptide may also be explored down to the level of individual spectra. The current focus is on providing a complete view of all the data; additional functionality, such as annotating spectrum displays with theoretical fragment ions, will be added in the future to aid in the analysis of the data. Results are displayed through the website, and all originally submitted files may also be downloaded for offline analysis.
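The difference between the protein-centric and peptide-centric views of a sample can be illustrated with a small grouping exercise. This is a sketch under stated assumptions, not a description of how Peptidome stores or renders its data: the `Identification` record, its field names and the example values are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set


class Identification(NamedTuple):
    """One peptide-to-spectrum assignment within a sample (illustrative fields)."""

    protein: str
    peptide: str
    spectrum_id: str


def protein_centric(hits: List[Identification]) -> Dict[str, Set[str]]:
    """Group a sample's identifications by protein: protein -> supporting peptides."""
    view: Dict[str, Set[str]] = defaultdict(set)
    for hit in hits:
        view[hit.protein].add(hit.peptide)
    return dict(view)


def peptide_centric(hits: List[Identification]) -> Dict[str, List[str]]:
    """Group a sample's identifications by peptide: peptide -> evidence spectra."""
    view: Dict[str, List[str]] = defaultdict(list)
    for hit in hits:
        view[hit.peptide].append(hit.spectrum_id)
    return dict(view)


# Made-up identifications for one sample.
hits = [
    Identification("protein_A", "PEPTIDEK", "scan_1"),
    Identification("protein_A", "PEPTIDEK", "scan_2"),
    Identification("protein_A", "SAMPLER", "scan_3"),
    Identification("protein_B", "ANOTHERK", "scan_4"),
]

by_protein = protein_centric(hits)
by_peptide = peptide_centric(hits)
print(sorted(by_protein["protein_A"]))  # peptides supporting protein_A
print(by_peptide["PEPTIDEK"])           # spectra that are evidence for one peptide
```

Both views are derived from the same list of identifications, which is why drilling down from a protein to a peptide to an individual spectrum only requires following the grouping keys.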
Peptidome incorporates several measures to increase the quality, value and utility of submitted data. One of the distinguishing features of Peptidome is the high level of human support and curation in the submission process intended to ensure sufficient metadata and completeness of the submission. In addition, an attempt is made to match submitted identified protein names to NCBI protein designations, which improves accessibility to the larger biological community; this allows the experimental results to be retrieved by an independent search for a given protein or gene locus, without a priori knowledge of the pertinent experiment. Users can locate studies and proteins of interest by querying NCBI's powerful Entrez search system with relevant keywords, protein names, accessions and more. Data navigation is further enhanced through inter-database links that reciprocally connect Peptidome to other NCBI resources such as PubMed, GenBank and Gene.
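Entrez queries can also be issued programmatically through NCBI's E-utilities. The sketch below only builds an ESearch request URL and does not contact any server. It relies on two assumptions that the text above does not confirm: that Peptidome records are exposed through E-utilities at all, and that the Entrez database name is `peptidome`. The helper name `build_esearch_url` and the search term are likewise illustrative.

```python
from urllib.parse import urlencode

# Standard E-utilities ESearch endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, db: str = "peptidome", retmax: int = 20) -> str:
    """Build an ESearch URL for a keyword query.

    The default db name "peptidome" is an assumption, not a documented value.
    """
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_ESEARCH}?{query}"


# Illustrative keyword query combining a protein name and an organism.
url = build_esearch_url("albumin AND human")
print(url)
# -> https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=peptidome&term=albumin+AND+human&retmax=20
```

Keeping URL construction separate from the network call makes the query easy to inspect and test before anything is sent to NCBI.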
In the spirit of collaborative data-sharing, Peptidome is entering into data exchange agreements with established MS resources including PRIDE3, PeptideAtlas4, Tranche5 and other ProteomeXchange members. We hope to expand this network in the future by collaborating with other, more specialized proteomic resources, such as Human Proteinpedia6. Despite any apparent duplication of effort and design among these resources, we believe the community will benefit from alternative analysis and presentation styles and from the cooperation of disparate repositories in data deposition and dissemination.
The initial exemplar submissions in Peptidome have been culled from selected submissions to PeptideAtlas, with the metadata enriched from papers associated with the experiments. Sharing and linking between resources will maximize the visibility and utility of public MS peptide identification data. We invite the proteomics community to support and participate in these data-sharing endeavors7 by contributing to the Peptidome resource, and we welcome input and suggestions.
Acknowledgments
The authors would like to thank D. Rudnev, D. Troup, R. Muertter, A. Soboleva, M. Tomashevsky and O. Ayanbule for their aid in the development of this project. Advice and support were provided by L. Geer, S. Sechi, S. Markey and the rest of the NIMH Laboratory of Neurotoxicology. This research was supported by the Intramural Research Program of the National Library of Medicine and the Roadmap, National Institutes of Health.
References
- 1. Edgar R, Domrachev M, Lash AE. Nucleic Acids Res. 2002;30:207–210. doi: 10.1093/nar/30.1.207.
- 2. Barrett T, et al. Nucleic Acids Res. 2009;37:D885–D890. doi: 10.1093/nar/gkn764.
- 3. Jones P, et al. Nucleic Acids Res. 2008;36:D878–D883. doi: 10.1093/nar/gkm1021.
- 4. Deutsch EW, Lam H, Aebersold R. EMBO Rep. 2008;9:429–434. doi: 10.1038/embor.2008.56.
- 5. Andrews PC. 56th ASMS Conference on Mass Spectrometry and Allied Topics, American Society for Mass Spectrometry; Denver, CO. 2008.
- 6. Mathivanan S, et al. Nat Biotechnol. 2008;26:164–167. doi: 10.1038/nbt0208-164.
- 7. Anonymous. Nat Biotechnol. 2007;25:262.