Highlights
- The metabolome allows assessing, in a dynamic way, the external influences under which an organism exists and develops.
- Recent years have seen the establishment of a global network for metabolomics data exchange.
- Global metabolomics data exchange is leading to an exponential growth of publicly available metabolomics data for re-analysis.
Abstract
Chemical Biology employs chemical synthesis, analytical chemistry and other tools to study biological systems. Recent advances in molecular biology, such as next-generation sequencing (NGS), have led to unprecedented insights into the evolution of organisms’ biochemical repertoires. Because of the specific data sharing culture in genomics, genomes from all kingdoms of life are readily available for further analysis by other researchers. While the genome expresses the potential of an organism to adapt to external influences, the metabolome presents a molecular phenotype that allows us to assess, in a dynamic way, the external influences under which an organism exists and develops. Steady advancements in instrumentation towards high-throughput and high-resolution methods have led to a revival of analytical chemistry methods for measuring and analysing the metabolome of organisms. This steady growth of metabolomics as a field is leading to an accumulation of big data across laboratories worldwide, similar to that observed in the other omics areas. This calls for the development of methods and technologies for handling such large datasets, for distributing them efficiently and for enabling their re-analysis. Here we describe the recently emerging ecosystem of global open-access databases and the data exchange efforts between them, as well as the foundations and obstacles that enable or prevent the sharing and re-analysis of these data.
Current Opinion in Chemical Biology 2017, 36:58–63
This review comes from a themed issue on Omics
Edited by Frank C Schroeder and Georg Pohnert
For a complete overview see the Issue and the Editorial
Available online 13th January 2017
http://dx.doi.org/10.1016/j.cbpa.2016.12.022
1367-5931/Published by Elsevier Ltd.
Introduction
Chemical Biology employs chemical synthesis, analytical chemistry and other tools to study biological systems. Recent advances in molecular biology, such as next-generation sequencing (NGS), have led to unprecedented insights into the evolution of organisms’ biochemical repertoires. Because of the specific data sharing culture in genomics, genomes from all kingdoms of life are readily available for further analysis by other researchers.
While the genome expresses the potential of an organism to adapt to external influences, the metabolome presents a molecular phenotype that allows us to assess, in a dynamic way, the external influences under which an organism exists and develops. These external influences and stimuli are often subsumed under the term exposome [1]. In this respect, the metabolome is complemented by other molecular phenotypes, such as those characterised by the products of differential gene expression that are accessible through RNA sequencing techniques [2].
Steady advancements in instrumentation towards high-throughput and high-resolution methods have led to a revival of analytical chemistry methods for measuring and analysing the metabolome of organisms. Figure 1 demonstrates the steady growth of reported interest in the metabolome through a simple bibliometric analysis on Google Scholar. This growth of metabolomics as a field is leading to an accumulation of big data across laboratories worldwide, similar to that observed in the other omics areas. This calls for the development of methods and technologies for handling such large datasets, for distributing them efficiently and for enabling their re-analysis.
In the following, we describe the recently emerging ecosystem of global open-access databases and the data exchange efforts between them, as well as the foundations and obstacles that enable or prevent the sharing and re-analysis of these data.
The virtues of data sharing in science
Without digressing into a treatise on the scientific method [3], open data sharing, together with the sharing of open source code and open access to articles, enables scientific peers to reproduce, without barriers, findings reported by a scientist or a group of scientists. This is important because controlled or closed access limits reproduction to specific groups, potentially compromising the efficiency and objectivity of the scientific method. Learned societies, funders, some publishers and, in principle, a good portion of the scientific community agree on the importance of data sharing for the advancement of science. This is exemplified by documents such as the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities [4], which was preceded and followed by many similar texts.
Increasingly widespread acceptance of these principles has led to the creation of a number of organisations and movements that promote open access to knowledge, information and data, such as the Open Knowledge Foundation [5], the Research Data Alliance (RDA) [6] and the Global Alliance for Genomics and Health [7]. The virtues of data sharing are at the heart of the scientific method. A scientific publication is not scholarship in itself but merely an ‘advertisement of scholarship’ [8], whereas the full collection of scientific protocols, materials (admittedly difficult to share) and underlying data allows peers to assess the validity of the scholarly finding and the underlying methods. Furthermore, a large collection of research data on a particular technique or subject lends itself to meta-analysis, a set of statistical techniques for combining results from several studies. Meta-analysis may reveal insights that could not have been deduced from a single dataset or only a few, but it also raises questions about reproducibility and comparability caused by different experimental designs [9].
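As an illustration of the simplest kind of statistical technique meant here, the sketch below pools per-study effect estimates with a fixed-effect, inverse-variance weighted mean and computes Cochran's Q as a rough check for heterogeneity between studies. This is our own minimal example, not a method taken from the cited work; the function name and the example numbers are hypothetical, and a real metabolomics meta-analysis would also have to deal with normalisation and the differing experimental designs mentioned above.

```python
import math

def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance weighted (fixed-effect) pooling of per-study estimates.

    Returns the pooled estimate, its standard error and Cochran's Q, which
    grows when studies disagree more than their standard errors suggest.
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / se ** 2 for se in std_errors]  # more precise studies count more
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    return pooled, pooled_se, q

# Hypothetical log fold changes of one metabolite reported by three studies.
pooled, se, q = fixed_effect_meta([0.8, 1.1, 0.5], [0.2, 0.3, 0.25])
print(f"pooled={pooled:.3f} se={se:.3f} Q={q:.2f}")
```

A fixed-effect model assumes all studies estimate the same underlying effect; a large Q is a hint that this assumption fails and that a random-effects model or a closer look at study design is needed.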
For most of the history of modern science, data were shared on request: researcher Alice would ask Bob, the scientist who produced the data, for a copy, with all the social implications this entails. Such requests could be ignored, or access could be granted selectively based on Alice’s standing with Bob. The internet has the power to remove those barriers, but it imposes new challenges. The development of tools and resources for publishing open data is increasingly important and relevant, because increasingly large, heterogeneous and complex datasets require extra effort for storing, exchanging and making sense of data. Initiatives to develop these tools and standards are driven by a range of international collaborations, government initiatives, institutions and local communities. In the major omics areas, such as genomics, proteomics and metabolomics, primary research data are collected in centralised repositories maintained by specialised institutions such as the European Bioinformatics Institute (EMBL-EBI) [10] and the National Center for Biotechnology Information (NCBI) [11]. These institutions have been given a mandate to support data repositories over longer periods of time, beyond the usual 3- or 5-year funding cycles. Genomics researchers established guidelines for the deposition of sequence data in 1996 with the Bermuda Principles [12]. In metabolomics, we have now laid the foundations, following in the footsteps of these pioneering efforts: global, long-term supported databases exist, as do minimum information standards and procedures for data dissemination [45].
Global data management in metabolomics
Very few application- and domain-specific databases to capture and disseminate primary data in metabolomics arose in the 1990s [13, 14]; they were followed by a first round of standardisation efforts by the Metabolomics Standards Initiative (MSI) [15]. These are complemented by reference databases with information on chemical structures, physicochemical properties, biological functions, pathway networks and, most importantly, reference spectral data. Reference databases can be classified into pathway-centric and compound-centric databases [16]. The pathway-centric databases most commonly used in metabolomics include KEGG [17], BioCyc [18], Reactome [19] and WikiPathways [20]. Examples of compound-centric databases are BMRB [21], ChEBI [22], ChemSpider [23], GMD [13], HMDB [24], MassBank [25], METLIN [26], NIST [27] and PubChem [28]. Compound-centric resources may contain spectral data. In metabolomics, reference compounds are often used for metabolite identification by matching NMR resonances or mass spectral features of an unknown compound to those of the reference.
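The mass spectrometry side of this matching can be sketched in its simplest form as an accurate-mass lookup: compute the theoretical m/z of each reference compound and accept those within a parts-per-million (ppm) tolerance of the observed value. The sketch assumes singly protonated [M+H]+ ions, formulas containing only C, H, N and O, and a 5 ppm tolerance; the three-entry reference table and the function names are illustrative and do not stand in for any of the databases listed above. Accurate mass alone cannot distinguish isomers, which is why fragmentation spectra, retention times or NMR data are used in practice.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.007276466812  # added to the neutral mass to form an [M+H]+ ion

def formula_mass(formula):
    """Neutral monoisotopic mass of a simple formula such as 'C6H12O6'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass

def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

def match_mz(observed_mz, references, tol_ppm=5.0):
    """Names of reference compounds whose [M+H]+ m/z lies within tol_ppm."""
    hits = []
    for name, formula in references.items():
        theoretical = formula_mass(formula) + PROTON
        if abs(ppm_error(observed_mz, theoretical)) <= tol_ppm:
            hits.append(name)
    return hits

# Illustrative reference table (name -> molecular formula).
REFERENCES = {"glucose": "C6H12O6", "caffeine": "C8H10N4O2", "alanine": "C3H7NO2"}
print(match_mz(195.0877, REFERENCES))
```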
Databases
In the 1990s, global efforts to exchange genomic information [29, 30] arose, which eventually evolved into the most liberal model of freely sharing and exchanging data. This led to an unprecedented wave of bioinformatics and biomedical research, enabled by the open availability of a growing number of genomes across all kingdoms of life, which continues to flourish today. It also paved the way for similar efforts for proteomics [31] and gene expression data [32]. In 2012, the European Bioinformatics Institute (EMBL-EBI) launched the MetaboLights database, the first general-purpose, cross-species, cross-application database in metabolomics, with the aim of enabling a similar blossoming in this remaining large pillar of the omics sciences [33]. In the first two years after its inception, MetaboLights became the fastest growing data repository at the EMBL-EBI in terms of data volume (see Figure 2). When the NIH recognised the importance of metabolomics for biomedical research by funding a set of Regional Comprehensive Metabolomics Resource Cores (RCMRC) across the USA, it also decided to invest in a US-based sister repository for MetaboLights, the Metabolomics Workbench [34]. This follows a well-established and well-accepted model from genomics and other biomolecular data types of establishing sister repositories in the major geographic regions of the world. Such repositories typically collaborate on data maintenance and data exchange but compete in the way the data are presented to their users.
MetaboLights
The MetaboLights database and repository was the first cross-species, general-purpose repository for metabolomics data. Launched in 2012 by the EMBL-EBI [35], it has seen steady growth in the number of submissions, with submissions currently averaging about 20 GB per study, accumulating to about 4 TB of data in May 2016. It covers metabolite structures and their reference spectra, as well as the biological roles, locations, concentrations and experimental data from metabolic experiments. MetaboLights includes user submission tools and incorporates de facto standard formats for encoded spectral and chromatographic data, associated information about chemical structures, and metadata describing assays and studies as a whole. Studies submitted to MetaboLights are manually curated and, if necessary, improved in collaboration with the submitters [36].
Many funders now require data arising from publicly funded organizations to be made freely accessible. The experimental data that scientists submit to MetaboLights have been used to support findings in scientific studies and to verify experimental methods in peer-reviewed publications. Journals recommend or require the deposition of data in MetaboLights or its sister databases, which therefore play an important role in enabling the transparent reproduction and re-use of metabolomics results. MetaboLights is now the fastest growing repository at the EMBL-EBI, with a 3-month doubling time (see Figure 2).
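A doubling time such as the three months quoted above rests on an exponential growth model, V(t) = V0 * 2^(t / Td). Given two volume measurements V1 and V2 taken a time span dt apart, the doubling time is Td = dt * ln 2 / ln(V2 / V1). The minimal sketch below implements that relation; the function names are our own, and the measurements in the usage lines are hypothetical, not MetaboLights figures.

```python
import math

def doubling_time(v1, v2, elapsed):
    """Doubling time under exponential growth, in the units of `elapsed`."""
    if v1 <= 0 or v2 <= v1:
        raise ValueError("need 0 < v1 < v2 for a finite positive doubling time")
    return elapsed * math.log(2) / math.log(v2 / v1)

def project(v0, doubling, elapsed):
    """Volume after `elapsed` time units if growth stays exponential."""
    return v0 * 2 ** (elapsed / doubling)

# Hypothetical example: a repository grows from 1 TB to 4 TB in 6 months.
td = doubling_time(1.0, 4.0, 6.0)
print(f"doubling time: {td:.1f} months")
print(f"projected volume 6 months later: {project(4.0, td, 6.0):.1f} TB")
```

Projections of this kind are only meaningful while growth actually remains exponential; in practice the doubling time of a repository changes as submission patterns change.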
Figure 3 shows the coverage of species and experimental techniques in MetaboLights. For the core model species in metabolomics, the amount of data is approaching a level sufficient for meta-analyses, but no such studies have been published so far.
The Metabolomics Workbench
The Metabolomics Workbench serves as a national and international repository for metabolomics data and metadata, and also provides data analysis tools and access to metabolite standards, protocols, tutorials and training. The database was funded by the National Institutes of Health (NIH) Metabolomics Common Fund, which aims to increase US national capacity in metabolomics by supporting the development of next-generation technologies, providing training, enhancing the availability of high-quality reference standards, and promoting data sharing and collaboration [34].
The Metabolomics Workbench acts as a North American hub for the metabolomics research carried out at each of the six Regional Comprehensive Metabolomics Resource Cores (RCMRC). All metabolomics research carried out at these centres and funded by the NIH Metabolomics Common Fund must be made publicly available via the Metabolomics Workbench. The emerging network of global, long-term supported metabolomics data repositories created the need for a global service to discover metabolomics datasets regardless of which database they are located in.
MetabolomeXchange
MetabolomeXchange aggregates data from three data providers (MetaboLights, Metabolomics Workbench and Metabolomic Repository Bordeaux), which together make up the MetabolomeXchange Consortium (http://www.metabolomexchange.org/). The goal of MetabolomeXchange is to increase the accessibility and awareness of newly released, publicly available metabolomics datasets from verified members of the Consortium. MetabolomeXchange aims to provide a network of stable and coordinated metabolomics data, while also ensuring that both the scientific and the commercial user communities have access to high-quality reference data. The data ‘exchanged’ through MetabolomeXchange consist of both experimental data and metadata for individual metabolites and metabolomic profiles.
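Conceptually, an aggregator of this kind collects dataset announcements from each provider, removes duplicates and presents one combined, date-ordered listing. The sketch below illustrates only that idea: the record fields (`accession`, `title`, `released`), the provider names and all values are hypothetical placeholders, and it does not reproduce MetabolomeXchange's actual interface or data model.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Announcement:
    provider: str
    accession: str
    title: str
    released: date

def aggregate(feeds):
    """Merge per-provider feeds into one listing, newest release first.

    `feeds` maps a provider name to a list of dicts with the hypothetical keys
    'accession', 'title' and 'released'. A dataset announced twice by the same
    provider is kept once (the first occurrence wins).
    """
    merged = {}
    for provider, records in feeds.items():
        for rec in records:
            key = (provider, rec["accession"])  # accessions assumed unique per provider
            merged.setdefault(
                key, Announcement(provider, rec["accession"], rec["title"], rec["released"])
            )
    return sorted(merged.values(), key=lambda a: a.released, reverse=True)

# Hypothetical feeds; accessions, titles and dates are placeholders, not real datasets.
feeds = {
    "Provider A": [
        {"accession": "A-001", "title": "Example study 1", "released": date(2016, 11, 2)},
        {"accession": "A-001", "title": "Example study 1", "released": date(2016, 11, 2)},
    ],
    "Provider B": [
        {"accession": "B-042", "title": "Example study 2", "released": date(2016, 12, 1)},
    ],
}
for a in aggregate(feeds):
    print(a.released, a.provider, a.accession, a.title)
```

Keying on the (provider, accession) pair rather than the accession alone avoids collisions between repositories that happen to use similar identifier schemes, while each repository remains the authoritative source for its own records.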
Researchers can contribute data to MetabolomeXchange either by submitting to one of the existing data repositories within the Consortium or by becoming a data provider and member of the Consortium. MetabolomeXchange was launched in 2014 and is coordinated by the EMBL-EBI. It is an outcome of the European Commission-funded Coordination of Standards in Metabolomics (COSMOS) project [38], which ran from 2012 to 2015 and gathered European metabolomics data providers to establish and promote community standards for metabolomics data and experiments [37]. MetabolomeXchange is modelled on ProteomeXchange [31], a consortium established in 2012 to provide coordinated submission of mass spectrometry proteomics data to the main existing proteomics repositories and to encourage optimal data dissemination. At the time of writing (December 2016), more than 540 datasets were publicly available on MetabolomeXchange.org.
Data sharing needs standards
To enable both the re-use of data and its barrier-free exchange, data and metadata stored in public repositories such as Metabolomics Workbench or MetaboLights need to be encoded using community-agreed standards [37]. A first round of standardisation efforts in metabolomics was achieved by the Metabolomics Standards Initiative (MSI) [15]. Around 2006, the MSI published documents on the Core Information for Metabolomics Reporting (CIMR), with recommendations in the areas of In Vivo/Mammalian Biology, Plant Biology, In Vitro Biology/Microbiology and Environmental Analysis. These documents are accessible via http://www.metabolomics-MSI.org.
When MetaboLights appeared in 2012, followed later by the Metabolomics Workbench, the field had advanced by six years, with new instrumentation and changing protocols. New open data standards had emerged, while others were still missing. This led to the foundation of the COSMOS initiative for the Coordination of Standards in Metabolomics [38]. Apart from reviving interest in data and metadata standards in metabolomics and providing a platform for discussion, COSMOS set out to develop missing open data formats and to promote the use of formats such as mzML [39] and mzTab [40], which had been developed by the proteomics community and could be applied to metabolomics with moderate effort. The MSI recommendations on which data to report are nowadays backed by a rich set of ontologies and controlled vocabularies, which help researchers speak a common language and avoid the naming diversity that arises from different conventions in different laboratories or communities [41].
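To give a flavour of how such formats pair data with controlled-vocabulary terms: in mzML, each peak list is stored in a `binaryDataArray` whose `cvParam` children carry PSI-MS accessions declaring what the array holds (MS:1000514, m/z array; MS:1000515, intensity array), its numeric precision (MS:1000521, 32-bit float; MS:1000523, 64-bit float) and its compression (MS:1000574, zlib compression), while the `binary` element holds base64-encoded little-endian values. The sketch below decodes a hand-made fragment using only the Python standard library. It is a simplification, not a full mzML reader: it ignores XML namespaces, indexes and other compression schemes, and the helper names and peak values are invented for illustration.

```python
import base64
import struct
import zlib
import xml.etree.ElementTree as ET

# PSI-MS controlled-vocabulary accessions used in mzML binaryDataArray elements.
MZ_ARRAY, INTENSITY_ARRAY = "MS:1000514", "MS:1000515"
FLOAT32, FLOAT64, ZLIB = "MS:1000521", "MS:1000523", "MS:1000574"

def encode(values, fmt, compress):
    """Pack floats little-endian, optionally zlib-compress, then base64-encode."""
    raw = struct.pack("<" + fmt * len(values), *values)
    return base64.b64encode(zlib.compress(raw) if compress else raw).decode("ascii")

def decode_array(bda):
    """Decode one <binaryDataArray> using the cvParam accessions it carries."""
    terms = {p.get("accession") for p in bda.iter("cvParam")}
    raw = base64.b64decode(bda.findtext("binary", default=""))
    if ZLIB in terms:
        raw = zlib.decompress(raw)
    fmt, width = ("d", 8) if FLOAT64 in terms else ("f", 4)
    return list(struct.unpack("<" + fmt * (len(raw) // width), raw))

def decode_spectrum(xml_text):
    """Return {'mz': [...], 'intensity': [...]} from a binaryDataArrayList."""
    arrays = {}
    for bda in ET.fromstring(xml_text.strip()).iter("binaryDataArray"):
        terms = {p.get("accession") for p in bda.iter("cvParam")}
        if MZ_ARRAY in terms:
            arrays["mz"] = decode_array(bda)
        elif INTENSITY_ARRAY in terms:
            arrays["intensity"] = decode_array(bda)
    return arrays

# Hand-made, namespace-free fragment with invented peaks: m/z values as
# zlib-compressed 64-bit floats, intensities as uncompressed 32-bit floats.
fragment = f"""
<binaryDataArrayList count="2">
  <binaryDataArray>
    <cvParam accession="{FLOAT64}" name="64-bit float"/>
    <cvParam accession="{ZLIB}" name="zlib compression"/>
    <cvParam accession="{MZ_ARRAY}" name="m/z array"/>
    <binary>{encode([100.0, 200.5], "d", True)}</binary>
  </binaryDataArray>
  <binaryDataArray>
    <cvParam accession="{FLOAT32}" name="32-bit float"/>
    <cvParam accession="{INTENSITY_ARRAY}" name="intensity array"/>
    <binary>{encode([1500.0, 30.25], "f", False)}</binary>
  </binaryDataArray>
</binaryDataArrayList>
"""
print(decode_spectrum(fragment))
```

Because the meaning, precision and compression of each array are declared through accessions rather than free-text labels, any tool that understands the shared vocabulary can read the data, which is exactly the role of controlled vocabularies described above.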
To structure data captured according to minimum information (MI) standards and backed by ontologies, the ISA-TAB format [42] and its related ecosystem of tools [43] have emerged as a quasi-standard. ISA stands for Investigation-Study-Assay, the typical hierarchical organisation of a biological study. ISA-TAB is a tabular format that holds data in a spreadsheet-like way and additionally supports, among other features, annotation with ontology terms. Databases such as MetaboLights support uploading study information in ISA-TAB format.
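In an ISA-TAB submission, the study-level sample table is a tab-separated file in which each `Characteristics[...]` column may be followed by `Term Source REF` and `Term Accession Number` columns that link the value to an ontology term. The sketch below reads such a table with Python's standard `csv` module and returns each sample's characteristics together with any ontology accession. It is a simplified illustration, not the API of the ISA tools: the function name and the two-row table are hypothetical, and real study files contain further columns (protocols, factor values, comments) that are ignored here.

```python
import csv
import io
import re

CHARACTERISTIC = re.compile(r"^Characteristics\[(.+)\]$")

def read_study_table(tsv_text):
    """Map each Sample Name to {characteristic: (value, ontology accession or None)}."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    sample_col = header.index("Sample Name")
    columns = []  # (characteristic name, value column, accession column or None)
    for i, name in enumerate(header):
        match = CHARACTERISTIC.match(name)
        if match:
            has_acc = i + 2 < len(header) and header[i + 2] == "Term Accession Number"
            columns.append((match.group(1), i, i + 2 if has_acc else None))
    samples = {}
    for row in body:
        samples[row[sample_col]] = {
            trait: (row[v], row[a] if a is not None and row[a] else None)
            for trait, v, a in columns
        }
    return samples

# Hypothetical study table; column names follow the ISA-TAB study file layout.
header = ["Source Name", "Characteristics[Organism]", "Term Source REF",
          "Term Accession Number", "Characteristics[Organism part]", "Term Source REF",
          "Term Accession Number", "Protocol REF", "Sample Name"]
row1 = ["source1", "Homo sapiens", "NCBITAXON", "http://purl.obolibrary.org/obo/NCBITaxon_9606",
        "blood plasma", "", "", "Sample collection", "sample1"]
row2 = ["source2", "Homo sapiens", "NCBITAXON", "http://purl.obolibrary.org/obo/NCBITaxon_9606",
        "urine", "", "", "Sample collection", "sample2"]
table = "\n".join("\t".join(r) for r in [header, row1, row2])
study = read_study_table(table)
print(study["sample1"])
```

The table is read with a plain `csv.reader` rather than `csv.DictReader` because ISA-TAB headers repeat column names such as `Term Source REF`, which a dictionary-based reader would silently collapse.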
The field of metabolomics continues to develop new data standards and methods as it progresses. For example, SPLASH, a hashed identifier for mass spectra, was recently published; it facilitates the exchange of mass spectra and allows provenance to be determined and duplicates to be detected [44].
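The underlying idea of a hashed spectrum identifier can be illustrated with a much simpler scheme: canonicalise the peak list (sort by m/z, round m/z values, scale intensities relative to the base peak) and hash the result, so that the same spectrum yields the same short key wherever it is stored. This is explicitly not the SPLASH algorithm, whose block structure and canonicalisation rules are described in [44]; the function names, the rounding precision and the 20-character key length below are arbitrary assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict

def spectrum_key(peaks, mz_decimals=4, scale=100):
    """Hash a canonicalised list of (m/z, intensity) pairs.

    NOT the SPLASH algorithm: a simplified illustration of hashing a spectrum.
    """
    base = max(intensity for _, intensity in peaks)  # base peak intensity
    canonical = sorted(
        (round(mz, mz_decimals), round(intensity / base * scale)) for mz, intensity in peaks
    )
    text = ";".join(f"{mz:.{mz_decimals}f}:{rel}" for mz, rel in canonical)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:20]

def find_duplicates(spectra):
    """Group the names of spectra that share the same key."""
    groups = defaultdict(list)
    for name, peaks in spectra.items():
        groups[spectrum_key(peaks)].append(name)
    return [names for names in groups.values() if len(names) > 1]

# Invented spectra: 'lab2' is 'lab1' with peaks reordered and intensities rescaled.
spectra = {
    "lab1": [(100.0, 50.0), (150.0, 100.0)],
    "lab2": [(150.0, 2000.0), (100.0, 1000.0)],
    "lab3": [(100.0, 100.0), (151.0, 20.0)],
}
print(find_duplicates(spectra))
```

Values that fall on either side of a rounding boundary will hash differently even if they are nearly identical, which is one of the issues a published identifier specification has to address explicitly.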
Conclusion
Publishers, funders and learned societies increasingly require open availability of research data and the resulting publications. Foundations have been laid to enable the global sharing and long-term preservation of research data in metabolomics, following in the footsteps of the other large pillars of biomolecular data science. Deposition of research data in MetaboLights or the Metabolomics Workbench will be easier for laboratories with a structured internal approach to capturing and storing experimental data. In addition to the minimum information standards and data formats for encoding primary research data and their metadata, an ecosystem of tools exists to support assembling and uploading this information. The volume of metabolomics data in public repositories is growing exponentially and will enable meta- and re-analyses that were previously not possible.
Acknowledgements
The authors acknowledge funding for the MetaboLights database (BBSRC Grants BB/I000933/1, BB/L024152/1), the COSMOS project (European Commission Framework 7 grant EC312941) and PhenoMeNal (European Commission H2020 grant EC654241).
References
- 1. Buck Louis G.M., Sundaram R. Exposome: time for transformative research. Stat Med. 2012;31:2569–2575. doi: 10.1002/sim.5496.
- 2. Collins J.E., Wali N., Sealy I.M., Morris J.A., White R.J., Leonard S.R., Jackson D.K., Jones M.C., Smerdon N.C., Zamora J. High-throughput and quantitative genome-wide messenger RNA sequencing for molecular phenotyping. BMC Genomics. 2015;16:578. doi: 10.1186/s12864-015-1788-6.
- 3. Wikipedia contributors. The Scientific Method [Internet]. Wikipedia, The Free Encyclopedia.
- 4. Wikipedia contributors. Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities [Internet]. Wikipedia, The Free Encyclopedia.
- 5. Molloy J.C. The Open Knowledge Foundation: open data means better science. PLoS Biol. 2011;9:e1001195. doi: 10.1371/journal.pbio.1001195.
- 6. Treloar A. The Research Data Alliance: globally co-ordinated action against barriers to data publishing and sharing. Learn Publ. 2014;27:S9–S13.
- 7. Global Alliance for Genomics and Health [Internet]. Global Alliance for Genomics and Health.
- 8. Wikipedia contributors. Data sharing [Internet]. Wikipedia, The Free Encyclopedia. 2016.
- 9. Hong F., Breitling R. A comparison of meta-analysis methods for detecting differentially expressed genes in microarray experiments. Bioinformatics. 2008;24:374–382. doi: 10.1093/bioinformatics/btm620.
- 10. Cook C.E., Bergman M.T., Finn R.D., Cochrane G., Birney E., Apweiler R. The European Bioinformatics Institute in 2016: data growth and integration. Nucleic Acids Res. 2016;44:D20–6. doi: 10.1093/nar/gkv1352.
- 11. Haines J.L., Korf B.R., Morton C.C., Seidman C.E., Seidman J.G., Smith D.R., editors. Current Protocols in Human Genetics. John Wiley & Sons, Inc.; 2001. Searching the NCBI Databases Using Entrez.
- 12. Schofield P.N., Bubela T., Weaver T., Portilla L., Brown S.D., Hancock J.M., Einhorn D., Tocchini-Valentini G., Hrabe de Angelis M., Rosenthal N. Post-publication sharing of data and tools. Nature. 2009;461:171–173. doi: 10.1038/461171a.
- 13. Kopka J., Schauer N., Krueger S., Birkemeyer C., Usadel B., Bergmuller E., Dormann P., Weckwerth W., Gibon Y., Stitt M. GMD@CSB.DB: the Golm Metabolome Database. Bioinformatics. 2005;21:1635–1638. doi: 10.1093/bioinformatics/bti236.
- 14. Deborde C., Jacob D. MeRy-B, a metabolomic database and knowledge base for exploring plant primary metabolism. Methods Mol Biol. 2014;1083:3–16. doi: 10.1007/978-1-62703-661-0_1.
- 15. Sansone S.-A., Fan T., Goodacre R., Griffin J.L., Hardy N.W., Kaddurah-Daouk R., Kristal B.S., Lindon J., Mendes P., Morrison N. The Metabolomics Standards Initiative. Nat Biotechnol. 2007;25:846–848. doi: 10.1038/nbt0807-846b.
- 16. Fiehn O., Barupal D.K., Kind T. Extending Biochemical Databases by Metabolomic Surveys. J Biol Chem. 2011;286:23637–23643. doi: 10.1074/jbc.R110.173617.
- 17. Kanehisa M., Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28:27–30. doi: 10.1093/nar/28.1.27.
- 18. Karp P.D., Ouzounis C.A., Moore-Kochlacs C., Goldovsky L., Kaipa P., Ahrén D., Tsoka S., Darzentas N., Kunin V., López-Bigas N. Expansion of the BioCyc collection of pathway/genome databases to 160 genomes. Nucleic Acids Res. 2005;33:6083–6089. doi: 10.1093/nar/gki892.
- 19. Joshi-Tope G., Gillespie M., Vastrik I., D’Eustachio P., Schmidt E., de Bono B., Jassal B., Gopinath G.R., Wu G.R., Matthews L. Reactome: a knowledge base of biological pathways. Nucleic Acids Res. 2005;33:D428–32. doi: 10.1093/nar/gki072.
- 20. Pico A.R., Kelder T., van Iersel M.P., Hanspers K., Conklin B.R., Evelo C. WikiPathways: pathway editing for the people. PLoS Biol. 2008;6:e184. doi: 10.1371/journal.pbio.0060184.
- 21. Ulrich E.L., Akutsu H., Doreleijers J.F., Harano Y., Ioannidis Y.E., Lin J., Livny M., Mading S., Maziuk D., Miller Z. BioMagResBank. Nucleic Acids Res. 2008;36:D402–8. doi: 10.1093/nar/gkm957.
- 22. Hastings J., De Matos P., Dekker A., Ennis M., Harsha B., Kale N., Muthukrishnan V., Owen G., Turner S., Williams M. The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013. Nucleic Acids Res. 2013;41:D456–63. doi: 10.1093/nar/gks1146.
- 23. Pence H.E., Williams A. ChemSpider: an online chemical information resource. J Chem Educ. 2010;87:1123–1124.
- 24. Wishart D.S., Tzur D., Knox C., Eisner R., Guo A.C., Young N., Cheng D., Jewell K., Arndt D., Sawhney S. HMDB: the human metabolome database. Nucleic Acids Res. 2007;35:D521–D526. doi: 10.1093/nar/gkl923.
- 25. Horai H., Arita M., Kanaya S., Nihei Y., Ikeda T., Suwa K., Ojima Y., Tanaka K., Tanaka S., Aoshima K. MassBank: a public repository for sharing mass spectral data for life sciences. J Mass Spectrom. 2010;45:703–714. doi: 10.1002/jms.1777.
- 26. Smith C.A., O’Maille G., Want E.J., Qin C., Trauger S.A., Brandon T.R., Custodio D.E., Abagyan R., Siuzdak G. METLIN: a metabolite mass spectral database. Ther Drug Monit. 2005;27:747–751. doi: 10.1097/01.ftd.0000179845.53213.39.
- 27. Stein S.E. Chemical substructure identification by mass spectral library searching. J Am Soc Mass Spectrom. 1995;6:644–655. doi: 10.1016/1044-0305(95)00291-K.
- 28. Wang Y., Xiao J., Suzek T.O., Zhang J., Wang J., Bryant S.H. PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res. 2009;37:W623–33. doi: 10.1093/nar/gkp456.
- 29. Benson D.A., Karsch-Mizrachi I., Lipman D.J., Ostell J., Sayers E.W. GenBank. Nucleic Acids Res. 2009;37:D26–31. doi: 10.1093/nar/gkn723.
- 30. Silvester N., Alako B., Amid C., Cerdeño-Tárraga A., Cleland I., Gibson R., Goodgame N., Ten Hoopen P., Kay S., Leinonen R. Content discovery and retrieval services at the European Nucleotide Archive. Nucleic Acids Res. 2015;43:D23–9. doi: 10.1093/nar/gku1129.
- 31. Vizcaíno J.A., Deutsch E.W., Wang R., Csordas A., Reisinger F., Ríos D., Dianes J.A., Sun Z., Farrah T., Bandeira N. ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat Biotechnol. 2014;32:223–226. doi: 10.1038/nbt.2839.
- 32. Kolesnikov N., Hastings E., Keays M., Melnichuk O., Tang Y.A., Williams E., Dylag M., Kurbatova N., Brandizi M., Burdett T. ArrayExpress update – simplifying data submissions. Nucleic Acids Res. 2014. doi: 10.1093/nar/gku1057.
- 33. Steinbeck C., Conesa P., Haug K., Mahendraker T., Williams M., Maguire E., Rocca-Serra P., Sansone S.-A., Salek R.M., Griffin J.L. MetaboLights: towards a new COSMOS of metabolomics data management. Metabolomics. 2012;8:757–760. doi: 10.1007/s11306-012-0462-0.
- 34. Sud M., Fahy E., Cotter D., Azam K., Vadivelu I., Burant C., Edison A., Fiehn O., Higashi R., Nair K.S. Metabolomics Workbench: an international repository for metabolomics data and metadata, metabolite standards, protocols, tutorials and training, and analysis tools. Nucleic Acids Res. 2016;44:D463–70. doi: 10.1093/nar/gkv1042.
- 35. Haug K., Salek R.M., Conesa P., Hastings J., De Matos P., Rijnbeek M., Mahendraker T., Williams M., Neumann S., Rocca-Serra P. MetaboLights – an open-access general-purpose repository for metabolomics studies and associated meta-data. Nucleic Acids Res. 2013;41:D781–D786. doi: 10.1093/nar/gks1004.
- 36. Salek R.M., Haug K., Conesa P., Hastings J., Williams M., Mahendraker T., Maguire E., González-Beltrán A.N., Rocca-Serra P., Sansone S.-A. The MetaboLights repository: curation challenges in metabolomics. Database. 2013;2013:bat029. doi: 10.1093/database/bat029.
- 37. Rocca-Serra P., Salek R.M., Arita M., Correa E., Dayalan S., González-Beltrán A., Ebbels T., Goodacre R., Hastings J., Haug K. Data standards can boost metabolomics research, and if there is a will, there is a way. Metabolomics. 2015;12:1–13. doi: 10.1007/s11306-015-0879-3.
- 38. Salek R.M., Neumann S., Schober D., Hummel J., Billiau K., Kopka J., Correa E., Reijmers T., Rosato A., Tenori L. COordination of Standards in MetabOlomicS (COSMOS): facilitating integrated metabolomics data access. Metabolomics. 2015. doi: 10.1007/s11306-015-0810-y.
- 39. Martens L., Chambers M., Sturm M., Kessner D., Levander F., Shofstahl J., Tang W.H., Römpp A., Neumann S., Pizarro A.D. mzML – a Community Standard for Mass Spectrometry Data. Mol Cell Proteomics. 2011;10:R110.000133. doi: 10.1074/mcp.R110.000133.
- 40. Griss J., Jones A.R., Sachsenberg T., Walzer M., Gatto L., Hartler J., Thallinger G.G., Salek R.M., Steinbeck C., Neuhauser N. The mzTab data exchange format: communicating mass-spectrometry-based proteomics and metabolomics experimental results to a wider audience. Mol Cell Proteomics. 2014;13:2765–2775. doi: 10.1074/mcp.O113.036681.
- 41. Courtot M., Juty N., Knüpfer C., Waltemath D., Zhukova A., Dräger A., Dumontier M., Finney A., Golebiewski M., Hastings J. Controlled vocabularies and semantics in systems biology. Mol Syst Biol. 2011;7:543. doi: 10.1038/msb.2011.77.
- 42. Sansone S.-A., Rocca-Serra P., Field D., Maguire E., Taylor C., Hofmann O., Fang H., Neumann S., Tong W., Amaral-Zettler L. Toward interoperable bioscience data. Nat Genet. 2012;44:121–126. doi: 10.1038/ng.1054.
- 43. Rocca-Serra P., Brandizi M., Maguire E., Sklyar N., Taylor C., Begley K., Field D., Harris S., Hide W., Hofmann O. ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level. Bioinformatics. 2010;26:2354–2356. doi: 10.1093/bioinformatics/btq415.
- 44. Wohlgemuth G., Mehta S.S., Mejia R.F., Neumann S., Pedrosa D., Pluskal T., Schymanski E.L., Willighagen E.L., Wilson M., Wishart D.S. SPLASH, a hashed identifier for mass spectra. Nat Biotechnol. 2016;34:1099–1101. doi: 10.1038/nbt.3689.
- 45. Salek R.M., Steinbeck C., Viant M.R., Goodacre R., Dunn W.B. The role of reporting standards for metabolite annotation and identification in metabolomic studies. GigaScience. 2013;2:13. doi: 10.1186/2047-217X-2-13.