Skip to main content
BioMed Research International logoLink to BioMed Research International
. 2014 Apr 9;2014:194812. doi: 10.1155/2014/194812

Tools and Databases of the KOMICS Web Portal for Preprocessing, Mining, and Dissemination of Metabolomics Data

Nozomu Sakurai 1,2,*, Takeshi Ara 1,2, Mitsuo Enomoto 1,2, Takeshi Motegi 1, Yoshihiko Morishita 1, Atsushi Kurabayashi 1, Yoko Iijima 1,3, Yoshiyuki Ogata 1,4, Daisuke Nakajima 1, Hideyuki Suzuki 1, Daisuke Shibata 1
PMCID: PMC4052814  PMID: 24949426

Abstract

A metabolome—the collection of comprehensive quantitative data on metabolites in an organism—has been increasingly utilized for applications such as data-intensive systems biology, disease diagnostics, biomarker discovery, and assessment of food quality. A considerable number of tools and databases have been developed to date for the analysis of data generated by various combinations of chromatography and mass spectrometry. We report here a web portal named KOMICS (The Kazusa Metabolomics Portal), where the tools and databases that we developed are available for free to academic users. KOMICS includes the tools and databases for preprocessing, mining, visualization, and publication of metabolomics data. Improvements in the annotation of unknown metabolites and dissemination of comprehensive metabolomic data are the primary aims behind the development of this portal. For this purpose, PowerGet and FragmentAlign include a manual curation function for the results of metabolite feature alignments. A metadata-specific wiki-based database, Metabolonote, functions as a hub of web resources related to the submitters' work. This feature is expected to increase citation of the submitters' work, thereby promoting data publication. As an example of the practical use of KOMICS, a workflow for a study on Jatropha curcas is presented. The tools and databases available at KOMICS should contribute to enhanced production, interpretation, and utilization of metabolomic Big Data.

1. Introduction

A metabolome, which comprises comprehensive data on quantification of metabolites in an organism calculated using metabolomic technologies [9, 10], has been increasingly used for the analysis and practical applications of biological and environmental systems. Within the data-intensive systems biology discipline, metabolomics is particularly important compared to other “omics” (genome, transcriptome, and proteome) disciplines since metabolomes are more closely related to phenotype and regulate gene and protein expression networks [1113]. Mass spectrometry (MS) and nuclear magnetic resonance spectroscopy (NMR) are complementary techniques often used for the detection and identification of metabolites. MS technology has integrated separation techniques and is used in most cases because of its sensitivity, selectivity, speed, and broad applicability [1416]. Owing to the wide range of chemical diversity, there is no ideal apparatus that is capable of analyzing all possible metabolites. Combinations of separation techniques with MS, such as liquid chromatography- (LC-) MS, gas chromatography- (GC-) MS, and capillary electrophoresis- (CE-) MS, are chosen according to a study's purpose [1719]. Metabolomics technology, including instrumental analysis, detection and identification of metabolites, statistical interpretation, and generation of hypotheses with computational support, is used for a variety of studies, such as functional analysis of biological systems [2022], biomarker discovery [23, 24], medical diagnostics [14, 25], quality assessment of foods [26, 27], evaluation of genetically modified crops [28, 29], and assessment of environmental pollution [30, 31].

A considerable number of software tools and databases have been developed for processing the complicated and multidimensional metabolome datasets generated by various types of MS-based instruments [3235]. A typical workflow of metabolomic data analysis includes the following processes: (a) preprocessing of raw data for extraction of metabolite features, annotation of the metabolites, and finally generation of metabolome data; (b) mining and visualization of metabolome data for statistical interpretation of its nature and hypothesis generation; (c) storing and disseminating the data for further utilization and comparison (Figure 1). XCMS2 [36], MzMine2 [37], MathDAMP [38], MetAlign [39], and MET-IDEA [40] are typical tools for preprocessing including detection, alignment, and annotation of metabolite features. Some of these tools also provide statistical analysis functions for data interpretation. MassBank [41], METLIN [42], PRIMe [43], and HMDB [44] are available as references of mass spectra for metabolite annotation. The metabolite data are interpreted by means of the genome information from compound databases such as KNApSAcK [45], PubChem [46], and Chemspider (http://www.chemspider.com/) and by means of metabolic pathway databases including KEGG [47], BioCyc [48], and Reactome [49], which enable data visualization on pathway maps. The raw and processed data are stored publicly in databases such as PlantMetabolomics.org [50], GMD@CSB.DB [51], SetupX (currently not available), MetabolomeExpress [52], MetaboLights [53], and Metabolomics Workbench (http://www.metabolomicsworkbench.org/).

Figure 1.

Figure 1

A typical workflow of a metabolomics study and KOMICS-relevant tools and databases. The process of data acquisition and preprocessing (a) is required for generating the metabolome data. A working hypothesis is generated by interpreting the metabolome data (b), and the cycle is completed after validating the hypotheses by further analyses (the arrow on the left side). The metabolome data are published in the databases (c) and utilized for preprocessing and data mining. The tools and databases introduced in the main text of this paper are shown in bold face.

We report here a portal website named KOMICS (The Kazusa Metabolomics Portal, http://www.kazusa.or.jp/komics/), which hosts tools and databases that we developed for metabolomics. Although an increasing number of tools and databases have become available, two major issues remain to be resolved, that is, comprehensiveness of metabolites [54, 55] and data dissemination [53, 56, 57]. Our primary aim in developing data preprocessing tools is to help researchers with the manual annotation process that remains essential for nontarget metabolomics [54]. PowerGet for LC-high-resolution-MS and FragmentAlign for GC-MS are tools that enable curation of peak alignment results. SpiceHit is a high-throughput metabolite identification tool for CE-MS analysis using the selected ion monitoring (SIM) method. We have also developed data mining and visualization tools for the generation of working hypotheses (KaPPA-View and RnR). Real data is indispensable for comparative analysis and for the development and improvement of preprocessing tools [53, 58]. MassBase is one of the largest raw data repositories, and KomicMarket is a database of metabolic profiling data. We developed a metadata-specific database, Metabolonote, to promote data publication by researchers. These resources for a wide range of metabolome data processing are expected to contribute to improved production and utilization of metabolomic data.

2. Materials and Methods

The standalone tools for metabolome data production, PowerGet, FragmentAlign, and SpiceHit, were developed in Java (Oracle Corporation). The web-based tools and databases were developed and are run in Apache, PHP, Perl, MySQL, Java, and Tomcat on Linux servers. The KOMICS website was constructed using the content management system “Joomla!” running on a Linux server with Apache, PHP, and MySQL. The details of the development and license information are described in the individual introduction pages of KOMICS, in manuals, or in other relevant help resources. The tools and databases are freely available to academic users.

Details of the analytical methods for the evaluation of preprocessing tools are described in the Supplementary Material (see Supplementary Material available online at http://dx.doi.org/10.1155/2014/194812).

3. Results and Discussion

The tools and databases we have developed and provided at the KOMICS web portal are classified into three categories according to the typical workflow of metabolomic data analysis, namely, (a) preprocessing tools, (b) data mining tools, and (c) databases for data dissemination (Figure 1). Here we describe several representative examples. All the currently available tools and databases are listed in Table 1. The number of records in each metabolomics-related database is shown in Table 2. The formats of input and output files and the availability of sample datasets are summarized in Table 3.

Table 1.

The tools and databases available at KOMICS (as of November 2013).

Name Description URL Reference
Standalone tools
PowerGet Metabolite detection, alignment, and annotation tool for LC-high-resolution-MS. http://www.kazusa.or.jp/komics/software/PowerGet
MatchedIonsFinder Revising tool for metabolite alignment results from LC-MS analyses. http://www.kazusa.or.jp/komics/software/MatchedIonsFinder [1]
FragmentAlign Metabolite alignment and annotation tool for GC-MS. http://www.kazusa.or.jp/komics/software/FragmentAlign
SpiceHit High-throughput metabolite detection and annotation tool for SIM analysis in CE-MS. http://www.kazusa.or.jp/komics/software/SpiceHit
KAGIANA Microsoft Excel-based tool for exploring the function of Arabidopsis genes. http://webs2.kazusa.or.jp/kagiana/ [2]
SokanProject A tool for calculating Pearson's correlation coefficients. http://www.kazusa.or.jp/komics/software/SokanProject

Web tools
MFSearcher Web service for rapid prediction of elemental composition and database searching by accurate mass values. http://webs2.kazusa.or.jp/mfsearcher/ [3]
DAGViz Visualization tool for similarities of gene ontology annotations. http://www.pgb.kazusa.or.jp/dagviz/ [4]

Databases
MassBase Largest repository of metabolomics raw data. http://webs2.kazusa.or.jp/massbase/
KomicMarket Sample-centric database for metabolomic profile data. http://webs2.kazusa.or.jp/komicmarket/
New KomicMarket temporary website Developmental version of KomicMarket. http://webs2.kazusa.or.jp/new_km_tmp/
KaPPA-View4 Pathway database for visualizing metabolome and transcriptome data. http://kpv.kazusa.or.jp/ [5]
Metabolonote Metadata-specific Semantic MediaWiki-based database. http://metabolonote.kazusa.or.jp/
MS-MS Fragment Viewer Database for MS/MS fragmentation data of 115 flavonoids. http://webs2.kazusa.or.jp/msmsfragmentviewer/
RnR Database providing metabolite-to-gene relationships calculated from ~200 transgenic Arabidopsis cells. http://webs2.kazusa.or.jp/kagiana/rnr
CoP Gene-to-gene coexpression database for 8 plant species calculated using the Confeito algorithm. http://webs2.kazusa.or.jp/kagiana/cop0911 [6]
KATANA Cross-search system for Arabidopsis genes. http://www.kazusa.or.jp/katana/ [7]
ARTRA Database of probe information of Arabidopsis DNA microarray data (developed by Takara Bio Inc.). http://artra.kazusa.or.jp/artra/ARI3_101/ [8]
FuLoja Database of Lotus japonicus full-length cDNA obtained in the NEDO project. http://webs2.kazusa.or.jp/IntegrationDBRS/FuLoja/
PMPj-Blast Database of ESTs, cDNAs, and oligo DNA microarray probes for Lotus japonicus and some other plants. http://webs2.kazusa.or.jp/IntegrationDBRS/pmpj-blast/

Table 2.

The number of records in metabolomics-related databases at KOMICS (as of November 2013).

Database name Number Description
MassBase 43959 Binary raw datasets
KomicMarket 85 Biological samples, including 251 instrumental analysis datasets
215 Chemical samples, including 488 instrumental analysis datasets
New KomicMarket temporary website 16 Number of studies, including 166 analyzed datasets
Metabolonote 34 Number of studies, including metadata for 375 instrumental analysis datasets and 765 computational analysis datasets
MS-MS Fragment Viewer 115 Analyzed flavonoids
RnR 194 Metabolite features

Table 3.

A summary of input and output file formats and availability of sample data for preprocessing tools. The precise formats are described in the instruction manuals for each tool.

Tool name Input Output Availability of sample data
PowerGet (1) PowerGet format (text file): MSGet tool is available for generating the text files from Xcalibur raw files
(2) MassBase SMS format (text file)
(3) mzXML file generated from the Xcalibur raw files using the ReAdW tool∗a
(1) PowerGet format (text file):
users can select the items and formats of the output file
(2) TogoMD format (text file)
KOMICS website

FragmentAlign Deconvoluted peak data
(1) FragmentAlign format (text file, one of the NIST formats)
(2) MassBase SMT format (text file)
(3) GMD∗b format (text file in NIST∗c MSP format)
(4).ELU file of AMDIS∗d software

Chromatogram data for deconvolution
CSV file exported by Pegasus III (text file)
FragmentAlign format (text file) KOMICS website
A sample file of a standard compound library is included in the tutorial data

SpiceHit Electropherogram data
(1) CSV file (text file)
(2) ChemStation  .MS file (binary)

Standard compound library
Excel file (binary)
(1) Tab-delimited text file
(2) Excel file
(1) KOMICS website
(2) Included in the tool
A sample of a standard compound library is included

∗bGMD: Golm Metabolome Database, http://gmd.mpimp-golm.mpg.de.

∗cNIST: National Institute of Standards and Technology, http://www.nist.gov/.

3.1. Data Preprocessing Tools

3.1.1. PowerGet

PowerGet is a standalone Java software package for detection, alignment, and annotation of metabolite features from data obtained using LC-high-resolution-MS (HRMS). Accurate mass values measured by HRMS, such as Fourier transform ion cyclotron resonance MS and Orbitrap MS (Thermo Fisher), allow users to predict the elemental composition of a metabolite. The intensity ratio of 13C to 12C isotopic ion peaks is useful for estimating the number of carbon atoms in a molecule. Estimation of ion adducts attached to the metabolites by coeluted ions is helpful for calculating elemental composition and for search of compound databases by mass values of nonionized molecules. The PowerFT module in PowerGet attaches these data automatically to all metabolite features in the LC-HRMS data. In the PowerMatch module, the metabolite features are aligned among the samples taking into account the similarity of MS/MS fragmentation patterns. A tool for refining the alignment results, MatchedIonsFinder [1], is also available via KOMICS.

To evaluate the accuracy of mass values of the peaks detected using PowerGet, the mass differences between a theoretical mass and a detected mass were compared to those of the peaks detected using the commercial software, Xcalibur (see Supplementary Method S1). PowerGet exhibited greater accuracy (0.579 ± 0.481 ppm (mean ± SD)) than Xcalibur (0.783 ± 0.563 ppm) in the evaluation of 143 standard compounds (Supplementary Table S1).

One of the unique functions of PowerGet is that the alignment results are manually editable: a user can promptly check metabolite's characteristics, such as mass chromatogram shape, existence of adjacent features, and MS/MS fragmentation patterns, by means of a graphical user interface (GUI), as shown in Figure 2. Alignment is essential for preparing matrices of samples to metabolite intensity data for further comparison and statistical analysis. Alignment is highly valuable when users need to annotate the metabolites, especially for unknown features. By comparing the features from several replicate samples, (1) the estimation error of the ion adducts is verified, (2) accuracy of mass measurement can be improved, and (3) reproducibly detected features are prioritized for further annotation. Therefore, alignment errors should be assessed and corrected during detailed annotation of unknown metabolites. PowerGet is utilized in preparing data for KomicMarket and Bio-MassBank (http://bio.massbank.jp/ ).

Figure 2.

Figure 2

The alignment-editing function of the PowerMatch module of PowerGet. (a) The Alignment Table shows the alignment results of the peaks detected in each sample. The intensity of peaks is summarized in another window (f). The details of the peak information (b), MS/MS fragments (c), appearance of peripheral peaks (d), and peak shape (e) are shown for the user-selected peaks. A misaligned peak (the blue colored cell in panel a) can be merged to an appropriate row using the Edit Alignment function (g), by immediately checking the detailed information for the peak (b–e).

3.1.2. FragmentAlign

This is a standalone Java tool designed for GC-MS data analysis with functions for alignment and annotation of metabolite features. A GUI for editing the alignment results is also implemented in this software (Figure 3). The similarity of fragment ion patterns generated by electron ionization (EI) is taken into account in the alignment of metabolite features. The metabolite features can also be annotated based on EI fragment patterns, by comparing to patterns from standard compounds. The fragment pattern data of standard compounds can be imported and utilized when the data is written in the format defined by the National Institute of Standards and Technology (NIST), USA.

Figure 3.

Figure 3

Screenshots of FragmentAlign. An alignment result from 3 samples is depicted (a). The electron ionization (EI) mode of a fragmentation pattern of the metabolite peak is presented in the Fragment Viewer panel (b). The metabolites are annotated by comparing the similarity of the fragment patterns to those obtained from standard compounds (c).

To evaluate the applicability of data matrices generated by FragmentAlign for further statistical analyses, a principal component analysis (PCA) was conducted using the GC-MS data obtained from 3 biological sources: Arabidopsis leaves, Lotus japonicus leaves, and Arabidopsis cultured cells. Five replicates of each source were mapped to similar positions, whereas the 3 sources were mapped separately from one another on the score plot (Supplementary Figure S1). High-correlation coefficients for peak intensity within the replicates were observed (Supplementary Figure S2). These results suggest that appropriate feature extraction and generation of data matrices can be performed successfully using FragmentAlign.

3.1.3. SpiceHit

The standalone Java tool SpiceHit is intended for high-throughput identification of metabolite features detected using the selected ion monitoring (SIM) method in CE-MS. The metabolite features are identified based on migration times relative to internal standard compounds and are compared to those of the standard compound library prepared in-house. The tool is designed for processing a large number of data files in a high-throughput manner; it requires checking and correcting the assignment errors manually.

To ascertain whether SpiceHit is applicable to practical data analysis, the accuracy of peak quantification was compared to that acquired using the commercial software ChemStation (Agilent Technologies, Palo Alto, CA). In the detection of amino acids, the results from SpiceHit were strongly correlated with those from ChemStation, as well as with theoretical concentrations (Supplementary Table S4). Similar relative standard deviation (RSD) values for each amino acid in triplicate analyses were observed for SpiceHit and ChemStation (Supplementary Figure S3). Good linearity of peak areas common to SpiceHit and ChemStation was observed in the amino acid peaks detected in the biological samples (Supplementary Figure S4). These results suggest that the accuracy and the sensitivity of peak quantification by SpiceHit are similar to those of ChemStation and that SpiceHit is suitable for practical use.

3.1.4. MFSearcher

This is a web service that allows for rapid prediction of elemental composition from accurate mass values and for rapid searching of compound databases [3]. A GUI tool for MFSearcher queries is also provided as a module in PowerGet. PowerGet has a batch search function for querying thousands of detected metabolite features via MFSearcher.

Because MFSearcher is a RESTful web service, the query parameters for MFSearcher should be included in the description of a URL. Numerous sample queries are available as URL links on the MFSearcher website.

3.2. Data Mining Tools

3.2.1. KaPPA-View

This is a web-based tool for the visualization of metabolomic data on metabolic pathway maps from the Kyoto Encyclopedia of Genes and Genomes (KEGG) [5]. The degree of change in metabolite abundance between several samples is expressed as hue of the compound symbols drawn on the KEGG pathway maps, based on the compound IDs assigned. Alterations in transcriptome data can be simultaneously depicted on the maps. This tool can be used for the integrated analysis of metabolomic, transcriptomic, and possibly proteomic data.

Sample data for testing the color representations on the pathway maps are available on the “Analysis” page of KaPPA-View. Users can select the items according to the directions presented on the page. Sample files for input data are available on the “Download” page.

3.2.2. RnR

This database contains data on the relationship between metabolites and genes; these data were generated via simultaneous measurement of the metabolome and transcriptome of approximately 200 transgenic cultured cell lines of the model plant Arabidopsis. The gene expression patterns and metabolite changes resulting from specific transgenes are compiled in the database. Users can search, for example, genes that can affect the abundance of the queried metabolites and vice versa. The database should contribute to knowledge discovery related to gene-to-metabolite regulatory networks in Arabidopsis cells.

To view an example dataset, a clickable pie chart of metabolites is presented on the main page of the RnR website. Clicking on a section of the pie chart will show a list of metabolites. After choosing a metabolite name, users will be able to view candidate genes that are related to the metabolite.

3.3. Databases for Data Dissemination

3.3.1. MassBase

The primary purpose of MassBase is the distribution of raw data generated by analytical instruments. Dissemination of raw data would enable the development and improvement of data analysis tools by bioinformatics researchers [53]. Binary raw data and near-raw text data exported from raw binary results are provided.

Users can search records by sample name, sample description, instrument type, and ionization mode on the “Advanced Search” page (Figure 4(a)). A summary of the records classified by species and instrument type is available on the “Summary” page.

Figure 4.

Figure 4

Screenshots of the web interface of MassBase (a) and KomicMarket (b).

3.3.2. KomicMarket

KomicMarket is a sample-centric database aimed at the distribution of metabolic profiles with and without metabolite annotations (Figure 4(b)). Previous results on the detection and annotation of metabolite features in certain samples can serve as good references for future metabolite annotations [56].

The records can be queried by keywords in the sample descriptions, including peak characteristics such as mass values, and in annotations of the peaks via the GUI on the KomicMarket website. The system provides application programming interfaces (APIs) for performing software-based querying of the data. Using the APIs, we employed the MFSearcher module of PowerGet to search metabolites in KomicMarket.

3.3.3. Metabolonote

This Semantic MediaWiki-based database is intended for managing metadata, which is the detailed information on experimental procedures accompanying the generation of data. Metabolonote is expected to accelerate publication of metabolomics data. The raw data obtained from the experiments or the processed data derived from them are not the target of Metabolonote, and the “actual data” are normally stored in other databases specifically built for a given purpose. Separation of the management of complicated metadata of metabolomics from the management of actual data makes it possible to share the same metadata among multiple actual databases such as raw data repositories, metabolic profile databases, reference libraries of MS/MS, and research papers. One-stop-shop management of complicated metadata of metabolomics eliminates the redundant management of metadata in several databases and reduces labor for data submitters. We defined a simple data format named Togo metabolomics data format (TogoMD) for easier description of metadata. Specifications of the TogoMD format are documented on the Metabolonote website (http://metabolonote.kazusa.or.jp/TogoMetabolomeDataFormat). Metabolonote provides application programming interfaces (APIs) for semantic searching of the records and retrieval of metadata. Because Metabolonote is a wiki system, it allows the submitters to attach free additional information about the metadata, such as images of the samples, video recordings of tricky analytical procedures, and links to a journal's website where the results are published. Therefore, metadata written by the submitter function as a hub of the web data resources related to the submitter's work. The increased presence of the submitters' published work on the web should increase the citations by others [58]. Therefore, the wiki system is expected to facilitate the dissemination of data to the public. Metabolonote is already linked to seven actual databases, including MassBase, KomicMarket, Bio-MassBank, and Riken PRIMe.

The metadata deposited in Metabolonote are listed on the “Public Pages.” The registered metadata are semantically searchable by various items (and combinations thereof) on the “Metadata Search” page.

3.4. Practical Use

Here we present a workflow for a metabolomics study of Jatropha curcas L. [59, 60], a biofuel plant, to illustrate an example of the practical use of the KOMICS resources (Figure 5). LC-Orbitrap-MS analyses were conducted using 4 developmental stages of J. curcas fruit samples. The acquired data were primarily recorded as a binary raw file with commercial software (Xcalibur, Thermo Fisher). To analyze the data with PowerGet, the chromatogram data were exported to text files using the MSGet tool, which is also available on the KOMICS website. The raw files and extracted text files were published on MassBase. The text data were then processed using the PowerGet tool to generate the metabolomic data. MatchedIonsFinder was used to refine the alignment results. MFSearcher was used for high-throughput search of elemental composition and compound databases. MS-MS Fragment Viewer was used for interpreting MS/MS fragments in the metabolite annotations. The peak information, profile data (in the TogoMD format), and MS spectrum data were stored on KomicMarket (on the New KomicMarket temporary website) and on Bio-MassBank, respectively. These data were recursively used for metabolite annotations during the preprocessing step. Subsequently, the nature of the metabolomic data was interpreted by visualization on pathway maps using KaPPA-View4 and other statistical analyses. Consequently, a drastic change in metabolites during the maturation of J. curcas fruit was detected, and these data should contribute to further analysis of oil production by J. curcas. The record in Metabolonote (metadata ID: SE5) is a good guidepost for finding data resources deposited in various databases on the web.

Figure 5.

Figure 5

A schematic representation of the workflow for analysis of metabolomic changes in the developing fruit of Jatropha curcas L. The tools and databases introduced in the main text are shown in bold face. The accession IDs of the data in each database are shown in italics.

4. Conclusions

We have developed various tools and databases for a wide range of processes related to metabolomic studies: preprocessing, data mining, and publication. To our knowledge, PowerGet and FragmentAlign are the first tools to allow users to curate alignment results via GUI. The unique concept of a metadata-specific database should accelerate data publication and dissemination. This infrastructure is expected to assist researchers to attain superior utilization of metabolomics' Big Data. Nonetheless, annotation of novel metabolites (the so-called unknown unknowns) remains a serious bottleneck in building comprehensive metabolomic datasets [16, 54]. Continuous efforts are needed to improve and automate annotation tasks. In addition, a systematic collection of annotation skills from experts will be necessary in the near future, as will the analysis and transfer of these skills to the public domain for education of fledgling annotators.

Supplementary Material

The Supplementary Material include 3 supplementary methods and 4 supplementary figures as follows.

Supplementary Method S1: The method for the evaluation of the accuracy of mass values of the peaks detected using PowerGet tool.

Supplementary Method S2: The method for the evaluation of applicability of a data matrix resulting from peak alignment in FragmentAlign tool to a comparative metabolomics analysis.

Supplementary Method S3. The method for the evaluation of accuracy of peak area quantified using SpiceHit tools.

Supplementary Figure S1: Principal component analysis (PCA) of metabolite peaks in Arabidopsis leaves, Lotus japonicus leaves, and Arabidopsis cultured cells detected using GC-MS and aligned by means of FragmentAlign.

Supplementary Figure S2: Correlation coefficients of the values of peak intensity aligned in FragmentAlign.

Supplementary Figure S3: Reproducibility of peak area quantification in SpiceHit and ChemStation.

Supplementary Figure S4: Comparison of values of the peak area quantified using SpiceHit and ChemStation.

Four supplementary tables are as follows.

Supplementary Table S1: The standard compounds used for the comparison of accuracy of mass values obtained using PowerGet and Xcalibur.

Supplementary Table S2: The data used for peak alignment of GC-MS data through FragmentAlign.

Supplementary Table S3: The data used for the evaluation of quantification of peaks using SpiceHit.

Supplementary Table S4: Comparison of accuracy of peak area quantification in SpiceHit and in ChemStation.

194812.f1.pdf (312.6KB, pdf)
194812.f2.xlsx (31.1KB, xlsx)
194812.f3.xlsx (10.8KB, xlsx)
194812.f4.xlsx (11.9KB, xlsx)
194812.f5.xlsx (13.4KB, xlsx)

Acknowledgments

This work was partly supported by a grant from the New Energy and Industrial Technology Development Organization (NEDO, Japan) as part of the project “Development of Fundamental Technologies for Controlling the Material Production Process of Plants,” a grant from the National Bioscience Database Center (NBDC) of Japan Science and Technology Agency (JST) as part of the project titled “The Life-Science Database Integration Project,” and a grant from the Kazusa DNA Research Institute.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

  • 1.Yamamoto N, Suzuki T, Ara T, et al. MatchedIonsFinder: a software tool for revising alignment matrices of spectrograms from liquid chromatography-mass spectrometry. Plant Biotechnology. 2012;29(1):109–113. [Google Scholar]
  • 2.Ogata Y, Sakurai N, Aoki K, et al. KAGIANA: an excel-based tool for retrieving summary information on Arabidopsis genes. Plant and Cell Physiology. 2009;50(1):173–177. doi: 10.1093/pcp/pcn179. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Sakurai N, Ara T, Kanaya S, et al. An application of a relational database system for high-throughput prediction of elemental compositions from accurate mass values. Bioinformatics. 2013;29(2):290–291. doi: 10.1093/bioinformatics/bts660. [DOI] [PubMed] [Google Scholar]
  • 4.Yano K, Aoki K, Suzuki H, Shibata D. DAGViz: a directed acyclic graph browser that supports analysis of gene ontology annotation. Plant Biotechnology. 2009;26(1):9–13. [Google Scholar]
  • 5.Sakurai N, Ara T, Ogata Y, et al. KaPPA-View4: a metabolic pathway database for representation and analysis of correlation networks of gene co-expression and metabolite co-accumulation and omics data. Nucleic Acids Research. 2011;39(1):D677–D684. doi: 10.1093/nar/gkq989. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Ogata Y, Suzuki H, Sakurai N, Shibata D. CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics. 2010;26(9):1267–1268. doi: 10.1093/bioinformatics/btq121. [DOI] [PubMed] [Google Scholar]
  • 7.Yano K, Dansako T, Sakurai N, Suzuki H, Shibata D. KATANA: a web-based guide to public databases for Arabidopsis genomic information. Plant Biotechnology. 2005;22(3):225–229. [Google Scholar]
  • 8.Ohba T, Suzuki K, Oura T, et al. ARTRA: a new database of the Arabidopsis transcriptome and gene-specific sequences for microarray probes and RNAi triggers. Plant Biotechnology. 2009;26(1):161–165. [Google Scholar]
  • 9.Oliver SG, Winson MK, Kell DB, Baganz F. Systematic functional analysis of the yeast genome. Trends in Biotechnology. 1998;16(9):373–378. doi: 10.1016/s0167-7799(98)01214-1. [DOI] [PubMed] [Google Scholar]
  • 10.Dunn WB. Current trends and future requirements for the mass spectrometric investigation of microbial, mammalian and plant metabolomes. Physical Biology. 2008;5(1) doi: 10.1088/1478-3975/5/1/011001.011001 [DOI] [PubMed] [Google Scholar]
  • 11.Dettmer K, Aronov PA, Hammock BD. Mass spectrometry-based metabolomics. Mass Spectrometry Reviews. 2007;26(1):51–78. doi: 10.1002/mas.20108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Hollywood K, Brison DR, Goodacre R. Metabolomics: current technologies and future trends. Proteomics. 2006;6(17):4716–4723. doi: 10.1002/pmic.200600106. [DOI] [PubMed] [Google Scholar]
  • 13.Nielsen J, Oliver S. The next wave in metabolome analysis. Trends in Biotechnology. 2005;23(11):544–546. doi: 10.1016/j.tibtech.2005.08.005. [DOI] [PubMed] [Google Scholar]
  • 14.Dunn WB, Hankemeier T. Mass spectrometry and metabolomics: past, present and future. Metabolomics. 2013;9(1, supplement):1–3. [Google Scholar]
  • 15.Zhang A, Sun H, Wang P, Han Y, Wang X. Modern analytical techniques in metabolomics analysis. Analyst. 2012;137(2):293–300. doi: 10.1039/c1an15605e. [DOI] [PubMed] [Google Scholar]
  • 16.Wishart DS. Computational strategies for metabolite identification in metabolomics. Bioanalysis. 2009;1(9):1579–1596. doi: 10.4155/bio.09.138. [DOI] [PubMed] [Google Scholar]
  • 17.Lei Z, Huhman DV, Sumner LW. Mass spectrometry strategies in metabolomics. The Journal of Biological Chemistry. 2011;286(29):25435–25442. doi: 10.1074/jbc.R111.238691. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Hall RD. Plant metabolomics: from holistic hope, to hype, to hot topic. New Phytologist. 2006;169(3):453–468. doi: 10.1111/j.1469-8137.2005.01632.x. [DOI] [PubMed] [Google Scholar]
  • 19.Saito K, Matsuda F. Metabolomics for functional genomics, systems biology, and biotechnology. Annual Review of Plant Biology. 2010;61:463–489. doi: 10.1146/annurev.arplant.043008.092035. [DOI] [PubMed] [Google Scholar]
  • 20.Khoo SH, Al-Rubeai M. Metabolomics as a complementary tool in cell culture. Biotechnology and Applied Biochemistry. 2007;47(part 2):71–84. doi: 10.1042/BA20060221. [DOI] [PubMed] [Google Scholar]
  • 21.Mashego MR, Rumbold K, de Mey M, Vandamme E, Soetaert W, Heijnen JJ. Microbial metabolomics: past, present and future methodologies. Biotechnology Letters. 2007;29(1):1–16. doi: 10.1007/s10529-006-9218-0. [DOI] [PubMed] [Google Scholar]
  • 22.Kopka J, Fernie A, Weckwerth W, Gibon Y, Stitt M. Metabolite profiling in plant biology: platforms and destinations. Genome Biology. 2004;5(6, article 109) doi: 10.1186/gb-2004-5-6-109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Koulman A, Lane GA, Harrison SJ, Volmer DA. From differentiating metabolites to biomarkers. Analytical and Bioanalytical Chemistry. 2009;394(3):663–670. doi: 10.1007/s00216-009-2690-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Jansson J, Willing B, Lucio M, et al. Metabolomics reveals metabolic biomarkers of Crohn’s disease. PLoS ONE. 2009;4(7) doi: 10.1371/journal.pone.0006386.e6386 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Madsen R, Lundstedt T, Trygg J. Chemometrics in metabolomics—a review in human disease diagnosis. Analytica Chimica Acta. 2010;659(1-2):23–33. doi: 10.1016/j.aca.2009.11.042. [DOI] [PubMed] [Google Scholar]
  • 26.Fitzgerald MA, McCouch SR, Hall RD. Not just a grain of rice: the quest for quality. Trends in Plant Science. 2009;14(3):133–139. doi: 10.1016/j.tplants.2008.12.004. [DOI] [PubMed] [Google Scholar]
  • 27.Pongsuwan W, Bamba T, Harada K, Yonetani T, Kobayashi A, Fukusaki E. High-throughput technique for comprehensive analysis of Japanese green tea quality assessment using ultra-performance liquid chromatography with time-of-flight mass spectrometry (UPLC/TOF MS) Journal of Agricultural and Food Chemistry. 2008;56(22):10705–10708. doi: 10.1021/jf8018003. [DOI] [PubMed] [Google Scholar]
  • 28.Ricroch AE, Bergé JB, Kuntz M. Evaluation of genetically engineered crops using transcriptomic, proteomic, and metabolomic profiling techniques. Plant Physiology. 2011;155(4):1752–1761. doi: 10.1104/pp.111.173609. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Kusano M, Redestig H, Hirai T, et al. Covering chemical diversity of genetically-modified tomatoes using metabolomics for objective substantial equivalence assessment. PLoS ONE. 2011;6(2) doi: 10.1371/journal.pone.0016989.e16989 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Krauss M, Singer H, Hollender J. LC-high resolution MS in environmental analysis: from target screening to the identification of unknowns. Analytical and Bioanalytical Chemistry. 2010;397(3):943–951. doi: 10.1007/s00216-010-3608-9. [DOI] [PubMed] [Google Scholar]
  • 31.Lin CY, Viant MR, Tjeerdema RS. Metabolomics: methodologies and applications in the environmental sciences. Journal of Pesticide Science. 2006;31(3):245–251. [Google Scholar]
  • 32.Fukushima A, Kusano M. Recent progress in the development of metabolome databases for plant systems biology. Frontiers in Plant Science. 2013;4(article 73) doi: 10.3389/fpls.2013.00073. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Tohge T, Fernie AR. Web-based resources for mass-spectrometry-based metabolomics: a user’s guide. Phytochemistry. 2009;70(4):450–456. doi: 10.1016/j.phytochem.2009.02.004. [DOI] [PubMed] [Google Scholar]
  • 34.Blekherman G, Laubenbacher R, Cortes DF, et al. Bioinformatics tools for cancer metabolomics. Metabolomics. 2011;7(3):329–343. doi: 10.1007/s11306-010-0270-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Hur M, Campbell AA, Almeida-de-Macedo M, et al. A global approach to analysis and interpretation of metabolic data for plant natural product discovery. Natural Product Reports. 2013;30(4):565–583. doi: 10.1039/c3np20111b. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Benton HP, Wong DM, Trauger SA, Siuzdak G. XCMS2: processing tandem mass spectrometry data for metabolite identification and structural characterization. Analytical Chemistry. 2008;80(16):6382–6389. doi: 10.1021/ac800795f. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Pluskal T, Castillo S, Villar-Briones A, Orešič M. MZmine 2: modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data. BMC Bioinformatics. 2010;11, article 395 doi: 10.1186/1471-2105-11-395. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Baran R, Kochi H, Saito N, et al. MathDAMP: a package for differential analysis of metabolite profiles. BMC Bioinformatics. 2006;7, article 530 doi: 10.1186/1471-2105-7-530. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Lommen A. Metalign: interface-driven, versatile metabolomics tool for hyphenated full-scan mass spectrometry data preprocessing. Analytical Chemistry. 2009;81(8):3079–3086. doi: 10.1021/ac900036d. [DOI] [PubMed] [Google Scholar]
  • 40.Lei Z, Li H, Chang J, Zhao PX, Sumner LW. MET-IDEA version 2.06; improved efficiency and additional functions for mass spectrometry-based metabolomics data processing. Metabolomics. 2012;8:105–110. [Google Scholar]
  • 41.Horai H, Arita M, Kanaya S, et al. MassBank: a public repository for sharing mass spectral data for life sciences. Journal of Mass Spectrometry. 2010;45(7):703–714. doi: 10.1002/jms.1777. [DOI] [PubMed] [Google Scholar]
  • 42.Smith CA, O’Maille G, Want EJ, et al. METLIN: a metabolite mass spectral database. Therapeutic Drug Monitoring. 2005;27(6):747–751. doi: 10.1097/01.ftd.0000179845.53213.39. [DOI] [PubMed] [Google Scholar]
  • 43.Sakurai T, Yamada Y, Sawada Y, et al. PRIMe update: innovative content for plant metabolomics and integration of gene expression and metabolite accumulation. Plant and Cell Physiology. 2013;54(2, article e5) doi: 10.1093/pcp/pcs184. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Wishart DS, Jewison T, Guo AC, et al. HMDB 3.0—the human metabolome database in 2013. Nucleic Acids Research. 2013;41:D801–D807. doi: 10.1093/nar/gks1065. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Afendi FM, Okada T, Yamazaki M, et al. KNApSAcK family databases: integrated metabolite-plant species databases for multifaceted plant research. Plant and Cell Physiology. 2012;53(2, article e1) doi: 10.1093/pcp/pcr165. [DOI] [PubMed] [Google Scholar]
  • 46.Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Bryant SH. PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Research. 2009;37(2):W623–W633. doi: 10.1093/nar/gkp456. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Research. 2012;40:D109–D114. doi: 10.1093/nar/gkr988. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Caspi R, Altman T, Dreher K, et al. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Research. 2009;38(1):D742–D753. doi: 10.1093/nar/gkr1014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Croft D, O’Kelly G, Wu G, et al. Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Research. 2011;39(1):D691–D697. doi: 10.1093/nar/gkq1018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Bais P, Moon SM, He K, et al. Plantmetabolomics.org: a web portal for plant metabolomics experiments. Plant Physiology. 2010;152(4):1807–1816. doi: 10.1104/pp.109.151027. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Kopka J, Schauer N, Krueger S, et al. GMD@CSB.DB: the Golm metabolome database. Bioinformatics. 2005;21(8):1635–1638. doi: 10.1093/bioinformatics/bti236. [DOI] [PubMed] [Google Scholar]
  • 52.Carroll AJ, Badger MR, Millar AH. The MetabolomeExpress project: enabling web-based processing, analysis and transparent dissemination of GC/MS metabolomics datasets. BMC Bioinformatics. 2010;11, article 376 doi: 10.1186/1471-2105-11-376. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Haug K, Salek RM, Conesa P, et al. MetaboLights-an open-access general-purpose repository for metabolomics studies and associated meta-data. Nucleic Acids Research. 2013;41:D781–D786. doi: 10.1093/nar/gks1004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Bowen BP, Northen TR. Dealing with the unknown: metabolomics and metabolite atlases. Journal of the American Society for Mass Spectrometry. 2010;21(9):1471–1476. doi: 10.1016/j.jasms.2010.04.003. [DOI] [PubMed] [Google Scholar]
  • 55.Neumann S, Böcker S. Computational mass spectrometry for metabolomics: identification of metabolites and small molecules. Analytical and Bioanalytical Chemistry. 2010;398(7-8):2779–2788. doi: 10.1007/s00216-010-4142-5. [DOI] [PubMed] [Google Scholar]
  • 56.Goodacre R, Vaidyanathan S, Dunn WB, Harrigan GG, Kell DB. Metabolomics by numbers: acquiring and understanding global metabolite data. Trends in Biotechnology. 2004;22(5):245–252. doi: 10.1016/j.tibtech.2004.03.007. [DOI] [PubMed] [Google Scholar]
  • 57.Kind T, Scholz M, Fiehn O. How large is the metabolome? A critical analysis of data exchange practices in chemistry. PLoS ONE. 2009;4(5) doi: 10.1371/journal.pone.0005440.e5440 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Piwowar HA, Day RS, Fridsma DB. Sharing detailed research data is associated with increased citation rate. PLoS ONE. 2007;2(3, article e308) doi: 10.1371/journal.pone.0000308. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Sakurai N, Ogata Y, Ara T, et al. Development of KaPPA-View4 for omics studies on Jatropha and a database system KaPPA-Loader for construction of local omics databases. Plant Biotechnology. 2012;29(2):131–135. [Google Scholar]
  • 60.Sano R, Ara T, Akimoto N, et al. Dynamic metabolic changes during fruit maturation in Jatropha curcas L. Plant Biotechnology. 2012;29(2):175–178. [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

The Supplementary Material include 3 supplementary methods and 4 supplementary figures as follows.

Supplementary Method S1: The method for the evaluation of the accuracy of mass values of the peaks detected using PowerGet tool.

Supplementary Method S2: The method for the evaluation of applicability of a data matrix resulting from peak alignment in FragmentAlign tool to a comparative metabolomics analysis.

Supplementary Method S3. The method for the evaluation of accuracy of peak area quantified using SpiceHit tools.

Supplementary Figure S1: Principal component analysis (PCA) of metabolite peaks in Arabidopsis leaves, Lotus japonicus leaves, and Arabidopsis cultured cells detected using GC-MS and aligned by means of FragmentAlign.

Supplementary Figure S2: Correlation coefficients of the values of peak intensity aligned in FragmentAlign.

Supplementary Figure S3: Reproducibility of peak area quantification in SpiceHit and ChemStation.

Supplementary Figure S4: Comparison of values of the peak area quantified using SpiceHit and ChemStation.

Four supplementary tables are as follows.

Supplementary Table S1: The standard compounds used for the comparison of accuracy of mass values obtained using PowerGet and Xcalibur.

Supplementary Table S2: The data used for peak alignment of GC-MS data through FragmentAlign.

Supplementary Table S3: The data used for the evaluation of quantification of peaks using SpiceHit.

Supplementary Table S4: Comparison of accuracy of peak area quantification in SpiceHit and in ChemStation.

194812.f1.pdf (312.6KB, pdf)
194812.f2.xlsx (31.1KB, xlsx)
194812.f3.xlsx (10.8KB, xlsx)
194812.f4.xlsx (11.9KB, xlsx)
194812.f5.xlsx (13.4KB, xlsx)

Articles from BioMed Research International are provided here courtesy of Wiley

RESOURCES