Author manuscript; available in PMC: 2018 May 3.
Published in final edited form as: Nat Methods. 2017 Apr 27;14(5):461–462. doi: 10.1038/nmeth.4260

Systems biology guided by XCMS Online metabolomics

Tao Huan 1,13, Erica M Forsberg 1,13, Duane Rinehart 1, Caroline H Johnson 1,2, Julijana Ivanisevic 3, H Paul Benton 1, Mingliang Fang 1,4, Aries Aisporna 1, Brian Hilmers 1, Farris L Poole 5, Michael P Thorgersen 5, Michael W W Adams 5, Gregory Krantz 6, Matthew W Fields 6, Paul D Robbins 7, Laura J Niedernhofer 7, Trey Ideker 8, Erica L Majumder 9, Judy D Wall 9, Nicholas J W Rattray 2,10, Royston Goodacre 10, Luke L Lairson 11, Gary Siuzdak 1,11,12
PMCID: PMC5933448  NIHMSID: NIHMS961195  PMID: 28448069

To the Editor

An aim of systems biology is to understand complex interactions between genes, proteins and metabolites by integrating and modeling multiple data sources. We report an ‘integrated-omics’ approach within XCMS Online1 that automatically superimposes raw metabolomic data onto metabolic pathways and integrates it with transcriptomic and proteomic data (http://XCMSOnline.scripps.edu/).

Mapping downstream metabolite changes onto metabolic pathways and biological networks can provide considerable mechanistic insight that can be confirmed by association with multi-omic data. However, pathway analysis using untargeted metabolomics requires intense data curation, including feature filtering, statistical analysis and metabolite identification. Subjectively defined cut-offs for fold change, P value and signal intensity are needed to identify significantly dysregulated metabolite features within enormous data sets. Confirming metabolite identities for pathway analysis typically requires performing additional tandem mass spectrometry (MS/MS) experiments and matching the spectra to standards or MS/MS spectral databases. The size of these data sets makes manual interpretation impractical, so bioinformatic tools are essential at each step. Multiple analysis platforms are often needed to complete the entire workflow, which can take several weeks, depending on the size of the sample cohort and the experience of the analyst.
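
The feature-filtering step described above can be sketched as follows. This is a minimal illustration, not the XCMS Online implementation: the `Feature` class, the `filter_features` function, the three default cut-offs and the example values are all hypothetical and chosen only to show how user-defined thresholds select dysregulated features.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    mz: float             # mass-to-charge ratio
    rt: float             # retention time (minutes)
    fold_change: float    # ratio between group means, expressed as >= 1
    p_value: float        # P value from the two-group comparison
    max_intensity: float  # highest signal intensity across samples


def filter_features(features, min_fold=1.5, max_p=0.01, min_intensity=10000.0):
    """Keep features that pass all three user-chosen cut-offs (hypothetical defaults)."""
    return [
        f for f in features
        if f.fold_change >= min_fold
        and f.p_value <= max_p
        and f.max_intensity >= min_intensity
    ]


# Made-up example features, for illustration only.
features = [
    Feature(mz=180.0634, rt=2.1, fold_change=3.2, p_value=0.0004, max_intensity=52000.0),
    Feature(mz=132.0768, rt=5.4, fold_change=1.1, p_value=0.2, max_intensity=81000.0),
    Feature(mz=348.0704, rt=7.9, fold_change=2.4, p_value=0.008, max_intensity=4000.0),
]
kept = filter_features(features)
```

Because the cut-offs are subjective, changing any one of them changes which features reach the pathway analysis; exposing them as parameters makes that choice explicit.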

XCMS was originally developed as a metabolomics data processing algorithm to extract metabolic features from raw MS data and perform statistical analysis. The evolution of XCMS from a command line tool2 to an intuitive cloud-based online platform1 facilitated its use by a broader community. However, the community still needs user-friendly tools that take metabolomic output and associate it with metabolic pathways to identify aberrant biological processes. To address this demand, we integrated automated predictive pathway analysis3, which operates directly on the entire metabolic feature table, into the XCMS Online workflow (Fig. 1), removing the need to transfer data to another application and enabling quick and efficient pathway analysis. This process involves uploading raw MS data to XCMS Online, where the statistically significant features are identified; then, using Fisher’s exact test, dysregulated metabolic pathways are identified from the processed accurate mass data3. If gene and protein data are available, they are uploaded and overlaid on the results of the metabolomic analysis. Currently there are over 7,600 metabolic models available for pathway analysis from BioCyc4 v19.5–20.0, with contents being updated regularly. Further confirmation of dysregulated pathways can be performed by comparing metabolite spectra, obtained via targeted or autonomous MS/MS, with standard fragmentation spectra from METLIN, which contains MS/MS data on over 14,000 molecules5. To address instances in which a standard spectrum is not available, we have also recently added machine-learning-based in silico fragmentation data to METLIN, generating MS/MS spectra on over 220,000 more molecules. Our workflow enables (i) evaluation of biochemical relevance by mapping high resolution MS data directly onto pathways, (ii) cross-integration of genomic and proteomic data and (iii) metabolite identity verification via data-dependent MS/MS analysis, either separately or as part of the autonomous workflow5.
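
To make the role of Fisher's exact test concrete, the sketch below computes a one-sided enrichment P value for a single pathway from a 2 × 2 partition of the feature list. It is a generic textbook formulation under stated assumptions, not the predictive pathway algorithm used by the platform: the function name `fisher_enrichment_p` and its four arguments are hypothetical, and the step that maps accurate masses to candidate metabolites is assumed to have been done already.

```python
from math import comb


def fisher_enrichment_p(overlap, pathway_size, n_significant, n_total):
    """One-sided Fisher's exact test for pathway enrichment.

    overlap       -- significant metabolites that fall in the pathway
    pathway_size  -- metabolites of the pathway present in the full list
    n_significant -- all significant metabolites
    n_total       -- all metabolites considered

    Returns P(X >= overlap) under the hypergeometric null distribution.
    """
    upper = min(pathway_size, n_significant)
    denominator = comb(n_total, n_significant)
    tail = sum(
        comb(pathway_size, i) * comb(n_total - pathway_size, n_significant - i)
        for i in range(overlap, upper + 1)
    )
    return tail / denominator


# Toy numbers: all 3 significant metabolites fall in a 3-member pathway out of 6.
p = fisher_enrichment_p(overlap=3, pathway_size=3, n_significant=3, n_total=6)
```

A small P value indicates that the pathway contains more significant metabolites than expected by chance; in practice the test is repeated for every pathway in the chosen biological model.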

Figure 1.

Workflow for metabolomic data and pathway analysis using XCMS Online. A metabolite feature table of statistically significant features is generated from standard XCMS processing; these features automatically undergo predictive pathway mapping using a specified biological model. The pathway cloud plot shows dysregulated pathways (blue circles) with increasing statistical significance on the y axis, metabolite overlap on the x axis and total number of metabolites in the pathway represented by the circle radius. The multiscale pathway coverage table presents enriched metabolic pathways with overlapped and total metabolites, genes and proteins. MS/MS data confirm dysregulated pathways by matching metabolite MS/MS spectra with the METLIN database.

Our multi-omic analysis tool uses embedded BioCyc4 and UniProt6 databases to map user-uploaded gene and protein data onto the predicted metabolic pathways (Supplementary Fig. 1). Results can be viewed in table form or using the interactive Pathway Cloud plot (Fig. 1). Dysregulated pathways with greater percent overlap and statistical significance appear in the upper right of the cloud plot. Graph features can be clicked to view more information on overlapping gene, protein and metabolite data, with links to BioCyc, KEGG and METLIN. Important features can be readily identified, helping to decipher underlying biological mechanisms. Details on the pathway analysis and integrated omics workflow can be found in the Supplementary Methods. Data sharing is possible between collaborators and the public, and we encourage users to share their data in the XCMS Online community.
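
The layout of the cloud plot can be illustrated by computing one point per pathway. This is a sketch under assumptions rather than the platform's plotting code: the use of a negative base-10 logarithm for the significance axis, the function name `cloud_point`, the `radius_scale` parameter and the example pathways with their counts are all hypothetical.

```python
from math import log10


def cloud_point(name, overlap, total, p_value, radius_scale=1.0):
    """Return plot coordinates for one pathway.

    x      -- percent of pathway metabolites that overlap the significant set
    y      -- statistical significance, assumed here to be -log10(P)
    radius -- proportional to the total number of metabolites in the pathway
    """
    return {
        "pathway": name,
        "x": 100.0 * overlap / total,
        "y": -log10(p_value),
        "radius": radius_scale * total,
    }


# Made-up pathways, for illustration only: (name, overlap, total, P value).
pathways = [
    ("pathway A", 5, 20, 0.001),
    ("pathway B", 2, 40, 0.2),
    ("pathway C", 6, 10, 0.0001),
]
points = [cloud_point(*row) for row in pathways]

# Pathways nearest the upper right (most significant, highest overlap) come first.
ranked = sorted(points, key=lambda pt: (pt["y"], pt["x"]), reverse=True)
```

Sorting by significance and then by overlap reproduces, in list form, the reading order that the upper right corner of the plot suggests.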

To demonstrate metabolic pathway analysis and multi-omic integration, we describe representative sample sets in the Supplementary Note, including metabolic pathway analysis using progenitor cell proliferation data and a bacterially induced corrosion study (Supplementary Fig. 2); proteomic integration with an aging study (Supplementary Fig. 3); transcriptomic and proteomic integration using a human colon cancer study (Supplementary Fig. 4 and Supplementary Table 1); a nitrate stress response study in sulfate-reducing bacteria (Supplementary Fig. 5) and a media stress response study in Escherichia coli (Supplementary Fig. 6 and Supplementary Table 2); and a cohort of 1,600 diabetes plasma samples (Supplementary Fig. 7), which helps illustrate the scalability of the cloud-based XCMS Online.

Other notable tools providing pathway analysis and multi-omic integration include Galaxy-M7, Open MS from KNIME8 and MetaboAnalyst9. However, many of these tools still require separate preprocessing of tandem liquid chromatography-mass spectrometry data and are not fully integrated into a single program. Our workflow automatically maps metabolomic data directly onto pathways and integrates transcriptomics and proteomics for systems-wide interpretation in one cohesive platform. Additionally, metabolic network mapping is available based on the predictive activity network algorithm3 for analysis of metabolomic data only, with multi-omics networking in development. In the future, we will incorporate unique metabolic pathways and networks from other sources to provide more comprehensive biological resources.

Data availability

To assist users with the workflow, we have provided a sample data set entitled “Ecoli_glucose-vs-adenosine” (Job ID #1133019) that can be found on XCMS Online under XCMS Public (https://xcmsonline.scripps.edu/landing_page.php?pgcontent=listPublicShares), as well as two instructional videos available on the XCMS Institute website (https://xcmsonline.scripps.edu/landing_page.php?pgcontent=institute) under the Omics tab and by clicking Integrated Omics or Pathway Cloud Plot.

Acknowledgments

The authors thank J. Nazroo, G. Tampubolon, N. Pendleton and F.C.W. Wu from the University of Manchester for constructive discussions, and the Medical Research Council for generous funding through grant MRC G1001375/1 (R.G.). The authors thank the following for funding assistance: Ecosystems and Networks Integrated with Genes and Molecular Assemblies (ENIGMA), a Scientific Focus Area Program at Lawrence Berkeley National Laboratory for the US Department of Energy, Office of Science, Office of Biological and Environmental Research under contract number DE-AC02-05CH11231 (G.S.); and National Institutes of Health grants R01 GMH4368 (G.S.) and PO1 A1043376-02S1 (G.S.).

Footnotes

Any Supplementary Information and Source Data files are available in the online version of the paper.

AUTHOR CONTRIBUTIONS

T.H., E.M.F., D.R., H.P.B., A.A., B.H., T.I., M.W.W.A., P.D.R., L.J.N., M.W.F. and G.S. contributed to multi-omic platform design and development; T.H., E.M.F., C.H.J., M.F., G.K., M.P.T., L.L.L., F.L.P., E.L.M., J.D.W., N.J.W.R. and R.G. contributed to data collection and analysis. T.H. and E.M.F. share first authorship. C.H.J., J.I., T.I. and G.S. also contributed to manuscript writing.

COMPETING FINANCIAL INTERESTS

The authors declare no competing financial interests.
