omu, a Metabolomics Count Data Analysis Tool for Intuitive Figures and Convenient Metadata Collection

Connor R Tiffany; Andreas J Bäumler

Microbiol Resour Announc. 2019 Apr 11;8(15):e00129-19. doi: 10.1128/MRA.00129-19

Editor: Irene L G Newton

PMCID: PMC6460029 PMID: 30975806

ABSTRACT

Metabolomics is a powerful tool for measuring the functional output of the microbiota. Currently, there are few established workflows for analysis downstream of metabolite identification. Here, we introduce omu, an R package designed for assigning compound hierarchies and linking compounds to corresponding enzyme and gene annotations for organisms of interest.

ANNOUNCEMENT

The omu R package is designed to analyze processed metabolomics count data. The central idea behind omu is to assign hierarchical metadata from the Kyoto Encyclopedia of Genes and Genomes (KEGG) (1) to each metabolite so that users can create intuitive figures for visualizing their data. To this end, omu provides a suite of graphing and statistical functions built around assign_hierarchy, which assigns metadata to each metabolite on the basis of the KEGG identifier in the user's data. omu includes an example data set of the fecal metabolome of nitric oxide synthase 2 (NOS2)-deficient C57BL/6J mice 3 days after mock treatment (gavage with sterile water) or oral gavage with a single dose of streptomycin (20 mg/animal). This data set was used to generate Fig. 1.
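The metadata assignment described above can be pictured as a lookup keyed on the KEGG compound number. The Python sketch below illustrates that idea only; it is not omu's implementation, and omu itself is written in R. It makes three assumptions. First, a tiny in-memory table stands in for the KEGG-derived hierarchy that omu uses, and its entries are copied from the two compounds in Table 1. Second, only three hierarchy levels are modeled. Third, the function name assign_hierarchy_sketch is hypothetical.

```python
# Hypothetical stand-in for the KEGG compound hierarchy; the two entries
# copy the class annotations listed in Table 1 (None = no subclass).
HIERARCHY = {
    "C00642": {"Class": "Organic acids", "Subclass_1": None, "Subclass_2": None},
    "C00989": {"Class": "Organic acids", "Subclass_1": "Carboxylic acids",
               "Subclass_2": "Hydroxycarboxylic acids"},
}
LEVELS = ("Class", "Subclass_1", "Subclass_2")


def assign_hierarchy_sketch(rows, hierarchy=HIERARCHY):
    """Return new rows with hierarchy columns looked up by KEGG number.

    Rows with a missing or unknown KEGG number get None at every level,
    so no metabolite is silently dropped; input rows are not modified.
    """
    annotated = []
    for row in rows:
        meta = hierarchy.get(row.get("KEGG"), {})
        annotated.append({**row, **{level: meta.get(level) for level in LEVELS}})
    return annotated


counts = [
    {"Metabolite": "4-Hydroxyphenylacetic acid", "KEGG": "C00642", "Count": 32307},
    {"Metabolite": "4-Hydroxybutyric acid", "KEGG": "C00989", "Count": 315},
    {"Metabolite": "unannotated feature", "KEGG": None, "Count": 12},  # hypothetical row
]
annotated = assign_hierarchy_sketch(counts)

# Subsetting to a class of interest is then a plain filter.
organic_acids = [r for r in annotated if r["Class"] == "Organic acids"]
print(len(organic_acids))  # 2
```

This sketch keeps unannotated features with empty levels instead of discarding them. That way, a later subset step decides explicitly what to exclude.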

First, users can run assign_hierarchy to add metadata to each metabolite that has a KEGG compound number (Table 1). KEGG annotates compounds by class, subclass 1, subclass 2, subclass 3, and subclass 4, which gives users several options for analyzing and visualizing their data. For example, users can subset their data to a compound class of particular interest, such as carbohydrates. The omu_summary function performs a Student's t test on each metabolite between two experimental groups and provides a fold-change value for each compound; these results can be combined with the hierarchical compound annotation to create intuitive figures. The count_fold_changes function produces a table that counts, for each KEGG compound category, the metabolites that significantly increased or decreased between groups. This table can be used to make bar plots (Fig. 1A) showing how many compounds in each metabolite class or subclass increased or decreased between experimental groups. Alternatively, because omu_summary also reports fold changes, users can incorporate effect size into a figure by creating a volcano plot with the plot_volcano function, which can highlight points according to their metadata (Fig. 1B).
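The per-metabolite statistics described above can be sketched in Python. The example runs a classic two-sample Student's t test (pooled, equal variances) on each metabolite and reports a fold change. It is an illustration under stated assumptions, not omu_summary's code or output format. The fold change is assumed to be the ratio of group means, with a log2 transform added. The p-value uses the closed-form finite series for the t distribution with integer degrees of freedom, so only the standard library is needed. The function names, column names, and toy counts are hypothetical. omu's documentation gives the exact columns it returns and any multiple-testing correction it applies.

```python
import math
import statistics


def t_two_sided_p(t, df):
    """Two-sided p-value for Student's t with an integer number of degrees of
    freedom, using the closed-form finite series (standard library only)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    sin_t, cos2 = math.sin(theta), math.cos(theta) ** 2
    if df % 2 == 0:
        # Even df: A = sin(theta) * (1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ...)
        term = series = 1.0
        for k in range(1, df // 2):
            term *= (2 * k - 1) / (2 * k) * cos2
            series += term
        a = sin_t * series
    else:
        # Odd df: A = (2/pi) * (theta + sin(theta) * (cos + (2/3)cos^3 + ...))
        series = 0.0
        if df > 1:
            term = series = math.cos(theta)
            for k in range(1, (df - 1) // 2):
                term *= (2 * k) / (2 * k + 1) * cos2
                series += term
        a = 2 / math.pi * (theta + sin_t * series)
    # A = P(|T| <= |t|); clamp in case rounding pushes it slightly past 1.
    return min(1.0, max(0.0, 1.0 - a))


def student_t_test(x, y):
    """Two-sample Student's t test with pooled (equal) variances."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(x) - statistics.mean(y)) / se
    return t, t_two_sided_p(t, df)


def summarize(counts, group_a, group_b):
    """Per-metabolite t test and fold change (assumed: mean(A) / mean(B)).

    `counts` maps metabolite -> {sample_id: count}. Metabolites with zero
    variance in both groups or a zero mean in group B are not handled here.
    """
    rows = []
    for metabolite, by_sample in counts.items():
        a = [by_sample[s] for s in group_a]
        b = [by_sample[s] for s in group_b]
        t, p = student_t_test(a, b)
        fold_change = statistics.mean(a) / statistics.mean(b)
        rows.append({"Metabolite": metabolite, "t": t, "pval": p,
                     "Fold_Change": fold_change,
                     "log2FC": math.log2(fold_change)})
    return rows


# Hypothetical toy counts for two metabolites in groups A (a1-a3) and B (b1-b3).
toy = {
    "metabolite_1": {"a1": 40, "a2": 44, "a3": 42, "b1": 10, "b2": 12, "b3": 11},
    "metabolite_2": {"a1": 6, "a2": 5, "a3": 7, "b1": 5, "b2": 6, "b3": 7},
}
results = summarize(toy, group_a=["a1", "a2", "a3"], group_b=["b1", "b2", "b3"])
```

In R, the same pooled-variance test is available as t.test(x, y, var.equal = TRUE). The series is used here only to keep the sketch dependency-free.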

TABLE 1.

Data table created from the example data set, showing the metadata that omu can collect

Gene	Metabolite	KEGG no.	Metabolite count	Class	Subclass 1	Subclass 2	Species, strain, serotype
PA4091 (hpaA)	4-Hydroxyphenylacetic acid	C00642	32,307	Organic acids	None	None	Pseudomonas aeruginosa PAO1
N297_4221 (hpaB)	4-Hydroxyphenylacetic acid	C00642	32,307	Organic acids	None	None	P. aeruginosa PAO1, VE13
N296_4221 (hpaB)	4-Hydroxyphenylacetic acid	C00642	32,307	Organic acids	None	None	P. aeruginosa PAO1, VE2
PA14_11000 (hpaA)	4-Hydroxyphenylacetic acid	C00642	32,307	Organic acids	None	None	P. aeruginosa UCBPP, PA14
PSPA7_1007 (hpaB)	4-Hydroxyphenylacetic acid	C00642	32,307	Organic acids	None	None	P. aeruginosa PA7
PP4_31900 (gbd)	4-Hydroxybutyric acid	C00989	315	Organic acids	Carboxylic acids	Hydroxycarboxylic acids	Pseudomonas putida NBRC 14164
APT59_05510	4-Hydroxybutyric acid	C00989	315	Organic acids	Carboxylic acids	Hydroxycarboxylic acids	Pseudomonas oryzihabitans
PverR02_11545	4-Hydroxybutyric acid	C00989	315	Organic acids	Carboxylic acids	Hydroxycarboxylic acids	Pseudomonas veronii
BJP27_20245	4-Hydroxybutyric acid	C00989	315	Organic acids	Carboxylic acids	Hydroxycarboxylic acids	Pseudomonas psychrotolerans

FIG 1 Data visualization in omu. (A) Bar plot showing, by metabolite class, the number of metabolites that differed significantly between treatment groups in the example data set included with the R package. (B) Volcano plot comparing two treatment groups from the example data set, with points highlighted by metabolite class.
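Both panels of Fig. 1 rest on simple transformations of per-metabolite results. The Python sketch below illustrates them; it is not the source of count_fold_changes or plot_volcano. It makes three assumptions. A metabolite counts as significant when p < 0.05. The sign of the log2 fold change gives the direction of change. The input rows and metabolite names are invented toy values. The sketch tallies significant increases and decreases per class, as in panel A. It also computes volcano-plot coordinates (log2 fold change against -log10 p) grouped by class so that they can be highlighted, as in panel B.

```python
import math
from collections import Counter

# Hypothetical per-metabolite results (class, p-value, log2 fold change).
summary = [
    {"Metabolite": "cmpd_A", "Class": "Organic acids", "pval": 0.001, "log2FC": 1.5},
    {"Metabolite": "cmpd_B", "Class": "Organic acids", "pval": 0.020, "log2FC": -0.8},
    {"Metabolite": "cmpd_C", "Class": "Carbohydrates", "pval": 0.004, "log2FC": 2.1},
    {"Metabolite": "cmpd_D", "Class": "Carbohydrates", "pval": 0.300, "log2FC": 0.4},
]


def count_changes_by_class(rows, alpha=0.05):
    """Count significant increases and decreases per compound class (panel A)."""
    tally = Counter()
    for r in rows:
        if r["pval"] < alpha and r["log2FC"] != 0:
            direction = "Increase" if r["log2FC"] > 0 else "Decrease"
            tally[(r["Class"], direction)] += 1
    return tally


def volcano_points(rows):
    """(log2 fold change, -log10 p) per metabolite, grouped by class (panel B).

    A p-value of exactly 0 would need flooring before the log; not handled here.
    """
    points = {}
    for r in rows:
        points.setdefault(r["Class"], []).append(
            (r["log2FC"], -math.log10(r["pval"])))
    return points


bars = count_changes_by_class(summary)
print(sorted(bars.items()))
# [(('Carbohydrates', 'Increase'), 1), (('Organic acids', 'Decrease'), 1), (('Organic acids', 'Increase'), 1)]
```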

The omu R package can also help users generate hypotheses from their metabolomics data. The KEGG_gather function retrieves all known enzyme orthology and gene data associated with each metabolite from the KEGG database; this requires an Internet connection. The result is high-dimensional, but assign_hierarchy also provides metadata for enzymes and, in the form of organism annotations, for genes, allowing users to reduce the data to items of interest. For example, a user interested in the genes that Pseudomonas spp. use to metabolize organic acids can combine KEGG_gather with assign_hierarchy to generate a table containing this information (Table 1).
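Reducing the gathered gene annotations to a question like the Pseudomonas example amounts to filtering on the organism and compound-class columns. The Python sketch below is illustrative only and does not reproduce omu's output format. Three rows are copied from Table 1. A fourth row is hypothetical and exists only so that something gets filtered out. The column names are assumptions, as is the rule that the genus is the first word of the organism name.

```python
# Rows shaped like a gene table after KEGG_gather and assign_hierarchy.
# The first three are copied from Table 1; the last one is hypothetical.
genes = [
    {"Gene": "PA4091 (hpaA)", "KEGG": "C00642", "Class": "Organic acids",
     "Organism": "Pseudomonas aeruginosa PAO1"},
    {"Gene": "PP4_31900 (gbd)", "KEGG": "C00989", "Class": "Organic acids",
     "Organism": "Pseudomonas putida NBRC 14164"},
    {"Gene": "APT59_05510", "KEGG": "C00989", "Class": "Organic acids",
     "Organism": "Pseudomonas oryzihabitans"},
    {"Gene": "hypothetical_gene_1", "KEGG": "C00989", "Class": "Organic acids",
     "Organism": "Escherichia coli"},  # hypothetical row, filtered out below
]


def filter_genes(rows, genus, compound_class):
    """Keep rows whose organism's first word (the genus) equals `genus`
    and whose compound class equals `compound_class`."""
    return [r for r in rows
            if r["Organism"].split()[0] == genus and r["Class"] == compound_class]


hits = filter_genes(genes, "Pseudomonas", "Organic acids")
print([r["Gene"] for r in hits])
# ['PA4091 (hpaA)', 'PP4_31900 (gbd)', 'APT59_05510']
```

Matching the whole first word, not a string prefix, avoids accidental matches between genera that share a prefix.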

In summary, omu is a novel metabolomics analysis tool that helps users describe their data by incorporating metabolite metadata into intuitive figures and creating tables with genes and enzymes associated with metabolites of interest.

Data availability.

omu is available for download on the CRAN repository (https://cran.r-project.org/web/packages/omu/index.html) and was built using R (2), devtools (3), dplyr (4), ggfortify (5), ggplot2 (6), KEGGREST (7), knitr (8), magrittr (9), plyr (10), reshape2 (11), rmarkdown (12), stringr (13), roxygen2 (14), and tidyr (15). Detailed instructions for using omu can be found at https://cran.r-project.org/web/packages/omu/vignettes/Omu_vignette.html.

ACKNOWLEDGMENTS

Work in A.J.B.’s lab is supported by USDA/NIFA award 2015-67015-22930 and by Public Health Service grants AI044170, AI096528, AI112445, and AI112949.

REFERENCES

1. Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M. 1999. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res 27:29–34. doi: 10.1093/nar/27.1.29.
2. R Development Core Team. 2016. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.r-project.org/.
3. Wickham H, Hester J, Chang W. 2018. devtools: tools to make developing R packages easier. https://rdrr.io/cran/devtools/.
4. Wickham H, Francois R, Henry L, Müller K. 2018. dplyr: a grammar of data manipulation. R package version 0.7.5. https://rdrr.io/cran/dplyr/.
5. Tang Y, Horikoshi M, Li W. 2016. ggfortify: unified interface to visualize statistical results of popular R packages. R J 8:474–485. doi: 10.32614/RJ-2016-060.
6. Wickham H. 2016. ggplot2: elegant graphics for data analysis. Springer-Verlag, New York, NY.
7. Tenenbaum D. 2018. KEGGREST: client-side REST access to KEGG. https://rdrr.io/bioc/KEGGREST/.
8. Xie Y. 2018. knitr: a general-purpose package for dynamic report generation in R. https://rdrr.io/cran/knitr/.
9. Bache SM, Wickham H. 2014. magrittr: a forward-pipe operator for R. R Core Development Team, Vienna, Austria. https://cran.r-project.org/package=magrittr.
10. Wickham H. 2011. The split-apply-combine strategy for data analysis. J Stat Softw 40:1–29. doi: 10.18637/jss.v040.i01.
11. Wickham H. 2012. reshape2: flexibly reshape data: a reboot of the reshape package. R package version 1.4.3. https://rdrr.io/cran/reshape2/.
12. Allaire JJ, Xie Y, McPherson J, Luraschi J, Ushey K, Atkins A, Wickham H, Cheng J, Chang W. 2018. rmarkdown: dynamic documents for R. https://rdrr.io/cran/rmarkdown/.
13. Wickham H. 2018. stringr: simple, consistent wrappers for common string operations. https://rdrr.io/cran/stringr/man/stringr-package.html.
14. Wickham H, Danenberg P, Eugster M. 2014. R: Roxygen2. CRAN.
15. Wickham H, Henry L. 2018. tidyr: easily tidy data with "spread()" and "gather()" functions. https://tidyr.tidyverse.org/.
