2010 Apr 16;11(Suppl 2):S4. doi: 10.1186/1471-2105-11-S2-S4

Copyright ©2010 Lee et al; licensee BioMed Central Ltd.

This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

An overview of the analysis procedure used to construct classification models based on metabolome datasets. The procedure consists of four stages: data standardization, preprocessing, feature selection, and classification. The raw data from mass spectrometers are converted into the standard data formats mzXML [13] and CDF, and then preprocessed using the MZmine tool [14,15]. The data are then analyzed with various feature selection and classification techniques. For feature selection, we use chi-square as a univariate method, the correlation-based method as a multivariate method, and Decision tree and Random forest as classifier-embedded methods. For classification, we use Decision tree and Random forest as tree-based non-parametric methods and Support vector machine (SVM) as a generalized linear discriminative method. (An Artificial neural network (ANN) is not used here, since the ANN is reported to have weaknesses relative to the SVM in many cases [18,19].) The dimension reduction methods PCA and PLS are used to visualize the overall distribution of the data.
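The univariate chi-square step of the feature selection stage can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that preprocessing has already produced a table of peak intensities per sample, and that each peak is binarized at its median before scoring; the binarization rule, the function names, and the toy data (`peak_A`, `peak_B`) are all hypothetical.

```python
from collections import Counter
from statistics import median


def chi_square_score(feature_values, labels):
    """Pearson chi-square statistic for one categorical feature vs. class labels."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))
    feature_totals = Counter(feature_values)
    label_totals = Counter(labels)
    score = 0.0
    for f, f_count in feature_totals.items():
        for c, c_count in label_totals.items():
            # Expected count if the feature and the class were independent.
            expected = f_count * c_count / n
            score += (observed[(f, c)] - expected) ** 2 / expected
    return score


def discretize(values):
    """Binarize intensities at the median (an assumption made for this sketch)."""
    cut = median(values)
    return ["high" if v > cut else "low" for v in values]


def rank_features(peak_table, labels):
    """Return (peak name, score) pairs sorted from most to least class-associated."""
    scores = {
        name: chi_square_score(discretize(values), labels)
        for name, values in peak_table.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: six samples, two peaks.
labels = ["case", "case", "case", "control", "control", "control"]
peak_table = {
    "peak_A": [9.1, 8.7, 9.4, 2.0, 2.3, 1.8],  # separates the two classes
    "peak_B": [5.0, 1.0, 5.2, 1.1, 4.9, 0.9],  # largely unrelated to class
}

ranked = rank_features(peak_table, labels)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Because each peak is scored independently of the others, this kind of ranking is fast but cannot account for redundancy between peaks, which is what the multivariate and classifier-embedded methods in the caption address.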