Principal component analysis (PCA) is an unsupervised data analysis method in which the principal components (PrCs) satisfy two conditions: (1) each PrC is the linear combination of the original variables (x data) that explains the largest share of the variance not already captured by the preceding PrCs, and (2) each PrC is orthogonal to every other PrC. The results are visualized using PrC score plots and loadings plots. PCA gives an overview of the structure of the data set and helps identify inherent clusters, separations, and possible outliers. Subsequent supervised analysis methods, such as partial least squares/projections to latent structures discriminant analysis (PLS-DA) and orthogonal signal correction projections to latent structures discriminant analysis (O-PLS-DA), are then needed to extract important biomarkers. PCA considers only the x data, whereas partial least squares (PLS) is an extension of PCA that considers both the x and y data sets. Here the y data set can be class membership (PLS-DA) or other biological descriptions (PLS). A multivariate analysis that takes the y data set into account is regarded as a supervised modeling method. Supervised modeling achieves dimensional reduction (as in PCA) and correlation with the response matrix (y data) at the same time, generating metabolic information related to the biological events of interest. The purpose of applying orthogonal signal correction is to remove variation in the x data that is unrelated to the response matrix, such as variation caused by instrument instability or other irrelevant confounding factors. However, supervised methods increase the risk of overfitting the model, and therefore subsequent model validation procedures are essential when performing supervised data analyses.
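
The two PCA conditions described above can be illustrated with a minimal sketch. It assumes PCA is computed by singular value decomposition of the mean-centered x data using NumPy. The function name `pca_svd`, the synthetic two-group data, and the choice of two components are hypothetical and are not taken from the original analysis.

```python
import numpy as np

def pca_svd(X, n_components=2):
    """Return scores, loadings, and explained-variance ratios.

    X is a (samples x variables) matrix; this helper is illustrative only.
    """
    Xc = X - X.mean(axis=0)                            # mean-center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # singular values sorted descending
    scores = U[:, :n_components] * s[:n_components]    # sample coordinates (score plot)
    loadings = Vt[:n_components].T                     # variable weights (loadings plot)
    explained = (s ** 2) / np.sum(s ** 2)              # fraction of variance per PrC
    return scores, loadings, explained[:n_components]

# Synthetic x data (an assumption for illustration): two groups of 20 samples
# and 50 variables, with the second group shifted in the first five variables.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, size=(20, 50))
group_b = rng.normal(0.0, 1.0, size=(20, 50))
group_b[:, :5] += 3.0
X = np.vstack([group_a, group_b])

scores, loadings, explained = pca_svd(X, n_components=2)

# Condition (2): the loading vectors are mutually orthogonal (and unit length).
gram = loadings.T @ loadings
print("loadings orthonormal:", np.allclose(gram, np.eye(2)))

# Condition (1): each successive PrC explains no more variance than the one before.
print("explained variance ratios:", explained)
```

Plotting the two columns of `scores` against each other gives a score plot, in which a group difference of this kind would be expected to appear as a separation along the first PrC. The rows of `loadings` indicate which variables contribute to each PrC. Class labels play no part in the calculation, which is what makes the method unsupervised.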