Fig. 3.
Validation methods for machine learning results, including cross-validation, permutation testing, the confusion matrix and the receiver operating characteristic (ROC) curve. Matrix X represents the spectral dataset and Y the classification/concentration information. (a) Illustration of cross-validation using the Leave-One-Out (LOO) strategy as an example. In each iteration a single sample is left out and the remaining samples are used as the training set to build the multivariate model; this step is repeated for every sample. The modelling error is calculated for each multivariate model with a different number of components (plot on the right). (b) The permutation test shuffles the samples in the X block while the samples in the Y block remain in the original order. Pseudo machine learning models are built and statistical tests are applied to compare the original model with the pseudo models. The permutation test can be applied with different parameters (e.g. the number of components), and only models with p-values below 0.05 are considered statistically reliable (plot on the right). (c) The confusion matrix indicates the numbers of true positive, false positive, false negative and true negative samples predicted by the machine learning method. Specificity and sensitivity are derived from the numbers of samples that fall into each category (Equations (1) and (2)). The ROC curve shows the relationship between specificity and sensitivity of the model as the decision threshold of the classifier is varied.
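As a rough illustration of the validation workflow summarised in Fig. 3, the following scikit-learn sketch walks through LOO cross-validation, a permutation test, the confusion matrix with sensitivity/specificity, and an ROC curve. It is not taken from the original article: the synthetic X/Y data, the choice of PLS regression and logistic regression as example multivariate models, and all parameter values are assumptions made purely for demonstration. Note that scikit-learn's permutation_test_score permutes the Y labels rather than shuffling the rows of X; both break the same X–Y pairing that the permutation panel of the figure illustrates.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict, permutation_test_score
from sklearn.metrics import confusion_matrix, roc_curve, roc_auc_score, mean_squared_error

# Hypothetical data standing in for the X (spectra) and Y (class labels) blocks.
rng = np.random.default_rng(0)
n, p = 60, 50                                   # 60 samples, 50 spectral variables (assumed)
y = np.repeat([0, 1], n // 2)                   # two-class Y block
X = rng.normal(size=(n, p)) + 0.8 * y[:, None]  # class-shifted synthetic "spectra"

loo = LeaveOneOut()

# (a) LOO cross-validation: modelling error vs. number of latent components,
#     using PLS regression as an example multivariate model.
for n_comp in range(1, 6):
    pred = cross_val_predict(PLSRegression(n_components=n_comp), X, y, cv=loo)
    print(f"{n_comp} components: LOO MSE = {mean_squared_error(y, pred.ravel()):.3f}")

# (b) Permutation test: compare the real model's cross-validated score with
#     scores of pseudo models built on permuted labels (100 permutations for brevity).
clf = LogisticRegression(max_iter=1000)
score, perm_scores, p_value = permutation_test_score(
    clf, X, y, cv=loo, n_permutations=100, random_state=0)
print(f"accuracy = {score:.3f}, permutation p-value = {p_value:.3f}")

# (c) Confusion matrix, sensitivity and specificity from LOO predictions
#     (cf. Equations (1) and (2) in the text).
y_pred = cross_val_predict(clf, X, y, cv=loo)
tn, fp, fn, tp = confusion_matrix(y, y_pred).ravel()
sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate
print(f"sensitivity = {sensitivity:.3f}, specificity = {specificity:.3f}")

# ROC: sweep the classifier's decision threshold over cross-validated scores.
y_score = cross_val_predict(clf, X, y, cv=loo, method="predict_proba")[:, 1]
fpr, tpr, thresholds = roc_curve(y, y_score)
print(f"AUC = {roc_auc_score(y, y_score):.3f}")
```

In this sketch the diagnostics of panel (c) are computed on cross-validated predictions rather than on the training fit, so the confusion matrix and ROC curve reflect how the model behaves on samples it has not seen, which is the point of the validation scheme in the figure.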