Figure - PMC

Skip to main content

An official website of the United States government

Here's how you know

Here's how you know

Official websites use .gov
A .gov website belongs to an official government organization in the United States.

Secure .gov websites use HTTPS
A lock ( ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.

View full-text article in PMC

. 2019 Dec 12;20:264. doi: 10.1186/s13059-019-1862-5

Search in PMC
Search in PubMed
View in NLM Catalog
Add to search

© The Author(s). 2019

Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

PMC Copyright notice

Fig. 1 — Summary of the scPred method. a Training step. A gene expression matrix is eigendecomposed via singular value decomposition (SVD) to obtain orthonormal linear combinations of the gene expression values. Only PCs explaining greater than 0.01% of the variance of the dataset are considered for the feature selection and model training steps. Informative PCs are selected using a two-tailed Wilcoxon signed-rank test for each cell class distribution (see the “Methods” section). The cells-PCs matrix is randomly split into k groups and the first k group is considered as a testing dataset for cross-validation. The remaining K-1 groups (shown as a single training fold) are used to train a machine learning classification model (a support vector machine). The model parameters are tuned, and each k group is used as a testing dataset to evaluate the prediction performance of a f_i(x) model trained with the remaining K-1 groups. The best model in terms of prediction performance is selected. b Prediction step. The gene expression values of the cells from an independent test or validation dataset are projected onto the principal component basis from the training model, and the informative PCs are used to predict the class probabilities of each cell using the trained prediction model(s) f_b(x)