Figure - PMC

Genes. 2021 Jun 10;12(6):898. doi: 10.3390/genes12060898

© 2021 by the authors.

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Hyperparameters tested for cluster annotation and schematic of algorithm performance evaluation.

(A) Hyperparameters tested for each step of the cluster annotation algorithm, as described in Figure 2. The parameters that yielded the best annotation performance are highlighted in orange. To compute CDGs, we compared the mean expression of every gene in the cluster of interest with its mean expression in a set of reference cells. Reference cells were taken as all other cells from the corresponding study ("within study"), all other cells from the corresponding tissue ("within tissue"), or all other cells from all processed studies ("pan-study"). After computing fold-change values for each gene, we tested selecting the top 1, 3, 5, 10, or 20 genes for the downstream annotation steps. We tested both absolute and scaled versions of the GCAs (local scores between genes and cell types). To compute L2 norms, we tested weighting each GCA term by the corresponding fold change or log₂FC value, to increase the contribution of the strongest CDGs to the cell type prediction. To rank all candidate cell types, we considered a modified L0 norm (the number of genes with GCA > 3 for the given cell type), an L2 norm, and a composite metric that combines the modified L0 and L2 norms.

(B) For each parameter combination (n = 195), a cumulative distribution plot was generated showing the fraction of clusters correctly predicted within a given rank, from rank 1 to rank 5. To summarize performance, we considered the number of clusters annotated correctly (the red bar at Rank Threshold = 1) and estimated the area under this curve (AUC_{Ranks 1–5}) as the average fraction of clusters for which the correct annotation was among the top 1, 2, 3, 4, and 5 predictions.
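The caption describes the scoring and evaluation steps in words only. The NumPy sketch below is one possible reading of them, not the authors' implementation. Assumptions (all hypothetical): expression is a cells × genes matrix, GCAs are a genes × cell-types matrix (absolute or scaled, supplied by the caller), a small pseudocount `eps` avoids division by zero, and the function names are invented for illustration. The caption does not define how the composite metric combines the two norms; here it is assumed to rank by the modified L0 norm and break ties with the L2 norm.

```python
import numpy as np


def fold_changes(expr, in_cluster, in_reference, eps=1e-9):
    """Mean expression in the cluster divided by mean expression in reference cells.

    expr: cells x genes matrix; in_cluster / in_reference: boolean cell masks.
    eps: hypothetical pseudocount that avoids division by zero.
    """
    cluster_mean = expr[in_cluster].mean(axis=0)
    reference_mean = expr[in_reference].mean(axis=0)
    return (cluster_mean + eps) / (reference_mean + eps)


def top_genes(fc, n_genes):
    """Indices of the n_genes genes with the largest fold change (candidate CDGs)."""
    return np.argsort(-fc, kind="stable")[:n_genes]


def norms(gca, fc, genes, weighting="log2fc", l0_threshold=3.0):
    """Modified L0 norm and (optionally weighted) L2 norm for every cell type.

    gca: genes x cell-types matrix of gene-cell type association scores.
    """
    sub = gca[genes]                          # selected genes x cell types
    l0 = (sub > l0_threshold).sum(axis=0)     # number of genes with GCA > 3 per cell type
    if weighting == "fc":
        w = fc[genes]
    elif weighting == "log2fc":
        w = np.log2(fc[genes])
    else:
        w = np.ones(len(genes))               # unweighted L2 norm
    l2 = np.sqrt(((sub * w[:, None]) ** 2).sum(axis=0))
    return l0, l2


def rank_cell_types(l0, l2, metric="composite"):
    """Cell-type indices ordered best first."""
    if metric == "l0":
        return np.argsort(-l0, kind="stable")
    if metric == "l2":
        return np.argsort(-l2, kind="stable")
    # Assumed composite: higher L0 first, ties broken by higher L2.
    # np.lexsort treats the LAST key as the primary sort key.
    return np.lexsort((-l2, -l0))


def topk_fractions(true_types, rankings, max_rank=5):
    """Fraction of clusters whose true cell type is within the top k predictions, k = 1..max_rank."""
    hits = np.zeros(max_rank)
    for truth, ranking in zip(true_types, rankings):
        top = list(ranking[:max_rank])
        if truth in top:
            hits[top.index(truth):] += 1      # a hit at rank r counts for every threshold k >= r
    return hits / len(true_types)


def auc_ranks_1_to_5(fractions):
    """AUC_{Ranks 1-5}: average of the top-1 through top-5 fractions."""
    return float(np.mean(fractions[:5]))


if __name__ == "__main__":
    # Toy data: 4 cells x 4 genes; the first two cells form the cluster of interest.
    expr = np.array([[5.0, 1, 0, 2], [7.0, 1, 0, 2], [1.0, 1, 4, 2], [1.0, 1, 4, 2]])
    in_cluster = np.array([True, True, False, False])
    fc = fold_changes(expr, in_cluster, ~in_cluster)
    gca = np.array([[5.0, 0.5, 4], [4.0, 0, 1], [0.0, 6, 0], [1.0, 1, 1]])
    l0, l2 = norms(gca, fc, top_genes(fc, 2))
    print(rank_cell_types(l0, l2))  # prints [0 2 1]
```

Counting a hit at rank r toward every threshold k ≥ r is what makes the top-k fractions cumulative, so their mean over k = 1 to 5 corresponds to the AUC_{Ranks 1–5} summary described in (B).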