2021 Oct 6;12:5849. doi: 10.1038/s41467-021-26085-2

Fig. 1. Overview of DUBStepR workflow.

After filtering out mitochondrial genes, ribosomal genes, spike-ins, and pseudogenes, DUBStepR constructs a gene-gene correlation (GGC) matrix and bins genes by expression to compute their correlation range z-scores, which are used to select well-correlated genes. DUBStepR then performs stepwise regression on the GGC matrix to identify a minimally redundant subset of seed features, which are then expanded by adding correlated features (guilt-by-association). The optimal feature set size is determined using the Density Index metric.
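
The first stage of the workflow (GGC matrix, expression binning, correlation range z-scores) can be illustrated with a minimal Python sketch. This is not the DUBStepR implementation. The function name `correlation_range_zscores`, the equal-size binning, and the definition of correlation range as the largest minus the smallest off-diagonal correlation are all assumptions made for illustration; the published method may define these quantities differently. The stepwise regression and Density Index steps are not covered here.

```python
import numpy as np

def correlation_range_zscores(expr, n_bins=2):
    """Hypothetical helper: expr is a genes x cells array with no constant rows.

    Returns the gene-gene correlation matrix (diagonal set to NaN) and a
    per-gene z-score of the correlation range, standardized within bins of
    genes that have similar mean expression.
    """
    ggc = np.corrcoef(expr)            # gene-gene correlation (GGC) matrix
    np.fill_diagonal(ggc, np.nan)      # ignore each gene's self-correlation

    # Assumed definition: strongest positive minus strongest negative correlation.
    corr_range = np.nanmax(ggc, axis=1) - np.nanmin(ggc, axis=1)

    # Assign genes to equal-size bins by rank of mean expression.
    order = np.argsort(expr.mean(axis=1))
    bins = np.empty(len(order), dtype=int)
    bins[order] = np.arange(len(order)) * n_bins // len(order)

    # Standardize the correlation range within each expression bin.
    z = np.zeros_like(corr_range)
    for b in range(n_bins):
        idx = bins == b
        sd = corr_range[idx].std()
        z[idx] = (corr_range[idx] - corr_range[idx].mean()) / sd if sd > 0 else 0.0
    return ggc, z

# Synthetic data: genes 0-9 share a latent factor, genes 10-39 are pure noise.
rng = np.random.default_rng(0)
n_genes, n_cells = 40, 200
expr = rng.normal(size=(n_genes, n_cells))
factor = rng.normal(size=n_cells)
expr[:10] = factor + 0.3 * rng.normal(size=(10, n_cells))
expr += rng.uniform(0, 5, size=(n_genes, 1))   # per-gene baseline expression

ggc, z = correlation_range_zscores(expr)
print("mean z, correlated genes:", z[:10].mean())
print("mean z, noise genes:     ", z[10:].mean())
```

In this toy setting the co-regulated genes receive higher z-scores than the noise genes, which is the property the caption relies on when selecting well-correlated genes. Standardizing within expression bins keeps the score from simply tracking expression level.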