Fig. 1. Overview of DUBStepR workflow.
After filtering out mitochondrial genes, ribosomal genes, spike-ins, and pseudogenes, DUBStepR constructs a gene-gene correlation (GGC) matrix and bins genes by expression to compute their correlation range z-scores, which are used to select well-correlated genes. DUBStepR then performs stepwise regression on the GGC matrix to identify a minimally redundant subset of seed features, which are then expanded by adding correlated features (guilt-by-association). The optimal feature set size is determined using the Density Index metric.
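
The first stage of the workflow (correlation matrix, expression binning, per-bin z-scores) can be illustrated with a minimal NumPy sketch. This is not the DUBStepR implementation: the function name `correlation_range_zscores`, the `n_bins` parameter, the equal-sized rank-based bins, and the simplified definition of correlation range (largest minus smallest off-diagonal correlation for each gene) are all assumptions made for illustration. The input is assumed to be a genes-by-cells matrix with no constant-expression genes.

```python
import numpy as np

def correlation_range_zscores(expr, n_bins=4):
    """Illustrative sketch only (hypothetical helper, not DUBStepR's code).

    expr: genes x cells expression matrix, no constant rows.
    Returns the gene-gene correlation matrix and, per gene, the z-score
    of its correlation range within its expression bin.
    """
    # Gene-gene Pearson correlation matrix (rows are genes).
    ggc = np.corrcoef(expr)

    # Assumed simplification: range = max - min off-diagonal correlation.
    off_diag = ggc.copy()
    np.fill_diagonal(off_diag, np.nan)
    corr_range = np.nanmax(off_diag, axis=1) - np.nanmin(off_diag, axis=1)

    # Assign genes to equal-sized bins by rank of mean expression.
    n_genes = expr.shape[0]
    order = np.argsort(expr.mean(axis=1))
    bins = np.empty(n_genes, dtype=int)
    bins[order] = np.arange(n_genes) * n_bins // n_genes

    # Standardise the correlation range within each expression bin.
    z = np.zeros(n_genes)
    for b in range(n_bins):
        mask = bins == b
        sd = corr_range[mask].std()
        if sd > 0:
            z[mask] = (corr_range[mask] - corr_range[mask].mean()) / sd
    return ggc, z
```

Under these assumptions, genes with high z-scores are those whose correlation range is unusually wide compared with genes of similar expression level, which is the sense in which the caption speaks of selecting well-correlated genes. The stepwise regression, guilt-by-association expansion, and Density Index steps are not sketched here.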