Problem setup: |
$n$ | Total number of cells in the input matrix |
$p_t$ | Number of target genes |
$p_r$ | Number of regulators |
$Z_t$, $Z_r$ | Target gene and regulator expression ($n \times p_t$ and $n \times p_r$ matrices, respectively) |
$K$ | Desired number of clusters |
$\Pi$ | $K \times p_t$ cluster membership matrix for target genes with $\Pi(i, j) \in \{0, 1\}$ and |
$J$ | $p_t \times p_t$ indicator matrix describing prior knowledge of biological relationships between target genes (e.g. pathway co-occurrence) |
For data splits d = 1, 2: |
$n_d$ | Number of cells in the $d$-th data split |
$Z_{t,d}$ | Target gene expression used in the $d$-th data split ($n_d \times p_t$ matrix) |
$Z_{r,d}$ | Regulator expression used in the $d$-th data split ($n_d \times p_r$ matrix) |
For each cluster i = 1, …, K: |
$C_i$ | Set of target genes in cluster $i$ |
$R_i$ | Set of regulators associated with cluster $i$ |
$N_i$ | Maximum number of regulators associated with cluster $i$ |
$s_i$ | Sign vector containing one sign for each regulator in $R_i$ |
$B_i$ | Non-negative regression coefficients ($|R_i| \times p_t$ matrix) |
| Positive variance parameter for each target gene and cluster |
Optimization-related (i = 1, …, K, j = 1, …, pt): |
$\lambda$ | Positive penalty parameter used in coop-Lasso |
$w_i$ | Positive weight vector of length $p_r$ for each cluster $i$ in coop-Lasso |
$B_{i,\mathrm{OLS}}$ | Ordinary least squares estimates of the regression coefficients in cluster $i$ ($p_r \times |C_i|$ matrix) |
$B_{i,\mathrm{CL}}$ | Coop-Lasso estimates of the regression coefficients in cluster $i$ ($p_r \times |C_i|$ matrix) |
$\tau$ | Non-negative threshold for rag-bag clustering |
$p_{i,j}$ | Prior probability of target gene $j$ being in cluster $i$ |
$L_j$ | $K \times n_2$ likelihood matrix for target gene $j$ |
$v_j$ | Vector of $n_2$ votes for the cluster assignment of target gene $j$ |
$\mu$ | Prior strength in $[0, 1]$ controlling the trade-off between likelihood and prior in cluster allocation |
Validation measures (i = 1, …, K, j = 1, …, pt, k = 1, …, pr): |
| Predictive $R^2$ for cluster $i$ |
| Predictive $R^2$ for cluster $i$ with regulator $k$ omitted |
$I_{i,k}$ | Importance of regulator $k$ in cluster $i$ |
| Predictive $R^2$ for target gene $j$ predicted by regulators in $R_i$ |
$S_j$ | Silhouette score for target gene $j$ |
Output: |
$T$ | Regulatory table providing a summary of regulator strength in each cluster ($p_r \times K$ matrix) |
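
To make the shapes in the notation above concrete, the following is a minimal NumPy sketch, not the authors' implementation. It builds placeholder expression matrices, splits the cells into two data splits, and converts a cluster labelling into the 0/1 membership matrix Π and the sets C_i. The helper names (`membership_matrix`, `cluster_sets`), the random placeholder data, the even random split, and the use of the label `-1` for a target gene left unassigned are all assumptions made for illustration.

```python
import numpy as np


def membership_matrix(labels, K):
    """Return a K x p_t 0/1 matrix Pi with Pi[i, j] = 1 iff gene j is in cluster i.

    Assumption: labels[j] is in {0, ..., K-1}, or -1 if gene j is unassigned.
    """
    labels = np.asarray(labels)
    Pi = np.zeros((K, labels.size), dtype=int)
    assigned = labels >= 0
    # Integer-array indexing sets one entry per assigned gene (column).
    Pi[labels[assigned], np.flatnonzero(assigned)] = 1
    return Pi


def cluster_sets(Pi):
    """Return the list [C_0, ..., C_{K-1}] of target-gene index sets."""
    return [set(np.flatnonzero(row).tolist()) for row in Pi]


rng = np.random.default_rng(0)
n, p_t, p_r, K = 10, 5, 3, 2

# Placeholder expression data: Z_t is n x p_t, Z_r is n x p_r.
Z_t = rng.normal(size=(n, p_t))
Z_r = rng.normal(size=(n, p_r))

# Assumed even random split of the cells into data splits d = 1, 2.
perm = rng.permutation(n)
idx1, idx2 = perm[: n // 2], perm[n // 2:]
Z_t1, Z_r1 = Z_t[idx1], Z_r[idx1]  # n_1 x p_t and n_1 x p_r
Z_t2, Z_r2 = Z_t[idx2], Z_r[idx2]  # n_2 x p_t and n_2 x p_r

# Hypothetical labelling of the p_t target genes; gene 3 is unassigned.
labels = [0, 1, 1, -1, 0]
Pi = membership_matrix(labels, K)
C = cluster_sets(Pi)
```

In this sketch each column of `Pi` contains at most one 1, so an unassigned gene corresponds to an all-zero column; whether the method encodes unassigned genes this way is not stated in the table.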