. 2021 Oct 7;12:5890. doi: 10.1038/s41467-021-25957-x
Reference-related symbols:
$G_r \in \mathbb{R}^{g \times n}$ Input reference gene expression matrix, prior to scaling.
$G_{rs} \in \mathbb{R}^{g \times n}$ Scaled reference gene expression matrix.
$X_r \in \{0,1\}^{b \times n}$ One-hot design matrix assigning reference cells (columns) to batches (rows).
$X_r \in \{0\}^{c \times n}$ Zero matrix assigning reference cells (columns) to query batches (rows). All values are 0 because reference cells do not belong to query batches. This term is used in the derivation of the reference compression terms.
$\mu \in \mathbb{R}^{g \times 1}$ Reference gene means used to center each gene for PCA.
$\sigma \in \mathbb{R}^{g \times 1}$ Reference gene standard deviations used to scale each gene for PCA.
$U \in \mathbb{R}^{g \times d}$ Gene loadings from the original PCA (before Harmony integration).
$Z_r = \Sigma_r V_r^T \in \mathbb{R}^{d \times n}$ Original (pre-harmonized) PC embedding for reference cells.
$\hat{Z}_r \in \mathbb{R}^{d \times n}$ Integrated embedding for reference cells in harmonized PC (hPC) space, as output by Harmony.
$R_r \in [0,1]^{k \times n}$ Soft cluster assignment of reference cells (columns) to clusters (rows), output by Harmony. Each column is a probability distribution that sums to 1.
$Y_{\cos} \in \mathbb{R}^{d \times k}$ Cluster centroid locations in the harmonized embedding, L2-normalized.
$B_r \in \mathbb{R}^{k \times (1+b) \times d}$ 3D tensor of the estimated parameters (betas and intercepts) of the linear mixture model for each of the k clusters for the reference cells.
$N_r \in \mathbb{R}^{k \times 1}$ First reference compression term. Vector containing the size of each of the k clusters, effectively the number of reference cells contained within them.
$C \in \mathbb{R}^{k \times d}$ Second reference compression term.
$\mathrm{Ref} = \{\mu, \sigma, U, Y_{\cos}, N_r, C\}$ Set of Symphony minimal reference elements.
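
To make the shapes in the list above concrete, here is a minimal NumPy sketch on toy data. It is an illustration under stated assumptions, not the Symphony implementation: the matrix sizes, the random expression counts, and the random soft assignments are all invented. Harmony is not run, so the pre-harmonized embedding stands in for the integrated one when forming placeholder centroids. Computing the cluster sizes as row sums of the soft assignment matrix is one reading of "the size of each of the k clusters". The second compression term C is left out because the list gives only its shape.

```python
import numpy as np

rng = np.random.default_rng(0)
g, n, d, k = 6, 20, 3, 4  # toy sizes: genes, reference cells, PCs, clusters

# G_r: toy reference expression matrix (genes x cells), before scaling.
G_r = rng.poisson(lam=5.0, size=(g, n)).astype(float)

# mu, sigma: per-gene mean and standard deviation, each of shape (g, 1).
mu = G_r.mean(axis=1, keepdims=True)
sigma = G_r.std(axis=1, ddof=1, keepdims=True)
assert np.all(sigma > 0), "toy data must not contain constant genes"

# G_rs: centered and scaled reference matrix, same shape as G_r.
G_rs = (G_r - mu) / sigma

# PCA via thin SVD: G_rs = U_full @ diag(S) @ Vt.
U_full, S, Vt = np.linalg.svd(G_rs, full_matrices=False)
U = U_full[:, :d]                 # gene loadings, shape (g, d)
Z_r = np.diag(S[:d]) @ Vt[:d, :]  # Sigma_r V_r^T, shape (d, n)

# R_r: hypothetical soft assignments (clusters x cells).
# Harmony would estimate these; here they are random and column-normalized.
R_r = rng.random((k, n))
R_r /= R_r.sum(axis=0, keepdims=True)

# N_r: soft size of each cluster, shape (k, 1), taken as row sums of R_r.
N_r = R_r.sum(axis=1, keepdims=True)

# Y_cos: placeholder centroids (d x k) with each column L2-normalized.
# Z_r is used in place of the harmonized embedding (assumption).
Y = Z_r @ R_r.T
Y_cos = Y / np.linalg.norm(Y, axis=0, keepdims=True)

print(G_rs.shape, U.shape, Z_r.shape, R_r.shape, N_r.shape, Y_cos.shape)
```

Because the columns of U are orthonormal, projecting the scaled matrix onto the loadings (`U.T @ G_rs`) reproduces `Z_r`, and the same projection can be applied to any new matrix that has been scaled with the stored `mu` and `sigma`.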