Author manuscript; available in PMC: 2022 Apr 24.
Published in final edited form as: Cell Syst. 2022 Jan 31;13(4):286–303.e10. doi: 10.1016/j.cels.2021.12.005

Figure 1. Pleiotropy underlying the fitness effects of gene perturbation can be approximated using dictionary learning.

(A) Fitness screen collections measure changes in cell growth rate following gene perturbation across diverse cell contexts. Webster applies graph-regularized dictionary learning to these data to discover latent variables corresponding to biological functions. Webster returns (1) a dictionary matrix containing the fitness effect of perturbing each inferred biological function and (2) sparse gene-to-function loadings. Using this information, each measured gene effect can be approximated as a sparse linear combination of these latent functional effects, scaled by the appropriate loadings. Given the number of latent functions (k) and a sparsity parameter (t), Webster minimizes the total approximation error while preserving the local structure of genes and cell contexts in its reduced-dimensional representations (see also Figure S1A).
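The objective described in (A) can be written out as a hedged sketch. The notation here is assumed for illustration and does not come from the caption: X is the contexts-by-genes fitness matrix, D the dictionary, A the loadings with columns a_i, L_c and L_g are graph Laplacians built over cell contexts and genes, and α, β are regularization weights. This is one common way to write a graph-regularized dictionary learning problem, not necessarily the exact form Webster optimizes.

```latex
\min_{D,\,A}\;
  \lVert X - D A \rVert_F^2
  \;+\; \alpha \,\operatorname{tr}\!\left(D^{\top} L_c\, D\right)
  \;+\; \beta \,\operatorname{tr}\!\left(A\, L_g\, A^{\top}\right)
\quad \text{subject to} \quad
  \lVert a_i \rVert_0 \le t \;\; \text{for every gene } i,
\qquad
X \in \mathbb{R}^{n \times m},\;
D \in \mathbb{R}^{n \times k},\;
A \in \mathbb{R}^{k \times m}.
```

The first term is the total approximation error, the two trace terms penalize representations that break the local structure of cell contexts and of genes, and the constraint limits each gene to at most t nonzero loadings.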

(B) A generative example. Fitness effects corresponding to two distinct biological functions are generated over 25 cell contexts, shown in a heatmap, with a negative score indicating a slowed growth rate.

(C) Top: diagram of gene-to-function relationships. Gene C influences both functions, representing pleiotropy. Bottom: a gene’s contribution to each function is captured in a loading score, shown in a heatmap.

(D) To simulate the fitness effect of knocking out each gene, we scaled the appropriate functional effects with the loading scores defined in (C), adding random noise to represent measurement error.

(E) This results in a synthetic screening dataset of 100 gene perturbations across 25 cell contexts, with the original biological functions implicit in the structure of the data. This matrix is the sole input to Webster.
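The generative steps in (B) through (E) can be sketched in a few lines of NumPy. The dimensions (2 functions, 25 cell contexts, 100 genes) follow the caption. Everything else is an assumption made for illustration: the effect distributions, the noise scale, the even split of genes between the two functions, the contexts-by-genes orientation, and the choice of index 2 for "Gene C".

```python
import numpy as np

rng = np.random.default_rng(0)
n_contexts, n_genes, k = 25, 100, 2

# (B) Two functional fitness effects over 25 cell contexts
# (contexts x functions); negative values mean slowed growth.
# The distribution parameters are assumed.
dictionary = rng.normal(loc=-0.5, scale=1.0, size=(n_contexts, k))

# (C) Gene-to-function loadings (functions x genes).
# Assumed layout: first 50 genes load on Function 1, the rest on Function 2.
loadings = np.zeros((k, n_genes))
loadings[0, :50] = 1.0
loadings[1, 50:] = 1.0
gene_c = 2  # hypothetical index for the pleiotropic "Gene C"
loadings[:, gene_c] = [0.5, 0.5]  # contributes to both functions

# (D) Scale the functional effects by the loadings and add
# random noise to represent measurement error (scale assumed).
noise = rng.normal(scale=0.1, size=(n_contexts, n_genes))

# (E) Synthetic screening dataset: 25 cell contexts x 100 gene perturbations.
gene_effects = dictionary @ loadings + noise
```

In this sketch `gene_effects` plays the role of the matrix that would be passed to Webster; the ground-truth `dictionary` and `loadings` are kept only for comparison with what is recovered.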

(F) From this dataset, Webster was parameterized to infer two functional effects and model each gene effect as a mixture of both functional effects. Webster recovered a dictionary matrix that matched the ground truth defined in (B), and a gene-to-function loadings matrix that matched the ground truth defined in (C).

(G) Webster reconstructed each noisy gene effect measurement as a sparse linear combination of learned functional effects, thereby accommodating pleiotropy. For example, Webster accurately modeled knockout of Gene C as a near-equal mixture of knocking out Function 1 and Function 2 while isolating measurement noise in the model residuals.
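The reconstruction in (G) can be illustrated for a single gene. This is a minimal sketch, not Webster's procedure: it assumes the dictionary is already known and, because both functions are allowed to be active for the gene, uses ordinary least squares in place of a sparse coding step. The dictionary values, the 0.5/0.5 mixture, and the noise scale are all assumed.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical learned dictionary: 25 cell contexts x 2 functional effects.
dictionary = rng.normal(size=(25, 2))

# Simulated noisy measurement for a pleiotropic gene ("Gene C"),
# built as an equal mixture of both functional effects (assumed).
true_loadings = np.array([0.5, 0.5])
noise = rng.normal(scale=0.05, size=25)
gene_c_effect = dictionary @ true_loadings + noise

# Fit the loadings by least squares against the dictionary columns.
est_loadings, *_ = np.linalg.lstsq(dictionary, gene_c_effect, rcond=None)

# Reconstruction is the loading-weighted sum of functional effects;
# whatever the dictionary cannot explain is left in the residual.
reconstruction = dictionary @ est_loadings
residual = gene_c_effect - reconstruction
```

Under these assumptions the estimated loadings land close to the near-equal mixture used to simulate the gene, and the residual carries the part of the measurement attributed to noise.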