2022 Apr 15;23:98. doi: 10.1186/s13059-022-02661-7

Fig. 2.

MAVE-NN quantitative modeling strategy. a Structure of latent phenotype models. A deterministic G-P map f(x) maps each sequence x to a latent phenotype ϕ, after which a probabilistic measurement process p(y|ϕ) generates a corresponding measurement y. b Example of an MPA measurement process inferred from the sort-seq MPRA data of Kinney et al. [12]. MPA measurement processes are used when y values are discrete. c Structure of a GE regression model, which is used when y is continuous. A GE measurement process assumes that the mode of p(y|ϕ), called the prediction ŷ, is given by a nonlinear function g(ϕ), and that the scatter about this mode is described by a noise model p(y|ŷ). d Example of a GE measurement process inferred from the DMS data of Olson et al. [8]. Shown are the nonlinearity, the 68% PI, and the 95% PI. e Information-theoretic quantities used to assess model performance. Intrinsic information, I_int, is the mutual information between sequences x and measurements y. Predictive information, I_pre, is the mutual information between measurements y and the latent phenotype values ϕ assigned by a G-P map. Variational information, I_var, is a linear transformation of the log likelihood of a full latent phenotype model. The model performance inequality, I_int ≥ I_pre ≥ I_var, always holds on test data (modulo finite data uncertainties), with I_int = I_pre when the G-P map is correct, and I_pre = I_var when the measurement process correctly describes the distribution of y conditioned on ϕ. G-P: genotype-phenotype; MPA: measurement process agnostic; MPRA: massively parallel reporter assay; GE: global epistasis; DMS: deep mutational scanning; PI: prediction interval
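The two-stage structure in panels a, c, and d can be illustrated with a small simulation. The sketch below is not the MAVE-NN API and does not reproduce any model from the paper. It assumes an additive G-P map, a tanh nonlinearity for g(ϕ), and Gaussian noise with fixed width for p(y|ŷ); the names `gp_map`, `nonlinearity`, `measure`, and `prediction_interval`, the DNA alphabet, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed alphabet; the caption does not fix one


def one_hot(seq):
    """Encode a sequence as an (L, 4) one-hot matrix."""
    x = np.zeros((len(seq), len(ALPHABET)))
    for pos, ch in enumerate(seq):
        x[pos, ALPHABET.index(ch)] = 1.0
    return x


def gp_map(seq, theta0, theta):
    """Deterministic G-P map f(x). Assumed additive: phi = theta0 + sum_l theta[l, x_l]."""
    return theta0 + float(np.sum(theta * one_hot(seq)))


def nonlinearity(phi, a=0.0, b=1.0):
    """Nonlinear function g(phi) giving the prediction y_hat. tanh is an assumption."""
    return a + b * np.tanh(phi)


def measure(phi, sigma, rng, size=None):
    """Draw y from p(y|phi): Gaussian scatter (assumed) about the mode y_hat = g(phi)."""
    return nonlinearity(phi) + sigma * rng.normal(size=size)


def prediction_interval(phi, sigma, z):
    """Central interval y_hat +/- z*sigma under the assumed Gaussian noise model."""
    y_hat = nonlinearity(phi)
    return y_hat - z * sigma, y_hat + z * sigma


rng = np.random.default_rng(0)
seq = "ACGTAC"
theta0 = 0.1
theta = rng.normal(scale=0.5, size=(len(seq), len(ALPHABET)))  # hypothetical parameters
sigma = 0.2

phi = gp_map(seq, theta0, theta)                  # latent phenotype
y_hat = nonlinearity(phi)                         # prediction (mode of p(y|phi))
pi68 = prediction_interval(phi, sigma, z=1.0)     # roughly 68% for a Gaussian
pi95 = prediction_interval(phi, sigma, z=1.96)    # roughly 95% for a Gaussian
samples = measure(phi, sigma, rng, size=20000)    # simulated measurements y
coverage68 = np.mean((samples >= pi68[0]) & (samples <= pi68[1]))
coverage95 = np.mean((samples >= pi95[0]) & (samples <= pi95[1]))
```

Because the G-P map is deterministic, all measurement noise enters through the second stage, so `coverage68` and `coverage95` should land near 0.68 and 0.95 when the assumed noise model matches the simulated one. A noise model that is skewed or varies with ŷ would need a different `measure` and `prediction_interval`.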