Abstract
This tutorial presents a mathematical theory that relates the probability of sample frequencies of M phenotypes in an isogenic population of N cells to the probability distribution of the sample mean of a quantitative biomarker, when N is very large. An analogy with the statistical mechanics of the canonical ensemble is discussed.
Keywords: large deviation principle, chemical kinetics, Boltzmann’s law, variational Bayesian method, maximum entropy principle
INTRODUCTION
Statistical analysis of data and stochastic modeling of mechanisms are two very different but complementary approaches in biological research. While the former obtains a quantitative representation of high-throughput measurements [1], the latter can provide “laws of nature” through limit theorems [2], widely called emergent phenomena. A case in point is the theory of phase transitions [3], which shows that a nonlinear stochastic dynamical system with bistability and a cusp catastrophe, in the limit of time t→∞ followed by system size V→∞, necessarily exhibits a discontinuous transition [4]. Another example is the recent work [5], which demonstrates that Gibbsian equilibrium chemical thermodynamics can be reformulated as a limit theorem for a mesoscopic chemical kinetic system, with N species and M reversible stochastic elementary reactions, as the system’s size becomes macroscopic.
With the rise of single-cell biology, one is naturally interested in the limiting behavior of the phenotypic frequencies among a population of cells, usually based on one or several biomarkers. For this problem there is a very powerful mathematical result that is widely known to probabilists and statistical physicists. In this tutorial, we give an introduction to this theory and discuss its broader implications.
CHARACTERIZING HETEROGENEITY IN SINGLE CELLS
Asymptotic probability distribution for sample frequencies of cellular phenotypes
To study phenotypic heterogeneity, let a population of N isogenic cells be independent and identically distributed (i.i.d.) realizations of random events from a set $\{1, 2, \dots, M\}$: there are M possible phenotypes in total. Among the N cells, let $n_k$ denote the random number of cells in the kth state: $n_1 + n_2 + \cdots + n_M = N$. By phenotypic frequency, we mean $x_k = n_k/N$.
Let $p_k$ denote the probability of a cell being in the kth state. Then the probability of the observed frequencies being $x = (x_1, \dots, x_M)$ follows a multinomial distribution
$$\Pr\{x_1, \dots, x_M\} = \frac{N!}{n_1!\, n_2! \cdots n_M!}\; p_1^{n_1} p_2^{n_2} \cdots p_M^{n_M}, \qquad n_k = N x_k. \tag{1}$$
Since usually N is very large in a high-throughput single-cell experiment, one can safely approximate Eq. (1) using Stirling’s formula and obtain:
$$\Pr\{x_1, \dots, x_M\} \simeq \exp\left\{-N \sum_{k=1}^{M} x_k \ln\frac{x_k}{p_k}\right\}. \tag{2}$$
Therefore, one has the asymptotic limit
$$\varphi(x) \equiv -\lim_{N\to\infty} \frac{1}{N} \ln \Pr\{x_1, \dots, x_M\} = \sum_{k=1}^{M} x_k \ln\frac{x_k}{p_k}. \tag{3}$$
In the theory of large deviations of probability, this is known as Sanov’s theorem [6]. Since $\varphi(x) > 0$ except when $x = p$, in the limit of N→∞ the probability of $x \neq p$ is zero and the probability of $x = p$ is one. The frequency thus yields the probability for an infinitely large number of i.i.d. samples. Furthermore, Eq. (2) shows that $x_k = p_k$ are the most probable sample frequencies for a finite but large N.
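The convergence stated by Sanov’s theorem can be checked numerically. The following is a minimal Python sketch, not part of the original derivation: it compares the exact multinomial log-probability, divided by −N, with the relative entropy $\sum_k x_k \ln(x_k/p_k)$. The function names and the values of `p` and `x` are hypothetical and chosen only for illustration.

```python
import math

def rate_function(x, p):
    """Relative entropy sum_k x_k ln(x_k / p_k), with the convention 0 ln 0 = 0."""
    return sum(xk * math.log(xk / pk) for xk, pk in zip(x, p) if xk > 0)

def log_multinomial_pmf(counts, p):
    """Exact log-probability of the counts (n_1, ..., n_M) under a multinomial law."""
    n_total = sum(counts)
    log_coeff = math.lgamma(n_total + 1) - sum(math.lgamma(n + 1) for n in counts)
    return log_coeff + sum(n * math.log(pk) for n, pk in zip(counts, p) if n > 0)

p = [0.5, 0.3, 0.2]   # hypothetical phenotype probabilities
x = [0.2, 0.3, 0.5]   # hypothetical observed frequencies

phi = rate_function(x, p)
errors = []
for N in (10, 100, 1000, 10000):
    counts = [round(N * xk) for xk in x]          # n_k = N x_k
    estimate = -log_multinomial_pmf(counts, p) / N
    errors.append(abs(estimate - phi))
    print(N, estimate, phi)
```

The gap between the two columns shrinks roughly like (ln N)/N, which is the size of the terms dropped when Stirling’s formula is applied.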
Asymptotic distribution for the mean value of a biomarker
Eqs. (1) and (2) give the probability for the frequencies with which the N cells are distributed among the M phenotypic states. We now consider a specific biomarker g, which is assumed to be a well-defined real-valued function of the phenotype of a cell: $g = g_k$ when a cell is in the kth state.
Clearly, if one knows the frequencies $x = (x_1, \dots, x_M)$, then the mean value of g over the entire population of N cells is determined:
$$\bar{g} = \frac{1}{N} \sum_{k=1}^{M} n_k g_k = \sum_{k=1}^{M} x_k g_k. \tag{4}$$
Since the frequencies $x_k$ are random, so is $\bar{g}$. When N→∞, one expects $\bar{g}$ to approach the expected value $\sum_k p_k g_k$. This is easy to show:
$$\lim_{N\to\infty} \bar{g} = \sum_{k=1}^{M} g_k \lim_{N\to\infty} x_k = \sum_{k=1}^{M} g_k p_k. \tag{5}$$
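What is the probability distribution for the sample mean $\bar{g}$ of the biomarker when N is very large but not infinite? One can calculate this:
$$\Pr\{\bar{g} = y\} = \sum_{x:\, \sum_k g_k x_k = y} \Pr\{x_1, \dots, x_M\} \simeq \sum_{x:\, \sum_k g_k x_k = y} e^{-N\varphi(x)} \simeq \exp\Big\{-N \min_{x:\, \sum_k g_k x_k = y} \varphi(x)\Big\}. \tag{6}$$
We obtain Eq. (6) because, among the many sets of x that give the same value y, each has a probability of $e^{-N\varphi(x)}$. Therefore, as N→∞, only the set with the smallest $\varphi(x)$ matters. Eq. (6) indicates that for very large N, the probability distribution for the mean value of the biomarker has the form $e^{-N\psi(y)}$, in which
$$\psi(y) = \min_{x:\, \sum_k g_k x_k = y} \varphi(x). \tag{7}$$
In the theory of large deviations of probability, this result is known as the contraction principle [6]. $\psi(y)$ and $\varphi(x)$ are called level-1 and level-2 large deviations rate functions, respectively.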
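The contraction principle can be illustrated by brute force when M = 3. With the normalization and the prescribed mean value y imposed, only one frequency remains free, so the minimum of the relative entropy over the admissible frequencies can be found by a simple scan. The sketch below is an illustration under these assumptions, not the authors’ method; the names `contracted_rate`, `g`, and `p` and the numerical values are hypothetical.

```python
import math

def rate_function(x, p):
    """Relative entropy sum_k x_k ln(x_k / p_k), with the convention 0 ln 0 = 0."""
    return sum(xk * math.log(xk / pk) for xk, pk in zip(x, p) if xk > 0)

def contracted_rate(y, g, p, grid=20001):
    """Scan three-state frequencies with sum x = 1 and sum g x = y; return the minimum."""
    g1, g2, g3 = g
    best_value, best_x = math.inf, None
    for i in range(grid):
        x3 = i / (grid - 1)                                   # the one free frequency
        x2 = (y - g3 * x3 - g1 * (1.0 - x3)) / (g2 - g1)      # fixed by the mean constraint
        x1 = 1.0 - x3 - x2                                    # fixed by normalization
        x = (x1, x2, x3)
        if min(x) < 0.0:
            continue                                          # outside the simplex
        value = rate_function(x, p)
        if value < best_value:
            best_value, best_x = value, x
    return best_value, best_x

g = [0.0, 1.0, 3.0]   # hypothetical biomarker values of the three states
p = [0.5, 0.3, 0.2]   # hypothetical phenotype probabilities (expected value 0.9)

psi_at_mean, _ = contracted_rate(0.9, g, p)
psi_at_2, x_star = contracted_rate(2.0, g, p)
print(psi_at_mean, psi_at_2, x_star)
```

At the expected value the minimum is numerically zero, while any other y gives a strictly positive rate; the scan is only practical for very small M, which is why a closed-form treatment of the minimization is useful.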
From phenotypic frequencies to biomarker mean values
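The minimization on the right-hand side of Eq. (7) can be carried out explicitly; it is a problem of constrained minimization using multivariate calculus:
$$\psi(y) = \min_{x} \sum_{k=1}^{M} x_k \ln\frac{x_k}{p_k}, \quad \text{subject to } \sum_{k=1}^{M} x_k = 1 \text{ and } \sum_{k=1}^{M} g_k x_k = y. \tag{8}$$
Introducing Lagrange multipliers β and λ for the two constraints in Eq. (8),
$$\Lambda(x, \beta, \lambda) = \sum_{k=1}^{M} x_k \ln\frac{x_k}{p_k} + \beta\left(\sum_{k=1}^{M} g_k x_k - y\right) + \lambda\left(\sum_{k=1}^{M} x_k - 1\right). \tag{9}$$
Then we can find $x^*$, β*, and λ* as the solution of
$$\frac{\partial \Lambda}{\partial x_k} = \ln\frac{x_k}{p_k} + 1 + \beta g_k + \lambda = 0, \qquad \frac{\partial \Lambda}{\partial \beta} = \sum_{k=1}^{M} g_k x_k - y = 0, \qquad \frac{\partial \Lambda}{\partial \lambda} = \sum_{k=1}^{M} x_k - 1 = 0. \tag{10}$$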
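That is,
$$x_k^* = \frac{p_k e^{-\beta^* g_k}}{\sum_{j=1}^{M} p_j e^{-\beta^* g_j}}, \tag{11a}$$
$$y = \frac{\sum_{k=1}^{M} g_k p_k e^{-\beta^* g_k}}{\sum_{k=1}^{M} p_k e^{-\beta^* g_k}}, \tag{11b}$$
in which β* is a function of y through Eq. (11b), which gives the function $\beta^*(y)$ implicitly. We therefore obtain
$$\psi(y) = \sum_{k=1}^{M} x_k^* \ln\frac{x_k^*}{p_k} = -\beta^*(y)\, y - \ln Z(\beta^*(y)) = \beta^*(y)\big[F(\beta^*(y)) - y\big], \tag{12}$$
where
$$Z(\beta) = \sum_{k=1}^{M} p_k e^{-\beta g_k}, \qquad F(\beta) = -\beta^{-1} \ln Z(\beta), \tag{13}$$
and $\beta^*(y)$ solves $y = -\mathrm{d}\ln Z(\beta)/\mathrm{d}\beta$.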
The above computation tells us that if one knows the values of a biomarker for all M states of a cell, $\{g_1, \dots, g_M\}$, together with prior knowledge of $\{p_k\}$, one should construct the function Z(β) and calculate F(β) as given in Eq. (13). Then the probability distribution for the mean value of the biomarker is
$$\Pr\{\bar{g} = y\} \simeq e^{-N\psi(y)} = \exp\big\{-N \beta^*(y)\big[F(\beta^*(y)) - y\big]\big\}. \tag{14}$$
It also tells us that if one observes the mean biomarker value to be y, then the most probable phenotypic frequencies have a posterior form that deviates from the prior $\{p_k\}$:
$$x_k^* = \frac{p_k e^{-\beta^*(y) g_k}}{Z(\beta^*(y))}. \tag{15}$$
Both Eqs. (14) and (15) suggest that the functional relationship $y = -\mathrm{d}\ln Z(\beta)/\mathrm{d}\beta$ between the mean value of the biomarker and the Lagrange multiplier β, or its inverse form β = β*(y), is fundamental to the probabilistic problem in the limit of infinite sample size N→∞.
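The recipe described above can be sketched in a few lines of Python. This is a hedged illustration rather than a reference implementation: it assumes the exponentially tilted form $x_k^* \propto p_k e^{-\beta g_k}$ with $Z(\beta) = \sum_k p_k e^{-\beta g_k}$, solves for β*(y) by bisection (the tilted mean decreases monotonically in β), and evaluates $\psi(y) = -\beta^* y - \ln Z(\beta^*)$. All function names, the bracket [−50, 50], and the numerical values are hypothetical.

```python
import math

def partition(beta, g, p):
    """Z(beta) = sum_k p_k exp(-beta g_k)."""
    return sum(pk * math.exp(-beta * gk) for gk, pk in zip(g, p))

def tilted(beta, g, p):
    """Most probable frequencies x_k* = p_k exp(-beta g_k) / Z(beta)."""
    z = partition(beta, g, p)
    return [pk * math.exp(-beta * gk) / z for gk, pk in zip(g, p)]

def mean_at(beta, g, p):
    """Mean biomarker value under the tilted frequencies, i.e. -d ln Z / d beta."""
    return sum(xk * gk for xk, gk in zip(tilted(beta, g, p), g))

def solve_beta(y, g, p, lo=-50.0, hi=50.0, tol=1e-12):
    """Bisection for beta*(y); requires min(g) < y < max(g) and a root inside [lo, hi]."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_at(mid, g, p) > y:
            lo = mid          # mean too large: increase beta
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def psi(y, g, p):
    """Level-1 rate function psi(y) = -beta* y - ln Z(beta*)."""
    beta = solve_beta(y, g, p)
    return -beta * y - math.log(partition(beta, g, p))

g = [0.0, 1.0, 3.0]   # hypothetical biomarker values
p = [0.5, 0.3, 0.2]   # hypothetical prior probabilities

beta_star = solve_beta(2.0, g, p)
x_star = tilted(beta_star, g, p)
print(beta_star, x_star, psi(2.0, g, p))
```

For these particular numbers the root can also be found by hand: with $u = e^{-\beta}$ the mean constraint reduces to $u^3 - 1.5u - 5 = 0$, whose real root is u = 2, which gives a convenient check on the bisection.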
BEYOND AN i.i.d. POPULATION
We derived the expression in Eq. (3) based on the assumption that the population of N cells consists of i.i.d. samples of a single M-state random individual with probabilities $\{p_k\}$. When there are cell-cell interactions among the individuals within a population, the mathematics immediately becomes much more involved.
Two types of research go beyond an i.i.d. population in stochastic modeling; they were originally motivated, respectively, by chemical kinetics in solution [7] and the Ising model for ferromagnetism in solids [8,9]. In chemical kinetics, rapid spatial movement of all “individual molecules” in an aqueous solution leads to the assumption that every individual collides with every other individual, and certain “reactions” can occur randomly. The Gibbs function in chemical thermodynamics is precisely like the φ(x) function in Eq. (3), for complex chemical reaction systems in equilibrium [5]. In fact there is a general equation, first discovered in [10], whose solution can provide φ(x) for non-i.i.d. populations. For J = M − 1 reversible unimolecular reactions among M species, $X_1, \dots, X_M$, with concentrations $x = (x_1, \dots, x_M)$ and arbitrary non-negative functions $R_{\pm j}(x)$ being the rates of the jth reaction between species j and species M,
$$X_j \underset{R_{-j}}{\overset{R_{+j}}{\rightleftharpoons}} X_M, \qquad j = 1, 2, \dots, J, \tag{16}$$
the equation reads
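$$\sum_{j=1}^{J} \left\{ R_{+j}(x)\left[\exp\left(\frac{\partial \varphi}{\partial x_M} - \frac{\partial \varphi}{\partial x_j}\right) - 1\right] + R_{-j}(x)\left[\exp\left(\frac{\partial \varphi}{\partial x_j} - \frac{\partial \varphi}{\partial x_M}\right) - 1\right] \right\} = 0. \tag{17}$$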
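If $R_{+j}(x) = q_j x_j$ and $R_{-j}(x) = r_j x_M$, then the solution to Eq. (17) recovers Eq. (3),
$$\varphi(x) = \sum_{k=1}^{M} x_k \ln\frac{x_k}{p_k}, \tag{18}$$
in which the p’s are functions of the q’s and r’s,
$$p_j = \frac{r_j/q_j}{1 + \sum_{l=1}^{J} r_l/q_l} \quad (j = 1, \dots, J), \qquad p_M = \frac{1}{1 + \sum_{l=1}^{J} r_l/q_l}. \tag{19}$$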
This particular set of $R_{\pm j}(x)$ represents chemical reactions in an ideal solution. A reader who has had a course in freshman chemistry might recognize Eq. (18) as
$$G = RT\,\varphi(x) = \sum_{m=1}^{M} x_m \mu_m, \qquad \mu_m = \mu_m^o + RT \ln x_m,$$
where $\mu_m$ is the chemical potential of the mth species with mole fraction $x_m$ (not molarity) in an ideal solution, and $\mu_m^o = -RT \ln p_m$. Then $\mu_i^o - \mu_j^o = -RT \ln (p_i/p_j)$, where $(p_i/p_j)$ is the equilibrium constant between species i and j [11]. Apart from the factor RT, the Gibbs energy function is a consequence of statistical counting, which has very little to do with the energy of the atoms in the molecules [12].
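How the p’s follow from the rate constants can be made concrete with a small numerical sketch. It assumes linear (mass-action) rates for reactions between each species j and the last species M, with forward rate $q_j x_j$ and backward rate $r_j x_M$; this reading of the rates, the function names, and the rate constants below are assumptions of the example. Detailed balance, $q_j p_j = r_j p_M$, then fixes the stationary fractions, and a forward-Euler integration of the deterministic rate equations relaxes to the same values.

```python
def equilibrium_probabilities(q, r):
    """Stationary fractions p_1..p_M for X_j <-> X_M with rates q_j x_j and r_j x_M."""
    ratios = [rj / qj for qj, rj in zip(q, r)]       # p_j / p_M from detailed balance
    p_last = 1.0 / (1.0 + sum(ratios))               # normalization fixes p_M
    return [ratio * p_last for ratio in ratios] + [p_last]

def relax(x0, q, r, dt=1e-3, steps=20000):
    """Forward-Euler integration of dx_j/dt = -q_j x_j + r_j x_M for j < M."""
    x = list(x0)
    for _ in range(steps):
        # net flux from species j to species M; zip stops after the J reactions
        net = [qj * xj - rj * x[-1] for qj, rj, xj in zip(q, r, x)]
        for j, flux in enumerate(net):
            x[j] -= dt * flux
        x[-1] += dt * sum(net)
    return x

q = [1.0, 2.0]   # hypothetical forward rate constants (J = 2, M = 3)
r = [0.5, 3.0]   # hypothetical backward rate constants

p = equilibrium_probabilities(q, r)
x_final = relax([1.0, 0.0, 0.0], q, r)
print(p, x_final)
```

The sketch only checks the stationary point of the mean-field kinetics; it does not simulate the stochastic system or compute the rate function itself.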
In the second type, the Ising model and the like, “individual atoms” are located at fixed lattice points in a solid, and each one interacts only with its neighbours. The N→∞ limit of such an interacting particle system is known as the hydrodynamic limit of the stochastic model.
Cell-cell interactions in a tissue or in a culture medium can be of both types: when an interaction is mediated by rapidly diffusing small molecular factors, one can safely assume the interaction is between every two individual cells in a population. If an interaction between nearby cells is mediated by slowly diffusing molecules, or is due to direct contacts via mechanical interactions, gap junctions, or synapses, then a lattice model is more appropriate. Combining these two types of mathematical descriptions leads to the “reaction diffusion” paradigm [13], which serves as the foundation for describing living phenomena [14].
DISCUSSION
Statistical mechanics and Boltzmann’s law
A reader who has had a course in statistical mechanics [15] will certainly recognize Z(β), F(β), and $\beta^{-1}$ in Eq. (13) as the partition function, the Helmholtz free energy, and the temperature, if one identifies $g_k$ as the energy of the kth state of a mechanical system. Eq. (12) then shows that $F(\beta) = y + \beta^{-1}\psi(y)$, where −ψ(y) should be identified as the “entropy” of the mechanical system with energy y; it is related to F(β) through a Legendre transform. Most textbooks on statistical mechanics, however, do not tell their readers the clear mathematical logic behind these formulae. In fact, Boltzmann’s 1877 paper [16], by counting the molecules with different kinetic energies in an ideal gas, followed exactly the steps we took and derived the celebrated Boltzmann’s law in the form of Eq. (11a).
Variational Bayesian method
The F(β) obtained in Eq. (13) has a very important property: for any arbitrary normalized distribution $\{z_k\}$ and β > 0,
$$\sum_{k=1}^{M} z_k g_k + \beta^{-1} \sum_{k=1}^{M} z_k \ln\frac{z_k}{p_k} \ge F(\beta), \tag{20}$$
with equality if and only if $z_k = p_k e^{-\beta g_k}/Z(\beta)$.
In the variational Bayesian method for inference [17], one often knows a target posterior distribution, but computing its normalization factor is expensive. Eq. (20) shows that to obtain the target distribution, one can simply minimize the left-hand side of Eq. (20) over a set of possible $\{z_k\}$. The same idea was also used by Gibbs in his variational method [18]: the free energy F(β) of an equilibrium state is the minimum among all others reachable through a virtual change of state.
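The variational property can be probed numerically. The sketch below assumes the free-energy functional takes the form $\sum_k z_k g_k + \beta^{-1}\sum_k z_k \ln(z_k/p_k)$ with β > 0, and that $F(\beta) = -\beta^{-1}\ln\sum_k p_k e^{-\beta g_k}$; it then checks that randomly drawn normalized distributions never fall below F(β), and that the Boltzmann-type distribution attains it. Function names and numerical values are hypothetical.

```python
import math
import random

def free_energy(beta, g, p):
    """F(beta) = -(1/beta) ln sum_k p_k exp(-beta g_k)."""
    return -math.log(sum(pk * math.exp(-beta * gk) for gk, pk in zip(g, p))) / beta

def variational_functional(z, beta, g, p):
    """Mean 'energy' plus (1/beta) times the relative entropy of z with respect to p."""
    mean_g = sum(zk * gk for zk, gk in zip(z, g))
    relative_entropy = sum(zk * math.log(zk / pk) for zk, pk in zip(z, p) if zk > 0)
    return mean_g + relative_entropy / beta

def boltzmann(beta, g, p):
    """Normalized distribution z_k proportional to p_k exp(-beta g_k)."""
    weights = [pk * math.exp(-beta * gk) for gk, pk in zip(g, p)]
    total = sum(weights)
    return [w / total for w in weights]

def random_distribution(m, rng):
    """A normalized distribution obtained by rescaling m uniform random numbers."""
    raw = [rng.random() for _ in range(m)]
    total = sum(raw)
    return [value / total for value in raw]

g = [0.0, 1.0, 3.0]   # hypothetical biomarker ("energy") values
p = [0.5, 0.3, 0.2]   # hypothetical prior probabilities
beta = 0.7            # hypothetical positive multiplier

rng = random.Random(0)
F = free_energy(beta, g, p)
gaps = [variational_functional(random_distribution(3, rng), beta, g, p) - F
        for _ in range(1000)]
gap_at_optimum = variational_functional(boltzmann(beta, g, p), beta, g, p) - F
print(min(gaps), gap_at_optimum)
```

In practice one would minimize the functional over a parametrized family of distributions rather than sample at random; random sampling is used here only to illustrate the bound.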
Maximum entropy principle
The constrained optimization in Eq. (8), leading to the distribution in Eq. (11a), has also become the foundation of the maximum entropy principle (MEP) championed by Jaynes [19], which has played a productive role in data science. The axiomatic nature of the MEP [20] and the role of conditional probability [21] have been elucidated.
The fundamental premises behind the large deviations principle (LDP) and the MEP are very different. In the former, entropy, as a large deviations rate function, is used to find the most probable among rare events, which is the only possible event in the limit: for an arbitrary set of n real values $\{\varphi_i\}$,
$$-\lim_{N\to\infty} \frac{1}{N} \ln \sum_{i=1}^{n} e^{-N\varphi_i} = \varphi^*, \tag{21}$$
where $\varphi^* = \min\{\varphi_1, \dots, \varphi_n\}$. This is the same idea as keeping only the term with the largest eigenvalue among the terms in a linear eigenvalue decomposition, in the limit of infinite time or system size. In the MEP, however, the entropy function is used as a measure of “unbiasedness”. In fact, according to the LDP, the x* in Eq. (11a) is not a probability distribution; it is the most probable frequency among N i.i.d. samples. In the MEP, it is interpreted as the least biased probability distribution with maximum uncertainty.
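The dominance of the smallest exponent is easy to see numerically. The sketch below evaluates $-N^{-1}\ln\sum_i e^{-N\varphi_i}$ for increasing N, which is how we read the limit in Eq. (21); the smallest value is factored out before exponentiating so that the sum does not underflow to zero. The function name and the values in `phis` are hypothetical.

```python
import math

def scaled_log_sum(phis, N):
    """Return -(1/N) ln sum_i exp(-N phi_i), computed stably by factoring out min(phis)."""
    smallest = min(phis)
    tail = sum(math.exp(-N * (phi - smallest)) for phi in phis)   # always >= 1
    return smallest - math.log(tail) / N

phis = [0.7, 0.2, 1.5, 0.25]   # hypothetical set of real values
for N in (1, 10, 100, 1000):
    print(N, scaled_log_sum(phis, N))
```

The printed values increase towards the minimum of the set from below, since the sum always contains the dominant term plus non-negative corrections.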
ACKNOWLEDGEMENTS
We thank Ivana Bozic, Ken Dill, Hao Ge, Liu Hong, Matt Lorig and Wenning Wang for many helpful discussions. H. Q. is partially supported by NIH grant R01GM109964 (PI: Sui Huang) and the Olga Jung Wan Endowed Professorship.
COMPLIANCE WITH ETHICS GUIDELINES
This article does not contain any studies with human or animal subjects performed by any of the authors.
The authors Hong Qian and Yu-Chen Cheng declare that they have no conflicts of interest.
REFERENCES
1. Pence CH (2011) “Describing our whole experience”: the statistical philosophies of W. F. R. Weldon and Karl Pearson. Stud. Hist. Philos. Biol. Biomed. Sci., 42, 475–485
2. Chibbaro S, Rondoni L and Vulpiani A (2014) Reductionism, Emergence and Levels of Reality. New York: Springer
3. Anderson PW (1972) More is different. Science, 177, 393–396
4. Qian H, Ao P, Tu Y and Wang J (2016) A framework towards understanding mesoscopic phenomena: Emergent unpredictability, symmetry breaking and dynamics across scales. Chem. Phys. Lett., 665, 153–161
5. Ge H and Qian H (2016) Mesoscopic kinetic basis of macroscopic chemical thermodynamics: A mathematical theory. Phys. Rev. E, 94, 052150
6. Dembo A and Zeitouni O (1998) Large Deviations Techniques and Applications, 2nd ed. New York: Springer
7. Kurtz TG (1972) The relationship between stochastic and deterministic models for chemical reactions. J. Chem. Phys., 57, 2976–2978
8. Liggett TM (1985) Interacting Particle Systems. New York: Springer-Verlag
9. Derrida B (1998) An exactly soluble nonequilibrium system: The asymmetric simple exclusion process. Phys. Rep., 301, 65–83
10. Gang H (1986) Lyapunov function and stationary probability distribution. Zeit. Physik B: Cond. Matt., 65, 103–106
11. Chang R and Goldsby KA (2012) Chemistry, 11th ed. New York: McGraw-Hill
12. Qian H (2019) Stochastic population kinetics and its underlying mathematicothermodynamics. In: The Dynamics of Biological Systems, Bianchi A, Hillen T, Lewis M, Yi Y eds., pp. 149–188. New York: Springer
13. Murray JD (2011) Mathematical Biology II: Spatial Models and Biomedical Applications, 3rd ed. New York: Springer
14. von Bertalanffy L (1950) The theory of open systems in physics and biology. Science, 111, 23–29
15. Huang K (1963) Statistical Mechanics. New York: John Wiley & Sons
16. Sharp K and Matschinsky F (2015) Translation of Ludwig Boltzmann’s paper “On the relationship between the second fundamental theorem of the mechanical theory of heat and probability calculations regarding the conditions for thermal equilibrium”. Entropy (Basel), 17, 1971–2009
17. Ghahramani Z (2001) An introduction to hidden Markov models and Bayesian networks. Int. J. Pattern Recognit. Artif. Intell., 15, 9–42
18. Pauli W (1973) Pauli Lectures on Physics: Thermodynamics and the Kinetic Theory of Gas. Cambridge: The MIT Press
19. Jaynes ET (2003) Probability Theory: The Logic of Science. London: Cambridge University Press
20. Shore JE and Johnson RW (1980) Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Trans. Inf. Theory, 26, 26–37
21. van Campenhout JM and Cover TM (1981) Maximum entropy and conditional probability. IEEE Trans. Inf. Theory, 27, 483–489
