Author manuscript; available in PMC: 2012 Jan 1.
Published in final edited form as: Bull Math Biol. 2010 Mar 3;73(1):116–150. doi: 10.1007/s11538-010-9524-5

Estimation of Cell Proliferation Dynamics Using CFSE Data

HT Banks a,*, Karyn L Sutton a, W Clayton Thompson a, Gennady Bocharov b, Dirk Roose c, Tim Schenkel d, Andreas Meyerhans e
PMCID: PMC2911498  NIHMSID: NIHMS195810  PMID: 20195910

Abstract

Advances in fluorescent labeling of cells as measured by flow cytometry have allowed for quantitative studies of proliferating populations of cells. The investigations (Luzyanina et al. in J. Math. Biol. 54:57–89, 2007; J. Math. Biol., 2009; Theor. Biol. Med. Model. 4:1–26, 2007) contain a mathematical model with fluorescence intensity as a structure variable to describe the evolution in time of proliferating cells labeled by carboxyfluorescein succinimidyl ester (CFSE). Here, this model and several extensions/modifications are discussed. Suggestions for improvements are presented and analyzed with respect to statistical significance for better agreement between model solutions and experimental data. These investigations suggest that the new decay/label loss and time dependent effective proliferation and death rates do indeed provide improved fits of the model to data. Statistical models for the observed variability/noise in the data are discussed with implications for uncertainty quantification. The resulting new cell dynamics model should prove useful in proliferation assay tracking and modeling, with numerous applications in the biomedical sciences.

Keywords: Cell proliferation, CFSE, Label structured population dynamics, Partial differential equations, Inverse problems

1. Introduction

Much progress in the quantification of cell population dynamics has been made in the last several years, allowing these methods to be applied to questions in the life sciences in which proliferation plays a key role. For example, accurate quantification of changes in the rates at which various lymphocytes divide, differentiate and die can be used as a marker for changes in an immune response. Thus, a better understanding of cell proliferation can lead to improved treatment of disease progression (e.g., in cancer, HIV and other viral infections). In the past several decades, proliferation assays have been carried out through incorporation of 5-bromo-2′-deoxyuridine (BrdU) or tritiated thymine deoxyriboside (³H-TdR) (Lyons and Doherty, 2004); both take the place of thymidine in the DNA of dividing cells, and the latter is radioactive. In comparison to these two, carboxyfluorescein succinimidyl ester (CFSE) is more stably and evenly incorporated into cells, is easily detected by flow cytometry, and is nonradioactive (Bonhoeffer et al., 2000; Lyons and Doherty, 2004; Quah et al., 2007). It is not surprising, then, that CFSE has become the de facto staining method for many cell labeling studies (Hawkins et al., 2007; Lyons, 1999; Quah et al., 2007). The development of CFSE-based cytometry assays (Lyons and Parish, 1994) in conjunction with a Fluorescence-Activated Cell Sorter (FACS) allows biologists to measure some of these fundamental properties of a cell population quickly and efficiently. Beyond such direct measurements, there may be distinct benefits to be gained from mathematical modeling approaches. To that end, the development of models of the type discussed in this work can contribute to the quantitative understanding of cellular behavior as represented by FACS data.

CFSE is introduced into a population of cells as (membrane permeable) carboxyfluorescein diacetate succinimidyl ester (CFDA-SE). After CFDA-SE diffuses across the cell membrane, enzyme reactions with cellular esterases cleave the acetate groups, resulting in highly fluorescent and membrane impermeable CFSE. At reasonable concentrations and near neutral pH, the incorporation of CFSE does not in any way adversely affect the function of the cell. As cells divide, the CFSE fluorescence intensity (FI) is split roughly evenly between the two daughter cells. Thus, measurement of FI provides an indirect measure of the number of divisions a cell has undergone. After staining, the cell population is analyzed at regular time intervals via FACS, which returns a histogram of the number of cells as a function of CFSE FI. Typically, CFSE can be used to track up to 8 rounds of division before the CFSE FI is reduced to the autofluorescence level of unstained cells. More information regarding the biological processes and experimental protocol can be found in Hawkins et al. (2007), Lyons (1999), Lyons and Doherty (2004), Lyons and Parish (1994), Matera et al. (2004), Quah et al. (2007).

While the ability to track a population of cells through multiple rounds of division has been greatly improved by CFSE labeling and quantitative methods, there is still room for improvement. In addition to the work of Luzyanina et al. (2007) on which this work is based, many cell proliferation models have described the growth and division dynamics of cells as a function of the number of divisions undergone, which is strongly correlated with CFSE intensity. The Smith–Martin cell cycle model (Smith and Martin, 1973), in which the cell cycle is divided into a stochastic resting G1 phase and a deterministic dividing SG2M phase, provides the basis for many of these papers (Bernard et al., 2003; Chao et al., 2003; de Boer et al., 2006; Ganusov et al., 2005; Hawkins et al., 2007; Le, 2004; Luzyanina et al., 2007). These models vary in structure, drawing on a range of techniques including compartmental modeling, agent-based modeling and probability-based modeling. Among the areas of study to which these approaches have contributed (by no means an exhaustive list) are immunoglobulin class switching in B cells (Hodgkin et al., 1996), cytokine regulation of T cells (Gett and Hodgkin, 2000), and surface molecule expression or internal expression of cytokines (Bird et al., 1998; Gett and Hodgkin, 1998). One notable recent effort is the paper by Lee et al. (2009), in which a generalized Smith–Martin model with a division-dependent death rate is compared to a cyton model (developed in Hawkins et al., 2007). The validated cyton model is then used to study the effects of IL-4, thought to protect against apoptosis, on B cell population dynamics.

While these models are typically strongly biologically motivated, they have some drawbacks when used with flow cytometry data. The CFSE data must first be deconvolved to obtain the number of cells as a function of the number of divisions undergone since the initiation of the assay. This process assumes a distribution of CFSE across cells of the same division number, which may be reasonable in most cases. However, some cell populations, whether normal or abnormal, may not fit these distributions. Even if the assumed distribution is reasonable, cells near the tails of the distribution for a given division number may be miscounted. The alternative modeling approach discussed in this manuscript is less restrictive in that it does not assume any distribution of label uptake. An additional benefit of this approach is that model solutions are compared directly with the histograms obtained by flow cytometry, avoiding the errors in interpretation that deconvolution of such data can introduce.

Building upon the work in Luzyanina et al. (2007, 2009), we seek to update these models (in which age is a discrete variable corresponding to division number) with a hyperbolic partial differential equation (PDE) model for the label-structured population density in which CFSE FI is a continuous state variable. Because FI is determined by cellular events included in the model, it is useful to study cells as a function of FI over time. Divisions of cells appear in model solutions just as in histograms obtained by flow cytometry, but division number need not be specified. Meanwhile, because FI varies continuously, the nonuniform uptake of CFSE (and hence the resulting nonuniform distribution of CFSE in each generation) is preserved.

Previous work (Luzyanina et al., 2007) has already demonstrated the advantages of this type of PDE model as compared to some compartmental modeling efforts. While this model provided a reasonable way to mathematically reproduce experimentally observed proliferation dynamics, there was still room for improvement as peaks/generations of cells were incorrectly predicted by the best fit model solution. In this report, careful consideration is given to the biological and mathematical assumptions of the original model and refinements in both interpretation and parameter/mechanism formulations are proposed. First, the proliferation and death rate functions from Luzyanina et al. (2007) are redefined on a domain which strongly correlates with division number, and the proliferation rate function is changed so that it is time varying. It is shown via a model comparison test that the resulting model provides a statistically significant improved fit to an experimental data set over the previous model. Evidence is then offered to suggest that the death rate for cells should remain division dependent when compared to fitting the data set with a constant rate. Next, improvements to the treatment of the label loss rate are considered. It is found that the best fit to the data is not improved significantly when this function is either affine in form or taken to be a probability distribution of constant label loss rates within the population. For each version of the model, the parameters are estimated in an ordinary least squares inverse problem and the resulting cost functions are tested statistically to quantitatively assess the improvement in fitting to the data. Statistical models for the noise structure in the data are also discussed. It is shown that the noise in the data does not appear to have either constant variance (ordinary least squares formulation for absolute error) or variance proportional to the magnitude of the observation (generalized least squares formulation for relative error). Implications for the corresponding quantification of parameter uncertainty via confidence intervals are discussed.

The investigations in this paper demonstrate that the proposed PDE models can be used successfully in an interpretive framework for cell division dynamics and lay the groundwork for the continued refinement and extension of these models and their application to additional data sets. Specifically, two new features of the current models offer a dramatic improvement in fits to data while retaining biologically supportable model mechanisms. First, we introduce a decay-dependent translation of the intensity coordinate, resulting in a variable more strongly correlated with division number, on which we define new effective proliferation and death rates. Second, extending the effective proliferation rates (which depend implicitly on time) to depend explicitly on time accounts for differences in proliferation rates beyond division dependence, and also yields a statistically significant improvement in model performance. The ability to estimate these effective rates (relative to the new coordinates) from data allows them to be compared between healthy cell populations and populations with abnormal growth and/or proliferation dynamics. Characterizing these differences could improve our detection, identification and understanding of the diseases or conditions responsible for the change in dynamics.

2. Preliminaries

2.1. Data

An original data set (shown in Fig. 1) containing time-series snapshots of the CFSE FI distribution of a population of dividing cells was used (Luzyanina et al., 2007). Briefly, this data set is the result of an in vitro proliferation assay with human peripheral blood mononuclear cells (PBMCs) from healthy blood donors. After isolation of the PBMCs from “buffy coats” by density centrifugation, 5 × 10^6 to 5 × 10^7 cells were stained with 5 μM CFDA-SE (Invitrogen, Germany) in phosphate-buffered saline containing 5% fetal calf serum (FCS). The cells were stimulated with 2.5 μg/mL phytohemagglutinin (Sigma, Germany) at time t = 0 hrs and plated in 24-well plates at 1 × 10^6 cells/mL in RPMI-1640/10% FCS medium. Beginning at day 3, one third of the medium was exchanged with fresh medium every 24 hrs to ensure sustained cell nutrition. To avoid disturbing the proliferating cell populations, cells from a single well were harvested for each time point. Cells were then stained with CD4 antibodies, which makes it possible to distinguish the CD4+ cells from other cells in the PBMC culture while simultaneously measuring CFSE expression in individual cells through FACS. This measurement process is high-speed (thousands of cells in seconds) and provides data for individual cells (Hawkins et al., 2007; Quah et al., 2007).

Fig. 1. Original CFSE histogram data.

The CFSE FI data is reported on a logarithmic scale, $z = \log_{10} x$, where $x$ is the CFSE FI of a given cell. The output of the measurement procedure is the count $c_{ij}$ of cells at time $t_i$ with log-intensity $z_j$. The data set obtained for this report tracks cells from 0 to 120 hours in 24 hour intervals; the discretization of the z-axis into bins changes with each measurement in time ($j = 1, \dots, J(i)$ for each time $t_i$). Data for the entire population density can then be obtained by the transformation

$$n^d_{ij} = c_{ij}\,\frac{T_i}{C_i},$$

where $T_i$ is the total number of cells in the population at time $t_i$ and $C_i$ is the total number of cells counted at time $t_i$. This processing of the data was performed by Luzyanina et al. (2007) so that the data set obtained already consisted of the population density data $n^d_{ij}$.
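For concreteness, the transformation above can be sketched in a few lines of Python with NumPy. The counts and totals below are made-up illustrative numbers, not values from the data set, and the function name `counts_to_density` is hypothetical.

```python
import numpy as np

def counts_to_density(counts, total_cells, counted_cells):
    """Rescale the FACS bin counts c_ij at one time t_i to population-density
    values n^d_ij = c_ij * T_i / C_i."""
    counts = np.asarray(counts, dtype=float)
    return counts * (total_cells / counted_cells)

# Hypothetical values for a single time point t_i (illustration only).
c_i = np.array([100, 300, 400, 200])  # counts c_ij in four log-intensity bins
T_i = 5.0e5                           # assumed total number of cells in the population
C_i = c_i.sum()                       # assumes every counted cell falls into a recorded bin
n_d = counts_to_density(c_i, T_i, C_i)
```

When $C_i$ is the total count over all bins, as assumed here, the rescaled values sum to $T_i$.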

Note in Fig. 1 that an initially unimodal peak becomes multimodal as cohorts of cells begin to divide at different times. Each of these division peaks also slowly drifts to the left as FI is lost over time due to catabolic activity within the cell (Lyons, 1999). The noise in the data is typical for such experiments and is the result of any number of processes, from counting errors to variations in cell shape and size to the functioning of the machine itself (Luzyanina et al., 2007; Wikipedia, 2010).

As alluded to in the introduction and discussed at greater length in Luzyanina et al. (2007), Quah et al. (2007), many approaches to cell division tracking have involved the determination of the number of cells having undergone a certain number of divisions, but this information is not available from the experimental data without some sort of deconvolution technique to separate out the division peaks. These techniques invariably involve the definition of FI borders between subsequent division peaks. However, such techniques do not account for the inherent heterogeneity of the cell population due to variation in initial CFSE uptake, variability across cells in catabolism, proliferation, etc. that may result in cells at a given FI having undergone a different number of divisions. Moreover, this heterogeneity can make subsequent division peaks very difficult to resolve, introducing additional error into these traditional approaches. Hence, an advantage of the current approach is that it eliminates the need for these deconvolution techniques by explicitly accounting for the heterogeneity of the population through the use of CFSE FI as a continuous state variable.

2.2. Mathematical model

The model for the dynamics of life and death processes of a population of cells labeled with CFSE is proposed in Luzyanina et al. (2007) as a variation of a Bell–Anderson (Bell and Anderson, 1967) or Sinko–Streifer (Sinko and Streifer, 1967) population model. The solutions of these models may be directly compared to time snapshots of flow cytometry histograms, which usually depict cells of multiple generations. That is, the models aim to predict the number of cells at a given fluorescence intensity and time under specified dynamics. Let x denote the CFSE FI (in units of intensity, UI) of a cell and let n(t, x) be the label-structured population density (cells/UI) of cells with FI x at time t. Then the population density is governed by a hyperbolic partial differential equation (PDE)

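$$\frac{\partial n}{\partial t}(t,x) + \frac{\partial}{\partial x}\big[\nu(x)\,n(t,x)\big] = -\big(\alpha(x)+\beta(x)\big)\,n(t,x) + \chi_{[x_{\min},\,x_{\max}/\gamma]}(x)\,2\gamma\,\alpha(\gamma x)\,n(t,\gamma x), \qquad (1)$$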

where ν(x) is the label loss rate, α(x) is the cell proliferation rate, β(x) is the cell death rate, x ∈ [xmin, xmax] and t > 0. Because cells naturally lose FI over time even in the absence of division (due to catabolic activity (Lyons, 1999)), the term ν(x) represents the natural label loss rate (UI/hr). The parameter γ is the label dilution factor, representing the ratio of FI of a mother cell to FI of a daughter cell. Division is coupled with immediate rapid growth of the new daughter cells. The observed or estimated value of γ reflects underlying dynamics (involving mechanisms regulating the growth and division) which occur on a faster time scale and have effectively been integrated over in time. A derivation of this model following the mass conservation principles of Sinko–Streifer (Sinko and Streifer, 1967) or Bell–Anderson (Bell and Anderson, 1967) models is presented in the Appendix.

Because FACS returns data on a logarithmic scale, it is convenient to make the change of variables $z = \log_{10} x$. Assuming the natural label loss is proportional to the amount of label, we take $\nu(x) = -cx$, a form which has been seen (Luzyanina et al., 2007) to better fit this data set than constant label loss assumptions. The resulting model, along with the change of variable $\tilde n(t,z) = n(t, 10^z)$, gives

$$\tilde n_t(t,z) + \big[\tilde\nu(z)\,\tilde n(t,z)\big]_z = -\big(\tilde\alpha(z)+\tilde\beta(z)\big)\,\tilde n(t,z) + \chi_{[z_{\min},\,z_{\max}-\log_{10}\gamma]}(z)\,2\gamma\,\tilde\alpha(z+\log_{10}\gamma)\,\tilde n(t,z+\log_{10}\gamma), \qquad (2)$$

where $\tilde\nu(z) = -\tilde c = -c/\ln 10$, and $\tilde\alpha$, $\tilde\beta$ are appropriately defined cell proliferation and death rates, respectively. The initial and boundary conditions for the model are

$$\tilde n(0,z) = \tilde n_0(z), \qquad \tilde n(t,z_{\max}) = 0.$$
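To make the structure of (2) concrete, here is a minimal explicit finite-difference sketch in Python. It is a first-order upwind scheme with forward-Euler time stepping, chosen for simplicity; it is not the Lax–Wendroff method with nonlinear filter used for the computations in this report. It assumes a uniform z-grid, a constant label-loss velocity $\tilde\nu = -\tilde c < 0$ (so label flows toward smaller z and $z_{\max}$ is the inflow boundary), and rate functions passed as Python callables. The function name `upwind_step` and all numerical values are hypothetical.

```python
import numpy as np

def upwind_step(n, z, dt, c_tilde, gamma, alpha, beta):
    """Advance a discretization of model (2) by one explicit time step.

    n: density values on the uniform grid z; c_tilde > 0 gives nu(z) = -c_tilde.
    alpha, beta: callables returning proliferation/death rates (1/hr) at z.
    """
    dz = z[1] - z[0]
    courant = c_tilde * dt / dz
    if courant > 1.0:
        raise ValueError("CFL condition violated; reduce dt")
    shift = np.log10(gamma)
    # Transport n_t - c_tilde * n_z = 0: forward (upwind) difference, because
    # characteristics move toward smaller z.
    transport = np.zeros_like(n)
    transport[:-1] = courant * (n[1:] - n[:-1])
    # Cells leave their current FI by dividing or dying.
    loss = -(alpha(z) + beta(z)) * n
    # Daughters appear at z from mothers at z + log10(gamma); the indicator
    # restricts this gain to z in [z_min, z_max - log10(gamma)].
    mothers = np.interp(z + shift, z, n)
    chi = (z <= z[-1] - shift).astype(float)
    gain = chi * 2.0 * gamma * alpha(z + shift) * mothers
    n_new = n + transport + dt * (loss + gain)
    n_new[-1] = 0.0  # boundary condition n(t, z_max) = 0
    return n_new

# Usage with hypothetical constant rates and a Gaussian initial profile.
def alpha_const(zz):
    return np.full_like(zz, 0.02)

def beta_const(zz):
    return np.full_like(zz, 0.005)

z = np.linspace(0.0, 3.5, 501)
n = np.exp(-((z - 2.8) / 0.1) ** 2)
n[-1] = 0.0
for _ in range(240):  # 24 hr with dt = 0.1 hr
    n = upwind_step(n, z, 0.1, c_tilde=0.004, gamma=1.6,
                    alpha=alpha_const, beta=beta_const)
```

With the Courant number $\tilde c\,\Delta t/\Delta z \le 1$, the transport update is a convex combination of neighboring values, so the sketch keeps the density nonnegative.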

In our subsequent discussions, we will drop the tildes on the parameters α, β, c, ν, and the states n in (2) and take this equation as our fundamental model to be investigated and modified. From the structure of the above model, we deduce that some key tacit assumptions are:

  (i) Division numbers are strongly correlated with FI.

  (ii) FI is proportional to total CFSE content (amount).

  (iii) Total CFSE is divided equally among daughter cells with each division.

  (iv) The rate of label loss ν(z), the proliferation rate α(z), and the death rate β(z) do not depend on time.

Assumption (ii) appears to be tacitly made in the discussions of CFSE content or amount, CFSE FI and the definition of the parameter γ on p. 4 of Luzyanina et al. (2007); here we understand γ as being defined by the mother/daughter ratio of CFSE FI. Assumptions (i)–(iii) would imply that the state variable z is strongly correlated with, although not exactly equal to, division number. Hence, assumption (iv) would then be equivalent to stating that birth, death, and label loss rates, which depend largely on division number, can be determined as functions of label intensity z. (It will later be shown, however, that some of these assumptions can be modified in their interpretation and implementation to obtain model extensions that produce significantly improved agreement with the data.) Given the formulation (2), the goal is to use the CFSE data described in Section 2.1 to estimate the functions α(z), β(z), and ν(z) as well as the parameter γ. Following now-standard inverse problem procedures (Banks and Kunisch, 1989; Banks et al., 1996), these functions are parameterized by finite-dimensional approximations so that the estimation is computationally tractable (and theoretically sound; see the convergence arguments for such inverse problem approximations in Banks and Kunisch (1989), Banks and Pedersen (2009) and Banks et al. (1996)). Both α(z) and β(z) are approximated by linear splines

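$$\alpha(z) = \sum_{i=1}^{M_\alpha} a_i\,\varphi_i(z), \qquad \beta(z) = \sum_{i=1}^{M_\beta} b_i\,\varphi_i(z),$$

where the $\varphi_i(z)$ are piecewise linear spline (hat) functions associated with the spline nodes $z_j$ and satisfying

$$\varphi_i(z_j) = \begin{cases} 1, & i = j,\\ 0, & i \neq j. \end{cases}$$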
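Because a sum of hat functions weighted by nodal coefficients is exactly the piecewise-linear interpolant through the points $(z_i, a_i)$ (or $(z_i, b_i)$), these rate functions can be evaluated with `numpy.interp`. In this Python sketch the node locations and coefficients are hypothetical, and holding the end values constant outside the outermost nodes is an assumption (it is what `numpy.interp` does by default).

```python
import numpy as np

def linear_spline(z, nodes, coeffs):
    """Evaluate sum_i coeffs[i] * phi_i(z) for piecewise-linear hat functions
    phi_i with phi_i(nodes[j]) = 1 if i == j else 0."""
    return np.interp(z, nodes, coeffs)

def hat(z, nodes, i):
    """The i-th basis function phi_i, as the interpolant of a unit vector."""
    unit = np.zeros(len(nodes))
    unit[i] = 1.0
    return np.interp(z, nodes, unit)

# Hypothetical death-rate nodes and coefficients b_i (1/hr).
nodes = np.array([2.5, 2.8, 3.1, 3.4])
b = np.array([0.010, 0.020, 0.015, 0.005])
z = np.linspace(0.0, 3.5, 8)
beta_vals = linear_spline(z, nodes, b)
```

Between nodes the hat functions sum to one, so a spline with all coefficients equal is a constant function.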

Note that $a_i = 0$ or $b_i = 0$ indicates zero birth or death, respectively, while $a_i = 1$ or $b_i = 1$ indicates that, on average, cells at the given FI divide or die once per hour, which is clearly greater than the rate at which these events actually occur. Thus, the parameters $a_i$ and $b_i$ are constrained to lie between 0 and 1. As noted above, the rate of label loss is assumed to be proportional to label intensity ($\nu(x) = -\bar c x$, where $\bar c \in \mathbb{R}^+$). Thus, after the change of variables, $\nu(z) = -c = -\bar c/\ln 10$. Simulations of the model (2) indicate that $0.0025 \le c \le 0.0055$ is a reasonable range of possible values for the label loss rate. Similarly, γ is reasonably constrained to $\gamma \in [1, 2]$. For a biological interpretation of this range, see Section 3.2.

The set of parameters to be estimated in the inverse problem is given in Table 1. Given a set of these parameters, the forward problem is solved numerically over a specified time interval using a publicly available vectorized version of the Lax–Wendroff method with a nonlinear filter, developed by Shampine (2005) for solving hyperbolic PDEs in MATLAB. Because the nonuniform z-grid of the data points differs with each time $t_i$, the solution was computed on a grid of 500 evenly spaced points and then interpolated onto the data grid at each time point using linear interpolation with MATLAB’s interp1 routine. Throughout this report, it is assumed that $[z_{\min}, z_{\max}] = [0, 3.5]$.
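The interpolation step can be sketched in Python, with `numpy.interp` standing in for MATLAB's interp1 (linear method). One difference is worth noting: interp1 returns NaN outside the sample range by default, whereas `numpy.interp` silently clamps to the end values, so this sketch checks the range explicitly. The bin locations, the placeholder solution and the name `to_data_grid` are hypothetical.

```python
import numpy as np

def to_data_grid(z_model, n_model, z_data):
    """Linearly interpolate a model solution computed on a uniform grid onto
    the nonuniform log-intensity bins z_j of one measurement time."""
    z_data = np.asarray(z_data, dtype=float)
    if z_data.min() < z_model[0] or z_data.max() > z_model[-1]:
        raise ValueError("data grid extends beyond the model grid")
    return np.interp(z_data, z_model, n_model)

z_model = np.linspace(0.0, 3.5, 500)              # 500 evenly spaced points on [0, 3.5]
n_model = np.sin(z_model) ** 2                    # placeholder, not a PDE solution
z_data = np.array([0.12, 0.9, 1.7, 2.33, 3.41])   # hypothetical bin locations
n_on_data = to_data_grid(z_model, n_model, z_data)
```

Linear interpolation reproduces linear functions exactly, which gives a quick sanity check of the mapping.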

Table 1.

Summary of parameters to be estimated, with minimum values, maximum values, and units.

Parameter   Minimum   Maximum   Units
a_i         0         1         hr⁻¹
b_i         0         1         hr⁻¹
c           0.0025    0.0055    UI/hr
γ           1         2         [none]

The $a_i$ and $b_i$ are coefficients of the effective proliferation and death functions, respectively. The parameter c scales the label loss as a function of fluorescence intensity, and γ is the observed ratio of mother to daughter cell CFSE FI.

After a brief discussion of the theory and implementation of the inverse problem in Section 2.3, computational results which attempt to fit the above model to the data are presented in Section 3. Then in an attempt to improve the fit of the model to the data, a transformed intensity variable depending on time and label loss is introduced into the model so that new effective proliferation and death rate functions are defined on a domain which is strongly correlated with division number. Next, the effective proliferation rate function (depending implicitly on time) will also be allowed to vary explicitly in time. Finally, further considerations regarding the use of a constant death rate and alternative parameterizations of the label loss rate are discussed. In each case, biological and physical justifications for the changes are given.

2.3. Mathematical and statistical aspects of the inverse problem

2.3.1. Ordinary least squares (OLS)

Given the mathematical model (2) for the label-structured population density n(t, z; θ⃗) at time t, log intensity z and parameters θ⃗ (see Table 1), the CFSE time-series histogram data constitutes a direct observation of the physical process. A statistical model for the observation process is given by

$$N^d_{ij} = n(t_i, z_j; \vec\theta_0) + \mathcal{E}_{ij}, \qquad (3)$$

where $n(t_i, z_j; \vec\theta_0)$ is the solution to the model (2) at time $t_i \in \{t_1, \dots, t_I\}$ and log FI $z_j \in \{z_1, \dots, z_{J(i)}\}$ given the assumed true parameters $\vec\theta_0 \in \mathbb{R}^p$ that generate the observations $N^d_{ij}$. The statistical model (3) relies on the assumption that the random errors in the data do not depend on the magnitude of the observations themselves. The errors $\mathcal{E}_{ij}$ are independent identically distributed (i.i.d.) random variables with mean zero that represent the measurement error. Hence, the $N^d_{ij}$ are random variables as well, and the given data represent one realization of these random variables, i.e.,

$$n^d_{ij} = n(t_i, z_j; \vec\theta_0) + \epsilon_{ij}.$$

In the absence of any information regarding the distribution of $\mathcal{E}_{ij}$, it is assumed only that the variance $\operatorname{var}(\mathcal{E}_{ij}) = \sigma_0^2$ does not depend on $t$ or $z$ (because the $\mathcal{E}_{ij}$ are i.i.d.). The ordinary least squares (OLS) estimator is defined as

$$\vec\theta_{OLS} = \arg\min_{\vec\theta \in \Theta} \sum_{i=1}^{I}\sum_{j=1}^{J(i)} \big(N^d_{ij} - n(t_i, z_j; \vec\theta)\big)^2, \qquad (4)$$

where Θ is a set of admissible parameters for the model. Given the data as a realization of this random variable, the OLS estimate is

$$\hat{\vec\theta}_{OLS} = \arg\min_{\vec\theta \in \Theta} \sum_{i=1}^{I}\sum_{j=1}^{J(i)} \big(n^d_{ij} - n(t_i, z_j; \vec\theta)\big)^2 = \arg\min_{\vec\theta \in \Theta} F_{OLS}(\vec\theta). \qquad (5)$$

The MATLAB constrained minimization routine fmincon, a gradient-based (local) method for problems in which the objective and constraint functions are continuous and have continuous first derivatives, was used to compute the OLS estimate $\hat{\vec\theta}_{OLS}$.

The true variance $\sigma_0^2$ of the random variables $\mathcal{E}_{ij}$ is given by

$$\sigma_0^2 = \frac{1}{N}\,E\Big[\sum_{i=1}^{I}\sum_{j=1}^{J(i)} \big(N^d_{ij} - n(t_i, z_j; \vec\theta_0)\big)^2\Big],$$

where $N = \sum_{i=1}^{I} J(i)$. The bias-corrected estimate of the variance with observations $\{n^d_{ij}\}$ is

$$\hat\sigma^2_{OLS} = \frac{1}{N-p}\,F_{OLS}(\hat{\vec\theta}_{OLS}).$$
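The following Python sketch illustrates the bounded OLS estimate (5) and the bias-corrected variance estimate on synthetic data. Several assumptions are made: a two-parameter exponential model stands in for the PDE solution, SciPy's `least_squares` (which accepts box bounds) replaces fmincon, and the data are simulated with constant-variance noise. Note that `least_squares` minimizes one half of the sum of squared residuals, which has the same minimizer as $F_{OLS}$.

```python
import numpy as np
from scipy.optimize import least_squares

def model(t, theta):
    # Hypothetical stand-in for n(t_i, z_j; theta); the real model is the PDE (2).
    return theta[0] * np.exp(theta[1] * t)

def ols_fit(t, data, theta0, lower, upper):
    """Bounded OLS estimate and bias-corrected variance F_OLS / (N - p)."""
    res = least_squares(lambda th: data - model(t, th), theta0,
                        bounds=(lower, upper))
    theta_hat = res.x
    F_ols = np.sum((data - model(t, theta_hat)) ** 2)
    N, p = data.size, theta_hat.size
    return theta_hat, F_ols / (N - p)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 5.0, 40)
# Synthetic observations with constant-variance (absolute) error, as in (3).
data = model(t, [2.0, 0.3]) + rng.normal(0.0, 0.05, t.size)
theta_hat, sigma2_hat = ols_fit(t, data, [1.0, 0.1], [0.0, 0.0], [10.0, 1.0])
```

The estimate `sigma2_hat` should be close to the simulated noise variance $0.05^2 = 0.0025$.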

One can also use an asymptotic theory involving sensitivity matrices for the corresponding covariance matrix Σ(θ̂OLS) associated with the estimated parameters. This can be used to compute standard errors and confidence intervals (see the detailed discussions in Banks et al., 2009; Banks and Tran, 2009; Seber and Wild, 2003).

Rewriting Eq. (3), we have $\mathcal{E}_{ij} = N^d_{ij} - n(t_i, z_j; \vec\theta_0)$. Thus, the residuals

$$r_{ij} = n^d_{ij} - n(t_i, z_j; \hat{\vec\theta}_{OLS}) \qquad (6)$$

are a realization of the error in the data and, given a mathematical model for a set of data with constant variance (CV), these residuals should be randomly distributed (Fig. 2, left) when plotted against the model values, n(ti, zj ; θ̂OLS). In many situations, the variance in the observations is not constant but is proportional to the magnitude of the observations. If OLS is used with such data, the same plot will not be random but rather will exhibit a characteristic fanning pattern (Fig. 2, right), indicating dependence of the residuals on model quantities. In that case, a generalized least squares (GLS) procedure, described in the next section, is appropriate. Thus, the shape of the residuals after data fitting by a least squares minimization provides a means of investigating the reliability of the assumptions of the statistical model.
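A residual plot is the natural diagnostic, but the fanning pattern can also be summarized numerically. The Python sketch below computes the residuals (6) and a simple spread ratio: the standard deviation of residuals where the model is large divided by that where it is small. This ratio is a heuristic chosen for illustration, not a procedure from this report, and the data are synthetic. A ratio near 1 is consistent with constant variance, while a ratio well above 1 suggests variance that grows with the model values.

```python
import numpy as np

def ols_residuals(data, model_vals):
    """Residuals r_ij = n^d_ij - n(t_i, z_j; theta_hat), Eq. (6)."""
    return np.asarray(data, dtype=float) - np.asarray(model_vals, dtype=float)

def spread_ratio(residuals, model_vals):
    """Heuristic fanning check: residual std where the model is in its upper
    half divided by residual std where it is in its lower half."""
    order = np.argsort(model_vals)
    half = len(order) // 2
    low, high = residuals[order[:half]], residuals[order[half:]]
    return np.std(high) / np.std(low)

rng = np.random.default_rng(1)
m = np.linspace(1.0, 10.0, 2000)                      # synthetic model values
cv_data = m + rng.normal(0.0, 0.3, m.size)            # constant variance
prop_data = m * (1.0 + rng.normal(0.0, 0.1, m.size))  # variance proportional to m^2
ratio_cv = spread_ratio(ols_residuals(cv_data, m), m)
ratio_prop = spread_ratio(ols_residuals(prop_data, m), m)
```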

Fig. 2. Left: OLS Residuals vs. Model for observations with constant variance. Right: OLS Residuals vs. Model for observations with nonconstant variance.

2.3.2. Generalized least squares (GLS)

When observational error is proportional to the magnitude of the observation, the statistical model is given by

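$$N^d_{ij} = n(t_i, z_j; \vec\theta_0)\,\big(1 + \mathcal{E}_{ij}\big), \qquad (7)$$

where $\mathcal{E}_{ij}$ are defined as before. Then $\operatorname{var}(N^d_{ij}) = \sigma_0^2\, n^2(t_i, z_j; \vec\theta_0)$. Accordingly, the generalized least squares (GLS) cost functional weights the observations according to their variance and the GLS estimator is

$$\vec\theta_{GLS} = \arg\min_{\vec\theta \in \Theta} \sum_{i=1}^{I}\sum_{j=1}^{J(i)} w_{ij}\big(N^d_{ij} - n(t_i, z_j; \vec\theta)\big)^2 = \arg\min_{\vec\theta \in \Theta} F_{GLS}(\vec\theta), \qquad (8)$$

where the weights are

$$w_{ij} = \begin{cases} n^{-2}(t_i, z_j; \vec\theta), & n(t_i, z_j; \vec\theta) \ge N^*,\\ 0, & n(t_i, z_j; \vec\theta) < N^*. \end{cases} \qquad (9)$$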

This definition of the weights prevents the GLS algorithm from giving unreasonably large weight to points where the model is near zero (Banks and Samuels, 2009). The value of $N^*$ is chosen so that as many data points as possible are retained without the minimization algorithm stagnating or converging erroneously as a result of assigning little weight to larger-magnitude observations. The GLS estimate $\hat{\vec\theta}_{GLS}$ is computed iteratively as follows:

  (i) Compute the OLS estimate $\hat{\vec\theta}_{OLS}$ according to Eq. (5). Set $k = 0$ and $\hat{\vec\theta}^{\,0} = \hat{\vec\theta}_{OLS}$.

  (ii) Form the weights

    $$w^k_{ij} = \begin{cases} n^{-2}(t_i, z_j; \hat{\vec\theta}^{\,k}), & n(t_i, z_j; \hat{\vec\theta}^{\,k}) \ge N^*,\\ 0, & n(t_i, z_j; \hat{\vec\theta}^{\,k}) < N^*. \end{cases}$$

  (iii) Compute the approximation to the GLS estimate

    $$\hat{\vec\theta}^{\,k+1} = \arg\min_{\vec\theta \in \Theta} \sum_{i=1}^{I}\sum_{j=1}^{J(i)} w^k_{ij}\big(n^d_{ij} - n(t_i, z_j; \vec\theta)\big)^2.$$

  (iv) Repeat steps (ii) and (iii), incrementing k by one with each iteration, until the successive estimates satisfy

    $$\sum_{i=1}^{p} \big(\hat\theta^{\,k+1}_i - \hat\theta^{\,k}_i\big)^2 < \varepsilon,$$

    where ε is a small tolerance representing the predetermined desired convergence.
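The iteration above can be sketched in Python as follows. The pieces are assumptions made for illustration: a two-parameter exponential stands in for the PDE solution, SciPy's `least_squares` with bounds stands in for fmincon, each weighted problem is solved by scaling residuals by $\sqrt{w_{ij}}$, and the values of $N^*$, ε and the iteration cap are arbitrary. The final variance estimate follows Eq. (10) below.

```python
import numpy as np
from scipy.optimize import least_squares

def model(t, theta):
    # Hypothetical stand-in for n(t_i, z_j; theta); the real model is the PDE (2).
    return theta[0] * np.exp(theta[1] * t)

def gls_weights(model_vals, n_star):
    """Weights (9): 1/n^2 where the model value is at least n_star, else 0."""
    model_vals = np.asarray(model_vals, dtype=float)
    w = np.zeros_like(model_vals)
    keep = model_vals >= n_star
    w[keep] = 1.0 / model_vals[keep] ** 2
    return w

def weighted_fit(t, data, w, theta0, bounds):
    """Minimize sum w * (data - model)^2 via residuals sqrt(w) * (data - model)."""
    sw = np.sqrt(w)
    return least_squares(lambda th: sw * (data - model(t, th)), theta0,
                         bounds=bounds).x

def gls(t, data, theta0, bounds, n_star, eps=1e-10, max_iter=50):
    theta = weighted_fit(t, data, np.ones_like(data), theta0, bounds)  # (i) OLS
    for _ in range(max_iter):
        w = gls_weights(model(t, theta), n_star)                       # (ii)
        theta_new = weighted_fit(t, data, w, theta, bounds)            # (iii)
        done = np.sum((theta_new - theta) ** 2) < eps                  # (iv)
        theta = theta_new
        if done:
            break
    w = gls_weights(model(t, theta), n_star)
    F_gls = np.sum(w * (data - model(t, theta)) ** 2)
    return theta, F_gls / (data.size - theta.size)                     # Eq. (10)

rng = np.random.default_rng(3)
t = np.linspace(0.0, 5.0, 60)
# Synthetic observations with relative (proportional) error, as in (7).
data = model(t, [2.0, 0.3]) * (1.0 + rng.normal(0.0, 0.05, t.size))
theta_gls, sigma2_gls = gls(t, data, [1.0, 0.1], ([0.0, 0.0], [10.0, 1.0]), n_star=0.5)
```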

A more complete discussion of the GLS motivation, algorithm, and convergence can be found in Banks et al. (2009), Banks and Tran (2009) and the references therein. The result of the algorithm is the GLS estimate θ̂GLS. The variance coefficient may be estimated as

$$\hat\sigma^2_{GLS} = \frac{1}{N-p}\,F_{GLS}(\hat{\vec\theta}_{GLS}). \qquad (10)$$

Given these values, an asymptotic theory similar to that for the OLS estimator can be invoked to quantify uncertainty in the GLS estimate using standard errors and confidence intervals (again see Banks et al., 2009; Banks and Tran, 2009 for details).

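Rewriting Eq. (7), we have $\mathcal{E}_{ij} = \big(N^d_{ij} - n(t_i, z_j; \vec\theta_0)\big)/n(t_i, z_j; \vec\theta_0)$. Thus, the residuals

$$r_{ij} = \frac{n^d_{ij} - n(t_i, z_j; \hat{\vec\theta}_{GLS})}{n(t_i, z_j; \hat{\vec\theta}_{GLS})} \qquad (11)$$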

are a realization of the error in the data and should have constant variance. When GLS estimation is used with data in which the noise is proportional to the magnitude of the observations, and hence data for which this statistical model is appropriate, these residuals should appear random when plotted as a function of the model values n(ti, zj ; θ̂GLS). Figure 3 depicts a typical example of these residuals for data with variance proportional to the magnitude of the observation.
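As a small numerical illustration (Python, synthetic data with 10% relative error; all values hypothetical), dividing the raw residuals by the model values as in (11) yields residuals whose spread no longer grows with the model: the lower and upper halves of the model range have nearly equal standard deviations.

```python
import numpy as np

def modified_residuals(data, model_vals):
    """GLS residuals (11): (n^d_ij - n) / n, which should have constant variance
    when the observation error is proportional to the model value."""
    data = np.asarray(data, dtype=float)
    model_vals = np.asarray(model_vals, dtype=float)
    return (data - model_vals) / model_vals

rng = np.random.default_rng(2)
m = np.linspace(1.0, 10.0, 2000)                  # synthetic model values (sorted)
data = m * (1.0 + rng.normal(0.0, 0.1, m.size))   # relative (proportional) error
r = modified_residuals(data, m)
low_sd, high_sd = np.std(r[:1000]), np.std(r[1000:])  # small vs large model values
```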

Fig. 3. Modified residuals for observations with nonconstant variance.

2.3.3. Model comparison test

When forming a mathematical model in an effort to describe a physical, biological, or sociological process, there is often the possibility of adding a term and/or mechanism to the model in an effort to better fit the data. In the case of such a refinement, the resulting cost function (OLS or GLS) cannot increase and will likely decrease because of the additional degrees of freedom (in essence, minimizing over a less constrained set). However, one must ask whether the resulting decrease in the cost function reflects a significant improvement in the model fit to data, beyond a simple increase in degrees of freedom, that warrants adding the mechanism to the model. Alternatively, it may be sufficient to employ the less sophisticated mathematical model with fewer parameters. In this section, we summarize mathematical and statistical tools that help answer these questions.

In the statistical models discussed above, it is assumed that the true parameter θ⃗0 is contained in some set Θ of admissible parameters. Following the notation of Banks and Tran (2009), consider the constrained parameter space ΘH, a subset of the parameter space of the more complex model Θ, defined by

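$$\Theta_H = \{\vec\theta \in \Theta : H\vec\theta = d\}, \qquad (12)$$

where H is an r × p matrix with full rank and d is an r-dimensional vector. The goal is to develop a statistical test of the null hypothesis $H_0: \vec\theta_0 \in \Theta_H$. Let $F(\vec\theta)$ be the cost function (OLS or GLS) associated with a given model and data set and define

$$\hat{\vec\theta}_H = \arg\min_{\vec\theta \in \Theta_H} F(\vec\theta), \qquad \hat{\vec\theta} = \arg\min_{\vec\theta \in \Theta} F(\vec\theta).$$

Then it is possible to define the test statistic

$$U_N = \frac{N\,\big[F(\hat{\vec\theta}_H) - F(\hat{\vec\theta})\big]}{F(\hat{\vec\theta})}, \qquad (13)$$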

where N is the total number of data points. It follows from an asymptotic theory (see Banks and Fitzpatrick, 1989, 1990; Banks and Kunisch, 1989; Banks and Tran, 2009, and references therein), different from that mentioned above for standard errors and confidence intervals, that if $H_0$ is true, $U_N$ converges in distribution (as the number of data points goes to infinity) to a $\chi^2$ distribution with r degrees of freedom. Moreover, if $\eta = P(U > U_N)$ where $U \sim \chi^2(r)$, then we may reject $H_0$ with confidence $(1 - \eta)100\%$.
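The test can be sketched in a few lines of Python with `scipy.stats.chi2`, whose survival function `sf` gives $P(U > U_N)$. The function name, the `level` argument and the cost values and sample size in the example are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from scipy.stats import chi2

def model_comparison(F_restricted, F_full, N, r, level=0.95):
    """Return U_N from Eq. (13), eta = P(U > U_N) for U ~ chi^2(r), and
    whether H0 is rejected at the given confidence level."""
    U_N = N * (F_restricted - F_full) / F_full
    eta = chi2.sf(U_N, r)
    return U_N, eta, (1.0 - eta) >= level

# Hypothetical costs: restricted model F = 101.0, full model F = 100.0, N = 1000.
U_N, eta, reject = model_comparison(101.0, 100.0, 1000, r=3)
```

Here $U_N = 10$ exceeds the 95% quantile of $\chi^2(3)$ (about 7.81), so η < 0.05 and $H_0$ would be rejected at the 95% level even though the cost decreased by only 1%.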

For example, suppose that in the parameter set shown in Table 1, four nodes are being used in the estimation of the death rate function β(z), with coefficients b1, …, b4. For simplicity, also assume that these are the only four parameters being estimated; that is, c, {ai}, and γ are fixed. It may be of interest whether this death rate could be treated as a constant function. Then the restricted set of admissible parameters ΘH is given by (12), where

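$$H = \begin{pmatrix} 1 & -1 & 0 & 0\\ 0 & 1 & -1 & 0\\ 0 & 0 & 1 & -1 \end{pmatrix}, \qquad d = \begin{pmatrix} 0\\ 0\\ 0 \end{pmatrix}.$$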

Note that in this example r = 3 for the χ² test. Such examples, along with others, are discussed in Banks and Kunisch (1989) and Banks and Tran (2009).
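The matrix H above follows a simple pattern that extends to any number p of nodes. A minimal sketch (helper names hypothetical) that builds H and d for the hypothesis b1 = ⋯ = bp and checks whether a parameter vector lies in ΘH:

```python
def equality_constraints(p: int) -> tuple[list[list[int]], list[int]]:
    """Build H (r x p) and d (length r = p - 1) encoding b_1 = b_2 = ... = b_p.

    Row k enforces b_k - b_{k+1} = 0. Each row's first nonzero entry sits in a
    different column, so the rows are linearly independent and H has full rank r.
    """
    if p < 2:
        raise ValueError("need at least two nodes")
    H = [[0] * p for _ in range(p - 1)]
    for k in range(p - 1):
        H[k][k] = 1
        H[k][k + 1] = -1
    return H, [0] * (p - 1)


def in_theta_H(H, d, theta, tol: float = 1e-12) -> bool:
    """True if H theta = d holds to within tol, i.e. theta lies in Theta_H."""
    return all(abs(sum(h * t for h, t in zip(row, theta)) - dk) <= tol
               for row, dk in zip(H, d))


H, d = equality_constraints(4)
for row in H:
    print(row)
# [1, -1, 0, 0]
# [0, 1, -1, 0]
# [0, 0, 1, -1]
```

The number of rows, p − 1, is the r that enters the χ²(r) comparison.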

3. Model modifications

We first discuss the best fit we can obtain with the original model (given in Eq. (2)). The same data set was fit in Luzyanina et al. (2007), but exact values were not given for the proliferation and death rates (although they were depicted graphically). Also, we did not perform Tikhonov regularization; therefore, our problem is not convexified to ensure a unique minimum of a modified objective functional. Instead, we begin with the best fit parameters from Luzyanina et al. (2007), with the proliferation and death rates read from their graphical depictions as accurately as possible, and proceed to obtain an OLS estimate of the parameters. In Section 3.2, we propose modifications to the model (2) and accept or reject these changes based on whether they result in a statistically significant improvement in the agreement between model solutions and experimental data.

3.1. Data fit with the original model

First the original model (2) is fit to data. The resulting discrepancies between the observed data and the best fit model solution are then used to provide motivation for improvements to the model. For the current discussion, focus will remain on the OLS estimation of the parameters. The alternative use of the GLS procedure is discussed in Section 4.

The death rate of cells is expected to vary little after the initial rounds of divisions. Thus, only four nodes are used in the estimation of the death rate function. In the region z ∈ [0, 2.5], the death rate is treated as a constant, β(z) = b1. The proliferation rate function, on the other hand, should vary with division number and more nodes are needed (particularly in regions of rapid division). The placement of the nodes for the proliferation and death rate functions is given in Tables 2 and 3, respectively. These nodes were chosen after considerable trial and error running both forward simulations and inverse problem parameter estimations. Nodes chosen too close together cause instabilities due to over-discretization, while nodes chosen too far apart do not provide sufficient information regarding the behavior of the population. After OLS minimization, the best fit parameter estimate θ̂OLS had a cost FOLS(θ̂OLS) = 3.2112 × 10¹². The best fit proliferation and death rate functions are depicted in Fig. 4. The best fit values for the label loss rate and label dilution factor were ĉ = 0.004421 and γ̂ = 1.5751. The model solution evaluated at the best fit parameters is shown in comparison to the data in Figs. 5 and 6.

Table 2.

Nodes and estimated values (OLS) for the proliferation rate function in the original formulation using Eq. (2)

zi ai
1.2500 0.0020
1.5000 0.0112
1.6250 0.0169
1.7500 0.0161
1.8750 0.0091
2.0000 0.0222
2.1250 0.0015
2.2500 0.0505
2.3750 0.0117
2.5000 0.0027
2.6250 0.0231
2.7500 0.0016
2.8750 0.0076
3.0000 0.0076

Table 3.

Nodes and estimated values (OLS) for the death rate function in the original formulation using Eq. (2)

zi bi
2.5000 0.0085
2.7500 0.0248
3.0000 0.0000
3.5000 0.0000

Fig. 4.

Graphical presentation of estimated (OLS) birth and death rate functions α(z) (left) and β(z) (right).

Fig. 5.

OLS best fit model solution to original PDE formulation with Eq. (2) in comparison to the data: t = 0, 24, 48 hrs.

Fig. 6.

OLS best fit model solution to original PDE formulation with Eq. (2) in comparison to the data: t = 72, 96, 120 hrs.

3.2. Modified model

It is clear from Figs. 5 and 6 that the fit of the model to the data can be improved; the model does not quite capture the dynamics of the cells in the proliferation assay. Moreover, the label dilution factor γ must be less than 2 in order to fit the data accurately. If we were to interpret the measurement of fluorescence as a mass measurement, or a measurement of total quantity (assumption (ii)), then this interpretation would require γ = 2 in the model (2). Values of γ < 2 would then imply a creation of label during proliferation, that is, that the amount of CFSE in the mother cell is less than twice that in each daughter cell. Since this is biologically infeasible and physically implausible, the interpretation must be incorrect: FI is not a mass measurement. If instead we interpret the fluorescence intensity as a concentration measurement (as discussed in Lyons, 1999; Lyons and Parish, 1994; Wikipedia, 2010), so that the same amount of CFSE in a larger volume fluoresces at a lower intensity, then a feasible range for γ is 1 ≤ γ ≤ 2 (Banks et al., 1988, 2003). The reason this parameter is not exactly γ = 1 is that, while the label concentration of the daughter cells immediately after division is identical to that of the mother cell, a marked growth in cell volume immediately following division dilutes the CFSE, so daughter cells are detected at a lower fluorescence intensity. However, the daughter cells do not fluoresce at half the original intensity (represented by γ = 2) until they reach the size of the mother cell, which occurs much later as the cell progresses through the nonmitotic phases of the cell cycle. While not actually a change in the mathematical model, this change in the interpretation of the parameter γ allows a more satisfactory discussion of the results of the model. The parameter γ effectively represents the cellular mechanisms governing the timing of the initial growth of daughter cells after division.
These underlying dynamics likely occur on a much faster time scale than the other processes observed in this model; they have essentially been integrated over in time and absorbed into the parameter γ. Thus, in this type of experimental setting the estimated value of γ is unlikely to be exactly γ = 1, the value that would reflect the strict biological interpretation of γ as the ratio of mother to daughter label concentration.

Because of the natural loss of FI over time due to catabolic activity, assumption (i) is not to be taken in an exact sense. Histograms of cells as a function of FI can be used to see division as it occurs, but no simple relationship between FI and division number exists. While the cells fluorescing at the intensity of the peaks seen in the data have all likely undergone the same number of divisions, cells with fluorescence intensity in the valleys may have undergone different numbers of divisions since the start of the experiment. The natural label loss causes cells to drift slowly to the left on the FI scale. For that reason, no particular region of FI in Fig. 1 can be definitively linked to a particular division number. However, we can define a new variable s = z + ct, in terms of which the proliferation and death rates can be defined more intuitively. As we shall see by comparing Fig. 7 below with Fig. 1, this time dependent translation of the domain yields a data plot that does, in fact, correlate well with division number. To capture this in our model, we replace the rates α(z), β(z) by new translated effective proliferation and death rates α(s) = α(z + ct), β(s) = β(z + ct), respectively. The label loss rate function ν(z) does not undergo a similar translation, because label loss is assumed to depend on the CFSE label intensity (FI) but not on division number.

Fig. 7.

Original data sets shown in translated log intensity s = z + ct, with c = ĉ = 0.0032888, as estimated with the OLS procedure for the modified model (14). Note that subsequent division peaks are now strongly correlated with specific regions in the state variable, unlike in the original log intensity variable z (see Fig. 1).

To interpret the effective rates of proliferation and death introduced here, we recall that, while not common in the biological sciences, it is altogether common in the physical sciences and engineering to consider velocities (i.e., rates of change) relative to different coordinate or reference frames. For example, in the mechanics and motion of continua (elasticity and fluids) and deformable bodies (Banks and Lybeck, 1996; Fung 1993, 1994; Marsden and Hughes, 1994; Ogden, 1984), one frequently encounters quantities expressed either in coordinates attached to the moving material (a Lagrangian formulation) or in coordinates fixed in space (an Eulerian formulation). More precisely, when analyzing the deformation or motion of solids, or the flow of fluids, it is necessary to describe the evolution of configurations over time. One description of motion is made in terms of the material or referential coordinates and is called the material or Lagrangian description (also called a reference, material, or undeformed configuration formulation). In this formulation, each material particle keeps its label for all time, and one follows individual particles as they move through space, observing the changes in their position and physical properties as time progresses. The other description of motion is made in terms of the spatial or current coordinates and is called the spatial or Eulerian description (also called a current, spatial, or deformed configuration formulation). In this approach, one focuses on the current configuration of the body, giving attention to what occurs at a fixed point in space as time progresses, while different material particles pass through that point.
An intuitive comparison of the two descriptions is a boat on a river: in the Lagrangian description one places the coordinate or reference system on the object and moves with it (e.g., an observer riding on the boat), while in the Eulerian description one observes and describes the motion from a fixed vantage point (e.g., from a bridge over the river or from the riverbank). In this sense, the effective rates α(t, z + ct) and β(z + ct) are rates measured in a frame that drifts with the label loss, much as a Lagrangian observer moves with the material.

Finally, biological evidence suggests that the proliferation rate might also depend on time (Hawkins et al., 2007) in addition to division number, since the time to first division is clearly seen to differ from that of subsequent divisions. The assumption that the death rate depends only on division number is supported in the literature (Hawkins et al., 2007; Luzyanina et al., 2007; Quah et al., 2007) and is not investigated here. The time dependence in the proliferation rate introduced here will later be tested for statistical significance (and the test supports it), which is one way of determining whether the effect is present in the experimental data. The effect can also be seen directly in the data set. At t = 24 hrs, all cells are still in the original (undivided) generation, indicating a proliferation rate of zero during the first 24 hrs. However, these cells do divide later, producing the additional peaks seen at later times. Thus, α (at least as represented in the data) must change in time. In a longitudinal in vitro data set such as the one considered here, the apparent proliferation rate depends on time in a specific way related both to division times and to label loss. Indeed, by defining the effective proliferation and death rates we already introduce time implicitly into the rates. Thus, permitting explicit time dependence in the effective proliferation rate α is a natural modification to consider as well.

Taking all of these considerations into account, we modify the model (2) to obtain

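$$\frac{\partial n(t,z)}{\partial t} + \frac{\partial\,[\nu(z)\,n(t,z)]}{\partial z} = -\bigl(\alpha(t,z+ct) + \beta(z+ct)\bigr)\,n(t,z) + \chi_{[z_{\min},\,z_{\max}-\log\gamma]}(z)\,2\gamma\,\alpha(t,z+ct+\log\gamma)\,n(t,z+\log\gamma). \tag{14}$$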

Again, the ultimate goal is to estimate the functions α(t, s), β(s), and ν(z) and the parameter γ such that the model best fits a given set of data. Because the death rate β(s) is expected to be relatively constant after the initial rounds of division, it will be treated as a constant, β(s) = b1 for all s ∈ [0, 2.5]. For s ∈ [2.5, 4], β(s) will be constructed as a piecewise linear function,

$$\beta(s) = \sum_{i=1}^{M_\beta} b_i\,\varphi_i(s), \tag{15}$$

where φi(s) are piecewise linear splines defined as before. The proliferation rate function α(t, s) will be represented by linear combinations of products of one-dimensional piecewise linear splines ξi(t) in time and ψj(s) in the translated log intensity, i.e.,

$$\alpha(t,s) = \sum_{i=1}^{M_\alpha^t} \sum_{j=1}^{M_\alpha^z} a_{ij}\,\xi_i(t)\,\psi_j(s). \tag{16}$$

As before, aij = 0 or bi = 0 indicates zero birth or death, respectively, while aij = 1 or bi = 1 indicates that cells at the given location in time and/or translated log intensity are dividing or dying once per hour. This is clearly much faster than the true value, so the interval [0, 1] should contain the true value of each aij and bi. To reduce the total number of parameters, α(t, s) will be set to zero for s ∈ [0, 1]; no cells enter this region, and hence the birth rate is arbitrary there. The s-nodes are not evenly spaced (see Table 4) but were chosen after considerable trials with both forward simulations and parameter estimations. As a general rule, the s-nodes need to be closely spaced in order to model the data accurately. Choosing too few nodes results in a poor estimate of the proliferation rate, often with additional generations of cells appearing too early in the model solution. Choosing too many nodes leads to over-discretization and the need for some type of regularization. Here, we employ so-called regularization by discretization as described in Banks and Iles (1987) and Banks and Kunisch (1989). The time nodes are evenly spaced every 12 hours in the region t ∈ [36, 120]. For t ∈ [0, 24], α(t, s) is set to zero, as it is clear from the data that no proliferation occurs during this time.
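Representations (15) and (16) amount to linear and bilinear interpolation of the nodal values. A minimal sketch, assuming hat-function bases on node lists like those of Tables 4 and 5 and taking the functions to be zero outside the node range (an assumption made only for this illustration; the helper names are hypothetical):

```python
from bisect import bisect_right


def hat_interp(nodes, values, x):
    """Evaluate sum_i values[i] * phi_i(x) for piecewise linear hat functions on
    sorted nodes (phi_i(nodes[j]) = 1 if i == j else 0), as in Eq. (15).

    Assumption of this sketch: the function is zero outside [nodes[0], nodes[-1]].
    """
    if x < nodes[0] or x > nodes[-1]:
        return 0.0
    k = bisect_right(nodes, x) - 1
    if k == len(nodes) - 1:              # x coincides with the last node
        return values[-1]
    w = (x - nodes[k]) / (nodes[k + 1] - nodes[k])
    return (1.0 - w) * values[k] + w * values[k + 1]


def tensor_interp(t_nodes, s_nodes, a, t, s):
    """alpha(t, s) = sum_i sum_j a[i][j] xi_i(t) psi_j(s), as in Eq. (16).

    a[i] holds the coefficients at time node t_nodes[i] for every s-node.
    Summing over j first (per time node) and then over i gives bilinear
    interpolation on each grid cell.
    """
    per_time_node = [hat_interp(s_nodes, a_i, s) for a_i in a]
    return hat_interp(t_nodes, per_time_node, t)


# Death rate nodes and values from Table 5
beta_nodes = [0.0, 2.5, 2.75, 3.0, 4.0]
beta_vals = [0.1003, 0.1003, 0.0237, 0.0, 0.0]
print(hat_interp(beta_nodes, beta_vals, 2.625))   # midway between 0.1003 and 0.0237
```

Fixing the coefficients at t = 0, 12, 24 to zero, as in Table 4, simply means passing zero rows for those time nodes.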

Table 4.

Best fit (OLS) parameters shown along with the s (left column) and t (top row, hrs) nodes for the proliferation rate α(t, s). The function α(t, s) is shown graphically in Fig. 8. Note that α(0, s) = α(12, s) = α(24, s) = 0; these values were not estimated.

sj\ti 36 48 60 72 84 96 108 120
1.2500 0.0000a 0.0000a 0.0000a 0.0000a 0.2023 0.0158 0.0004 0.0002
1.5000 0.0000a 0.0000a 0.0000a 0.0000a 0.0187 0.0020 0.0014 0.0159
1.6250 0.0000a 0.0000a 0.7755 0.0045 0.0000 0.0022 0.0152 0.0226
1.7500 0.0000a 0.0000a 0.0468 0.1303 0.0130 0.0179 0.0342 0.0684
1.8750 0.0000a 0.0000a 0.0309 0.0231 0.0496 0.0143 0.0079 0.0796
2.0000 0.0000a 0.0000a 0.0000 0.0000 0.0435 0.0588 0.0415 0.1467
2.1250 0.1103 0.4173 0.0876 0.0000 0.1019 0.0001 0.0447 0.0715
2.2500 0.0082 0.0001 0.0014 0.0390 0.0001 0.1491 0.1954 0.2255
2.3750 0.1354 0.0002 0.1775 0.0055 0.0806 0.0000 0.0868 0.1318
2.5000 0.0111 0.0003 0.0020 0.2003 0.0000 0.2007 0.1694 0.3725
2.6250 0.0015 0.0148 0.2894 0.0000 0.1658 0.0006 0.0761 0.2036
2.7500 0.0008 0.1159 0.2415 0.3985 0.0011 0.1409 0.4961 0.2110
2.8750 0.0159 0.0013 0.0230 0.1588 0.0000 0.0514 0.0005 0.4284
3.0000 0.0000 0.0378 0.0005 0.2418 0.0064 0.0870 0.0035 0.9661
a: Parameter was not estimated but was set to zero, as no cells are observed in our data set at these fluorescence levels at the given times.

Also as before, it will be assumed that the rate of label loss is proportional to total label concentration, resulting in ν(z) = −c in the model (14). This form was found in Luzyanina et al. (2007) to provide a better fit to this data set as compared to constant label loss. Forward simulations indicate c ∈ [0.0025, 0.0055] to be a reasonable range. The ranges of values for the parameters to be estimated in the modified model (14) are then no different from those given in Table 1, for estimating parameters in the original model (2).
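A short calculation shows why the translated variable s = z + ct lines up the division peaks once ν(z) = −c. Writing m(t, s) = n(t, s − ct) (notation introduced only for this derivation) and using ∂[ν(z)n]/∂z = −c ∂n/∂z in (14):

$$
\begin{aligned}
\frac{\partial m}{\partial t}(t,s)
&= \frac{\partial n}{\partial t}(t,s-ct) - c\,\frac{\partial n}{\partial z}(t,s-ct) \\
&= -\bigl(\alpha(t,s) + \beta(s)\bigr)\,m(t,s)
 + \chi_{[z_{\min},\,z_{\max}-\log\gamma]}(s-ct)\,2\gamma\,\alpha(t,s+\log\gamma)\,m(t,s+\log\gamma).
\end{aligned}
$$

The transport term disappears in s: between divisions a cell keeps its value of s, and each division moves it down by log γ, so the k-th generation occupies a copy of the initial profile shifted down by k log γ. This is consistent with the alignment of the peaks in Figs. 7 and 11.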

We used the modified model (14)–(16) in an OLS procedure with the data. The corresponding OLS estimation results in ĉ = 0.003288 and γ̂ = 1.5169 with a cost FOLS(θ̂OLS) = 5.3181 × 10¹¹. Using this value of c in the translated log intensity coordinate s = z + ct, we plotted the experimental data relative to this coordinate. As seen in Fig. 7, this results in a data plot that correlates remarkably well with division number. The corresponding estimated effective proliferation rate function α̂(t, s) is depicted graphically in Fig. 8 with node values given in Table 4. Similarly, the estimated effective death rate function β̂(s) is shown in Fig. 8 with nodal values given in Table 5. The model solution evaluated at the best-fit parameter vector is compared to the data in Figs. 9 and 10.

Fig. 8.

Graphical representation of the best fit (OLS) death rate β(s) (top) and proliferation rate α(t, s) (bottom). Numerical values are given in Tables 4 and 5.

Table 5.

Best fit (OLS) parameters shown along with the s nodes (left column) for the death rate β(s). In the region s ∈ [0, 2.5], β(s) = b1 = 0.0665. The function β(s) is shown graphically in Fig. 8

si bi
0.0000 0.1003
2.5000 0.1003
2.7500 0.0237
3.0000 0.0000
4.0000 0.0000

Fig. 9.

Improved model solution evaluated at the best fit (OLS) parameters in comparison to the original data: t = 0, 24, 48 hrs.

Fig. 10.

Improved model solution evaluated at the best fit (OLS) parameters in comparison to the original data: t = 72, 96, 120 hrs.

The improvement of the fit to the data is substantial both visually and in terms of lowering the OLS cost function value. The assumption of time dependence for the birth rate function appears justified not only biologically but also by the model/data fits. To verify that this is not merely due to an increase in the number of degrees of freedom, we use the model comparison statistic to test if the reduction in residual sum of squares is statistically significant. When the original model (of Eq. (2) with α not time dependent and only 20 unknown parameters: c, γ, 4 nodes for β(z), 14 nodes for α(z)) is only modified by allowing for time dependence in the proliferation rate α(t, z) with now 102 unknown parameters, the statistic corresponding to the resulting additional r = 82 degrees of freedom is given by

$$U_N = \frac{N\,(3.2112\times10^{12} - 9.8423\times10^{11})}{9.8423\times10^{11}} \approx 12110$$

for N = 5352. Compared with the χ²(82) distribution, this statistic indicates, at an extremely high level of confidence, that the improvement in residual is unlikely to have occurred by chance or merely as a result of increasing the number of degrees of freedom in the model; it therefore supports the inclusion of explicit time dependence in the proliferation rates. We remark that in any such efforts with nonconstant parameters, one could (as is often done in general parametric estimation) substantially reduce the number of parameters to be estimated by employing a distributional form (a reduced order parameter shape or representation) instead of the general parameter representations in (15)–(16). However, just as in using a maximum likelihood estimator (where an assumption of a distributional form for the measurement error is required) or in general Bayesian approaches to estimation, one often loses information through an incorrect a priori assumption about the form of the distribution or shape function being estimated. In any case, this part of the model analysis strongly suggests that explicit time dependence in the proliferation rates is important for accurately representing and understanding such data sets.
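The parameter counts and the value of UN quoted above can be checked directly. The sketch below is illustrative arithmetic only: it assumes, as Table 4 indicates, 14 s-nodes and 8 estimated time nodes (t = 36, …, 120 hrs), with the entries marked "a" in Table 4 fixed at zero rather than estimated.

```python
# Illustrative arithmetic; all counts are read off Table 4 and the text above.
S_NODES = 14                 # s = 1.25, 1.5, 1.625, ..., 3.0
T_NODES = 8                  # t = 36, 48, ..., 120 hrs (alpha fixed at 0 for t <= 24)
# Entries marked "a" in Table 4 (fixed at zero, not estimated), per s-node row
FIXED_PER_ROW = [4, 4, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0]

alpha_time_dependent = S_NODES * T_NODES - sum(FIXED_PER_ROW)   # 112 - 16 = 96
other = 2 + 4                 # c, gamma, and the 4 death-rate nodes
p_full = alpha_time_dependent + other                           # 102
p_original = S_NODES + other                                    # 20
r = p_full - p_original                                         # 82

N = 5352
cost_original, cost_time_dependent = 3.2112e12, 9.8423e11
u_n = N * (cost_original - cost_time_dependent) / cost_time_dependent

print(p_full, p_original, r, round(u_n))   # 102 20 82 12110
```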

The translation of the intensity variable, in addition to being justified by the experimental setup, provides greater insight into how the proliferation rate varies both in time and across successive generations of cells. This modification does not change the number of parameters, so we can compare the values of the cost functional directly: the cost is lower, FOLS(θ̂OLS) = 5.3181 × 10¹¹, compared with 9.8423 × 10¹¹ for the time dependent model without translation. For reference, the model solution in the translated intensity coordinate is graphed in Fig. 11. Note that the successive generation peaks align very closely; compare also the data in the translated coordinate (Fig. 7).

Fig. 11.

Best fit (OLS) model solution shown in terms of the translated coordinate s = z + ct.

There is some evidence (Luzyanina et al., 2007, 2009) that the death rate function might be treated as a constant, β(s) = β, thereby reducing the total number of parameters while still fitting the data accurately. Adding this restriction to the present formulation (14) and running the OLS estimation procedure again, we obtained a best fit cost of FOLS(θ̂OLS) = 8.2045 × 10¹¹. The corresponding test statistic is UN = 2905, from which it may be concluded using the χ²(3) distribution, with very high confidence (>99.999%), that β(s) cannot be treated as a constant for the current data set.

It has been assumed that label loss is strictly proportional to CFSE concentration (ν(x) = −cx; ν(z) = −c), but it may in fact vary in a more complex way depending on the nature of the catabolic activity within the cell. Thus, we consider the possibility that the label loss function has the affine form ν(z) = ν0z − c. Returning to the OLS minimization, we obtained ν̂0 = 7.8921 × 10⁻⁵ with a corresponding cost FOLS(θ̂OLS) = 5.3152 × 10¹¹. The other parameters in the model remained largely unchanged. The resulting test statistic (for the null hypothesis ν0 = 0) is UN = 2.92, so the additional term in the label loss rate would be supported with only 91.25% confidence. The fit of this model to the data is not noticeably different from that depicted in Figs. 9 and 10.

Considering other possibilities, we note that there is also a large body of work on so-called Growth Rate Distribution (GRD) (or Class Rate Distribution (CRD)) models (Banks et al., 1988; Banks and Davis, 2007; Banks and Fitzpatrick, 1991). Adapted to the present application, these models assume that the population is divided into small groups of individuals that share a common label loss rate (the “class” rate) within each group; this replaces the affine term considered in the previous paragraph. The dynamics of the total population are then defined by the probability distribution of label loss rates within the population. Such variability within the overall population may readily be justified biologically by the variety of catabolic mechanisms underlying label loss. With a slight change of notation, let n(t, z; ck) be the structured population density of a cohort of cells all of which have label loss rate ck. We treat only the parameter c as distributed, since the affine term has already been shown to offer no statistically significant improvement. For simplicity, assume that there are finitely many rates ck and that a discrete probability measure P, with weights pk, defines their distribution within the total population. Then the total population is given by

$$N(t,z;P) = \sum_k n(t,z;c_k)\,p_k.$$

The estimation of the probabilities pk can be reduced to a quadratic programming problem (as described in Banks and Davis, 2007), which can be solved quickly and easily via MATLAB’s quadprog routine provided the other parameters of the model are fixed. For the current problem, 21 evenly spaced label loss rates in the interval c ∈ [0.0015, 0.0055] were considered. In the interest of computational efficiency, fmincon and quadprog were used in an alternating fashion in a hybrid algorithm: first, fmincon was used to estimate the other parameters with the pk held fixed, and then quadprog was used to estimate the pk with the other parameters held fixed. This results in a final cost of FOLS(θ̂OLS) = 5.3137 × 10¹¹, which gives a model comparison statistic of UN = 4.43. Compared with the critical values of a χ²(20) distribution, this is not sufficient to warrant the inclusion of this additional complexity. (For similar results in other modeling attempts, see Banks et al., 2003.) The estimated distribution of label loss rates within the population is depicted in Fig. 12. The modifications made to the model in Eq. (14) and the corresponding statistics are summarized in Table 6.
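The inner step of the hybrid scheme, estimating the pk with all other parameters fixed, is a least-squares problem over the probability simplex. As a hedged, dependency-free illustration (not the authors' MATLAB code), the sketch below replaces quadprog with projected gradient descent; it assumes the cohort solutions n(ti, zj; ck) have already been computed and flattened into the hypothetical `columns` list.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {p : p_k >= 0, sum_k p_k = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        css += uj
        t = (css - 1.0) / j
        if uj - t > 0.0:        # this condition holds for a prefix of j values
            theta = t
    return [max(x - theta, 0.0) for x in v]


def fit_class_probabilities(columns, data, iters=5000):
    """Minimize ||data - sum_k p_k * columns[k]||^2 over the probability simplex.

    columns[k] is the (flattened) cohort solution n(t_i, z_j; c_k) for rate c_k,
    with all other model parameters held fixed. Projected gradient descent
    stands in for a quadratic programming solver here.
    """
    K = len(columns)
    G = [[sum(x * y for x, y in zip(ci, cj)) for cj in columns] for ci in columns]
    b = [sum(x * y for x, y in zip(ci, data)) for ci in columns]
    # The gradient is 2(Gp - b); its Lipschitz constant 2*lambda_max(G) <= 2*trace(G).
    lipschitz = 2.0 * sum(G[k][k] for k in range(K))
    p = [1.0 / K] * K
    for _ in range(iters):
        grad = [2.0 * (sum(G[k][m] * p[m] for m in range(K)) - b[k]) for k in range(K)]
        p = project_to_simplex([p[k] - grad[k] / lipschitz for k in range(K)])
    return p


# Synthetic check with three hypothetical cohort profiles and known weights
cols = [[1.0, 0.0, 0.0, 0.2], [0.0, 1.0, 0.0, 0.2], [0.0, 0.0, 1.0, 0.2]]
p_true = [0.2, 0.5, 0.3]
y = [sum(p * c[i] for p, c in zip(p_true, cols)) for i in range(4)]
print([round(p, 4) for p in fit_class_probabilities(cols, y)])   # recovers p_true
```

In the alternating scheme described above, this step would be called after each update of the remaining parameters, with `columns` recomputed from the forward model.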

Fig. 12.

Estimated probability distribution of label loss rates c within the population.

Table 6.

Model comparison statistics for the modifications in the table: constant death rate β, affine label loss ν(z) = ν0z − c, and distributed label loss ν(z) = −c with c ~ P. Each is compared to the version of the model in Eq. (14) with α(t, s), β(s), and ν(z) = −c, which has cost FOLS(θ̂OLS) = 5.3181 × 10¹¹. The row labeled r gives the degrees of freedom of the χ² distribution to which the model comparison statistic UN is compared. The bottom row indicates whether the more complex model is supported by the statistic

Modification β(s)≡β ν(z)=ν0z−c c~P
FOLS(θ̂OLS) 8.2045×10¹¹ 5.3152×10¹¹ 5.3137×10¹¹
UN 2905 2.92 4.43
r 3 1 20
Improvement Yes No No

Together, these improvements result in a final model of the form

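$$\frac{\partial n(t,z)}{\partial t} + \frac{\partial\,[-c\,n(t,z)]}{\partial z} = -\bigl(\alpha(t,z+ct) + \beta(z+ct)\bigr)\,n(t,z) + \chi_{[z_{\min},\,z_{\max}-\log\gamma]}(z)\,2\gamma\,\alpha(t,z+ct+\log\gamma)\,n(t,z+\log\gamma). \tag{17}$$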

It is worth noting that this particular model has, at present, only been modified to fit the particular data set shown. It is certainly possible that there are biomechanisms which are not manifested in the available data set, and hence are not incorporated into this model. As this mathematical model is applied to different cell types and proliferation assays, it is expected that the model may take on slightly different forms. Still, the techniques used in obtaining this improved model as well as the overall form of the model lay a solid foundation for future work. Given the accuracy of the current model in replicating the experimental data, it is now reasonable to turn to a discussion of the validity of the statistical assumptions underlying the OLS minimization procedure.

4. Estimation improvements

To this point, our focus has remained on the OLS formulation of the inverse problem. Given the best fit model (17) and the best fit OLS estimate θ̂OLS, the resulting residuals rij versus the model values are plotted in Fig. 13.

Fig. 13.

OLS residuals as a function of model value for each time measurement.

These residuals certainly do not exhibit the fan-like structure characteristic of data sets in which the noise is proportional to the magnitude of the observation (see Fig. 2). However, they do not appear to be random either: their spread grows slowly as the model values increase. Thus, the assumption of constant variance may not be particularly accurate, and another error structure should perhaps be investigated, leading to a more general least squares estimation approach distinct from the two fairly standard formulations used here.

The results (e.g., model fits, estimated parameter values) that we obtained with the GLS procedure under the assumption of relative error (i.e., statistical model (7)) are sufficiently similar to those presented in the previous section that separate graphics and tables are not included here. Of interest for the current discussion are the modified residuals r̄ij, shown in Fig. 14. It is clear that the GLS residuals are not random but slowly decay in time. The conclusion, then, is that the underlying statistical model for the variation in our data may involve neither constant variance noise nor noise proportional to the magnitude of the observation. This is not surprising given the complexity of the error in the observation or measurement process (see Wikipedia, 2010 for a discussion of the general analysis of error in data collection procedures). We remark that we do not use log likelihood estimation or the corresponding error quantification methodology here, because such an approach explicitly (through the form of the likelihood function employed) requires that we know a priori the distribution of the measurement error in the underlying statistical model (e.g., see the discussions in Banks et al., 2009).
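For reference, the two kinds of residuals compared in Figs. 13 and 14 can be written down directly. A minimal sketch, assuming data and model values are flattened into matching lists; the function names are hypothetical, and the 1/n² weights and (N − n)/n residuals are the usual choices for a relative-error statistical model (variance proportional to n²), stated here as an assumption rather than a transcription of the authors' code.

```python
def ols_residuals(data, model):
    """r_ij = N_ij - n(t_i, z_j; theta_hat): natural under constant variance."""
    return [d - m for d, m in zip(data, model)]


def gls_residuals(data, model, eps=1e-12):
    """Modified residuals (N_ij - n_ij) / n_ij under a relative-error model
    (variance proportional to n^2); points with model value <= eps are skipped."""
    return [(d - m) / m for d, m in zip(data, model) if m > eps]


def gls_weights(model, eps=1e-12):
    """Weights w_ij = 1 / n_ij^2 for a GLS cost sum w_ij (N_ij - n_ij)^2
    (zero weight where the model value is numerically zero)."""
    return [1.0 / m ** 2 if m > eps else 0.0 for m in model]


data, model = [3.0, 5.0], [2.0, 4.0]
print(ols_residuals(data, model), gls_residuals(data, model), gls_weights(model))
```

Plotting either list against the model values gives residual plots of the kind shown in Figs. 13 and 14; random scatter of the appropriate residual is the diagnostic for the corresponding statistical model.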

Fig. 14.

GLS residuals as a function of model value for each time measurement.

In order to accurately and correctly compute standard errors and confidence intervals, the assumptions of the underlying statistical model must be reasonably correct (for the asymptotic formulae to be meaningful; see Banks et al., 2009). However, it appears that the error is not represented well by the statistical models in either the OLS (with constant variance error) or GLS (with proportional error) formulations. Therefore, estimations of the reliability of the parameter estimates (e.g., standard errors and confidence intervals) would be invalid and are not pursued further in this work. One could, of course, use bootstrapping to compute standard errors, but that is extremely computationally expensive for our problem and also involves some underlying assumptions (Banks et al., 2010), perhaps unsatisfied for our problem. Indeed, the error structure of the observed data may take several different forms. One possibility is that the noise may be proportional to some power λ of the observations

$$N_{ij}^d = n(t_i,z_j;\vec\theta_0) + n^{\lambda}(t_i,z_j;\vec\theta_0)\,\mathcal{E}_{ij},$$

where λ is now an additional parameter to be estimated (see Carroll and Ruppert, 2000; Davidian and Giltinan, 1995 for discussions). More generally, the noise could depend on some general function g of n(ti, zj ; θ⃗0):

$$N_{ij}^d = n(t_i,z_j;\vec\theta_0) + g\bigl(t_i,z_j;\vec\theta_0,n(t_i,z_j;\vec\theta_0)\bigr)\,\mathcal{E}_{ij}.$$

The determination of the parameter λ or the function g represents a difficult computational challenge and is a topic for future work that will require more careful analysis of the data collection process in the context of multiple data sets. Once ascertained and verified by the appropriate residual plots, the statistical model could then be used to quantify uncertainty in the estimates of the parameters of the model.
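One simple heuristic for a first look at λ (not a method used in this paper, which leaves the estimation of λ to future work) regresses log|rij| on log n(ti, zj; θ̂). Under the power model above, log|rij| ≈ λ log n + log|ℰij|, so the slope estimates λ when the ℰij are i.i.d. The sketch below checks this on synthetic data; the noise level and the helper names are hypothetical, and residuals computed at an estimate θ̂ only approximate those at θ⃗0.

```python
import math
import random


def estimate_power(model_values, residuals):
    """Slope of the least-squares line of log|r| against log(n).

    If r = n**lam * E with E i.i.d., then log|r| = lam * log(n) + log|E|, so the
    slope estimates lam (log|E| only affects the intercept). Non-positive model
    values and zero residuals are skipped.
    """
    pts = [(math.log(m), math.log(abs(r)))
           for m, r in zip(model_values, residuals) if m > 0 and r != 0]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    return sxy / sxx


def simulate(model_values, lam, sigma, seed=0):
    """Synthetic data N = n + n**lam * E with E ~ Normal(0, sigma^2)."""
    rng = random.Random(seed)
    return [m + m ** lam * rng.gauss(0.0, sigma) for m in model_values]


# Hypothetical model values spanning several orders of magnitude, true lam = 0.7
model = [math.exp(10.0 * k / 4999) for k in range(5000)]
data = simulate(model, lam=0.7, sigma=0.1)
residuals = [d - m for d, m in zip(data, model)]
print(f"estimated lambda = {estimate_power(model, residuals):.3f}")
```

A slope near 0 points toward constant variance and a slope near 1 toward relative error; intermediate values correspond to the power model above.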

It is worth one further remark to note that the inaccuracies in the underlying statistical models do not invalidate the entire parameter estimation procedures we have pursued. The various methods of parameter estimation (OLS, GLS, etc.) all give very similar results and the figures of the previous sections demonstrate the accuracy of those results. The determination and validation of the statistical model simply provides an improvement to the estimation procedure that would permit the accurate quantification of uncertainty in the resulting estimated parameters.

5. Concluding remarks

The use of CFSE-based proliferation assays has been and will continue to be a powerful technique for monitoring a dividing cell population. Coupled with the high-throughput capacity of FACS, there are near limitless possibilities for the use of this technology to track cell divisions and division dependent changes. In this report, a mathematical model governing the density of a population of proliferating CD4+ lymphocytes is proposed, and its remarkable agreement with a FACS data set is demonstrated. This approach provides an alternative to current modeling efforts and does not require prescribed distributions on CFSE or other birth and death events. It also allows for direct comparison of solutions to data without preprocessing.

The current model demonstrates that the effective rate of proliferation within a population of cells depends not only on the number of divisions undergone, but also on the elapsed time since stimulation, as supported by a model comparison statistic. In addition to quantifying this effect, the estimation of the proliferation and death rate functions defined on a translated domain (which correlates more closely with division number than CFSE intensity alone at the current time) permits a better quantitative understanding of how these rates change with division number.

The effective rates of proliferation α(t, z + ct) and death β(z + ct) introduced in our models here can be correctly viewed as rates relative to the moving coordinate system s = z + ct (which corresponds more closely with division number) as compared to rates relative to the more obvious fixed coordinate z of log intensity. As in other fields of science and engineering, it can also be valuable in biological rate estimation to use such quantities to compare, characterize, or delineate cell populations with respect to their normality or lack thereof (as in diseased, infected, etc., cell populations). Specifically, if one can effectively use inverse problem techniques to reliably estimate these effective or relative rates from CFSE data on cell populations, this could lead to a significant new disease identification procedure.

Analysis of the statistical models underlying the noise suggests that the noise in the data may have neither constant variance nor variance proportional to the magnitude of the observation itself. While this does not invalidate the results presented, it does point to a direction for future computational work that will aid in accurately quantifying the uncertainty in the estimated parameters for such problems.
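One simple way to probe which variance model fits is to scale residuals by a power of the model output and check whether their spread still grows with the model value. The sketch below assumes the statistical model Y_j = f_j + f_j^g ε_j, which is our choice for illustration and not necessarily the form used in the fits. Under this model, g = 0 gives constant variance, g = 1 gives a standard deviation proportional to f_j, and g = 1/2 gives a variance proportional to f_j. The "spread ratio" diagnostic and the synthetic data (generated with g = 1/2) are hypothetical.

```python
import math
import random


def modified_residuals(observed, model, g):
    """Residuals (y_j - f_j) / f_j**g for the assumed model Y = f + f**g * eps."""
    return [(y - f) / f ** g for y, f in zip(observed, model)]


def spread_ratio(residuals, model):
    """RMS residual over the larger half of model values divided by the RMS
    over the smaller half; roughly 1 when the assumed variance model fits."""
    order = sorted(range(len(model)), key=lambda j: model[j])
    half = len(order) // 2

    def rms(idx):
        return math.sqrt(sum(residuals[j] ** 2 for j in idx) / len(idx))

    return rms(order[half:]) / rms(order[:half])


# Hypothetical synthetic data with variance proportional to f (g_true = 0.5).
rng = random.Random(0)
model = [10.0 + 990.0 * j / 399 for j in range(400)]
observed = [f + math.sqrt(f) * rng.gauss(0.0, 1.0) for f in model]

for g in (0.0, 0.5, 1.0):
    r = modified_residuals(observed, model, g)
    print(g, round(spread_ratio(r, model), 2))
```

A ratio well above 1 for g = 0 indicates that the constant-variance assumption understates the noise at large observations. A ratio well below 1 for g = 1 indicates that the relative-error assumption overcorrects. In practice, residual plots carry more information than this single number.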

While the current report focuses only on cell proliferation and death rates in terms of the number of divisions undergone, other division-dependent properties (cell surface marker expression, cytokine content, etc.) can also be measured simultaneously by FACS during a proliferation assay (Quah et al., 2007). This, along with the applicability of the technique to a wide variety of cell types, has potentially profound applications in oncology (cancer metastasis and differentiation from normal cells), virology (latent viruses, HIV), and immunology (allergens, tissue grafting), whether as an interpretive framework, as a diagnostic tool, or even as part of a control mechanism (see, e.g., Bellomo and Preziosi, 2000; Gyllenberg and Webb, 1990; Hawkins et al., 2007; Komarova, 2006; Komarova and Wodarz, 2007).

Acknowledgments

This research was supported in part by Grant Number R01AI071915-07 from the National Institute of Allergy and Infectious Diseases, in part by the Air Force Office of Scientific Research under grant number FA9550-09-1-0226, in part by the Russian Foundation for Basic Research, and in part by the Program of the Russian Academy of Sciences “Basic Research for Medicine”. The authors are grateful to Dr. T. Luzyanina and two anonymous referees for constructive comments on these efforts.

Appendix: Derivation of model

The original PDE model proposed in Luzyanina et al. (2007) is a case of the Bell–Anderson model (Bell and Anderson, 1967) for populations that divide by fission. Like the Sinko–Streifer model (Sinko and Streifer, 1967), it was published in 1967, and the two arrive at the same general form of equation. Let n(t, x) be the population density of a labeled population with structure variable x, where x represents the fluorescence intensity (FI) of a cell. Then

$$N(t)=\int_{x_0}^{x_1} n(t,x)\,dx \tag{A.1}$$

represents the total number of cells with fluorescence intensity in (x0, x1). Here, x0 and x1 are arbitrary except that [x0, x1] ⊂ [xmin, xmax/γ] or [x0, x1] ⊂ (xmax/γ, xmax]. Let Δx(t, x, Δt) be the average increase of FI of cells with initial intensity x during the interval (t, t + Δt). Assume that Δt is chosen such that |Δx| ≪ x1 − x0, so that the number of cells which move into the region via division and subsequently divide, die, or drift out of the region is negligible. Note that Δx is nonpositive, since cells cannot increase in FI; thus, subtracting Δx actually yields a larger value. While counterintuitive, this definition is kept in order to harmonize with other structured population models.

Consider the change in N(t) during the time interval (t, t + Δt), i.e., the quantity N(t + Δt) − N(t). Five possible contributions will be considered:

  1. Cells of intensity in the interval [x1, x1 − Δx(t, x1, Δt)], losing FI according to Δx:

    $$\int_{x_1}^{x_1-\Delta x(t,x_1,\Delta t)} n(t,x)\,dx.$$
  2. Cells of intensity in the interval [x0, x0 − Δx(t, x0, Δt)], losing FI according to Δx:

    $$\int_{x_0}^{x_0-\Delta x(t,x_0,\Delta t)} n(t,x)\,dx.$$
  3. Cells which would have contributed to N(t + Δt) had they not died:

    $$\int_t^{t+\Delta t}\int_{x_0-\Delta x(t,x_0,\tau-t)}^{x_1-\Delta x(t,x_1,\tau-t)}\beta(x)\,n(\tau,x)\,dx\,d\tau.$$
  4. The disappearance of cells from the region due to proliferation:

    $$\int_t^{t+\Delta t}\int_{x_0-\Delta x(t,x_0,\tau-t)}^{x_1-\Delta x(t,x_1,\tau-t)}\alpha(x)\,n(\tau,x)\,dx\,d\tau.$$
  5. The gain of daughter cells (two of them) in the region as a result of proliferation in the parent region:

    $$\chi_{[x_{\min},x_{\max}/\gamma]}\,2\int_t^{t+\Delta t}\int_{\gamma x_0}^{\gamma x_1}\alpha(x)\,n(\tau,x)\,dx\,d\tau.$$

Then the difference N(t + Δt) − N(t) is obtained by combining the quantities in items 1 through 5: the quantity in item 1, minus those in items 2, 3 and 4, plus that in item 5. Following the standard procedure of dividing by Δt and letting Δt → 0 gives dN/dt on the left side. For the first term on the right side, if n(t, x) is continuous in t and x (a reasonable assumption), the Mean Value Theorem (MVT) for integrals implies that there exists θ ∈ [x1, x1 − Δx(t, x1, Δt)] such that

$$\int_{x_1}^{x_1-\Delta x(t,x_1,\Delta t)} n(t,x)\,dx = -\Delta x(t,x_1,\Delta t)\,n(t,\theta). \tag{A.2}$$

Assuming Δx is continuous in Δt (that is, there is no instantaneous label loss) and varies smoothly, θ → x1 as Δt → 0, and we have

$$\lim_{\Delta t\to 0}\frac{-\Delta x(t,x_1,\Delta t)}{\Delta t}\,n(t,\theta) = -v(x_1)\,n(t,x_1), \tag{A.3}$$

where dx/dt = v(x) is the rate of FI change of cells of intensity x; since Δx ≤ 0, we have v(x) ≤ 0. Applying the same argument to the second term, we find

$$\lim_{\Delta t\to 0}\frac{1}{\Delta t}\int_{x_0}^{x_0-\Delta x(t,x_0,\Delta t)} n(t,x)\,dx = -v(x_0)\,n(t,x_0). \tag{A.4}$$
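The boundary limits (A.3) and (A.4) can be checked numerically once a label-loss law is chosen. The Python sketch below assumes v(x) = −cx, i.e., exponential label loss. This is an illustrative choice, not necessarily the one used in the fits. It gives the exact displacement Δx(t, x, Δt) = x(e^{−cΔt} − 1). The value of c and the density snapshot are hypothetical. The quotient (1/Δt)∫ n dx over [x1, x1 − Δx] should approach −v(x1) n(t, x1) as Δt shrinks.

```python
import math


def simpson(f, a, b, n=200):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, n))
    return s * h / 3


C = 0.1  # hypothetical label-loss rate, so v(x) = -C x


def v(x):
    return -C * x


def delta_x(x, dt):
    """Exact displacement for dx/dt = -C x: x -> x exp(-C dt), so Delta x <= 0."""
    return x * (math.exp(-C * dt) - 1.0)


def density(x):
    """Hypothetical snapshot of n(t, x) at a fixed time t."""
    return math.exp(-((x - 400.0) / 80.0) ** 2)


def boundary_quotient(x_b, dt):
    """(1/dt) times the integral of n over [x_b, x_b - Delta x(t, x_b, dt)]."""
    return simpson(density, x_b, x_b - delta_x(x_b, dt)) / dt


x1 = 500.0
target = -v(x1) * density(x1)  # right side of (A.3)
for dt in (1e-1, 1e-2, 1e-3):
    print(dt, boundary_quotient(x1, dt), target)
```

The error shrinks roughly in proportion to Δt, as expected from the continuity assumptions behind (A.3).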

For the third term, define

$$u_\beta(\tau)=\int_{x_0-\Delta x(t,x_0,\tau-t)}^{x_1-\Delta x(t,x_1,\tau-t)}\beta(x)\,n(\tau,x)\,dx. \tag{A.5}$$

Then if Δx(t, x, Δt) and β(x)n(t, x) are continuous functions of their variables, so is uβ(τ), and by the MVT there exists θ′ ∈ [t, t + Δt] such that

$$\frac{1}{\Delta t}\int_t^{t+\Delta t}u_\beta(\tau)\,d\tau = u_\beta(\theta'). \tag{A.6}$$

Thus, it follows that

$$\lim_{\Delta t\to 0}u_\beta(\theta') = u_\beta(t) = \int_{x_0}^{x_1}\beta(x)\,n(t,x)\,dx, \tag{A.7}$$

assuming Δx(t, x, 0) = 0 for all t, x (which follows from the previous assertion regarding the smoothness of Δx in Δt). Using a similar argument for the fourth term, we have

$$\lim_{\Delta t\to 0}u_\alpha(\theta') = u_\alpha(t) = \int_{x_0}^{x_1}\alpha(x)\,n(t,x)\,dx, \tag{A.8}$$

where uα (τ) has the obvious definition.
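The time-averaging step (A.5)–(A.7) can be illustrated the same way. The Python sketch below evaluates the left side of (A.6) by nested quadrature and compares it with the limit in (A.7). As before, it assumes v(x) = −cx, so that Δx(t, x, s) = x(e^{−cs} − 1). The death rate β, the time-dependent density, and the window (x0, x1) are hypothetical choices made only for this check.

```python
import math


def simpson(f, a, b, n=200):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, n))
    return s * h / 3


C, X0, X1 = 0.1, 300.0, 500.0  # hypothetical label-loss rate and window (x0, x1)


def beta(x):
    """Hypothetical death rate beta(x)."""
    return 0.05 + x / 10000.0


def density(tau, x):
    """Hypothetical density n(tau, x): a Gaussian bump decaying in time."""
    return math.exp(-0.2 * tau) * math.exp(-((x - 400.0) / 80.0) ** 2)


def delta_x(x, s):
    """Delta x over elapsed time s, assuming v(x) = -C x (so Delta x <= 0)."""
    return x * (math.exp(-C * s) - 1.0)


def u_beta(t, tau):
    """u_beta(tau) from (A.5): beta * n integrated over the shifted window."""
    lo = X0 - delta_x(X0, tau - t)
    hi = X1 - delta_x(X1, tau - t)
    return simpson(lambda x: beta(x) * density(tau, x), lo, hi)


def time_average(t, dt):
    """Left side of (A.6): (1/dt) times the integral of u_beta over [t, t + dt]."""
    return simpson(lambda tau: u_beta(t, tau), t, t + dt, n=20) / dt


t = 1.0
limit = simpson(lambda x: beta(x) * density(t, x), X0, X1)  # right side of (A.7)
for dt in (0.5, 0.05, 1e-3):
    print(dt, time_average(t, dt), limit)
```

The same check applies to the proliferation term (A.8) if β is replaced by α.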

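For the final term, let $\tilde u_\alpha(\tau)=\int_{\gamma x_0}^{\gamma x_1}\alpha(x)\,n(\tau,x)\,dx$ denote the inner integral. The same argument, along with the change of variables ξ = x/γ (renaming ξ as x afterwards), results in

$$\chi_{[x_{\min},x_{\max}/\gamma]}\,2\lim_{\Delta t\to 0}\tilde u_\alpha(\theta') = \chi_{[x_{\min},x_{\max}/\gamma]}\,2\gamma\int_{x_0}^{x_1}\alpha(\gamma x)\,n(t,\gamma x)\,dx. \tag{A.9}$$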
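The change of variables behind (A.9) can be checked numerically. Substituting x = γξ (so dx = γ dξ) maps the parent interval [γx0, γx1] onto the daughter interval [x0, x1] and introduces the factor γ. The Python sketch below compares both sides using composite Simpson quadrature. The rate α, the density snapshot, γ = 2 and the interval endpoints are hypothetical values chosen only for the check.

```python
import math


def simpson(f, a, b, n=400):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, n))
    return s * h / 3


def alpha(x):
    """Hypothetical proliferation rate alpha(x)."""
    return 0.5 / (1.0 + x / 500.0)


def density(x):
    """Hypothetical snapshot of n(t, x)."""
    return math.exp(-((x - 400.0) / 80.0) ** 2)


gamma, x0, x1 = 2.0, 150.0, 300.0
# Parents with label in [gamma*x0, gamma*x1] produce daughters in [x0, x1].
parent_side = simpson(lambda x: alpha(x) * density(x), gamma * x0, gamma * x1)
# After the substitution x = gamma * xi, as used for (A.9).
daughter_side = gamma * simpson(lambda xi: alpha(gamma * xi) * density(gamma * xi), x0, x1)
print(parent_side, daughter_side)
```

Both quadratures sample the integrand at corresponding points, so they agree to rounding error. This mirrors the exactness of the substitution itself.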

Altogether,

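$$\frac{dN}{dt} = -v(x_1)\,n(t,x_1) + v(x_0)\,n(t,x_0) - \int_{x_0}^{x_1}\beta(x)\,n(t,x)\,dx - \int_{x_0}^{x_1}\alpha(x)\,n(t,x)\,dx + \chi_{[x_{\min},x_{\max}/\gamma]}\,2\gamma\int_{x_0}^{x_1}\alpha(\gamma x)\,n(t,\gamma x)\,dx. \tag{A.10}$$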

On the left side, differentiating $N(t)=\int_{x_0}^{x_1} n(t,x)\,dx$ with respect to t, we find

$$\frac{dN}{dt} = \int_{x_0}^{x_1}\frac{\partial n(t,x)}{\partial t}\,dx. \tag{A.11}$$

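Finally, applying the fundamental theorem of calculus to the first two terms of (A.10), that is, writing $-v(x_1)n(t,x_1)+v(x_0)n(t,x_0) = -\int_{x_0}^{x_1}\frac{\partial}{\partial x}[v(x)\,n(t,x)]\,dx$, and then simplifying and rearranging, we have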

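$$\int_{x_0}^{x_1}\frac{\partial n(t,x)}{\partial t}\,dx + \int_{x_0}^{x_1}\frac{\partial [v(x)\,n(t,x)]}{\partial x}\,dx = -\int_{x_0}^{x_1}(\alpha(x)+\beta(x))\,n(t,x)\,dx + \chi_{[x_{\min},x_{\max}/\gamma]}\,2\gamma\int_{x_0}^{x_1}\alpha(\gamma x)\,n(t,\gamma x)\,dx. \tag{A.12}$$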

Equivalently (because x0 and x1 were arbitrary),

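$$\frac{\partial n(t,x)}{\partial t} + \frac{\partial [v(x)\,n(t,x)]}{\partial x} = -(\alpha(x)+\beta(x))\,n(t,x) + \chi_{[x_{\min},x_{\max}/\gamma]}(x)\,2\gamma\,\alpha(\gamma x)\,n(t,\gamma x). \tag{A.13}$$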
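Integrating (A.13) over [xmin, xmax] gives a useful consistency check. Assume that no cells cross the endpoints, that α and β are constant, and that no cells sit in [xmin, γxmin). Under these assumptions the transport term integrates to zero, and the division term contributes 2αN, so dN/dt = (α − β)N. The Python sketch below is a hypothetical first-order upwind discretization of (A.13); it is not the numerical scheme used for the fits. It assumes v(x) = −cx, γ = 2, xmin = 0, and a uniform grid x_i = ih, so that γx_i = x_{2i} is itself a grid node. The sketch confirms that the discrete total grows approximately like e^{(α−β)t}.

```python
import math


def simulate_total_cells(alpha, beta, c, x_max=1000.0, m=1000, t_end=2.0):
    """Explicit upwind sketch of (A.13) with gamma = 2 and v(x) = -c*x.

    Hypothetical simplifications: constant alpha and beta, x_min = 0, and a
    uniform grid x_i = i*h so that gamma*x_i = x_{2i} lies on the grid.
    Returns the discrete totals N(0) and N(t_end).
    """
    gamma = 2
    h = x_max / m
    x = [i * h for i in range(m + 1)]
    n = [math.exp(-((xi - 600.0) / 60.0) ** 2) for xi in x]  # hypothetical initial density
    total0 = h * sum(n)
    # Velocity at interfaces x_{i+1/2}; v <= 0 because label is only lost.
    v_half = [-c * (x[i] + 0.5 * h) for i in range(m)]
    dt = 0.5 * min(h / (c * x_max), 1.0 / (2 * gamma * alpha + alpha + beta))
    steps = math.ceil(t_end / dt)
    dt = t_end / steps
    for _ in range(steps):
        # Upwind flux F_{i+1/2} = v n_{i+1}: with v < 0, cells enter cell i from the right.
        flux = [v_half[i] * n[i + 1] for i in range(m)]
        new_n = []
        for i in range(m + 1):
            right = flux[i] if i < m else 0.0     # nothing enters through x_max
            left = flux[i - 1] if i > 0 else 0.0  # nothing leaves through x = 0
            rate = -(right - left) / h - (alpha + beta) * n[i]
            if gamma * i <= m:                    # chi_[x_min, x_max/gamma](x_i)
                rate += 2 * gamma * alpha * n[gamma * i]
            new_n.append(n[i] + dt * rate)
        n = new_n
    return total0, h * sum(n)


n0, n_end = simulate_total_cells(alpha=0.5, beta=0.1, c=0.1)
print(n_end / n0, math.exp((0.5 - 0.1) * 2.0))
```

The flux form makes transport conserve the discrete total exactly. Any growth therefore comes from the loss and division terms, which is the balance stated above.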

References

  1. Banks HT, Davis JL. A comparison of approximation methods for the estimation of probability distributions on parameters. Appl Numer Math. 2007;57:753–777.
  2. Banks HT, Fitzpatrick BG. Inverse problems for distributed systems: statistical tests and ANOVA. In: Proc. International Symposium on Math. Approaches to Envir. and Ecol. Problems. Springer Lecture Notes in Biomath. Berlin: Springer; 1989. pp. 262–273. LCDS/CCS Rep. 88-16, Brown University, July 1988.
  3. Banks HT, Fitzpatrick BG. Statistical methods for model comparison in parameter estimation problems for distributed systems. J Math Biol. 1990;28:501–527. CAMS Tech. Rep. 89-4, University of Southern California, September 1989.
  4. Banks HT, Fitzpatrick BG. Estimation of growth rate distributions in size-structured population models. Quart Appl Math. 1991;49:215–235. CAMS Tech. Rep. 90-2, University of Southern California, January 1990.
  5. Banks HT, Iles DW. On compactness of admissible parameter sets: convergence and stability in inverse problems for distributed parameter systems. In: Proc. Conf. on Control Systems Governed by PDE's; February 1986; Gainesville, FL. Springer Lecture Notes in Control & Inf. Science, Vol. 97. Berlin: Springer; 1987. pp. 130–142. ICASE Report #86-38, NASA Langley Res. Ctr., Hampton, VA, 1986.
  6. Banks HT, Kunisch K. Estimation Techniques for Distributed Parameter Systems. Boston: Birkhauser; 1989.
  7. Banks HT, Lybeck N. Modeling methodology for elastomer dynamics. In: Systems and Control in the 21st Century. Boston: Birkhauser; 1996. pp. 37–50. CRSC-TR96-29, NCSU, September 1996.
  8. Banks HT, Pedersen M. Well-posedness of inverse problems for systems with time dependent parameters. Arab J Sci Eng Math. 2009;1:39–58. CRSC-TR08-10, August 2008.
  9. Banks HT, Samuels JR. Detection of cardiac occlusions using viscoelastic wave propagation. Adv Appl Math Mech. 2009;1:1–28. CRSC-TR08-23, December 2008.
  10. Banks HT, Tran HT. Mathematical and Experimental Modeling of Physical and Biological Processes. Boca Raton: CRC Press; 2009.
  11. Banks HT, Botsford LW, Kappel F, Wang C. Modeling and estimation in size structured population models. In: Proceedings 2nd Course on Mathematical Ecology; Trieste; December 8–12, 1986. Singapore: World Press; 1988. pp. 521–541. LCDC-CSS Report 87-13, Brown University.
  12. Banks HT, Smith RC, Wang Y. Smart Material Structures: Modeling, Estimation and Control. Masson Series on Research in Applied Math. Paris/New York: Masson/Wiley; 1996.
  13. Banks HT, Bortz DM, Holte SE. Incorporation of variability into the modeling of viral delays in HIV infection dynamics. Math Biosci. 2003;183:63–91. doi: 10.1016/s0025-5564(02)00218-3.
  14. Banks HT, Davidian M, Samuels JR Jr, Sutton KL. An inverse problem statistical methodology summary. In: Chowell G, Hyman M, Hengartner N, Bettencourt LMA, Castillo-Chavez C, editors. Statistical Estimation Approaches in Epidemiology. Berlin: Springer; 2009. pp. 249–302. CRSC-TR08-01, January 2008.
  15. Banks HT, Holm K, Robbins D. Standard error computations for uncertainty quantification in inverse problems: Asymptotic theory vs. bootstrapping. Arab J Sci Eng Math. 2010 (submitted). doi: 10.1016/j.mcm.2010.06.026. CRSC-TR09-13, June 2009; revised August 2009.
  16. Bell G, Anderson E. Cell growth and division I. A mathematical model with applications to cell volume distributions in mammalian suspension cultures. Biophys J. 1967;7:329–351. doi: 10.1016/S0006-3495(67)86592-5.
  17. Bellomo N, Preziosi L. Modelling and mathematical problems related to tumor evolution and its integration with the immune system. Math Comput Model. 2000;32:413–452.
  18. Bernard S, Pujo-Menjouet L, Mackey MC. Analysis of cell kinetics using a cell division marker: Mathematical modeling of experimental data. Biophys J. 2003;84:3414–3424. doi: 10.1016/S0006-3495(03)70063-0.
  19. Bird JJ, Brown DR, Mullen AC, Moskowitz NH, Mahowald MA, Sider JR, Gajewski TF, Wang C, Reiner SL. Helper T cell differentiation is controlled by the cell cycle. Immunity. 1998;9:229–237. doi: 10.1016/s1074-7613(00)80605-6.
  20. Bonhoeffer S, Mohri H, Ho D, Perelson AS. Quantification of cell turnover kinetics using 5-Bromo-2′-deoxyuridine. J Immunol. 2000;164:5049–5054. doi: 10.4049/jimmunol.164.10.5049.
  21. Carroll RJ, Ruppert D. Transformation and Weighting in Regression. London: Chapman & Hall; 2000.
  22. Chao DL, Davenport MP, Forrest S, Perelson AS. Stochastic stage-structured modeling of the adaptive immune system. In: Proceedings of the 2003 IEEE Bioinformatics Conference (CSB 2003); Albuquerque; August 11–14, 2003. pp. 124–131.
  23. Davidian M, Giltinan DM. Nonlinear Models for Repeated Measurement Data. London: Chapman & Hall; 1995.
  24. de Boer RJ, Ganusov VV, Milutinovic D, Hodgkin P, Perelson AS. Estimating lymphocyte division and death rates from CFSE data. Bull Math Biol. 2006;68:1011–1031. doi: 10.1007/s11538-006-9094-8.
  25. Fung YC. Biomechanics: Mechanical Properties of Living Tissue. Berlin: Springer; 1993.
  26. Fung YC. A First Course in Continuum Mechanics. Englewood Cliffs: Prentice Hall; 1994.
  27. Ganusov VV, Pilyugin SS, de Boer RJ, Murali-Krishna K, Ahmed R, Antia R. Quantifying cell turnover using CFSE data. J Immunol Methods. 2005;298:183–200. doi: 10.1016/j.jim.2005.01.011.
  28. Gett AV, Hodgkin PD. Cell division regulates the T cell cytokine repertoire, revealing a mechanism underlying immune class regulation. Proc Natl Acad Sci USA. 1998;95:9488–9493. doi: 10.1073/pnas.95.16.9488.
  29. Gett AV, Hodgkin PD. A cellular calculus for signal integration by T cells. Nat Immunol. 2000;1:239–244. doi: 10.1038/79782.
  30. Gyllenberg M, Webb GF. A nonlinear structured population model of tumor growth with quiescence. J Math Biol. 1990;28:671–694. doi: 10.1007/BF00160231.
  31. Hawkins ED, Hommel M, Turner ML, Battye F, Markham J, Hodgkin PD. Measuring lymphocyte proliferation, survival and differentiation using CFSE time-series data. Nat Protoc. 2007;2:2057–2067. doi: 10.1038/nprot.2007.297.
  32. Hawkins ED, Turner ML, Dowling MR, van Gend C, Hodgkin PD. A model of immune regulation as a consequence of randomized lymphocyte division and death times. Proc Natl Acad Sci USA. 2007;104(12):5032–5037. doi: 10.1073/pnas.0700026104.
  33. Hodgkin PD, Go NF, Cupp JE, Howard M. Interleukin-4 enhances anti-IgM stimulation of B cells by improving cell viability and by increasing the sensitivity of B cells to the anti-IgM signal. Cell Immunol. 1991;134:14–30. doi: 10.1016/0008-8749(91)90327-8.
  34. Komarova NL. Stochastic modeling of drug resistance in cancer. J Theor Biol. 2006;239:351–366. doi: 10.1016/j.jtbi.2005.08.003.
  35. Komarova NL, Wodarz D. Effect of cellular quiescence on the success of targeted CML therapy. PLoS ONE. 2007;2(10):e990. doi: 10.1371/journal.pone.0000990.
  36. Lee HY, Hawkins ED, Zand MS, Mosmann T, Wu H, Hodgkin PD, Perelson AS. Interpreting CFSE obtained division histories of B cells in vitro with Smith-Martin and cyton type models. Bull Math Biol. 2009;71:1649–1670. doi: 10.1007/s11538-009-9418-6.
  37. León K, Faro J, Carneiro J. A general mathematical framework to model generation structure in a population of asynchronously dividing cells. J Theor Biol. 2004;229:455–476. doi: 10.1016/j.jtbi.2004.04.011.
  38. Luzyanina T, Mrusek S, Edwards JT, Roose D, Ehl S, Bocharov G. Computational analysis of CFSE proliferation assay. J Math Biol. 2007;54:57–89. doi: 10.1007/s00285-006-0046-6.
  39. Luzyanina T, Roose D, Schenkel T, Sester M, Ehl S, Meyerhans A, Bocharov G. Numerical modelling of label-structured cell population growth using CFSE distribution data. Theor Biol Med Model. 2007;4:1–26. doi: 10.1186/1742-4682-4-26.
  40. Luzyanina T, Roose D, Bocharov G. Distributed parameter identification for a label-structured cell population dynamics model using CFSE histogram time-series data. J Math Biol. 2009;59:581–603. doi: 10.1007/s00285-008-0244-5.
  41. Lyons AB. Divided we stand: tracking cell proliferation with carboxyfluorescein diacetate succinimidyl ester. Immunol Cell Biol. 1999;77:509–515. doi: 10.1046/j.1440-1711.1999.00864.x.
  42. Lyons AB, Doherty KV. Flow cytometric analysis of cell division by dye dilution. Curr Protoc Cytom. 2004. pp. 9.11.1–9.11.10.
  43. Lyons AB, Parish CR. Determination of lymphocyte division by flow cytometry. J Immunol Methods. 1994;171:131–137. doi: 10.1016/0022-1759(94)90236-4.
  44. Marsden JE, Hughes TJR. Mathematical Foundations of Elasticity. Mineola: Dover; 1994.
  45. Matera G, Lupi M, Ubezio P. Heterogeneous cell response to topotecan in a CFSE-based proliferative test. Cytometry A. 2004;62:118–128. doi: 10.1002/cyto.a.20097.
  46. Ogden RW. Non-Linear Elastic Deformations. Mineola: Dover; 1984.
  47. Quah B, Warren H, Parish C. Monitoring lymphocyte proliferation in vitro and in vivo with the intracellular fluorescent dye carboxyfluorescein diacetate succinimidyl ester. Nat Protoc. 2007;2:2049–2056. doi: 10.1038/nprot.2007.296.
  48. Seber GAF, Wild CJ. Nonlinear Regression. Hoboken: Wiley; 2003.
  49. Shampine LF. Solving hyperbolic PDEs in MATLAB. Appl Numer Anal Comput Math. 2005;2:346–358.
  50. Sinko J, Streifer W. A new model for age-size structure of a population. Ecology. 1967;48:910–918.
  51. Smith JA, Martin L. Do cells cycle? Proc Natl Acad Sci USA. 1973;70:1263–1267. doi: 10.1073/pnas.70.4.1263.
  52. Wikipedia. Fluorescence spectroscopy. 2010. http://en.wikipedia.org/wiki/Fluorescence_spectroscopy.
