Abstract
Continuous-time Markov models have been considered the best representation for the stochastic dynamics of ion channels for more than thirty years. For most single-channel data sets, several open and closed states are required for accurately representing the dynamics. However, each data point only shows whether the channel is open or closed, but not in which state it is. Consequently, some model structures are inherently overparameterized and therefore, in principle, unsuitable for representing any data—those models are called “nonidentifiable”. As of this writing, it seems to be poorly understood which continuous-time Markov models are identifiable and which are not; therefore, the inadvertent use of a nonidentifiable model is a considerable concern. To address this problem, an improved variant of a recently published Markov-chain Monte Carlo method is presented. The algorithm is tested using test data as well as experimental data. We demonstrate that, in contrast to a widely used maximum-likelihood estimator, it gives clear warning signs when a nonidentifiable model is used for fitting. Furthermore, for test data that was generated from a nonidentifiable model, the Markov-chain Monte Carlo results recover much more information from the data than maximum-likelihood estimation.
Introduction
The kinetic properties of ion channels can be studied by patch-clamp recordings, which allow measurement of currents that are generated by ions diffusing through the pore of a single channel. Time series of these current measurements are characterized by stochastic jumps between a zero current, which indicates that the channel is closed, and one or more nonzero currents (conductance levels) that indicate release of ions, which means that the channel is open. Instantaneous stochastic transitions between open and closed states of an ion channel can be modeled by continuous-time Markov processes. Open and closed states are the vertices of a graph whose edges show between which states transitions are possible (Fig. 1). The parameters of Markov models are transition rates that describe how fast the transitions between two connected states occur. Because each of their states is either open or closed, these models are usually called “aggregated Markov models”.
Figure 1.

Examples of Markov models. (a and b) Models with two open and two closed states that are used for an example with test data. (c and d) Nonidentifiable models.
Representing single-channel data sets by suitable aggregated Markov models is a subject of ongoing research. The main difficulty is the ambiguity of open and closed states. Although only two types of events can be observed—the channel is either open or closed—several open and closed Markov states are needed for an accurate representation of the time-course data. Thus, when an open event, say, is observed, it is impossible to decide which of the open states of the model generated this open event. This information can only be inferred indirectly; overcoming this difficulty is the crucial problem that any method for fitting Markov models to ion channel data has to solve.
The ambiguity of open and closed states also has important consequences for selecting a suitable Markov model for a given data set. Models representing the statistics of open and closed times of a given data set are not unique (1). A class of Markov models with the same number of open and closed states generates the same open and closed time-distributions and cannot be distinguished by fits to a single data set. Therefore, a modeler has to keep in mind that the best fit is only a representative of a class of models whose open and closed states are connected in a different way.
The next problem, originating from information lost due to the ambiguity of open and closed states, is more severe. Some aggregated Markov models produce the same open and closed time-distribution for different sets of rate constants (2,3). This means that these models are overparameterized because some of the rate constants, in principle, cannot be determined from any data set. Models whose parameters cannot be inferred unambiguously from data are named “nonidentifiable”.
In practice, it is often difficult to decide if a given model is identifiable or not because the classification of identifiable models is still an unsolved problem (2,4). Therefore, it is desirable that a method for fitting models to data gives a good idea of the uncertainty of individual parameters. If uncertainty is high for some of the parameters, this may indicate that the model is nonidentifiable and that the parameters with high uncertainty are those that cannot be fixed.
It is expected that Markov-chain Monte Carlo (MCMC) (5,6) methods will perform especially well in this respect because they approximate full probability distributions for individual rate constants when a model is fitted to data. More traditional approaches like maximum likelihood estimation (MLE) allow quantifying uncertainty of individual parameters—the standard deviation of the maximum likelihood estimator can be calculated, and by using an asymptotic approximation by normal random variables, a confidence interval can be estimated. We will demonstrate, by comparison of our MCMC method with the widely used MLE software QuB-MIL (7,8), that the more comprehensive information that is available through probability distributions will give much clearer warning signs that some of the rate constants cannot be fixed. It will also allow us to extract much more of the essential features of a given data set even if a nonidentifiable model is used.
The article is structured as follows. In the first section, we present an improved version of our previously published MCMC method (9) for fitting ion-channel data. This approach, which to our knowledge is new, is easier to implement than the old one; a comparison with our previous method shows that it also has a higher acceptance ratio, which means that fewer iterations are required for calculating a model's probability distributions. We use simulated test data to demonstrate how the best fit can be selected based upon subsequent fits to several models. In this short example, we imitate the typical workflow of a fit to experimental data. The comparison of our MCMC method with the MLE software QuB-MIL for fits to two nonidentifiable models, which demonstrates the advantages of MCMC over MLE, is followed by an example of model selection for realistic experimental data from the type-II inositol-trisphosphate receptor (the IP3R, an ion channel that is important for the release of calcium ions from intracellular stores).
Theory and Methods
Continuous-time Markov models for ion channels
We aim to infer the rate constants of an aggregated continuous-time Markov model based upon a sequence (Ek) of events. Each event Ek represents a measurement where the ion channel has been found either open (O) or closed (C). All events are separated by a constant sampling time τ. Thus, the sequence (Ek) is interpreted as a discrete-time representation of an underlying continuous-time Markov process. It is assumed that the sequence (Ek) is found from a time series of ion-channel-current measurements either by 50% thresholding (which we found sufficient in all our applications) or more advanced methods of filtering. See Siekmann et al. (9) for further discussion.
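To make the preprocessing step concrete, the following is a minimal sketch (in Python, with hypothetical function and variable names; the actual idealization used for the data analyzed in this article may differ in detail) of reducing a current trace to a sequence of open (1) and closed (0) events by 50% thresholding:

```python
import numpy as np

def idealize_trace(current, open_level):
    """Reduce a single-channel current trace to a 0/1 (closed/open) event sequence
    by 50% thresholding (a simplified sketch, not the article's implementation)."""
    threshold = 0.5 * open_level                  # 50% of the open conductance level
    return (np.abs(current) > np.abs(threshold)).astype(int)

# Hypothetical example: a noisy trace sampled every 0.05 ms with a burst of openings
rng = np.random.default_rng(0)
trace = rng.normal(0.0, 0.1, size=1000)
trace[300:600] += 1.0                             # openings at an assumed level of 1 pA
events = idealize_trace(trace, open_level=1.0)
```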
A Markov model consists of a set of nC closed and nO open states. The possible transitions between states are represented by the edges of a graph. Rate constants qij between two adjacent states with indices i and j indicate how fast this transition occurs (Fig. 1).
The graph of a Markov model can be represented in matrix form by the infinitesimal generator Q = (qij), where qij is zero if the states i and j are not connected. We order the states so that the nC closed states C1,…,CnC are followed by the nO open states OnC+1,…,OnC+nO; then the matrix Q has the block structure
$$Q = \begin{pmatrix} Q_{CC} & Q_{CO} \\ Q_{OC} & Q_{OO} \end{pmatrix} \qquad (1)$$
where the submatrices QCC and QOO contain the rates between closed or between open states, respectively; QCO consists of the transitions from closed to open and QOC contains the transitions from open to closed states. The model is assumed to be conservative, i.e., for the diagonal elements qii we have
$$q_{ii} = -\sum_{j \neq i} q_{ij} \qquad (2)$$
The probabilities for the Markov chain being in any of the states are obtained as the solution of the differential equation
$$\frac{\mathrm{d}p(t)}{\mathrm{d}t} = p(t)\, Q \qquad (3)$$
which is given by
$$p(t) = p_0 \exp(Qt) \qquad (4)$$
where exp denotes the matrix exponential and p0 is a stochastic row vector whose components sum up to 1. In most cases, we need transition probabilities during a sampling interval τ. Therefore we define the transition matrix
$$A_\tau = \exp(Q\tau) \qquad (5)$$
which has the same block structure as Q. Given two subsequent measurements Ek and Ek+1, we usually cannot determine which transition took place because there are several open and closed states. However, we can restrict the possible transitions by using projections. The projection matrices PO and PC are diagonal matrices with ones at the positions of the open or closed states, respectively, and zeros elsewhere. If, for example, we know that we started in an open state at the beginning of a sampling interval, all transitions starting in a closed state are impossible. By multiplying from the left with the projection matrix PO, we obtain PO Aτ, i.e., all probabilities for transitions starting in closed states are set to zero. If at the end of a sampling interval we observe a closed event, we can restrict the exit states by multiplying with PC from the right, which gives PO Aτ PC.
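As an illustration (a sketch of our own, not code from the original article), the projection matrices and the restricted transition matrix can be written down directly; the rate constants below are those of the three-state test model of Fig. 1 c (Table 1), with the states ordered closed-first as above:

```python
import numpy as np
from scipy.linalg import expm

def projections(n_closed, n_open):
    """Diagonal 0/1 projection matrices for the closed and open aggregates,
    assuming states are ordered C1..C_nC, then the open states."""
    d = np.zeros(n_closed + n_open)
    d[:n_closed] = 1.0
    P_C = np.diag(d)
    P_O = np.eye(n_closed + n_open) - P_C
    return P_C, P_O

# Three-state model of Fig. 1 c (states C1, C2, O3); rates in ms^-1 from Table 1
Q = np.array([[-1.02, 0.72, 0.30],
              [ 0.80, -1.40, 0.60],
              [ 0.50,  0.90, -1.40]])
P_C, P_O = projections(n_closed=2, n_open=1)
A_tau = expm(Q * 0.05)           # transition probabilities over tau = 0.05 ms (Eq. 5)
A_OC = P_O @ A_tau @ P_C         # transitions starting open and ending closed
                                 # (this product reappears as A_tau^OC in Eq. 11)
```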
For large times t, the probability p(t) tends to the stationary distribution π = lim_{t→∞} p(t), which can be calculated from Eq. 3 by solving the system of linear equations:
$$\pi\, Q = 0, \qquad \sum_i \pi_i = 1 \qquad (6)$$
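A minimal numerical sketch of solving Eq. 6 (appending the normalization condition to the otherwise singular system; this is our own illustration, not the article's implementation):

```python
import numpy as np

def stationary_distribution(Q):
    """Solve pi Q = 0 with sum(pi) = 1 (row-vector convention) by least squares."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])      # transpose because pi is a row vector
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# For the Fig. 1 c test parameters (Table 1) this gives pi = (0.40, 0.36, 0.24),
# i.e. a stationary open probability of ~24% for the single open state O3.
```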
Under the assumption that the ion channel represented by the Markov model is at thermodynamic equilibrium, the detailed balance conditions must hold, which are given by
$$\pi_i\, q_{ij} = \pi_j\, q_{ji} \qquad (7)$$
If the graph represented by Q is acyclic, Eq. 7 is automatically fulfilled. Otherwise, the value of one rate constant within a cycle is fixed by the other rate constants. This can be seen by Kolmogorov's Criterion, which is equivalent to Eq. 7. A cycle is a closed path that starts and ends in a state with index i. Let Γ be the set of rate constants labeling the edges that connect the vertices of the cycle in one direction and Γ′ the rate constants of the cycle in the reversed direction. Then
$$\prod_{q \in \Gamma} q = \prod_{q' \in \Gamma'} q' \qquad (8)$$
To give an example, for the model shown in Fig. 1 d, the sets Γ and Γ′ are Γ = {q21,q14,q42} and Γ′ = {q24,q41,q12}. Thus, Eq. 8 leads to the constraint q21q14q42 = q24q41q12.
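Rearranging this constraint, one rate of the cycle, say q12, can be fixed in terms of the others during sampling:

$$q_{12} = \frac{q_{21}\, q_{14}\, q_{42}}{q_{24}\, q_{41}}.$$

With the test-data parameters of Table 1 for the model of Fig. 1 d, this gives q12 = (0.3 · 0.49735 · 0.8)/(4.9 · 0.42) = 0.058 ms−1, which matches the value used to generate the test data.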
Bayesian statistics of ion-channel models
Following a Bayesian statistics approach, we assign a probability P(Q|(Ek)) to a model Q under the assumption that a sequence (Ek) of open and closed events has been observed. Although P(Q|(Ek)) cannot be directly evaluated, it can be rewritten using Bayes' theorem as
$$P\bigl(Q \mid (E_k)\bigr) \propto P\bigl((E_k) \mid Q\bigr)\, P(Q) \qquad (9)$$
where ∝ signifies that both sides are equal up to a multiplicative constant. At first glance, it seems unclear how the probability P(Q) can be obtained. According to the Bayesian approach to inference, this distribution, which is called “prior distribution”, has to be chosen by the modeler. Choosing a prior P(Q) enables the modeler to represent requirements that shall hold for any model. The prior
$$P(Q) \propto \exp\bigl(\rho\, \mathrm{Tr}(Q)\bigr) \qquad (10)$$
where Tr(Q) is the trace of the matrix Q, ensures that models whose rate constants sum up to unrealistically high values are less likely. Equation 10 is obtained by assuming that all rate constants qij are independent and identically distributed according to an exponential distribution with parameter ρ. As already observed for the previous version of our algorithm (9), the results are not very sensitive to the choice of ρ. For all examples presented in this article, we have chosen ρ = 30.
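In outline (our own derivation, consistent with the assumptions just stated), Eq. 10 follows from the independent exponential priors because

$$P(Q) \;\propto\; \prod_{i \neq j} e^{-\rho\, q_{ij}} \;=\; \exp\Bigl(-\rho \sum_{i \neq j} q_{ij}\Bigr) \;=\; \exp\bigl(\rho\, \mathrm{Tr}(Q)\bigr),$$

since the conservativity condition (Eq. 2) implies that the sum of all off-diagonal rates equals −Tr(Q).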
The likelihood P((Ek)|Q) describes, for a given model Q, how probable it is to observe (Ek). Transition probabilities between consecutive events within a sampling interval τ can be calculated by restricting the transition matrix Aτ = exp(Qτ) to the class of states (open or closed) that has been observed. For this purpose, we have to generalize transitions between individual states to transitions between classes of states. As an example, we look at the transition from any of the open states to any of the closed states. The transition matrix AτOC for this situation is found by multiplying Aτ with the projection matrix PO from the left-hand side and with the projection matrix PC from the right-hand side. In this way, we obtain a new transition matrix
$$A_\tau^{OC} = P_O\, A_\tau\, P_C \qquad (11)$$
which accurately describes transitions from any of the open states to any of the closed states. Thus, if our sequence (Ek) consists of just two events, an open event followed by a closed event, its probability can be calculated by
$$P\bigl((E_1, E_2) = (O, C) \mid Q\bigr) = \lambda\, A_\tau^{OC}\, u \qquad (12)$$
where λ is the row vector of initial probabilities and u is a column vector that has the value 1 in every component. Multiplication with the vector u does nothing more than sum the probabilities of exiting to any of the closed states.
The vector of initial probabilities λ has to be normalized by only taking into account open or closed states depending on the first observation E1. Thus, the normalized initial probability is given by
$$\lambda = \frac{p_0\, P_{E_1}}{\lVert p_0\, P_{E_1} \rVert_1} \qquad (13)$$
where the norm ||·||1 is the sum of the components of a stochastic vector.
We will usually assume that the Markov chain is at equilibrium at the beginning of a sequence; therefore, we often choose λ to be the stationary distribution π, normalized as in Eq. 13. For an arbitrary sequence (Ek), the probability P((Ek)|Q) can be calculated as
$$P\bigl((E_k)_{k=1,\dots,N} \mid Q\bigr) = \lambda\, A_\tau^{E_1 E_2}\, A_\tau^{E_2 E_3} \cdots A_\tau^{E_{N-1} E_N}\, u \qquad (14)$$
where each Ek is either O or C. Equation 14 can be calculated efficiently based upon a recursively defined filter (10), as described in Siekmann et al. (9). Note that, in contrast to Siekmann et al. (9), only the forward stage of the forward-backward algorithm is needed to calculate this probability, which means that the variant of the algorithm presented here is slightly more efficient. In the Supporting Material, we show how the evaluation of Eq. 14 can be considerably optimized if a model has only one open (or, equivalently, one closed) state.
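A sketch (our own, simplified Python illustration rather than the implementation used in the article) of how Eq. 14 can be evaluated with a scaled forward recursion; events is a 0/1 (closed/open) sequence such as the one produced by the idealization sketch above:

```python
import numpy as np
from scipy.linalg import expm

def log_likelihood(Q, events, tau, n_closed):
    """Scaled forward evaluation of Eq. 14 for a 0/1 (closed/open) event sequence."""
    n = Q.shape[0]
    P_C = np.diag(np.r_[np.ones(n_closed), np.zeros(n - n_closed)])
    P_O = np.eye(n) - P_C
    P = [P_C, P_O]                                   # index 0 = closed, 1 = open
    A = expm(Q * tau)                                # Eq. 5
    # stationary distribution (Eq. 6), then Eq. 13 for the first observation
    pi = np.linalg.lstsq(np.vstack([Q.T, np.ones(n)]),
                         np.r_[np.zeros(n), 1.0], rcond=None)[0]
    lam = pi @ P[events[0]]
    lam /= lam.sum()
    loglik = 0.0
    for e in events[1:]:
        lam = lam @ A @ P[e]                         # restrict to the observed class
        s = lam.sum()                                # running factor of Eq. 14
        loglik += np.log(s)
        lam /= s                                     # rescale to avoid underflow
    return loglik
```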
Now that all quantities appearing on the right-hand side of Eq. 9 have been defined, samples for Q can be generated using a Metropolis-Hastings algorithm (11,12). Sampling a sufficient number of models Q leads to a good approximation of the probability distribution P(Q|(Ek)).
Metropolis-Hastings sampling
Metropolis-Hastings algorithms (11,12) consist of two steps:
Step 1
This step is designed for generating a proposal and accepting (or rejecting) this proposal. A proposal Q̃ for a new sample is generated from the current sample Q by randomly perturbing the set of rate constants. The simplest method is using a uniformly distributed random walk U(−δ, δ) on the interval [−δ, δ]. The value of the step-width δ has to be adjusted to a given data set. It must be ensured that steps are large enough that local modes of the likelihood can be escaped after a certain number of iterations, but small enough that a sufficient number of proposed models is accepted. In our experience, varying δ by trial and error is sufficient (for the examples here, suitable values range from 0.01 to 0.1), but we also applied an adaptive MCMC method with some success (13).
A proposal q̃ij is generated from qij as

$$\tilde{q}_{ij} = q_{ij} + \Delta q_{ij}, \qquad \Delta q_{ij} \sim U(-\delta, \delta) \qquad (15)$$
In Eq. 15, only positive entries qij of the matrix Q are changed, i.e., we avoid adding new edges to the graph of our underlying Markov model. Rate constants must be positive, so sampling may have to be repeated until all proposals are positive. Of course, after applying Eq. 15, the matrix Q̃ has to be adjusted so that the model remains conservative (using Eq. 2) and fulfils the detailed balance conditions (Eq. 7). Detailed balance is imposed by fixing one rate constant within each cycle using Kolmogorov's criterion (Eq. 8). Then the diagonal must be recalculated (again using Eq. 2) to ensure that Q̃ is conservative.
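A sketch of Step 1 (again our own illustration; for models containing cycles, the detailed-balance correction of Eq. 8 would additionally have to be applied before the diagonal is recomputed):

```python
import numpy as np

def propose(Q, delta, rng):
    """Step 1: perturb every existing (positive) off-diagonal rate by U(-delta, delta),
    redrawing until all proposed rates are positive (Eq. 15), then restore
    conservativity via Eq. 2."""
    Q_new = Q.copy()
    mask = (Q > 0)                       # existing edges only; no new edges are added
    while True:
        perturbed = Q[mask] + rng.uniform(-delta, delta, size=mask.sum())
        if np.all(perturbed > 0):
            break
    Q_new[mask] = perturbed
    np.fill_diagonal(Q_new, 0.0)
    np.fill_diagonal(Q_new, -Q_new.sum(axis=1))   # Eq. 2: rows sum to zero
    return Q_new
```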
Step 2
In this step, it is decided whether the proposal Q̃ is accepted as a sample from the probability distribution P(Q|(Ek)). The proposal Q̃ is accepted with probability
$$\alpha(Q, \tilde{Q}) = \min\!\left(1,\; \frac{P\bigl((E_k) \mid \tilde{Q}\bigr)\, P(\tilde{Q})}{P\bigl((E_k) \mid Q\bigr)\, P(Q)}\right) \qquad (16)$$
where the right-hand side of Eq. 9, evaluated for Q and Q̃, appears in the quotient. Equation 16 shows that a proposal is always accepted if its (unnormalized) posterior probability is greater than that of the current sample Q. If the proposal is rejected, a new proposal is generated.
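Step 2 can then be sketched as follows, combining the proposal of Step 1 with the acceptance rule of Eq. 16 on the logarithmic scale; log_post is assumed to return the logarithm of the right-hand side of Eq. 9, e.g. log_likelihood(Q, events, tau, n_closed) + rho * np.trace(Q) in terms of the sketches above:

```python
import numpy as np

def metropolis_hastings(log_post, Q0, delta, n_iter, rng):
    """Step 2 (Eq. 16): accept the proposal with probability
    min(1, posterior(Q_proposal) / posterior(Q)), working with log-probabilities."""
    samples, Q, lp = [], Q0.copy(), log_post(Q0)
    accepted = 0
    for _ in range(n_iter):
        Q_prop = propose(Q, delta, rng)           # Step 1 (sketched above)
        lp_prop = log_post(Q_prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # Eq. 16 on the log scale
            Q, lp = Q_prop, lp_prop
            accepted += 1
        samples.append(Q.copy())                  # rejected proposals repeat the current Q
    return samples, accepted / n_iter
```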
Results
An example for model selection using test data
In the following, we demonstrate how our algorithm can be used to select an appropriate model for a given data set. For this purpose, we simulate data from the model shown in Fig. 1 a. Parameters can be found in Table 1.
Table 1.
Parameter values for models used for generating test data sets
| Model | qij [ms−1] | qji [ms−1] | Data points | Sampling interval τ |
|---|---|---|---|---|
| Identifiable | | | | |
| Q22 (Fig. 1 a) | q12 = 0.4 | q21 = 0.5 | 10⁵ | 0.05 ms |
| | q13 = 7.0 | q31 = 3.5 | | |
| | q24 = 0.1 | q42 = 0.05 | | |
| Nonidentifiable | | | | |
| Fig. 1 c | q12 = 0.72 | q21 = 0.8 | 10⁶ | 0.05 ms |
| | q13 = 0.3 | q31 = 0.5 | | |
| | q23 = 0.6 | q32 = 0.9 | | |
| Fig. 1 d | q12 = 0.058 | q21 = 0.3 | 10⁶ | 0.05 ms |
| | q14 = 0.49735 | q41 = 0.42 | | |
| | q23 = 3 | q32 = 0.03 | | |
| | q24 = 4.9 | q42 = 0.8 | | |
For fitting, the hierarchical search strategy proposed by Bruno et al. (4) is applied. To begin, the dwell-time histograms are examined. Raw, discretely sampled data cannot be represented in a logarithmically binned histogram because the open and closed times can only be determined up to multiples of the sampling interval τ. If bin widths are linear, the resulting distortion of the histogram can be avoided by choosing bin widths that are integral multiples of τ. This is impossible if bins are equally spaced on a logarithmic scale. Therefore, we resample the dwell-time distributions that we obtain from the discretely sampled data as proposed by Gin et al. (14). The accuracy of this correction of the discrete dwell-time distributions is not critical here because our main interest is to obtain an estimate of the minimum number of open and closed states that a Markov model must have to provide a good representation of a given data set. For the test data set considered here, the open and the closed time-histograms have two peaks each, indicating that an appropriate model must have at least two open and two closed states (Fig. 2).
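For illustration, a sketch of how dwell times can be extracted from the discrete event sequence and collected into a logarithmically binned histogram (the resampling correction of Gin et al. (14) mentioned above is not reproduced here):

```python
import numpy as np

def dwell_times(events, tau):
    """Lengths of consecutive runs of open (1) and closed (0) events, in ms."""
    events = np.asarray(events)
    change = np.flatnonzero(np.diff(events)) + 1
    runs = np.diff(np.r_[0, change, len(events)]) * tau
    labels = events[np.r_[0, change]]                 # class of each run
    return runs[labels == 1], runs[labels == 0]       # open, closed dwell times

def log_histogram(dwells, n_bins=30):
    """Histogram with bin edges equally spaced on a logarithmic scale."""
    edges = np.logspace(np.log10(dwells.min()), np.log10(dwells.max()), n_bins + 1)
    counts, _ = np.histogram(dwells, bins=edges)
    return counts, edges
```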
Figure 2.

Logarithmically binned dwell-time histograms for a test data set generated from the model shown in Fig. 1a with parameters from Table 1. Both the open and the closed time histograms have two peaks; therefore, a Markov model representing these data must have at least two open and two closed states.
We start with a model with the maximum number of possible connections (with more rate constants the model would be nonidentifiable; see the following section). Because this model, shown in Fig. 1 b, contains a cycle, one rate constant is fixed by the detailed balance condition (see Eq. 7). The fit leads to very low values of the rate constants q34 and q43 (see Fig. 3, e–g). According to the “hierarchical strategy” proposed by Bruno et al. (4), this is a hint that this edge should not be present. Fits to alternative connections of two open and two closed states with eight rate constants are not shown because it is proven (also in Bruno et al. (4)) that these models are equivalent to the model chosen here.
Figure 3.

(a–d) Comparison of our previously published MCMC method (9) (a and c) with the improved version presented in this article (b and d). Both algorithms are run on a test data set of 100,000 data points separated by a sampling interval τ = 0.05 ms that was generated from Model 1a. Because it is difficult to tune the step-width δ of a simple Metropolis-Hastings step to achieve convergence with our earlier method, the adaptive walk move described in Christen and Fox (13) was used. While the runtime for 30,000 iterations is comparable (previous method, 33 min and improved method, 29 min on a standard PC), the improved method has a clearly better acceptance ratio (previous, 57.9%; improved, 69.0%). (e–g) The same test data set is fitted (using a simple Metropolis-Hastings step, δ = 0.04) with a model which has an additional link (Fig. 1 b). The additional rates q34 and q43 quickly tend toward low values, indicating that these rate constants are not supported by the data, whereas the other rates tend toward values close to the correct ones.
Adding additional open or closed states leads to behavior already observed in Siekmann et al. (9): the stationary probabilities of the additional states are so low that they play no role in the dynamics of the model (results not shown). This leads to the conclusion that a model with no connection between the open states O3 and O4 (Fig. 1 a) is the best representation of the given data set. Fig. 3, a–d, shows convergence plots for both the algorithm presented here and our previously published method. Because it is difficult to tune the step-width δ of our earlier method, we compare both methods using the walk-move of an adaptive MCMC method, the t-walk (see Christen and Fox (13)).
Whereas both methods require similar runtime (approximately half an hour on a standard PC) for 30,000 iterations, the method presented here has a higher acceptance ratio (69.0% compared to 57.9%). Fig. 4 shows histograms of rate constants obtained from a fit to this model. Because we obtain similar likelihood scores for both models (−29,947 for the simpler model and −29,949 for the model with the additional link), the model with six rates is preferable to the one with eight rates (where one rate is fixed by detailed balance, Eq. 7, which leaves seven independent parameters). Therefore, in addition to our observation that the rate constants connecting the open states O3 and O4 are small, we have further quantitative support for preferring the simpler model.
Figure 4.

Fitted rate constants for a data set of 100,000 data points separated by a sampling interval τ = 0.05 ms, generated from model Q22 (see Fig. 1a, Table 1) and fitted with the same model. The true values of the rate constants are given below the histograms for comparison. Parameters of the MCMC sampler: 30,000 iterations; burn-in, 10,000 iterations; step-width δ = 0.05.
Nonidentifiable models
The task of model selection is a difficult problem because the number of possible models quickly increases with the number of states. Two additional problems arise due to the ambiguity of open and closed states in aggregated Markov models—namely, model equivalence and the nonidentifiability of models. Both phenomena are based upon a result by Fredkin et al. (2) and Fredkin and Rice (3) stating that a suitable model must accurately reproduce the open and closed time-distributions of a given data set. It is known that given open and closed time-distributions determine a Markov model with nC closed and nO open states only up to equivalence with other models with the same number of open and closed states (1,4,15). This means that a whole class of Markov models (whose states are connected in different ways) fits a given data set equally well, i.e., all of them should achieve approximately the same likelihood score. This phenomenon is called “model equivalence” in the literature.
It has to be noted here that the original definitions of model equivalence (and of the related notion of nonidentifiability; see Kienker (1)) relate, strictly speaking, only to continuously sampled data. Two models with infinitesimal generators Q and Q′ are said to be equivalent if they generate the same open and closed time-distributions. It is easy to see, though, that the distributions of consecutive open and closed events (which play the role of the open and closed time-distributions in the discrete case) also coincide if we sample from equivalent models Q and Q′ at a discrete sampling interval τ. By assuming that our discrete data are the result of discrete sampling from an underlying continuous-time Markov process, we will, in the following, use the original definitions of model equivalence and identifiability.
Closely related to model equivalence is the nonidentifiability problem. If the parameters of a model shall be determined from data, the model must generate different probability distributions for different parameter values. Otherwise, one data set (whose points are considered as samples from the probability distributions generated by the model) would have two or more different interpretations, i.e., could be represented by more than one model. For aggregated Markov models that are used for the modeling of ion channels, it has been known for a long time that, for some models, several sets of rate constants lead to identical dwell-time distributions (2,3). Therefore, for a nonidentifiable model at least some of the rate constants cannot be determined unambiguously from any given data set—these models are inherently overparameterized. Still, classifying the set of nonidentifiable models is an open problem (see Fredkin et al. (2) and Bruno et al. (4)).
The effect of nonidentifiability is more severe in practice than model equivalence. When choosing a particular model based upon a likelihood score, the fact that this model is only determined up to equivalence does not make this choice better or worse—the model simply is a representative of its equivalence class. Considering representatives of equivalence classes using canonical forms (1,4,15) may even allow us to make model selection more efficient. Model equivalence further implies that it is questionable to interpret the graph of a Markov model as a representation of an underlying chemical process such as binding and unbinding of ligands—because fits to data do not allow us to distinguish between different mechanisms. In contrast, the inadvertent use of a nonidentifiable model for representing a data set is potentially dangerous. The inferred estimates of at least some of the parameters are untrustworthy, as different choices of model parameters would represent the observations equally well. Therefore, biological interpretations based upon these ambiguous rate constants are likely to be wrong.
Because of this, the use of MCMC, which makes available the full probability distribution of parameters, has important advantages over approaches, such as maximum-likelihood estimation (MLE), that are based upon point estimates. An MLE approach might pick one possible set of rate constants as the best fit (although, in fact, one or more different choices represent the data equally well). For the algorithm implemented in the widely used software QuB-MIL (7,8), this was reported by Bruno et al. (4), and we will provide another example here. Uncertainty of MLE estimates can be quantified by asymptotic confidence intervals or inferred by more elaborate approaches like bootstrapping (16). Nevertheless, hints of nonidentifiability and other sources of model uncertainty are revealed more clearly by MCMC approaches. Because MCMC approximates full probability distributions, it is expected that the marginal distributions for at least some of the rate constants indicate that they cannot be identified.
Two closed states, one open state
Let us first consider a model with three states connected in a cycle. This model is nonidentifiable, which follows directly from Fredkin and Rice (3). Depending on the total number of states nS = nO + nC and the number of independent transitions from the open to the closed aggregate of states, nOC, they showed that 2nOC(nS − nOC) is an upper bound for the number of rate constants that can be inferred unambiguously from stationary data. For our example, we have nS = 3 and nOC = 1 because both edges between closed and open states originate in the same open state. Thus, the maximum number of identifiable rate constants is four, whereas the model shown in Fig. 1 c has six rate constants (although only five of these can be chosen freely because one of the rates is fixed by the others due to the detailed balance condition).
For this reason, it is expected that fits to a test data set generated from the model shown in Fig. 1 c will cause problems. We use the software QuB-MIL, which implements a widely used maximum-likelihood method, to fit a test data set consisting of 10⁶ data points. Results are shown in Table 2. The module QuB-MIL, Ver. 1.5.0.29, was run ensuring that the detailed balance constraint, Eq. 7, was fulfilled. The fit fails to reproduce some important features of the data set (e.g., the open probability estimated from the data set is ∼24%, whereas the stationary open probability of the fitted model is only 13.6%). However, this is not indicated clearly by QuB-MIL; the gradient is low, which generally implies that the set of rate constants returned by the method is indeed a local likelihood maximum. The only direct hint that the results of this fit may be questionable is that some rates, q13, q31, and q23, have relatively high standard deviations. Certainly, experienced users of MLE methods will be able to interpret these hints appropriately.
Table 2.
QuB-MIL results of fits to nonidentifiable models
Fit with QuB-MIL to the model in Fig. 1 c

| qij [ms−1] | qji [ms−1] |
|---|---|
| q12 = 0.446 ± 0.08549 | q21 = 1.872 ± 0.3422 |
| q13 = 0.031 ± 0.02477 | q31 = 0.1624 ± 0.1268 |
| q23 = 10.9 ± 6.161 | q32 = 13.3 ± 0.287 |

Stationary distribution: π1 = 0.697596, π2 = 0.166197, π3 = 0.13620

Fit with QuB-MIL to the model in Fig. 1 d

| qij [ms−1] | qji [ms−1] |
|---|---|
| q12 = 2.186 ± 0.684 | q21 = 0.121 ± 0.04151 |
| q14 = 4.232 ± 0.7291 | q41 = 0.6076 ± 0.09128 |
| q23 = 0.2647 ± 0.06778 | q32 = 0.03733 ± 0.004167 |
| q24 = 0.2125 ± 0.0392 | q42 = 0.5511 ± 0.08904 |

Stationary distribution: π1 = 0.006488, π2 = 0.117209, π3 = 0.831109, π4 = 0.045193
The parameters of the test data sets can be found in Table 1.
The MCMC method presented here, in contrast, not only gives clearer indications that the results of a fit to this model may be questionable but is also able to extract much more information about the true model from the data set. The stationary probability of O3 is estimated correctly. Because there is only one open state, the stationary probability of O3 should be close to the relative frequency of open events. More interestingly, the total exit rate from the open state, i.e., q31 + q32, can be determined as well (Fig. 5 h). This implies that the distribution of open times is estimated correctly by fO3(t) = (q31 + q32) exp(−(q31 + q32)t). However, this quantity can be estimated from the data by more elementary means because 1/(q31 + q32) is the expected length of open time intervals.
Figure 5.

Results for a fit to a data set (10⁶ data points) generated from the nonidentifiable model shown in Fig. 1c. The sampler was run for 5 × 10⁵ iterations; burn-in, 10,000 iterations. (a–c) Due to the symmetry in the model, label-switching occurs; the convergence plot shows that the rates entering O3 are swapped. Because the resulting bimodal distributions have well-separated peaks (data not shown), the label-switching problem can be resolved by selecting one of the modes for each of the rate constants. Mean values of these distributions (b and c) are close to the true values of q13 and q23 (0.3 and 0.6). (d–f) Samples for the rates q12 and q21 connecting the closed states C1 and C2 move erratically, which leads to wide-spread distributions. (g–j) The rates exiting O3 (q31 and q32) cannot be fixed individually, but their sum converges to a value close to the true value (1.4 ms−1).
Finally, some information on the rates entering O3 can be gained—the convergence plot (Fig. 5 a) shows that the rate constants q13 and q23 seem to swap their values after ∼1.5 × 10⁵ iterations. This behavior, which is due to the symmetry of the model (Fig. 1 c), is known as “label switching” (see Jasra et al. (17) for a review in the context of mixture models, where this problem arises frequently). It is often difficult to resolve label-switching problems, but here the modes of the distributions of q13 and q23 are well separated; thus, by thresholding the original histograms at ∼0.45, we separate the two modes and obtain the histograms shown in Fig. 5, b and c, whose mean values are close to the true values 0.3 and 0.6.
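One way the thresholding described above could be implemented (a sketch with our own function names; the threshold 0.45 lies between the two well-separated modes):

```python
import numpy as np

def split_at_threshold(samples_q13, samples_q23, threshold=0.45):
    """Pool the MCMC samples of the two label-switching rates, split them at the
    threshold between the two well-separated modes, and summarize each mode."""
    pooled = np.concatenate([samples_q13, samples_q23])
    lower = pooled[pooled < threshold]
    upper = pooled[pooled >= threshold]
    return lower.mean(), upper.mean()     # estimates for q13 and q23, respectively
```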
In contrast, the samples for the rates q12 and q21 connecting the closed states are highly variable (Fig. 5 d), which leads to wide-spread distributions (Fig. 5, e and f). This indicates that these rates cannot be determined unambiguously from the data.
Incidentally, Ball et al. (18) experienced similar nonidentifiability when they tried to fit to the model shown in Fig. 1 c.
Three closed states, one open state
We now provide a second example of a fit to a nonidentifiable model: the model shown in Fig. 1 d. The convergence plot in Fig. 6 a shows that the rates q12 and q14 vary over a wide range. In addition, the other rate constants typically show widespread distributions but do not move as erratically.
Figure 6.

Results for a fit to a data set (2 × 10⁵ data points) generated from the nonidentifiable model shown in Fig. 1d. The sampler was run for 3 × 10⁶ iterations (burn-in, 1.5 × 10⁶) with a step-width δ = 0.15.
Again, the sum of the rates q41 and q42 exiting the open state O4 converges to a distribution centered at the correct value 1.22 ms−1, although the samples of the individual rates show high variability (Fig. 6 c). However, in contrast to the preceding example, the probability distributions for the rates entering O4 do not allow inference of the true values.
For this example, the software QuB-MIL infers rate constants that are clearly different from those of the model used for generating the test data (see the fit to the model of Fig. 1 d in Table 2 and compare to Table 1). In contrast to the previous fit, all standard deviations are relatively low, so in this case there is no sign indicating that the model may be unsuitable. The total exit rate from the open state O4 is underestimated (1.16 ms−1), although it is inferred accurately by the MCMC sampler.
An example with a realistic data set
Table 3 shows arithmetic means and standard deviations for the results of fits to data collected from type-II IP3R at 10 μM IP3, 5 mM ATP, and 50 nM Ca2+ (19). At this combination of ligands, the data is characterized by mode changes. Although the IP3R is nearly inactive for long periods of time, it can instantaneously switch to a high level of activity that is characterized by frequent openings.
Table 3.
Results for fits for a data set (290,000 data points, sampling interval τ = 0.05 ms) collected from type-II IP3R at 10 μM IP3, 5 mM ATP, 50 nM Ca2+ to different models
| Model | qij [ms−1] | qji [ms−1] | Likelihood |
|---|---|---|---|
| C1–O2 (Q11) | q12 = 2.684893 ± 0.070417 | q21 = 2.721870 ± 0.072489 | −141,814 |
| Q21 | q12 = 0.039019 ± 0.003728 | q21 = 0.086724 ± 0.008305 | −125,330 |
| | q13 = 9.787732 ± 0.101025 | q31 = 3.250568 ± 0.035998 | |
| Q31 | q12 = 1.125150 ± 0.036311 | q21 = 0.094523 ± 0.008178 | −125,072 |
| | q23 = 0.004706 ± 0.001427 | q32 = 0.011629 ± 0.003210 | |
| | q24 = 10.065905 ± 0.139341 | q42 = 3.272794 ± 0.047411 | |
| Q41 (C1–C2–C3–C4–O5) | q12 = 0.019692 ± 0.007848 | q21 = 0.329510 ± 0.169228 | −125,082 |
| | q23 = 0.493762 ± 0.133622 | q32 = 0.137882 ± 0.052847 | |
| | q34 = 1.322551 ± 0.100999 | q43 = 0.110575 ± 0.009022 | |
| | q45 = 10.110514 ± 0.119087 | q54 = 3.272045 ± 0.034769 | |
Mean values and standard deviations are shown for runs of 60,000 iterations (burn-in: 30,000 iterations). Convergence plots for the model Q31 with the best likelihood score are shown in Fig. 7.
We have selected a data segment that exhibits a high level of activity, which we aim to represent in an appropriate model. Models are denoted Qij, where i stands for the number of closed and j for the number of open states. A pictogram of the graph of a particular model is shown together with the results for the rate constants in Table 3. As an example, convergence plots for the MCMC algorithm are shown for model Q31 (Fig. 7). The likelihoods suggest that model Q31 with three closed and one open state fits the data best—its likelihood score is higher than those of the models Q11 and Q21 with fewer closed states and of the model Q41 with one additional closed state. Models with more than one open state produced fits where all except one open state had very low stationary probabilities (results not shown), and therefore these models were excluded. In addition, only one example for each of the models with one open state is shown because all topologies with one open state are equivalent (4). This leads to the conclusion that model Q31 is the best representation for the active mode of type-II IP3R. Details on a model that represents the switching between the active and the nearly inactive mode can be found in Siekmann et al. (20).
Figure 7.

Convergence plot for the model Q31 used for fitting data from type-II IP3R. Acceptance ratio for these data consisting of 290,000 data points was 32.5% with a step-width δ = 0.01.
In principle, MCMC approaches can even be used for partially automating model selection using reversible-jump MCMC, as shown by Hodgson and Green (21). However, designing a reversible-jump sampler is not an easy task and is beyond the scope of this article.
Discussion
Identifiability is a severe problem for the modeling of ion channels. A large number of the models obtained by connecting open and closed states are overparameterized and thus cannot be used for fitting. Because the classification of identifiable models is still an unsolved problem, this leads to considerable difficulties in practical applications. By the negative criterion due to Fredkin and Rice (3), which gives an upper bound for the number of edges of identifiable models, it is possible to restrict the number of candidate models, but effective methods for proving that a given model is identifiable are missing. Because identifiability of models, as well as the related problem of their equivalence classes, are both poorly understood, several canonical forms have been suggested by Kienker (1), Bruno et al. (4), Larget (15), and Flomenbom and Silbey (22). This approach inevitably restricts the model structure to certain topologies. A more severe problem is that, in some cases, models may only represent certain data sets if negative rate constants are allowed, which leads to physically unrealistic models, as described in Bruno et al. (4). This limits the practical use of this approach.
In several examples, we demonstrated that Markov-chain Monte Carlo (MCMC) approaches give strong indications when a nonidentifiable model is used. The fits clearly showed that some rate constants could not be identified, i.e., their samples varied over a wide range. Even if a modeler does not have a deep theoretical background in advanced statistical concepts such as identifiability of models, these warning signs indicate that choosing this particular model is not a good idea.
In contrast, an MLE-based method might pick one particular set of parameters as the most likely choice of several possible alternatives without giving the modeler similarly obvious hints that this result is, in fact, questionable. Bruno et al. (4) seem to worry that the decision about a nonidentifiable versus an alternative identifiable model may come down to comparing the likelihood scores—they conjecture that the likelihood of a nonidentifiable model is always lower than the likelihood of an identifiable model with the same number of open and closed states. Even if this conjecture is true, it is unsatisfying, because the likelihood is an overall indicator of the quality of a fit. For example, the quality of a data set may play as much of a role for the likelihood score as a model being identifiable or nonidentifiable.
Our MCMC method not only demonstrates clearly that nonidentifiable models are overparameterized, but also extracts a lot of information from data sets generated by nonidentifiable models. In the case of the nonidentifiable model with three states (Fig. 1 c), the probability distributions of the rates entering the open state could be inferred approximately after solving a label-switching problem.
Because MCMC gives more detailed results than MLE, MCMC methods usually require longer runtimes. In general, our method is computationally more expensive, although runtimes on standard PCs are still in a range feasible for practical applications. However, we have demonstrated that by taking advantage of structural properties of models (for example, that a model has only one open state), considerable runtime improvements can be achieved (see the Supporting Material). It is expected that further research will lead to similar improvements for other important classes of models.
Acknowledgments
We thank Kate Patterson and Colin Fox for helpful comments and careful proofreading of this article. The careful reviews of two anonymous reviewers are gratefully acknowledged.
This work was supported by National Institutes of Health grant No. R01-DE19245.
Supporting Material
References
- 1. Kienker P. Equivalence of aggregated Markov models of ion-channel gating. Proc. R. Soc. Lond. B Biol. Sci. 1989;236:269–309. doi: 10.1098/rspb.1989.0024.
- 2. Fredkin D.R., Montal M., Rice J.A. Identification of aggregated Markovian models: application to the nicotinic acetylcholine receptor. In: Le Cam L.M., Olshen R.A., editors. Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, Vol. 1. Wadsworth; Belmont, CA: 1985. pp. 269–289.
- 3. Fredkin D.R., Rice J.A. On aggregated Markov processes. J. Appl. Probab. 1986;23:208–214.
- 4. Bruno W.J., Yang J., Pearson J.E. Using independent open-to-closed transitions to simplify aggregated Markov models of ion channel gating kinetics. Proc. Natl. Acad. Sci. USA. 2005;102:6326–6331. doi: 10.1073/pnas.0409110102.
- 5. Gilks W.R., Richardson S., Spiegelhalter D.J., editors. Markov Chain Monte Carlo in Practice. Chapman & Hall; New York: 1996.
- 6. Gamerman D., Lopes H.F. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference. 2nd ed. Texts in Statistical Science, Vol. 68. Taylor & Francis; Boca Raton, FL: 2006.
- 7. Qin F., Auerbach A., Sachs F. Estimating single-channel kinetic parameters from idealized patch-clamp data containing missed events. Biophys. J. 1996;70:264–280. doi: 10.1016/S0006-3495(96)79568-1.
- 8. Qin F., Auerbach A., Sachs F. Maximum likelihood estimation of aggregated Markov processes. Proc. Biol. Sci. 1997;264:375–383. doi: 10.1098/rspb.1997.0054.
- 9. Siekmann I., Wagner L.E., II, Sneyd J. MCMC estimation of Markov models for ion channels. Biophys. J. 2011;100:1919–1929. doi: 10.1016/j.bpj.2011.02.059.
- 10. Carter C.K., Kohn R. On Gibbs sampling for state space models. Biometrika. 1994;81:541–553.
- 11. Metropolis N., Rosenbluth A.W., Teller E. Equation of state calculations by fast computing machines. J. Chem. Phys. 1953;21:1087–1092.
- 12. Hastings W.K. Monte Carlo sampling methods using Markov chains and their applications. Biometrika. 1970;57:97–109.
- 13. Christen J.A., Fox C. A general purpose sampling algorithm for continuous distributions (the t-walk). Bayesian Anal. 2010;5:263–282.
- 14. Gin E., Falcke M., Sneyd J. Markov chain Monte Carlo fitting of single-channel data from inositol trisphosphate receptors. J. Theor. Biol. 2009;257:460–474. doi: 10.1016/j.jtbi.2008.12.020.
- 15. Larget B. A canonical representation for aggregated Markov processes. J. Appl. Probab. 1998;35:313–324.
- 16. Efron B., Tibshirani R. An Introduction to the Bootstrap. Chapman & Hall; New York: 1993.
- 17. Jasra A., Holmes C.C., Stephens D.A. Markov chain Monte Carlo methods and the label switching problem in Bayesian mixture modeling. Stat. Sci. 2005;20:50–67.
- 18. Ball F.G., Cai Y., O'Hagan A. Bayesian inference for ion-channel gating mechanisms directly from single-channel recordings, using Markov chain Monte Carlo. Proc. R. Soc. Lond. A. 1999;455:2879–2932.
- 19. Wagner L.E., II, Yule D.I. Differential regulation of the InsP3 receptor type-1 and -2 single channel properties by InsP3, Ca2+ and ATP. J. Physiol. 2012;590:3245–3259. doi: 10.1113/jphysiol.2012.228320.
- 20. Siekmann I., Wagner L.E., II, Sneyd J. A kinetic model of type I and type II IP3R accounting for mode changes. Biophys. J. 2012;103:658–668. doi: 10.1016/j.bpj.2012.07.016.
- 21. Hodgson M., Green P. Bayesian choice among Markov models of ion channels using Markov chain Monte Carlo. Proc. R. Soc. Lond. A. 1999;455:3425–3448.
- 22. Flomenbom O., Silbey R.J. Utilizing the information content in two-state trajectories. Proc. Natl. Acad. Sci. USA. 2006;103:10907–10910. doi: 10.1073/pnas.0604546103.