Abstract
Hidden Markov models (HMMs) for biomolecules suffer from various forms of parameter non-identifiability. This poses severe challenges to both maximum likelihood and Bayesian inference. However, Bayesian inference offers effective means of overcoming these pathologies. We study the role of prior distributions in the face of practical parameter non-identifiability in Bayesian inference applied to prototypical patch clamp data of ligand-gated ion channels. We advocate the use of minimally informative priors, as they increase the accuracy and decrease the uncertainty of the inference. For complex HMMs, stronger prior assumptions are needed to render the posterior sufficiently proper. This can be achieved by confining the parameter space to physically motivated limits. Another beneficial assumption is finite cooperativity of ligand-binding and unbinding events, which introduces a bias towards non-cooperativity but still allows for a non-vanishing degree of cooperativity that is inferred from the data. Despite its vagueness, our prior renders the posterior sufficiently proper for all datasets that we considered without imposing the assumption of non-cooperativity. Combining all prior factors allows for meaningful inferences with a dataset of a thousand times lower quality.
I. INTRODUCTION
Time series data of various processes can be explained by continuous-time Markov Models (MM) [1]. In a biophysical context, the data could probe, e.g., the function of different proteins [2–9] or RNA folding kinetics [10, 11]. Assuming a well-mixed environment, these systems can be modeled by discrete states that interconvert via stochastic transitions, thereby defining a chemical reaction network (CRN). For example, states of a ligand-gated ion channel can be classified by conductance, dwell time, or the number of bound ligands [11–14].
Biophysical experiments typically only observe partial and noisy data such that states with similar signal properties, e.g., conductance, are aggregated into signal classes. Therefore, hidden Markov models (HMMs) must be used to describe the data [15, 16]. A common use case is the analysis of single-molecule ion channel data [17–23] but HMMs are also applied to other experiments [7–11, 24–29].
Often, the mean signal observed in single-molecule and ensemble data is a linear projection of the full Markovian dynamics onto a lower dimensional observable. This can cause the data to be insensitive to the rates of specific subprocesses within the CRN, which complicates their biophysical interpretation. We argue that the dimensionality reduction and aggregation of states, in general, induces a varying degree of practical parameter non-identifiability even for simple CRNs. Experimental noise and limited signal bandwidth only increase the severity of non-identifiability issues. Even worse, the HMM might become structurally non-identifiable [30–34].
Structural non-identifiability refers to models whose parameters cannot be inferred uniquely, even with an infinite amount of data [34]. For example, it might only be possible to infer algebraic combinations of parameters but not the parameters themselves. In contrast, practical non-identifiability is encountered when there is still a unique optimal parameter set, but it is impossible to collect enough data to reach a sufficiently low parameter uncertainty [33, 35, 36].
Whether a model is structurally or practically non-identifiable depends on the likelihood function, which can be derived from the chemical master equation (CME) in the case of MMs [37]. Here, we study a Bayesian filter [38] based on the Fokker-Planck approximation (FPa) of the CME [39], which preserves the crucial Markov property in a macroscopic signal [38]. The Bayesian filter extends the ideas of Moffat [40] to define a more general and realistic likelihood for ensemble patch clamp (PC) data. A complete HMM inference consists of both parameter estimation and model selection. In some cases, model selection for HMMs can be automated by inferring an infinite HMM [41–43]. However, to the best of our knowledge, the infinite HMM [41–43] only applies to single-molecule data analyzed by discrete-time HMMs but cannot be extended to ensemble data or continuous-time HMMs. Therefore, we assume a fixed CRN topology during parameter inference. Notably, the hidden variable of an HMM does not have to be discrete but can also be continuous [44, 45]. For example, Kalman filters [46, 47] can be used to approximate discrete HMMs [38, 40, 48–51] and define a valid HMM [44, 45].
Parameter estimation via maximum likelihood (ML) [52, 53], profile likelihood [33, 35], maximum a posteriori (MAP), and Bayesian inference [54] suffer in different ways from practical non-identifiability. Limitations in the amount and quality of data (relative to the complexity of the investigated CRN) severely impair or even prohibit ML and MAP inferences [55]. The profile likelihood technique has better uncertainty quantification than ML [33, 35], but still assumes an asymptotic amount of data. A full Bayesian inference [20, 21, 23, 41, 56–59] does not refer to an asymptotic limit in the amount of data and thus has a unique way to deal with parameter non-identifiability. In Bayesian statistics, unknown quantities are treated similarly to random variables [60, 61] in that probabilities express their uncertainty. The prior distribution encodes their uncertainty before analyzing the data. The result of a Bayesian analysis, the posterior distribution, represents the uncertainty of the unknowns in the light of the combined information encoded in the prior and the likelihood. Nevertheless, structural and practical non-identifiability also pose a challenge to Bayesian inference because these pathologies can result in improper posteriors [35, 62].
Our work focuses on the benefits and limitations of minimally informative and vaguely informative priors motivated by physical considerations in the presence of practical non-identifiability. We show that practical non-identifiability can be severely harmful when using uniform priors on the rate matrix of HMMs. In contrast, we suggest using a minimally informative prior [63–66] inspired by Jeffreys [63] and Jaynes [67]. Minimally informative priors are designed to make posteriors as sensitive to the data as possible. We observe that the minimally informative prior increases the accuracy and surprisingly decreases the uncertainty of parameter inferences. Notably, the minimally informative prior will significantly impact the posterior for any plausible amount of data. We explain the origin of these observations by the presence of practical non-identifiability.
Further, we demonstrate that the uniform and minimally informative priors lead to improper posteriors that cannot be normalized due to the practical non-identifiability (in an infinite parameter space [35]). Thus, the minimally informative prior only alleviates the challenges arising from practical non-identifiability of HMMs by making the posterior sufficiently proper but does not fully resolve them. A definition of what a sufficiently proper posterior means will be given below. We show that rendering the posterior sufficiently proper is the best that can be achieved in HMM inference when using minimally rather than vaguely informative priors. Notably, the same holds for ML inference. Furthermore, the minimally informative prior drastically improves the convergence [68] of the Hamiltonian Monte Carlo sampler [69–73]. Our results show that the limitations of the posterior observed in [35] are due to the use of a uniform prior.
Moving on to more complex HMMs, we show that eliminating the bias from a uniform prior does not solve non-identifiability issues. More information is needed, even if the HMM is structurally identifiable [62], for sufficiently proper posteriors. We present two techniques to achieve this. The first option is to enforce theoretically derived upper bounds, such as diffusion limits for binding rates. These restrict the regions in parameter space that contribute to a diverging normalization integral. This often renders the posterior sufficiently proper. However, hard theoretical limits are rarely available and might not apply to all parameters that suffer from non-identifiability. As an alternative or as an additional restraint, we suggest coupling each pair of binding rates and unbinding rates softly. These coupling terms bias the CRN towards non-cooperativity without enforcing it. Introducing these vaguely informative priors is more flexible than the common approach of setting parameters equal, which assumes strict non-cooperativity. The vaguely informative prior only defines the scale of plausible positive or negative logarithmic ratios, i.e., cooperativity in homomeric proteins. Hence, our approach infers how likely different degrees of cooperativity are compared to a non-cooperativity bias, which functions as Occam's razor. The most precise and accurate inferences are obtained if the minimally informative prior is combined with both additional prior assumptions. This combination of prior information allows for meaningful inferences with a thousand times poorer data quality for the most complex HMM that we studied. Notably, this reduction in the data quality that is necessary for meaningful HMM inference is crucial for CRNs of this complexity in the analysis of real-world PC data sets.
II. PARAMETER NON-IDENTIFIABILITY IN SIMPLE REACTION NETWORKS
Given time series data of length , and a probabilistic model in the form of a likelihood , the ML approach [74] infers the unknown parameters by maximizing over the parameter space . For models with structurally identifiable parameters, converges in distribution to . The quantification of the uncertainty of the ML estimate for models that satisfy certain regularity conditions is discussed in Sec. II B. Unfortunately, HMMs do not satisfy these regularity conditions [75]. They are singular instead of regular statistical models. We indicate the possible consequences of singular models by the rate equation (RE) solutions of two toy kinetic models.
A. Structural parameter non-identifiability
In general, structurally non-identifiable models are characterized by submanifolds in in which the likelihood is constant, even with infinitely many data [30, 62]. For the sake of argument, we only look at the RE solution from which an approximate likelihood can be derived. An example of a structurally non-identifiable model is a linear birth-death process characterized only by the mean number of bacteria in a well-stirred petri dish:
| (1) |
Parameter pairs with constant difference result in the same . This model is structurally non-identifiable, because one cannot disentangle and based on alone. A likelihood derived only from would show the symmetry . This implies that the likelihood is flat, , along straight lines in with intercept . Hence, the ML estimator cannot converge to a normal distribution centered at . However, there is independent information in the higher-order statistical moments that renders the linear birth-death model structurally identifiable. By incorporating the information contained in , which can be derived from the CME [76], one obtains a structurally identifiable model. Thus, structural non-identifiability can be caused by ignoring higher statistical moments of the data-generating process. Similarly, ignoring the Markov property of equilibrium fluctuations leads to structural non-identifiability in HMM inference as shown in [40].
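The argument can be checked numerically with a minimal sketch. It assumes the standard closed-form mean and variance of a linear birth-death process started from a fixed number of individuals; the function names, the rate values, and the symbols `birth` and `death` are illustrative choices, not the notation of the original equations.

```python
import math


def mean_count(n0, birth, death, t):
    """Mean of a linear birth-death process started with n0 individuals."""
    return n0 * math.exp((birth - death) * t)


def variance_count(n0, birth, death, t):
    """Variance of the same process (closed form, valid for birth != death)."""
    growth = math.exp((birth - death) * t)
    return n0 * (birth + death) / (birth - death) * growth * (growth - 1.0)


# Two hypothetical rate pairs with the same difference birth - death = 0.5.
pair_a = (1.0, 0.5)
pair_b = (3.0, 2.5)
n0, t = 10, 2.0

mean_a = mean_count(n0, *pair_a, t)
mean_b = mean_count(n0, *pair_b, t)
var_a = variance_count(n0, *pair_a, t)
var_b = variance_count(n0, *pair_b, t)

print(f"means:     {mean_a:.4f} vs {mean_b:.4f}")   # identical
print(f"variances: {var_a:.4f} vs {var_b:.4f}")     # different
```

Both pairs give the same mean, so a likelihood built on the mean alone cannot separate them, whereas the variance depends on the sum of the rates and breaks the degeneracy.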
B. Practical parameter non-identifiability
Structural identifiability is necessary, but not sufficient for successful ML inferences. For a finite amount of data, the HMM must also be practically identifiable, or, as we prefer to argue, it must be sufficiently practically identifiable. Let us clarify what we mean by this. The literature offers different definitions of practical identifiability vs. practical non-identifiability. Here, we follow the definitions of [33]. Likelihoods suffering from practical non-identifiability do not decay to zero (Fig. 1, blue curve for ), but stretch out infinitely in regions of (in one or multiple dimensions) for any finite amount of data [33]. This happens already in simple, partially observed CRNs.
FIG. 1. The severity of practical parameter non-identifiability depends on the relative height of the peak compared to the non-vanishing tails.
We sketch a one-dimensional inference problem, e.g., an unknown chemical rate . The red dashed line shows the ML inference or Laplace approximation of the posterior based on a uniform prior. Note that the ML inference (using the curvature) typically underestimates the uncertainty (quicker decay of the red dashed line than the blue solid line), particularly for . Note that a prediction of the uncertainty based on the curvature at cannot detect these shortcomings, because any smooth function can be approximated by a second-order Taylor expansion around its extrema.
Let us assume that we can record the occupation number of state at any frequency and without any measurement noise such that, for the sake of argument, the additional complications arising from noisy data are avoided. As an example of a practically non-identifiable likelihood, consider a CRN with a rate that depends on ligand concentration (or any other stimulus-dependent rate):
| (2) |
for a finite number of channels. Assuming that the initial condition is and that the experimental readout is , the part of the general solution of the RE that is experimentally accessible is
| (3) |
Note that the solution of the CME for the initial condition is a multinomial distribution. To understand this, consider the case that only one molecule is in state at . For all , the molecule has probability to be in , to be in and to remain in . If, instead, one has independent and identical molecules in state at , then each of them individually has the same probability , and at . Hence, the distribution over all states is a multinomial distribution [77] that evolves over time. If only state is observed, one can reduce the problem to
| (4) |
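The reduction from the distribution over all states to the distribution of the observed state alone can be verified numerically. The sketch below relies only on the standard fact that the single-state marginal of a multinomial distribution is a binomial distribution; the state probabilities and the number of channels are hypothetical values chosen for illustration.

```python
import math


def multinomial_pmf(counts, probs):
    """Probability of observing `counts` molecules in each state."""
    coeff = math.factorial(sum(counts))
    for c in counts:
        coeff //= math.factorial(c)
    value = float(coeff)
    for c, p in zip(counts, probs):
        value *= p ** c
    return value


def binomial_pmf(k, n, p):
    """Probability of k out of n molecules being in the observed state."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def marginal_of_first_state(k, n, probs):
    """Sum the three-state multinomial over the two unobserved states."""
    total = 0.0
    for second in range(n - k + 1):
        third = n - k - second
        total += multinomial_pmf((k, second, third), probs)
    return total


probs = (0.2, 0.5, 0.3)  # hypothetical state probabilities at one time point
n_channels = 20          # hypothetical number of channels

marginal = [marginal_of_first_state(k, n_channels, probs)
            for k in range(n_channels + 1)]
binomial = [binomial_pmf(k, n_channels, probs[0])
            for k in range(n_channels + 1)]
max_difference = max(abs(a - b) for a, b in zip(marginal, binomial))
print(f"largest deviation between marginal and binomial: {max_difference:.2e}")
```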
If none of the rates could be changed externally by varying such that , then we would also face structural non-identifiability. For a ligand-dependent rate , we can run the experiments at different ligand concentrations and thereby overcome structural non-identifiability. However, if we measure at two concentrations , such that and and are all different, but similar in magnitude, then we can still face a practical non-identifiability problem.
Practical parameter non-identifiability originates from the following phenomenon: Any combination of values for and that satisfies
| (5) |
will push the amplitude of one of the two exponential decays (Eq. 3) to zero. Even if experiments were run at many different ligand concentrations , we could still find combinations of and that satisfy one of the conditions in Eq. 5 for all . In regions of where one of the conditions (Eq. 5) holds for all , minor changes in or will hardly affect . Note that the correct solution of the CME is a multinomial distribution given the initial conditions. Adding information about the entire distribution of such as variance or skewness, etc. will not resolve the practical non-identifiability problem caused by the vanishing amplitude of one of the exponential decays. However, the multinomial model would improve the accuracy and quality of the uncertainty quantification.
This example is reminiscent of the common scenario for coupled CRNs in which only can be inferred if . However, here we do not consider a scenario in which the signal of the data-generating process is rate-limited. Instead, we assume that at least three different rates are at play, , , and that have similar magnitude but non-identical values. Therefore, rate-limiting contributions do not exist at least for one combination of and in the data-generating process. Nevertheless there are regions in parameter space , far away from the true parameter values , in which one or the other rate is rate-limiting for the predictions of the model. The structure of the CRN together with the fact that the signal is generated by a linear projection have the potential to be rate-limited somewhere in , independent of the true parameters of the data-generating process . Thus, for models such as , the likelihood will approach a non-vanishing constant due to rate-limiting effects in regions where and hence become practically non-identifiable. See App. A for a discussion on the effects if states , are observed in isolation or simultaneously.
Structural identifiability is a binary property (a model is either structurally identifiable or not), whereas practical identifiability is gradual (continuous) [78]. The likelihood function (Fig. 1, blue curve), which is proportional to the posterior (for a uniform prior), indicates the continuous nature of practical non-identifiability. The maximum value of relative to the constant value in the tails specifies the degree of parameter identifiability. However, to classify models into practically identifiable or practically non-identifiable, one uses the confidence interval CI based on confidence level :
| (6) |
One defines the model as practically identifiable if the confidence interval CI is a compact set. This holds if the subjectively defined threshold (Fig. 1) [76] is larger than the asymptotic value of the likelihood (Fig. 1, black dashed line). For multi-parametric models, there will be multiple asymptotic values for the different directions in . The interval where the dashed black lines cross the likelihood profile (blue lines) (Fig. 1) is the largest asymmetric confidence interval that can be deduced by using the profile likelihood technique [33]. The data contain no information to distinguish values of larger than . However, the data are informative for values and even . Often, the profile of might approach a non-vanishing constant value only asymptotically. Subjectively defining a threshold relative to the maximum value of the likelihood [33] is equivalent to choosing a significance level []. Note that ML does not infer the shape of globally (Fig. 1, red dashed curve). ML estimates the shape based on the curvature of the likelihood at using the Fréchet-Darmois-Cramér-Rao bound theorem. Therefore, standard ML does not detect practical non-identifiability (Fig. 1, red dashed curve). The degree to which practical non-identifiability affects the parameter inference depends, as discussed, on the intrinsic properties of the MM (i.e., the number of states and their connectivity) and on the specifics of the experimental data such as the rank of the linear projection, the signal-to-noise ratio and the signal bandwidth.
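The role of the threshold can be illustrated with a small sketch. It uses a hypothetical one-dimensional likelihood profile (a Gaussian peak on a logarithmic axis that levels off to a constant tail for large parameter values) and collects all grid points above a threshold given as a fraction of the maximum. The profile shape, the grid, and all names are assumptions for illustration only.

```python
import math


def toy_likelihood(theta, peak=1.0, tail=0.05, center=1.0, width=0.3):
    """Hypothetical profile: peak at `center`, constant `tail` for large theta."""
    x = math.log10(theta) - math.log10(center)
    bump = math.exp(-0.5 * (x / width) ** 2)
    if theta >= center:
        return tail + (peak - tail) * bump  # levels off at `tail`
    return peak * bump                      # decays to zero


def threshold_interval(grid, threshold_fraction):
    """Return (lower, upper, open_above) for the likelihood-based interval."""
    values = [toy_likelihood(theta) for theta in grid]
    cut = threshold_fraction * max(values)
    inside = [theta for theta, v in zip(grid, values) if v >= cut]
    lower, upper = inside[0], inside[-1]
    # If the interval reaches the last grid point, it is not bounded by the data.
    return lower, upper, upper == grid[-1]


# Logarithmic grid from 1e-2 to 1e4 (hypothetical sampling box).
grid = [10 ** (-2 + 6 * i / 600) for i in range(601)]

compact = threshold_interval(grid, 0.15)    # threshold above the tail value
unbounded = threshold_interval(grid, 0.02)  # threshold below the tail value
print("threshold 0.15:", compact)
print("threshold 0.02:", unbounded)
```

With the threshold above the tail value the interval closes on both sides; with the threshold below it, the interval runs to the edge of the grid, which is the signature of practical non-identifiability in this toy setting.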
Due to the challenges indicated by the two toy examples, HMM inference based on ML has several drawbacks compared to sampling from the posterior . First, one must take extra precautions against pathologies of resulting in structural or practical non-identifiability, because ML does not reveal them [33]. Second, even if the model is structurally identifiable and sufficiently practically identifiable, the quality and quantity of the data are often insufficient to meet the implicit assumption that can be approximated by a normal distribution, which is a requirement to justify the use of ML. For a comment on strategies to detect parameter non-identifiability see App. 1. Fortunately, Bayesian statistics can deal with these pathologies of the likelihood. Nevertheless, structural and practical non-identifiability pose a challenge. They create regions in where the prior entirely dominates the likelihood.
III. BAYESIAN INFERENCE IN A NUTSHELL
The Bayesian posterior
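$$ p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{\int_{\Theta} p(y \mid \theta)\, p(\theta)\, \mathrm{d}\theta} \tag{7} $$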
is a probability distribution on and combines the information encoded in the prior and the likelihood to quantify the uncertainty of . The posterior is called “proper”, if it can be normalized, which means the denominator in (Eq. 7) satisfies
$$ \int_{\Theta} p(y \mid \theta)\, p(\theta)\, \mathrm{d}\theta < \infty \tag{8} $$
We introduce the terminology sufficiently proper to clarify the notion of practical parameter non-identifiability. Practical non-identifiability in combination with minimally informative priors, which are often improper, may result in posteriors that are improper in a strict sense. The blue curve in Fig. 1 illustrates this for a one-dimensional case and an improper uniform prior. The essential information making the posterior proper (Fig. 1) is contained in the inconspicuous cutoff values of the uniform prior.
The higher the peak of the posterior, the less sensitive the inference is to the actual values of the cutoff. Thus, we define the ratio of the height of the posterior (based on a uniform prior) at the MAP estimate and its non-vanishing asymptotic value as
| (9) |
One could subjectively define the posterior to be sufficiently proper, if , say, meaning that the peak of the posterior/likelihood towers at least by three orders of magnitude over the flat parts of the likelihood (in all unbounded directions in ). For , the posterior is sufficiently proper and might even be strictly proper. For non-uniform priors and one-dimensional inferences, flat parts of the likelihood reveal themselves by a posterior that is proportional to the prior in that area.
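The peak-to-tail criterion can be sketched for a gridded one-dimensional posterior. The posterior shape below (a Gaussian bump on a constant floor) is hypothetical, and the threshold of three orders of magnitude is the subjective choice mentioned in the text, not a universal constant.

```python
import math


def gridded_posterior(grid, peak_height, tail_height, center=0.0, width=0.5):
    """Hypothetical unnormalized posterior under a uniform prior."""
    return [tail_height + (peak_height - tail_height)
            * math.exp(-0.5 * ((x - center) / width) ** 2)
            for x in grid]


def peak_to_tail_ratio(values):
    """Height at the mode divided by the value at the far end of the grid."""
    return max(values) / values[-1]


def is_sufficiently_proper(values, threshold=1e3):
    """Subjective criterion: peak towers over the tail by `threshold`."""
    return peak_to_tail_ratio(values) >= threshold


grid = [0.01 * i for i in range(2001)]  # parameter axis from 0 to 20
strong = gridded_posterior(grid, peak_height=1.0, tail_height=1e-4)
weak = gridded_posterior(grid, peak_height=1.0, tail_height=1e-1)

print(peak_to_tail_ratio(strong), is_sufficiently_proper(strong))
print(peak_to_tail_ratio(weak), is_sufficiently_proper(weak))
```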
In multi-parameter inference problems with non-uniform priors the situation is in general more complicated. The likelihood could approach a non-vanishing constant value on some unbounded subset of any shape. However, we will encounter the simpler case, that marginal posteriors are proportional to the prior in certain areas. This can be explained by assuming that the likelihood is flat for an unbounded set in the direction of parameter . Let us denote the parameters without by and also assume that the prior factorizes: . Then
| (10) |
holds for the marginal posterior locally. So changes in the posterior are proportional to changes in the prior along in that region of the parameter space. If is more complicated (any differentiable curve or hyperplane), then one needs to check if changes in the posterior are proportional to changes in the prior, when moving within . From a practical perspective, we call a posterior sufficiently proper, if (for a posterior based on a uniform prior) is small enough such that sampling from the posterior of interest (based on the minimally informative prior) is insensitive to moderate changes in the limits of the sampling box. Only if these limits are increased by orders of magnitude will the lower-order statistical moments of the posterior be affected. We will see that minimally informative priors, while sensitizing the posterior to the data, desensitize the resulting posterior to the exact limits of the sampling box and thereby render posterior sampling more robust. That way, it will become possible to analyze a dataset with, e.g., , if we use a minimally informative prior. We will also demonstrate that parameter inference can be improved further if we include additional information via the prior.
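The sensitivity to the limits of the sampling box can be made concrete with a back-of-the-envelope sketch. It assumes a one-dimensional likelihood that equals a constant plateau beyond some parameter value, and it compares the posterior mass in that plateau region under a uniform prior and under a log-uniform prior as the upper box limit grows. For simplicity the mass under the peak is treated as the same hypothetical number for both priors; all values and names are illustrative.

```python
import math


def tail_fraction_uniform(peak_mass, plateau, theta0, upper):
    """Fraction of posterior mass in [theta0, upper] for a uniform prior."""
    tail = plateau * (upper - theta0)
    return tail / (peak_mass + tail)


def tail_fraction_log_uniform(peak_mass, plateau, theta0, upper):
    """Same fraction for a log-uniform prior, i.e. density 1/theta."""
    tail = plateau * math.log(upper / theta0)
    return tail / (peak_mass + tail)


peak_mass = 1.0   # hypothetical mass under the posterior mode
plateau = 1e-3    # hypothetical asymptotic likelihood value
theta0 = 10.0     # hypothetical onset of the flat region
uppers = [1e2, 1e3, 1e4]

uniform = [tail_fraction_uniform(peak_mass, plateau, theta0, u) for u in uppers]
log_uniform = [tail_fraction_log_uniform(peak_mass, plateau, theta0, u)
               for u in uppers]

for u, fu, fl in zip(uppers, uniform, log_uniform):
    print(f"upper limit {u:8.0f}: uniform {fu:.3f}, log-uniform {fl:.4f}")
```

In this toy setting the tail mass grows linearly with the box limit under the uniform prior but only logarithmically under the log-uniform prior, which is why moderate changes of the box hardly affect the latter.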
We will also use a simpler definition of sufficiently proper based on posterior samples, namely if the posterior mode carries most of the probability mass such that samples from the tail region hardly reach the limits of the sampling box. This indicates that probability mass in the tail regions is negligible relative to the probability mass under the posterior mode. Note that the density is only well-defined because we refer to a finite volume in the parameter space, otherwise the posterior is improper.
We refer to an inference as fully Bayesian if Eq. 7 is calculated or sampled
| (11) |
We use Hamiltonian Monte Carlo (HMC) [69, 70, 73] as provided by the Stan software [71, 72] to generate samples from . In addition to the covariance matrix of the parameters, allows the calculation of the credibility volume in order to assess parameter uncertainty. The smallest volume that encloses a probability mass
| (12) |
is called the Highest Density Credibility Volume (HDCV). Assuming that the model sufficiently captures the data generating process, the true parameter values will lie in the HDCV with a probability as soon as the likelihood dominates the prior.
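For a single parameter, the smallest region enclosing a given probability mass reduces to the shortest interval containing that fraction of the posterior samples. The sketch below implements this one-dimensional special case under the assumption of a unimodal marginal posterior; the log-normal samples stand in for posterior draws and are purely illustrative.

```python
import math
import random


def highest_density_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the samples (unimodal case)."""
    ordered = sorted(samples)
    n = len(ordered)
    window = max(1, math.ceil(mass * n))
    best_low, best_high = ordered[0], ordered[window - 1]
    for start in range(1, n - window + 1):
        low, high = ordered[start], ordered[start + window - 1]
        if high - low < best_high - best_low:
            best_low, best_high = low, high
    return best_low, best_high


def equal_tailed_interval(samples, mass=0.95):
    """Interval that cuts the same number of samples from both tails."""
    ordered = sorted(samples)
    n = len(ordered)
    window = max(1, math.ceil(mass * n))
    skip = (n - window) // 2
    return ordered[skip], ordered[skip + window - 1]


rng = random.Random(0)
samples = [rng.lognormvariate(0.0, 0.5) for _ in range(5000)]  # stand-in draws

hdi = highest_density_interval(samples)
eti = equal_tailed_interval(samples)
covered = sum(hdi[0] <= s <= hdi[1] for s in samples)
print("highest density interval:", hdi)
print("equal-tailed interval:   ", eti)
```

For a skewed marginal the highest density interval is never wider than the equal-tailed one, since it is the minimum over all windows with the same sample count.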
Bayesian inference is conditional on the assumed prior and likelihood [79]. Altering or changes . The prior becomes irrelevant only in the infinite data limit (and only for regular models), meaning that ML and Bayesian inference become equivalent [54]. In case of practically non-identifiable models, Bayesian inference has at least two advantages over ML. First, by scrutinizing in detail, issues with structural or practical non-identifiability are revealed. Second, the introduction of priors can alleviate non-identifiability problems.
If little is known a priori about reasonable parameter values and the data constrain some parameters only vaguely, the use of a minimally informative prior is essential. It attempts to make as sensitive to the data as possible. Typically, a minimally informative prior maximizes the variance of the posterior . In contrast, we show that the minimally informative priors introduced below help confine within reasonable boundaries. Thus they reduce the variance of the posterior, if compared to uniform priors. However, minimally informative priors themselves are often improper, which might also render improper if is practically or even structurally non-identifiable. Thus, the posterior will be dominated by the prior in regions of where data fail to inform us about the parameters. Fortunately, Bayesian statistics provides us with tools to render sufficiently proper such as theoretically derived upper limits on parameters or vaguely informative assumptions about cooperativity incorporated in . The benefits and limits of combinations of minimally and vaguely informative priors in the presence of practical non-identifiability will be discussed for two CRNs (Fig. 2).
FIG. 2. Chemical reaction networks of ligand-gated ion channels and their simulated patch-clamp data.
a, c The MMs (one MM per column) consist of three or five closed states “” and one open state “”. Binding steps (red arrows) have concentration-dependent rates. The CRNs are specified by the absolute rate constants , and , the ligand concentration. Rates are given per subunit. Stoichiometry factors account for the number of subunits able to undergo the respective transitions. The units of the rates are in a. u. To calculate their SI units and , one needs to multiply their value by 6/7 (Sec. V C 4). The open states conduct a mean single-channel current , and the closed states conduct a current of . The synthetic data were simulated with the Gillespie algorithm at a sampling rate of 10 ka.u. The Bayesian filter analysis frequency is 2 to 5 ka.u. Since the units are in a. u., the ratios of rates or inverse dwell times determine the CRN. Further their ratio to the sampling frequencies determines how detailed the kinetics are recorded. Similarly, the relative magnitude of compared with and should be used to relate simulations to experimental conditions. b, d Open probability time traces calculated from normalized currents of simulated relaxation experiments of ligand concentration jumps with channels. For demonstration purposes, no experimental noise is added in this figure such that all fluctuations originate from Markov state transitions. However, when inferring the posteriors, additional experimental noise is added. The black lines are the theoretical open probabilities of the model. Typically, we used the set of 10 ligand concentrations.
IV. PARAMETRIZATION OF THE RATE MATRIX
In the following, we will analyze patch-clamp data that are simulated with MMs involving only mono-molecular or pseudo-monomolecular chemical reactions such as conformation dynamics of a protein or binding/unbinding transitions at excess ligand. If the CRN describes a single molecule, it can only be in one of Markov states at time :
| (13) |
where we use a one-hot encoding of states. If where denotes the standard basis, then the channel is in state at time . State-to-state transitions are governed by a transition matrix in discrete time,
| (14) |
with time increment , or with a rate matrix in continuous time:
| (15) |
where for . By definition, each column of sums to zero reflecting the assumption that the CRN is closed. The dwell times to remain in the -th state are exponentially distributed with mean . To facilitate the definition of a minimally informative prior in the next section, we use an alternative parameterization of that does not involve the chemical rates :
| (16) |
The parameters denote the probability of transitioning from state to state after the random dwell time has passed. Thus, each chemical rate is the product of a probability with the inverse mean dwell time. The parameters have no units, unless the statistical weight corresponds to a ligand-dependent . Since , both parameterizations of have the same number of free parameters. For each column, we separated the inverse time-like scale parameters from the probabilities , which are shape parameters. Because transition probabilities are constrained to , the likelihood remains finite for all and will be proper in these parameters (as long as Haldane-like priors are excluded for ). Then, only the dwell time parameters can render the HMM practically non-identifiable.
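The parameterization described above can be sketched as follows: each off-diagonal rate is a jump probability divided by the mean dwell time of the state being left, and each diagonal entry is the negative inverse dwell time, so that every column sums to zero. The three-state chain, its numerical values, and the names `dwell_times` and `jump_probs` are hypothetical.

```python
def build_rate_matrix(dwell_times, jump_probs):
    """Assemble a rate matrix K with K[i][j] the rate from state j to state i.

    jump_probs[j][i] is the probability of jumping to state i when leaving
    state j (zero for i == j, each row of jump_probs sums to one).
    """
    n = len(dwell_times)
    K = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            if i != j:
                K[i][j] = jump_probs[j][i] / dwell_times[j]
        K[j][j] = -1.0 / dwell_times[j]
    return K


# Hypothetical linear chain 0 <-> 1 <-> 2.
dwell_times = [2.0, 0.5, 4.0]
jump_probs = [
    [0.0, 1.0, 0.0],  # state 0 can only jump to state 1
    [0.3, 0.0, 0.7],  # state 1 jumps to 0 or 2
    [0.0, 1.0, 0.0],  # state 2 can only jump to state 1
]

K = build_rate_matrix(dwell_times, jump_probs)
column_sums = [sum(K[i][j] for i in range(3)) for j in range(3)]
recovered_dwell_times = [-1.0 / K[j][j] for j in range(3)]
print(column_sums)
print(recovered_dwell_times)
```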
Figure 2 shows the time traces of plausible CRNs of two ligand-gated ion channels that have two binding pockets. These can be simulated with QuB [80] or an in-house algorithm (https://cloudhsm.it-dlz.de/s/QB2pQQ7ycMXEitE). The assumptions made to define a likelihood for these data are detailed in [38]. In App. B, we discuss the global sensitivity of the solution of the RE for CRN1 (Fig. 2 a) and demonstrate practical non-identifiability of the likelihood (App. B 3) even for over-optimistic data and strong prior knowledge (only a single rate constant is unknown).
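As stated in the caption of Fig. 2, the synthetic data were generated with the Gillespie algorithm. The following is a minimal sketch of that algorithm for a hypothetical two-state channel (closed and open) with made-up rates; it is not the simulation code used for the figure and ignores ligand binding, multiple channels, and experimental noise.

```python
import random


def gillespie_two_state(k_open, k_close, t_end, rng):
    """Simulate one channel switching between closed (0) and open (1)."""
    t, state = 0.0, 0
    times, states = [0.0], [0]
    while True:
        rate = k_open if state == 0 else k_close
        t += rng.expovariate(rate)  # exponentially distributed dwell time
        if t >= t_end:
            break
        state = 1 - state
        times.append(t)
        states.append(state)
    return times, states


def open_fraction(times, states, t_end):
    """Fraction of the total time spent in the open state."""
    open_time = 0.0
    for idx, state in enumerate(states):
        t_next = times[idx + 1] if idx + 1 < len(times) else t_end
        if state == 1:
            open_time += t_next - times[idx]
    return open_time / t_end


rng = random.Random(1)
k_open, k_close, t_end = 2.0, 1.0, 5000.0  # hypothetical rates in a.u.
times, states = gillespie_two_state(k_open, k_close, t_end, rng)
fraction = open_fraction(times, states, t_end)
print(f"simulated open fraction: {fraction:.3f}")
print(f"equilibrium value:       {k_open / (k_open + k_close):.3f}")
```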
V. DEFINING AND BENCHMARKING THE MINIMALLY INFORMATIVE PRIOR
The following section benchmarks the performance of the Bayesian filter for different combinations of minimally informative priors and physically motivated vaguely informative priors, for cases where the information content of the data is low relative to the number of parameters of the CRN. We first compare a uniform prior on the rate matrix with a minimally informative prior defined below (Eq. 20), which promises to be less biased (less misinformed). The reason is that the practical non-identifiability of the likelihood (App. B 1) is aggravated when using a uniform . In an unfortunate combination, a uniform prior places the probability mass where the likelihood becomes less and less pronounced and reaches a constant finite value (App. B 1). With a minimally informative prior, however, one can drastically reduce the severity of this problem. But one should be aware of the limitations of both priors. See App. C 1 for a brief biophysical example for the problem of different parametrizations of statistical models and prior distributions, which gave rise to the following Eq. 17.
A. Definition of minimally informative (MI) priors by approximating Jeffreys's rule
We will use a revised version of Jeffreys's rule [64] to define the minimally informative prior, which treats location, scale and shape parameters independently:
| (17) |
The location parameters, , such as the mean value of the normal distribution, are assigned uniform priors. Each scaling parameter, , has a log-uniform prior. Only the shape parameters are treated conjointly by evaluating the Fisher matrix,
| (18) |
according to Jeffreys's rule [64]. The separate treatment of location, scale and shape parameters can be applied to the used parametrization (Eq. 16) of . For a brief introduction to Eq. 17, see App. C 2. In addition, we simplify Eq. 17 by assuming that can be applied to each column of (Eq. 16) independently. In that way, we obtain closed-form solutions of derived from simpler statistical models for the remaining of each column of .
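The defining property of the log-uniform prior for a scale parameter, equal prior mass per decade and invariance under a change of units, can be checked with a short sketch. Because the unbounded log-uniform prior is improper, the sketch truncates it to a hypothetical finite box; the bounds and names are illustrative.

```python
import math
import random


def sample_log_uniform(lower, upper, rng):
    """Draw a scale parameter with density proportional to 1/value."""
    return math.exp(rng.uniform(math.log(lower), math.log(upper)))


def log_uniform_mass(a, b, lower, upper):
    """Prior mass of [a, b] under a log-uniform prior truncated to [lower, upper]."""
    return math.log(b / a) / math.log(upper / lower)


lower, upper = 1e-3, 1e3  # hypothetical box for a mean dwell time

# Every decade carries the same prior mass.
mass_small = log_uniform_mass(1e-2, 1e-1, lower, upper)
mass_large = log_uniform_mass(1e1, 1e2, lower, upper)

# Changing the time unit (rescaling by c) leaves the mass unchanged.
c = 1000.0
mass_rescaled = log_uniform_mass(1e-2 * c, 1e-1 * c, lower * c, upper * c)

rng = random.Random(0)
draws = [sample_log_uniform(lower, upper, rng) for _ in range(5)]
print(mass_small, mass_large, mass_rescaled)
print(draws)
```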
B. Minimally informative prior for the rate matrix inspired by Jeffreys's rule
We use a simplification of Eq. 17 that is common practice [11] for complex multi-parameter MMs. The priors used to infer MMs from MD simulations are constructed by applying Jeffreys's rule to simpler statistical models [11] instead of applying it to the entire model. Bayesian estimation of (Eq. 14) from MD simulations [81] often uses one Dirichlet prior per column of the transition matrix:
| (19) |
However, the Jeffreys prior for is not a product of Dirichlet distributions [82, 83]. Also, for HMMs of single-molecule force spectroscopy data [10], products of Dirichlet priors are used for .
Here, we do not sample but (Eq. 14) because, in contrast to sampling , it is trivial to incorporate information about the scaling of the binding rates at different ligand concentrations in a direct parameterization of . The same applies to additional prior information on maximal binding rates. Note that, with one exception, we use the parameterization of Eq. 16 to define . The exception will be discussed later when we add the information about theoretical upper diffusion limits on the binding rates and (CRN1 and CRN2) and (CRN2). One can mix any , and in the parameterization, as long as the prior remains equivalent, i.e., as long as holds, with indicating a different parameterization. As the mean dwell times are scaling parameters, we use a log-uniform prior following the arguments above. We use the Dirichlet prior for the probabilities , which is the default prior for probability vectors. These probabilities should not be confused with the transition probabilities stored in . Applying Eq. C3 to a multinomial likelihood results in a Dirichlet prior with for all parameters. The value of is the number of Markov transitions leaving the -th state. For a physically meaningful CRN, tends to be sparse, e.g., ligand binding and channel gating do not occur at the exact same instant of time. Thus, for the -th Markov state, the number of allowed transitions will usually satisfy where is the total number of states. The set contains the indices of all states that can be reached by one Markov transition leaving the -th state. Then, using Eq. 17 and evaluating the factor for each column of individually, we obtain
| (20) |
where is the vector of concentration parameters and is the beta function at . The set of and for the log uniform distributions are the upper and lower limits of . The topology of the CRN is controlled by the Dirichlet distribution. See App. D for an illustration of for a state with two or three leaving transitions.
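The construction of the column-wise prior can be illustrated with a minimal sampling sketch. It assumes that each outgoing rate of a state equals a transition probability divided by the mean dwell time of that state, and that the Dirichlet concentration parameters are all 1/2 (the Jeffreys prior of a multinomial likelihood). The function name, argument names and numerical limits are hypothetical and are not taken from the original implementation.

```python
import numpy as np

def sample_column_prior(n_leaving, tau_min, tau_max, rng):
    """Draw the outgoing rates of one state from a log-uniform x Dirichlet prior.

    Assumption: rate = transition probability / mean dwell time.
    """
    # Log-uniform prior on the mean dwell time (a scale parameter),
    # restricted to the sampling box [tau_min, tau_max].
    tau = np.exp(rng.uniform(np.log(tau_min), np.log(tau_max)))
    if n_leaving == 1:
        # Only one transition leaves the state: its probability is one.
        probs = np.ones(1)
    else:
        # Dirichlet prior over the allowed leaving transitions;
        # alpha = 1/2 is an assumption (Jeffreys prior of a multinomial).
        probs = rng.dirichlet(np.full(n_leaving, 0.5))
    rates = probs / tau
    return tau, probs, rates

rng = np.random.default_rng(0)
tau, probs, rates = sample_column_prior(3, 1e-3, 1e3, rng)
print(tau, probs, rates)
```

Because the Dirichlet factor only covers the transitions that the topology allows, forbidden transitions never receive prior mass in this sketch.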
C. Advantages and limitations of the minimally informative prior in the presence of practical non-identifiability
Using the Bayesian filter, we study the impact of three different priors on the performance of , focusing on weakly informative patch-clamp data. The data are sampled from a 4-states-1-conducting-state HMM (CRN1 Fig. 2a). Our findings are presented in Fig. 3.
FIG. 3. Prior sensitivity analysis showing that the uniform prior aggravates problems caused by practical non-identifiability.
The figure contrasts the posteriors resulting from three different priors: uniform prior (green), minimally informative prior (black), and minimally informative prior with physical limits on the binding rates (blue). The rates are transformed to inverse dwell times and transition probabilities. a, RMSE of the mean of the marginal distribution vs. based on the uniform prior (green), the minimally informative prior (black), and the minimally informative prior with imposed diffusion limits (blue). The cyan dashed curve is a fit based on . A standard deviation of each data point which follows is assumed. All data sets were generated with and . b, Posterior distributions of the dwell times and transition probabilities for . On the diagonal, 1D marginal posteriors are plotted. The off-diagonal plots show the 2D marginal distributions of the posterior. All samples of the parameters were normalized to their true values . The blue lines indicate the true parameters. The insets on the diagonal show the same histograms plotted on a logarithmic scale to display the flatness of the posterior if a uniform prior is used. The (red) vertical bar indicates , which corresponds to due to the normalization by the true values. The posterior is plotted based on samples with a Gelman-Rubin statistic of .
1. The scaling of the RMSE
We define the Euclidean distance or root-mean-square error (RMSE) between and the posterior mean of all chemical rates in log-space as
| (21) |
Appendix E discusses why we use the logarithm of the chemical rates. In Fig. 3a, the RMSE of the inferred is plotted against the number of ion channels per time trace . We use as a proxy for the quality or information content of the data. A regular statistical model is expected to show a -scaling of the RMSE, but the Bayesian filter is singular. This becomes apparent for the uniform prior: Below a critical value the RMSE deviates visually from the behavior of the other two priors. Above the Bayesian filter behaves like a regular statistical model.
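As an illustration of the error measure, the following sketch computes a root-mean-square error between the true rates and the posterior mean in log-space. Whether the mean is taken before or after the logarithm, the base of the logarithm, and the normalization by the number of rates are assumptions made here; `log_rmse` and its arguments are hypothetical names.

```python
import numpy as np

def log_rmse(samples, true_rates):
    """RMSE between true rates and the posterior mean, evaluated in log10-space.

    samples: array of shape (n_samples, n_rates) with positive posterior draws.
    true_rates: array of shape (n_rates,) with the data-generating rates.
    """
    # Assumption: the posterior mean is taken over the log-transformed samples.
    posterior_mean_log = np.log10(samples).mean(axis=0)
    diff = posterior_mean_log - np.log10(true_rates)
    return float(np.sqrt(np.mean(diff ** 2)))

true_rates = np.array([1.0, 10.0, 100.0])
# Toy posterior draws scattered around the true values in log-space.
rng = np.random.default_rng(0)
samples = true_rates * 10.0 ** rng.normal(0.0, 0.2, size=(1000, 3))
print(log_rmse(samples, true_rates))
```

In this sketch an error of one corresponds to a posterior mean that is off by one order of magnitude on average.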
In the biased regime, , the uniform prior causes the RMSE to deviate from the -scaling, whereas the log-uniform prior scales as to far smaller , because it imposes a penalty on larger values of . Deviations from the -scaling of the RMSE to smaller values also occur if upper bounds are enforced on some . The log transformation eliminates the lower limits (App. E), but the upper bounds derived from the diffusion limit (Sec. V C 4) are still active. In contrast to the equivalence of the minimally informative and uniform prior above , the parameter limits reduce the RMSE even at that are two orders of magnitude larger than . Note that the RMSE is dominated by the error in , whose uncertainty is reduced most strongly by the constraints (Fig. 14). The impact of the other upper limits is discussed below.
2. The likelihood dominates the posterior only in a small region of the parameter space
Next, we explore the impact of the prior in more detail for a dataset in the critical regime (). The symbol denotes a parameter divided by its true value such that should ideally be zero. Figure 3b shows the marginal posteriors of . With a uniform prior, the posterior of concentrates between 10^2 and 10^3 and would deviate even more strongly from zero if the sampling box were larger. The insets in the diagonal panels provide an overview of the marginal posteriors of and over the full sampling box. To indicate the relative probability masses, the posterior ratios with , 2 are plotted. The flat right tail of the marginal posterior obtained with the uniform prior (Fig. 3b, green curve) causes the observed deviations of the RMSE (Fig. 3a, green curve and Fig. 13). Furthermore, a flat prior-dominated part of creates an exponentially growing part in . The ratios and drop from their peak values to their right tails only by before they reach a non-vanishing plateau. It takes two and three orders of magnitude, respectively, to do so. Hence, the posteriors and are dominated by the uniform prior. It is plausible that this also holds for the entire .
Given that the sampling box covers multiple orders of magnitude for and , the flat part of the marginal posterior (where the data provide no information about these parameters) contributes a non-negligible probability mass to the posterior. Thus, posterior statistics such as mean, median, and variance, and derived quantities such as the RMSE, become sensitive to the probability mass residing in the flat part. This should raise concern, because the inference will depend on the very limits of the sampling box. A larger sampling box would move probability mass from the peak into the flat area. Hence, when using a uniform prior in the critical regime, the limits of the sampling box for and act as highly informative user settings, even though they are often chosen arbitrarily and lack a physical justification. If instead were to drop to 10^−4, say, before the marginal posterior flattens out, changes in the limits would not affect , unless one increased the sampling box by many orders of magnitude. Because the posterior of is improper, we cannot define meaningful HDCVs to quantify their uncertainty if is defined on the entire real axis. More information is needed, which unfortunately is often supplied by setting the sampling box limits more or less arbitrarily. Alternatively, vaguely informative priors can be defined by using physically justified information that penalizes the tails of , as demonstrated below.
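The dependence on the box limits can be reproduced with a toy calculation. The sketch below assumes a one-dimensional likelihood that peaks at the true value (normalized to one) and levels off at a constant plateau, and it evaluates the posterior mean under a uniform prior for two upper box limits. The likelihood shape, the plateau height and all names are hypothetical choices for illustration only.

```python
import numpy as np

def trapezoid(y, x):
    # Plain trapezoidal rule, written out to avoid version-specific NumPy names.
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def toy_likelihood(theta, plateau):
    # Peak at theta = 1 (the normalized true value) plus a flat, non-vanishing tail.
    return np.exp(-0.5 * (np.log10(theta) / 0.3) ** 2) + plateau

def posterior_mean_uniform_prior(theta_max, plateau, theta_min=1e-3, n=200_001):
    # With a uniform prior, the posterior is proportional to the likelihood
    # inside the sampling box [theta_min, theta_max].
    theta = np.logspace(np.log10(theta_min), np.log10(theta_max), n)
    dens = toy_likelihood(theta, plateau)
    return trapezoid(theta * dens, theta) / trapezoid(dens, theta)

for box_limit in (1e3, 1e5):
    print(box_limit, posterior_mean_uniform_prior(box_limit, plateau=0.1))
```

In this toy setting the posterior mean follows the upper box limit rather than the peak, which is the behavior described above for a plateau that lies only about one order of magnitude below the peak.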
3. The minimally informative prior (partially) alleviates practical non-identifiability
A comparison of the posteriors obtained with a uniform and the minimally informative prior exemplifies the harm induced by the uniform prior (Fig. 3, black vs. green curve). The minimally informative prior (Eq. 20) penalizes large and decorrelates . Note that and belong to the same column in describing transitions leaving state . This accumulates probability mass at the true parameter values in , and . Consistently, the choice of prior has a stronger impact on , and (which are much less determined by the data) than on , and . Overall, the minimally informative prior concentrates closer to .
Thus, when looking at the RMSE as a function of , the minimally informative prior reduces the RMSE drastically below (Fig. 3a) such that the regime expands. The scaling is unfortunate if one tries to increase the inference quality by measuring additional data. However, the scaling is desirable in the reverse direction in which decreases, because it prevents inferences from becoming meaningless too quickly. Despite its heuristic definition, the prolonged -regime indicates that with this minimally informative prior one is less biased than with the uniform prior. Below one only sees the prior and hardly any effect of the data when using the uniform prior. The area around the true values has essentially no probability mass. However, the minimally informative prior concentrates the probability mass much closer to . Nevertheless, if the likelihood is flat in and at some distance to the peak, it follows that and are improper. This does not change for the minimally informative prior, because the log-uniform prior cannot be normalized when defined over the entire positive real axis. It diverges at and decays too slowly to zero for . We return to this observation in Sec. V D.
Cutting with a sampling box into regions that are accessible and inaccessible to the sampler is not always a problem. Without knowing about practical non-identifiability problems, we showed in [38] that the HDCVs and thus the estimator have a frequentist interpretation as long as the peak of the posterior towers sufficiently high above the flat parts. Thus, one finds inside a volume with a certain probability mass with a frequency approximately equal to that probability [38].
4. Adding information from physically motivated upper limits for binding rates
So far, we have shown that using a minimally informative prior robustifies the inference below . To alleviate the problem that the posterior could still be improper or vague in some parameters, we can add more physically motivated information, such as the diffusion limit for the binding rates (Fig. 2, red arrows). Typically, the random collision rate is used as an upper limit for binding rates [84]. Here we use a more realistic estimate for the binding of small ligands [85]. To the best of our knowledge, only Bayesian statistics can rigorously take full advantage of limits on the parameters, because the introduction of parameter limits will impair the validity of the normal approximation of the sampling distribution of . To implement the diffusion limit consistently in a minimally informative way, we need to change the definition of the prior in Eq. 20. The rate holds such that we can also draw . So, the log-uniform prior is still used to set the time scale for all transitions leaving state . Then, we draw the statistical weights from the Dirichlet distribution for the second column of such that . The other rates can be defined by . Whether we impose a log-uniform prior on or some is irrelevant as long as this prior is introduced once for each column of .
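One possible reading of this reparameterization is sketched below: the bounded binding rate is drawn log-uniformly below its upper limit, the statistical weights of all transitions leaving the state are drawn from a Dirichlet distribution, and the remaining rates follow from the ratios of the weights. The relation between rates and weights, the concentration parameter 1/2, and all names and numbers are assumptions made for illustration.

```python
import numpy as np

def sample_column_with_limit(n_leaving, k_min, k_limit, rng, bind_index=0):
    """Draw the outgoing rates of one state with an upper limit on one rate.

    Assumption: rates leaving one state are proportional to their Dirichlet
    weights, so fixing one rate fixes the time scale of the whole column.
    """
    # The log-uniform prior on the bounded binding rate sets the time scale.
    k_bind = np.exp(rng.uniform(np.log(k_min), np.log(k_limit)))
    # Statistical weights of all transitions leaving the state (alpha assumed 1/2).
    weights = rng.dirichlet(np.full(n_leaving, 0.5))
    # Remaining rates follow from the weight ratios relative to the binding rate.
    rates = k_bind * weights / weights[bind_index]
    return rates, weights

rng = np.random.default_rng(0)
rates, weights = sample_column_with_limit(2, k_min=1e-3, k_limit=700.0, rng=rng)
print(rates, weights)
```

In this sketch the log-uniform factor appears exactly once per column, in line with the remark that it does not matter which quantity of the column carries it.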
We simulated the first binding rate (Model 1) with which is the diffusion limit derived in [85] for ligand binding. The stoichiometric factor of 2 incorporates the structural information that two binding pockets are available. Of course, the binding rate of a real process will be slower than the upper limit from [85]. To incorporate this aspect, we take advantage of the fact that in our simulation of data and inference is an arbitrary definition. In fact, we simulated and defined this value to represent the value in SI units. We can also apply a different mapping of the arbitrary units to SI units. We may decide that 700 a. u. is supposed to represent the diffusion limit [85] in SI units. Thus we defined . In that way we avoid the unrealistic and extreme scenario that is identical to the sampling box limit (which would still be a valid use case for Bayesian inference). All other “time-like” parameters such as sampling rates, dwell times and chemical rates need to be rescaled by .
The impact of the upper limits is shown in Fig. 3a–b and Fig. 4 (blue curves). The upper limits on the parameters and are now much smaller than the limits that we used previously. This solves the non-normalizability problem in the crucial parameter and reduces the chance that is non-normalizable. Still, could diverge, if the marginal posterior of diverges. The parameters and are now bounded from above and below by mathematically and physically motivated limits.
FIG. 4. The uniform prior increases the posterior uncertainty.
We define a multivariate analog of the standard deviation (Frobenius norm of the covariance matrix of the posterior) vs. . The colors encode the prior assumptions. The larger the Frobenius norm, the more uncertainty remains after the inference. Plot is based on the same data used in Fig. 3.
The diffusion limits and restrict the RMSE (Fig. 3a, blue lines) to smaller values such that it drops below the regime for . However, even for data sets with , the constraint adds information and decreases the RMSE. Constraining the two binding rates and also influences, via the likelihood, the marginal posteriors of other parameters, particularly (Fig. 3b). The constraints shape into a more pronounced distribution which covers the true value.
5. The uniform prior increases posterior uncertainty
The constraint only adds information to as long as the data themselves do not restrict the HDCV within the prior's upper limits. To study the impact of the constraint, it does not suffice to consider the RMSE. Notably, the RMSE cannot even be reported after an inference on real experimental data, because the true parameter values are then unknown. Ultimately, one judges the inference by the credibility intervals/regions (or confidence intervals/regions in an ML context) of the inferred parameters. Thus, only the posterior's shape in general (the median/mean/peak, covariance, and higher-order statistical moments describing the tails) is at the modeler's disposal to assess the quality of the inference.
Therefore, we define a quantitative measure of the spread of , which is the Frobenius norm
| (22) |
of the covariance matrix of the samples of on the log space of the chemical rates. The observed transition around (Fig. 3a), that the location posterior derived from the uniform prior becomes equivalent to the posterior derived from Jeffreys prior without diffusion limit at , as judged by the RMSE, is only to some degree present in (Fig. 4, black and green lines), which as noted measures the posterior's spread not location. The spread of based on the uniform prior (green curve) does not converge to the spread of the minimally informative prior (black curve). It shrinks instead parallel given the range that is investigated. This effect becomes more prominent for the more complex CRN2. Further, the larger the size of the sampling box the larger due to the practical non-identifiability problem. The Frobenius norm of the covariance of based on the Jeffreys prior, including the diffusion limit (blue curve), has prolonged smaller values than without the diffusion limit but finally converges towards the Frobenius norm of derived from the prior without upper limits (black curve). Hence, the information from the added diffusion limit is made use of, even up to . Above the RMSE (Fig. 3a) is still a much more fluctuating parameter to benchmark the behavior of than the rather non-fluctuating . follows almost a straight -scaling (Fig. 4). In other words, repeating the experiment under identical conditions will leave almost unchanged, but the posterior's location will be randomly translocated from data set to data set within the soft constraints of . Larger simulation boxes increase the value of the green curve the most, followed by a minor effect on the black curve. This is an elegant result because, ultimately, with real experimental data, the inference quality is judged by uncertainty quantification and not by the RMSE.
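The spread measure can be computed directly from posterior samples. The sketch below takes the sample covariance matrix of the log-transformed rates and returns its Frobenius norm; the base-10 logarithm and the names `posterior_spread` and `samples` are assumptions for illustration.

```python
import numpy as np

def posterior_spread(samples):
    """Frobenius norm of the sample covariance of the log-transformed rates.

    samples: array of shape (n_samples, n_rates) with positive posterior draws.
    """
    log_samples = np.log10(samples)
    # rowvar=False: columns are parameters, rows are posterior draws.
    cov = np.atleast_2d(np.cov(log_samples, rowvar=False))
    return float(np.linalg.norm(cov, ord="fro"))

rng = np.random.default_rng(0)
# Toy posterior: two rates with log10-standard deviations of 0.1 and 0.5.
samples = 10.0 ** rng.normal(0.0, [0.1, 0.5], size=(5000, 2))
print(posterior_spread(samples))
```

Unlike the RMSE, this quantity requires no knowledge of the true parameter values, so it can also be evaluated for real experimental data.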
D. The minimally informative prior still generates an improper posterior making it difficult to decide which point estimate has the smallest error.
To confirm that is still improper (Eq. 8) and to show where in the parameter space dominates , the most worrying and are plotted for and sampled in larger sampling boxes (Fig. 5). The patch-clamp (PC) data originate from the 4-states-1-open-state CRN (Fig. 2a). Note that because only one Markov transition leaves state .
FIG. 5. The minimally informative prior without diffusion limit generates still improper posteriors for CRN1.
The relative marginal posteriors and are plotted for different and different sampling boxes. The tilde over the parameters indicates that each parameter is normalized to its true value. a displays power-law-like behavior, but appears proper given the limits of the sampling box. b, However, the same sampled from a sampling box that is two orders of magnitude larger displays an area for large where the prior entirely dominates . The inset shows the mean, median, and peak for vs. for the smaller sampling box (solid line) and the larger sampling box (dotted line). c, for (blue, red densities) displays a non-local correlation structure leading to a bias of . The inset is based on the same data but is transformed to . d, For the heavy tails are much less of a concern. The power-law exponent for the smallest is . The insets of the panels display the mean (blue), the median (orange), and the peak (green) of the marginal posterior.
As a guide to the eye, functions (dashed lines, Fig. 5a–b,d) that have a power-law scaling are plotted, including the log-uniform prior. The right tails of are well approximated by different power laws up to the limit, , of the sampling box (Fig. 5a). The slopes of the tails are heavily influenced by the log-uniform prior, but the likelihood also contributes some additional slope (in other words, information) until approximately . For the smallest sampling box, for all values of the seem to be proper but only vaguely identified. The exponent gradually increases with , indicating the increase of information about . However, using a sampling box that is two orders of magnitude larger (Fig. 5b) demonstrates that for even larger , holds, i.e., the prior completely dominates in this region of in the direction of . Hence, indicates a practical non-identifiability problem of the HMM, as only the prior (Eq. 10) and not the likelihood contributes to the posterior locally. For at least the data do not contribute information for . The limited influence of some chemical rates or dwell times in some parts of on the signal is a general feature of partially observed CRNs (App. B). In the inset of Fig. 5b, the mean (blue), the median (orange), and the peak (green) of vs. for the smaller sampling box (solid) and the larger sampling box (dashed) are plotted. Mean values or higher statistical moments do not exist for distributions with power-law tails with exponents , thus the peak should be reported for the barely identified . Consistently, the peak of (green curve, inset of Fig. 5b) has at most a relative error of a factor of 2 and seems unbiased and not diverging with increasing sampling box size, given the tested data and minimally informative prior. For the same data, the residuals of the mean (blue curve, inset of Fig. 5b) of are heavily biased, sensitive to the sampling box limits and orders of magnitude away from . The median (orange curve, inset of Fig. 5b) is more robust against different sampling box sizes. However, it is still biased towards too large values of , which is unsurprising as it is not defined on the full support if the shape of the log-uniform prior [86] eventually dominates for large . This potentially indicates that, for parameters with a strong degree of practical non-identifiability (the magnitude of the posterior peak does not decay by multiple decades before reaching the area where the prior dominates the posterior), it is better to report MAP values. In Fig. 5c we plot and in the inset to demonstrate that the practical non-identifiability of leads to a bias of ; however, the bias is much reduced for . The bias of the corresponding is also present for different data sets (inset of Fig. 5d).
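The sensitivity of the mean to the box limit can be made explicit for an idealized tail. For a density that decays as a pure power law with exponent a, the mean of the untruncated distribution is finite only for a > 2, so for smaller exponents the mean of the truncated density is set by the upper box limit. The sketch below evaluates the mean of a truncated pure power law in closed form; the pure power-law shape and all names are simplifying assumptions, not the posteriors of Fig. 5.

```python
import numpy as np

def truncated_power_law_mean(a, upper, lower=1.0):
    """Mean of a density proportional to x**(-a) on the box [lower, upper]."""
    def integral(p):
        # Closed form of the integral of x**p over [lower, upper].
        if np.isclose(p, -1.0):
            return np.log(upper / lower)
        return (upper ** (p + 1.0) - lower ** (p + 1.0)) / (p + 1.0)
    # First moment divided by the normalization constant.
    return integral(1.0 - a) / integral(-a)

for a in (1.0, 1.5, 3.0):
    means = [truncated_power_law_mean(a, upper) for upper in (1e2, 1e4, 1e6)]
    print(a, means)
```

For a = 1, which corresponds to the shape of the log-uniform prior, and for a = 1.5 the mean keeps growing with the box limit, whereas for a = 3 it settles at a finite value.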
The tails of , given the sampling box, seem to be less heavy (Fig. 5d) than those of . For , appears to follow a power law, hence, with . The more quickly decaying power-law tails of still create a skewed distribution. The skewed tail of compensates partially for the bias of the peak region of (Fig. 5d). Hence, (blue curve, inset, Fig. 5d) is the least biased point estimator towards too small values until the data are strong enough. In contrast to , the standard Bayesian point estimate, performs best for . Besides our observation, mathematics dictates that reporting for parameters whose posterior has a power-law tail with small is more robust because mean values and eventually medians for do not exist. However, it is unclear whether reporting the mean value for parameters with large , because of its smallest bias (inset of Fig. 5d), generalizes to other CRNs. Note that from the bias of the peak of to too small values, it is clear that small HDCIs around the peak will not include the with the frequency equal to the probability mass they claim to have. Thus, posteriors which decently fulfill the asymptotic behavior of the Bernstein-von Mises theorem need larger or a second observable [38].
In summary, the minimally informative prior, particularly its convex log-uniform distribution for ’s or ’s, has the desirable feature of concentrating much closer to , but it still produces improper if no further upper limits can be justified. The minimally informative prior alleviates the improperness of by making the posterior less sensitive to the often nonphysical and arbitrary limits of the sampling box, but the practical non-identifiability problem will become relevant at some point for all data sets when the sampling boxes are increased. Further, the higher the data quality, the less sensitive the inference is to the sampling box limits. Hence, the degree of the practical non-identifiability problem has to be judged based on how much the peak of decays before the slope of is essentially the slope of the prior.
E. Solving the practical non-identifiability problem with vague additional information on cooperativity
We exemplify how to robustify an improper by physically justified vaguely informative prior distributions. For CRN1, we enforce a soft physical constraint on the one hand on the binding rates and and on the other hand on the unbinding rates and by a regularizing prior, plausible for homomeric proteins. Physical common sense dictates that one should be skeptical a priori, if binding rates or unbinding rates for the same/similar binding sites have values differing by orders of magnitude. One modeling assumption encoding this skepticism could be
| (23) |
for binding and for unbinding
| (24) |
with some finite standard deviation (Fig. 6). Note that this corresponds to the classical definition of cooperativity via the ratio of to : if the affinity increases with the number of occupied binding sites, this is called positive cooperativity. That is, if the microscopic binding rates (the binding rates per binding site) are constant, e.g., diffusion limited, the ratio can be used as a measure of cooperativity, where = 1 means no cooperativity, > 1 positive, and < 1 negative cooperativity. The vaguely informative prior is a much less radical prior (assumption) than assuming identical microscopic rates (which is the non-cooperative assumption, , for the binding and unbinding), as frequently done for ligand-gated [87, 88] and more excessively for voltage-gated ion channels [89–92] to alleviate structural or practical non-identifiability problems in HMM inferences. The CRN of the shaker channel [90] is a good example of how the structural prior information of having four subunits within the shaker channel implies a necessary complexity of the CRN to allow it to explain the data and to be physically interpretable. This CRN has so many voltage-dependent rates that certain subsets of rates are set equal to avoid non-identifiability problems. However, this assumption might be incorrect for the ion channel at hand. Thus, one gains identifiability of the parameters while potentially losing the ability of the model to express the true process. Instead, assuming that cooperativity cannot change the chemical rates beyond some reasonable scale is physically plausible and restricts the model much less. One might debate the prior's variance and the prior's shape. Note that adding this additional regularizing prior solves the improperness problem of originating from a flat likelihood for high values of , no matter what finite is used, because of the quickly decaying tails of normal distributions.
Notably, almost any additional prior that adds at least a tiny amount of decay towards infinity to the otherwise log-uniform-dominated posterior would also render proper. The normal distribution on the log-space has, by definition, desirable properties: it is symmetric on the order-of-magnitude scale, and it defines an area, within one standard deviation, with little impact on and an area of increasing penalty for values multiple standard deviations away. In that way, extreme conclusions from the data, expressing strong cooperativity effects in the data-generating process, have to be more and more supported by the data to be represented with a relevant magnitude in . Using the larger sampling box (Fig. 6) makes the regularization more urgent. In Fig. 6a,b, we demonstrate how applying the additional prior renders proper even for and . Notably, in this case covers 2 orders of magnitude. The bias (inset, Fig. 6a) of is drastically reduced with a finite compared to the pure minimally informative prior, and decreasing further improves the inference (Fig. 6b) in terms of bias and variance. Notably, the prior nudges the posterior to concentrate its mass between and . Thus, the smaller of Eqs. 23 and 24, the more the variance of the posterior decreases, but at some point the bias of also starts to dominate the posterior more and more. The traditional non-cooperativity assumption trades variance of for a maximum of bias and thus should only be applied if there is strong a priori evidence that non-cooperativity is true in the data-generating process.
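A regularizing prior of this kind can be sketched as a normal log-density on the logarithm of the cooperativity ratio. The exact form of Eqs. 23 and 24 is not reproduced here; the use of microscopic (per-site) rates as inputs, the base-10 logarithm, the zero mean and the name `log_cooperativity_prior` are assumptions for illustration.

```python
import numpy as np

def log_cooperativity_prior(k_micro_first, k_micro_second, sigma):
    """Log-density of a normal prior on log10 of the ratio of two microscopic rates.

    A ratio of one (no cooperativity) is the most probable value a priori;
    sigma is the prior standard deviation in orders of magnitude.
    """
    log_ratio = np.log10(k_micro_second / k_micro_first)
    return -0.5 * (log_ratio / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

# Penalty relative to the non-cooperative case for increasingly extreme ratios.
for ratio in (1.0, 10.0, 100.0, 1000.0):
    penalty = log_cooperativity_prior(1.0, ratio, sigma=1.0) - log_cooperativity_prior(1.0, 1.0, sigma=1.0)
    print(ratio, penalty)
```

The penalty grows quadratically in the number of orders of magnitude, so ratios within about one standard deviation are barely affected, while extreme ratios need strong support from the likelihood. Such a term would simply be added to the log-prior of the binding rates and, separately, of the unbinding rates.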
FIG. 6. A vaguely regularizing prior on the cooperativity factor renders the posterior proper even for the lowest quality data.
We demonstrate for the previously used data sets of CRN1 (Fig. 2a) with the effects of the value of using the larger sampling box. Note that holds. The black continuous lines indicate the value of and on the x-axis. The black dashed lines indicate the corresponding and in units of the true value of x-coordinate to visualize the bias of the regularizing prior, constraining the posterior more and more between the dashed and solid black line with decreasing . a, for either no or a series of increasingly strong regularization. The inset compares the effect of the vague regularization (orange) on the mean (solid curve) and median (dashed curve) vs. with no regularization (blue). b, for the same series of increasing regularization. The inset compares the effect of the vague regularization (orange) on the mean (solid curve) and median (dashed curve) vs. with no regularization (blue). c, The effect of the regularization on , with no regularization (blue) and the most vague regularization prior (, green).
F. Minimally informative prior to ward off the curse of complexity and dimensionality
Next, we investigate the challenges arising when the complexity of is increased. CRN2 additionally contains two flip states (Fig. 2d). We enforce microscopic reversibility, which reduces the number of to-be-inferred parameters by one. See App. F 1 for a comment on how the minimally informative prior improves the convergence of the sampler to the true posterior and alleviates the curse of dimensionality. In Fig. 7a, the pathological dimensions of derived from the uniform prior (green) are compared with the same dimensions of (black) derived from the minimally informative prior. The less pathological dimensions, in the sense that they deliver roughly Gaussian marginal posteriors, are plotted in Fig. 7 b1–6. Even for , the posterior based on the uniform prior clearly demonstrates the practical non-identifiability feature of the likelihood in and . The ridge-like correlation of (panel a3) with a slow decay along the ridge demonstrates the practical non-identifiability problem. Notably, the ridge is strongly constrained for values smaller than and (panel a1–2), but above it seems to extend to infinity. The strong correlation between , produces a paradoxical challenge for the minimally informative prior, which turns out to be problematic for smaller . The result of the strong correlation is that the sample value for can be predicted by an affine linear function (panel a3, dashed line). Ignoring the affine part of the function, the slope of the log-uniform prior on will be mapped by the linear function to . Additionally, has its own log-uniform prior. The same applies to both parameters the other way around. Hence, both one-dimensional priors, together with the practical non-identifiability ridge, contribute to the posterior a scaling of for both parameters, creating a bias towards too small values (see the corresponding in Fig. 8a). Based on the corresponding 2D marginal distributions (Fig. 7a), the Markov transition probabilities , and appear to have a more and more diverging posterior the larger the sampling box for , gets. These parameters should be considered unidentified, based on the heuristic and visual criterion employed herein (see discussion around Eq. 9) to assess the degree of practical non-identifiability . Surprisingly, is for CRN2 not as unidentified as for CRN1 (Fig. 3b). We employed a smaller sampling box for the uniform prior because the sampler often did not converge, likely due to the practical non-identifiability problem and the curse of dimensionality. Using a smaller sampling box for the uniform prior also disadvantages the minimally informative prior in the comparison because, with the uniform prior, the sampler can reach less of the region of the parameter space where the likelihood is flat. In App. Fig. 15 we demonstrate that sufficiently proper posteriors are achieved for the minimally informative prior at the experimentally possible, but certainly challenging, data set of . However, with the uniform prior one needs an impossible data quality of . Hence, the minimally informative prior increases the range of acceptable data for this CRN roughly 50-fold. Using only the minimally informative prior (black posterior) produces for and visibly improper marginal posteriors for (see Figs. 7 and 8).
FIG. 7. For the more complex CRN2 the minimally informative prior is necessary even for data sets of unrealistically high quality such as .
a, Posteriors based on uniform (green) and minimally informative priors (black) are compared. The clearly non-Gaussian-shaped marginal posteriors plus those concerning are plotted in a; all other, rather Gaussian marginal posteriors are shown in b. The insets a1, a2 visualize the posterior without the log transformation. The black posterior is equipped with the minimally informative . The green posterior is based on the uniform . The flat (green) posterior for and creates what appears to be an exponential increase for and . a3, demonstrates the positive linear correlation contained in . Deviations of both parameters from the corresponding true value can be compensated to some extent by the other parameter. Note that the sampling box for the posterior samples of updated from the minimally informative is more than an order of magnitude larger than the range of the simulation box used when the uniform is used. This disadvantages the posterior based on the minimally informative prior but demonstrates the greater robustness of the minimally informative prior. The posteriors derived from the uniform are sampled by and then mapped to the (, )–space.
FIG. 8. For CRN2, the minimally informative prior enables inference for about two orders of magnitude lower data quality. Adding information by diffusion limits and vague bias towards non-cooperativity allows us to work with three orders of magnitude lower data quality.
The data sets simulated from CRN2 (Fig. 2) are analyzed. In all plots, black refers to based on the minimally informative prior. Blue corresponds to assuming diffusion-limited binding. Red: an additional vague prior on the cooperativity of the binding and unbinding rates. Magenta: a less vague no-cooperativity assumption. a, The true values of K are indicated by the blue lines. Posterior for for the minimally informative prior, the minimally informative prior with upper limits, and with an added vague no-cooperativity assumption. For visual clarity, we suppress , , in the main plot but add subpanels which display the corresponding posteriors. Note that these parameters are only slightly influenced by the priors, and even without the priors, the posterior is peaked and Gaussian-like with some skewness. b, The RMSE in the log space of the chemical rates is plotted vs. for the median (solid curve) and the marginal peak (dashed curve) for different prior assumptions. c, The Frobenius norm of all of the covariance matrix of the samples of
G. Effects of combining the theoretical diffusion limit with vague non-cooperativity assumptions
In Fig. 8 a we display for the smallest tested data quality with the minimally informative prior and with the additional (vague and hard) constraints discussed below. Using only the minimally informative prior (black posterior) produces for and not sufficiently proper marginal posteriors. The first visually proper appears with (App. Fig. 15 a). From the preceding discussion, it is clear that for the likelihood is only just practically identifiable enough that the improperness of is not detected visually, even though there is no reason to assume that it would not appear if one sampled from much larger simulation boxes. Due to the complexity of the inference problem, we employ the additional assumptions.
1. The vague no-cooperativity assumption increases the accuracy and decreases the uncertainty of the inference.
If one applies strict upper diffusion limits for , and (Fig. 8, blue posterior and curves), one gains a proper for in the corresponding dimensions, and the other dimensions also become sufficiently proper. Adding further vague prior assumptions , , and , which regularize gently towards a non-cooperative CRN (Fig. 8, red posterior), reduces the mean-distance between and and reduces the uncertainty (Fig. 8 c). Note that a less vague no-cooperativity assumption , reduces the Frobenius norm and RMSE further (see Fig. 8 b-c). The RMSE (Fig. 8 b) shows that at the RMSE transitions to asymptote while for the practical non-identifiability problem combined with prior assumptions influences the posterior. Above , the posteriors equipped with priors with diffusion limits produce similar RMSEs, unless the less vague cooperativity assumptions (Fig. 8 b, magenta curves and posterior) are used. For the less vague prior the RMSE converges onto the curves of the other two prior assumptions (blue and red curves) around . In contrast, the pure minimally informative prior has different RMSEs (Fig. 8 b, black curve) for each data set. This shows that the vague no-cooperativity assumptions have lost their influence on the RMSE, while the diffusion limit still influences the RMSE.
The Frobenius norm of the covariance matrix of shows (Fig. 8 c) that enforced upper diffusion limits (blue, red, and magenta curves) still add information and reduce the uncertainty of . Hence, even for data qualities of , an ML inference would ignore relevant information that reduces the uncertainty of the inference. The Frobenius norm of the posteriors based on the pure minimally informative prior without additional assumptions transitions at to the -scaling.
To summarize, with the minimally informative prior combined with diffusion limits (Fig. 8), one can make inferences with more than 10³ times smaller per time trace compared to Bayesian inferences with the uniform prior (Fig. 7) or ML/MAP inferences. Note that theoretical data qualities of are beyond experimentally achievable data qualities. The added vague non-cooperativity prior contributes information to the posterior approximately until , as judged by the RMSE and Frobenius norm. For the less vague prior the gain of relevant information lasts for values even higher than as judged by the Frobenius norm, but the RMSEs are roughly the same.
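The two summary statistics used above can be sketched in a few lines. This is a minimal illustration, not the analysis code of this work: the function names, the toy log-normal "posterior", the two rate values, and the choice to take the covariance of the log10-transformed samples are all assumptions made for the example.

```python
import math
import random
import statistics

def log_rmse(estimates, truths):
    """RMSE between log10(estimates) and log10(truths)."""
    sq = [(math.log10(e) - math.log10(t)) ** 2 for e, t in zip(estimates, truths)]
    return math.sqrt(sum(sq) / len(sq))

def covariance_matrix(samples):
    """Sample covariance (n - 1 normalisation); `samples` is a list of parameter vectors."""
    n, d = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(d)]
    return [[sum((s[i] - means[i]) * (s[j] - means[j]) for s in samples) / (n - 1)
             for j in range(d)] for i in range(d)]

def frobenius_norm(matrix):
    """Square root of the sum of squared matrix entries."""
    return math.sqrt(sum(x * x for row in matrix for x in row))

# Toy stand-in for posterior samples of two rates (hypothetical values).
rng = random.Random(0)
true_rates = [10.0, 500.0]
samples = [[rng.lognormvariate(math.log(t), 0.3) for t in true_rates]
           for _ in range(5000)]

# Point estimate: marginal posterior median of each rate.
medians = [statistics.median(s[j] for s in samples) for j in range(len(true_rates))]

# Uncertainty summary: Frobenius norm of the covariance of the log10 samples
# (taking the covariance in log space is an assumption of this sketch).
log_samples = [[math.log10(x) for x in s] for s in samples]

print("log-space RMSE of the median:", log_rmse(medians, true_rates))
print("Frobenius norm of covariance:", frobenius_norm(covariance_matrix(log_samples)))
```

The RMSE measures the accuracy of a point estimate, whereas the Frobenius norm collapses the whole covariance matrix, including off-diagonal correlations, into one number that shrinks as the posterior concentrates.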
2. To what extent can channel properties be assessed against the bias of the no-cooperativity prior?
We test, for the different no-cooperativity priors, what typical data quantity is needed such that supports positive cooperativity (as defined in Sec. V E). One could also ask at what point the bias of the additional soft coupling by the prior towards no-cooperativity becomes detrimental to the inference, because the DGP (Fig. 2) is cooperative with . There are essentially two categories (negative and positive ) and the infinitesimally thin (green) line in between with no–cooperativity . Let be our measure of cooperativity. If
| (25) |
the data supports positive cooperativity. Notably, if , a negative cooperativity model is just as plausible as positive cooperativity. Following the Bayesian paradigm, we are not looking for any binary significance test result, but embrace the continuous aspect of the question at hand. The inequality 25 is fulfilled if the posterior median (solid lines) holds . We contrast the median with HDCIs, which give the smallest interval containing a given posterior probability. For skewed posteriors (because of uninformative data), HDCIs might indicate a different cooperativity model than the median. Note that for small we work in a regime where frequentist testing would likely not produce significant results.
For the pure minimally informative prior (Fig. 9 a), and the posterior fluctuates between a weak indication based on the median (solid line) that there is positive cooperativity and that both models are equally plausible. In particular, the median is biased towards too small values of . The 0.5-HDCI (lower limit, dashed black lines) is almost entirely smaller than . To have an unbiased median and a 0.5-HDCI that additionally supports qualitatively positive cooperativity, one needs at least . In contrast, working with the uniform requires .
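As an illustration of how a median and an HDCI can be read off posterior samples, the following sketch computes both for a skewed toy sample. The function names, the log-normal toy distribution, and the reference value `reference = 1.0` for "no cooperativity" are assumptions made for this example only; they are not taken from the analysis in this work.

```python
import math
import random

def median(samples):
    """Sample median."""
    xs = sorted(samples)
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 else 0.5 * (xs[mid - 1] + xs[mid])

def hdci(samples, mass):
    """Shortest interval that contains a fraction `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window over k consecutive order statistics and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]

# Skewed toy "posterior" of a hypothetical cooperativity measure.
rng = random.Random(1)
coop = [rng.lognormvariate(0.2, 0.8) for _ in range(4000)]

reference = 1.0  # assumed no-cooperativity value, for illustration only
m = median(coop)
lo, hi = hdci(coop, 0.5)
print("median:", m, "above reference:", m > reference)
print("0.5-HDCI:", (lo, hi), "entirely above reference:", lo > reference)
```

For a right-skewed sample such as this one, the densest half of the samples sits below the median, so the median and the lower HDCI limit can point to different conclusions relative to the reference value.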
FIG. 9. The prediction of positive cooperativity (accelerated unbinding) of the unbinding of the second ligand gains certainty with vaguely informative prior assumptions with a bias towards no-cooperativity.
On the x-axis we plot as a proxy for the information content in the data originating from CRN2. The dashed lines indicate the lower limit of the 0.5-HDCI, and the solid lines are the medians. The color corresponds to the prior assumptions with more information added from left to right. The set of three [0.99,0.5,0.3]-HDCIs is plotted in each panel. As a guide to the eye to discern the gain of certainty that there is negative cooperativity and the reduction of bias, we replot the median and lower limit of the 0.5-HDCI from the previous panel. For visual clarity we suppress the black lines in panel d. a, Minimally informative prior. b, With an additional diffusion limit assumption. c, With an additional vague no-cooperativity assumption , , and . d, With an additional less vague no-cooperativity assumption , . A standard deviation of these priors corresponds approximately to 3.1 relative deviation of the corresponding parameters. Further, and .
One may ask where the bias towards too small values for comes from. The normal approximations used to derive the likelihood [38] are justified given the scale of . Thus, we suspect the bias to originate from correlations and a strong practical non-identifiability problem in the likelihood/posterior. Imagine a strong positive correlation after the inference between, e.g., and such that one can predict very accurately from the scale of . The limiting extreme case of such a correlation between two parameters is realized in the structurally non-identifiable problem of the linear birth-death model (Eq. 1). Giving a log-uniform prior would result in having a log-uniform prior. Supplying log-uniform priors to both parameters results, after the inference, in an effect of the combined prior on the posterior as and equivalently . An anticorrelation between the parameters would eliminate the effect of the prior.
Adding the diffusion limit to the posterior (Fig. 9 b) extends the region of a visually sufficiently proper posterior (Fig. 8) at least to and decreases uncertainty (difference between black and blue dashed lines). However, the effect on the ability of the median to predict positive cooperativity qualitatively is small or nonexistent. While not necessarily indicating a wrong model, the median is typically undecided and thus biased to too small values. Hence, adding the diffusion limit reduces the uncertainty of but does not help to answer the fine-grained question of cooperativity, unless one works with unrealistic data qualities (see uncertainty in Fig. 9 b above ).
This changes when adding the vague no-cooperativity prior (Fig. 9 c), which by its definition biases more around the green line. The median and the HDCIs (see the difference between red and blue lines in Fig. 9 c) are shifted towards against the bias of the Cauchy prior acting on . The median now always indicates positive cooperativity. The 0.5-HDCI is, roughly speaking, undecided but much less biased than without the vague regularization.
We show in Fig. 9 d the effect of decreasing the variance of the prior for the ratios (increasing the bias towards non-cooperativity of the binding and unbinding rates) on the one hand between and and on the other hand between and , while the cooperativity priors for and and and remain the same. This structure vaguely incorporates the prior knowledge that the vertical Markov transitions (Fig. 2 c) in the CRN represent changes in the protein, which might alter the binding and unbinding rates to some extent. A further shift of the lower limit of the 0.5-HDCI (dashed lines) and the median (dotted lines) can be observed. For realistic PC data quality assumptions, the prior combining diffusion limits with less vague non-cooperativity assumptions performs the strongest but also supposes the most. Only for high and unrealistic data quality do the posteriors without the vague non-cooperativity assumptions (black) seem to have a smaller bias to smaller negative cooperativity.
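The argument can be made explicit with a short heuristic calculation. The symbols $\theta_1$, $\theta_2$ and the constant $c$ are generic placeholders introduced here, and the ridge is idealized as exact; this is a sketch of the mechanism, not a derivation for the specific rates of CRN2.

```latex
% Independent log-uniform priors on two positive parameters:
\[
  p(\theta_1,\theta_2) \;\propto\; \frac{1}{\theta_1}\cdot\frac{1}{\theta_2}.
\]
% Idealized positive-correlation ridge, \theta_2 = c\,\theta_1 with c > 0:
\[
  p(\theta_1,\, c\,\theta_1) \;\propto\; \frac{1}{c\,\theta_1^{2}}
  \;\propto\; \theta_1^{-2},
  \qquad\text{and equivalently}\qquad
  p(\theta_2/c,\, \theta_2) \;\propto\; \theta_2^{-2}.
\]
% Idealized anticorrelation ridge, \theta_2 = c/\theta_1:
\[
  p(\theta_1,\, c/\theta_1) \;\propto\; \frac{1}{\theta_1}\cdot\frac{\theta_1}{c}
  \;=\; \frac{1}{c} \;=\; \text{const.}
\]
```

Along a positively correlated ridge the two one-dimensional priors therefore compound to a steeper decay than either prior alone, which favors small values, whereas along an anticorrelated ridge their effects cancel.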
VI. CONCLUSION
Bayesian inference offers efficacious remedies for practical non-identifiability problems in HMM inference, thereby allowing parameter uncertainty quantification for finite data. Nevertheless, pathologies of the likelihood also pose challenges for Bayesian inference.
If little about the actual values of some parameters is known a priori, we show that minimally informative priors are crucial to expand the range of acceptable data quality. They attempt to make posteriors as sensitive to the data as possible, thereby also alleviating practical non-identifiability pathologies in HMMs. The suggested minimally informative prior increases accuracy and decreases uncertainty compared to a uniform prior.
Any prior dominates the posterior in the regions of a constant likelihood value (the essence of non-identifiability). The bias of the uniform prior to larger inverse dwell times or chemical rates combines in an unfortunate way with the practical non-identifiability problem of the likelihood itself. In contrast, the log-uniform part of the minimally informative prior puts equal statistical weight on each decade and thus alleviates this problem. It would also alleviate the problems mentioned in [35].
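The statement about equal weight per decade can be checked numerically. The sketch below compares the prior mass that a uniform and a log-uniform density assign to each decade of a sampling box; the box limits `A` and `B` are hypothetical values chosen for illustration.

```python
import math

def uniform_mass(lo, hi, a, b):
    """Prior mass that a uniform density on [a, b] assigns to [lo, hi]."""
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0.0) / (b - a)

def loguniform_mass(lo, hi, a, b):
    """Prior mass that a log-uniform density (p(x) ~ 1/x) on [a, b] assigns to [lo, hi]."""
    lo, hi = max(lo, a), min(hi, b)
    if hi <= lo:
        return 0.0
    return math.log(hi / lo) / math.log(b / a)

# Hypothetical sampling box for a rate constant, spanning six decades.
A, B = 1e-2, 1e4
decades = [(10.0 ** k, 10.0 ** (k + 1)) for k in range(-2, 4)]

for lo, hi in decades:
    print(f"[{lo:g}, {hi:g}]  uniform={uniform_mass(lo, hi, A, B):.6f}  "
          f"log-uniform={loguniform_mass(lo, hi, A, B):.6f}")
```

Under the uniform prior, the top decade alone carries about 90% of the prior mass, and enlarging the box by one more decade would again shift about 90% of the mass to the new top decade. Under the log-uniform prior every decade carries the same share.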
Notably, we show that, when improper prior distributions are used, the usually arbitrarily chosen simulation box limits determine the posterior on a relevant scale as soon as the simulation box is large enough. The minimally informative prior desensitizes the posterior to the sampling box limits. This problem vanishes only under the rare condition that the posterior has a peak close to the true values that is multiple orders of magnitude higher than the purely prior-dominated parts, which would make it possible to ignore the strictly prior-dominated parts. However, the peak will often be less dominant. Importantly, if one uses the minimally informative prior for complex CRNs with a high-dimensional parameter space, it is much simpler for the adaptive HMC sampler to produce well-mixing (converging) parameter chains, i.e., the samples indicate that the typical set [93] of the posterior was sampled.
We show that, unfortunately, for typical data qualities and quantities and realistic CRNs, further objective or subjective assumptions are necessary to obtain an interpretable and sufficiently proper posterior to overcome the challenges from the practical non-identifiability.
A solution to make the posterior proper is to apply meaningful limits to the relevant parameter subset of the sampling box, thereby reducing the uncertainty. The solution is objective if the limits can be theoretically derived (or are rooted in the physical properties of the molecules). Herein, it is shown that this information fusion from data and prior knowledge creates meaningful inferences with the lowest tested data quality, even for the most complex tested CRN. Nevertheless, derived theoretical limits might not always be at hand, or, even after their application, practical non-identifiability problems might remain in parameter dimensions where the upper limits do not apply. Herein, an additional vaguely informative prior on the ratio of some rates (a hyperparameter corresponding to the cooperativity of ligand binding and respective unbinding) is applied. Combining these objective and common-sense (biochemical) prior assumptions delivers the best inferences in terms of RMSE and uncertainty of the posterior. This additional prior biases the CRN gently towards CRNs where ligand binding and unbinding of the different channel subunits occur independently but still allows for positive and negative cooperativity over orders of magnitude (depending on the choice of hyperparameters). Hence, it is a much less radical assumption than the commonly used non-cooperativity assumption [87, 88]. Not using such a prior would mean that one is willing to accept a priori any order of magnitude of cooperativity effects, which is against common biochemical experience, i.e., experience-based priors. Thus, extreme effects are only considered if the data is very certain about them. In this sense, this prior acts as an Occam's razor.
Using this prior, even without the physically motivated upper sampling box limits, renders the posterior always proper at least in the relevant dimensions of the parameter space since the prior itself is proper. Notably, using these prior assumptions, one can learn from the HMM inference about negative cooperativity within the CRN with at least 10³ times smaller data sets than with plain uniform prior assumptions or ML inferences. The allowed reduction of the data quality by a thousandfold is a prerequisite for inferring HMMs of this complexity scale with real-world data.
One could also apply this technique to heteromeric proteins or across homomeric proteins containing mutated binding sites [94–96] if the scale of the differences between binding pockets can be coarsely estimated a priori. The coarser the a priori estimate, the heavier the tails of the regularization prior should be. A Cauchy prior on the log space provides a heavier tail but is still a proper prior. A detailed study of the different possibilities is beyond the scope of this paper.
In summary, Bayesian inference provides flexible tools to compensate for the shortcomings of ML inference due to omnipresent practical non-identifiability problems of the likelihood. Careful prior elicitation is key to getting the most out of biophysically meaningful HMMs: being minimally informative where one has absolutely no information on the scale of parameters, vaguely informative where there is some physical common-sense knowledge about some parameters, and very informative where there is objective information such as a theoretical upper bound. Using this prior elicitation approach, we were able to obtain meaningful biological insight with a thousandfold lower data quality.
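To illustrate what "heavier tail" means in practice, the sketch below compares the log-density of a normal and a Cauchy prior placed on the log10 ratio of two rates. The location 0 (no cooperativity), the scale of 0.5 decades, and the function names are hypothetical choices made for this example only.

```python
import math

def normal_logpdf(x, loc=0.0, scale=1.0):
    """Log-density of a normal distribution."""
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale * math.sqrt(2.0 * math.pi))

def cauchy_logpdf(x, loc=0.0, scale=1.0):
    """Log-density of a Cauchy distribution."""
    z = (x - loc) / scale
    return -math.log(math.pi * scale * (1.0 + z * z))

# x = log10(ratio of two rates); x = 0 means the rates are equal.
SCALE = 0.5  # hypothetical prior width in decades
for decades in (0, 1, 2, 3, 5):
    print(f"{decades} decades:  normal={normal_logpdf(decades, 0.0, SCALE):8.2f}  "
          f"cauchy={cauchy_logpdf(decades, 0.0, SCALE):8.2f}")
```

Both densities integrate to one and are therefore proper, but the normal log-density falls quadratically with the number of decades while the Cauchy log-density falls only logarithmically, so a ratio several decades away from one is penalized far less under the Cauchy prior.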
ACKNOWLEDGMENTS
The authors are grateful to E. Schulz for designing software to simulate channel activity and to Th. Eick for performing simulations. This project was funded by the German Research Foundation (DFG) within Research Group FOR 2518 DynIon (project P2). F. Paul acknowledges funding from the Yen PostDoctoral Fellowship in Interdisciplinary Research and from the National Cancer Institute of the National Institutes of Health (NIH) through Grant CAO93577. M. Habeck acknowledges the Carl Zeiss Foundation funding within the program “CZS Stiftungsprofessure.” We want to thank M. Bücker for helping with the computer cluster at Friedrich Schiller University Jena and K. Benndorf for comments on the manuscript.
Contributor Information
Jan L. Münch, Institute of Physiology II, Jena University Hospital, Friedrich Schiller University, Jena 07743, Germany.
Ralf Schmauder, Institute of Physiology II, Jena University Hospital, Friedrich Schiller University, Jena 07743, Germany.
Fabian Paul, Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, United States.
Michael Habeck, Microscopic Image Analysis Group, Jena University Hospital, Friedrich Schiller University, Jena 07743, Germany.
References
- [1].Anderson D. F. and Kurtz T. G., Continuous time markov chain models for chemical reaction networks, in Design and analysis of biomolecular circuits: … (Springer, 2011) pp. 3–42. [Google Scholar]
- [2].Hille B., Ionic channels in excitable membranes. current problems and biophysical approaches, Biophysical journal 22, 283 (1978). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [3].Colquhoun D., Hawkes G. A., and Bernard K., On the stochastic properties of single ion channels, P. of the Roy. Soc. of London. Series B. Biological Sciences 211, 205 (1981). [DOI] [PubMed] [Google Scholar]
- [4].Loo D., Hazama A., Supplisson S., TuRK E., and Wright E. M., Relaxation kinetics of the na+/glucose cotransporter., Proceedings of the National Academy of Sciences 90, 5767 (1993). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [5].Eskandari S., Wright E., and Loo D., Kinetics of the reverse mode of the na+/glucose cotransporter, The Journal of membrane biology 204, 23 (2005). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [6].George A. and Zuckerman D. M., From average transient transporter currents to microscopic mechanism–a bayesian analysis, bioRxiv, 2023 (2023). [DOI] [PubMed] [Google Scholar]
- [7].Milescu L. S., Yildiz A., Selvin P. R., and Sachs F., Maximum likelihood estimation of molecular motor kinetics from staircase dwell-time sequences, Biophysical journal 91, 1156 (2006). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [8].Müllner F. E., Syed S., Selvin P. R., and Sigworth F. J., Improved hidden markov models for molecular motors, part 1: basic theory, Biophysical journal 99, 3684 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [9].Syed S., Müllner F. E., Selvin P. R., and Sigworth F. J., Improved hidden markov models for molecular motors, part 2: extensions and application to experimental data, Biophysical journal 99, 3696 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [10].Chodera J. D., Elms P., Noe F., Keller B., Kaiser C. M., Ewall-Wice A., Marqusee S., Bustamante C., and Hinrichs N. S., Bayesian hidden markov model analysis of single-molecule force spectroscopy: Characterizing kinetics under measurement uncertainty (2011), arXiv:1108.1430 [cond-mat.stat-mech].
- [11].Keller B. G., Kobitski A., Ja?schke A., Nienhaus G. U., and Noe? F., Complex rna folding kinetics revealed by single-molecule fret and hidden markov models, Journal of the American Chemical Society 136, 4534 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [12].Rosales R. A., Fitzgerald W. J., and Hladky S. B., Kernel estimates for one-and two-dimensional ion channel dwell-time densities, Biophysical journal 82, 29 (2002). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [13].Qin F. and Li L., Model-based fitting of single-channel dwell-time distributions, Biophysical journal 87, 1657 (2004). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [14].Shelley C. and Magleby K. L., Linking exponential components to kinetic states in markov models for single-channel gating, The Journal of general physiology 132, 295 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [15].Baum L. E. and Petrie T., Statistical inference for probabilistic functions of finite state markov chains, The annals of mathematical statistics 37, 1554 (1966). [Google Scholar]
- [16].Rabiner L. R., A tutorial on hidden Markov models and selected applications in speech recognition, Proc. of the IEEE 77, 257 (1989). [Google Scholar]
- [17].Chung S.-H., Moore J. B., Xia L., Premkumar L., and Gage P. W., Characterization of single channel currents using digital signal processing techniques based on hidden markov models, Philos. T. of the Roy. Soc. of Lond. Series B Bio. Sci. 329, 265 (1990). [DOI] [PubMed] [Google Scholar]
- [18].Albertsen A. and Hansen U.-P., Estimation of kinetic rate constants from multi-channel recordings by a direct fit of the time series, Biophysical journal 67, 1393 (1994). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [19].Qin F., Auerbach A., and Sachs F., Hidden Markov Modeling for Single Channel Kinetics with Filtering and Correlated Noise, Biophys J. 79, 1928 (2000). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [20].de Gunst M. M., Künsch H., and Schouten J., Statistical analysis of ion channel data using hidden markov models with correlated state-dependent noise and filtering, Journal of the American Statistical Association 96, 805 (2001). [Google Scholar]
- [21].Rosales R., Stark J. A., Fitzgerald W. J., and Hladky S. B., Bayesian restoration of ion channel records using hidden markov models, Biophysical journal 80, 1088 (2001). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [22].Venkataramanan L. and Sigworth F., Applying hidden Markov models to the analysis of single ion channel activity, Biophys J. 82, 1930 (2002). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [23].Rosales R. A., MCMC for hidden Markov models incorporating aggregation of states and filtering, Bull. Math. Biol. 66, 1173 (2004). [DOI] [PubMed] [Google Scholar]
- [24].Kinz-Thompson C. D. and Gonzalez R. L. Jr, Increasing the time resolution of single-molecule experiments with bayesian inference, Biophysical journal 114, 289 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [25].Kilic Z., Sgouralis I., and Pressé S., Generalizing hmms to continuous time for fast kinetics: hidden markov jump processes, Biophysical journal 120, 409 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [26].Kilic Z., Sgouralis I., Heo W., Ishii K., Tahara T., and Pressé S., Extraction of rapid kinetics from smfret measurements using integrative detectors, Cell Reports Physical Science 2 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [27].Saurabh A., Fazel M., Safar M., Sgouralis I., and Pressé S., Single-photon smfret. i: Theory and conceptual basis, Biophysical Reports 3, 100089 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [28].Ray K. K., Kinz-Thompson C. D., Fei J., Wang B., Lin Q., and Gonzalez R. L., Entropic control of the free-energy landscape of an archetypal biomolecular machine, Proceedings of the National Academy of Sciences 120, e2220591120 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [29].Kuschke S., Thon S., Sattler C., Schwabe T., Benndorf K., and Schmauder R., camp binding to closed pacemaker ion channels is cooperative, Proceedings of the National Academy of Sciences 121, e2315132121 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [30].Kienker P., Equivalence of aggregated markov models of ion-channel gating, P. of the Roy. Soc. of London. B. Biological Sciences 236, 269 (1989). [DOI] [PubMed] [Google Scholar]
- [31].Vajda S., Godfrey K. R., and Rabitz H., Similarity transformation approach to identifiability analysis of nonlinear compartmental models, Mathematical biosciences 93, 217 (1989). [DOI] [PubMed] [Google Scholar]
- [32].Audoly S., Bellu G., D’Angio L., Saccomani M. P., and Cobelli C., Global identifiability of nonlinear models of biological systems, IEEE Transactions on biomedical engineering 48, 55 (2001). [DOI] [PubMed] [Google Scholar]
- [33].Raue A., Kreutz C., Maiwald T., Bachmann J., Schilling M., Klingmüller U., and Timmer J., Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood, Bioinformatics 25, 1923 (2009). [DOI] [PubMed] [Google Scholar]
- [34].Middendorf T. R. and Aldrich R. W., Structural identifiability of equilibrium ligand-binding parameters, Journal of General Physiology 149, 105 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [35].Raue A., Kreutz C., Theis F. J., and Timmer J., Joining forces of bayesian and frequentist methodology: a study for inference in the presence of non-identifiability, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 371, 20110544 (2013). [DOI] [PubMed] [Google Scholar]
- [36].Middendorf T. R. and Aldrich R. W., The structure of binding curves and practical identifiability of equilibrium ligand-binding parameters, J. of General Physiology 149, 121 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [37].Delbrück M., Statistical fluctuations in autocatalytic reactions, The Journal of Chemical Physics 8, 120 (1940). [Google Scholar]
- [38].Münch J. L., Paul F., Schmauder R., and Benndorf K., Bayesian inference of kinetic schemes for ion channels by kalman filtering, Elife 11, e62714 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [39].Gillespie D. T., The chemical langevin equation, The Journal of Chemical Physics 113, 297 (2000). [Google Scholar]
- [40].Moffatt L., Estimation of Ion Channel Kinetics from Fluctuations of Macroscopic Currents, Biophys J. 93, 74 (2007). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [41].Hines K. E., Bankston J. R., and Aldrich R. W., Analyzing Single-Molecule Time Series via Nonparametric Bayesian Inference, Biophys J. 108, 540 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [42].Sgouralis I. and Pressé S., An introduction to infinite hmms for single-molecule data analysis, Biophysical journal 112, 2021 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [43].Sgouralis I. and Pressé S., Icon: an adaptation of infinite hmms for time traces with drift, Biophysical journal 112, 2117 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [44].Ghahramani Z., Learning dynamic bayesian networks, in International School on Neural Networks (Springer, 1997) pp. 168–197. [Google Scholar]
- [45].Bechhoefer J., Hidden markov models for stochastic thermodynamics, New Journal of Physics 17, 075003 (2015). [Google Scholar]
- [46].Kalman R. E., A new approach to linear filtering and prediction problems, J. of basic Engineering 82, 35 (1960). [Google Scholar]
- [47].Kalman R. E. and Bucy R. S., New results in linear filtering and prediction theory, Journal of BAsic Enginheering 83, 95 (1961). [Google Scholar]
- [48].Komorowski M., Finkenstädt B., Harper C. V., and Rand D. A., Bayesian inference of biochemical kinetic parameters using the linear noise approximation, BMC Bioinformatics 10, 343 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [49].Fink M. and Noble D., Markov models for ion channels: versatility versus identifiability and speed, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 367, 2161 (2009). [DOI] [PubMed] [Google Scholar]
- [50].Fearnhead P., Giagos V., and Sherlock C., Inference for reaction networks using the linear noise approximation, Biometrics 70, 457 (2014). [DOI] [PubMed] [Google Scholar]
- [51].Folia M. M. and Rattray M., Trajectory inference and parameter estimation in stochastic models with temporally aggregated data, Statistics and Computing 28, 1053 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [52] Myung I. J., Tutorial on maximum likelihood estimation, Journal of Mathematical Psychology 47, 90 (2003).
- [53] Casella G. and Berger R. L., Statistical inference (Cengage Learning, 2021).
- [54] Gelman A., Carlin J. B., Stern H. S., and Rubin D. B., Bayesian data analysis (Chapman and Hall/CRC, 1995).
- [55] Joshi M., Seidel-Morgenstern A., and Kremling A., Exploiting the bootstrap method for quantifying parameter confidence intervals in dynamical systems, Metabolic Engineering 8, 447 (2006).
- [56] Ball F., Cai Y., Kadane J., and O’Hagan A., Bayesian inference for ion–channel gating mechanisms directly from single–channel recordings, using Markov chain Monte Carlo, Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences 455, 2879 (1999).
- [57] Gin E., Falcke M., Wagner L. E., Yule D. I., and Sneyd J., Markov chain Monte Carlo fitting of single-channel data from inositol trisphosphate receptors, J. of Theoretical Biology 257, 460 (2009).
- [58] Siekmann I., Wagner L. E., Yule D., Fox C., Bryant D., Crampin E. J., and Sneyd J., MCMC Estimation of Markov Models for Ion Channels, Biophys J. 100, 1919 (2011).
- [59] Siekmann I., Sneyd J., and Crampin E. J., MCMC Can Detect Nonidentifiable Models, Biophys J. 103, 2275 (2012).
- [60] Hines K. E., A Primer on Bayesian Inference for Biophysical Systems, Biophys J. 108, 2103 (2015).
- [61] Ball F., MCMC for Ion-Channel Sojourn-Time Data: A Good Proposal, Biophys J. 111, 267 (2016).
- [62] Wieland F.-G., Hauber A. L., Rosenblatt M., Tönsing C., and Timmer J., On structural and practical identifiability, Current Opinion in Systems Biology 25, 60 (2021).
- [63] Jeffreys H., An invariant form for the prior probability in estimation problems, Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences 186, 453 (1946).
- [64] Kass R. E. and Wasserman L., Formal rules for selecting prior distributions: A review and annotated bibliography, Journal of the American Statistical Association 91, 1343 (1996).
- [65] Yang R. and Berger J. O., A catalog of noninformative priors, Vol. 2 (Institute of Statistics and Decision Sciences, Duke University Durham, NC, USA, 1996).
- [66] Consonni G., Fouskakis D., Liseo B., and Ntzoufras I., Prior distributions for objective Bayesian analysis, Bayesian Analysis 13, 627 (2018).
- [67] Jaynes E. T., Prior probabilities, IEEE Transactions on Systems Science and Cybernetics 4, 227 (1968).
- [68] Gelman A. and Rubin D. B., A single series from the Gibbs sampler provides a false sense of security, Bayesian Statistics 4, 625 (1992).
- [69] Duane S., Kennedy A. D., Pendleton B. J., and Roweth D., Hybrid Monte Carlo, Physics Letters B 195, 216 (1987).
- [70] Neal R. M., Monte Carlo implementation, Bayesian learning for neural networks, 55 (1996).
- [71] Hoffman M. D. and Gelman A., The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo, J. of Machine Learning Research 15, 1593 (2014).
- [72] Gelman A., Lee D., and Guo J., Stan: A probabilistic programming language for Bayesian inference and optimization, J. of Educational and Behavioral Statistics 40, 530 (2015).
- [73] Betancourt M., A conceptual introduction to Hamiltonian Monte Carlo (2018), arXiv:1701.02434 [stat.ME].
- [74] Fisher R. A., Theory of statistical estimation, Mathematical Proceedings of the Cambridge Philosophical Society 22, 700 (1925).
- [75] Watanabe S., Almost all learning machines are singular, in 2007 IEEE Symposium on Foundations of Computational Intelligence (IEEE, 2007) pp. 383–388.
- [76] Browning A. P., Warne D. J., Burrage K., Baker R. E., and Simpson M. J., Identifiability analysis for stochastic differential equation models in systems biology, Journal of the Royal Society Interface 17, 20200652 (2020).
- [77] Jahnke T. and Huisinga W., Solving the chemical master equation for monomolecular reaction systems analytically, J. Math. Biol. 54, 1 (2007).
- [78] Lam N. N., Docherty P. D., and Murray R., Practical identifiability of parametrised models: A review of benefits and limitations of various approaches, Mathematics and Computers in Simulation 199, 202 (2022).
- [79] Gelman A. and Yao Y., Holes in Bayesian statistics, Journal of Physics G: Nuclear and Particle Physics 48, 014002 (2020).
- [80] Nicolai C. and Sachs F., Solving ion channel kinetics with the QuB software, Biophysical Reviews and Letters 8, 191 (2013).
- [81] Trendelkamp-Schroer B. and Noé F., Efficient Bayesian estimation of Markov model transition matrices with given stationary distribution, The Journal of Chemical Physics 138, 04B612 (2013).
- [82] Assoudou S. and Essebbar B., A Bayesian model for binary Markov chains, International Journal of Mathematics and Mathematical Sciences 2004, 421 (2004).
- [83] Assoudou S. and Essebbar B., A Bayesian model for Markov chains via Jeffrey’s prior, Communications in Statistics - Theory and Methods 32, 2163 (2003).
- [84] Smoluchowski M. v., Versuch einer mathematischen Theorie der Koagulationskinetik kolloider Lösungen, Zeitschrift für physikalische Chemie 92, 129 (1918).
- [85] van Holde K., A hypothesis concerning diffusion-limited protein–ligand interactions, Biophys. Chem. 101, 249 (2002).
- [86] Newman M. E., Power laws, Pareto distributions and Zipf’s law, Contemporary Physics 46, 323 (2005).
- [87] Lape R., Colquhoun D., and Sivilotti L. G., On the nature of partial agonism in the nicotinic receptor superfamily, Nature 454, 722 (2008).
- [88] Khadra A., Yan Z., Coddou C., Tomić M., Sherman A., and Stojilkovic S. S., Gating properties of the P2X2a and P2X2b receptor channels: experiments and mathematical modeling, Journal of General Physiology 139, 333 (2012).
- [89] Koren G., Liman E. R., Logothetis D. E., Nadal-Ginard B., and Hess P., Gating mechanism of a cloned potassium channel expressed in frog oocytes and mammalian cells, Neuron 4, 39 (1990).
- [90] Zagotta W. N., Hoshi T., and Aldrich R. W., Shaker potassium channel gating. III: Evaluation of kinetic models for activation, The Journal of General Physiology 103, 321 (1994).
- [91] Silva J. and Rudy Y., Subunit interaction determines IKs participation in cardiac repolarization and repolarization reserve, Circulation 112, 1384 (2005).
- [92] Beattie K. A., Hill A. P., Bardenet R., Cui Y., Vandenberg J. I., Gavaghan D. J., de Boer T. P., and Mirams G. R., Sinusoidal voltage protocols for rapid characterisation of ion channel kinetics, The Journal of Physiology 596, 1813 (2018).
- [93] Cover T. M., Elements of information theory (John Wiley & Sons, 1999).
- [94] Nache V., Wongsamitkul N., Kusch J., Zimmer T., Schwede F., and Benndorf K., Deciphering the function of the CNGB1b subunit in olfactory CNG channels, Scientific Reports 6, 29378 (2016).
- [95] Wongsamitkul N., Nache V., Eick T., Hummert S., Schulz E., Schmauder R., Schirmeyer J., Zimmer T., and Benndorf K., Quantifying the cooperative subunit action in a multimeric membrane receptor, Scientific Reports 6, 20974 (2016).
- [96] Schirmeyer J., Hummert S., Eick T., Schulz E., Schwabe T., Ehrlich G., Kukaj T., Wiegand M., Sattler C., Schmauder R., et al., Thermodynamic profile of mutual subunit control in a heteromeric receptor, Proceedings of the National Academy of Sciences 118, e2100469118 (2021).
- [97] Tveito A., Lines G. T., Edwards A. G., and McCulloch A., Computing rates of Markov models of voltage-gated ion channels by inverting partial differential equations governing the probability density functions of the conducting and non-conducting states, Mathematical Biosciences 277, 126 (2016).
- [98] Hines K. E., Middendorf T. R., and Aldrich R. W., Determination of parameter identifiability in nonlinear biophysical models: A Bayesian approach, The J. of General Physiology 143, 401 (2014).
- [99] Milescu L. S., Akk G., and Sachs F., Maximum Likelihood Estimation of Ion Channel Kinetics from Macroscopic Currents, Biophys J. 88, 2494 (2005).
- [100] Kusch J., Biskup C., Thon S., Schulz E., Nache V., Zimmer T., Schwede F., and Benndorf K., Interdependence of Receptor Activation and Ligand Binding in HCN2 Pacemaker Channels, Neuron 67, 75 (2010).
- [101] Kreutz C., Raue A., Kaschek D., and Timmer J., Profile likelihood in systems biology, The FEBS Journal 280, 2564 (2013).
- [102] Datta G. S. and Ghosh M., On the invariance of noninformative priors, The Annals of Statistics 24, 141 (1996).
- [103] Berger J. O., Bernardo J. M., and Sun D., Overall objective priors, Bayesian Analysis 10, 189 (2015).
- [104] Gelman A., Rubin D. B., et al., Inference from iterative simulation using multiple sequences, Statistical Science 7, 457 (1992).
- [105] Brooks S. P. and Gelman A., General methods for monitoring convergence of iterative simulations, Journal of Computational and Graphical Statistics 7, 434 (1998).
- [106] Vehtari A., Gelman A., Simpson D., Carpenter B., Bürkner P.-C., et al., Rank-normalization, folding, and localization: An improved R̂ for assessing convergence of MCMC, Bayesian Analysis 16, 667 (2021).
- [107] Vats D. and Knudson C., Revisiting the Gelman–Rubin diagnostic, Statistical Science 36, 518 (2021).