This is a preprint.

It has not yet been peer reviewed by a journal.

bioRxiv [Preprint]. 2025 May 5:2024.04.20.590387. [Version 2] doi: 10.1101/2024.04.20.590387

Using Bayesian priors to overcome non-identifiability issues in Hidden Markov models

Jan L Münch 1, Ralf Schmauder 2,*, Fabian Paul 3, Michael Habeck 4
PMCID: PMC12247640  PMID: 40654915

Abstract

Hidden Markov models (HMMs) for biomolecules suffer from various forms of parameter non-identifiability. This poses severe challenges to both maximum likelihood and Bayesian inference. However, Bayesian inference offers effective means of overcoming these pathologies. We study the role of prior distributions in the face of practical parameter non-identifiability in Bayesian inference applied to prototypical patch clamp data of ligand-gated ion channels. We advocate the use of minimally informative priors, as they increase the accuracy and decrease the uncertainty of the inference. For complex HMMs, stronger prior assumptions are needed to render the posterior sufficiently proper. This can be achieved by confining the parameter space to physically motivated limits. Another beneficial assumption is finite cooperativity of ligand-binding and unbinding events, which introduces a bias towards non-cooperativity but still allows for a non-vanishing degree of cooperativity that is inferred from the data. Despite its vagueness, our prior renders the posterior sufficiently proper for all datasets that we considered without imposing the assumption of non-cooperativity. Combining all prior factors allows for meaningful inferences with a dataset of a thousand times lower quality.

I. INTRODUCTION

Time series data of various processes can be explained by continuous-time Markov Models (MM) [1]. In a biophysical context, the data could probe, e.g., the function of different proteins [2–9] or RNA folding kinetics [10, 11]. Assuming a well-mixed environment, these systems can be modeled by discrete states that interconvert via stochastic transitions, thereby defining a chemical reaction network (CRN). For example, states of a ligand-gated ion channel can be classified by conductance, dwell time, or the number of bound ligands [11–14].

Biophysical experiments typically only observe partial and noisy data such that states with similar signal properties, e.g., conductance, are aggregated into signal classes. Therefore, hidden Markov models (HMMs) must be used to describe the data [15, 16]. A common use case is the analysis of single-molecule ion channel data [17–23], but HMMs are also applied to other experiments [7–11, 24–29].

Often, the mean signal observed in single-molecule and ensemble data is a linear projection of the full Markovian dynamics onto a lower dimensional observable. This can cause the data to be insensitive to the rates of specific subprocesses within the CRN, which complicates their biophysical interpretation. We argue that the dimensionality reduction and aggregation of states, in general, induces a varying degree of practical parameter non-identifiability even for simple CRNs. Experimental noise and limited signal bandwidth only increase the severity of non-identifiability issues. Even worse, the HMM might become structurally non-identifiable [30–34].

Structural non-identifiability refers to models whose parameters cannot be inferred uniquely, even with an infinite amount of data [34]. For example, it might only be possible to infer algebraic combinations of parameters but not the parameters themselves. In contrast, practical non-identifiability is encountered when there is still a unique optimal parameter set, but it is impossible to collect enough data to reach a sufficiently low parameter uncertainty [33, 35, 36].

Whether a model is structurally or practically non-identifiable depends on the likelihood function, which can be derived from the chemical master equation (CME) in the case of MMs [37]. Here, we study a Bayesian filter [38] based on the Fokker-Planck approximation (FPa) of the CME [39], which preserves the crucial Markov property in a macroscopic signal [38]. The Bayesian filter extends the ideas of Moffat [40] to define a more general and realistic likelihood for ensemble patch clamp (PC) data. A complete HMM inference consists of both parameter estimation and model selection. In some cases, model selection for HMMs can be automated by inferring an infinite HMM [41–43]. However, to the best of our knowledge, the infinite HMM [41–43] only applies to single-molecule data analyzed by discrete-time HMMs but cannot be extended to ensemble data or continuous-time HMMs. Therefore, we assume a fixed CRN topology during parameter inference. Notably, the hidden variable of an HMM does not have to be discrete but can also be continuous [44, 45]. For example, Kalman filters [46, 47] can be used to approximate discrete HMMs [38, 40, 48–51] and define a valid HMM [44, 45].

Parameter estimation via maximum likelihood (ML) [52, 53], profile likelihood [33, 35], maximum a posteriori (MAP), and Bayesian inference [54] suffers in different ways from practical non-identifiability. Limitations in the amount and quality of data (relative to the complexity of the investigated CRN) severely impair or even prohibit ML and MAP inferences [55]. The profile likelihood technique has better uncertainty quantification than ML [33, 35], but still assumes an asymptotic amount of data. A full Bayesian inference [20, 21, 23, 41, 56–59] does not refer to an asymptotic limit in the amount of data and thus has a unique way to deal with parameter non-identifiability. In Bayesian statistics, unknown quantities are treated similarly to random variables [60, 61] in that probabilities express their uncertainty. The prior distribution encodes their uncertainty before analyzing the data. The result of a Bayesian analysis, the posterior distribution, represents the uncertainty of the unknowns in the light of the combined information encoded in the prior and the likelihood. Nevertheless, structural and practical non-identifiability also pose a challenge to Bayesian inference because these pathologies can result in improper posteriors [35, 62].

Our work focuses on the benefits and limitations of minimally informative and vaguely informative priors motivated by physical considerations in the presence of practical non-identifiability. We show that practical non-identifiability can be severely harmful when using uniform priors on the rate matrix of HMMs. In contrast, we suggest using a minimally informative prior [63–66] inspired by Jeffreys [63] and Jaynes [67]. Minimally informative priors are designed to make posteriors as sensitive to the data as possible. We observe that the minimally informative prior increases the accuracy and, surprisingly, decreases the uncertainty of parameter inferences. Notably, the minimally informative prior will significantly impact the posterior for any plausible amount of data. We explain the origin of these observations by the presence of practical non-identifiability.

Further, we demonstrate that the uniform and minimally informative priors lead to improper posteriors that cannot be normalized due to the practical non-identifiability (in an infinite parameter space [35]). Thus, the minimally informative prior only alleviates the challenges arising from practical non-identifiability of HMMs by making the posterior sufficiently proper but does not fully resolve them. A definition of what a sufficiently proper posterior means will be given below. We show that rendering the posterior sufficiently proper is the best that can be achieved in HMM inference when using minimally rather than vaguely informative priors. Notably, the same holds for ML inference. Furthermore, the minimally informative prior drastically improves the convergence [68] of the Hamiltonian Monte Carlo sampler [69–73]. Our results show that the limitations of the posterior observed in [35] are due to the use of a uniform prior.

Moving on to more complex HMMs, we show that eliminating the bias from a uniform prior does not solve non-identifiability issues. More information is needed, even if the HMM is structurally identifiable [62], for sufficiently proper posteriors. We present two techniques to achieve this. The first option is to enforce theoretically derived upper bounds, such as diffusion limits for binding rates. These restrict the regions in parameter space that contribute to a diverging normalization integral. This often renders the posterior sufficiently proper. However, hard theoretical limits are rarely available and might not apply to all parameters that suffer from non-identifiability. As an alternative or as an additional restraint, we suggest coupling each pair of binding rates and unbinding rates softly. These coupling terms bias the CRN towards non-cooperativity without enforcing it. Introducing these vaguely informative priors is more flexible than the common approach of setting parameters equal, which assumes strict non-cooperativity. The vaguely informative prior only defines the scale of plausible positive or negative logarithmic ratios, i.e., cooperativity in homomeric proteins. Hence, our approach infers how likely different degrees of cooperativity are compared to a non-cooperativity bias, which functions as Occam's razor. The most precise and accurate inferences are obtained if the minimally informative prior is combined with both additional prior assumptions. This combination of prior information allows for meaningful inferences with a thousand times poorer data quality for the most complex HMM that we studied. Notably, this reduction in the data quality that is necessary for meaningful HMM inference is crucial for CRNs of this complexity in the analysis of real-world PC data sets.

II. PARAMETER NON-IDENTIFIABILITY IN SIMPLE REACTION NETWORKS

Given time series data $\mathcal{Y}_T = \{y_1, \ldots, y_T\}$ of length $T$ and a probabilistic model in the form of a likelihood $p(\mathcal{Y}_T \mid \theta) := \Pr(\mathcal{Y}_T \mid \theta)$, the ML approach [74] infers the unknown parameters $\theta_{\mathrm{true}}$ by maximizing $p(\mathcal{Y}_T \mid \theta)$ over the parameter space $\Theta$. For models with structurally identifiable parameters, $\theta_{\mathrm{ML}}$ converges in distribution to $\theta_{\mathrm{true}}$. The quantification of the uncertainty of the ML estimate $\theta_{\mathrm{ML}}$ for models that satisfy certain regularity conditions is discussed in Sec. II B. Unfortunately, HMMs do not satisfy these regularity conditions [75]. They are singular instead of regular statistical models. We illustrate the possible consequences of singular models by the rate equation (RE) solutions of two toy kinetic models.

A. Structural parameter non-identifiability

In general, structurally non-identifiable models are characterized by submanifolds in $\Theta$ in which the likelihood is constant, even with infinitely many data [30, 62]. For the sake of argument, we only look at the RE solution from which an approximate likelihood can be derived. An example of a structurally non-identifiable model is a linear birth-death process characterized only by the mean number of bacteria $\mathbb{E}[n_{\mathrm{bakt}}(t)]$ in a well-stirred petri dish:

$$\mathbb{E}[n_{\mathrm{bakt}}(t)] = n_0 \exp(\xi t) = n_0 \exp[(k_{\mathrm{birth}} - k_{\mathrm{death}})\, t]. \tag{1}$$

Parameter pairs $\theta = (k_{\mathrm{birth}}, k_{\mathrm{death}})$ with constant difference $\xi(k_{\mathrm{birth}}, k_{\mathrm{death}}) = k_{\mathrm{birth}} - k_{\mathrm{death}}$ result in the same $\mathbb{E}[n_{\mathrm{bakt}}(t)]$. This model is structurally non-identifiable, because one cannot disentangle $k_{\mathrm{birth}}$ and $k_{\mathrm{death}}$ based on $\mathbb{E}[n_{\mathrm{bakt}}(t)]$ alone. A likelihood derived only from $\mathbb{E}[n_{\mathrm{bakt}}(t)]$ would show the symmetry $p(\mathcal{Y}_T \mid \theta) = p(\mathcal{Y}_T \mid \xi)$. This implies that the likelihood is flat, $p(\mathcal{Y}_T \mid \theta) = \mathrm{const}$, along straight lines in $\Theta$ with intercept $\xi$. Hence, the ML estimator cannot converge to a normal distribution centered at $\theta_{\mathrm{true}}$. However, there is independent information in the higher-order statistical moments that renders the linear birth-death model structurally identifiable. By incorporating the information contained in $\mathrm{var}[n_{\mathrm{bakt}}(t)]$, which can be derived from the CME [76], one obtains a structurally identifiable model. Thus, structural non-identifiability can be caused by ignoring higher statistical moments of the data-generating process. Similarly, ignoring the Markov property of equilibrium fluctuations leads to structural non-identifiability in HMM inference, as shown in [40].
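The argument around Eq. 1 can be checked numerically. The following is a minimal sketch, not part of the original analysis: the rate values are hypothetical, and the variance formula is the standard textbook result for a linear birth-death process with unequal rates, used here only to illustrate that the second moment separates parameter pairs that share the same difference $\xi$.

```python
import math

def mean_count(t, n0, k_birth, k_death):
    """Mean of the linear birth-death process (Eq. 1)."""
    return n0 * math.exp((k_birth - k_death) * t)

def variance_count(t, n0, k_birth, k_death):
    """Variance of the same process (textbook result, valid for k_birth != k_death)."""
    xi = k_birth - k_death
    growth = math.exp(xi * t)
    return n0 * (k_birth + k_death) / xi * growth * (growth - 1.0)

times = [0.0, 0.5, 1.0, 2.0]
slow = (1.0, 0.5)  # hypothetical (k_birth, k_death) with xi = 0.5
fast = (4.0, 3.5)  # different rates, same xi = 0.5

# The mean cannot tell the two parameter pairs apart ...
mean_gap = max(abs(mean_count(t, 100, *slow) - mean_count(t, 100, *fast)) for t in times)
# ... but the variance can.
var_slow = variance_count(1.0, 100, *slow)
var_fast = variance_count(1.0, 100, *fast)
print(mean_gap, var_slow, var_fast)
```

In this toy setting the two mean traces coincide exactly, while the variances differ by the ratio of $k_{\mathrm{birth}} + k_{\mathrm{death}}$ between the two pairs.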

B. Practical parameter non-identifiability

Structural identifiability is necessary, but not sufficient for successful ML inferences. For a finite amount of data, the HMM must also be practically identifiable, or, as we prefer to argue, it must be sufficiently practically identifiable. Let us clarify what we mean by this. The literature offers different definitions of practical identifiability vs. practical non-identifiability. Here, we follow the definitions of [33]. Likelihoods suffering from practical non-identifiability do not decay to zero (Fig. 1, blue curve for $\theta > \theta_{\max}$), but stretch out infinitely in regions of $\Theta$ (in one or multiple dimensions) for any finite amount of data [33]. This happens already in simple, partially observed CRNs.

FIG. 1. The severity of practical parameter non-identifiability depends on the relative height of the peak compared to the non-vanishing tails.

We sketch a one-dimensional inference problem, e.g., an unknown chemical rate $k$. The red dashed line shows the ML inference or Laplace approximation of the posterior based on a uniform prior. Note that the ML inference (using the curvature) typically underestimates the uncertainty (quicker decay of the red dashed line than the blue solid line), particularly for $k > k_{\mathrm{ML}}$. Note that a prediction of the uncertainty based on the curvature at $k_{\mathrm{ML}}$ cannot detect these shortcomings, because any function can be approximated with a second-order Taylor expansion around extreme values.

Let us assume that we can record the occupation number $n_O(t)$ of state O at any frequency and without any measurement noise such that, for the sake of argument, the additional complications arising from noisy data are avoided. As an example of a practically non-identifiable likelihood, consider a CRN with a rate $k_{BA}(L)$ that depends on the ligand concentration $L$ (or any other stimulus-dependent rate):

$$A \xrightarrow{k_{BA}(L)} B \xrightarrow{k_{OB}} O \tag{2}$$

for a finite number of channels. Assuming that the initial condition is $\mathbf{n}(t_0) = (n_A, 0, 0)$ and that the experimental readout is $y(t) := \mathbb{E}[n_O(t)]$, the part of the general solution of the RE that is experimentally accessible is

$$\mathbb{E}[n_O(t)] = \frac{k_{OB}}{k_{BA}(L) - k_{OB}}\, n_A \exp[-k_{BA}(L)\, t] - \frac{k_{BA}(L)}{k_{BA}(L) - k_{OB}}\, n_A \exp(-k_{OB}\, t) + n_A. \tag{3}$$

Note that the solution of the CME for the initial condition $\mathbf{n}(t = 0) = (n_A, 0, 0)$ is a multinomial distribution. To understand this, consider the case that only one molecule is in state A at $t = 0$. For all $t > 0$, the molecule has probability $p_O(t)$ to be in O, $p_B(t)$ to be in B and $p_A(t)$ to remain in A. If, instead, one has $n_A$ independent and identical molecules in state A at $t = 0$, then each of them individually has the same probability $p_O(t)$, $p_B(t)$ and $p_A(t)$ at $t > 0$. Hence, the distribution over all states is a multinomial distribution [77] that evolves over time. If only state O is observed, one can reduce the problem to

$$n_O(t) \sim \mathrm{binomial}(p_O(t), n_A) \quad \text{with} \quad p_O(t) = \frac{\mathbb{E}[n_O(t)]}{n_A}. \tag{4}$$

If none of the rates could be changed externally by varying $L$ such that $k_{BA}(L) = \mathrm{const}$, then we would also face structural non-identifiability. For a ligand-dependent rate $k_{BA}(L)$, we can run the experiments at different ligand concentrations $L$ and thereby overcome structural non-identifiability. However, if we measure at two concentrations $L_1$, $L_2$ such that $k_{BA}(L_1)$, $k_{BA}(L_2)$ and $k_{OB}$ are all different, but similar in magnitude, then we can still face a practical non-identifiability problem.

Practical parameter non-identifiability originates from the following phenomenon: Any combination of values for $k_{BA}(L_j)$ and $k_{OB}$ that satisfies

$$k_{BA}(L_j) \ll k_{OB} \quad \text{or} \quad k_{OB} \ll k_{BA}(L_j) \tag{5}$$

will push the amplitude of one of the two exponential decays (Eq. 3) to zero. Even if experiments were run at many different ligand concentrations $L_j$, we could still find combinations of $k_{BA}$ and $k_{OB}$ that satisfy one of the conditions in Eq. 5 for all $L_j$. In regions of $\Theta$ where one of the conditions (Eq. 5) holds for all $L_j$, minor changes in $k_{BA}$ or $k_{OB}$ will hardly affect $\mathbb{E}[n_O(t)]$. Note that the correct solution of the CME is a multinomial distribution given the initial conditions. Adding information about the entire distribution of $n_O(t)$, such as variance or skewness, will not resolve the practical non-identifiability problem caused by the vanishing amplitude of one of the exponential decays. However, the multinomial model would improve the accuracy and quality of the uncertainty quantification.
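The insensitivity described by Eq. 5 can be made concrete by evaluating Eq. 3 directly. The sketch below uses hypothetical rate values and $n_A = 1$; it compares how much the mean trace changes when $k_{OB}$ is varied near $k_{BA}$ and when it is varied tenfold deep inside the regime $k_{BA} \ll k_{OB}$.

```python
import math

def expected_open(t, k_ba, k_ob, n_a=1.0):
    """Eq. 3: mean occupancy of O for A -> B -> O with n_a molecules starting in A."""
    d = k_ba - k_ob  # the closed form assumes k_ba != k_ob
    return n_a * (1.0 + (k_ob * math.exp(-k_ba * t) - k_ba * math.exp(-k_ob * t)) / d)

def max_gap(k_ob_1, k_ob_2, k_ba=1.0):
    """Largest difference between two mean traces on a fixed time grid."""
    times = [0.1 * i for i in range(1, 51)]
    return max(
        abs(expected_open(t, k_ba, k_ob_1) - expected_open(t, k_ba, k_ob_2))
        for t in times
    )

gap_near = max_gap(2.0, 3.0)   # rates of similar magnitude: clearly distinguishable
gap_far = max_gap(1e3, 1e4)    # k_ba << k_ob: a tenfold change is almost invisible
print(gap_near, gap_far)
```

A 1.5-fold change of $k_{OB}$ near $k_{BA}$ shifts the trace far more than a tenfold change in the rate-limited regime, which is the origin of the flat likelihood tail.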

This example is reminiscent of the common scenario for coupled CRNs in which only $k_j$ can be inferred if $k_{i,\mathrm{true}} \gg k_{j,\mathrm{true}}$. However, here we do not consider a scenario in which the signal of the data-generating process is rate-limited. Instead, we assume that at least three different rates are at play, $k_{BA}(L_1)$, $k_{BA}(L_2)$, and $k_{OB}$, that have similar magnitude but non-identical values. Therefore, rate-limiting contributions do not exist at least for one combination of $k_{BA}(L_i)$ and $k_{OB}$ in the data-generating process. Nevertheless, there are regions in parameter space $\Theta$, far away from the true parameter values $\theta_{\mathrm{true}}$, in which one or the other rate is rate-limiting for the predictions of the model. The structure of the CRN, together with the fact that the signal is generated by a linear projection, gives the model the potential to be rate-limited somewhere in $\Theta$, independent of the true parameters $\theta_{\mathrm{true}}$ of the data-generating process. Thus, for models such as $n_O(t) \sim \mathrm{binomial}(p_O(t), n_A(0))$, the likelihood will approach a non-vanishing constant due to rate-limiting effects in regions where $k_i \gg k_j$, and the model hence becomes practically non-identifiable. See App. A for a discussion of the effects if states A, B are observed in isolation or simultaneously.

Structural identifiability is a binary property (a model is either structurally identifiable or not), whereas practical identifiability is gradual (continuous) [78]. The likelihood function (Fig. 1, blue curve), which is proportional to the posterior $p(k \mid \mathcal{Y}_T)$ (for a uniform prior), indicates the continuous nature of practical non-identifiability. The maximum value of $p(\mathcal{Y}_T \mid k)$ relative to the constant value in the tails specifies the degree of parameter identifiability. However, to classify models into practically identifiable or practically non-identifiable, one uses the confidence interval CI based on confidence level $\zeta_\alpha$:

$$\mathrm{CI} = \left\{\theta \in \Theta \;\middle|\; \log\frac{p(\mathcal{Y}_T \mid \theta_{\mathrm{ML}})}{p(\mathcal{Y}_T \mid \theta)} < \zeta_\alpha\right\}. \tag{6}$$

One defines the model as practically identifiable if the confidence interval CI is a compact set. This holds if the subjectively defined threshold $\zeta_\alpha$ (Fig. 1) [76] is larger than the asymptotic value of the likelihood $p(\mathcal{Y}_T \mid \theta_{\mathrm{flat}})$ (Fig. 1, black dashed line). For multi-parametric models, there will be multiple asymptotic values for the different directions in $\Theta$. The interval $[k_{\min}, k_{\max}]$ where the dashed black lines cross the likelihood profile (blue lines) (Fig. 1) is the largest asymmetric confidence interval that can be deduced by using the profile likelihood technique [33]. The data contain no information to distinguish values of $k$ larger than $k_{\max}$. However, the data are informative for values $k < k_{\max}$ and even $k < k_{\min}$. Often, the profile of $p(\mathcal{Y}_T \mid \theta)$ might approach a non-vanishing constant value only asymptotically. Subjectively defining a threshold relative to the maximum value of the likelihood [33] is equivalent to choosing a significance level []. Note that ML does not infer the shape of $p(\mathcal{Y}_T \mid \theta)$ globally (Fig. 1, red dashed curve). ML estimates the shape based on the curvature of the likelihood at $\theta_{\mathrm{ML}}$ using the Fréchet-Darmois-Cramér-Rao bound theorem. Therefore, standard ML does not detect practical non-identifiability (Fig. 1, red dashed curve). The degree to which practical non-identifiability affects the parameter inference depends, as discussed, on the intrinsic properties of the MM (i.e., the number of states and their connectivity) and on the specifics of the experimental data, such as the rank of the linear projection, the signal-to-noise ratio and the signal bandwidth.
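The compactness criterion for the confidence set of Eq. 6 can be sketched on a grid. Everything in the following example is illustrative: the toy log-likelihood (a peak at $\log_{10} k = 0$ with a plateau of relative height `psi` for large $k$, mimicking the blue curve of Fig. 1), the grid limits, and the threshold of 1.92, which corresponds to the conventional 95% level of a likelihood-ratio test with one degree of freedom.

```python
import math

def toy_log_likelihood(log10_k, psi, width=0.2):
    """Toy profile: Gaussian peak at log10(k) = 0, plateau of height psi for large k."""
    peak = math.exp(-0.5 * (log10_k / width) ** 2)
    if log10_k <= 0.0:
        return -0.5 * (log10_k / width) ** 2
    return math.log(psi + (1.0 - psi) * peak)

def confidence_set(grid, log_lik, zeta):
    """Grid points whose log-likelihood ratio to the maximum stays below zeta (Eq. 6)."""
    best = max(log_lik)
    return [x for x, ll in zip(grid, log_lik) if best - ll < zeta]

def is_compact_on_grid(grid, members):
    """The set counts as compact only if it does not touch either grid boundary."""
    return members[0] > grid[0] and members[-1] < grid[-1]

grid = [-3.0 + 0.01 * i for i in range(901)]  # log10(k) from -3 to 6
zeta = 1.92  # assumed threshold, not taken from the paper

sharp = confidence_set(grid, [toy_log_likelihood(x, 1e-3) for x in grid], zeta)
flat = confidence_set(grid, [toy_log_likelihood(x, 0.5) for x in grid], zeta)
print(is_compact_on_grid(grid, sharp), is_compact_on_grid(grid, flat))
```

With a low plateau the set is a bounded interval around the peak; with a high plateau it runs into the upper grid limit, which signals practical non-identifiability at this threshold.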

Due to the challenges indicated by the two toy examples, HMM inference based on ML has several drawbacks compared to sampling from the posterior $p(\theta \mid \mathcal{Y}_T)$. First, one must take extra precautions against pathologies of $p(\mathcal{Y}_T \mid \theta)$ resulting in structural or practical non-identifiability, because ML does not reveal them [33]. Second, even if the model is structurally identifiable and sufficiently practically identifiable, the quality and quantity of the data are often insufficient to meet the implicit assumption that $p(\theta_{\mathrm{ML}})$ can be approximated by a normal distribution, which is a requirement to justify the use of ML. For a comment on strategies to detect parameter non-identifiability see App. 1. Fortunately, Bayesian statistics can deal with these pathologies of the likelihood. Nevertheless, structural and practical non-identifiability pose a challenge. They create regions in $\Theta$ where the prior entirely dominates the likelihood.

III. BAYESIAN INFERENCE IN A NUTSHELL

The Bayesian posterior

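$$p(\theta \mid \mathcal{Y}_T) := \Pr(\theta \mid \mathcal{Y}_T) = \frac{p(\mathcal{Y}_T \mid \theta)\, p(\theta)}{\int p(\mathcal{Y}_T \mid \theta)\, p(\theta)\, d\theta} \tag{7}$$

is a probability distribution on $\Theta$ and combines the information encoded in the prior $p(\theta) := \Pr(\theta)$ and the likelihood to quantify the uncertainty of $\theta$. The posterior is called “proper” if it can be normalized, which means the denominator in Eq. 7 satisfies

$$\int p(\mathcal{Y}_T \mid \theta)\, p(\theta)\, d\theta < \infty. \tag{8}$$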

We introduce the terminology sufficiently proper to clarify the notion of practical parameter non-identifiability. Practical non-identifiability in combination with minimally informative priors, which are often improper, may result in posteriors that are improper in a strict sense. The blue curve in Fig. 1 illustrates this for a one-dimensional case and an improper uniform prior. The essential information that makes the posterior proper (Fig. 1) is contained in the inconspicuous cutoff values of the uniform prior.

The higher the posterior peak, the less sensitive the inference is to the actual values of the cutoff. Thus, we define the ratio of the non-vanishing asymptotic value of the posterior (based on a uniform prior) to its height at the MAP estimate as

$$\psi := \sup_{\theta \to \infty} \frac{p(\theta \mid \mathcal{Y}_T)}{p(\theta_{\mathrm{MAP}} \mid \mathcal{Y}_T)} = \sup_{\theta \to \infty} \frac{p(\mathcal{Y}_T \mid \theta)}{p(\mathcal{Y}_T \mid \theta_{\mathrm{MAP}})} < 1. \tag{9}$$

One could subjectively define the posterior to be sufficiently proper if $\psi \le 10^{-3}$, say, meaning that the peak of the posterior/likelihood towers by at least three orders of magnitude over the flat parts of the likelihood (in all unbounded directions in $\Theta$). For $\psi = 0$, the posterior is sufficiently proper and might even be strictly proper. For non-uniform priors and one-dimensional inferences, flat parts of the likelihood reveal themselves by a posterior that is proportional to the prior in that area.
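As a rough illustration of the ratio $\psi$, one can evaluate the binomial likelihood of Eq. 4 for the toy network of Eq. 2 along $k_{OB}$. The sketch below rests on several assumptions that are not in the original text: $k_{BA}$ is held at its true value, the "data" are the rounded expected counts at four hypothetical time points, the asymptote is approximated by evaluating the likelihood at $k_{OB} = 10^6$, and the binomial coefficient is dropped because it cancels in the ratio.

```python
import math

def p_open(t, k_ba, k_ob):
    """Open probability p_O(t) = E[n_O(t)] / n_A from Eq. 3."""
    d = k_ba - k_ob
    if abs(d) < 1e-12:  # degenerate limit k_ba == k_ob
        return 1.0 - (1.0 + k_ba * t) * math.exp(-k_ba * t)
    return 1.0 + (k_ob * math.exp(-k_ba * t) - k_ba * math.exp(-k_ob * t)) / d

def log_lik(k_ob, counts, times, n_a, k_ba):
    """Binomial log-likelihood (Eq. 4) without the parameter-independent coefficient."""
    total = 0.0
    for n, t in zip(counts, times):
        p = p_open(t, k_ba, k_ob)
        total += n * math.log(p) + (n_a - n) * math.log(1.0 - p)
    return total

def psi(n_a, times=(0.5, 1.0, 2.0, 4.0), k_ba=1.0, k_ob_true=2.0):
    """Ratio of the likelihood plateau (k_ob -> large) to its maximum on a grid."""
    counts = [round(n_a * p_open(t, k_ba, k_ob_true)) for t in times]
    grid = [10 ** (-1 + 0.05 * i) for i in range(61)]  # k_ob from 0.1 to 100
    peak = max(log_lik(k, counts, times, n_a, k_ba) for k in grid)
    tail = log_lik(1e6, counts, times, n_a, k_ba)
    return math.exp(tail - peak)

psi_few = psi(n_a=20)     # few channels: plateau is a sizeable fraction of the peak
psi_many = psi(n_a=200)   # more channels: plateau becomes negligible
print(psi_few, psi_many)
```

The plateau never vanishes for finite data, but its height relative to the peak shrinks rapidly as the number of channels grows.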

In multi-parameter inference problems with non-uniform priors the situation is in general more complicated. The likelihood could approach a non-vanishing constant value on some unbounded subset $\Theta_{\mathrm{flat}} \subset \Theta$ of any shape. However, we will encounter the simpler case that marginal posteriors are proportional to the prior in certain areas. This can be explained by assuming that the likelihood is flat for an unbounded set in the direction of parameter $\theta_i$. Let us denote the parameters without $\theta_i$ by $\theta_{\setminus i}$ and also assume that the prior factorizes: $p(\theta) = p(\theta_i)\, p(\theta_{\setminus i})$. Then

$$p(\theta_i \mid \mathcal{Y}_T) \propto \int_{\Theta_{\mathrm{flat}}} p(\mathcal{Y}_T \mid \theta)\, p(\theta)\, d\theta_{\setminus i} = p(\theta_i) \int_{\Theta_{\mathrm{flat}}} p(\mathcal{Y}_T \mid \theta_{\setminus i})\, p(\theta_{\setminus i})\, d\theta_{\setminus i} \propto p(\theta_i) \tag{10}$$

holds for the marginal posterior locally. So changes in the posterior are proportional to changes in the prior along $\theta_i$ in that region of the parameter space. If $\Theta_{\mathrm{flat}}$ is more complicated (any differentiable curve or hyperplane), then one needs to check whether changes in the posterior are proportional to changes in the prior when moving within $\Theta_{\mathrm{flat}}$. From a practical perspective, we call a posterior sufficiently proper if $\psi$ (for a posterior based on a uniform prior) is small enough such that sampling from the posterior of interest (based on the minimally informative prior) is insensitive to moderate changes in the limits of the sampling box. Only if these limits are increased by orders of magnitude will the posterior be affected in its lower-order statistical moments. We will see that minimally informative priors, while sensitizing the posterior to the data, desensitize the resulting posterior to the exact limits of the sampling box and thereby render posterior sampling more robust. That way, it will become possible to analyze a dataset with, e.g., $\psi = 0.05$, if we use a minimally informative prior. We will also demonstrate that parameter inference can be improved further if we include additional information via the prior.
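The sensitivity to the limits of the sampling box can be illustrated with a one-dimensional toy likelihood. This sketch is an assumption-laden illustration rather than the paper's benchmark: the likelihood shape, the plateau height $\psi = 0.05$, the cut at $k = 10$ that defines the "tail", and the box limits are all hypothetical. It compares the posterior mass in the tail under a uniform prior in $k$ with that under a log-uniform prior ($\propto 1/k$).

```python
import math

def toy_likelihood(k, psi=0.05, width=0.5):
    """Peak at k = 1 and a plateau of relative height psi for large k (toy shape)."""
    x = math.log(k)
    peak = math.exp(-0.5 * (x / width) ** 2)
    return peak if x <= 0.0 else psi + (1.0 - psi) * peak

def tail_fraction(k_max, prior, k_min=1e-3, k_cut=10.0, steps=4000):
    """Posterior mass above k_cut inside the box [k_min, k_max] (trapezoid rule in ln k)."""
    lo, hi = math.log(k_min), math.log(k_max)
    h = (hi - lo) / steps
    total = tail = 0.0
    for i in range(steps + 1):
        k = math.exp(lo + i * h)
        weight = h * (0.5 if i in (0, steps) else 1.0)
        # dk = k d(ln k): a uniform prior in k carries an extra factor k,
        # whereas the log-uniform prior 1/k cancels it.
        density = toy_likelihood(k) * (k if prior == "uniform" else 1.0)
        total += weight * density
        if k > k_cut:
            tail += weight * density
    return tail / total

for k_max in (1e2, 1e4):
    print(k_max, tail_fraction(k_max, "uniform"), tail_fraction(k_max, "log-uniform"))
```

Under the uniform prior the tail mass grows linearly with the upper limit and quickly dominates; under the log-uniform prior it grows only logarithmically, so the box has to be enlarged by orders of magnitude before the posterior changes noticeably.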

We will also use a simpler definition of sufficiently proper based on posterior samples, namely if the posterior mode carries most of the probability mass such that samples from the tail region hardly reach the limits of the sampling box. This indicates that probability mass in the tail regions is negligible relative to the probability mass under the posterior mode. Note that the density is only well-defined because we refer to a finite volume in the parameter space, otherwise the posterior is improper.

We refer to an inference as fully Bayesian if Eq. 7 is calculated or sampled

$$\theta \sim p(\theta \mid \mathcal{Y}_T) \propto p(\mathcal{Y}_T \mid \theta)\, p(\theta). \tag{11}$$

We use Hamiltonian Monte Carlo (HMC) [69, 70, 73] as provided by the Stan software [71, 72] to generate samples from $p(\theta \mid \mathcal{Y}_T)$. In addition to the covariance matrix of the parameters, $p(\theta \mid \mathcal{Y}_T)$ allows the calculation of the credibility volume in order to assess parameter uncertainty. The smallest volume $V_P$ that encloses a probability mass $P \in [0, 1]$

$$P = \int_{V_P} p(\theta \mid \mathcal{Y}_T)\, d\theta \tag{12}$$

is called the Highest Density Credibility Volume (HDCV). Assuming that the model sufficiently captures the data generating process, the true parameter values θtrue will lie in the HDCV with a probability P as soon as the likelihood dominates the prior.
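In one dimension and for a unimodal posterior, the HDCV reduces to the shortest interval that contains a fraction $P$ of the posterior samples. The following minimal sketch estimates it from samples; the log-normal draws are only a stand-in for the output of a sampler such as Stan, and the function name is hypothetical.

```python
import math
import random

def highest_density_interval(samples, mass=0.95):
    """Shortest interval containing a fraction `mass` of the samples (1D, unimodal case)."""
    xs = sorted(samples)
    n = len(xs)
    m = max(1, math.ceil(mass * n))  # number of samples that must lie inside
    start = min(range(n - m + 1), key=lambda i: xs[i + m - 1] - xs[i])
    return xs[start], xs[start + m - 1]

random.seed(0)
# Stand-in for posterior samples of a rate: skewed, strictly positive.
samples = [random.lognormvariate(0.0, 1.0) for _ in range(20000)]
lo, hi = highest_density_interval(samples, 0.95)
print(lo, hi)
```

For skewed posteriors, as are typical for rates, this interval is shorter than the equal-tailed interval with the same probability mass.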

Bayesian inference is conditional on the assumed prior and likelihood [79]. Altering $p(\mathcal{Y}_T \mid \theta)$ or $p(\theta)$ changes $p(\theta \mid \mathcal{Y}_T)$. The prior becomes irrelevant only in the infinite data limit (and only for regular models), meaning that ML and Bayesian inference become equivalent [54]. In case of practically non-identifiable models, Bayesian inference has at least two advantages over ML. First, by scrutinizing $p(\theta \mid \mathcal{Y}_T)$ in detail, issues with structural or practical non-identifiability are revealed. Second, the introduction of priors can alleviate non-identifiability problems.

If little is known a priori about reasonable parameter values and the data constrain some parameters only vaguely, the use of a minimally informative prior is essential. It attempts to make $p(\theta \mid \mathcal{Y}_T)$ as sensitive to the data as possible. Typically, a minimally informative prior maximizes the variance of the posterior $p(\theta \mid \mathcal{Y}_T)$. In contrast, we show that the minimally informative priors introduced below help confine $p(\theta \mid \mathcal{Y}_T)$ within reasonable boundaries. Thus, they reduce the variance of the posterior compared to uniform priors. However, minimally informative priors themselves are often improper, which might also render $p(\theta \mid \mathcal{Y}_T)$ improper if $p(\mathcal{Y}_T \mid \theta)$ is practically or even structurally non-identifiable. Thus, the posterior $p(\theta \mid \mathcal{Y}_T)$ will be dominated by the prior $p(\theta)$ in regions of $\Theta$ where data fail to inform us about the parameters. Fortunately, Bayesian statistics provides us with tools to render $p(\theta \mid \mathcal{Y}_T)$ sufficiently proper, such as theoretically derived upper limits on parameters or vaguely informative assumptions about cooperativity incorporated in $p(\theta)$. The benefits and limits of combinations of minimally and vaguely informative priors in the presence of practical non-identifiability will be discussed for two CRNs (Fig. 2).

FIG. 2. Chemical reaction networks of ligand-gated ion channels and their simulated patch-clamp data.

a, c The MMs (one MM per column) consist of three or five closed states “$C_j$” and one open state “$O_j$”. Binding steps (red arrows) have concentration-dependent rates. The CRNs are specified by the absolute rate constants $k_{ij}$ and the ligand concentration $L$. Rates are given per subunit. Stoichiometry factors account for the number of subunits able to undergo the respective transitions. The units of the rates are in a. u. To calculate their SI units s$^{-1}$ and μM$^{-1}$s$^{-1}$, one needs to multiply their value by 6/7 (Sec. V C 4). The open states $O_j$ conduct a mean single-channel current $i = 1$, and the closed states $C_j$ conduct a current of $i = 0$. The synthetic data were simulated with the Gillespie algorithm at a sampling rate of 10 ka.u. The Bayesian filter analysis frequency $f_{\mathrm{ana}}$ is 2 to 5 ka.u. Since the units are in a. u., the ratios of rates or inverse dwell times determine the CRN. Further, their ratio to the sampling frequencies determines how detailed the kinetics are recorded. Similarly, the relative magnitude of $i$ compared with $\sigma_{\mathrm{op}}$ and $\sigma_{\mathrm{ex}}$ should be used to relate simulations to experimental conditions. b, d Open probability time traces calculated from normalized currents, $P_o(t) = y_I / (N_{\mathrm{ch}}\, i)$, of simulated relaxation experiments of ligand concentration jumps with $N_{\mathrm{ch}} = 10^3$ channels. For demonstration purposes, no experimental noise is added in this figure such that all fluctuations originate from Markov state transitions. However, when inferring the posteriors, additional experimental noise is added. The black lines are the theoretical open probabilities $P_o(t)$ of the model. Typically, we used the set {0.0625, 0.125, 0.25, 0.5, 1, 2, 4, 8, 16, 64} μM of 10 ligand concentrations.

IV. PARAMETRIZATION OF THE RATE MATRIX

In the following, we will analyze patch-clamp data that are simulated with MMs involving only mono-molecular or pseudo-monomolecular chemical reactions such as conformation dynamics of a protein or binding/unbinding transitions at excess ligand. If the CRN describes a single molecule, it can only be in one of M Markov states at time t:

$$S := \{(1, 0, \ldots, 0), \ldots, (0, \ldots, 0, 1)\}, \tag{13}$$

where we use a one-hot encoding of states. If $s(t) = e_i$, where $\{e_i\}_{i=1}^{M}$ denotes the standard basis, then the channel is in state $i$ at time $t$. State-to-state transitions are governed by a transition matrix $T$ in discrete time,

$$T_{ij} := \Pr[s(t + \Delta t) = e_i \mid s(t) = e_j] = [\exp(\Delta t\, K)]_{ij} \tag{14}$$

with time increment Δt, or with a rate matrix K in continuous time:

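$$K = \begin{pmatrix} -\sum_{i \neq 1} k_{i1} & k_{12} & \cdots & k_{1M} \\ k_{21} & -\sum_{i \neq 2} k_{i2} & \cdots & k_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ k_{M1} & k_{M2} & \cdots & -\sum_{i \neq M} k_{iM} \end{pmatrix} \tag{15}$$

where $k_{ij} \ge 0$ for $i \neq j$. By definition, each column of $K$ sums to zero, reflecting the assumption that the CRN is closed. The dwell times in the $i$-th state are exponentially distributed with mean $\tau_i = \left(\sum_{j \neq i} k_{ji}\right)^{-1}$. To facilitate the definition of a minimally informative prior in the next section, we use an alternative parameterization of $K$ that does not involve the chemical rates $k_{ij}$: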
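The properties of $K$ and $T$ stated around Eqs. 14 and 15 can be verified on a small example. The sketch below is illustrative only: the three-state network and its rate values are hypothetical, and the matrix exponential is a simple scaling-and-squaring Taylor scheme written with the standard library instead of a call to a numerical package.

```python
def build_rate_matrix(rates, n_states):
    """rates[(i, j)] = k_ij, the rate from state j to state i (convention of Eq. 15)."""
    K = [[0.0] * n_states for _ in range(n_states)]
    for (i, j), k in rates.items():
        K[i][j] = k
    for j in range(n_states):
        K[j][j] = -sum(K[i][j] for i in range(n_states) if i != j)
    return K

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def expm(A, squarings=10, terms=12):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    n = len(A)
    scaled = [[a / 2 ** squarings for a in row] for row in A]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms + 1):
        term = [[x / m for x in row] for row in matmul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result

# Hypothetical 3-state chain 0 <-> 1 <-> 2 with made-up rates (a. u.).
rates = {(1, 0): 2.0, (0, 1): 1.0, (2, 1): 3.0, (1, 2): 0.5}
K = build_rate_matrix(rates, 3)
dwell_times = [-1.0 / K[i][i] for i in range(3)]   # tau_i = 1 / sum_{j != i} k_ji
dt = 0.1
T = expm([[dt * k for k in row] for row in K])     # Eq. 14

print([sum(K[i][j] for i in range(3)) for j in range(3)])  # column sums of K
print([sum(T[i][j] for i in range(3)) for j in range(3)])  # column sums of T
print(dwell_times)
```

Each column of $K$ sums to zero and each column of $T$ sums to one, which expresses that probability is conserved in a closed CRN.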

$$K = \begin{pmatrix} -1/\tau_1 & \epsilon_{12}/\tau_2 & \cdots & \epsilon_{1M}/\tau_M \\ \epsilon_{21}/\tau_1 & -1/\tau_2 & \cdots & \epsilon_{2M}/\tau_M \\ \vdots & \vdots & \ddots & \vdots \\ \epsilon_{M1}/\tau_1 & \epsilon_{M2}/\tau_2 & \cdots & -1/\tau_M \end{pmatrix}. \tag{16}$$

The parameters $\epsilon_{ij}$ denote the probability of transitioning from state $j$ to state $i$ after the random dwell time has passed. Thus, each chemical rate $k_{ji} = \tau_i^{-1} \epsilon_{ji}$ is the product of a probability with the inverse mean dwell time. The parameters $\epsilon_{ji}$ have no units, unless the statistical weight corresponds to a ligand-dependent $k_{ji}$. Since $\sum_j \epsilon_{ji} = 1$, both parameterizations of $K$ have the same number of free parameters. For each column, we separated the inverse time-like scale parameters $\tau_i^{-1}$ from the probabilities $\epsilon_{ji}$, which are shape parameters. Because transition probabilities are constrained to $0 \le \epsilon_{ji} \le 1$, the likelihood remains finite for all $\epsilon_{ji}$ and $p(K \mid \mathcal{Y}_T)$ will be proper in these parameters (as long as Haldane-like priors beta(0,0) are excluded for $\epsilon_{ij}$). Then, only the dwell time parameters $\tau_i$ can render the HMM practically non-identifiable.
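The change of parameterization from rates to dwell times and transition probabilities (Eq. 16) can be written as a pair of conversions. This is a minimal sketch with hypothetical function names and a made-up rate matrix; it assumes that no state is absorbing, so that every diagonal entry of $K$ is strictly negative.

```python
def to_dwell_and_probs(K):
    """Split K into mean dwell times tau_i and transition probabilities eps[j][i] = eps_ji."""
    n = len(K)
    tau = [-1.0 / K[i][i] for i in range(n)]  # assumes K[i][i] < 0 (no absorbing state)
    eps = [[K[j][i] * tau[i] if j != i else 0.0 for i in range(n)] for j in range(n)]
    return tau, eps

def to_rate_matrix(tau, eps):
    """Rebuild K from (tau, eps) following Eq. 16: k_ji = eps_ji / tau_i."""
    n = len(tau)
    return [[-1.0 / tau[i] if j == i else eps[j][i] / tau[i] for i in range(n)]
            for j in range(n)]

# Hypothetical 3-state rate matrix whose columns sum to zero (rates in a. u.).
K = [[-2.0, 1.0, 0.0],
     [2.0, -4.0, 0.5],
     [0.0, 3.0, -0.5]]

tau, eps = to_dwell_and_probs(K)
K_back = to_rate_matrix(tau, eps)
print(tau)
print([sum(eps[j][i] for j in range(3)) for i in range(3)])  # each column sums to 1
```

The round trip reproduces $K$, and the probabilities in each column sum to one, so the number of free parameters is unchanged while the scale parameters $\tau_i$ are separated from the shape parameters $\epsilon_{ji}$.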

Figure 2 shows the time traces of plausible CRNs of two ligand-gated ion channels that have two binding pockets. These can be simulated with QuB [80] or an in-house algorithm https://cloudhsm.it-dlz.de/s/QB2pQQ7ycMXEitE. The assumptions made to define a likelihood for these data are detailed in [38]. In App. B, we discuss the global sensitivity of the solution of the RE for CRN1 (Fig. 2 a) and demonstrate practical non-identifiability of the likelihood (App. B 3) even for over-optimistic data and strong prior knowledge (only a single rate constant is unknown).

V. DEFINING AND BENCHMARKING THE MINIMALLY INFORMATIVE PRIOR

The following section benchmarks the performance of the Bayesian filter for different combinations of minimally informative priors and physically motivated vaguely informative priors, for cases where the information content of the data is low relative to the number of parameters of the CRN. We first compare a uniform prior on the rate matrix p(K) with a minimally informative prior defined below (Eq. 20), which promises to be less biased (less misinformed). The reason is that the practical non-identifiability of the likelihood (App. B 1) is aggravated when using a uniform p(K). In an unfortunate combination, a uniform prior p(K) places the probability mass where the likelihood becomes less and less pronounced and reaches a constant finite value (App. B 1). With a minimally informative prior, however, one can drastically reduce the severity of this problem. But one should be aware of the limitations of both priors. See App. C 1 for a brief biophysical example for the problem of different parametrizations of statistical models and prior distributions, which gave rise to the following Eq. 17.

A. Definition of minimally informative (MI) priors by approximating Jeffreys's rule

We will use a revised version of Jeffreys's rule [64] to define the minimally informative prior, which treats location, scale and shape parameters independently:

$$p(\theta)=p(\mu,\tau,\epsilon)\propto\sqrt{\det F(\epsilon)}\,\prod_i\frac{1}{\tau_i}. \tag{17}$$

The location parameters, μ, such as the mean value of the normal distribution, are assigned uniform priors. Each scaling parameter, τ, has a log-uniform prior. Only the shape parameters ϵ are treated conjointly by evaluating the Fisher matrix,

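$$F_{ij}(\epsilon)=-E\left[\frac{\partial^2}{\partial\epsilon_i\,\partial\epsilon_j}\ln p(\mathcal{Y}\mid\theta)\right], \tag{18}$$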

according to Jeffreys's rule [64]. The separate treatment of location, scale and shape parameters can be applied to the parameterization of K used here (Eq. 16). For a brief introduction of Eq. 17, see App. C 2. In addition, we simplify Eq. 17 by assuming that det[F(ϵ)] can be evaluated for each column of K (Eq. 16) independently. In that way, we obtain closed-form solutions of det[F(ϵ)] derived from simpler statistical models for the remaining ϵ of each column of K.

B. Minimally informative prior for the rate matrix inspired by Jeffreys's rule

We use a simplification of Eq. 17 that is common practice [11] for complex multi-parameter MMs. The priors used to infer MMs from MD simulations are constructed by applying Jeffreys's rule to simpler statistical models [11] instead of applying it to the entire model. Bayesian estimation of T (Eq. 14) from MD simulations [81] often uses one Dirichlet prior per column $T_{i,:}$ of the transition matrix:

$$\mathrm{dir}(T_{i,:})=\frac{1}{B(\alpha)}\prod_{j=1}^{M}T_{ij}^{\alpha_i-1}. \tag{19}$$

However, the Jeffreys prior for T is not a product of Dirichlet distributions [82, 83]. Also, for HMMs of single-molecule force spectroscopy data [10], products of Dirichlet priors are used for T.

Here, we do not sample T but K (Eq. 14) because, in contrast to sampling T, it is trivial to incorporate information about the scaling of the binding rates at different ligand concentrations in a direct parameterization of K. The same applies to additional prior information on maximal binding rates. Note that, with one exception, we use the parameterization of Eq. 16 to define p(K). The exception will be discussed later when we add the information about theoretical upper diffusion limits on the binding rates $k_{21}$ and $k_{32}$ (CRN1 and CRN2) and $k_{45}$ (CRN2). One can mix any $k_{ji}$, $\tau_i$ and $\epsilon_{ji}$ in the parameterization as long as the prior remains equivalent, i.e., as long as $p(\hat\theta)\,d\hat\theta=p(\theta)\,d\theta$ holds, with $\hat\theta$ indicating a different parameterization. As the mean dwell times $\tau_i$ are scaling parameters, we use a log-uniform prior following the arguments above. We use the Dirichlet prior for the probabilities $\epsilon_{ji}$, which is the default prior for probability vectors. These probabilities should not be confused with the transition probabilities stored in T. Applying Eq. C3 to a multinomial likelihood results in a Dirichlet prior with $\alpha_j=0.5$ for all $j\in\{1,\dots,W_i\}$ parameters. The value of $W_i$ is the number of Markov transitions leaving the i-th state. For a physically meaningful CRN, K tends to be sparse, e.g., ligand binding and channel gating do not occur at the exact same instant of time. Thus, for the i-th Markov state, the number of allowed transitions will usually satisfy $W_i<M$, where M is the total number of states. The set $\Omega_i\subseteq\{1,\dots,M\}\setminus\{i\}$ contains the indices of all states that can be reached by one Markov transition leaving the i-th state. Then, using Eq. 17 and evaluating the det[F(ϵ)] factor for each column of K individually, we obtain

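$$p(K)=\prod_i^M\text{log-uni}\left(\tau_i^{-1}\right)\,\mathrm{dir}_i\left(\epsilon_i\mid\alpha\right)=\prod_i^M\frac{1}{\tau_i^{-1}\log\left(b_i/a_i\right)}\,\frac{1}{B(\alpha_i)}\prod_{j\in\Omega_i}\epsilon_{ji}^{\alpha_{ji}-1}, \tag{20}$$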

where α is the vector of concentration parameters and B(α) is the multivariate beta function evaluated at α. The $a_i$ and $b_i$ are the lower and upper limits of the log-uniform distributions. The topology of the CRN is controlled by the Dirichlet distribution. See App. D for an illustration of p(K) for a state with two or three leaving transitions.
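To make the structure of Eq. 20 concrete, the following minimal sketch draws one rate matrix from a prior of this form: a log-uniform draw of $\tau_i^{-1}$ and a Dirichlet draw with $\alpha=0.5$ over the allowed transitions for each column. The function name, the sampling limits and the linear four-state topology are assumptions made for illustration; they are not taken from the sampler used in this work.

```python
import numpy as np

def sample_prior_K(allowed, a, b, rng, alpha=0.5):
    """Draw one rate matrix K from a prior of the form of Eq. 20.

    allowed[j, i] is True if the transition i -> j exists (j in Omega_i).
    a[i], b[i] are the lower/upper limits of the log-uniform prior on 1/tau_i.
    Assumes every state has at least one leaving transition.
    """
    allowed = np.asarray(allowed, dtype=bool)
    M = allowed.shape[0]
    K = np.zeros((M, M))
    for i in range(M):
        targets = np.flatnonzero(allowed[:, i])
        # Log-uniform draw: uniform in log(1/tau_i) between log(a_i) and log(b_i).
        inv_tau = np.exp(rng.uniform(np.log(a[i]), np.log(b[i])))
        # Dirichlet(alpha, ..., alpha) over the W_i allowed transitions;
        # with a single leaving transition the probability is trivially one.
        if targets.size == 1:
            eps = np.ones(1)
        else:
            eps = rng.dirichlet(np.full(targets.size, alpha))
        K[targets, i] = inv_tau * eps   # k_ji = eps_ji / tau_i
        K[i, i] = -inv_tau
    return K

# Hypothetical topology: linear four-state chain 1 <-> 2 <-> 3 <-> 4.
idx = np.arange(4)
chain = np.abs(idx[:, None] - idx[None, :]) == 1
rng = np.random.default_rng(0)
K_draw = sample_prior_K(chain, a=np.full(4, 1e-2), b=np.full(4, 1e4), rng=rng)
print(K_draw.sum(axis=0))  # every column sums to (numerically) zero
```

Forbidden transitions stay exactly zero, which is how the sparsity pattern of the Dirichlet factors encodes the topology of the CRN.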

C. Advantages and limitations of the minimally informative prior in the presence of practical non-identifiability

Using the Bayesian filter, we study the impact of three different priors on the performance of $p(K\mid\mathcal{Y}_T)$, focusing on weakly informative patch-clamp data. The data are sampled from a 4-states-1-conducting-state HMM (CRN1, Fig. 2a). Our findings are presented in Fig. 3.

FIG. 3. Prior sensitivity analysis showing that the uniform prior aggravates problems caused by practical non-identifiability.


The figure contrasts the posteriors resulting from three different priors: uniform prior (green), minimally informative prior (black), and minimally informative prior with physical limits on the binding rates (blue). The rates are transformed to inverse dwell times and transition probabilities. a, RMSE of the mean of the marginal distribution vs. $N_{\mathrm{ch}}$ based on the uniform prior (green), the minimally informative prior (black) and the minimally informative prior with imposed diffusion limits (blue). The cyan dashed curve is a fit based on $f(N_{\mathrm{ch}})=a/\sqrt{N_{\mathrm{ch}}}$. A standard deviation of each data point which follows $N_{\mathrm{ch}}$ is assumed. All data sets were generated with $\sigma^2=1$ and $\sigma_{\mathrm{op}}^2=0.2$. b, Posterior distributions of the dwell times $\tau_i$ and transition probabilities $\epsilon_{ji}$ for $N_{\mathrm{ch}}=10^3$. On the diagonal, 1D marginal posteriors are plotted. The off-diagonal plots show the 2D marginal distributions of the posterior. All samples of the parameters were normalized to their true values, $\hat\theta=\theta/\theta_{\mathrm{true}}$. The blue lines indicate the true parameters. The insets on the diagonal show the same histograms plotted on a logarithmic scale to display the flatness of the posterior if a uniform prior is used. The (red) vertical bar indicates $\epsilon_{12}=1$, which corresponds to $\tilde\epsilon_{12}=4$ due to the normalization by the true values. The posterior is plotted based on samples with a Gelman-Rubin statistic of $\hat R=1.0$.

1. The scaling of the RMSE

We define the Euclidean distance or root-mean-square error (RMSE) between Ktrue and the posterior mean of all chemical rates in log-space as

$$\mathrm{Error}(\log E[k])=\sqrt{\sum_l\left(\log\frac{E[k_l]}{k_{l,\mathrm{true}}}\right)^2}. \tag{21}$$

Appendix E discusses why we use the logarithm of the chemical rates. In Fig. 3a, the RMSE of the inferred $p(K\mid\mathcal{Y}_T)$ is plotted against the number of ion channels per time trace, $N_{\mathrm{ch}}$. We use $N_{\mathrm{ch}}$ as a proxy for the quality or information content of the data. A regular statistical model is expected to show a $1/\sqrt{N_{\mathrm{ch}}}$-scaling of the RMSE, but the Bayesian filter is singular. This becomes apparent for the uniform prior: Below a critical value $N_{\mathrm{ch,crit}}\approx 2\cdot10^3$, the RMSE deviates visibly from the behavior of the other two priors. Above $N_{\mathrm{ch,crit}}$ the Bayesian filter behaves like a regular statistical model.
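The error measure of Eq. 21 is straightforward to evaluate from posterior samples. The sketch below assumes that the samples of the chemical rates are stored as an array with one row per sample and one column per rate, and that "log" denotes the natural logarithm; both are assumptions of this illustration, and the function name is hypothetical.

```python
import numpy as np

def log_rate_error(rate_samples, k_true):
    """Euclidean distance between log posterior means and log true rates (Eq. 21).

    rate_samples: array of shape (n_samples, n_rates) with positive entries.
    k_true:       array of shape (n_rates,) with the data-generating rates.
    """
    rate_samples = np.asarray(rate_samples, dtype=float)
    k_true = np.asarray(k_true, dtype=float)
    posterior_mean = rate_samples.mean(axis=0)        # E[k_l]
    log_ratio = np.log(posterior_mean / k_true)       # natural log (assumption)
    return np.sqrt(np.sum(log_ratio ** 2))

# Hypothetical toy samples of two rates.
samples = np.array([[1.0, 10.0],
                    [3.0, 30.0]])
print(log_rate_error(samples, k_true=[2.0, 20.0]))  # posterior means match the truth
```

Because the measure requires the true rates, it is only available for simulated data.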

In the biased regime, $N_{\mathrm{ch}}<N_{\mathrm{ch,crit}}$, the uniform prior causes the RMSE to deviate from the $1/\sqrt{N_{\mathrm{ch}}}$-scaling, whereas the log-uniform prior scales as $1/\sqrt{N_{\mathrm{ch}}}$ down to far smaller $N_{\mathrm{ch}}$, because it imposes a penalty on larger values of $\tau_i^{-1}$. Deviations from the $1/\sqrt{N_{\mathrm{ch}}}$-scaling of the RMSE towards smaller values also occur if upper bounds are enforced on some $k_{ij}$. The log transformation eliminates the lower limits (App. E), but the upper bounds derived from the diffusion limit (Sec. V C 4) are still active. In contrast to the equivalence of the minimally informative and uniform prior above $N_{\mathrm{ch,crit}}$, the parameter limits reduce the RMSE even at $N_{\mathrm{ch}}$ that are two orders of magnitude larger than $N_{\mathrm{ch,crit}}$. Note that the RMSE is dominated by the error in $\tau_1^{-1}$, whose uncertainty is reduced most strongly by the constraints (Fig. 14). The impact of the other upper limits is discussed below.

2. The likelihood dominates the posterior only in a small region of the parameter space

Next we explore the impact of the prior in more detail for a dataset in the critical regime ($N_{\mathrm{ch}}=10^3$). The symbol $\tilde\theta$ denotes a parameter divided by its true value such that $\log\tilde\theta$ should ideally be zero. Figure 3b shows the marginal posteriors of $\log_{10}\tilde\tau_i^{-1}$. With a uniform prior, the posterior of $\log_{10}\tilde\tau_1^{-1}$ concentrates between $10^2$ and $10^3$ and would deviate even more strongly from zero if the sampling box were larger. The insets in the diagonal panels provide an overview of the marginal posteriors of $\tilde\tau_1$ and $\tilde\tau_2$ over the full sampling box. To indicate the relative probability masses, the posterior ratios $R(\tilde\tau_i^{-1}\mid\mathcal{Y}_T)=p(\tilde\tau_i^{-1}\mid\mathcal{Y}_T)/p(\tilde\tau_{i,\max}^{-1}\mid\mathcal{Y}_T)$ with i = 1, 2 are plotted. The flat right tail of the marginal posterior obtained with the uniform prior (Fig. 3b, green curve) causes the observed deviations of the RMSE (Fig. 3a, green curve and Fig. 13). Furthermore, a flat prior-dominated part of $p(\tilde\tau_2^{-1}\mid\mathcal{Y}_T)$ creates an exponentially growing part in $p(\log_{10}\tilde\tau_2^{-1}\mid\mathcal{Y}_T)$. The ratios $R(\tilde\tau_1^{-1}\mid\mathcal{Y}_T)$ and $R(\tilde\tau_2^{-1}\mid\mathcal{Y}_T)$ drop only to 1/10 of their peak values before they reach a non-vanishing plateau in their right tails. It takes $\tilde\tau_2^{-1}$ two orders and $\tilde\tau_1^{-1}$ three orders of magnitude to do so. Hence, the posteriors $p(\log_{10}\tilde\tau_1\mid\mathcal{Y}_T)$ and $p(\log_{10}\tilde\tau_2\mid\mathcal{Y}_T)$ are dominated by the uniform prior. It is plausible that this also holds for the entire $p(K\mid\mathcal{Y}_T)$.

Given that the sampling box covers multiple orders of magnitude for $\tau_1^{-1}$ and $\tau_2^{-1}$, the flat part of the marginal posterior (where the data provide no information about these parameters) contributes a non-negligible probability mass to the posterior. Thus, posterior statistics such as mean, median, variance and derived quantities such as the RMSE become sensitive to the probability mass residing in the flat part. That should raise concern, because the inference will depend on the very limits of the sampling box. A larger sampling box would move probability mass from the peak into the flat area. Hence, when using a uniform prior in the critical regime, the limits of the sampling box for $\tau_1^{-1}$ and $\tau_2^{-1}$ act as highly informative user settings, even though they are often chosen arbitrarily and lack a physical justification. If instead $R(\tilde\tau_1^{-1}\mid\mathcal{Y}_T)$ were to drop to $10^{-4}$, say, before the marginal posterior flattens out, changes in the limits would not affect $p(K\mid\mathcal{Y}_T)$, unless one increased the sampling box by many orders of magnitude. Because the posterior of $\tau_i$ is improper, we cannot define meaningful HDCVs to quantify its uncertainty if $\tau_i$ is defined on the entire real axis. More information is needed, which unfortunately is often supplied by setting the sampling box limits more or less arbitrarily. Alternatively, vaguely informative priors can be defined by using physically justified information that penalizes the tails of $p(K\mid\mathcal{Y}_T)$, as demonstrated below.

3. The minimally informative prior (partially) alleviates practical non-identifiability

A comparison of the posteriors obtained with a uniform and the minimally informative prior exemplifies the harm induced by the uniform prior (Fig. 3, black vs. green curve). The minimally informative prior (Eq. 20) penalizes large $\tau_i$ and decorrelates $p(\log_{10}\tau_2^{-1},\epsilon_{12}\mid\mathcal{Y}_T)$. Note that $\tau_2^{-1}$ and $\epsilon_{12}$ belong to the same column in K describing transitions leaving state C2. This accumulates probability mass at the true parameter values in $p(\tau_1\mid\mathcal{Y}_T)$, $p(\tau_2^{-1}\mid\mathcal{Y}_T)$ and $p(\epsilon_{12}\mid\mathcal{Y}_T)$. Consistently, the choice of prior has a stronger impact on $\tau_1$, $\tau_2$ and $\epsilon_{12}$ (which are much less determined by the data) than on $\tau_3^{-1}$, $\epsilon_{23}$ and $\tau_4^{-1}$. Overall, the minimally informative prior concentrates $p(K\mid\mathcal{Y}_T)$ closer to $K_{\mathrm{true}}$.

Thus, when looking at the RMSE as a function of $N_{\mathrm{ch}}$, the minimally informative prior reduces the RMSE drastically below $N_{\mathrm{ch}}=2\cdot10^3$ (Fig. 3a) such that the $1/\sqrt{N_{\mathrm{ch}}}$ regime expands. The $1/\sqrt{N_{\mathrm{ch}}}$ scaling is unfortunate if one tries to increase the inference quality by measuring additional data. However, the $1/\sqrt{N_{\mathrm{ch}}}$ scaling is desirable in the reverse direction in which $N_{\mathrm{ch}}$ decreases, because it prevents inferences from becoming meaningless too quickly. Despite its heuristic definition, the prolonged $1/\sqrt{N_{\mathrm{ch}}}$-regime indicates that with this minimally informative prior one is less biased than with the uniform prior. Below $N_{\mathrm{ch,crit}}$ one only sees the prior and hardly any effect of the data when using the uniform prior. The area around the true values has essentially no probability mass. However, the minimally informative prior concentrates the probability mass much closer to $K_{\mathrm{true}}$. Nevertheless, if the likelihood is flat in $\tau_1^{-1}$ and $\tau_2^{-1}$ at some distance to the peak, it follows that $p(\tau_1^{-1}\mid\mathcal{Y}_T)$ and $p(\tau_2^{-1}\mid\mathcal{Y}_T)$ are improper. This does not change for the minimally informative prior, because the log-uniform prior cannot be normalized when defined over the entire positive real axis. It diverges for $\tau_i\to0$ and decays too slowly to zero for $\tau_i\to\infty$. We return to this observation in Sec. V D.

Cutting Θ with a sampling box into regions that are accessible and inaccessible to the sampler is not always a problem. Without knowing about practical non-identifiability problems, we showed in [38] that the HDCVs and thus the estimator have a frequentist interpretation as long as the peak of the posterior is towering highly enough over the flat parts. Thus, one finds Ktrue inside a volume with a certain probability mass with a frequency approximately equal to that probability [38].

4. Adding information from physically motivated upper limits for binding rates

So far, we have shown that using a minimally informative prior robustifies the inference below $N_{\mathrm{ch,crit}}$. To alleviate the problem that the posterior could still be improper or vague in some parameters, we can add more physically motivated information, such as the diffusion limit for the binding rates (Fig. 2, red arrows). Typically, the random collision rate is used as an upper limit for binding rates [84]. Here we use a more realistic estimate for the binding of small ligands [85]. To the best of our knowledge, only Bayesian statistics can rigorously take full advantage of limits on the parameters, because the introduction of parameter limits will impair the validity of the normal approximation of the sampling distribution of $\theta_{\mathrm{ML}}$. To implement the diffusion limit consistently in a minimally informative way, we need to change the definition of the prior Eq. 20. The rate satisfies $k_{32}=\epsilon_{32}/\tau_2$ such that we can also draw $k_{32}\sim$ log-uniform. So, the log-uniform prior is still used to set the time scale for all transitions leaving state C2. Then, we draw the statistical weights from the Dirichlet distribution for the second column of K such that $\tau_2^{-1}=k_{32}/\epsilon_{32}$. The other rates can be defined by $k_{j2}=\tau_2^{-1}\epsilon_{j2}$. Whether we impose a log-uniform prior on $\tau_2^{-1}$ or on some $k_{j2}$ is irrelevant as long as this prior is introduced once for each column of K.
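The change of parameterization for a column with a bounded binding rate can be sketched as follows. This is an illustration under stated assumptions, not the sampler used in this work: the function name, the lower limit of the log-uniform prior and the ordering of the transitions within the column are hypothetical.

```python
import numpy as np

def sample_column_with_rate_limit(k_min, k_max, n_targets, bound_idx, rng, alpha=0.5):
    """Draw all rates leaving one state when one rate carries the log-uniform prior.

    k_min, k_max: limits of the log-uniform prior on the bounded rate (e.g. k32).
    n_targets:    number of transitions leaving the state (W_i).
    bound_idx:    position of the bounded rate among the leaving transitions.
    """
    # Log-uniform draw of the bounded rate itself; k_max plays the diffusion limit.
    k_bound = np.exp(rng.uniform(np.log(k_min), np.log(k_max)))
    # Statistical weights of all leaving transitions.
    eps = rng.dirichlet(np.full(n_targets, alpha))
    # Inverse dwell time implied by the bounded rate: 1/tau = k_bound / eps_bound.
    inv_tau = k_bound / eps[bound_idx]
    # Remaining rates of the column: k_j = eps_j / tau.
    rates = inv_tau * eps
    return rates, inv_tau, eps

rng = np.random.default_rng(0)
# State C2 with two leaving transitions, ordered here as (k12, k32).
rates, inv_tau, eps = sample_column_with_rate_limit(
    k_min=1e-2, k_max=700.0, n_targets=2, bound_idx=1, rng=rng)
print(rates[1] <= 700.0 * (1 + 1e-9))  # the bounded rate respects the upper limit
```

The inverse dwell time is no longer drawn directly, so it can exceed the limit on the binding rate whenever the corresponding statistical weight is small.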

We simulated the first binding rate (Model 1) with $k_{21}=2\cdot600\,\mu\mathrm{M}^{-1}\mathrm{s}^{-1}$, where $600\,\mu\mathrm{M}^{-1}\mathrm{s}^{-1}$ is the diffusion limit derived in [85] for ligand binding. The stoichiometric factor of 2 incorporates the structural information that two binding pockets are available. Of course, the binding rate of a real process will be slower than the upper limit from [85]. To incorporate this aspect, we take advantage of the fact that in our simulation of data and inference $k_{21}=2\cdot600\,\mu\mathrm{M}^{-1}\mathrm{s}^{-1}$ is an arbitrary definition. In fact, we simulated $k_{21}=2\cdot600$ a.u. and defined this value to represent the value in SI units. We can also apply a different mapping of the arbitrary units to SI units. We may decide that 700 a.u. is supposed to represent the diffusion limit [85] in SI units. Thus, we defined $700\,\mathrm{a.u.}=600\,\mu\mathrm{M}^{-1}\mathrm{s}^{-1}$. In that way we avoid the unrealistic and extreme scenario that $k_{21,\mathrm{true}}$ is identical to the sampling box limit (which would still be a valid use case for Bayesian inference). All other "time-like" parameters such as sampling rates, dwell times and chemical rates need to be rescaled by $6/7\,\mu\mathrm{M}^{-1}\mathrm{s}^{-1}/\mathrm{a.u.}$

The impact of the upper limits is shown in Fig. 3a,b and Fig. 4 (blue curves). The upper limits on the parameters, $k_{21,\max}$ and $k_{32,\max}$, are now much smaller than the limits that we used previously. This solves the non-normalizability problem in the crucial parameter $\tau_1^{-1}=k_{21}$ and reduces the chance that $\tau_2^{-1}=k_{32}/(1-\epsilon_{32})$ is non-normalizable. Still, $\tau_2^{-1}$ could diverge if the marginal posterior of $\epsilon_{32}\to1$ diverges. The parameters $\tau_1^{-1}$ and $k_{32}$ are now bounded from above and below by mathematically and physically motivated limits.

FIG. 4. The uniform prior increases the posterior uncertainty.


We plot a multivariate analog of the standard deviation (Frobenius norm of the covariance matrix of the posterior) vs. $N_{\mathrm{ch}}$. The colors encode the prior assumptions. The larger the Frobenius norm, the more uncertainty remains after the inference. The plot is based on the same data used in Fig. 3.

The diffusion limits $k_{21}<2\cdot700$ a.u. and $k_{32}<700$ a.u. restrict the RMSE (Fig. 3a, blue lines) to smaller values such that it drops below the $1/\sqrt{N_{\mathrm{ch}}}$ regime for $N_{\mathrm{ch}}<N_{\mathrm{ch,crit}}$. However, even for data sets with $N_{\mathrm{ch}}\in\{10^4,10^5\}$, the constraint adds information and decreases the Euclidean distance (Eq. 21). Constraining the two binding rates $\tau_1^{-1}$ and $k_{32}$ also influences, via the likelihood, the marginal posteriors of other parameters, particularly $p(\epsilon_{21}\mid\mathcal{Y}_T)$ (Fig. 3b). The constraints shape $p(\epsilon_{21}\mid\mathcal{Y}_T)$ into a more pronounced distribution which covers the true value.

5. The uniform prior increases posterior uncertainty

The constraint only adds information to $p(K\mid\mathcal{Y}_T)$ as long as the data $\mathcal{Y}_T$ themselves do not restrict the HDCV to within the prior's upper limits. To study the impact of the constraint, it does not suffice to consider the RMSE. Notably, $\mathrm{Error}(\log E[k])$ cannot be reported after an inference from real experimental data either, because it requires the true rates. Ultimately, one judges the inference by the credibility intervals/regions (or confidence intervals/regions in an ML context) of the inferred parameters. Thus, only the posterior's shape in general (the median/mean/peak, covariance, and higher-order statistical moments describing the tails) is at the modeler's disposal to assess the quality of the inference.

Therefore, we define a quantitative measure of the spread of pK𝒴T, which is the Frobenius norm

$$\left\|\Sigma_{\log(k)}\right\|=\sqrt{\sum_{i,j}^{N_{\mathrm{params}}}\Sigma_{\log(k),ij}^2}, \tag{22}$$

of the covariance matrix of the samples of $p(K\mid\mathcal{Y}_T)$ in the log space of the chemical rates. The transition observed around $N_{\mathrm{ch,crit}}$ (Fig. 3a), where, as judged by the RMSE, the location of the posterior derived from the uniform prior becomes equivalent to that of the posterior derived from the Jeffreys prior without diffusion limit at $N_{\mathrm{ch}}=2\cdot10^3$, is only partly present in $\|\Sigma_{\log(k)}\|$ (Fig. 4, black and green lines), which, as noted, measures the posterior's spread, not its location. The spread of $p(K\mid\mathcal{Y}_T)$ based on the uniform prior (green curve) does not converge to the spread obtained with the minimally informative prior (black curve). Instead, it shrinks in parallel over the $N_{\mathrm{ch}}$ range that is investigated. This effect becomes more prominent for the more complex CRN2. Further, the larger the sampling box, the larger $\|\Sigma_{\log(k)}\|$ due to the practical non-identifiability problem. The Frobenius norm of the covariance of $p(K\mid\mathcal{Y}_T)$ based on the Jeffreys prior including the diffusion limit (blue curve) stays at smaller values over an extended range than without the diffusion limit, but finally converges towards the Frobenius norm of $p(K\mid\mathcal{Y}_T)$ derived from the prior without upper limits (black curve). Hence, the information from the added diffusion limit is made use of, even up to $N_{\mathrm{ch}}=3\cdot10^4$. Above $N_{\mathrm{ch,crit}}$ the RMSE (Fig. 3a) is still a much more strongly fluctuating quantity for benchmarking the behavior of $p(K\mid\mathcal{Y}_T)$ than the rather non-fluctuating $\|\Sigma_{\log(k)}\|$, which follows an almost straight $1/N_{\mathrm{ch}}$-scaling (Fig. 4). In other words, repeating the experiment under identical conditions will leave $\|\Sigma_{\log(k)}\|$ almost unchanged, but the posterior's location will be randomly translocated from data set to data set within the soft constraints of $\|\Sigma_{\log(k)}\|$. Larger sampling boxes increase the value of the green curve the most, followed by a minor effect on the black curve. This is an elegant result because, ultimately, with real experimental data, the inference quality is judged by uncertainty quantification and not by the RMSE.
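In contrast to the RMSE, the spread measure of Eq. 22 needs only posterior samples. A minimal sketch, assuming samples of the chemical rates stored with one row per sample and the natural logarithm as the log transform (the base is an assumption of this illustration, as is the function name):

```python
import numpy as np

def posterior_spread(rate_samples):
    """Frobenius norm of the covariance of log-rate samples (Eq. 22).

    rate_samples: array of shape (n_samples, n_rates) with positive entries.
    """
    log_k = np.log(np.asarray(rate_samples, dtype=float))
    # rowvar=False: columns are parameters, rows are posterior samples.
    cov = np.atleast_2d(np.cov(log_k, rowvar=False))
    return np.linalg.norm(cov, ord="fro")

# Hypothetical toy samples: only the first rate varies between the two samples.
toy_samples = np.exp(np.array([[0.0, 0.0],
                               [2.0, 0.0]]))
print(posterior_spread(toy_samples))  # equals the sample variance of the first log rate
```

For a single varying parameter the norm reduces to its variance in log space, so the measure generalizes the one-dimensional spread to correlated, multi-parameter posteriors.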

D. The minimally informative prior still generates an improper posterior making it difficult to decide which point estimate has the smallest error.

To confirm that $p(K\mid\mathcal{Y}_T)$ is still improper (Eq. 8) and to show where in the parameter space p(K) dominates $p(K\mid\mathcal{Y}_T)$, the most worrying ratios $R(\tilde\tau_1^{-1}\mid\mathcal{Y}_T)$ and $R(k_{32}\mid\mathcal{Y}_T)$ are plotted for $N_{\mathrm{ch}}<2\cdot10^3$ and sampled in larger sampling boxes (Fig. 5). The PC data originate from the 4-states-1-open-state CRN (Fig. 2 a). Note that $\tau_1^{-1}=k_{21}$ because only one Markov transition leaves state C1.

FIG. 5. The minimally informative prior without diffusion limit generates still improper posteriors for CRN1.


The relative marginal posteriors $R(\tilde\tau_1^{-1}\mid\mathcal{Y}_T)$ and $R(\tilde k_{32}\mid\mathcal{Y}_T)$ are plotted for different $N_{\mathrm{ch}}$ and different sampling boxes. The tilde over the parameters indicates that each parameter is normalized to its true value. a, $R(\tilde\tau_1^{-1}\mid\mathcal{Y}_T)$ displays power-law-like behavior, but appears proper given the limits of the sampling box. b, However, the same $R(\tilde\tau_1^{-1}\mid\mathcal{Y}_T)$ sampled from a sampling box that is two orders of magnitude larger displays an area for large $\tilde\tau_1^{-1}$ where the prior entirely dominates $R(\tilde\tau_1^{-1}\mid\mathcal{Y}_T)$. The inset shows the mean, median, and peak for $\tau_1^{-1}$ vs. $N_{\mathrm{ch}}$ for the smaller sampling box (solid line) and the larger sampling box (dotted line). c, $p(\tilde\tau_1^{-1},k_{32}\mid\mathcal{Y}_T)$ for $N_{\mathrm{ch}}\in\{100,200\}$ (blue, red densities) displays a non-local correlation structure leading to a bias of $R(k_{32}\mid\mathcal{Y}_T)$. The inset is based on the same data, $N_{\mathrm{ch}}\in\{100,200\}$, but $k_{32}$ is transformed to $\tilde\tau_2=k_{32}/\epsilon_{32}$. d, For $R(k_{32}\mid\mathcal{Y}_T)$ the heavy tails are much less of a concern. The power law exponent for the smallest $N_{\mathrm{ch}}$ is $\phi\approx3.5$. The insets of the panels display the mean (blue), the median (orange), and the peak (green) of the marginal posterior.

As a guide to the eye, functions $f(\tau^{-1})\propto(\tau^{-1})^{-\phi}$ (dashed lines, Fig. 5a,b,d) that have a power law scaling are plotted, including the log-uniform prior. The right tails of $R(\tilde\tau_1^{-1}\mid\mathcal{Y}_T)$ are well approximated by different power laws up to the limit of the sampling box, $10^3\,\tau_{1,\mathrm{true}}^{-1}$ (Fig. 5a). The slopes of the tails are heavily influenced by the log-uniform prior $f(\tau^{-1})\propto(\tau_1^{-1})^{-1}$, but the likelihood also contributes some additional slope (in other words, information) up to approximately $10^3\,\tau_{1,\mathrm{true}}^{-1}$. For the smallest sampling box, $R(\tilde\tau_1^{-1}\mid\mathcal{Y}_T)$ seems to be proper but only vaguely identified for all values of $N_{\mathrm{ch}}$. The exponent ϕ gradually increases with $N_{\mathrm{ch}}$, indicating the increase of information about $\tau_1^{-1}$. However, using a sampling box that is two orders of magnitude larger (Fig. 5b) demonstrates that ϕ = 1 holds for even larger $\tau_1^{-1}$, i.e., the prior completely dominates $p(K\mid\mathcal{Y}_T)$ in this region of Θ in the direction of $\tau_1^{-1}$. Hence, $p(K\mid\mathcal{Y}_T)$ indicates a practical non-identifiability problem of the HMM, as only the prior (Eq. 10) and not the likelihood contributes to the posterior locally. At least for $N_{\mathrm{ch}}\le600$, the data do not contribute information for $\tilde\tau_1^{-1}>10^3$. The limited influence of some chemical rates or dwell times in some parts of Θ on the signal is a general feature of partially observed CRNs (App. B). In the inset of Fig. 5b, the mean (blue), the median (orange), and the peak (green) of $p(\tau_1^{-1}\mid\mathcal{Y}_T)$ vs. $N_{\mathrm{ch}}$ for the smaller sampling box (solid) and the larger sampling box (dashed) are plotted. Mean values or higher statistical moments do not exist for distributions with power law tails with exponents ϕ < 2; thus, the peak should be reported for the barely identified $\tau_1^{-1}$. Consistently, the peak of $R(\tilde\tau_1^{-1}\mid\mathcal{Y}_T)$ (green curve, inset of Fig. 5 b) has at most a relative error of a factor of 2 and seems unbiased and not diverging with increasing sampling box size, given the tested data and minimally informative prior. For the same data, the residuals of the mean (blue curve, inset of Fig. 5 b) of $R(\tilde\tau_1^{-1}\mid\mathcal{Y}_T)$ are heavily biased, sensitive to the sampling box limits and orders of magnitude away from $\tilde\tau_{1,\mathrm{true}}^{-1}$. The median (orange curve, inset of Fig. 5 b) is more robust against different sampling box sizes. However, it is still biased towards too large values of $\tau_1^{-1}$, which is unsurprising as it is not defined on the full support $(0,\infty)$ if the shape of the log-uniform prior [86] eventually dominates $p(\tau_1^{-1}\mid\mathcal{Y}_T)$ for large $\tau_1^{-1}$. This suggests that, for parameters with a strong degree of practical non-identifiability (the posterior does not decay by multiple decades from its peak before reaching the area where the prior dominates the posterior), it is better to report MAP values. In Fig. 5c we plot $p(\tilde\tau_1,\tilde k_{23}\mid\mathcal{Y}_T)$ and in the inset $p(\tilde\tau_1^{-1},\tilde\tau_2^{-1}\mid\mathcal{Y}_T)$ to demonstrate that the practical non-identifiability of $\tau_1^{-1}$ leads to a bias of $k_{32}$; however, the bias is much reduced for $\tau_2^{-1}$. The bias of the corresponding $p(k_{32}\mid\mathcal{Y}_T)$ is also present for different data sets (inset of Fig. 5d).
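The sensitivity of the mean to the sampling box limit can be illustrated with a toy calculation that keeps only the power-law tail. The sketch below evaluates the analytic mean of a density $p(x)\propto x^{-\phi}$ truncated to $[1,x_{\max}]$, where x stands for a parameter in units of the lower end of the tail. It is a deliberately simplified assumption: the peak of the posterior is ignored, and the function name is hypothetical.

```python
import numpy as np

def truncated_power_law_mean(phi, upper):
    """Mean of the density p(x) ∝ x**(-phi) restricted to the interval [1, upper]."""
    def integral(power):
        # Closed form of the integral of x**power over [1, upper].
        if np.isclose(power, -1.0):
            return np.log(upper)
        return (upper ** (power + 1.0) - 1.0) / (power + 1.0)
    # E[x] = (integral of x * x**(-phi)) / (integral of x**(-phi)).
    return integral(1.0 - phi) / integral(-phi)

for upper in (1e3, 1e5):
    mean_prior_tail = truncated_power_law_mean(1.0, upper)   # log-uniform tail
    mean_steep_tail = truncated_power_law_mean(3.5, upper)   # steeper tail
    print(upper, mean_prior_tail, mean_steep_tail)
```

For ϕ = 1 the mean keeps growing with the upper limit, whereas for ϕ = 3.5 it is practically independent of it, in line with the different behavior of the point estimates discussed for $\tau_1^{-1}$ and $k_{32}$.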

The tails of $R(k_{32}\mid\mathcal{Y}_T)$, given the sampling box, seem to be less heavy (Fig. 5d) than those of $R(\tilde\tau_1^{-1}\mid\mathcal{Y}_T)$. For $N_{\mathrm{ch}}\in\{100,200,300\}$, $R(k_{32}\mid\mathcal{Y}_T)$ appears to follow a power law, $R(k_{32}\mid\mathcal{Y}_T)\sim k_{32}^{-\phi}$ with ϕ = 3.5. The more quickly decaying power-law tails of $R(k_{32}\mid\mathcal{Y}_T)$ still create a skewed distribution. The skewed tail of $R(k_{32}\mid\mathcal{Y}_T)$ compensates partially for the bias of the peak region of $R(k_{32}\mid\mathcal{Y}_T)$ (Fig. 5d). Hence, $E[k_{32}]$ (blue curve, inset, Fig. 5d) is the point estimator that is least biased towards too small $k_{32}$ values until the data are strong enough. In contrast to $\tau_1^{-1}$, the standard Bayesian point estimate, $E[k_{32}]$, performs best for $k_{32}$. Besides our observation, mathematics dictates that reporting $\theta_{\mathrm{MAP}}$ for parameters whose posterior has a power-law tail with small $\phi\le2$ is more robust, because mean values, and eventually medians for $\phi\le1$, do not exist. However, it is unclear whether reporting the mean value for parameters with large ϕ, because of its smallest bias (inset of Fig. 5d), generalizes to other CRNs. Note that from the bias of the peak of $R(k_{32}\mid\mathcal{Y}_T)$ towards too small values, it is clear that small HDCIs around the peak will not include $K_{\mathrm{true}}$ with a frequency equal to the probability mass they claim to have. Thus, posteriors which decently fulfill the asymptotic behavior of the Bernstein-von Mises theorem need larger $N_{\mathrm{ch}}$ or a second observable [38].

In summary, the minimally informative prior, in particular its convex log-uniform distribution for the $\tau_i$ or $k_{ij}$, has the desirable feature of concentrating $p(K\mid\mathcal{Y}_T)$ much closer to $K_{\mathrm{true}}$, but it still produces an improper $p(K\mid\mathcal{Y}_T)$ if no further upper limits can be justified. The minimally informative prior alleviates the improperness of $p(K\mid\mathcal{Y}_T)$ by making the posterior less sensitive to the often nonphysical and arbitrary limits of the sampling box, but for all data sets the practical non-identifiability problem will become relevant at some point when the sampling box is enlarged. Further, the higher the data quality, the less sensitive the inference is to the limits of the sampling box. Hence, the degree of the practical non-identifiability problem has to be judged based on how much $R(\tau_1^{-1}\mid\mathcal{Y}_T)$ decays from its peak before its slope is essentially the slope of the prior.

E. Solving the practical non-identifiability problem with vague additional information on cooperativity

We exemplify how to robustify an improper pK𝒴T by physically justified vaguely informative prior distributions. For CRN1, we enforce a soft physical constraint on the one hand on the binding rates k21 and k32 and on the other hand on the unbinding rates k12 and k23 by a regularizing prior, plausible for homomeric proteins. Physical common sense dictates that one should be skeptical a priori, if binding rates or unbinding rates for the same/similar binding sites have values differing by orders of magnitude. One modeling assumption encoding this skepticism could be

$$p_{\mathrm{cop}}\left(\log_{10}\left(k_{21}/k_{32}\right)\right)=\mathrm{normal}\left(0,\sigma_{\mathrm{cop}}\right) \tag{23}$$

for binding and for unbinding

$$p_{\mathrm{cop}}\left(\log_{10}\left(k_{23}/k_{12}\right)\right)=\mathrm{normal}\left(0,\sigma_{\mathrm{cop}}\right) \tag{24}$$

with some finite standard deviation $\sigma_{\mathrm{cop}}$ (Fig. 6). Note that this corresponds to the classical definition of cooperativity via the ratio of $K_{32}$ to $K_{21}$: If the affinity $K_{ij}=k_{ij}/k_{ji}$ increases with the number of occupied binding sites, this is called positive cooperativity. That is, if the microscopic binding rates (the binding rates per binding site) are constant, e.g., diffusion limited, the ratio $r=K_{32}/K_{21}=k_{12}/k_{23}$ can be used as a measure of cooperativity, where r = 1 means no cooperativity, r > 1 positive and r < 1 negative cooperativity. The vaguely informative prior is a much less radical prior (assumption) than assuming identical microscopic rates (which is the non-cooperative assumption, $\sigma_{\mathrm{cop}}=0$, for the binding and unbinding), as frequently done for ligand-gated [87, 88] and even more extensively for voltage-gated ion channels [89–92] to alleviate structural or practical non-identifiability problems in HMM inferences. The CRN of the shaker channel [90] is a good example of how the structural prior information of having four subunits within the shaker channel implies a necessary complexity of the CRN to allow it to explain the data and to be physically interpretable. This CRN has so many voltage-dependent rates that certain subsets of rates are set equal to avoid non-identifiability problems. However, this assumption might be incorrect for the ion channel at hand. Thus, one gains identifiability of the parameters by potentially losing the ability of the model to express the true process. Instead, assuming that cooperativity cannot change the chemical rates beyond some reasonable scale is physically plausible and restricts the model much less. One might debate the prior's variance and the prior's shape. Note that adding this additional regularizing prior solves the improperness problem of $p(\tau_1^{-1},k_{32}\mid\mathcal{Y}_T)$ originating from a flat likelihood for high values of $\tau_1^{-1}$ and $k_{32}$, no matter what finite $\sigma_{\mathrm{cop}}$ is used, because of the quickly decaying tails of normal distributions.
Notably, almost any additional prior that adds at least a tiny amount of decay towards infinity to the otherwise log-uniform-dominated posterior would also render $p(K\mid\mathcal{Y}_T)$ proper. The normal distribution in log space has, by definition, desirable properties: It is symmetric in the order of magnitude, and it defines an area, within one standard deviation, with little impact on $p(K\mid\mathcal{Y}_T)$ and an area of increasing penalty for values that are multiple standard deviations away. In that way, extreme conclusions from the data, expressing strong cooperativity effects in the data-generating process, have to be supported more and more by the data to be represented with a relevant magnitude in $p(K\mid\mathcal{Y}_T)$. Using the larger sampling box (Fig. 6) makes the regularization more urgent. In Fig. 6a,b, we demonstrate how applying the additional prior renders $p(\tau_1^{-1},k_{32}\mid\mathcal{Y}_T)$ proper even for $N_{\mathrm{ch}}=10^2$ and $\sigma_{\mathrm{cop}}=1$. Notably, in this case $\pm\sigma_{\mathrm{cop}}$ covers 2 orders of magnitude. The bias (inset, Fig. 6a) of $p(k_{32}\mid\mathcal{Y}_T)$ is drastically reduced with a finite $\sigma_{\mathrm{cop}}$ compared to the pure minimally informative prior, and decreasing $\sigma_{\mathrm{cop}}$ further improves the inference (Fig. 6b) in terms of bias and variance. Notably, the prior nudges the posterior to concentrate its mass between $\tau_{1,\mathrm{true}}^{-1}$ and $k_{32,\mathrm{true}}$. Thus, at some point, the smaller the $\sigma_{\mathrm{cop}}$ of Eqs. 23 and 24, the more the variance of the posterior is decreased, but also the more the bias of $p(\log_{10}(k_{23}/k_{12}))$ starts to dominate the posterior. The traditional non-cooperativity assumption trades variance of $p(K\mid\mathcal{Y}_T)$ for a maximum of bias and thus should only be applied if there is strong a priori evidence that non-cooperativity is true in the data-generating process.
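The penalty imposed by the cooperativity prior can be made explicit with a short sketch of its log density. The rates are inserted exactly as written in Eqs. 23 and 24; the function name and the example rates are hypothetical, and the same $\sigma_{\mathrm{cop}}$ is assumed for binding and unbinding.

```python
import numpy as np

def log_cooperativity_prior(k21, k32, k12, k23, sigma_cop):
    """Sum of the log densities of Eqs. 23 and 24 for one set of rates."""
    def log_normal_pdf(x, sigma):
        # Log density of a zero-mean normal distribution with standard deviation sigma.
        return -0.5 * (x / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))
    binding = log_normal_pdf(np.log10(k21 / k32), sigma_cop)      # Eq. 23
    unbinding = log_normal_pdf(np.log10(k23 / k12), sigma_cop)    # Eq. 24
    return binding + unbinding

# Penalty for a binding-rate ratio of two orders of magnitude at sigma_cop = 1.
no_coop = log_cooperativity_prior(1.0, 1.0, 1.0, 1.0, sigma_cop=1.0)
strong_coop = log_cooperativity_prior(100.0, 1.0, 1.0, 1.0, sigma_cop=1.0)
print(no_coop - strong_coop)  # difference in log density
```

The penalty grows quadratically with the number of decades separating the two rates, so moderate cooperativity is barely penalized while ratios of many orders of magnitude require strong support from the likelihood.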

FIG. 6. A vaguely regularizing prior on the cooperativity factor renders the posterior proper even for the lowest quality data.

We demonstrate for the previously used data sets of CRN1 (Fig. 2a) with $N_{\mathrm{ch}}=10^2$ the effects of the value of $\sigma_{\mathrm{cop}}$ using the larger sampling box. Note that $\tau_1^{-1}=k_{21}$ holds. The black continuous lines indicate the value of $\tau_{i,\mathrm{true}}^{-1}$ and $k_{32,\mathrm{true}}$ on the x-axis. The black dashed lines indicate the corresponding $\tau_{i,\mathrm{true}}^{-1}$ and $k_{32,\mathrm{true}}$ in units of the true value of the x-coordinate to visualize the bias of the regularizing prior, which constrains the posterior more and more between the dashed and solid black lines with decreasing $\sigma_{\mathrm{cop}}$. a, $p(\tau_1^{-1}\mid\mathcal{Y}_T)$ for either no or a series of increasingly strong regularization. The inset compares the effect of the vague regularization (orange) on the mean (solid curve) and median (dashed curve) vs. $N_{\mathrm{ch}}$ with no regularization (blue). b, $p(k_{32}\mid\mathcal{Y}_T)$ for the same series of increasing regularization. The inset compares the effect of the vague regularization (orange) on the mean (solid curve) and median (dashed curve) vs. $N_{\mathrm{ch}}$ with no regularization (blue). c, The effect of the regularization on $p(\tau_1^{-1},k_{32}\mid\mathcal{Y}_T)$, with no regularization (blue) and the most vague regularization prior ($\sigma=1$, green).

F. Minimal informative prior to ward off the curse of complexity and dimensionality

Next, we investigate the challenges arising when the complexity of $K$ is increased. CRN2 additionally contains two flip states (Fig. 2d). We enforce microscopic reversibility, which reduces the number of to-be-inferred parameters by one. For a comment on how the minimally informative prior improves the convergence of the sampler to the true posterior and alleviates the curse of dimensionality, see App. F 1. Fig. 7a compares the pathological dimensions of $p(K\mid\mathcal{Y}_T)$ derived from the uniform $p(K)$ (green) with the same dimensions of $p(K\mid\mathcal{Y}_T)$ derived from the minimally informative $p(K)$ (black). The less pathological dimensions, in the sense that they deliver roughly Gaussian marginal posteriors, are plotted in Fig. 7 b1–6. Even for $N_{\mathrm{ch}}=10^5$, the posterior based on the uniform prior clearly demonstrates the practical non-identifiability of the likelihood in $\tau_2^{-1}$ and $\tau_3^{-1}$. The ridge-like correlation of $p(\tau_2^{-1},\tau_3^{-1})$ (panel a3) with a slow decay along the ridge demonstrates the practical non-identifiability problem. Notably, the ridge is strongly constrained for values smaller than $\tau_{2,\mathrm{true}}$ and $\tau_{3,\mathrm{true}}$ (panels a1–2), but above them it seems to extend to infinity. The strong correlation between $\tau_2^{-1}$ and $\tau_3^{-1}$ produces a paradoxical challenge for the minimally informative prior, which turns out to be problematic for smaller $N_{\mathrm{ch}}$. As a result of the strong correlation, the sampled value of $\tau_2^{-1}$ can be predicted by an affine linear function $\tau_2^{-1}=m\tau_3^{-1}+n$ (panel a3, dashed line). Ignoring the affine part of the function, the slope of the log-uniform prior on $\tau_3^{-1}$ is mapped by the linear function to $p(\tau_2^{-1}\mid\mathcal{Y}_T)$. Additionally, $\tau_2^{-1}$ has its own log-uniform prior. The same applies to both parameters the other way around. Hence, both one-dimensional priors, together with the practical non-identifiability ridge, contribute to the posterior a scaling of $\tau_i^{-2}$ for both parameters, creating a bias towards too small values (see the corresponding $p(\tau_2^{-1},\tau_3^{-1}\mid\mathcal{Y}_T)$ in Fig. 8a).
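The double counting of the two log-uniform priors along the ridge can be checked numerically. The following is a minimal sketch under stated assumptions: it ignores the affine offset $n$, uses an arbitrary illustrative slope $m=3$, and works with unnormalized densities expressed in the inverse dwell times; none of the names come from the authors' code.

```python
import math


def log_uniform_density(x):
    """Unnormalized log-uniform density, proportional to 1/x."""
    return 1.0 / x


def prior_along_ridge(tau3_inv, slope_m):
    """Joint prior of both inverse dwell times evaluated on the ridge.

    On the ridge, tau2_inv is tied to tau3_inv by tau2_inv = slope_m * tau3_inv
    (the affine offset is ignored, as in the text).
    """
    tau2_inv = slope_m * tau3_inv
    return log_uniform_density(tau2_inv) * log_uniform_density(tau3_inv)


def loglog_slope(f, x_lo, x_hi):
    """Exponent of a power law f(x) estimated from two points."""
    return (math.log(f(x_hi)) - math.log(f(x_lo))) / (math.log(x_hi) - math.log(x_lo))


# Hypothetical slope m = 3 and an arbitrary range of inverse dwell times.
ridge_exponent = loglog_slope(lambda x: prior_along_ridge(x, 3.0), 1e2, 1e5)
single_exponent = loglog_slope(log_uniform_density, 1e2, 1e5)
```

A single log-uniform prior decays with exponent $-1$, while the two priors combined along the ridge decay with exponent $-2$ in the inverse dwell time, which is the origin of the extra pull towards small values described above.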
Based on the corresponding 2d marginal distributions (Fig. 7a), the Markov transition probabilities $\epsilon_{12}$, $\epsilon_{32}$ and $\epsilon_{23}$ appear to have an increasingly diverging posterior the larger the sampling box for $\tau_2^{-1}$ and $\tau_3^{-1}$ gets. These parameters should be considered unidentified, based on the heuristic and visual criterion employed herein (see the discussion around Eq. 9) to assess the degree of practical non-identifiability of $p(K\mid\mathcal{Y}_T)$. Surprisingly, $\tau_1^{-1}$ is not as unidentified for CRN2 as for CRN1 (Fig. 3b). We employed a smaller sampling box for the uniform prior because the sampler often did not converge, likely owing to the practical non-identifiability problem and the curse of dimensionality. Using a smaller sampling box for the uniform prior also disadvantages the minimally informative prior in the comparison, because the sampler under the uniform prior can then reach less of the region of parameter space where the likelihood is flat. In App. Fig. 15 we demonstrate that sufficiently proper posteriors are achieved for the minimally informative prior at the experimentally possible, but certainly challenging, data quality of $N_{\mathrm{ch}}=5\cdot10^3$. With the uniform prior, however, one needs an impossible data quality of $N_{\mathrm{ch}}=2.5\cdot10^5$. Hence, the minimally informative $p(K)$ extends the range of acceptable data for this CRN roughly 50-fold. Using only the minimally informative prior (black posterior) produces visibly improper marginal posteriors for $k_{32}$ and $k_{45}$ for $N_{\mathrm{ch}}<5\cdot10^3$ (see Figs. 7 and 8).

FIG. 7.

For the more complex CRN2 the minimally informative prior is necessary even for data sets of unrealistically high quality such as $N_{\mathrm{ch}}=10^5$. a, Posteriors based on uniform (green) and minimally informative priors (black) are compared. The clearly non-Gaussian-shaped marginal posteriors plus those concerning $\tau_1^{-1}$ are plotted in a; all other, rather Gaussian marginal posteriors are shown in b. The insets a1, a2 visualize the posterior without the log transformation. The black posterior is equipped with the minimally informative $p(K)$. The green posterior is based on the uniform $p(K)$. The flat (green) posterior for $\tau_2^{-1}$ and $\tau_3^{-1}$ creates what appears to be an exponential increase for $p(\log\tau_2^{-1}\mid\mathcal{Y}_T)$ and $p(\log\tau_3^{-1}\mid\mathcal{Y}_T)$. a3 demonstrates the positive linear correlation contained in $p(\tau_2^{-1},\tau_3^{-1}\mid\mathcal{Y}_T)$. Deviations of both parameters from the corresponding true value can be compensated to some extent by the other parameter. Note that the sampling box for the posterior samples of $\tau_2^{-1}\in[10^{-2},10^{6.5}]$ updated from the minimally informative prior is more than an order of magnitude larger than the simulation box used with the uniform $p(K)$. This disadvantages the posterior based on the minimally informative prior but demonstrates the greater robustness of the minimally informative prior. The posteriors derived from the uniform $p(K)$ are sampled in $k_{ij}$ and then mapped to the $(\tau,\epsilon)$ space.

FIG. 8. For CRN2, the minimally informative prior enables inference for about two orders of magnitude lower data quality. Adding information by diffusion limits and vague bias towards non-cooperativity allows us to work with three orders of magnitude lower data quality.

The data sets simulated from CRN2 (Fig. 2) are analyzed. In all plots, black refers to $p(K\mid\mathcal{Y}_T)$ based on the minimally informative prior. Blue corresponds to assuming diffusion-limited binding, red to an additionally assumed vague prior on the cooperativity of the binding and unbinding rates, and magenta to a less vague no-cooperativity assumption. a, The true values of $K$ are indicated by the blue lines. Posterior for $N_{\mathrm{ch}}=10^2$ for the minimally informative prior, the minimally informative prior with upper limits, and with an added vague no-cooperativity assumption. For visual clarity, we suppress $\tau_6^{-1}$, $\epsilon_{54}$, $\epsilon_{25}$ in the main plot but add subpanels that display the corresponding posteriors. Note that these parameters are only slightly influenced by the priors, and even without the priors, the posterior peaks in a Gaussian-like manner with some skewness. b, The RMSE in the log space of the chemical rates is plotted vs. $N_{\mathrm{ch}}$ for the median (solid curve) and the marginal peak (dashed curve) for different prior assumptions. c, The Frobenius norm of the covariance matrix of the samples of $K$ for all $k_{ij}$.

G. Effects of combining the theoretical diffusion limit with vague non-cooperativity assumptions

In Fig. 8a we display $p(K\mid\mathcal{Y}_T)$ for the lowest tested data quality $N_{\mathrm{ch}}=10^2$ with the minimally informative prior and with the additional (vague and hard) constraints discussed below. Using only the minimally informative prior (black posterior) produces marginal posteriors for $k_{32}$ and $k_{45}$ that are not sufficiently proper. The first visually proper $p(K\mid\mathcal{Y}_T)$ appears with $N_{\mathrm{ch}}=5\cdot10^3$ (App. Fig. 15a). From the preceding discussion, it is clear that the likelihood for $N_{\mathrm{ch}}=5\cdot10^3$ is practically identifiable only to the extent that the improperness of $p(K\mid\mathcal{Y}_T)$ is not detected visually; there is no reason to assume that it would not appear if one sampled from much larger simulation boxes. Because of the complexity of the inference problem, we employ the following additional assumptions.

1. The vague no-cooperativity assumption increases the accuracy and decreases the uncertainty of the inference.

If one applies strict upper diffusion limits for $k_{21}$, $k_{32}$ and $k_{45}$ (Fig. 8, blue posterior and curves), one gains a proper $p(K\mid\mathcal{Y}_T)$ for $N_{\mathrm{ch}}=10^2$ in the corresponding dimensions, and the other dimensions also become sufficiently proper. Adding the further vague prior assumptions $\log_{10}(k_{21}/k_{32})\sim\mathrm{normal}(0,0.5)$, $\log_{10}(k_{12}/k_{23})\sim\mathrm{normal}(0,0.5)$, $\log_{10}(k_{45}/k_{32})\sim\mathrm{cauchy}(0,1)$ and $\log_{10}(k_{54}/k_{23})\sim\mathrm{cauchy}(0,1)$, which regularize $p(K\mid\mathcal{Y}_T)$ gently towards a non-cooperative CRN (Fig. 8, red posterior), reduces the mean distance between $p(K\mid\mathcal{Y}_T)$ and $K_{\mathrm{true}}$ and reduces the uncertainty (Fig. 8c). Note that a less vague no-cooperativity assumption, $\log_{10}(k_{21}/k_{32})\sim\mathrm{normal}(0,0.1)$ and $\log_{10}(k_{12}/k_{23})\sim\mathrm{normal}(0,0.1)$, reduces the Frobenius norm and RMSE further (see Fig. 8b,c). The RMSE (Fig. 8b) shows that at $N_{\mathrm{ch}}\approx5\cdot10^4$ the RMSE transitions to the $1/N_{\mathrm{ch}}$ asymptote, while for $N_{\mathrm{ch}}<5\cdot10^4$ the practical non-identifiability problem combined with the prior assumptions influences the posterior. Above $N_{\mathrm{ch}}=5\cdot10^4$, the posteriors equipped with priors with diffusion limits produce similar RMSEs, unless the less vague cooperativity assumptions (Fig. 8b, magenta curves and posterior) are used. For the less vague prior, the RMSE converges onto the curves of the other two prior assumptions (blue and red curves) around $N_{\mathrm{ch}}=10^6$. In contrast, the pure minimally informative prior has different RMSEs (Fig. 8b, black curve) for each data set. This shows that the vague no-cooperativity assumptions have lost their influence on the RMSE, while the diffusion limit still influences the RMSE.
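The additional prior factors named above can be written down as a single log-density. This is a minimal sketch, not the authors' implementation: the dictionary layout, the function names, and the numerical diffusion limit `k_max` are hypothetical, and the densities are placed directly on the log-ratios without any Jacobian correction.

```python
import math


def log_normal_pdf(x, sigma):
    """Log density of normal(0, sigma) evaluated at x."""
    return -0.5 * (x / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def log_cauchy_pdf(x, scale):
    """Log density of cauchy(0, scale) evaluated at x."""
    return -math.log(math.pi * scale * (1.0 + (x / scale) ** 2))


def log_additional_prior(k, k_max, sigma=0.5, cauchy_scale=1.0):
    """Hard diffusion limit on binding rates plus vague no-cooperativity terms.

    k is a dict of rates; k_max is an assumed upper diffusion limit.
    """
    # Hard upper limit: binding rates above the diffusion limit are excluded.
    if any(k[name] > k_max for name in ("k21", "k32", "k45")):
        return -math.inf
    lp = log_normal_pdf(math.log10(k["k21"] / k["k32"]), sigma)
    lp += log_normal_pdf(math.log10(k["k12"] / k["k23"]), sigma)
    lp += log_cauchy_pdf(math.log10(k["k45"] / k["k32"]), cauchy_scale)
    lp += log_cauchy_pdf(math.log10(k["k54"] / k["k23"]), cauchy_scale)
    return lp


# Illustrative rate sets (assumed values, arbitrary units).
noncoop = {"k21": 1e6, "k32": 1e6, "k45": 1e6, "k12": 10.0, "k23": 10.0, "k54": 10.0}
coop = dict(noncoop, k23=1000.0)        # unbinding rates differ 100-fold
too_fast = dict(noncoop, k21=1e9)       # violates the assumed diffusion limit

lp_noncoop = log_additional_prior(noncoop, k_max=1e8)
lp_coop = log_additional_prior(coop, k_max=1e8)
lp_too_fast = log_additional_prior(too_fast, k_max=1e8)
```

Passing `sigma=0.1` corresponds to the less vague assumption. The Cauchy terms keep heavy tails on the ratios that involve $k_{45}$ and $k_{54}$, so large differences in those ratios are penalized far less than under the normal terms.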

The Frobenius norm of the covariance matrix of $p(K\mid\mathcal{Y}_T)$ shows (Fig. 8c) that enforced upper diffusion limits (blue, red, and magenta curves) still add information and reduce the uncertainty of $p(K\mid\mathcal{Y}_T)$. Hence, even for data qualities of $N_{\mathrm{ch}}>10^6$, an ML inference would ignore relevant information that reduces the uncertainty of the inference. The Frobenius norm of the posteriors based on the pure minimally informative prior without additional assumptions transitions at $N_{\mathrm{ch}}=10^5$ to the $N_{\mathrm{ch}}^{-0.5}$ scaling.
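Both summary statistics used here can be computed directly from posterior draws. The sketch below is one plausible reading of the two quantities, under the assumptions that the RMSE compares the base-10 logarithm of the marginal medians with the true rates and that the covariance is taken over the samples exactly as they are passed in; the function names and the toy samples are hypothetical.

```python
import math
import statistics


def log_rmse(samples, true_rates):
    """RMSE between log10 of the marginal posterior medians and the true rates.

    samples is a list of draws, each draw a list with one value per rate.
    """
    n_par = len(true_rates)
    medians = [statistics.median(draw[j] for draw in samples) for j in range(n_par)]
    squared = [(math.log10(m) - math.log10(t)) ** 2 for m, t in zip(medians, true_rates)]
    return math.sqrt(sum(squared) / n_par)


def covariance_frobenius_norm(samples):
    """Frobenius norm of the sample covariance matrix of the draws."""
    n = len(samples)
    dim = len(samples[0])
    means = [sum(draw[j] for draw in samples) / n for j in range(dim)]
    total = 0.0
    for a in range(dim):
        for b in range(dim):
            cov_ab = sum((d[a] - means[a]) * (d[b] - means[b]) for d in samples) / (n - 1)
            total += cov_ab ** 2
    return math.sqrt(total)


# Toy draws of two rates (assumed values for illustration only).
toy_samples = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
toy_rmse = log_rmse(toy_samples, true_rates=[2.0, 10.0])
toy_norm = covariance_frobenius_norm(toy_samples)
```

The RMSE measures how far the point estimate is from the truth, whereas the Frobenius norm collapses the whole posterior covariance, including correlations between rates, into a single uncertainty number.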

To summarize, with the minimally informative prior combined with diffusion limits (Fig. 8), one can make inferences with more than $10^3$ times smaller $N_{\mathrm{ch}}$ per time trace compared to Bayesian inferences with the uniform prior (Fig. 7) or ML/MAP inferences. Note that theoretical data qualities of $N_{\mathrm{ch}}>10^4$ are beyond experimentally achievable data qualities. The added vague non-cooperativity prior contributes information to the posterior approximately until $N_{\mathrm{ch}}=10^5$, as judged by the RMSE and Frobenius norm. For the less vague prior, the gain of relevant information lasts even beyond $N_{\mathrm{ch}}=10^6$ as judged by the Frobenius norm, but the RMSEs are roughly the same.

2. To what extent can channel properties be assessed against the bias of the no-cooperativity prior?

We test, for the different no-cooperativity priors, what typical data quantity is needed such that $p(K\mid\mathcal{Y}_T)$ supports positive cooperativity (defined as $r=k_{12,\mathrm{true}}/k_{54,\mathrm{true}}>1$, Sec. V E). One could also ask at what point the bias of the additional soft coupling by the prior towards no-cooperativity becomes detrimental to the inference, because the DGP (Fig. 2) is cooperative with $k_{12,\mathrm{true}}/k_{54,\mathrm{true}}=5/2$. There are essentially two categories (negative, $k_{12}/k_{54}<1$, and positive, $k_{12}/k_{54}>1$) and the infinitesimally thin (green) line in between with no cooperativity, $k_{12}/k_{54}=1$. Let $r=k_{12}/k_{54}$ be our measure of cooperativity. If

$$\int_{1}^{\infty} p(r\mid\mathcal{Y}_T)\,\mathrm{d}r > 0.5 \qquad (25)$$

the data supports positive cooperativity. Notably, if $\int_{1}^{\infty} p(r\mid\mathcal{Y}_T)\,\mathrm{d}r \approx 0.5$, a negative cooperativity model is just as plausible as a positive one. Following the Bayesian paradigm, we are not looking for a binary significance test result but embrace the continuous aspect of the question at hand. The inequality 25 is fulfilled if the posterior median (solid lines) satisfies $\mathrm{med}_{\mathrm{post}}(r)>1$. We contrast the median with HDCIs, which give the smallest interval containing a given posterior probability mass. For skewed posteriors (because of uninformative data), HDCIs might indicate a different cooperativity model than the median. Note that for small $N_{\mathrm{ch}}$ we work in a regime where frequentist testing would likely not produce significant results.
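Given posterior draws of $r$, the criterion of Eq. 25 and its comparison with an HDCI can be evaluated as in the following minimal sketch. It assumes that the HDCI is estimated as the shortest interval spanning the required fraction of sorted samples; the function names and the skewed toy sample are hypothetical and do not reproduce any result of the paper.

```python
import math
import statistics


def prob_positive_cooperativity(r_samples):
    """Monte Carlo estimate of the integral in Eq. 25, i.e. P(r > 1 | data)."""
    return sum(1 for r in r_samples if r > 1.0) / len(r_samples)


def hdci(samples, mass):
    """Shortest interval that contains at least `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


# Right-skewed toy sample of r (assumed values for illustration only).
r_samples = [0.5, 0.8, 1.2, 1.5, 2.0, 2.5, 3.0, 4.0]

p_positive = prob_positive_cooperativity(r_samples)
median_r = statistics.median(r_samples)
interval_50 = hdci(r_samples, 0.5)
```

In this toy sample the median lies above 1 and Eq. 25 is fulfilled, yet the 0.5-HDCI extends below 1, which mirrors the remark that HDCIs of skewed posteriors can point to a different cooperativity model than the median.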

For the pure minimally informative prior (Fig. 9a) and $N_{\mathrm{ch}}\in[5\cdot10^3,2\cdot10^4]$, the posterior fluctuates between a weak indication of positive cooperativity based on the median (solid line) and both models being equally plausible. In particular, the median is biased towards too small values of $r$. The 0.5-HDCI (lower limit, dashed black lines) is almost entirely smaller than $r=1$. To have an unbiased median and a 0.5-HDCI that additionally supports positive cooperativity qualitatively, one needs at least $N_{\mathrm{ch}}>5\cdot10^4$. In contrast, working with the uniform prior requires $N_{\mathrm{ch}}=2.5\cdot10^5$.

FIG. 9. The prediction of positive cooperativity (accelerated unbinding) of the unbinding of the second ligand gains certainty with vaguely informative prior assumptions with a bias towards no-cooperativity.

On the x-axis we plot $N_{\mathrm{ch}}$ as a proxy for the information content in the data originating from CRN2. The dashed lines indicate the lower limit of the 0.5-HDCI, and the solid lines are the medians. The color corresponds to the prior assumptions, with more information added from left to right. The set of three [0.99, 0.5, 0.3]-HDCIs is plotted in each panel. As a guide to the eye to discern the gain of certainty that there is negative cooperativity and the reduction of bias, we replot the median and lower limit of the 0.5-HDCI from the previous panel. For visual clarity we suppress the black lines in panel d. a, Minimally informative prior. b, With an additional diffusion limit assumption. c, With an additional vague no-cooperativity assumption $\log_{10}(k_{21}/k_{32})\sim\mathrm{normal}(0,0.5)$, $\log_{10}(k_{12}/k_{23})\sim\mathrm{normal}(0,0.5)$, $\log_{10}(k_{45}/k_{32})\sim\mathrm{cauchy}(0,1)$ and $\log_{10}(k_{54}/k_{23})\sim\mathrm{cauchy}(0,1)$. d, With an additional less vague no-cooperativity assumption $\log_{10}(k_{21}/k_{32})\sim\mathrm{normal}(0,0.1)$, $\log_{10}(k_{12}/k_{23})\sim\mathrm{normal}(0,0.1)$. A standard deviation of these priors corresponds approximately to a relative deviation of 3.1 in the corresponding parameters. Further, $\log_{10}(k_{45}/k_{32})\sim\mathrm{cauchy}(0,1)$ and $\log_{10}(k_{54}/k_{23})\sim\mathrm{cauchy}(0,1)$.

One may ask where the bias towards too small $r$ values for $N_{\mathrm{ch}}\in[10^3,4\cdot10^4]$ comes from. The normal approximations used to derive the likelihood [38] are justified given the scale of $N_{\mathrm{ch}}$. Thus, we suspect the bias to originate from correlations and a strong practical non-identifiability problem in the likelihood/posterior. Imagine a strong positive correlation after the inference between, e.g., $\tau_1$ and $\tau_2$ such that one can predict the scale of $\tau_2$ very accurately from $\tau_1$. The limiting extreme case of such a correlation between two parameters is realized in the structurally non-identifiable problem of the linear birth-death model (Eq. 1). Giving $k_{\mathrm{birth}}$ a log-uniform prior would result in $k_{\mathrm{death}}$ having a log-uniform prior. Supplying log-uniform priors to both parameters results, after the inference, in an effect of the combined prior on the posterior of $k_{\mathrm{death}}^{-2}$ and equivalently $k_{\mathrm{birth}}^{-2}$. An anticorrelation between the parameters would eliminate the effect of the prior.

Adding the diffusion limit to the posterior (Fig. 9b) extends the region of a visually sufficiently proper posterior (Fig. 8) at least to $N_{\mathrm{ch}}=10^2$ and decreases uncertainty (difference between black and blue dashed lines). However, the effect on the capability of the median to predict positive cooperativity qualitatively is small or nonexistent. While not necessarily indicating a wrong model, the median is typically undecided and thus biased towards too small $r$ values. Hence, adding the diffusion limit reduces the uncertainty of $p(r\mid\mathcal{Y}_T)$ but does not help to answer the fine-grained question of cooperativity, unless one works with unrealistic data qualities (see the uncertainty in Fig. 9b for $N_{\mathrm{ch}}>10^5$).

This changes when adding the vague no-cooperativity prior (Fig. 9c), which by its definition biases $p(r\mid\mathcal{Y}_T)$ more around the green line. The median and the HDCIs (see the difference between the red and blue lines in Fig. 9c) are shifted towards $r_{\mathrm{true}}$ against the bias of the Cauchy prior acting on $r$. The median now always indicates positive cooperativity. The 0.5-HDCI is, roughly speaking, undecided but much less biased than without the vague regularization.

We show in Fig. 9d the effect of decreasing the variance of the prior for the ratios (increasing the bias towards non-cooperativity of the binding and unbinding rates) between $k_{12}$ and $k_{23}$ on the one hand and between $k_{21}$ and $k_{32}$ on the other hand, while the cooperativity priors for $k_{45}$ and $k_{32}$ and for $k_{54}$ and $k_{23}$ remain the same. This structure vaguely incorporates the prior knowledge that the vertical Markov transitions (Fig. 2c) in the CRN represent changes in the protein, which might alter the binding and unbinding rates to some extent. A further shift of the lower limit of the 0.5-HDCI (dashed lines) and the median (dotted lines) can be observed. For realistic PC data quality assumptions, $N_{\mathrm{ch}}<10^4$, the prior combining diffusion limits with less vague non-cooperativity assumptions performs best but also presupposes the most. Only for high and unrealistic data quality, $N_{\mathrm{ch}}>10^5$, do the posteriors without the vague non-cooperativity assumptions (black) seem to have a smaller bias to smaller negative cooperativity.

VI. CONCLUSION

Bayesian inference offers efficacious remedies for practical non-identifiability problems in HMM inference thereby allowing parameter uncertainty quantification for finite data. Nevertheless, pathologies of the likelihood also pose challenges for Bayesian inference.

If little about the actual values of some parameters is known a priori, we show that minimally informative priors are crucial to expand the range of acceptable data quality. They attempt to make posteriors as sensitive to the data as possible, thereby also alleviating practical non-identifiability pathologies in HMMs. The suggested minimally informative prior increases accuracy and decreases uncertainty compared to a uniform prior.

Any prior dominates the posterior in the regions of a constant likelihood value (the essence of non-identifiability). The bias of the uniform prior to larger inverse dwell times or chemical rates combines in an unfortunate way with the practical non-identifiability problem of the likelihood itself. In contrast, the log-uniform part of the minimally informative prior puts equal statistical weight on each decade and thus alleviates this problem. It would also alleviate the problems mentioned in [35].
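The different weighting of decades by the two priors can be made concrete with a short calculation. This is a minimal sketch with a hypothetical sampling box from $10^{-2}$ to $10^{6}$; the box limits and function names are assumptions chosen for illustration.

```python
import math


def uniform_mass(a, b, lo, hi):
    """Prior mass of [a, b] under a uniform prior on [lo, hi]."""
    return (b - a) / (hi - lo)


def log_uniform_mass(a, b, lo, hi):
    """Prior mass of [a, b] under a log-uniform prior on [lo, hi]."""
    return math.log(b / a) / math.log(hi / lo)


# Hypothetical sampling box spanning eight decades.
lo, hi = 1e-2, 1e6
decades = [(10.0 ** e, 10.0 ** (e + 1)) for e in range(-2, 6)]

uniform_per_decade = [uniform_mass(a, b, lo, hi) for a, b in decades]
log_uniform_per_decade = [log_uniform_mass(a, b, lo, hi) for a, b in decades]
```

Under the uniform prior about 90 % of the mass falls into the highest decade alone, whereas the log-uniform prior assigns one eighth to each of the eight decades, so enlarging the box upwards shifts the uniform prior towards ever larger rates.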

Notably, we show that, when improper prior distributions are used, the usually arbitrarily chosen simulation box limits determine the posterior on a relevant scale as soon as the simulation box is large enough. The minimally informative prior desensitizes the posterior to the sampling box limits. This problem vanishes only under the rare condition that the posterior has a peak close to the true values that is multiple orders of magnitude higher than the purely prior-dominated parts. This would make it possible to ignore the strictly prior-dominated parts. However, the peak will often be less dominant. Importantly, if one uses the minimally informative prior for complex CRNs with a high-dimensional parameter space, it is much simpler for the adaptive HMC sampler to produce well-mixing (converging) parameter chains, i.e., the samples indicate that the typical set [93] of the posterior was sampled.

We show that, unfortunately, for typical data qualities and quantities and realistic CRNs, further objective or subjective assumptions are necessary to obtain an interpretable and sufficiently proper posterior to overcome the challenges from the practical non-identifiability.

A solution to make the posterior proper is to apply meaningful limits to the relevant parameter subset of the sampling box, thereby reducing the uncertainty. The solution is objective if the limits can be theoretically derived (or are rooted in the physical properties of the molecules). Herein, it is shown that this information fusion from data and prior knowledge creates meaningful inferences with the lowest tested data quality, even for the most complex tested CRN. Nevertheless, derived theoretical limits might not always be at hand, and even after their application, practical non-identifiability problems might remain in parameter dimensions where the upper limits do not apply. Herein, an additional vaguely informative prior on the ratio of some rates, a hyperparameter corresponding to the cooperativity of ligand binding and the respective unbinding, is applied. Combining these objective and common-sense (biochemical) prior assumptions delivers the best inferences in terms of RMSE and uncertainty of the posterior. This additional prior biases the CRN gently towards CRNs where ligand binding and unbinding of the different channel subunits occur independently but still allows for positive and negative cooperativity over orders of magnitude (depending on the choice of hyperparameters). Hence, it is a much less radical assumption than the commonly used non-cooperativity assumption [87, 88]. Not using such a prior would mean that one is willing to accept a priori cooperativity effects of any order of magnitude, which is against common biochemical experience, i.e., experience-based priors. Thus, extreme effects are only considered if the data is very certain about them. In this sense, the prior acts as an Occam's razor.

Using this prior, even without the physically motivated upper sampling box limits, always renders the posterior proper, at least in the relevant dimensions of the parameter space, since the prior itself is proper. Notably, using these prior assumptions, one can learn from the HMM inference about negative cooperativity within the CRN with at least $10^3$ times smaller data sets than with plain uniform prior assumptions or ML inferences. The allowed thousandfold reduction of the data quality is a prerequisite for inferring HMMs of this complexity scale with real-world data.

One could also apply this technique to heteromeric proteins or across homomeric proteins containing mutated binding sites [94–96] if the scale of the differences between binding pockets can be coarsely estimated a priori. The coarser the a priori estimate is, the heavier the tails of the regularization prior should be. A Cauchy prior in log space provides a heavier tail but is still a proper prior. A detailed study of the different possibilities is beyond the scope of this paper.

In summary, Bayesian inference provides flexible tools to compensate for the shortcomings of ML inference caused by the omnipresent practical non-identifiability problems of the likelihood. Careful prior elicitation is key to getting the most out of biophysically meaningful HMMs: being minimally informative where one has absolutely no information on the scale of parameters, vaguely informative where there is some physical common-sense knowledge about some parameters, and very informative where there is objective information such as a theoretical upper bound. Using this prior elicitation approach, we were able to obtain meaningful biological insight with a thousandfold lower data quality.


ACKNOWLEDGMENTS

The authors are grateful to E. Schulz for designing software to simulate channel activity and to Th. Eick for performing simulations. This project was funded by the German Research Foundation (DFG) within Research Group FOR 2518 DynIon (project P2). F. Paul acknowledges funding from the Yen PostDoctoral Fellowship in Interdisciplinary Research and from the National Cancer Institute of the National Institutes of Health (NIH) through Grant CAO93577. M. Habeck acknowledges the Carl Zeiss Foundation funding within the program “CZS Stiftungsprofessure.” We thank M. Bücker for help with the computer cluster at Friedrich Schiller University Jena and K. Benndorf for comments on the manuscript.

Contributor Information

Jan L. Münch, Institute of Physiology II, Jena University Hospital, Friedrich Schiller University, Jena 07743, Germany

Ralf Schmauder, Institute of Physiology II, Jena University Hospital, Friedrich Schiller University, Jena 07743, Germany.

Fabian Paul, Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, United States.

Michael Habeck, Microscopic Image Analysis Group, Jena University Hospital, Friedrich Schiller University, Jena 07743, Germany.

References

  • [1].Anderson D. F. and Kurtz T. G., Continuous time markov chain models for chemical reaction networks, in Design and analysis of biomolecular circuits: … (Springer, 2011) pp. 3–42. [Google Scholar]
  • [2].Hille B., Ionic channels in excitable membranes. current problems and biophysical approaches, Biophysical journal 22, 283 (1978). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [3].Colquhoun D., Hawkes G. A., and Bernard K., On the stochastic properties of single ion channels, P. of the Roy. Soc. of London. Series B. Biological Sciences 211, 205 (1981). [DOI] [PubMed] [Google Scholar]
  • [4].Loo D., Hazama A., Supplisson S., TuRK E., and Wright E. M., Relaxation kinetics of the na+/glucose cotransporter., Proceedings of the National Academy of Sciences 90, 5767 (1993). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [5].Eskandari S., Wright E., and Loo D., Kinetics of the reverse mode of the na+/glucose cotransporter, The Journal of membrane biology 204, 23 (2005). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [6].George A. and Zuckerman D. M., From average transient transporter currents to microscopic mechanism–a bayesian analysis, bioRxiv, 2023 (2023). [DOI] [PubMed] [Google Scholar]
  • [7].Milescu L. S., Yildiz A., Selvin P. R., and Sachs F., Maximum likelihood estimation of molecular motor kinetics from staircase dwell-time sequences, Biophysical journal 91, 1156 (2006). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [8].Müllner F. E., Syed S., Selvin P. R., and Sigworth F. J., Improved hidden markov models for molecular motors, part 1: basic theory, Biophysical journal 99, 3684 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [9].Syed S., Müllner F. E., Selvin P. R., and Sigworth F. J., Improved hidden markov models for molecular motors, part 2: extensions and application to experimental data, Biophysical journal 99, 3696 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [10].Chodera J. D., Elms P., Noe F., Keller B., Kaiser C. M., Ewall-Wice A., Marqusee S., Bustamante C., and Hinrichs N. S., Bayesian hidden markov model analysis of single-molecule force spectroscopy: Characterizing kinetics under measurement uncertainty (2011), arXiv:1108.1430 [cond-mat.stat-mech].
  • [11].Keller B. G., Kobitski A., Ja?schke A., Nienhaus G. U., and Noe? F., Complex rna folding kinetics revealed by single-molecule fret and hidden markov models, Journal of the American Chemical Society 136, 4534 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [12].Rosales R. A., Fitzgerald W. J., and Hladky S. B., Kernel estimates for one-and two-dimensional ion channel dwell-time densities, Biophysical journal 82, 29 (2002). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [13].Qin F. and Li L., Model-based fitting of single-channel dwell-time distributions, Biophysical journal 87, 1657 (2004). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [14].Shelley C. and Magleby K. L., Linking exponential components to kinetic states in markov models for single-channel gating, The Journal of general physiology 132, 295 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [15].Baum L. E. and Petrie T., Statistical inference for probabilistic functions of finite state markov chains, The annals of mathematical statistics 37, 1554 (1966). [Google Scholar]
  • [16].Rabiner L. R., A tutorial on hidden Markov models and selected applications in speech recognition, Proc. of the IEEE 77, 257 (1989). [Google Scholar]
  • [17].Chung S.-H., Moore J. B., Xia L., Premkumar L., and Gage P. W., Characterization of single channel currents using digital signal processing techniques based on hidden markov models, Philos. T. of the Roy. Soc. of Lond. Series B Bio. Sci. 329, 265 (1990). [DOI] [PubMed] [Google Scholar]
  • [18].Albertsen A. and Hansen U.-P., Estimation of kinetic rate constants from multi-channel recordings by a direct fit of the time series, Biophysical journal 67, 1393 (1994). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [19].Qin F., Auerbach A., and Sachs F., Hidden Markov Modeling for Single Channel Kinetics with Filtering and Correlated Noise, Biophys J. 79, 1928 (2000). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [20].de Gunst M. M., Künsch H., and Schouten J., Statistical analysis of ion channel data using hidden markov models with correlated state-dependent noise and filtering, Journal of the American Statistical Association 96, 805 (2001). [Google Scholar]
  • [21].Rosales R., Stark J. A., Fitzgerald W. J., and Hladky S. B., Bayesian restoration of ion channel records using hidden markov models, Biophysical journal 80, 1088 (2001). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [22].Venkataramanan L. and Sigworth F., Applying hidden Markov models to the analysis of single ion channel activity, Biophys J. 82, 1930 (2002). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [23].Rosales R. A., MCMC for hidden Markov models incorporating aggregation of states and filtering, Bull. Math. Biol. 66, 1173 (2004). [DOI] [PubMed] [Google Scholar]
  • [24].Kinz-Thompson C. D. and Gonzalez R. L. Jr, Increasing the time resolution of single-molecule experiments with bayesian inference, Biophysical journal 114, 289 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [25].Kilic Z., Sgouralis I., and Pressé S., Generalizing hmms to continuous time for fast kinetics: hidden markov jump processes, Biophysical journal 120, 409 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [26].Kilic Z., Sgouralis I., Heo W., Ishii K., Tahara T., and Pressé S., Extraction of rapid kinetics from smfret measurements using integrative detectors, Cell Reports Physical Science 2 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [27].Saurabh A., Fazel M., Safar M., Sgouralis I., and Pressé S., Single-photon smfret. i: Theory and conceptual basis, Biophysical Reports 3, 100089 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [28].Ray K. K., Kinz-Thompson C. D., Fei J., Wang B., Lin Q., and Gonzalez R. L., Entropic control of the free-energy landscape of an archetypal biomolecular machine, Proceedings of the National Academy of Sciences 120, e2220591120 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [29].Kuschke S., Thon S., Sattler C., Schwabe T., Benndorf K., and Schmauder R., camp binding to closed pacemaker ion channels is cooperative, Proceedings of the National Academy of Sciences 121, e2315132121 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [30].Kienker P., Equivalence of aggregated markov models of ion-channel gating, P. of the Roy. Soc. of London. B. Biological Sciences 236, 269 (1989). [DOI] [PubMed] [Google Scholar]
  • [31].Vajda S., Godfrey K. R., and Rabitz H., Similarity transformation approach to identifiability analysis of nonlinear compartmental models, Mathematical biosciences 93, 217 (1989). [DOI] [PubMed] [Google Scholar]
  • [32].Audoly S., Bellu G., D’Angio L., Saccomani M. P., and Cobelli C., Global identifiability of nonlinear models of biological systems, IEEE Transactions on biomedical engineering 48, 55 (2001). [DOI] [PubMed] [Google Scholar]
  • [33].Raue A., Kreutz C., Maiwald T., Bachmann J., Schilling M., Klingmüller U., and Timmer J., Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood, Bioinformatics 25, 1923 (2009). [DOI] [PubMed] [Google Scholar]
  • [34].Middendorf T. R. and Aldrich R. W., Structural identifiability of equilibrium ligand-binding parameters, Journal of General Physiology 149, 105 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [35].Raue A., Kreutz C., Theis F. J., and Timmer J., Joining forces of bayesian and frequentist methodology: a study for inference in the presence of non-identifiability, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 371, 20110544 (2013). [DOI] [PubMed] [Google Scholar]
  • [36].Middendorf T. R. and Aldrich R. W., The structure of binding curves and practical identifiability of equilibrium ligand-binding parameters, Journal of General Physiology 149, 121 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [37].Delbrück M., Statistical fluctuations in autocatalytic reactions, The Journal of Chemical Physics 8, 120 (1940). [Google Scholar]
  • [38].Münch J. L., Paul F., Schmauder R., and Benndorf K., Bayesian inference of kinetic schemes for ion channels by Kalman filtering, eLife 11, e62714 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [39].Gillespie D. T., The chemical langevin equation, The Journal of Chemical Physics 113, 297 (2000). [Google Scholar]
  • [40].Moffatt L., Estimation of Ion Channel Kinetics from Fluctuations of Macroscopic Currents, Biophys J. 93, 74 (2007). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [41].Hines K. E., Bankston J. R., and Aldrich R. W., Analyzing Single-Molecule Time Series via Nonparametric Bayesian Inference, Biophys J. 108, 540 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [42].Sgouralis I. and Pressé S., An introduction to infinite HMMs for single-molecule data analysis, Biophysical Journal 112, 2021 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [43].Sgouralis I. and Pressé S., ICON: an adaptation of infinite HMMs for time traces with drift, Biophysical Journal 112, 2117 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [44].Ghahramani Z., Learning dynamic bayesian networks, in International School on Neural Networks (Springer, 1997) pp. 168–197. [Google Scholar]
  • [45].Bechhoefer J., Hidden markov models for stochastic thermodynamics, New Journal of Physics 17, 075003 (2015). [Google Scholar]
  • [46].Kalman R. E., A new approach to linear filtering and prediction problems, Journal of Basic Engineering 82, 35 (1960). [Google Scholar]
  • [47].Kalman R. E. and Bucy R. S., New results in linear filtering and prediction theory, Journal of Basic Engineering 83, 95 (1961). [Google Scholar]
  • [48].Komorowski M., Finkenstädt B., Harper C. V., and Rand D. A., Bayesian inference of biochemical kinetic parameters using the linear noise approximation, BMC Bioinformatics 10, 343 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [49].Fink M. and Noble D., Markov models for ion channels: versatility versus identifiability and speed, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 367, 2161 (2009). [DOI] [PubMed] [Google Scholar]
  • [50].Fearnhead P., Giagos V., and Sherlock C., Inference for reaction networks using the linear noise approximation, Biometrics 70, 457 (2014). [DOI] [PubMed] [Google Scholar]
  • [51].Folia M. M. and Rattray M., Trajectory inference and parameter estimation in stochastic models with temporally aggregated data, Statistics and Computing 28, 1053 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [52].Myung I. J., Tutorial on maximum likelihood estimation, Journal of mathematical Psychology 47, 90 (2003). [Google Scholar]
  • [53].Casella G. and Berger R. L., Statistical inference (Cengage Learning, 2021). [Google Scholar]
  • [54].Gelman A., Carlin J. B., Stern H. S., and Rubin D. B., Bayesian data analysis (Chapman and Hall/CRC, 1995). [Google Scholar]
  • [55].Joshi M., Seidel-Morgenstern A., and Kremling A., Exploiting the bootstrap method for quantifying parameter confidence intervals in dynamical systems, Metabolic engineering 8, 447 (2006). [DOI] [PubMed] [Google Scholar]
  • [56].Ball F., Cai Y., Kadane J., and O’Hagan A., Bayesian inference for ion–channel gating mechanisms directly from single–channel recordings, using Markov chain Monte Carlo, Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences 455, 2879 (1999). [Google Scholar]
  • [57].Gin E., Falcke M., Wagner L. E., Yule D. I., and Sneyd J., Markov chain Monte Carlo fitting of single-channel data from inositol trisphosphate receptors, J. of Theoretical Biology 257, 460 (2009). [DOI] [PubMed] [Google Scholar]
  • [58].Siekmann I., Wagner L. E., Yule D., Fox C., Bryant D., Crampin E. J., and Sneyd J., MCMC Estimation of Markov Models for Ion Channels, Biophys J. 100, 1919 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [59].Siekmann I., Sneyd J., and Crampin E. J., MCMC Can Detect Nonidentifiable Models, Biophys J. 103, 2275 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [60].Hines K. E., A Primer on Bayesian Inference for Biophysical Systems, Biophys J. 108, 2103 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [61].Ball F., MCMC for Ion-Channel Sojourn-Time Data: A Good Proposal, Biophys J. 111, 267 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [62].Wieland F.-G., Hauber A. L., Rosenblatt M., Tönsing C., and Timmer J., On structural and practical identifiability, Current Opinion in Systems Biology 25, 60 (2021). [Google Scholar]
  • [63].Jeffreys H., An invariant form for the prior probability in estimation problems, Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences 186, 453 (1946). [DOI] [PubMed] [Google Scholar]
  • [64].Kass R. E. and Wasserman L., Formal rules for selecting prior distributions: A review and annotated bibliography, Journal of the American Statistical Association 91, 1343 (1996). [Google Scholar]
  • [65].Yang R. and Berger J. O., A catalog of noninformative priors, Vol. 2 (Institute of Statistics and Decision Sciences, Duke University Durham, NC, USA, 1996). [Google Scholar]
  • [66].Consonni G., Fouskakis D., Liseo B., and Ntzoufras I., Prior distributions for objective bayesian analysis, Bayesian Analysis 13, 627 (2018). [Google Scholar]
  • [67].Jaynes E. T., Prior probabilities, IEEE Transactions on systems science and cybernetics 4, 227 (1968). [Google Scholar]
  • [68].Gelman A. and Rubin D. B., A single series from the gibbs sampler provides a false sense of security, Bayesian statistics 4, 625 (1992). [Google Scholar]
  • [69].Duane S., Kennedy A. D., Pendleton B. J., and Roweth D., Hybrid monte carlo, Physics letters B 195, 216 (1987). [Google Scholar]
  • [70].Neal R. M., Monte Carlo implementation, in Bayesian learning for neural networks, 55 (1996). [Google Scholar]
  • [71].Hoffman M. D. and Gelman A., The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo, Journal of Machine Learning Research 15, 1593 (2014). [Google Scholar]
  • [72].Gelman A., Lee D., and Guo J., Stan: A probabilistic programming language for bayesian inference and optimization, J. of Educational and Behavioral Statistics 40, 530 (2015). [Google Scholar]
  • [73].Betancourt M., A conceptual introduction to hamiltonian monte carlo (2018), arXiv:1701.02434 [stat.ME].
  • [74].Fisher R. A., Theory of statistical estimation, Mathematical Proceedings of the Cambridge Philosophical Society 22, 700 (1925). [Google Scholar]
  • [75].Watanabe S., Almost all learning machines are singular, in 2007 IEEE Symposium on Foundations of Computational Intelligence (IEEE, 2007) pp. 383–388. [Google Scholar]
  • [76].Browning A. P., Warne D. J., Burrage K., Baker R. E., and Simpson M. J., Identifiability analysis for stochastic differential equation models in systems biology, Journal of the Royal Society Interface 17, 20200652 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [77].Jahnke T. and Huisinga W., Solving the chemical master equation for monomolecular reaction systems analytically, J. Math. Biol. 54, 1 (2007). [DOI] [PubMed] [Google Scholar]
  • [78].Lam N. N., Docherty P. D., and Murray R., Practical identifiability of parametrised models: A review of benefits and limitations of various approaches, Mathematics and Computers in Simulation 199, 202 (2022). [Google Scholar]
  • [79].Gelman A. and Yao Y., Holes in bayesian statistics, Journal of Physics G: Nuclear and Particle Physics 48, 014002 (2020). [Google Scholar]
  • [80].Nicolai C. and Sachs F., Solving ion channel kinetics with the qub software, Biophysical Reviews and Letters 8, 191 (2013). [Google Scholar]
  • [81].Trendelkamp-Schroer B. and Noé F., Efficient bayesian estimation of markov model transition matrices with given stationary distribution, The Journal of chemical physics 138, 04B612 (2013). [DOI] [PubMed] [Google Scholar]
  • [82].Assoudou S. and Essebbar B., A bayesian model for binary markov chains, International Journal of Mathematics and Mathematical Sciences 2004, 421 (2004). [Google Scholar]
  • [83].Assoudou S. and Essebbar B., A bayesian model for markov chains via jeffrey’s prior, Communications in Statistics - Theory and Methods 32, 2163 (2003). [Google Scholar]
  • [84].Smoluchowski M. v., Versuch einer mathematischen theorie der koagulationskinetik kolloider lösungen, Zeitschrift für physikalische Chemie 92, 129 (1918). [Google Scholar]
  • [85].van Holde K., A hypothesis concerning diffusion-limited protein–ligand interactions, Biophys. Chem 101, 249 (2002). [DOI] [PubMed] [Google Scholar]
  • [86].Newman M. E., Power laws, pareto distributions and zipf’s law, Contemporary physics 46, 323 (2005). [Google Scholar]
  • [87].Lape R., Colquhoun D., and Sivilotti L. G., On the nature of partial agonism in the nicotinic receptor superfamily, Nature 454, 722 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [88].Khadra A., Yan Z., Coddou C., Tomić M., Sherman A., and Stojilkovic S. S., Gating properties of the P2X2a and P2X2b receptor channels: experiments and mathematical modeling, Journal of General Physiology 139, 333 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [89].Koren G., Liman E. R., Logothetis D. E., Nadal-Ginard B., and Hess P., Gating mechanism of a cloned potassium channel expressed in frog oocytes and mammalian cells, Neuron 4, 39 (1990). [DOI] [PubMed] [Google Scholar]
  • [90].Zagotta W. N., Hoshi T., and Aldrich R. W., Shaker potassium channel gating. III: Evaluation of kinetic models for activation, Journal of General Physiology 103, 321 (1994). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [91].Silva J. and Rudy Y., Subunit interaction determines IKs participation in cardiac repolarization and repolarization reserve, Circulation 112, 1384 (2005). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [92].Beattie K. A., Hill A. P., Bardenet R., Cui Y., Vandenberg J. I., Gavaghan D. J., de Boer T. P., and Mirams G. R., Sinusoidal voltage protocols for rapid characterisation of ion channel kinetics, The Journal of physiology 596, 1813 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [93].Cover T. M., Elements of information theory (John Wiley & Sons, 1999). [Google Scholar]
  • [94].Nache V., Wongsamitkul N., Kusch J., Zimmer T., Schwede F., and Benndorf K., Deciphering the function of the CNGB1b subunit in olfactory CNG channels, Scientific Reports 6, 29378 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [95].Wongsamitkul N., Nache V., Eick T., Hummert S., Schulz E., Schmauder R., Schirmeyer J., Zimmer T., and Benndorf K., Quantifying the cooperative subunit action in a multimeric membrane receptor, Scientific Reports 6, 20974 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [96].Schirmeyer J., Hummert S., Eick T., Schulz E., Schwabe T., Ehrlich G., Kukaj T., Wiegand M., Sattler C., Schmauder R., et al., Thermodynamic profile of mutual subunit control in a heteromeric receptor, Proceedings of the National Academy of Sciences 118, e2100469118 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [97].Tveito A., Lines G. T., Edwards A. G., and McCulloch A., Computing rates of markov models of voltage-gated ion channels by inverting partial differential equations governing the probability density functions of the conducting and non-conducting states, Mathematical Biosciences 277, 126 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [98].Hines K. E., Middendorf T. R., and Aldrich R. W., Determination of parameter identifiability in nonlinear biophysical models: A Bayesian approach, Journal of General Physiology 143, 401 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [99].Milescu L. S., Akk G., and Sachs F., Maximum Likelihood Estimation of Ion Channel Kinetics from Macroscopic Currents, Biophys J. 88, 2494 (2005). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [100].Kusch J., Biskup C., Thon S., Schulz E., Nache V., Zimmer T., Schwede F., and Benndorf K., Interdependence of Receptor Activation and Ligand Binding in HCN2 Pacemaker Channels, Neuron 67, 75 (2010). [DOI] [PubMed] [Google Scholar]
  • [101].Kreutz C., Raue A., Kaschek D., and Timmer J., Profile likelihood in systems biology, The FEBS journal 280, 2564 (2013). [DOI] [PubMed] [Google Scholar]
  • [102].Datta G. S. and Ghosh M., On the invariance of noninformative priors, The annals of Statistics 24, 141 (1996). [Google Scholar]
  • [103].Berger J. O., Bernardo J. M., and Sun D., Overall objective priors, Bayesian Analysis 10, 189 (2015). [Google Scholar]
  • [104].Gelman A., Rubin D. B., et al., Inference from iterative simulation using multiple sequences, Statistical Science 7, 457 (1992). [Google Scholar]
  • [105].Brooks S. P. and Gelman A., General methods for monitoring convergence of iterative simulations, Journal of computational and graphical statistics 7, 434 (1998). [Google Scholar]
  • [106].Vehtari A., Gelman A., Simpson D., Carpenter B., Bürkner P.-C., et al., Rank-normalization, folding, and localization: An improved R̂ for assessing convergence of MCMC, Bayesian Analysis 16, 667 (2021). [Google Scholar]
  • [107].Vats D. and Knudson C., Revisiting the gelman–rubin diagnostic, Statistical Science 36, 518 (2021). [Google Scholar]

Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints