Author manuscript; available in PMC: 2023 Aug 16.
Published in final edited form as: Bayesian Anal. 2023 Sep;18(3):807–840. doi: 10.1214/22-BA1327

Combining chains of Bayesian models with Markov melding

Andrew A Manderson *, Robert J B Goudie
PMCID: PMC7614958  EMSID: EMS152034  PMID: 37587923

Abstract

A challenge for practitioners of Bayesian inference is specifying a model that incorporates multiple relevant, heterogeneous data sets. It may be easier to instead specify distinct submodels for each source of data, then join the submodels together. We consider chains of submodels, where submodels directly relate to their neighbours via common quantities which may be parameters or deterministic functions thereof. We propose chained Markov melding, an extension of Markov melding, a generic method to combine chains of submodels into a joint model. One challenge we address is appropriately capturing the prior dependence between common quantities within a submodel, whilst also reconciling differences in priors for the same common quantity between two adjacent submodels. Estimating the posterior of the resulting overall joint model is also challenging, so we describe a sampler that uses the chain structure to incorporate information contained in the submodels in multiple stages, possibly in parallel. We demonstrate our methodology using two examples. The first example considers an ecological integrated population model, where multiple data sets are required to accurately estimate population immigration and reproduction rates. We also consider a joint longitudinal and time-to-event model with uncertain, submodel-derived event times. Chained Markov melding is a conceptually appealing approach to integrating submodels in these settings.

Keywords: Combining models, Markov melding, Bayesian graphical models, Multi-stage estimation, Model/data integration, Integrated population model

1. Introduction

The Bayesian philosophy is appealing in part because the posterior distribution quantifies all sources of uncertainty. However, a joint model for all data and parameters is a prerequisite to posterior inference, and in situations where multiple, heterogeneous sources of data are available, specifying such a joint model is a formidable task. Models that consider such data are necessary to describe complex phenomena at a useful precision. One possible approach begins by specifying individual submodels for each source of data. These submodels could guide the statistician when directly specifying the joint model, but to use the submodels only informally seems wasteful. Instead, it may be preferable to construct a joint model by formally joining the individual submodels together.

Some specific forms of combining data are well established. Meta-analyses and evidence synthesis methods are widely used to summarise data, often using hierarchical models (Ades and Sutton, 2006; Presanis et al., 2014). Outside of the statistical literature, a common name for combining multiple data is data fusion (Lahat et al., 2015; Kedem et al., 2017), though there are many distinct methods that fall under this general name. Interest in integrating data is not just methodological; applied researchers often collect multiple disparate data sets, or data of different modalities, and wish to combine them. For example, to estimate SARS-CoV-2 positivity Donnat et al. (2020) build an intricate hierarchical model that integrates both testing data and self-reported questionnaire data, and Parsons et al. (2021) specify a hierarchical model of similar complexity to estimate the number of injecting drug users in Ukraine. Both applications specify Bayesian models with data-specific components, which are united in a hierarchical manner. In conservation ecology, integrated population models (IPMs) (Besbeas et al., 2002; Brooks et al., 2004; Schaub and Abadi, 2011; Maunder and Punt, 2013; Zipkin and Saunders, 2018) are used to estimate population dynamics, e.g. reproduction and immigration rates, using multiple data on the same population. Such data have standard models associated with them, such as the Cormack-Jolly-Seber model (Lebreton et al., 1992) for capture-recapture data, and the IPM serves as the framework in which the standard models are combined. More generally, the applications we list illustrate the importance of generic, flexible methods for combining data to applied researchers.

Markov melding (Goudie et al., 2019) is a general statistical methodology for combining submodels. Specifically, it considers M submodels that share some common quantity ϕ, with each of the m = 1, …, M submodels possessing distinct parameters ψm, data Ym, and form pm(ϕ, ψm, Ym). Goudie et al. (2019) then propose to combine the submodels into a joint model, denoted pmeld(ϕ, ψ1, …, ψM, Y1, …, YM). However, it is unclear how to integrate models where there is no single quantity ϕ common to all submodels, such as for submodels that are linked in a chain structure.

We propose chained Markov melding, an extension of Markov melding that facilitates the combination of M submodels arranged in a chain structure. For example, when M = 3 we address the case in which submodels 1 and 2 share a common quantity ϕ1∩2, and submodels 2 and 3 share a different quantity ϕ2∩3. Our extension addresses previously unconsidered complications, including the distinct domains (and possibly supports) of the common quantities, and the desire to capture possible prior correlation between them. Two examples, introduced in the following section, serve to illustrate our methodology. The computational effort required to fit a complex, multi-response model is a burden on the model development process. We propose a multi-stage posterior estimation method that exploits the properties of our chained melded model to reduce this burden: we can parallelise aspects of the computation across the submodels, using less computationally expensive techniques for some submodels. Reusing existing software implementations of submodels, and subposterior samples where available, is also possible. Multi-stage samplers can aid in understanding the contribution of each submodel to the final posterior, and are used in many applied settings, including hierarchical modelling (Lunn et al., 2013) and joint models (Mauff et al., 2020).

One contribution of our work is to clarify the informal process, commonly used in applied analyses, of summarising and/or approximating submodels for use in subsequent analyses. The two most common approximation strategies seem to be (i) approximating the subposterior of the common quantity with a normal distribution for use in subsequent models (see, e.g. Jackson and White, 2018; Nicholson et al., 2021), and (ii) taking only a point estimate of the subposterior and treating it as a known value in further models. These strategies may, but do not always, produce acceptable approximations to the chained melded model. Both the chained melded model and these approximation strategies are examples of ‘multi-phase’ and ‘multi-source’ inference (Meng, 2014), with the melding approach most comprehensively accounting for uncertainty.

1.1. Example introduction

In this section we provide a high-level overview of two applications that require integrating a chain of submodels, with more details in Sections 4 and 5. Our first example decomposes a joint model into its constituent submodels and rejoins them. This simple situation allows us to compare the output from the chained melding process to the complete joint model, and is meant to illustrate both the ‘chain-of-submodels’ notion and the mechanics of chained melding. The second example is a realistic and complex setting in which it is not obvious how to combine the submodels without chained Markov melding. Our comparator is the common technique of summarising previously considered submodels with point estimates; the comparison demonstrates the importance of fully accounting for uncertainty.

An integrated population model for little owls

Integrated population models (IPMs) (Zipkin and Saunders, 2018) combine multiple data to estimate key quantities governing the dynamics of a specific population. Schaub et al. (2006) and Abadi et al. (2010) used an IPM to estimate fecundity, immigration, and yearly survival rates for a population of little owls. These authors collect and model three types of data, illustrated in Figure 1. Capture-recapture data Y1, and the associated capture-recapture submodel p1(ϕ1∩2, ψ1, Y1), are acquired by capturing and tagging owls each year, and then counting the number of tagged individuals recaptured in subsequent years. Population counts Y2 are obtained by observing the number of occupied nesting sites, and are modelled in p2(ϕ1∩2, ϕ2∩3, ψ2, Y2). Finally, nest-record data Y3 count both the number of reproductive successes and the number of possible breeding pairs, and are associated with a submodel for fecundity p3(ϕ2∩3, ψ3, Y3). The population count model p2 shares the parameter ϕ1∩2 with the capture-recapture model p1, and the parameter ϕ2∩3 with the fecundity model p3; each of the m = 1, 2, 3 submodels has distinct, submodel-specific parameters ψm. No single source of data is sufficient to estimate all quantities of interest, so it is necessary to integrate the three submodels into a single joint model to produce acceptably precise estimates of fecundity and immigration rates. We will show that the chained Markov melding framework developed in Section 2 encapsulates the process of integrating these submodels, producing results that are concordant with the original joint IPM.

Figure 1.


A simplified DAG of the integrated population model (IPM) for the little owls. The capture-recapture submodel (p1) is surrounded by the blue line, the population count submodel (p2) by the black line, and the fecundity submodel (p3) by the red line. The capture-recapture and population count submodels share parameters affecting the juvenile and adult survival rate (ϕ1∩2), whilst the parameter for fecundity is common to both the population count and fecundity submodels (ϕ2∩3). The combination of all the submodels forms the IPM.

Survival analysis with time varying covariates and uncertain event times

Our second example considers the time to onset of respiratory failure (RF) amongst patients in intensive care units, and factors that influence the onset of RF. A patient can be said to be experiencing RF if the ratio of the partial pressure of arterial blood oxygen (PaO2) to the fraction of inspired oxygen (FiO2) is less than 300 mmHg (The ARDS Definition Task Force, 2012), though this is not the only definition of RF. Patients’ PaO2/FiO2 (P/F) ratios are typically measured only a few times a day. The relative infrequency of P/F ratio data, combined with the intrinsic variability in each individual’s blood oxygen level, results in significant uncertainty about the time of onset of RF.

Factors that influence the time to onset of RF are both longitudinal and time invariant. Both types of data can be considered in joint models (Rizopoulos, 2012), which are composed of two distinct submodels, one for each data type. However, existing joint models are not able to incorporate the uncertainty surrounding the event time, which may result in overconfident and/or biased estimates of the parameters in the joint model.

Chained Markov melding offers a conceptually straightforward, Bayesian approach to incorporating uncertain event times into joint models. Specifically, we consider the event time as a submodel-derived quantity from a hierarchical regression model akin to Lu and Meeker (1993). We call this submodel the uncertain event time submodel and denote it p1(ϕ1∩2, ψ1, Y1), where ϕ1∩2 incorporates the event time. The survival submodel p2(ϕ1∩2, ϕ2∩3, ψ2, Y2) uses the event time within ϕ1∩2, the common quantity, as the response. We treat the longitudinal submodel, p3(ϕ2∩3, ψ3, Y3), separately from the survival submodel, as is common in two-stage joint modelling (Mauff et al., 2020), and denote the subject-specific parameters that also appear in the survival model as ϕ2∩3. Each of the m = 1, 2, 3 submodels has submodel-specific data Ym and parameters ψm. The high level submodel relationships are displayed as a DAG in Figure 2.

Figure 2.


A simplified DAG of the submodels considered in the survival analysis example. The event time submodel p1 defines the event time ϕ1∩2 as a noninvertible function of the other model parameters (denoted by the dotted line), whilst the survival submodel p2 considers ϕ1∩2 as the response. The longitudinal submodel p3 has parameters ϕ2∩3 in common with the survival submodel.

It is in examples such as this one that we foresee the most use for chained Markov melding; a fully Bayesian approach is desired and the submodels are nontrivial in complexity, with no previously existing or obvious joint model.

1.2. Markov melding

We now review Markov melding (Goudie et al., 2019) before detailing our proposed extension. As noted in the introduction, Markov melding is a method for combining M submodels p1(ϕ, ψ1, Y1), …, pM(ϕ, ψM, YM) which share the same ϕ. When the submodel prior marginals pm(ϕ) are identical, i.e. pm(ϕ) = p(ϕ) for all m, it is possible to combine the submodels using Markov combination (Dawid and Lauritzen, 1993; Massa and Lauritzen, 2010)

p_{\mathrm{comb}}(\phi, \psi_1, \ldots, \psi_M, Y_1, \ldots, Y_M) = p(\phi) \prod_{m=1}^{M} p_m(\psi_m, Y_m \mid \phi) = \frac{\prod_{m=1}^{M} p_m(\phi, \psi_m, Y_m)}{p(\phi)^{M-1}}. \tag{1}

Markov combination is not immediately applicable when submodel prior marginals are distinct, so Goudie et al. define a marginal replacement procedure, where individual submodel prior marginals are replaced with a common marginal ppool(ϕ) = h(p1(ϕ), …, pM(ϕ)) which is the result of a pooling function h that appropriately summarises all prior marginals (the choice of which is described below). The result of marginal replacement is

p_{\mathrm{repl},m}(\phi, \psi_m, Y_m) = p_{\mathrm{pool}}(\phi) \, \frac{p_m(\phi, \psi_m, Y_m)}{p_m(\phi)}. \tag{2}

Goudie et al. show that prepl,m(ϕ, ψm, Ym) minimises the Kullback–Leibler (KL) divergence between a distribution q(ϕ, ψm, Ym) and pm(ϕ, ψm, Ym) under the constraint that q(ϕ) = ppool(ϕ), and that marginal replacement is valid when ϕ is a deterministic function of the other parameters in submodel m. Markov melding joins the submodels via the Markov combination of the marginally replaced submodels

p_{\mathrm{meld}}(\phi, \psi_1, \ldots, \psi_M, Y_1, \ldots, Y_M) = p_{\mathrm{pool}}(\phi) \prod_{m=1}^{M} p_{\mathrm{repl},m}(\psi_m, Y_m \mid \phi) = p_{\mathrm{pool}}(\phi) \prod_{m=1}^{M} \frac{p_m(\phi, \psi_m, Y_m)}{p_m(\phi)}. \tag{3}
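As a numerical sanity check of Equations (1) and (3), the sketch below (Python; the two one-dimensional normal submodels, the fixed data values, and the grid are our own illustrative choices, not constructions from the paper) evaluates both joint densities on a grid. When the submodel prior marginals agree and ppool is taken to be that shared prior, melding reduces exactly to Markov combination.

```python
import numpy as np
from scipy.stats import norm

# Two toy submodels sharing phi, with identical prior marginals
# p_1(phi) = p_2(phi) = N(0, 1), and fixed (hypothetical) data y1, y2.
phi = np.linspace(-6.0, 6.0, 2401)
prior = norm.pdf(phi, 0.0, 1.0)           # p(phi)
y1, y2 = 0.7, -0.4
lik1 = norm.pdf(y1, loc=phi, scale=1.0)   # p_1(y1 | phi)
lik2 = norm.pdf(y2, loc=phi, scale=1.5)   # p_2(y2 | phi)

# Markov combination, Eq. (1): p(phi) * prod_m p_m(Y_m | phi).
p_comb = prior * lik1 * lik2

# Markov melding, Eq. (3), with p_pool(phi) = p(phi):
# p_pool(phi) * prod_m [ p_m(phi, Y_m) / p_m(phi) ].
p_meld = prior * ((prior * lik1) / prior) * ((prior * lik2) / prior)

assert np.allclose(p_comb, p_meld)        # identical when priors agree
```

Distinct prior marginals would make the two expressions differ; that gap is exactly what marginal replacement and the pooled prior are designed to close.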

Pooled prior

Goudie et al. proposed forming ppool(ϕ) using linear or logarithmic prior pooling (O’Hagan et al., 2006; Genest et al., 1986)

p_{\mathrm{pool,lin}}(\phi) = \frac{1}{K_{\mathrm{lin}}(\lambda)} \sum_{m=1}^{M} \lambda_m p_m(\phi), \qquad K_{\mathrm{lin}}(\lambda) = \int \sum_{m=1}^{M} \lambda_m p_m(\phi) \,\mathrm{d}\phi, \tag{4}

p_{\mathrm{pool,log}}(\phi) = \frac{1}{K_{\mathrm{log}}(\lambda)} \prod_{m=1}^{M} p_m(\phi)^{\lambda_m}, \qquad K_{\mathrm{log}}(\lambda) = \int \prod_{m=1}^{M} p_m(\phi)^{\lambda_m} \,\mathrm{d}\phi, \tag{5}

where λ = (λ1, …, λM) are nonnegative weights, chosen subjectively to ensure ppool(ϕ) appropriately represents prior knowledge about the common quantity. Two special cases of pooling are of particular interest. Product of experts (PoE) pooling (Hinton, 2002) is a special case of logarithmic pooling that occurs when λm = 1 for all m. Dictatorial pooling is a special case of either pooling method in which λm′ = 1 for some m′ and λm = 0 for all m ≠ m′.
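The pooling operations in Equations (4) and (5) are straightforward to evaluate numerically. The sketch below (Python) pools two normal prior marginals on a grid; the means, variances, and weights are illustrative assumptions of ours, and the normalising constants are approximated by simple Riemann sums.

```python
import numpy as np
from scipy.stats import norm

phi = np.linspace(-12.0, 12.0, 4801)
dx = phi[1] - phi[0]
p1 = norm.pdf(phi, -1.0, 1.0)             # p_1(phi), illustrative
p2 = norm.pdf(phi, 2.0, 1.5)              # p_2(phi), illustrative
lam = (0.5, 0.5)                          # nonnegative pooling weights

# Linear pooling, Eq. (4): weighted mixture, normalised by K_lin(lambda).
pool_lin = lam[0] * p1 + lam[1] * p2
pool_lin /= pool_lin.sum() * dx

# Logarithmic pooling, Eq. (5): weighted geometric mean, normalised by K_log.
pool_log = p1 ** lam[0] * p2 ** lam[1]
pool_log /= pool_log.sum() * dx

# Dictatorial pooling: lambda = (1, 0) recovers p_1 under either scheme.
pool_dict = (p1 ** 1.0) * (p2 ** 0.0)
pool_dict /= pool_dict.sum() * dx
```

On this example logarithmic pooling concentrates mass between the two prior modes, whereas linear pooling yields a mixture that retains both.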

2. Chained model specification

Consider m = 1, …, M submodels each with data Ym and parameters θm, denoted pm(θm, Ym), with M ≥ 3. We assume that the submodels are connected in a manner akin to a chain, and so can be ordered such that only ‘adjacent’ submodels in the chain have parameters in common. Specifically, we assume that submodels m and m + 1 have some parameter ϕm∩m+1 in common for m = 1, …, M − 1. For notational convenience define ϕ1 = ϕ1∩2, ϕM = ϕM−1∩M and ϕm = (ϕm−1∩m, ϕm∩m+1) for m = 2, …, M − 1, so that ϕm ⊆ θm denotes the parameters in submodel m shared with another submodel. The submodel-specific parameters of submodel m are thus ψm = θm \ ϕm. Define the vector of all common quantities ϕ = ∪_{m=1}^{M} ϕm = (ϕ1∩2, ϕ2∩3, …, ϕM−1∩M), so that all elements in ϕ are unique. Further denote by ϕ−m the subvector of ϕ excluding the components in ϕm. It will also be convenient to define ψ = (ψ1, …, ψM) and likewise Y = (Y1, …, YM). Note that all components of ϕ, ψ and Y may themselves be multivariate. Additionally, because ϕm∩m+1 may be a deterministic function of either θm or θm+1, we refer to ϕm∩m+1 as a common parameter or a common quantity as appropriate.

All submodels, and marginal and conditional distributions thereof, have density functions that are assumed to exist and integrate to one. When considering conditional distributions we assume that the parameter being conditioned on has support in the relevant region. We define the mth subposterior as pm(ϕm, ψm | Ym).

2.1. Extending marginal replacement

We now define the chained melded model by extending the marginal replacement procedure to submodels linked in a chain-like way. The proposed chained marginal replacement operation modifies the submodels to enforce a common prior for ϕ. This consistency allows us to employ Markov combination to unite the submodels.

Specifically, the mth marginally replaced submodel is

p_{\mathrm{repl},m}(\phi, \psi_m, Y_m) = p_{\mathrm{pool}}(\phi) \, p_m(\psi_m, Y_m \mid \phi_m) = p_{\mathrm{pool}}(\phi) \, \frac{p_m(\phi_m, \psi_m, Y_m)}{p_m(\phi_m)}, \tag{6}

where ppool(ϕ) = g(p1(ϕ1), p2(ϕ2), …, pM(ϕM)) is a pooling function that appropriately summarises all submodel prior marginals. The second equality in Equation (6) follows from the conditional independence (ψm, Ym ⊥ ϕ−m) | ϕm that holds due to the chained relationship between submodels. It is important to note that prepl,m(ϕ, ψm, Ym) is defined on a larger parameter space than pm(ϕm, ψm, Ym), as it includes ϕ−m.

Define prepl,m(ϕm, ψm, Ym) = ∫ prepl,m(ϕ, ψm, Ym) dϕ−m. Each marginally replaced submodel, as defined in Equation (6), minimises the following KL divergence

p_{\mathrm{repl},m}(\phi_m, \psi_m, Y_m) = \operatorname*{arg\,min}_{q} \left\{ D_{\mathrm{KL}}(q \,\|\, p_m) : q(\phi_m) = p_{\mathrm{pool}}(\phi_m) \text{ for all } \phi_m \right\}, \tag{7}

where ppool(ϕm) = ∫ ppool(ϕ) dϕ−m. We can thus interpret prepl,m(ϕm, ψm, Ym) as a minimally modified pm(ϕm, ψm, Ym) which admits ppool(ϕm) as a marginal. Note that it is the combination of prepl,m(ϕm, ψm, Ym) and ppool(ϕ−m | ϕm) that uniquely determines (6).

We form the chained melded model by taking the Markov combination of the marginally replaced submodels

p_{\mathrm{meld}}(\phi, \psi, Y) = p_{\mathrm{pool}}(\phi) \prod_{m=1}^{M} p_{\mathrm{repl},m}(\psi_m, Y_m \mid \phi) \tag{8}

= p_{\mathrm{pool}}(\phi) \prod_{m=1}^{M} \frac{p_m(\phi_m, \psi_m, Y_m)}{p_m(\phi_m)}. \tag{9}

Rewriting (9) in terms of ϕm∩m+1 for m = 1, …, M − 1 yields

p_{\mathrm{meld}}(\phi, \psi, Y) = p_{\mathrm{pool}}(\phi) \, \frac{p_1(\phi_{1 \cap 2}, \psi_1, Y_1)}{p_1(\phi_{1 \cap 2})} \, \frac{p_M(\phi_{M-1 \cap M}, \psi_M, Y_M)}{p_M(\phi_{M-1 \cap M})} \times \prod_{m=2}^{M-1} \frac{p_m(\phi_{m-1 \cap m}, \phi_{m \cap m+1}, \psi_m, Y_m)}{p_m(\phi_{m-1 \cap m}, \phi_{m \cap m+1})}. \tag{10}

Finally, we use the chained melded posterior pmeld(ϕ, ψ | Y) ∝ pmeld(ϕ, ψ, Y) to refer to the posterior of the chained melded model conditioned on all data.

2.2. Pooled prior

Specifying (9) requires a joint prior for ϕ. As in Markov melding we form the joint prior by pooling the marginal priors, selecting a pooling function g that appropriately represents prior knowledge about the common quantities. We define ppool(ϕ) as a generic function of all prior marginals

p_{\mathrm{pool}}(\phi) = g\left(p_1(\phi_1), p_2(\phi_2), \ldots, p_M(\phi_M)\right) \tag{11}

= g\left(p_1(\phi_{1 \cap 2}), p_2(\phi_{1 \cap 2}, \phi_{2 \cap 3}), \ldots, p_M(\phi_{M-1 \cap M})\right), \tag{12}

because we do not always wish to assume independence between the components of ϕ.

Two special cases of Equation (12) are noteworthy. Firstly, if all components of ϕ are independent, then we can form ppool(ϕ) as the product of M − 1 standard pooling functions hm defined in Section 1.2

p_{\mathrm{pool}}(\phi) = \prod_{m=1}^{M-1} p_{\mathrm{pool},m}(\phi_{m \cap m+1}), \tag{13}

p_{\mathrm{pool},m}(\phi_{m \cap m+1}) = h_m\left(p_m(\phi_{m \cap m+1}), p_{m+1}(\phi_{m \cap m+1})\right). \tag{14}

A second case, in between complete dependence (12) and independence (14), is that if pm(ϕm−1∩m, ϕm∩m+1) = pm(ϕm−1∩m)pm(ϕm∩m+1) then we can define

p_{\mathrm{pool}}(\phi) = g_1\left(p_1(\phi_{1 \cap 2}), \ldots, p_m(\phi_{m-1 \cap m})\right) g_2\left(p_m(\phi_{m \cap m+1}), \ldots, p_M(\phi_M)\right), \tag{15}

without any additional assumptions. That is, if any two consecutive components of ϕ are independent in the submodel containing both of them, we can divide the pooled prior specification problem into two pooling functions. The smaller number of arguments to g1 and g2 makes it easier to choose appropriate forms for those functions.

Selecting a specific form of g is not trivial given the many choices of functional form and pooling weights (the latter of which we discuss momentarily). One complication is that standard linear and logarithmic pooling, as defined in Equations (4) and (5), are not immediately applicable when the submodel marginal distributions consider different quantities. We now propose extensions to logarithmic, linear, and dictatorial pooling for use in the case of chained melding.

Chained logarithmic pooling

Extending logarithmic pooling for chained Markov melding is straightforward. We define the logarithmically pooled prior to be

p_{\mathrm{pool,log}}(\phi) = \frac{1}{K_{\mathrm{log}}(\lambda)} \prod_{m=1}^{M} p_m(\phi_m)^{\lambda_m}, \tag{16}

with K_{\mathrm{log}}(\lambda) = \int \prod_{m=1}^{M} p_m(\phi_m)^{\lambda_m} \,\mathrm{d}\phi for nonnegative weight vector λ = (λ1, …, λM) with ∑_{m=1}^{M} λm = 1. Note that (16) does not imply independence between the elements of ϕ because

\prod_{m=1}^{M} p_m(\phi_m)^{\lambda_m} = p_1(\phi_{1 \cap 2})^{\lambda_1} \left( \prod_{m=2}^{M-1} p_m(\phi_{m-1 \cap m}, \phi_{m \cap m+1})^{\lambda_m} \right) p_M(\phi_{M-1 \cap M})^{\lambda_M}. \tag{17}

When λ1 = λ2 = … = λM = 1 we obtain a special case which we call product-of-experts (PoE) pooling (Hinton, 2002).

Chained linear pooling

Our generalisation of linear pooling to handle marginals of different quantities is a two step procedure. The first step forms intermediary pooling densities via standard linear pooling, using appropriate marginals of the relevant quantity

p_{\mathrm{pool},m}(\phi_{m \cap m+1}) \propto \lambda_{m,1} \, p_m(\phi_{m \cap m+1}) + \lambda_{m,2} \, p_{m+1}(\phi_{m \cap m+1}), \tag{18}

where λm = (λm,1, λm,2) are nonnegative pooling weights, and for m = 2, …, M − 1

p_m(\phi_{m \cap m+1}) = \int p_m(\phi_{m-1 \cap m}, \phi_{m \cap m+1}) \,\mathrm{d}\phi_{m-1 \cap m}. \tag{19}

For m = 1 and m = M the relevant marginals are p1(ϕ1∩2) and pM(ϕM−1∩M). In step two we form the pooled prior as the product of the intermediaries

p_{\mathrm{pool,lin}}(\phi) = \frac{1}{K_{\mathrm{lin}}(\lambda)} \prod_{m=1}^{M-1} p_{\mathrm{pool},m}(\phi_{m \cap m+1}), \tag{20}

with K_{\mathrm{lin}}(\lambda) = \int \prod_{m=1}^{M-1} p_{\mathrm{pool},m}(\phi_{m \cap m+1}) \,\mathrm{d}\phi, for λ = (λ1, …, λM−1). Clearly, this assumes prior independence amongst all components of ϕ, which may be undesirable, particularly if this independence was not present under one or more of the submodel priors. We discuss extensions to linear pooling that enable prior dependence between the components of ϕ in Section 6.
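The two-step construction of Equations (18)-(20) can be sketched numerically for M = 3. In the Python fragment below all densities and weights are our own illustrative choices: p2 is a correlated bivariate normal whose margins are obtained by numerical integration (Equation (19)), and the final pooled prior is the normalised product of the two intermediaries (Equation (20)).

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

g = np.linspace(-6.0, 6.0, 401)
dx = g[1] - g[0]
P12, P23 = np.meshgrid(g, g, indexing="ij")     # grid over (phi_12, phi_23)

p1 = norm.pdf(g, -1.0, 1.0)                     # p_1(phi_12), illustrative
p3 = norm.pdf(g, 1.0, 1.0)                      # p_3(phi_23), illustrative
p2 = multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]]).pdf(
    np.dstack([P12, P23]))                      # p_2(phi_12, phi_23)

# Step one, Eqs (18)-(19): marginalise p_2, then linearly pool each margin.
p2_12 = p2.sum(axis=1) * dx                     # p_2(phi_12)
p2_23 = p2.sum(axis=0) * dx                     # p_2(phi_23)
lam = 0.5
pool_1 = lam * p1 + (1.0 - lam) * p2_12         # p_pool,1(phi_12)
pool_2 = lam * p2_23 + (1.0 - lam) * p3         # p_pool,2(phi_23)

# Step two, Eq. (20): product of intermediaries, normalised by K_lin(lambda).
pool = np.outer(pool_1, pool_2)
pool /= pool.sum() * dx * dx
```

The product form makes ϕ1∩2 and ϕ2∩3 independent under the pooled prior regardless of the correlation in p2, which is exactly the limitation discussed in the surrounding text.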

Dictatorial pooling

Chained Markov melding does not admit a direct analogue of dictatorial pooling as defined in Section 1.2, because not all submodel prior marginals contain all common quantities. For example, consider the logarithmically pooled prior of Equation (16) with, say, the mth entry in λ set to 1 and all others set to 0. This choice of λ results in ppool(ϕ) = pm(ϕm), which is flat for ϕ−m. It seems reasonable to require any generalisation of dictatorial pooling to result in a reasonable prior for all components of ϕ. Such a generalisation should also retain the original intention of dictatorial pooling, i.e. ‘the authoritative prior for ϕm is pm(ϕm)’.

We propose two possible forms of dictatorial pooling that satisfy these criteria: partial dictatorial pooling, which enforces a single submodel prior for the relevant components of ϕ, with no restrictions on the pooling of the remaining components; and complete dictatorial pooling, which requires selecting one of the two possible submodel priors for each component of ϕ.

Partial dictatorial pooling considers pm(ϕm) as the authoritative prior for ϕm = (ϕm−1∩m, ϕm∩m+1). This results in

p_{\mathrm{pool,dict}}(\phi) = g_1\left(p_1(\phi_{1 \cap 2}), \ldots, p_{m-1}(\phi_{m-2 \cap m-1})\right) \times p_m(\phi_{m-1 \cap m}, \phi_{m \cap m+1}) \times g_2\left(p_{m+1}(\phi_{m+1 \cap m+2}), \ldots, p_M(\phi_{M-1 \cap M})\right), \tag{21}

where g1 and g2 are linear or logarithmic pooling functions as desired.

Complete dictatorial pooling requires the marginal pooled prior for each component of ϕ to be chosen on the basis of only one of the two submodel priors specified for it. For m = 1, …, M − 1, the mth marginal of the pooled prior is either

p_{\mathrm{pool,dict}}(\phi_{m \cap m+1}) = p_m(\phi_{m \cap m+1}) \quad \text{or} \quad p_{m+1}(\phi_{m \cap m+1}). \tag{22}

If two consecutive marginals are chosen to have the same submodel prior, then we wish to retain the dependence between ϕm−1∩m and ϕm∩m+1 present in pm. We thus redefine consecutive terms so that

p_{\mathrm{pool,dict}}(\phi_{m-1 \cap m}) \, p_{\mathrm{pool,dict}}(\phi_{m \cap m+1}) = p_m(\phi_{m-1 \cap m}) \, p_m(\phi_{m \cap m+1}) \quad \text{(from Eq. (22))}

\longrightarrow \quad p_{\mathrm{pool,dict}}(\phi_{m-1 \cap m}) \, p_{\mathrm{pool,dict}}(\phi_{m \cap m+1}) = p_m(\phi_{m-1 \cap m}, \phi_{m \cap m+1}). \quad \text{(redefined)} \tag{23}

The complete dictatorially pooled prior is thus

p_{\mathrm{pool,dict}}(\phi) = \prod_{m=1}^{M-1} p_{\mathrm{pool,dict}}(\phi_{m \cap m+1}), \tag{24}

where, subject to the potential modification in Equation (23), the terms in the product are as defined in Equation (22). For example, if M = 5 and we wish to ignore p2 and p4 when constructing the pooled prior and instead associate ϕ1∩2 with p1, both ϕ2∩3 and ϕ3∩4 with p3, and ϕ4∩5 with p5, then

p_{\mathrm{pool,dict}}(\phi) = p_1(\phi_{1 \cap 2}) \, p_3(\phi_{2 \cap 3}) \, p_3(\phi_{3 \cap 4}) \, p_5(\phi_{4 \cap 5}) \quad \xrightarrow{\text{apply Eq. (23)}} \quad p_{\mathrm{pool,dict}}(\phi) = p_1(\phi_{1 \cap 2}) \, p_3(\phi_{2 \cap 3}, \phi_{3 \cap 4}) \, p_5(\phi_{4 \cap 5}). \tag{25}

Pooling weights

Choosing values for the pooling weights is an important step in specifying the pooled prior (Carvalho et al., 2022; Abbas, 2009; Rufo et al., 2012a,b). Because appropriate values for the weights depend on the submodels being pooled and the information available a priori, universal recommendations are impossible; instead, we illustrate the impact of different choices in a straightforward example. It is important to produce prior predictive visualisations of the pooled prior (Gabry et al., 2019; Gelman et al., 2020) to guide the choice of pooling weights and to ensure that the result suitably represents the available information. Figure 3 illustrates how λ and the choice of pooling method impact ppool(ϕ) when pooling normal distributions. Specifically, we consider M = 3 submodels and pool

p_1(\phi_{1 \cap 2}) = \mathrm{N}(\phi_{1 \cap 2}; \mu_1, \sigma_1^2), \qquad p_3(\phi_{2 \cap 3}) = \mathrm{N}(\phi_{2 \cap 3}; \mu_3, \sigma_3^2),

p_2(\phi_{1 \cap 2}, \phi_{2 \cap 3}) = \mathrm{N}\left( \begin{bmatrix} \phi_{1 \cap 2} \\ \phi_{2 \cap 3} \end{bmatrix}; \begin{bmatrix} \mu_{2,1} \\ \mu_{2,2} \end{bmatrix}, \begin{bmatrix} \sigma_2^2 & \rho \sigma_2^2 \\ \rho \sigma_2^2 & \sigma_2^2 \end{bmatrix} \right), \tag{26}

where N(ϕ; μ, σ²) is the normal density function with mean μ and variance σ² (or covariance matrix where appropriate). The two-dimensional density function p2 has an additional parameter ρ, which controls the intra-submodel marginal correlation. We set μ1 = −2.5, μ2 = [μ2,1 μ2,2]′ = [0 0]′, μ3 = 2.5, σ1² = σ2² = σ3² = 1 and ρ = 0.8. In the logarithmic case we set λ1 = λ3 and parameterise λ2 = 1 − 2λ1, so that λ1 + λ2 + λ3 = 1 whilst limiting ourselves to varying only λ1. Similarly, in the linear case we set λ1,1 = λ2,2 = λ1 and λ1,2 = λ2,1 = 1 − 2λ1. We consider 5 evenly spaced values of λ1 ∈ [0, 0.5].

Figure 3.


Contour plots of ppool(ϕ) (red) under logarithmic and linear pooling (left and right column respectively). The three original densities p1(ϕ1∩2), p3(ϕ2∩3) and p2(ϕ1∩2, ϕ2∩3) are shown in blue, with the univariate densities shown on the appropriate axis. The pooling weight parameter λ1 is indicated in the plot titles.

For both pooling methods, as the weight λ1 associated with submodels p1 and p3 increases, the relative contributions of p1(ϕ1∩2) and p3(ϕ2∩3) increase. Note the lack of correlation in ppool under linear pooling (right column of Figure 3), due to Equation (20). A large, near-flat plateau is visible in the λ1 = 0.25 and λ1 = 0.375 cases, which results from the mixture of four two-dimensional normal distributions that linear pooling produces in this example. Logarithmic pooling produces a more concentrated prior for small values of λ1, and does not result in a priori independence between ϕ1∩2 and ϕ2∩3. Appendix A shows analytically that λ2 controls the amount of correlation present in ppool in this setting.
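The logarithmic-pooling behaviour described above can be reproduced numerically. The Python sketch below evaluates Equation (16) on a grid for the setting of Equation (26), using the stated values μ1 = −2.5, μ3 = 2.5, unit variances, ρ = 0.8 and λ1 = λ3 = 0.25; the grid resolution and the moment computations are our own choices.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

g = np.linspace(-8.0, 8.0, 401)
dx = g[1] - g[0]
P12, P23 = np.meshgrid(g, g, indexing="ij")

p1 = norm.pdf(P12, -2.5, 1.0)                       # p_1(phi_12)
p3 = norm.pdf(P23, 2.5, 1.0)                        # p_3(phi_23)
p2 = multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]]).pdf(
    np.dstack([P12, P23]))                          # p_2(phi_12, phi_23)

lam1 = 0.25
lam2 = 1.0 - 2.0 * lam1                             # lambda_1 = lambda_3
pool = p1 ** lam1 * p2 ** lam2 * p3 ** lam1         # Eq. (16), unnormalised
pool /= pool.sum() * dx * dx                        # divide by K_log(lambda)

# Moments of the pooled prior: the correlation inherited from p_2
# survives logarithmic pooling.
m12 = (pool * P12).sum() * dx * dx
m23 = (pool * P23).sum() * dx * dx
cov = (pool * (P12 - m12) * (P23 - m23)).sum() * dx * dx
```

Here cov is positive (around 0.77 by the corresponding Gaussian precision algebra), consistent with the claim that λ2 controls the correlation retained in ppool.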

3. Posterior estimation

We now present a multi-stage MCMC method for generating samples from the melded posterior. Whilst the melded posterior is a standard Bayesian posterior and so can, in principle, be targeted using any suitable Monte Carlo method, in practice this may be cumbersome or infeasible. More specifically, it may be feasible to fit each submodel separately using standard methods, but when the submodels are combined (either through Markov melding, or by expanding the definition of one submodel to include another) the computation required to estimate the posterior in a single step can pose an insurmountable barrier. In such settings we can employ multi-stage posterior estimation methods, including those of Tom et al. (2010), Lunn et al. (2013), Hooten et al. (2019), and Mauff et al. (2020). We propose a multi-stage strategy that uses the chain-like relationship both to avoid evaluating all submodels simultaneously and to parallelise the computation required in the first stage, producing posterior samples in less time than an equivalent sequential method. Avoiding concurrently evaluating all submodels also enables the reuse of existing software, minimising the need for custom submodel and/or sampler implementations.

We also describe an approximate method, where stage one submodels are summarised by normal distributions for use in stage two.

We consider the M = 3 case, as this setting includes both of our examples. Our approach can be extended to M > 3 settings, although we anticipate that it is unlikely to be suitable for large M. We discuss some of the difficulties associated with generic, parallel methodology for efficient posterior sampling in Section 6.

3.1. Parallel sampler

Our proposed strategy involves obtaining in stage one samples from submodels 1 and 3 in parallel. Stage two reuses these samples in a Metropolis-within-Gibbs sampler, which targets the full melded posterior. The stage specific targets are displayed in Figure 4.

Figure 4.


A graphical depiction of the submodels and their shared quantities, with the parallel sampling strategy overlaid. The stage one (s1) targets are surrounded by blue dashed lines, with the stage two (s2) target in red.

The parallel sampler assumes that the pooled prior decomposes such that

p_{\mathrm{pool}}(\phi) = p_{\mathrm{pool},1}(\phi_{1 \cap 2}) \, p_{\mathrm{pool},2}(\phi_{1 \cap 2}, \phi_{2 \cap 3}) \, p_{\mathrm{pool},3}(\phi_{2 \cap 3}). \tag{27}

All pooled priors trivially satisfy (27) by assuming ppool,1(ϕ1∩2) and ppool,3(ϕ2∩3) are improper and/or flat distributions. Alternatively we may choose ppool,1(ϕ1∩2) = p1(ϕ1∩2) and ppool,3(ϕ2∩3) = p3(ϕ2∩3), with appropriate adjustments to ppool,2(ϕ1∩2, ϕ2∩3). This choice targets, in stage one, the subposteriors of p1 and p3 under their original prior for ϕ1∩2 and ϕ2∩3 respectively.

Stage one

Two independent, parallel sampling processes occur in stage one. Terms from the melded model that pertain to p1 and p3 are isolated

p_{\mathrm{meld},1}(\phi_{1 \cap 2}, \psi_1 \mid Y_1) \propto p_{\mathrm{pool},1}(\phi_{1 \cap 2}) \, \frac{p_1(\phi_{1 \cap 2}, \psi_1, Y_1)}{p_1(\phi_{1 \cap 2})}, \tag{28}

p_{\mathrm{meld},3}(\phi_{2 \cap 3}, \psi_3 \mid Y_3) \propto p_{\mathrm{pool},3}(\phi_{2 \cap 3}) \, \frac{p_3(\phi_{2 \cap 3}, \psi_3, Y_3)}{p_3(\phi_{2 \cap 3})}, \tag{29}

and targeted using standard MCMC methodology. Assuming that the stage one chains converge, and after discarding warmup iterations (possibly thinning them if within-chain correlation is high), we obtain N1 samples {(ϕ1∩2, ψ1)_n}_{n=1}^{N1} from pmeld,1(ϕ1∩2, ψ1 | Y1), and N3 samples {(ϕ2∩3, ψ3)_n}_{n=1}^{N3} from pmeld,3(ϕ2∩3, ψ3 | Y3). For well-mixing stage one Markov chains targeting the correct stationary distribution, and large values of N1 and N3, the stage one samples accurately approximate the subposteriors.

Stage two

Stage two targets the melded posterior of Equation (9) using a Metropolis-within-Gibbs sampler, where the proposal distributions are

\phi_{1 \cap 2}^{*}, \psi_1^{*} \mid \phi_{2 \cap 3}, \psi_2, \psi_3 \sim p_{\mathrm{meld},1}(\phi_{1 \cap 2}, \psi_1 \mid Y_1), \tag{30}

\phi_{2 \cap 3}^{*}, \psi_3^{*} \mid \phi_{1 \cap 2}, \psi_1, \psi_2 \sim p_{\mathrm{meld},3}(\phi_{2 \cap 3}, \psi_3 \mid Y_3), \tag{31}

\psi_2^{*} \mid \phi_{1 \cap 2}, \phi_{2 \cap 3}, \psi_1, \psi_3 \sim q(\psi_2^{*} \mid \psi_2), \tag{32}

where q(ψ2* | ψ2) is a generic proposal distribution for ψ2. We draw an index n1* uniformly from {1, …, N1} and use the corresponding value (ϕ1∩2, ψ1)_{n1*} as the proposal, doing likewise for n3* and (ϕ2∩3, ψ3)_{n3*}. The acceptance probabilities for these updates are

\alpha\big((\phi_{1 \cap 2}^{*}, \psi_1^{*})_{n_1^{*}},\, (\phi_{1 \cap 2}, \psi_1)_{n_1}\big) = \frac{p_{\mathrm{pool},2}(\phi_{1 \cap 2}^{*}, \phi_{2 \cap 3}) \, p_2(\phi_{1 \cap 2}^{*}, \phi_{2 \cap 3}, \psi_2, Y_2) \, \big/ \, p_2(\phi_{1 \cap 2}^{*}, \phi_{2 \cap 3})}{p_{\mathrm{pool},2}(\phi_{1 \cap 2}, \phi_{2 \cap 3}) \, p_2(\phi_{1 \cap 2}, \phi_{2 \cap 3}, \psi_2, Y_2) \, \big/ \, p_2(\phi_{1 \cap 2}, \phi_{2 \cap 3})}, \tag{33}

\alpha\big((\phi_{2 \cap 3}^{*}, \psi_3^{*})_{n_3^{*}},\, (\phi_{2 \cap 3}, \psi_3)_{n_3}\big) = \frac{p_{\mathrm{pool},2}(\phi_{1 \cap 2}, \phi_{2 \cap 3}^{*}) \, p_2(\phi_{1 \cap 2}, \phi_{2 \cap 3}^{*}, \psi_2, Y_2) \, \big/ \, p_2(\phi_{1 \cap 2}, \phi_{2 \cap 3}^{*})}{p_{\mathrm{pool},2}(\phi_{1 \cap 2}, \phi_{2 \cap 3}) \, p_2(\phi_{1 \cap 2}, \phi_{2 \cap 3}, \psi_2, Y_2) \, \big/ \, p_2(\phi_{1 \cap 2}, \phi_{2 \cap 3})}, \tag{34}

\alpha(\psi_2^{*}, \psi_2) = \frac{p_2(\phi_{1 \cap 2}, \phi_{2 \cap 3}, \psi_2^{*}, Y_2)}{p_2(\phi_{1 \cap 2}, \phi_{2 \cap 3}, \psi_2, Y_2)} \, \frac{q(\psi_2 \mid \psi_2^{*})}{q(\psi_2^{*} \mid \psi_2)}, \tag{35}

where α(x, z) denotes the acceptance probability associated with a move from z to x. Note that all stage two acceptance probabilities contain only terms from the second submodel and the pooled prior, and thus do not depend on ψ1 or ψ3. If a move is accepted then we also store the index, i.e. n1* or n3*, associated with the move; otherwise we store the current value of the index. The stored indices are used to appropriately resample ψ1 and ψ3 from the stage one samples.
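A minimal sketch of the stage two Metropolis-within-Gibbs updates (Equations (30)-(35)) is given below. Everything here is a stand-in of our own devising rather than either of the paper's examples: the stage one 'subposterior' samples are i.i.d. normal draws, ψ1 and ψ3 are left implicit, p2 is a toy normal likelihood whose mean involves ψ2, and ppool,2 is a correlated bivariate normal. The point being illustrated is the structure of the acceptance ratios, in which the stage one subposterior terms cancel and only p2 and ppool,2 remain.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Stand-ins for the stage one samples (psi_1, psi_3 omitted for brevity).
N1 = N3 = 5000
phi12_s1 = rng.normal(0.5, 1.0, N1)     # draws approximating p_meld,1
phi23_s1 = rng.normal(-0.5, 1.0, N3)    # draws approximating p_meld,3

y2 = np.array([0.2, -0.1, 0.4])         # toy data for submodel 2

def log_accept_terms(phi12, phi23, psi2):
    """log p_pool,2(phi) + log p_2(phi, psi_2, Y_2) - log p_2(phi):
    the only terms appearing in the stage two acceptance ratios."""
    lp_pool2 = -0.5 * (phi12**2 - 1.6 * phi12 * phi23 + phi23**2) / (1.0 - 0.8**2)
    lp_p2_prior = norm.logpdf(phi12) + norm.logpdf(phi23)
    lp_p2 = (lp_p2_prior + norm.logpdf(psi2)
             + norm.logpdf(y2, loc=psi2 + phi12 + phi23, scale=1.0).sum())
    return lp_pool2 + lp_p2 - lp_p2_prior

n_iter = 1000
phi12, phi23, psi2 = phi12_s1[0], phi23_s1[0], 0.0
draws = np.empty((n_iter, 3))
for i in range(n_iter):
    # Eqs (30)/(33): propose phi_12 by resampling the stage one draws.
    prop = phi12_s1[rng.integers(N1)]
    if np.log(rng.uniform()) < (log_accept_terms(prop, phi23, psi2)
                                - log_accept_terms(phi12, phi23, psi2)):
        phi12 = prop
    # Eqs (31)/(34): likewise for phi_23.
    prop = phi23_s1[rng.integers(N3)]
    if np.log(rng.uniform()) < (log_accept_terms(phi12, prop, psi2)
                                - log_accept_terms(phi12, phi23, psi2)):
        phi23 = prop
    # Eqs (32)/(35): random-walk update for psi_2 (symmetric q, so the
    # q-ratio in Eq. (35) cancels).
    prop = psi2 + rng.normal(0.0, 0.5)
    if np.log(rng.uniform()) < (log_accept_terms(phi12, phi23, prop)
                                - log_accept_terms(phi12, phi23, psi2)):
        psi2 = prop
    draws[i] = phi12, phi23, psi2
```

In a real application the accepted indices n1, n3 would additionally be stored so that ψ1 and ψ3 can be recovered from the stage one output.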

3.2. Normal approximations to submodel components

Normal approximations are commonly employed to summarise submodels for subsequent use in more complex models. For example, two-stage meta-analyses often use a normal distribution centred on each study's effect estimate (Burke et al., 2017). Suppose we employ such an approximation to summarise the prior and posterior of ϕ1∩2 and ϕ2∩3 under p1 and p3 respectively. In addition, assume that (a) such approximations are appropriate for p1(ϕ1∩2), p1(ϕ1∩2 | Y1), p3(ϕ2∩3), and p3(ϕ2∩3 | Y3); (b) we are not interested in ψ1 and ψ3, and can integrate them out of all relevant densities; and (c) we employ our second form of dictatorial pooling and choose p2(ϕ1∩2, ϕ2∩3) as the authoritative prior. The latter two assumptions imply that the melded posterior of interest is proportional to

\[
p_{\mathrm{meld}}(\phi_{1\cap 2}, \phi_{2\cap 3}, \psi_2 \mid Y)
\propto \frac{p_1(\phi_{1\cap 2} \mid Y_1)}{p_1(\phi_{1\cap 2})}\,
p_2(\phi_{1\cap 2}, \phi_{2\cap 3}, \psi_2 \mid Y_2)\,
\frac{p_3(\phi_{2\cap 3} \mid Y_3)}{p_3(\phi_{2\cap 3})}. \tag{36}
\]

Denote the normal approximation of p1(ϕ1∩2 | Y1) by p̂1(ϕ1∩2 | μ̂1, Σ̂1), a normal density with mean μ̂1 and covariance matrix Σ̂1. The corresponding normal approximation of the prior p1(ϕ1∩2) is p̂1(ϕ1∩2 | μ̂1,0, Σ̂1,0). The equivalent approximations for the subposterior and prior of p3 are p̂3(ϕ2∩3 | μ̂3, Σ̂3) and p̂3(ϕ2∩3 | μ̂3,0, Σ̂3,0) respectively. Substituting in the approximations and using standard results for Gaussian density functions (see Bromiley (2003) and Appendix C) results in

\[
\hat{p}_{\mathrm{meld}}(\phi_{1\cap 2}, \phi_{2\cap 3}, \psi_2 \mid Y)
\propto \hat{p}\big((\phi_{1\cap 2}, \phi_{2\cap 3}) \mid \hat{\mu}, \hat{\Sigma}\big)\,
p_2(\phi_{1\cap 2}, \phi_{2\cap 3}, \psi_2 \mid Y_2), \tag{37}
\]

where

\[
\hat{\mu}_{\mathrm{nu}} = \begin{bmatrix} \hat{\mu}_1 \\ \hat{\mu}_3 \end{bmatrix}, \quad
\hat{\Sigma}_{\mathrm{nu}} = \begin{bmatrix} \hat{\Sigma}_1 & 0 \\ 0 & \hat{\Sigma}_3 \end{bmatrix}, \quad
\hat{\mu}_{\mathrm{de}} = \begin{bmatrix} \hat{\mu}_{1,0} \\ \hat{\mu}_{3,0} \end{bmatrix}, \quad
\hat{\Sigma}_{\mathrm{de}} = \begin{bmatrix} \hat{\Sigma}_{1,0} & 0 \\ 0 & \hat{\Sigma}_{3,0} \end{bmatrix},
\]
\[
\hat{\Sigma} = \big(\hat{\Sigma}_{\mathrm{nu}}^{-1} - \hat{\Sigma}_{\mathrm{de}}^{-1}\big)^{-1}, \quad
\hat{\mu} = \hat{\Sigma}\big(\hat{\Sigma}_{\mathrm{nu}}^{-1}\hat{\mu}_{\mathrm{nu}} - \hat{\Sigma}_{\mathrm{de}}^{-1}\hat{\mu}_{\mathrm{de}}\big). \tag{38}
\]

Standard MCMC methods can be used to sample from the approximate melded posterior. If instead we opt for product-of-experts pooling, all μ̂de and Σ̂de terms disappear from the parameter definitions in Equation (38).
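Equation (38) is simple precision-space arithmetic and can be implemented directly. The sketch below (the function name and the scalar check values are our own) combines the subposterior and prior summaries of p1 and p3; note that Σ̂ is only well defined when the precision difference is positive definite.

```python
import numpy as np

def combine_normal_summaries(mu1, S1, mu1_0, S1_0, mu3, S3, mu3_0, S3_0):
    """Combine normal approximations of the subposteriors (mu, S) and priors
    (mu_0, S_0) of p1 and p3 into the Gaussian factor of Equation (37),
    via the precision-space arithmetic of Equation (38)."""
    d1, d3 = len(mu1), len(mu3)
    z = np.zeros((d1, d3))
    mu_nu = np.concatenate([mu1, mu3])
    mu_de = np.concatenate([mu1_0, mu3_0])
    S_nu = np.block([[S1, z], [z.T, S3]])
    S_de = np.block([[S1_0, z], [z.T, S3_0]])
    P = np.linalg.inv(S_nu) - np.linalg.inv(S_de)   # must be positive definite
    S_hat = np.linalg.inv(P)
    mu_hat = S_hat @ (np.linalg.inv(S_nu) @ mu_nu - np.linalg.inv(S_de) @ mu_de)
    return mu_hat, S_hat

# Scalar sanity check: subposteriors N(2, 0.5^2) and N(1, 0.5^2),
# both priors N(0, 2^2).
mu_hat, S_hat = combine_normal_summaries(
    np.array([2.0]), np.array([[0.25]]), np.array([0.0]), np.array([[4.0]]),
    np.array([1.0]), np.array([[0.25]]), np.array([0.0]), np.array([[4.0]]))
```

In the scalar case each combined precision is 1/0.25 − 1/4 = 3.75, so the check values can be verified by hand.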

4. An integrated population model for little owls

We now return to the integrated population model (IPM) for the little owls introduced in Section 1.1. Finke et al. (2019) consider a number of variations on the original model of Schaub et al. (2006) and Abadi et al. (2010): here we consider only the variant from Finke et al. (2019) with the highest marginal likelihood (Model 4 of their online supplement). This example is particularly interesting to us as, for a certain choice of pooling function and pooling weights, the chained Markov melded model and the IPM are identical. This coincidence allows us to use the posterior from the IPM as a benchmark for our multi-stage sampler.

Before we detail the specifics of each submodel, we must introduce some notation. Data and parameters are stratified into two age-groups a ∈ {J, A} where J denotes juvenile owls (less than one year old) and A adults, two sexes s ∈ {M, F}, and observations occur annually at times t ∈ {1, …, T }, with T = 25. The sex- and age-specific probability of an owl surviving from time t to t + 1 is δa,s,t, and the sex-specific probability of a previously captured owl being recaptured at time t + 1 is πs,t+1 so long as the owl is alive at time t + 1.

4.1. Capture recapture: p1

Capture-recapture data pertain to owls that are released at time t (having been captured and tagged), and then recaptured at time u = t + 1, …, T, or not recaptured before the conclusion of the study, in which case u = T + 1. Define Ma,s,t,u as the number of owls of age-group a and sex s released at time t and recaptured at time u. We aggregate these observations into age- and sex-specific matrices Ma,s, with T rows, corresponding to release times, and T + 1 columns, corresponding to recapture times. Let R_{a,s,t} = ∑_{u=1}^{T+1} M_{a,s,t,u} be the number of owls released at time t, i.e. a vector containing the row-wise sum of the entries in Ma,s. The recapture times for owls released at time t follow an age- and sex-specific multinomial likelihood

\[
(M_{a,s,t,1}, \ldots, M_{a,s,t,T+1}) \sim \mathrm{Multinomial}(R_{a,s,t}, Q_{a,s,t}), \tag{39}
\]

with probabilities Qa,s,t = (Qa,s,t,1, …, Qa,s,t,T+1) such that

\[
Q_{a,s,t,u} =
\begin{cases}
0, & \text{for } u = 1, \ldots, t, \\[2pt]
\delta_{a,s,t}\, \pi_{s,u} \prod_{r=t+1}^{u-1} \delta_{a,s,r} (1 - \pi_{s,r}), & \text{for } u = t+1, \ldots, T, \\[2pt]
1 - \sum_{r=1}^{T} Q_{a,s,t,r}, & \text{if } u = T+1.
\end{cases} \tag{40}
\]
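A direct transcription of Equation (40) makes it easy to check that each row of cell probabilities sums to one. The helper below (the array layout and the unused index-0 padding are our own choices, made purely to align code indices with the notation) computes Q for a single age/sex stratum.

```python
import numpy as np

def recapture_probs(delta, pi):
    """Multinomial cell probabilities Q_{a,s,t,u} of Equation (40) for one
    age/sex stratum. delta[t] is survival from t to t+1, pi[u] is the
    recapture probability at u; index 0 of each array is unused padding."""
    T = len(delta) - 1
    Q = np.zeros((T + 1, T + 2))          # rows: release t = 1..T; cols: u = 1..T+1
    for t in range(1, T + 1):
        for u in range(t + 1, T + 1):
            # survive (and go unseen) between release and recapture
            unseen = np.prod([delta[r] * (1 - pi[r]) for r in range(t + 1, u)])
            Q[t, u] = delta[t] * pi[u] * unseen
        Q[t, T + 1] = 1.0 - Q[t, 1:T + 1].sum()   # never recaptured
    return Q

rng = np.random.default_rng(0)
T = 6
delta = np.concatenate([[0.0], rng.uniform(0.3, 0.9, T)])
pi = np.concatenate([[0.0], rng.uniform(0.2, 0.8, T)])
Q = recapture_probs(delta, pi)
```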

4.2. Count data model: p2

To estimate population abundance, a two-level model is used: the first level models the observed (counted) number of females at each point in time, denoted yt; a second, latent process models the total number of females in the population. The observation model is

\[
y_t \mid x_t \sim \mathrm{Poisson}(x_t), \tag{41}
\]

where we denote the number of juvenile and adult females in the population at time t as xJ,t and xA,t respectively, with xt = xJ,t + xA,t. If surt adult females survive from time t − 1 to time t, and immt adult females immigrate over the same time period, then the latent, population level model is

\[
\begin{aligned}
x_{J,t} \mid x_{t-1}, \rho, \delta_{J,F,t-1} &\sim \mathrm{Poisson}\big(x_{t-1}\, \tfrac{\rho}{2}\, \delta_{J,F,t-1}\big), \\
\mathrm{sur}_t \mid x_{t-1}, \delta_{A,F,t-1} &\sim \mathrm{Binomial}(x_{t-1}, \delta_{A,F,t-1}), \\
\mathrm{imm}_t \mid x_{t-1}, \eta_t &\sim \mathrm{Poisson}(x_{t-1}\, \eta_t), \\
x_{A,t} &= \mathrm{sur}_t + \mathrm{imm}_t,
\end{aligned} \tag{42}
\]

where ηt is the immigration rate. The initial population sizes xJ,1 and xA,1 have independent discrete uniform priors on {0, 1, …, 50}. If xt−1 = 0 then we assume that the Poisson and binomial distributions become point masses at zero.
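Forward-simulating Equations (41)–(42) clarifies the two-level structure. The rates below are hypothetical values chosen for illustration, not fitted quantities, and the point-mass-at-zero rule for an empty population is implemented explicitly.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_counts(T, rho, delta_J, delta_A, eta, ):
    """Forward-simulate the two-level count-data submodel of Eqs. (41)-(42)
    with time-constant rates (a simplification for this sketch)."""
    xJ = np.zeros(T, dtype=int)
    xA = np.zeros(T, dtype=int)
    y = np.zeros(T, dtype=int)
    xJ[0] = rng.integers(0, 51)          # discrete-uniform initial sizes on {0,...,50}
    xA[0] = rng.integers(0, 51)
    y[0] = rng.poisson(xJ[0] + xA[0])
    for t in range(1, T):
        x_prev = xJ[t - 1] + xA[t - 1]
        if x_prev == 0:                  # point mass at zero when population is empty
            xJ[t] = sur = imm = 0
        else:
            xJ[t] = rng.poisson(x_prev * rho / 2 * delta_J)   # female chicks surviving
            sur = rng.binomial(x_prev, delta_A)               # surviving adults
            imm = rng.poisson(x_prev * eta)                   # immigrants
        xA[t] = sur + imm
        y[t] = rng.poisson(xJ[t] + xA[t])                     # observed count, Eq. (41)
    return y, xJ, xA

y, xJ, xA = simulate_counts(T=25, rho=2.0, delta_J=0.3, delta_A=0.55, eta=0.1)
```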

4.3. Fecundity: p3

The fecundity submodel considers the number of breeding females at time t denoted Nt, and the number of chicks produced that survive and leave the nest denoted nt. A Poisson model is employed to estimate fecundity (reproductive rate) ρ

\[
n_t \sim \mathrm{Poisson}(N_t\, \rho). \tag{43}
\]

4.4. Parameterisation and melding quantities

Abadi et al. (2010) parameterise the time dependent quantities via linear predictors to minimise the number of parameters in the submodels. The specific parameterisation of Finke et al. (2019) we employ is

\[
\begin{aligned}
\mathrm{logit}(\delta_{a,s,t}) &= \alpha_0 + \alpha_1 \mathbb{I}(s = M) + \alpha_2 \mathbb{I}(a = A), \\
\log(\eta_t) &= \alpha_6, \\
\mathrm{logit}(\pi_{s,u}) &= \alpha_4 \mathbb{I}(s = M) + \alpha_{5,u}, \quad \text{for } u = 2, \ldots, T,
\end{aligned} \tag{44}
\]

thus the quantities in common between the submodels are ϕ1∩2 = (α0, α2) and ϕ2∩3 = ρ. To align the notation of this example with our chained melding notation we define, for all permitted values of a, s and t: Y1 = (Ma,s), ψ1 = (α1, α4, (α5,u)_{u=2}^{T}); Y2 = (yt), ψ2 = (xJ,t, α6, surt, immt); and Y3 = (Nt, nt), ψ3 = ∅. Note that the definition of ϕ1∩2 does not include α1, as it is male specific and does not exist in p2. The model variant of Finke et al. (2019) we consider does not include α3, and for comparability we keep the other parameter indices the same.
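As a small illustration of Equation (44), the snippet below maps hypothetical coefficient values (our own choices, not estimates from the paper) into stratum-specific probabilities via the inverse logit.

```python
import numpy as np

def inv_logit(x):
    """Inverse of the logit link used in Equation (44)."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical coefficient values, for illustration only.
alpha0, alpha1, alpha2, alpha6 = -0.5, 0.3, 0.9, -1.2

def delta(a, s):
    """Survival probability for age-group a in {"J", "A"} and sex s in {"M", "F"}."""
    return inv_logit(alpha0 + alpha1 * (s == "M") + alpha2 * (a == "A"))

eta = np.exp(alpha6)   # time-constant immigration rate: log(eta_t) = alpha6
```

With alpha2 > 0 the parameterisation forces adult survival above juvenile survival within each sex, which is the kind of structural constraint that minimising the parameter count buys.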

4.5. Priors

We use the priors of Finke et al. (2019) for the parameters in each submodel. Denote α = (α0, α1, α2, α4, α6). In both p1 and p2 the elements of α are assigned independent Normal(0, 2²) priors truncated to [−10, 10]. The time varying recapture probabilities α5,u also have Normal(0, 2²) priors truncated to [−10, 10]. A Uniform(0, 10) prior is assigned to ρ in p2 and p3.

To completely specify pmeld we must choose how to form ppool(ϕ1∩2, ϕ2∩3). We form ppool(ϕ1∩2, ϕ2∩3) using three different pooling methods and estimate the melded posterior in each case. The first pooling method is product-of-experts (PoE) pooling, which is logarithmic pooling with λ = (1, 1, 1), and we denote the melded posterior as pmeld,PoE. We also use logarithmic pooling with λ = (1/2, 1/2, 1/2), which is denoted pmeld,log and results in the chained melded model being identical to the IPM. The final pooling method is linear pooling with λ = (1/2, 1/2, 1/2, 1/2), denoted pmeld,lin.
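To illustrate how a pooled prior behaves, the sketch below evaluates a logarithmically pooled density ppool,log(ϕ1∩2, ϕ2∩3) ∝ p1(ϕ1∩2)^λ1 p2(ϕ1∩2, ϕ2∩3)^λ2 p3(ϕ2∩3)^λ3 on a grid, with Gaussian stand-ins for the three submodel priors (our own toy choices, not the IPM priors above). Under PoE weights λ = (1, 1, 1) the pooled ϕ1∩2 marginal is Gaussian with summed precisions, which the grid computation reproduces.

```python
import numpy as np

def log_norm_pdf(x, mu, sd):
    return -0.5 * np.log(2 * np.pi) - np.log(sd) - 0.5 * ((x - mu) / sd) ** 2

g12 = np.linspace(-6.0, 6.0, 401)          # grid over phi_{1 cap 2}
g23 = np.linspace(-6.0, 6.0, 401)          # grid over phi_{2 cap 3}
P12, P23 = np.meshgrid(g12, g23, indexing="ij")
d12, d23 = g12[1] - g12[0], g23[1] - g23[0]

lam = (1.0, 1.0, 1.0)                      # product-of-experts weights
log_pool = (lam[0] * log_norm_pdf(P12, 0.0, 2.0)             # toy p1(phi12)
            + lam[1] * (log_norm_pdf(P12, 1.0, 1.0)
                        + log_norm_pdf(P23, 0.0, 1.5))       # toy p2(phi12, phi23)
            + lam[2] * log_norm_pdf(P23, 0.5, 2.0))          # toy p3(phi23)

dens = np.exp(log_pool - log_pool.max())
dens /= dens.sum() * d12 * d23             # normalise on the grid

marg12 = dens.sum(axis=1) * d23            # pooled marginal for phi12
mean12 = (g12 * marg12).sum() * d12        # precision-weighted mean: ~0.8
var12 = ((g12 - mean12) ** 2 * marg12).sum() * d12   # ~1/(1/4 + 1) = 0.8
```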

4.6. Posterior estimation

We estimate the melded posterior – pmeld(ϕ, ψ | Y), proportional to Equation (9) – using both the parallel sampler (Section 3.1) and the normal approximation (Section 3.2). This allows us to use pre-existing implementations of the submodels. Specifically, the capture-recapture submodel is written in BUGS (Lunn et al., 2009) and sampled via rjags (Plummer, 2019). The fecundity submodel is written in Stan (Carpenter et al., 2017) and sampled via rstan (Stan Development Team, 2021). The count data submodel is also written in BUGS, and we reuse this implementation in stage two of the multi-stage sampler via NIMBLE (de Valpine et al., 2017) and its R interface (NIMBLE Development Team, 2019). The approximate melded posterior obtained by Section 3.2 is sampled using rjags. Code and data for this example, as well as trace plots and numerical convergence measures (Vehtari et al., 2020) for both stages of the parallel sampling process, are available in the accompanying online repository.

4.7. Results

We empirically validate our methodology and sampler by comparing the melded posterior samples to a large sample – 6 chains, each containing 1×10⁵ post-warmup iterations – from the original IPM posterior. Similarity in the posteriors is expected as the IPM is effectively the joint model we wish to approximate with the chained melded model. It is simply fortunate, from a modelling standpoint, that this example's joint model is easy to construct and computationally feasible with standard tools. Note that under logarithmic pooling with λ = (1/2, 1/2, 1/2) the melded posterior is identical to the original IPM, so any differences between the two posteriors are attributable to the multi-stage sampler. Figure 5 depicts the posterior credible intervals (Gabry et al., 2021; Kay, 2020) for the common quantities from the individual submodels, the melded models, and the original IPM. The top row in Figure 5 indicates that the count data alone (p2) contain minimal information about α0, α2 and ρ; incorporating the data from the other submodels is essential for precise estimates.

Figure 5.


Top row: credible intervals for ϕ1∩2 = (α0, α2) and ϕ2∩3 = ρ from the posterior of the original integrated population model pipm, and the individual subposteriors from submodels p1, p2, and p3. Bottom row: credible intervals for the same quantities, but with a different x-axis scale, from the original IPM (repeated from top row); the chained melded posteriors using product-of-experts pooling, logarithmic pooling, and linear pooling, denoted pmeld,PoE, pmeld,log and pmeld,lin; and the melded posterior using the normal approximation p̂meld. Intervals are 50%, 80%, 95%, and 99% wide.

The multi-stage sampler performs well, producing melded posterior estimates that are generally similar to the original IPM estimates, and near identical under logarithmic pooling. PoE pooling produces the posterior most different from the original IPM, as it yields a prior for (α0, α2) that is more concentrated around zero than the other pooling methods. The lack of large differences between the melded posteriors under the different pooled priors indicates that the prior has little effect on the posterior. The similarity of the approximate approach (p̂meld, bottom row of Figure 5) to the melding approaches suggests that the normal approximations are good summaries of the subposteriors, and that the approximate melding procedure of Section 3.2 is suitable for this example.

5. Survival analysis with time varying covariates and uncertain event times

We return now to the respiratory failure example introduced in Section 1.1. Our intention is to illustrate the application of chained Markov melding to an example of realistic complexity, and explore empirically the importance of accounting for all sources of uncertainty by comparing chained Markov melding to equivalent analyses which use only a point estimate summary of the uncertainty. Specifically, event times and indicators are a noninvertible function of other parameters in the first submodel, and are an uncertain response in the survival submodel. Chained Markov melding enables us to specify a suitable joint model despite these complications.

There are i = 1, …, N individuals in the data set. Each individual is admitted to the ICU at time 0, and is discharged or dies at time Ci. See Appendix I for information on how the N = 37 individuals were selected from MIMIC-III (Johnson et al., 2016).

5.1. P/F ratio submodel (B-spline): p1

The first submodel fits a B-spline to the PaO2/FiO2 data to calculate if and when an individual experiences respiratory failure. Each individual has PaO2/FiO2 ratio observations zi,j (in units of mmHg) at times ti,j, with j = 1, …, Ji. For each individual denote the vector of observations zi = (zi,1, …, zi,Ji) and observation times ti = (ti,1, …, ti,Ji). To improve computational performance, we standardise the P/F ratio data for each individual such that z_{i,j} = (z̃_{i,j} − z̄_i) / ŝ_i, where z̃_{i,j} is the underlying unstandardised observation with mean z̄_i and standard deviation ŝ_i. Similarly we rescale the threshold for respiratory failure: τ_i = (300 − z̄_i) / ŝ_i.
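The standardisation is a simple affine rescaling that preserves threshold crossings, as a quick check with made-up P/F values (not MIMIC-III data) confirms:

```python
import numpy as np

# Hypothetical P/F ratio values for one individual (mmHg), for illustration only.
z_raw = np.array([220.0, 310.0, 180.0, 260.0])
z_bar, s_hat = z_raw.mean(), z_raw.std(ddof=1)

z = (z_raw - z_bar) / s_hat      # standardised observations z_{i,j}
tau = (300.0 - z_bar) / s_hat    # respiratory-failure threshold on the same scale
```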

We choose to model the P/F ratio using cubic B-splines and 7 internal knots, and do not include an intercept column in the spline basis (for background on B-splines see: Chapter 2 in Hastie and Tibshirani, 1999; and the supplementary material of Wang and Yan, 2021). The internal knots are evenly spaced between two additional boundary knots at min(ti) and max(ti). These choices result in k = 1, …, 10 spline basis terms per individual, with coefficients ζi,k where ζi = (ζi,1, …, ζi,10). We denote the individual specific B-spline basis evaluated at time ti,j as Bi(ti,j) ∈ [0, ∞)¹⁰, so that the submodel can be written as

\[
z_{i,j} = \beta_{0,i} + B_i(t_{i,j})\, \zeta_i + \varepsilon_{i,j}. \tag{45}
\]

We employ a weakly informative prior for the intercept β0,i ~ N(0, 1²), a heavy tailed distribution for the error term6 εi,j ~ t5(0, ωi), and a weakly informative half-normal prior for the unknown scale parameter ωi ~ N⁺(0, 1²). For the spline basis coefficients we set ζi,1 ~ N(0, 0.1²), and for k = 2, …, 10 we employ the random-walk prior ζi,k ~ N(ζi,k−1, 0.1²) from Kharratzadeh (2017).

We identify that a respiratory failure event occurred (which we denote by di = 1) at event time Ti if a solution to the following optimisation problem exists

\[
T_i = \min_t \big\{ t : \tau_i = \beta_{0,i} + B_i(t)\, \zeta_i,\; t \in [\max(0, \min(t_i)), \max(t_i)] \big\}. \tag{46}
\]

We attempt to solve Equation (46) using a standard multiple root finder (Soetaert et al., 2020). If there are no roots then the individual died or was discharged before respiratory failure occurred, so we set Ti = Ci and di = 0. The relationship between Ti and the other model coefficients is displayed in the left hand panel of Figure 6.
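The paper solves Equation (46) with the rootSolve multiple root finder in R; an analogous sketch in Python (hypothetical spline coefficients, and scipy in place of rootSolve) scans the fitted spline minus the threshold for its first sign change and polishes the crossing with a bracketing root finder. Note this basis has 11 terms including the intercept column, whereas the paper drops the intercept; that detail is ignored here.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import brentq

# Hypothetical fitted values for one posterior draw (not MIMIC-III derived).
t_min, t_max = 0.0, 10.0
internal = np.linspace(t_min, t_max, 9)[1:-1]                 # 7 evenly spaced internal knots
knots = np.concatenate([[t_min] * 4, internal, [t_max] * 4])  # clamped cubic knot vector
zeta = np.array([1.2, 0.9, 0.5, -0.2, -0.6, -0.3, 0.1, 0.4, 0.8, 1.1, 1.3])
beta0, tau = 0.0, 0.0                                         # intercept, failure threshold

spline = BSpline(knots, zeta, k=3)
f = lambda t: beta0 + spline(t) - tau                         # roots <=> threshold crossings

grid = np.linspace(t_min, t_max, 2001)
vals = f(grid)
T_i, d_i = t_max, 0                                           # default: no root, censored
for a, b, fa, fb in zip(grid[:-1], grid[1:], vals[:-1], vals[1:]):
    if fa * fb < 0.0:                                         # first sign change found
        T_i, d_i = brentq(f, a, b), 1
        break
```

When no sign change exists the defaults (T_i = C_i, d_i = 0) are retained, mirroring the censoring rule in the text.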

Figure 6. Parameters and form for the P/F ratio submodel (p1, left) and cumulative fluid submodel (p3, right).


5.2. Cumulative fluid submodel (piecewise linear): p3

The rate of fluid administration reflects the clinical management of patients by ICU staff, and hence changes to the rate reflect decisions to change treatment strategy. We employ a breakpoint regression model to capture the effect of such decisions, and consider only one breakpoint as this appears sufficient to fit the observed data. Specifically, we model the 8-hourly cumulative fluid balance data xi,l (in litres) at times ui,l, l = 1, …, Li.

The cumulative data are derived from the raw fluid input/output observations, which we detail in Appendix D. We denote the complete vector of observations by xi = (xi,1, …, xi,Li) and times by ui = (ui,1, …, ui,Li).

We assume a piecewise linear model with η0,i as the value at the breakpoint at time κi, slope η1,ib before the breakpoint, and slope η1,ia after the breakpoint. We write this submodel as

\[
x_{i,l} = m_i(u_{i,l}) + \epsilon_{i,l}, \qquad
m_i(u_{i,l}) = \eta_{0,i} + \eta_{1,i}^{b}(u_{i,l} - \kappa_i)\, \mathbb{1}\{u_{i,l} < \kappa_i\} + \eta_{1,i}^{a}(u_{i,l} - \kappa_i)\, \mathbb{1}\{u_{i,l} \geq \kappa_i\}. \tag{47}
\]

It will be useful to refer to the fitted value of this submodel at an arbitrary time t as mi(t). We assume a weakly informative prior for the error term ϵi,l ~ N(0, σ²x,i), with individual-specific error standard deviations σx,i ~ N⁺(0, 5²), and specific, informative priors for the slope before the breakpoint η1,i^b ~ Gamma(1.53, 0.24) and after η1,i^a ~ Gamma(1.53, 0.24). An appropriate prior for κi and η0,i is challenging to specify due to the relationship between the two parameters and the individual-specific support for κi. We address both challenges by reparameterisation, resulting in a prior for κi that, in the absence of other information, places the breakpoint in the middle of an individual's ICU stay, and a prior for η0,i that captures the diverse pathways into ICU that an individual can experience. Details and justifications for all the informative priors are available in Appendix E. Figure 6 displays the parameters and their relationship to the fitted regression line.
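The mean function of Equation (47) is straightforward to implement and to check for continuity at the breakpoint; the parameter values below are arbitrary illustrations.

```python
import numpy as np

def fitted_fluid(t, eta0, eta1_b, eta1_a, kappa):
    """Piecewise-linear mean m_i(t) of Equation (47): value eta0 at the
    breakpoint kappa, slope eta1_b before it and eta1_a after it."""
    t = np.asarray(t, dtype=float)
    return np.where(t < kappa,
                    eta0 + eta1_b * (t - kappa),
                    eta0 + eta1_a * (t - kappa))
```

Parameterising the intercept as the value at the breakpoint (rather than at t = 0) guarantees the two line segments meet, so continuity never needs to be enforced as a constraint.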

5.3. Survival submodel: p2

The rate at which fluid is administered is thought to influence the time to respiratory failure (Seethala et al., 2017), so we explore this relationship using a survival model. Individuals experience respiratory failure (di = 1) at time 0 < t < Ci, or are censored (di = 0, t = Ci). We assume a Weibull hazard with shape parameter γ for the event times. All individuals have baseline (time invariant) covariates wi,a, a = 1, …, A, with wi = (1, wi,1, …, wi,A) (i.e. including an intercept term), and common coefficients θ = (θ0, …, θA). The hazard is assumed to be influenced by these covariates and the rate of increase ∂mi(t)/∂t in the cumulative fluid balance. The strength of the latter relationship is captured by α. Hence, the hazard is

\[
h_i(t) = \gamma t^{\gamma - 1} \exp\Big\{ w_i \theta + \alpha \frac{\partial}{\partial t} m_i(t) \Big\}, \tag{48}
\]
\[
\frac{\partial}{\partial t} m_i(t) = \eta_{1,i}^{b}\, \mathbb{1}\{t < \kappa_i\} + \eta_{1,i}^{a}\, \mathbb{1}\{t \geq \kappa_i\}. \tag{49}
\]

The survival function at an individual's observed event time and status, (Ti, di), denoted S_i(T_i) = exp{−∫₀^{T_i} h_i(u) du}, has an analytic form which we derive in Appendix F.

Thus, the likelihood for individual i is

\[
p(T_i, d_i \mid \gamma, \theta, \alpha, \kappa_i, \eta_{1,i}^{b}, \eta_{1,i}^{a}, w_i) = h_i(T_i)^{d_i}\, S_i(T_i), \tag{50}
\]

where we suppress the dependence on the parameters on the right hand side for brevity.
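Because ∂mi(t)/∂t is piecewise constant, the cumulative hazard splits into two Weibull-style pieces and Si(Ti) has a closed form. The sketch below writes one such closed form (our own derivation for this hazard, with hypothetical parameter values; the paper's expression is in Appendix F, which we do not reproduce) and cross-checks it against numerical integration of the hazard.

```python
import numpy as np
from scipy.integrate import quad

# Hypothetical parameter values for one individual (illustration only).
gamma, w_theta, alpha = 1.4, -2.0, 0.3     # Weibull shape, w_i * theta, fluid effect
eta_b, eta_a, kappa = 0.8, 0.2, 3.0        # slopes before/after breakpoint; breakpoint

def hazard(t):
    slope = eta_b if t < kappa else eta_a  # dm_i(t)/dt, Equation (49)
    return gamma * t ** (gamma - 1.0) * np.exp(w_theta + alpha * slope)

def survival(T):
    """S_i(T) = exp(-int_0^T h_i(u) du), integrating each constant-slope
    piece of the hazard in closed form."""
    H = np.exp(w_theta) * (np.exp(alpha * eta_b) * min(T, kappa) ** gamma
                           + np.exp(alpha * eta_a) * max(T ** gamma - kappa ** gamma, 0.0))
    return np.exp(-H)

# Cross-check the closed form against numerical integration of the hazard.
discrepancies = []
for T in (1.0, 3.0, 7.5):
    pts = [kappa] if kappa < T else None   # tell quad about the discontinuity
    H_num, _ = quad(hazard, 0.0, T, points=pts)
    discrepancies.append(abs(np.exp(-H_num) - survival(T)))
```

The same decomposition is what makes the likelihood (50) cheap to evaluate inside MCMC, since no quadrature is required per iteration.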

Our priors for the submodel-specific parameters, which we justify in Appendix G, are γ ~ Gamma(9.05, 8.72), α ~ SkewNormal(0, 0.5, −2), θa ~ SkewNormal(0, 0.5, −1), and θ0 ~ N(Ê, 0.5²), where Ê is the log of the crude event rate (Brilleman et al., 2020). We adopt the same priors as the cumulative fluid balance submodel for κi, η1,i^b, and η1,i^a.

5.4. Chained Markov melding details

To combine the submodels with chained Markov melding we must define the common quantities ϕ1∩2 and ϕ2∩3. We meld p1 and p2 by treating the derived event times and indicators {(Ti, di)}_{i=1}^{N} under p1 as the "response", i.e. event times, in p2. Care is required when defining ϕ1∩2 under p1 as it is a deterministic function of β0,i and ζi. Define χ1,i = (β0,i, ζi) and ϕ1∩2,i = f(χ1,i) = (Ti, di), where f is the output from attempting to solve Equation (46), so that ϕ1∩2 = (f(χ1,1), …, f(χ1,N)). The parameters shared by Equations (47) and (49) constitute ϕ2∩3 = (η1,i^b, η1,i^a, κi)_{i=1}^{N}.

To completely align with our chained melding notation we also define, for the P/F submodel, Y1 = (zi, ti)_{i=1}^{N} and ψ1 = (ωi)_{i=1}^{N}, noting that ψ1 and (χ1,1, …, χ1,N) have no components in common. For the cumulative fluid submodel we define Y3 = (xi, ui)_{i=1}^{N} and ψ3 = (η0,i, σ²x,i)_{i=1}^{N}. Finally, for the survival submodel we define Y2 = (wi)_{i=1}^{N} and ψ2 = (γ, θ, α).

5.5. Pooling and estimation

We consider logarithmic pooling with λ = (4/5, 4/5, 4/5) (any smaller value of λ results in a prior that is so uninformative that it causes computational problems) and with λ = (1, 1, 1) (product-of-experts). Because the correlation between ϕ1∩2 and ϕ2∩3 in p2(ϕ1∩2, ϕ2∩3) is important, we do not consider linear pooling in this example. Logarithmic pooling requires estimates of p1(ϕ1∩2) and p2(ϕ1∩2, ϕ2∩3). Because these are mixed distributions, with both discrete and continuous components, standard kernel density estimation, as suggested by Goudie et al. (2019), is inappropriate. Instead we fit, to transformed versions of ϕ1∩2 and ϕ2∩3, a mixture containing a discrete component and either a Gaussian or beta distribution, depending on the transformation. Further details for all the mixture distribution estimates are contained in Appendix H.

We use the parallel multi-stage sampler with ppool,1(ϕ1∩2) = p1(ϕ1∩2), ppool,3(ϕ2∩3) = p3(ϕ2∩3) and ppool,2(ϕ1∩2, ϕ2∩3) = ppool(ϕ) / (p1(ϕ1∩2)p3(ϕ2∩3)). That is, in stage one we target the subposteriors p1(ϕ1∩2, ψ1 | Y1) and p3(ϕ2∩3, ψ3 | Y3); in stage two we target the full melded model. Targeting p1(ϕ1∩2, ψ1 | Y1) in stage one alleviates the need to solve Equation (46) within an MCMC iteration, instead turning the production of ϕ1∩2 into an embarrassingly parallel, post-stage-one processing step. Attempting to sample the melded posterior directly would involve solving (46) many times within each iteration, presenting a sizeable computational hurdle which we avoid. It is crucial for the convergence of our multi-stage sampler that the components of ϕ1∩2 and ϕ2∩3 are updated one individual at a time in stage two. This is possible due to the conditional independence between individuals in the stage one posterior, and Appendix K contains the details of this scheme. The stage one subposteriors are sampled using Stan, using 5 chains with 10³ warm-up iterations and 10⁴ post warm-up iterations. We use Stan to sample ψ2 where, in every MH-within-Gibbs step, we run Stan for 9 warm-up iterations and 1 post warm-up iteration7. We run 5 chains of 10⁴ iterations for all stage two targets. Visual and numerical diagnostics (Vehtari et al., 2020) are assessed and are available in the repository accompanying this paper.

5.6. Results

We first inspect the subposterior fitted values for p1 and p3. The top row of Figure 7 displays the P/F data, the fitted submodel, and derived event times for individuals i = 17 and 29. The spline appears to fit the raw P/F data well, with the heavy tailed error term accounting for the larger deviations away from the fitted value. It is interesting to see the relatively wide, multimodal distribution for (T29, d29): there is a second mode at (T29 = C29, d29 = 0), and likewise for other individuals not shown here. The bottom row of Figure 7 displays the cumulative fluid data and the fitted submodel, with the low noise in the data resulting in minimal uncertainty about the fitted value and a concentrated subposterior distribution.

Figure 7.


The P/F ratio data (Y1, top row); cumulative fluid data (Y3, bottom row); subposterior means and 95% credible intervals for each of the submodels (black solid lines and grey intervals); and stage one event times (Ti, red rug in the top row) for individuals i = 17 and 29.

To assess the importance of fully accounting for the uncertainty in ϕ1∩2 and ϕ2∩3, we compare the posterior for ψ2 obtained using the chained melding approach with the posterior obtained by fixing ϕ1∩2 and ϕ2∩3. Plugging in a point estimate reflects common applied statistical practice when combining submodels, particularly when a distributional approximation is difficult to obtain (as it is for p1(ϕ1∩2 | Y1)). Additionally, standard survival models and software typically do not permit uncertainty in event times and indicators, rendering such a plug-in approach necessary.

Specifically, we fix ϕ1∩2 to the median value9 for each individual under p1(ϕ1∩2 | Y1), denoted ϕ̂1∩2, and use the subposterior mean of p3(ϕ2∩3 | Y3), denoted ϕ̂2∩3. With these fixed values we sample p(ψ2 | ϕ̂1∩2, ϕ̂2∩3, Y2). We also compare the melded posterior to the submodel marginal prior p2(ψ2), but we note that this comparison is difficult to interpret, as the melding process alters the prior for ψ2. Figure 8 displays the aforementioned densities for (θ3, θ17, γ, α) ⊂ ψ2, with (θ3, θ17) chosen as they exhibit the greatest sensitivity to the fixing of ϕ1∩2 and ϕ2∩3. For the baseline coefficients (θ3, θ17) the chained melding posterior differs slightly in location from p(ψ2 | ϕ̂1∩2, ϕ̂2∩3, Y2), with a small increase in uncertainty. A more pronounced change is visible for α, where the melding process has added a notable degree of uncertainty and shifted the posterior leftwards.

Figure 8.


Density estimates for a subset of ψ2. The submodel marginal prior p2(ψ2) is shown as the grey dotted line (note that this is not the marginal prior under the melded model). The figure also contains the subposteriors obtained from chained melding using PoE pooling (red, solid line) and logarithmic pooling (blue, solid line), as well as the posterior using the fixed values p(ψ2 | ϕ̂1∩2, ϕ̂2∩3, Y2) (black, dashed line).

To investigate which part of the melding process causes this change in the posterior of α, we consider fixing either one of ϕ1∩2 and ϕ2∩3 to their respective point estimates. That is, we employ Markov melding as described in Section 1.2, using either logarithmic or PoE pooling, to obtain pmeld(α | ϕ̂1∩2, Y2, Y3) and pmeld(α | ϕ̂2∩3, Y1, Y2). Figure 9 displays the same distributions for α as Figure 8, and adds the posteriors obtained using one fixed value (ϕ̂1∩2 or ϕ̂2∩3) whilst melding the other, non-fixed parameter.

Figure 9.


Median (vertical line), 50%, 80%, 95%, and 99% credible intervals (least transparent to most transparent) for α. The marginal prior (grey, top row) and posterior using fixed ϕ^12 and ϕ^23 (black, bottom row) are as in Figure 8. For the chained melded posteriors (red and blue, rows 2 and 3) and the melded posteriors (red and blue, rows 4 – 7), the tick label on the y-axis denotes the type of pooling used, and which of ϕ1∩2 and/or ϕ2∩3 are fixed.

Evident for both choices of pooling is the importance of incorporating the uncertainty in ϕ1∩2. This is expected given the large uncertainty and multimodal nature of ϕ1∩2 compared to ϕ2∩3 (see Figure 7). We suspect that it is the multimodality in p1(ϕ1∩2 | Y1) that produces the shift in the posterior mode of α, with the width of p1(ϕ1∩2 | Y1) driving the increase in uncertainty. Because we prefer the chained melded posterior, under either pooling method, for its full accounting of uncertainty, we conclude that p(α | ϕ̂1∩2, ϕ̂2∩3, Y2) is both overconfident and biased.

The marginal changes to the components of ψ2 visible in Figure 8 appear small; however, the cumulative effect of such changes becomes apparent when inspecting the posterior of the survival function. Figure 10 displays the model-based, mean survival function under the melded posterior (using PoE pooling), and corresponding draws of ϕ1∩2 converted into survival curves using the Kaplan-Meier estimator. Also shown are the Kaplan-Meier estimate of ϕ̂1∩2 and the mean survival function computed using p(ψ2 | ϕ̂1∩2, ϕ̂2∩3, Y2). The posterior survival functions differ markedly, with the 95% intervals overlapping only for small values of time. It is also interesting to see that ϕ̂1∩2, despite being a reasonable point estimate of p1(ϕ1∩2 | Y1), is not very likely under the melded posterior. Figure 10 also suggests that the Weibull hazard is insufficiently flexible for this example. We discuss the complexities of other hazards in Section 6.

Figure 10.


Survival curves and mean survival function at time t. The red, stepped lines are draws of ϕ1∩2 from the melded posterior using PoE pooling, converted into survival curves via the Kaplan-Meier estimator. The smooth red line and interval (posterior mean and 95% credible interval) denote the model-based, mean survival function obtained from the melded posterior (PoE pooling) values of ψ2 and ϕ2∩3. The blue dashed line is the Kaplan-Meier estimate of ϕ̂1∩2, and the blue solid line and interval are the corresponding model-based estimate from p(ψ2 | ϕ̂1∩2, ϕ̂2∩3, Y2).

6. Conclusion

This paper introduces the chained Markov melded model. In doing so we make explicit the notion of submodels related in a chain-like way, describe a generic methodology for joining together any number of such submodels, and illustrate its application with our examples. Our examples also demonstrate the importance of quantifying the uncertainty when joining submodels; not doing so can produce biased, over-confident inference. We also describe the choices that users of chained Markov melding must make, and their impacts. These include: the choice of pooling function and, where required, the pooling weights; and the choice of posterior sampler and its design, including the apportionment of the pooled prior over the stages and stage-specific MCMC techniques.

We have introduced extensions to linear and logarithmic pooling to marginals of different but overlapping quantities. Linear pooling, introduced in Section 1.2, could be extended to induce dependence between the components of ϕ using multivariate or vine copulas (Kurowicka and Joe, 2011; Nelsen, 2006), or other techniques (Lin et al., 2014). Copula methods are particularly appealing as, depending on the choice of copula, they yield computationally cheap to evaluate expressions for the density function, are easy to sample, and induce correlation between an arbitrary number of marginals.

Our parallel multi-stage sampler currently only considers M = 3 submodels, rather than the fully generic definition of chained Markov melding in Equation (10). Whilst we anticipate needing more complex methods in large M settings, the value of M at which the performance of our multi-stage sampler becomes unacceptable will depend on the specific submodels and data under consideration. A general method would consider a large and arbitrary number of submodels in a chain, and initially split the chain into more/fewer pieces depending on the computational resources available. Designing such a method is complex, as it would have to:

  • avoid requiring the inverse of any component of ϕ with a noninvertible definition,

  • estimate the relative cost of sampling each submodel’s subposterior, to split the chain of submodels into steps/jobs of approximately the same computational cost,

  • decide the order in which pieces of the chain are combined.

These are substantial challenges. It may be possible to combine the ideas of Lindsten et al. (2017) and Kuntz et al. (2021), who propose a parallel Sequential Monte Carlo method, with the aforementioned constraints to obtain a generic methodology. Ideally we would retain the ability to use existing implementations of the submodels; however, the need to recompute the weights of the particles, and hence reevaluate previously considered submodels, may preclude this requirement. Our current sampler is also sensitive to large differences in location or scale of the target distribution between the stages. The impact of these differences can be ameliorated using the methodology of Manderson and Goudie (2022), and, more generally, Sequential Monte Carlo samplers are likely to perform better in these settings.

Our chained Markov melding methodology is general and permits any form of uncertainty in the common quantities. In Section 5 we use our chained melded model to incorporate uncertainty in the event times and indicators into a survival submodel. Some specific forms of uncertainty in the event times have been considered in previous work. These include Wang et al. (2020), who consider uncertain event times arising from record linkage, where the event time is assumed to be one of a finite number of event times arising from the record linkage; and Giganti et al. (2020), Oh et al. (2018), and Oh et al. (2021), who leverage external validation data to account for measurement error in the event time. However, the general and Bayesian nature of our methodology readily facilitates any form of uncertainty in the event times and the event indicators; uncertainty in the latter is not considered in the cited papers.

The example in Section 5 has three more interesting aspects to discuss. Firstly, the P/F ratio data used in the first submodel is obtained by finding all blood gas measurements from arterial blood samples. Approximately 20% of the venous/arterial labels are missing. In these instances a logistic regression model, fit by the MIMIC team10, is used to predict the missing label based on other covariates. It is theoretically possible to refit the model in a Bayesian framework and use the chained melded model to incorporate the uncertainty in the predicted sample label – adding another ‘link’ to the chain.

Secondly, the application of our multi-stage sampler to this example is similar to the two-stage approach used for joint longitudinal and time-to-event models (see Mauff et al., 2020, for a description of this approach). In the two-stage approach, the longitudinal model is fit using standard MCMC methods in stage one, and the samples are reused in stage two when considering the time-to-event data. This can significantly reduce the computational effort required to fit the joint model. However, unlike our multi-stage sampler, the typical two-stage approach does not target the full posterior distribution, which can lead to biased estimates (though Mauff et al. (2020) extend the typical two-stage approach to reduce this bias).

Thirdly, we observe a lack of flexibility in the baseline hazard, visible in Figure 10. More complex hazards could be employed, e.g. modelling the (log-)hazard using a (penalised) B-spline (Royston and Parmar, 2002; Rosenberg, 1995; Rutherford et al., 2015). However, this increased flexibility precludes an analytic form for the survival function. Whilst numerical integration is possible it is not trivial, particularly when the hazard is discontinuous, as our hazard is at the breakpoint. Splines also have more coefficients than the single parameter of the Weibull hazard. Identifiability issues arise with a small number of individuals, many of whom are censored, and are compounded when there is a relatively large number of other parameters (α, θ). Whilst we do not believe these costs are worth incurring for our example, for settings with a larger number of patients and more complicated longitudinal submodels the increased flexibility may be vital.

Supplementary Material

Appendix

Acknowledgements

We thank Sarah L Cowan for assistance in understanding respiratory failure, and Anne Presanis and Brian Tom for many helpful discussions about the methodological aspects of this work. We also thank Luiz Max Carvalho for comments on an earlier version of this paper. This work was supported by The Alan Turing Institute under the UK Engineering and Physical Sciences Research Council (EPSRC) [EP/N510129/1] and the UK Medical Research Council [programme code MC UU 00002/2].

Footnotes

1

“Chained graphs” were considered by Lauritzen and Richardson (2002), however they are unrelated to our proposed model. We use “chained” to emphasise the nature of the relationships between submodels.

2

This is shown in Appendix B of the online supplement to Goudie et al. (2019).

3

Some care is required if the authoritative submodel is pm for m ∈ {1, 2, M − 1, M}. If it is taken to be m ∈ {1, 2}, then g1 does not exist, and additionally in the m = 1 case p1(ϕ0∩1, ϕ1∩2) ≔ p1(ϕ1∩2). The m ∈ { M − 1, M} cases have analogous definitions.

4

For completeness, Appendix B describes such a sequential MCMC sampler. We do not use the sequential sampler in this paper.

6

P/F data contain many outliers, for reasons including: arterial/venous blood sample mislabelling; incorrectly recorded oxygenation support information; and differences between sample collection time, lab result time, and the observation time as recorded in the EHR.

7

We also initialise Stan at the previous value of ψ2, and disable all adaptive procedures as the default (identity) mass matrix and step size are suitable for this example.

9

For each individual, the N sampled pairs (Ti, di) are sorted by Ti, and the ⌈N/2⌉th tuple (T̂i, d̂i) is chosen as the median.
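In code, this median rule amounts to the following (the sampled pairs are illustrative values):

```python
# An individual's sampled (T_i, d_i) pairs: event time and event indicator.
samples = [(2.3, 1), (0.9, 0), (1.7, 1), (3.1, 0), (1.2, 1)]

# Sort by the sampled event time and take the middle tuple, so the
# indicator stays paired with its own event time.
ordered = sorted(samples, key=lambda pair: pair[0])
median_pair = ordered[len(ordered) // 2]  # the middle tuple for odd N
```

Selecting the whole tuple, rather than taking componentwise medians, keeps each event time attached to its own censoring indicator.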

10

The coefficients, classification threshold, and the imputation used in the case of missing data are supplied in the blood-gasses.sql file in the GitHub repository accompanying this paper. No other information about this model is available (e.g. the data used to produce the coefficients, or the performance of the fitted model).

Contributor Information

Andrew A. Manderson, Email: andrew.manderson@mrc-bsu.cam.ac.uk.

Robert J. B. Goudie, Email: robert.goudie@mrc-bsu.cam.ac.uk.

References

  1. Abadi F, Gimenez O, Ullrich B, Arlettaz R, Schaub M. Estimation of Immigration Rate Using Integrated Population Models. Journal of Applied Ecology. 2010;47(2):393–400. [Google Scholar]
  2. Abbas AE. A Kullback-Leibler View of Linear and Log-Linear Pools. Decision Analysis. 2009 [Google Scholar]
  3. Ades AE, Sutton AJ. Multiparameter Evidence Synthesis in Epidemiology and Medical Decision-Making: Current Approaches. Journal of the Royal Statistical Society: Series A (Statistics in Society) 2006;169(1):5–35. [Google Scholar]
  4. Belgorodski N, Greiner M, Tolksdorf K, Schueller K. rriskDistributions: Fitting Distributions to given Data or Known Quantiles. R package version 2.1.2. 2017 [Google Scholar]
  5. Besbeas P, Freeman SN, Morgan BJT, Catchpole EA. Integrating Mark–Recapture–Recovery and Census Data to Estimate Animal Abundance and Demographic Parameters. Biometrics. 2002;58(3):540–547. doi: 10.1111/j.0006-341x.2002.00540.x. [DOI] [PubMed] [Google Scholar]
  6. Brilleman S. Simsurv: Simulate Survival Data. R package version 1.0.0. 2021 [Google Scholar]
  7. Brilleman SL, Elci EM, Novik JB, Wolfe R. Bayesian Survival Analysis Using the Rstanarm R Package. 2020:arXiv:2002.09633. [stat] [Google Scholar]
  8. Bromiley P. Products and Convolutions of Gaussian Probability Density Functions. Tina-Vision Memo. 2003;3(4):1. [Google Scholar]
  9. Brooks SP, King R, Morgan BJT. A Bayesian Approach to Combining Animal Abundance and Demographic Data. Animal Biodiversity and Conservation. 2004;27(1) [Google Scholar]
  10. Burke DL, Ensor J, Riley RD. Meta-Analysis Using Individual Participant Data: One-Stage and Two-Stage Approaches, and Why They May Differ. Statistics in Medicine. 2017;36(5):855–875. doi: 10.1002/sim.7141. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Carpenter B, Gelman A, Hoffman M, Lee D, Goodrich B, Betancourt M, Brubaker M, Guo J, Li P, Riddell A. Stan: A Probabilistic Programming Language. Journal of Statistical Software. 2017;76(1):1–32. doi: 10.18637/jss.v076.i01. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Carvalho LM, Villela DAM, Coelho FC, Bastos LS. Bayesian Inference for the Weights in Logarithmic Pooling. Bayesian Analysis. 2022:1–29. [Google Scholar]
  13. Crowther MJ, Lambert PC. Simulating Biologically Plausible Complex Survival Data. Statistics in Medicine. 2013;32(23):4118–4134. doi: 10.1002/sim.5823. [DOI] [PubMed] [Google Scholar]
  14. Dawid AP, Lauritzen SL. Hyper Markov Laws in the Statistical Analysis of Decomposable Graphical Models. The Annals of Statistics. 1993;21(3):1272–1317. [Google Scholar]
  15. de Valpine P, Turek D, Paciorek CJ, Anderson-Bergman C, Lang DT, Bodik R. Programming with Models: Writing Statistical Algorithms for General Model Structures with NIMBLE. Journal of Computational and Graphical Statistics. 2017;26(2):403–413. [Google Scholar]
  16. Donnat C, Miolane N, Bunbury F, Kreindler J. A Bayesian Hierarchical Network for Combining Heterogeneous Data Sources in Medical Diagnoses; Proceedings of the Machine Learning for Health NeurIPS Workshop; 2020. pp. 53–84. Proceedings of Machine Learning Research. [Google Scholar]
  17. Finke A, King R, Beskos A, Dellaportas P. Efficient Sequential Monte Carlo Algorithms for Integrated Population Models. Journal of Agricultural, Biological and Environmental Statistics. 2019;24(2):204–224. [Google Scholar]
  18. Gabry J, Mahr T, Bürkner P-C, Modrák M, Barrett M, Weber F, Sroka EC, Vehtari A. Bayesplot: Plotting for Bayesian Models. R package version 1.8.0. 2021 [Google Scholar]
  19. Gabry J, Simpson D, Vehtari A, Betancourt M, Gelman A. Visualization in Bayesian Workflow. Journal of the Royal Statistical Society: Series A (Statistics in Society) 2019;182(2):389–402. [Google Scholar]
  20. Gelman A, Vehtari A, Simpson D, Margossian CC, Carpenter B, Yao Y, Kennedy L, Gabry J, Bürkner P-C, Modrák M. Bayesian Workflow. 2020:arXiv:2011.01808. [stat] [Google Scholar]
  21. Genest C, McConway KJ, Schervish MJ. Characterization of Externally Bayesian Pooling Operators. The Annals of Statistics. 1986;14(2):487–501. [Google Scholar]
  22. Giganti MJ, Shaw PA, Chen G, Bebawy SS, Turner MM, Sterling TR, Shepherd BE. Accounting for Dependent Errors in Predictors and Time-to-Event Outcomes Using Electronic Health Records, Validation Samples, and Multiple Imputation. Annals of Applied Statistics. 2020;14(2):1045–1061. doi: 10.1214/20-aoas1343. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Goudie RJB, Presanis AM, Lunn D, De Angelis D, Wernisch L. Joining and Splitting Models with Markov Melding. Bayesian Analysis. 2019;14(1):81–109. doi: 10.1214/18-BA1104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Hastie T, Tibshirani R. Generalized Additive Models. Boca Raton, Fla: Chapman & Hall/CRC; 1999. [Google Scholar]
  25. Hinton GE. Training Products of Experts by Minimizing Contrastive Divergence. Neural Computation. 2002;14(8):1771–1800. doi: 10.1162/089976602760128018. [DOI] [PubMed] [Google Scholar]
  26. Hooten MB, Johnson DS, Brost BM. Making Recursive Bayesian Inference Accessible. The American Statistician. 2019:1–10. [Google Scholar]
  27. Jackson D, White IR. When Should Meta-Analysis Avoid Making Hidden Normality Assumptions? Biometrical Journal. 2018;60(6):1040–1058. doi: 10.1002/bimj.201800071. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Johnson AEW, Pollard TJ, Shen L, Lehman L-wH, Feng M, Ghassemi M, Moody B, Szolovits P, Anthony Celi L, Mark RG. MIMIC-III, a Freely Accessible Critical Care Database. Scientific Data. 2016;3(1):160035. doi: 10.1038/sdata.2016.35. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Kay M. Tidybayes: Tidy Data and Geoms for Bayesian Models. R package version 2.0.2. 2020 [Google Scholar]
  30. Kedem B, De Oliveira V, Sverchkov M. Statistical Data Fusion. World Scientific; 2017. [Google Scholar]
  31. Kharratzadeh M. Splines in Stan. Stan Case Studies. 2017;4 [Google Scholar]
  32. Kuntz J, Crucinio FR, Johansen AM. The Divide-and-Conquer Sequential Monte Carlo Algorithm: Theoretical Properties and Limit Theorems. 2021:arXiv:2110.15782. [math, stat] [Google Scholar]
  33. Kurowicka D, Joe H, editors. Dependence Modeling: Vine Copula Handbook. Singapore: World Scientific; 2011. [Google Scholar]
  34. Lahat D, Adali T, Jutten C. Multimodal Data Fusion: An Overview of Methods, Challenges, and Prospects. Proceedings of the IEEE. 2015;103(9):1449–1477. [Google Scholar]
  35. Lauritzen SL, Richardson TS. Chain Graph Models and Their Causal Interpretations. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2002;64(3):321–348. [Google Scholar]
  36. Lebreton J-D, Burnham KP, Clobert J, Anderson DR. Modeling Survival and Testing Biological Hypotheses Using Marked Animals: A Unified Approach with Case Studies. Ecological Monographs. 1992;62(1):67–118. [Google Scholar]
  37. Lin G, Dou X, Kuriki S, Huang J-S. Recent Developments on the Construction of Bivariate Distributions with Fixed Marginals. Journal of Statistical Distributions and Applications. 2014;1(1):14. [Google Scholar]
  38. Lindsten F, Johansen AM, Naesseth CA, Kirkpatrick B, Schön TB, Aston JAD, Bouchard-Côté A. Divide-and-Conquer with Sequential Monte Carlo. Journal of Computational and Graphical Statistics. 2017;26(2):445–458. [Google Scholar]
  39. Lu CJ, Meeker WQ. Using Degradation Measures to Estimate a Time-to-Failure Distribution. Technometrics. 1993;35(2):161–174. [Google Scholar]
  40. Lunn D, Barrett J, Sweeting M, Thompson S. Fully Bayesian Hierarchical Modelling in Two Stages, with Application to Meta-Analysis. Journal of the Royal Statistical Society Series C. 2013;62(4):551–572. doi: 10.1111/rssc.12007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Lunn D, Spiegelhalter D, Thomas A, Best N. The BUGS Project: Evolution, Critique and Future Directions. Statistics in Medicine. 2009;28(25):3049–3067. doi: 10.1002/sim.3680. [DOI] [PubMed] [Google Scholar]
  42. Manderson AA, Goudie RJB. A Numerically Stable Algorithm for Integrating Bayesian Models Using Markov Melding. Statistics and Computing. 2022;32(2):24. doi: 10.1007/s11222-022-10086-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Massa MS, Lauritzen SL. Algebraic Methods in Statistics and Probability II. Vol. 516. Amer. Math. Soc; Providence, RI: 2010. Combining Statistical Models; pp. 239–259. (Contemp Math). [Google Scholar]
  44. Mauff K, Steyerberg E, Kardys I, Boersma E, Rizopoulos D. Joint Models with Multiple Longitudinal Outcomes and a Time-to-Event Outcome: A Corrected Two-Stage Approach. Statistics and Computing. 2020;30(4):999–1014. [Google Scholar]
  45. Maunder MN, Punt AE. A Review of Integrated Analysis in Fisheries Stock Assessment. Fisheries Research. 2013;142:61–74. [Google Scholar]
  46. Meng X-L. In: Past, Present, and Future of Statistical Science. Lin X, Genest C, Banks DL, Molenberghs G, Scott DW, Wang J-L, editors. Chapman and Hall/CRC; 2014. A Trio of Inference Problems That Could Win You a Nobel Prize in Statistics (If You Help Fund It) pp. 561–586. [Google Scholar]
  47. Nelsen RB. An Introduction to Copulas. second edition. Springer; New York: 2006. [Google Scholar]
  48. Nicholson G, Blangiardo M, Briers M, Diggle PJ, Fjelde TE, Ge H, Goudie RJB, Jersakova R, King RE, Lehmann BCL, Mallon A-M, et al. Interoperability of Statistical Models in Pandemic Preparedness: Principles and Reality. Statistical Science. 2021 doi: 10.1214/22-STS854. (forthcoming) [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. NIMBLE Development Team. NIMBLE: MCMC, Particle Filtering, and Programmable Hierarchical Modeling. R package manual version 0.9.0. 2019 [Google Scholar]
  50. Oh EJ, Shepherd BE, Lumley T, Shaw PA. Considerations for Analysis of Time-to-Event Outcomes Measured with Error: Bias and Correction with SIMEX. Statistics in medicine. 2018;37(8):1276–1289. doi: 10.1002/sim.7554. [DOI] [PMC free article] [PubMed] [Google Scholar]
  51. Oh EJ, Shepherd BE, Lumley T, Shaw PA. Raking and Regression Calibration: Methods to Address Bias from Correlated Covariate and Time-to-Event Error. Statistics in Medicine. 2021;40(3):631–649. doi: 10.1002/sim.8793. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. O’Hagan A, Buck C, Daneshkhah A, Eiser J, Garthwaite P, Jenkinson D, Oakley J, Rakow T. Statistics in Practice. Wiley; 2006. Uncertain Judgements: Eliciting Experts’ Probabilities. [Google Scholar]
  53. Parsons J, Niu X, Bao L. A Bayesian Hierarchical Modeling Approach to Combining Multiple Data Sources: A Case Study in Size Estimation. 2021:arXiv:2012.05346. [stat] [Google Scholar]
  54. Plummer M. Rjags: Bayesian Graphical Models Using MCMC. R package version 4-10. 2019 [Google Scholar]
  55. Presanis AM, Pebody RG, Birrell PJ, Tom BDM, Green HK, Durnall H, Fleming D, De Angelis D. Synthesising Evidence to Estimate Pandemic (2009) A/H1N1 Influenza Severity in 2009-2011. Annals of Applied Statistics. 2014;8(4):2378–2403. [Google Scholar]
  56. Rizopoulos D. Joint Models for Longitudinal and Time-to-Event Data: With Applications in R. CRC Press; 2012. [Google Scholar]
  57. Rosenberg PS. Hazard Function Estimation Using B-splines. Biometrics. 1995;51(3):874–887. [PubMed] [Google Scholar]
  58. Royston P, Parmar MKB. Flexible Parametric Proportional-Hazards and Proportional-Odds Models for Censored Survival Data, with Application to Prognostic Modelling and Estimation of Treatment Effects. Statistics in Medicine. 2002;21(15):2175–2197. doi: 10.1002/sim.1203. [DOI] [PubMed] [Google Scholar]
  59. Rufo MJ, Martín J, Pérez CJ. Log-Linear Pool to Combine Prior Distributions: A Suggestion for a Calibration-Based Approach. Bayesian Analysis. 2012a;7(2):411–438. [Google Scholar]
  60. Rufo MJ, Pérez CJ, Martín J. A Bayesian Approach to Aggregate Experts’ Initial Information. Electronic Journal of Statistics. 2012b;6:2362–2382. [Google Scholar]
  61. Rutherford MJ, Crowther MJ, Lambert PC. The Use of Restricted Cubic Splines to Approximate Complex Hazard Functions in the Analysis of Time-to- Event Data: A Simulation Study. Journal of Statistical Computation and Simulation. 2015;85(4):777–793. [Google Scholar]
  62. Schaub M, Abadi F. Integrated Population Models: A Novel Analysis Framework for Deeper Insights into Population Dynamics. Journal of Ornithology. 2011;152(1):227–237. [Google Scholar]
  63. Schaub M, Ullrich B, Knötzsch G, Albrecht P, Meisser C. Local Population Dynamics and the Impact of Scale and Isolation: A Study on Different Little Owl Populations. Oikos. 2006;115(3):389–400. [Google Scholar]
  64. Seethala RR, Hou PC, Aisiku IP, Frendl G, Park PK, Mikkelsen ME, Chang SY, Gajic O, Sevransky J. Early Risk Factors and the Role of Fluid Administration in Developing Acute Respiratory Distress Syndrome in Septic Patients. Annals of Intensive Care. 2017;7(1):11. doi: 10.1186/s13613-017-0233-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  65. Soetaert K, Hindmarsh AC, Eisenstat SC, Moler C, Dongarra J, Saad Y. Rootsolve: Nonlinear Root Finding, Equilibrium and Steady-State Analysis of Ordinary Differential Equations. R package version 1.8.2.1. 2020 [Google Scholar]
  66. Stan Development Team. RStan: The R Interface to Stan. R package version 2.26. 2021 [Google Scholar]
  67. The ARDS Definition Task Force. Acute Respiratory Distress Syndrome: The Berlin Definition. JAMA. 2012;307(23):2526–2533. doi: 10.1001/jama.2012.5669. [DOI] [PubMed] [Google Scholar]
  68. Tom JA, Sinsheimer JS, Suchard MA. Reuse, Recycle, Reweigh: Combating Influenza through Efficient Sequential Bayesian Computation for Massive Data. The Annals of Applied Statistics. 2010;4(4):1722–1748. doi: 10.1214/10-AOAS349. [DOI] [PMC free article] [PubMed] [Google Scholar]
  69. Vehtari A, Gelman A, Simpson D, Carpenter B, Bürkner P-C. Rank-Normalization, Folding, and Localization: An Improved R̂ for Assessing Convergence of MCMC. Bayesian Analysis. 2020 [Google Scholar]
  70. Wang W, Aseltine R, Chen K, Yan J. Integrative Survival Analysis with Uncertain Event Times in Application to a Suicide Risk Study. Annals of Applied Statistics. 2020;14(1):51–73. [Google Scholar]
  71. Wang W, Yan J. Shape-Restricted Regression Splines with R Package Splines2. Journal of Data Science. 2021;19(3):498–517. [Google Scholar]
  72. Zipkin EF, Saunders SP. Synthesizing Multiple Data Types for Biological Conservation Using Integrated Population Models. Biological Conservation. 2018;217:240–250. [Google Scholar]
