Abstract
Motivated by data measuring progression of leishmaniosis in a cohort of U.S. dogs, we develop a Bayesian longitudinal model with autoregressive errors to jointly analyze ordinal and continuous outcomes. Multivariate methods can borrow strength across responses and may produce improved longitudinal forecasts of disease progression over univariate methods. We explore the performance of our proposed model under simulation and demonstrate that it has improved prediction accuracy over traditional Bayesian hierarchical models. We further identify an appropriate model selection criterion. We show that our method holds promise for use in the clinical setting, particularly when ordinal outcomes are measured alongside other variable types that may aid clinical decision making. This approach is particularly applicable when multiple, imperfect measures of disease progression are available.
Keywords: Bayesian, MCMC, Ordinal regression, Longitudinal data analysis
1. Introduction
Given the complexity of many disease processes, it is common for researchers to measure many facets of the illness under study. For example, consider the well-referenced Primary Biliary Cirrhosis (PBC) longitudinal dataset originally collected by Therneau and Grambsch (2000), which contains a variety of variables. These include an ordinal variable denoting the histologic stage of disease and biomarker information (e.g., serum bilirubin, albumin) for many subjects. Motivated by this type of multivariate longitudinal data, we consider approaches that jointly analyze ordinal and continuous responses, and develop an associated Bayesian hierarchical model.
The serial correlation that accompanies multivariate longitudinal data can be accounted for in a variety of ways. A sensible starting point for analysis is to use univariate regression models with subject specific effects, where a separate model is estimated for each of the responses. By combining covariance structures, a set of univariate models with subject specific effects could be extended to multivariate models with subject specific effects. While multivariate models are more complex than their univariate counterparts, they have the potential to produce better fits (Li et al. 2016) by borrowing information across responses, and they can help accommodate differing levels of temporal granularity that arise when some outcomes are more difficult to obtain than others.
Alternative strategies for dealing with serial correlation include non-diagonal error structures, inclusion of lagged outcome variables as covariates, or combinations of the three described approaches. In what follows we develop a model that is based on a combination of patient specific effects and a first-order autoregressive (AR1) error structure. The error structure uses a separable cross-covariance function, an approach sometimes referred to as “intrinsic coregionalizations” (Genton and Kleiber 2015). Motivation for extending models beyond subject specific effects comes from Chi and Reinsel (1989), who saw improved fits when adding an AR1 error structure to linear mixed models, and Wang and Fan (2010), who saw improved prediction capabilities when adding AR error structures to multivariate linear mixed models.
In this manuscript, we propose a multivariate Bayesian hierarchical model with autoregressive errors in order to jointly analyze continuous and ordinal longitudinal data. We start by detailing related works and our contributions in this space. We next present our approach, the associated priors, and implementation details. For reproducibility and to serve as a template for future implementations, we provide a publicly available software package bmrarm (available at https://github.com/nickseedorff/bmrarm) for R (R Core Team 2021). Method development is followed by a simulation study, which assesses parameter recovery, forecast accuracy, and approaches for model comparison. Finally, we present an application using data from a longitudinal veterinary medicine study.
2. Motivating application
2.1. Canine leishmaniosis (CanL)
Our motivating example comes from a longitudinal study of canine leishmaniosis (CanL). Leishmaniosis is a progressive wasting disease of dogs and humans (Duprey et al. 2006) that is usually fatal if left untreated. It is one of the neglected tropical diseases, which tend to disproportionately affect the world’s poorest people (Feasey et al. 2009). Human leishmaniosis is estimated to cause upwards of 20,000 deaths per year, and the case-fatality rate for treated individuals is believed to be between 10% and 20% (Alvar et al. 2012). Leishmaniosis is a vector-borne disease caused by variants of the Leishmania donovani complex (Duprey et al. 2006). In humans, it is transmitted by sand flies (Chappuis et al. 2007), and more than 90% of cases are from six countries: India, Bangladesh, Sudan, South Sudan, Ethiopia, and Brazil (Alvar et al. 2012).
CanL is enzootic in the U.S. in certain dog breeds (Toepp et al. 2017; Petersen and Barr 2009), and because a complete cure is rarely achievable, treatment efforts largely focus on maintenance and quality of life. The primary route of transmission to humans is via the sand fly, with dogs acting as the primary reservoir, and vertical transmission (mother to offspring) also appears to play a role among dogs (Petersen and Barr 2009; Schaut et al. 2015; Ribeiro et al. 2018). Infection in U.S. dogs is limited to pups of a number of breeds that were exposed in utero (Duprey et al. 2006; Solano-Gallego et al. 2017).
LeishVet, a scientific association that focuses on leishmaniosis in veterinary medicine, provides clinical staging guidelines through a 4-category ordinal variable (LeishVet 2016). We use a modified version of this ordinal outcome, along with a ‘Stage 0’, indicating no evidence of disease, as our primary outcome and term it ‘LeishVet score’. An abbreviated version of the staging guidelines can be found in Fig. S1, while therapeutic recommendations and additional details can be found on the association’s website (LeishVet 2016).
2.2. Data specifics
Our dataset of interest contains samples from 50 dogs, two of which were dropped because they had only a single observation and therefore provided no longitudinal information. The remaining 48 subjects had an initial observation and were then tracked at 3-month intervals, for up to seven total visits, in 2019 and 2020. The primary analysis goal is to accurately forecast disease progression, as substantial time can elapse between observations by a veterinarian. Forty of the 48 subjects (83.3%) had their most recent visit at the 6th time point after baseline. Of the remaining eight dogs, two had their final visit at the 3rd time point after baseline, two at the 4th, and four at the 5th. Additionally, LeishVet score was unobserved for all dogs at the first follow-up visit. Overall, 83.3% of the dogs transitioned disease states at least once and 52.1% transitioned two or more times (Table 1).
Table 1.
Frequency of transitions and unique disease states per canine for the leishmaniosis dataset

| Disease state transitions | Number of subjects (%) | Unique disease states | Number of subjects (%) |
|---|---|---|---|
| 0 | 8 (16.7%) | | |
| 1 | 15 (31.2%) | 1 | 8 (16.7%) |
| 2 | 10 (20.8%) | 2 | 28 (58.3%) |
| 3 | 10 (20.8%) | 3 | 12 (25.0%) |
| 4 | 3 (6.3%) | | |
| 5 | 2 (4.2%) | | |
Transitions between qualitative disease states are of primary interest because they capture disease progression. Subjects transitioned different numbers of times and between different numbers of distinct disease states throughout the study.
The dataset contains various assays and biomarkers, including an enzyme-linked immunosorbent assay (ELISA) for antibodies recognizing soluble leishmania antigen (anti-SLA). The higher the value of anti-SLA, the higher the amount of antibodies and the more advanced the disease. For the purposes of this study, the disease state comprises the qualitative LeishVet score (Fig. S1) as well as the quantitative anti-SLA levels (visualized in Fig. 1).
Fig. 1.
(Left) LeishVet score across time by subject. LeishVet score was not recorded at the first follow-up for all subjects. (Right) Log anti-SLA across time by subject. The linear regression line (blue) indicates a small positive trend in log anti-SLA over time
There is sizeable missingness for both outcomes. In addition to the “missing” LeishVet scores at the first follow-up visit, there are four LeishVet and nine anti-SLA responses (assumed to be) missing at random; for reference there are 318 observed or partially observed multivariate responses. There are additional variables that could be included in a multivariable analysis such as baseline assessments, age group, location, and ectoparasiticide treatment status. Further details concerning the dataset can be found in Sect. 7.
3. Background and related work
3.1. Generalized linear mixed models
Generalized linear mixed models (GLMMs) with different distributional assumptions can be used to separately model serially correlated continuous and ordinal outcomes. To introduce GLMMs, we start with generalized linear models (GLMs), which have three main components: a random component, a systematic component, and a link function (Agresti 2012). Assume we have a random variable $Y_i$, which we believe follows a distribution from the exponential family (random component) and whose mean is $\mu_i = E(Y_i)$. We then use a link function, $g(\cdot)$, to relate the mean of that random variable to a linear combination of covariates (systematic component):
$$g(\mu_i) = \mathbf{x}_i^\top \boldsymbol{\beta}, \tag{1}$$
where $\mathbf{x}_i$ is a vector of covariates that commonly includes a constant term and $\boldsymbol{\beta}$ is an unknown vector of fixed effects. GLMs assume the responses are conditionally independent given the covariates, an assumption often violated in longitudinal studies. To address this, GLMMs allow for inclusion of subject specific effects, denoted here using $b_i$ and commonly referred to as random effects. We now assume that responses are conditionally independent given the covariates and random effects, while expressing the systematic component for $\mu_{il} = E(Y_{il} \mid b_i)$ as:
$$g(\mu_{il}) = \mathbf{x}_{il}^\top \boldsymbol{\beta} + b_i, \qquad l = 1, \dots, n_i, \quad i = 1, \dots, N, \tag{2}$$
where $l$ indexes the observation for patient $i$, $n_i$ is the number of observations for this subject, and $N$ is the number of subjects. The above is referred to as a random intercept model (Agresti 2012), and assuming there is a constant term in $\mathbf{x}_{il}$, we can assign a distribution to the random effects such as $b_i \sim N(0, \sigma_b^2)$. Importantly for later discussion, the random effects are assumed independent of the covariates. For longitudinal data, we may additionally assume that trajectories are patient specific across time, in which case we could fit a random intercept and slope model:
$$g(\mu_{il}) = \mathbf{x}_{il}^\top \boldsymbol{\beta} + \mathbf{z}_{il}^\top \mathbf{b}_i, \tag{3}$$
where $\mathbf{z}_{il} = (1, t_{il})^\top$, with $t_{il}$ the time of observation $l$ for patient $i$, and $\mathbf{b}_i = (b_{0i}, b_{1i})^\top$ is a 2 × 1 vector which we could assume follows $N_2(\mathbf{0}, \boldsymbol{\Sigma}_b)$. This serves as a middle ground between fitting a single GLM and fitting a separate GLM for each subject, as information is shared across subjects, but subjects each have their own trend line. Bayesian hierarchical models provide an analogous approach, where the distinction between fixed and random effects is blurred because all parameters are treated as random and the Bayesian interpretation of probability is employed. Nevertheless, we will use the terms “random effects” and “subject specific effects” interchangeably throughout.
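To make the random intercept and slope model concrete, the sketch below simulates from it with an identity link (a linear mixed model), Gaussian errors, and equally spaced times. These choices, the function name `simulate_random_slopes`, and all parameter values are illustrative assumptions, not part of the original analysis. Each subject's trajectory is the shared population line plus that subject's own intercept and slope deviation.

```python
import numpy as np

def simulate_random_slopes(n_subjects, n_obs, beta, sigma_b, sigma_e, seed=0):
    """Simulate y_il = beta0 + beta1 * t_il + b0i + b1i * t_il + e_il.

    Illustrative assumptions: identity link, Gaussian errors, times 0, 1, ..., n_obs - 1.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n_obs, dtype=float)
    # Subject specific intercepts and slopes: b_i ~ N_2(0, Sigma_b)
    b = rng.multivariate_normal(np.zeros(2), sigma_b, size=n_subjects)
    # Population-level mean trajectory shared by every subject
    mean = beta[0] + beta[1] * t
    # Row i adds subject i's own deviation b0i + b1i * t to the shared line
    subject_means = mean + b[:, [0]] + b[:, [1]] * t
    y = subject_means + rng.normal(0.0, sigma_e, size=(n_subjects, n_obs))
    return t, b, y
```

With a negligible error standard deviation, a per-subject least-squares slope recovers the population slope plus that subject's slope deviation, which is the "own trend line" behavior that the shared-plus-individual structure produces.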
Focusing on related Bayesian methods that allow for subject specific effects, the brms package (Bürkner 2018) for R (R Core Team 2021) can be used to implement both linear mixed models and cumulative link mixed models, which link the cumulative probabilities to the systematic component and can be used for ordinal outcomes. While brms cannot model the residual correlation between continuous and ordinal outcomes, it is based on the Stan programming language (Carpenter et al. 2017) and therefore can straightforwardly place a non-informative prior on the covariance matrix of the subject specific effects. This is a notable advantage compared to methods that rely on Gibbs samplers for hierarchical modeling, where a seemingly non-informative inverse-Wishart prior on this covariance matrix can have enormous influence (Schuurman et al. 2016), yet is often chosen for conjugacy.
A variety of groups have looked at incorporating both subject specific effects and autocorrelated errors. Chi and Reinsel (1989) developed frequentist linear models with subject specific effects and an AR1 error process within subjects, where they saw improved fits by inclusion of the error structure. They also noted this can sometimes shrink the patient specific effects, thereby reducing the effective number of parameters in the model. Varin and Czado (2009) developed a model for longitudinal ordinal data that included patient specific intercepts and an AR error structure. Models were fit using maximum pairwise likelihood, but notably, their method and software do not allow for subject specific slopes.
3.2. Multivariate mixed models and related methods
Multivariate methods with different distributional assumptions are often used to jointly model responses of different data types. Catalano (1997) developed a modeling approach for clustered ordinal and continuous outcomes using Generalized Estimating Equations (GEEs). Gueorguieva and Agresti (2001) used a Monte Carlo expectation-conditional maximization algorithm to fit joint models for clustered binary and continuous data. Focusing on Bayesian methods, Cowles et al. (1996) used the latent variable approach of Albert and Chib (1993) to propose a multivariate Bayesian tobit model with subject specific intercepts to model longitudinal ordinal compliance data. Noting issues of poor mixing and slow convergence, Cowles (1996) developed a more efficient sampling routine based on a Hastings-within-Gibbs step. One approach to improving sampling efficiency is to use a parameter expansion for data augmentation (PX-DA) routine (Liu and Wu 1999; Meng and van Dyk 1999), which Talhouk et al. (2012) and Li et al. (2020) utilized for multivariate binary data. However, implementing a PX-DA routine is complicated by the hierarchical nature of our model, and we instead adopt the Cowles (1996) solution in our proposed framework. Li et al. (2016) developed a joint model for longitudinal continuous, binary, and ordinal events. In extensive simulation studies, Li et al. (2016) saw improved fits for their multivariate approach over univariate strategies when there was nonzero correlation between the subject specific effects of the various outcomes. In other related work, Teimourian et al. (2015) adopted a Bayesian framework to model ordinal and skewed continuous responses, while Ghasemzadeh et al. (2020) used a Bayesian quantile regression approach to jointly model continuous and ordinal data.
Looking for available software for fitting Bayesian hierarchical multivariate models with ordinal and continuous outcomes, the only R package we identified was MCMCglmm (Hadfield 2010). The MCMCglmm package has two notable limitations with regard to our proposed method. First, it cannot fit models with an autoregressive structure in the residual covariance matrix. Second, when using hierarchical models, the prior on the subject specific effects covariance matrix must be an inverse-Wishart. Since it is difficult to specify a default inverse-Wishart that has limited impact on posterior estimates, MCMCglmm allows users to specify an improper prior with negative degrees of freedom, but the posterior is not guaranteed to be proper in these settings.
3.2.1. Scaled inverse-Wishart priors
When working with Bayesian hierarchical models, it is common to assign an inverse-Wishart prior to the $d \times d$ covariance matrix of the subject specific effects, $\boldsymbol{\Sigma}_b$, due to conjugacy. As described above, Schuurman et al. (2016) noted that this can have enormous influence on posterior estimates when variance terms in $\boldsymbol{\Sigma}_b$ are close to 0. To overcome this type of issue, Gelman and Hill (2007) recommended the scaled inverse-Wishart (SIW) prior developed in O’Malley and Zaslavsky (2008), which uses the decomposition $\boldsymbol{\Sigma}_b = \boldsymbol{\Delta}\mathbf{Q}\boldsymbol{\Delta}$, with $\boldsymbol{\Delta} = \mathrm{diag}(\boldsymbol{\xi})$. The elements of $\boldsymbol{\xi}$, a vector of scale parameters, and $\mathbf{Q}$, an unscaled covariance matrix, are not identifiable, but the elements of $\boldsymbol{\Sigma}_b$ are. For applications, Gelman and Hill (2007) assigned $\mathbf{Q} \sim \mathcal{IW}(d + 1, \mathbf{I})$ with a uniform prior on $\xi_k$, for $k = 1, \dots, d$. This implied a uniform distribution on the correlations and an extremely vague prior on the variances.
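The SIW decomposition can be explored by simulation. The sketch below assumes the unscaled matrix follows an inverse-Wishart with $d + 1$ degrees of freedom and identity scale, and that each scale parameter is uniform on $(0, \xi_{\max})$, where `xi_upper` is a hypothetical upper bound chosen for illustration. It draws covariance matrices and checks that the implied correlations look uniform on $(-1, 1)$, regardless of the scales.

```python
import numpy as np

def scaled_inv_wishart(d, xi_upper, n_draws, seed=0):
    """Draw Sigma_b = diag(xi) @ Q @ diag(xi) from a scaled inverse-Wishart prior.

    Illustrative assumptions: Q ~ IW(d + 1, I_d) and xi_k ~ Uniform(0, xi_upper).
    """
    rng = np.random.default_rng(seed)
    # Wishart(d + 1, I_d) draws as X^T X, where X has d + 1 rows of standard normals
    x = rng.standard_normal((n_draws, d + 1, d))
    w = np.einsum("nki,nkj->nij", x, x)
    # Inverting a Wishart(nu, I) draw gives an inverse-Wishart(nu, I) draw
    q = np.linalg.inv(w)
    # Scale parameters change the variances but leave the correlations untouched
    xi = rng.uniform(0.0, xi_upper, size=(n_draws, d))
    return xi[:, :, None] * q * xi[:, None, :]

def implied_correlation(sigma, i=0, j=1):
    """Correlation between subject specific effects i and j for each draw."""
    return sigma[:, i, j] / np.sqrt(sigma[:, i, i] * sigma[:, j, j])
```

Changing `xi_upper` rescales the variances without altering the histogram of `implied_correlation`, which is the separation of variance and correlation that motivates the decomposition.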
3.3. Autoregressive models (mean structure)
Autoregressive models can incorporate lagged outcome values into the mean structure of current observations. These methods often result in overly complex implementations due to the “initial conditions problem”, which occurs when working with short panel or longitudinal data. With this in mind, we next detail the current state of autoregressive models for categorical outcomes before considering incorporation of serial autocorrelation through the covariance structure.
Models for binary outcomes with lagged components in the mean structure have a rich history in the econometrics literature and have traditionally relied on observed discrete outcomes as dependent variables. In the binary and ordinal situations, this leads to the state dependence models described in Heckman (1981b):
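$$y^*_{il} = \mathbf{x}_{il}^\top\boldsymbol{\beta} + \sum_{k=2}^{K}\delta_k\,\mathbb{1}(y_{i,l-1} = k) + \alpha_i + \varepsilon_{il}, \tag{4}$$
$$y_{il} = k \iff \kappa_{k-1} < y^*_{il} \le \kappa_k, \qquad k = 1, \dots, K, \tag{5}$$
where $y_{il}$ is the observed $K$-level ordinal outcome for patient $i$ at their $l$th observation, $y^*_{il}$ is the corresponding latent outcome, where $y_{il} = k$ if and only if $\kappa_{k-1} < y^*_{il} \le \kappa_k$ (with $\kappa_0 = -\infty$ and $\kappa_K = \infty$), $\mathbf{x}_{il}$ is a vector of person and time specific covariates, $\alpha_i$ is a subject specific intercept that is assumed to be time invariant and independent of the covariates, and the errors $\varepsilon_{il}$ are assumed to be normally distributed with unit variance. The $\delta_k$ terms relate the discrete response from the prior observation to the current latent value. To model subjective wellbeing, Pudney (2008) developed an alternative formulation, termed a ‘Latent Auto-Regression Model’ (LAR), where dynamic feedback was captured through the latent outcomes: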
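$$y^*_{il} = \delta\, y^*_{i,l-1} + \mathbf{x}_{il}^\top\boldsymbol{\beta} + \alpha_i + \varepsilon_{il}, \tag{6}$$
$$y_{il} = k \iff \kappa_{k-1} < y^*_{il} \le \kappa_k. \tag{7}$$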
Pudney (2008) specifically advocated for use of the LAR formulation when the true data generating mechanism is continuous, but outcomes are discretized due to limitations of the measurement process. Pudney (2008) used a simulated maximum likelihood approach to estimate model parameters, while Hasegawa (2009) and Stegmueller (2013) adopted Bayesian frameworks, relying on the latent variable model proposed in Albert and Chib (1993). Using a Bayesian multivariate approach, Steele and Grundy (2021) applied similar principles to develop a method for analyzing bivariate ordinal outcomes.
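The LAR mechanism can be illustrated with a short simulation: the latent value carries over from one visit to the next with coefficient `delta`, and the observed ordinal level is read off by thresholding at the cutpoints. The single covariate, the naive treatment of the first observation (which ignores the initial conditions issue discussed next), the function name, and all parameter values are illustrative assumptions.

```python
import numpy as np

def simulate_lar(n_subjects, n_obs, beta, delta, cutpoints, sigma_alpha, seed=0):
    """Simulate a univariate latent auto-regression (LAR) for an ordinal outcome.

    y*_il = delta * y*_{i,l-1} + x_il * beta + alpha_i + eps_il,  eps_il ~ N(0, 1)
    y_il  = k  iff  kappa_{k-1} < y*_il <= kappa_k  (kappa_0 = -inf, kappa_K = +inf)
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_subjects, n_obs))            # one time-varying covariate
    alpha = rng.normal(0.0, sigma_alpha, size=n_subjects)   # subject specific intercepts
    y_star = np.empty((n_subjects, n_obs))
    # Naive start (illustrative assumption): ignores history before the first visit
    y_star[:, 0] = x[:, 0] * beta + alpha + rng.standard_normal(n_subjects)
    for l in range(1, n_obs):
        # Feedback runs through the previous latent value, not the observed category
        y_star[:, l] = (delta * y_star[:, l - 1] + x[:, l] * beta + alpha
                        + rng.standard_normal(n_subjects))
    # searchsorted(side="left") counts cutpoints strictly below y*, so adding 1 gives k
    y = np.searchsorted(np.asarray(cutpoints, dtype=float), y_star, side="left") + 1
    return y_star, y
```

Replacing `delta * y_star[:, l - 1]` with terms in the previously observed category would give the state dependence formulation instead, which is the key difference between the two model families.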
Implementation of these models is complicated by the ‘Initial Conditions Problem’, which is a well-studied issue for panel and longitudinal data when there are a small number of observations per subject (Matyas and Sevestre 2008). To summarize, a subject’s initial latent value, $y^*_{i1}$, is rarely independent of their unobserved heterogeneity ($\alpha_i$), mandating an expression for the distribution of the initial observation, $p(y_{i1} \mid \mathbf{x}_{i1}, \alpha_i)$. However, we do not assume the first observation represents the start of the process, and we typically cannot derive a closed form expression for this probability marginal of the subject’s prior history when using categorical (Matyas and Sevestre 2008) or multivariate data. Ignoring this problem can lead to severe bias in the parameter estimates (Cappellari and Jenkins 2008).
There are two common approaches to addressing the initial conditions problem: the ‘Heckman’ and ‘Wooldridge’ approaches. The former, presented in Heckman (1981a), relies on an approximation to $p(y_{i1} \mid \mathbf{x}_{i1}, \alpha_i)$. The latter, detailed in Wooldridge (2005), conditions on the initial observations to model the random effects. The Wooldridge approach is not readily applicable to the LAR setting, so the Heckman solution was adopted in Pudney (2008), Stegmueller (2013), and Steele and Grundy (2021). In its simplest form for a univariate outcome, Heckman (1981a) suggested modeling the initial outcomes using different regression coefficients and a scalar multiple of the subject specific intercepts:
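$$y^*_{i1} = \mathbf{x}_{i1}^\top\boldsymbol{\lambda} + \theta\,\alpha_i + \varepsilon_{i1}. \tag{8}$$
Here $\boldsymbol{\lambda}$ is a vector, $\theta$ and $\alpha_i$ are scalars, and the errors $\varepsilon_{i1}$ are once again restricted to have unit variance. Expansions of the initial covariates, summary measures of the non-initial covariates, and different covariates could also be included as predictors.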
To extend LAR models to the multivariate setting, Steele and Grundy (2021) detailed a flexible framework that allowed for response vectors of any length and, when working with categorical outcomes, incorporated process dynamics through a latent variable approach. Let $\mathbf{y}^*_{il}$ be a vector of either observed or unobserved continuous responses for $Q$ outcomes. In the case of unobserved responses, these correspond to associated binary or ordinal response vectors ($\mathbf{y}_{il}$), where
$$y_{ilj} = k \iff \kappa_{j,k-1} < y^*_{ilj} \le \kappa_{jk}, \qquad j = 1, \dots, Q. \tag{9}$$
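The dynamics of the process can then be incorporated through a transition matrix:
$$\mathbf{y}^*_{il} = \boldsymbol{\Gamma}\,\mathbf{y}^*_{i,l-1} + \boldsymbol{\Pi}\,\mathbf{x}_{il} + \boldsymbol{\alpha}_i + \boldsymbol{\varepsilon}_{il}, \qquad \boldsymbol{\varepsilon}_{il} \sim N(\mathbf{0}, \boldsymbol{\Sigma}_{\varepsilon}), \tag{10}$$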
where $\boldsymbol{\Gamma}$ is a $Q \times Q$ matrix, $\boldsymbol{\Pi}$ is a $Q \times p$ matrix, and $\boldsymbol{\alpha}_i$ is a $Q \times 1$ vector corresponding to a different random intercept for each outcome. When dealing with a continuous response vector, $\boldsymbol{\Sigma}_{\varepsilon}$ was an unconstrained covariance matrix, while its diagonal terms were restricted to be one in the categorical case. Steele and Grundy (2021) stated that the initial values are modeled using the Heckman approximation (Heckman 1981a), but did not discuss how this extends to the multivariate case. We thus assume the initial responses were specified as a linear combination of baseline covariates and the random effect associated with that outcome:
$$y^*_{i1j} = \mathbf{x}_{i1}^\top\boldsymbol{\lambda}_j + \theta_j\,\alpha_{ij} + \varepsilon_{i1j}, \qquad j = 1, \dots, Q. \tag{11}$$
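In contrast, Alessie et al. (2004) extended the approximation to the multivariate setting by including a linear combination of the random effects in each regression equation:
$$\mathbf{y}^*_{i1} = \boldsymbol{\Lambda}\,\mathbf{x}_{i1} + \boldsymbol{\Theta}\,\boldsymbol{\alpha}_i + \boldsymbol{\varepsilon}_{i1}, \qquad \boldsymbol{\varepsilon}_{i1} \sim N(\mathbf{0}, \boldsymbol{\Sigma}_{\varepsilon 1}), \tag{12}$$
where $\boldsymbol{\Lambda}$ is a $Q \times p$ matrix, $\boldsymbol{\Theta}$ is a $Q \times Q$ matrix, $\boldsymbol{\varepsilon}_{i1}$ is a $Q \times 1$ vector, and $\boldsymbol{\Sigma}_{\varepsilon 1}$ is allowed to have different correlations than $\boldsymbol{\Sigma}_{\varepsilon}$.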
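A minimal sketch of these multivariate latent dynamics for a single subject follows. After the first visit, each latent vector equals a transition matrix times the previous latent vector plus covariate effects, random intercepts, and noise; the first vector instead uses separate coefficients `lam` and a matrix `theta` multiplying the random intercepts. A full `theta` corresponds to an Alessie et al. (2004) style initial condition, while a diagonal `theta` gives the outcome-specific version. The function name, argument shapes, and isotropic noise are assumptions for illustration.

```python
import numpy as np

def simulate_mv_lar(x, gamma, pi, lam, theta, alpha, noise_sd=1.0, seed=0):
    """Simulate latent Q-vectors for one subject under a multivariate LAR.

    x:     (n_obs, p) covariates       gamma: (Q, Q) transition matrix
    pi:    (Q, p) coefficients, l >= 2  lam:   (Q, p) coefficients, l = 1
    theta: (Q, Q) loadings of the random intercepts in the initial equation
    alpha: (Q,) subject specific intercepts
    Illustrative assumption: independent N(0, noise_sd^2) errors for every outcome.
    """
    rng = np.random.default_rng(seed)
    n_obs, q = x.shape[0], gamma.shape[0]
    y_star = np.empty((n_obs, q))
    # Initial vector: baseline covariates plus a linear combination of all random intercepts
    y_star[0] = lam @ x[0] + theta @ alpha + noise_sd * rng.standard_normal(q)
    for l in range(1, n_obs):
        # The transition matrix lets each latent outcome feed into every other outcome
        y_star[l] = gamma @ y_star[l - 1] + pi @ x[l] + alpha + noise_sd * rng.standard_normal(q)
    return y_star
```

Setting `noise_sd=0.0` makes the recursion deterministic, which is convenient for checking the algebra by hand.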
3.4. Multivariate covariance structure approaches
Because autoregressive approaches that model dependence through the mean structure require additional considerations due to the initial conditions problem, an alternative route to encoding serial dependence is through the covariance structure. For these methods, the primary difficulty is in constructing a valid “cross-covariance” function that adequately specifies the relationship across outcomes. This general approach is applicable to spatial and higher dimensional problems, as well as the special case of temporal covariance (which includes only one dimension over which covariance is defined). To embed our work in an appropriately general context, we next summarize some of the work highlighted by Genton and Kleiber (2015), which describes several such approaches commonly used in the spatial literature.
As defined in Genton and Kleiber (2015), a cross-covariance matrix function is separable if
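$$C_{ij}(\mathbf{s}_1, \mathbf{s}_2) = \rho(\mathbf{s}_1, \mathbf{s}_2)\,\sigma_{ij}, \tag{13}$$
where $C_{ij}(\mathbf{s}_1, \mathbf{s}_2)$ is the covariance between outcome $i$ at location $\mathbf{s}_1$ and outcome $j$ at location $\mathbf{s}_2$. Note that ‘location’ can refer to a point in time in our longitudinal context. $\rho(\cdot, \cdot)$ can be any valid correlation function, while $\boldsymbol{\Sigma} = (\sigma_{ij})$ is a covariance matrix that relates the outcomes and does not depend on spatial location. This is considered separable since neither $\rho$ nor $\boldsymbol{\Sigma}$ is a function of both location and outcome.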
These models, sometimes referred to as ‘intrinsic coregionalizations’ (Genton and Kleiber 2015), offer simplification because the joint covariance can be written as a Kronecker product. Wang and Fan (2010) used a frequentist version of this approach to formulate multivariate linear mixed models with autoregressive errors for application to longitudinal data, where the full covariance matrix for subject i was specified through a Kronecker product of a between-outcome covariance matrix and a within-subject autoregressive correlation matrix. By adding the AR component to the error structure, the authors saw improved predictive performance in both simulation studies and their application. Wang and Fan (2012) extended this to multivariate t linear mixed models using Bayesian methodology. Methods based on separable cross-covariance functions assume the same decay or range for all outcomes, so their scope of application is limited to cases where this assumption may be reasonable.
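The Kronecker simplification can be checked numerically. The sketch below assumes an AR1 correlation over equally spaced times and an arbitrary 2 × 2 outcome covariance, both chosen only for illustration. It builds the covariance of the column-stacked (times by outcomes) error matrix as `np.kron(sigma, corr)` and confirms that each entry is the product of a between-outcome covariance and a temporal correlation, as in the separable definition.

```python
import numpy as np

def ar1_corr(n, rho):
    """AR1 correlation for n equally spaced times: R[l, m] = rho ** |l - m|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def separable_cov(sigma, corr):
    """Covariance of vec(E) for an n x Q error matrix E, where vec stacks columns.

    Under a separable cross-covariance, cov(vec(E)) = Sigma kron R, so the entry for
    outcome i at time l and outcome j at time m is sigma[i, j] * corr[l, m].
    """
    return np.kron(sigma, corr)
```

Only $\boldsymbol{\Sigma}$ and the correlation parameter govern the full covariance, and the structure also lets inverses and determinants be computed from the two small factors, since $(\mathbf{A} \otimes \mathbf{B})^{-1} = \mathbf{A}^{-1} \otimes \mathbf{B}^{-1}$.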
Relaxing the rigidity of the separable methods, the most common approach to defining a valid cross-covariance function is through linear combinations of univariate covariance functions, termed the ‘linear model of coregionalization’ (Genton and Kleiber 2015). These models have been broadly applied, especially in Bayesian contexts; for examples, see Schmidt and Gelfand (2003), Jin et al. (2007), and MacNab (2016). Overviews are given in Banerjee et al. (2015) and Cressie and Wikle (2011).
For the linear model of coregionalization, the cross-covariance function is constructed through a linear combination of valid stationary covariance functions:
| (14) |
where represents some measure of spatial distance between the two observations and is a full rank matrix such that . We can let , however, typically and is a full rank matrix. can be uniquely defined either through a Cholesky decomposition of (Schmidt and Gelfand 2003), or through a decomposition based on Givens angles and an ordering restriction of the associated eigenvalues (Kang and Cressie 2011). While the linear model of coregionalization offers flexibility, it comes at a computational cost since conjugacy with is lost and algorithms must rely on alternative sampling techniques.
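For contrast with the separable case, a linear model of coregionalization can be assembled as a sum of Kronecker products, one per column of a full rank matrix `a`, each paired with its own univariate correlation function. This sketch assumes exponential correlation functions with outcome-specific `ranges` purely for illustration; when all ranges coincide, it collapses to the separable (intrinsic) form.

```python
import numpy as np

def lmc_cov(a, ranges, dist):
    """Linear model of coregionalization covariance for Q outcomes at n locations.

    a:      (Q, Q) full rank matrix; a @ a.T is the cross-covariance at distance 0
    ranges: length-Q decay parameters, one exponential correlation per column of a
    dist:   (n, n) matrix of distances between locations (or times)
    Returns the (Q*n x Q*n) covariance, ordered outcome by outcome.
    """
    q, n = a.shape[0], dist.shape[0]
    cov = np.zeros((q * n, q * n))
    for r in range(q):
        rho_r = np.exp(-dist / ranges[r])     # illustrative univariate correlation function
        t_r = np.outer(a[:, r], a[:, r])      # rank-one coregionalization matrix a_r a_r^T
        cov += np.kron(t_r, rho_r)
    return cov
```

Because the result equals $(\mathbf{A} \otimes \mathbf{I})\,\mathrm{blockdiag}(\mathbf{R}_1, \dots, \mathbf{R}_Q)\,(\mathbf{A} \otimes \mathbf{I})^\top$, it stays positive definite whenever `a` is full rank and each correlation matrix is positive definite, but it no longer factors into a single Kronecker product, which is where the computational cost arises.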
3.5. Statement of contribution
Due to the improvement in predictive performance witnessed in Wang and Fan (2010), the computational simplifications offered by separable covariance functions, and the problems inherent in the mean structure focused approaches, we base our method on the model of intrinsic coregionalization. Explicitly, our primary contributions in this space are:
We develop a method based on the intrinsic model of coregionalization for longitudinal multivariate responses of mixed data type (ordinal and continuous). This extends the work of Wang and Fan (2010) and Wang and Fan (2012) to a nonlinear and mixed outcome type setting. It should be noted that we only consider the AR1 case, which is a special case of the more general formulation used in the referenced manuscripts.
Our methodology uses a scaled inverse-Wishart prior (O’Malley and Zaslavsky 2008) on the covariance matrix for the patient specific effects. This employs a less informative and more flexible prior than those used in Li et al. (2016) and MCMCglmm, but still guarantees a proper posterior.
We allow for residual correlation between the ordinal and continuous outcomes, an option not available in brms or in the work of Li et al. (2016). Combined with the scaled inverse-Wishart prior, our implementation is a novel multivariate method, even without the autoregressive component.
4. Model definition and software implementation
We now develop a model based on the intrinsic model of coregionalization, which we selected in large part based on the increased predictive performance witnessed in Wang and Fan (2010). Since there is high computational demand associated with ordinal regression problems, we opt for a separable covariance structure approach over the more flexible linear model of coregionalization. An alternative strategy could use autoregressive models with lagged outcomes in the mean structure, but these require additional considerations due to the initial conditions problem and were dramatically outperformed by random slope and intercept models when applied to our motivating dataset.
4.1. Model definition
Let $\mathbf{y}_{il}$ be a vector of observed continuous responses at patient $i$’s $l$th observation, $l = 1, \dots, n_i$; the index $j$ is reserved for indexing the outcomes. Similarly, let $\mathbf{w}_{il}$ be a vector of observed categorical responses, while $\mathbf{z}_{il}$ is an associated vector of continuous latent variables. We combine all continuous variables into a single vector $\mathbf{s}_{il}$ of length $Q$, and stack these into patient specific matrices $\mathbf{S}_i$ with rows $\mathbf{s}_{il}^\top$. $\mathbf{X}_i$ is a corresponding matrix of subject and time specific covariates, which includes a vector of constant terms. Additionally, let $\mathbf{t}_i$ be an $n_i \times 1$ vector of times since baseline. If subject $i$ was observed at baseline and years 1, 3, and 4 post baseline, then $\mathbf{t}_i = (0, 1, 3, 4)^\top$. Finally, let $\mathbf{B}_i$ be a matrix of subject specific intercepts and slopes, while $\mathbf{Z}_i$ is a design matrix constructed for the corresponding patient specific effects. Utilizing the latent variable representation from Albert and Chib (1993) and the $\mathrm{vec}(\cdot)$ operator, which stacks the column vectors of a matrix, we define our model as:
| (15) |
| (16) |
| (17) |
which implies that . Additionally, we restrict the first cutpoint to be 0 and the second cutpoint to be 1 for each outcome, and is a matrix. The constraints naturally combine with the likelihood to enforce an ordering restriction on the cutpoint parameters (). Because fixing two cutpoints identifies both the location and the scale of each latent variable, these constraints allow the residual covariance matrix across outcomes to be unconstrained, without unit-variance restrictions; however, they preclude binary outcomes, which have only a single cutpoint.
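Why two fixed cutpoints suffice can be seen with a small numerical check. The sketch below, whose function names are hypothetical, discretizes latent values using cutpoints $(0, 1, \ldots)$ and shows that any ordinal model with arbitrary increasing cutpoints can be affinely rescaled so its first two cutpoints become 0 and 1 without changing a single category. That rescaling absorbs the latent mean and variance, so neither needs to be fixed separately.

```python
import numpy as np

def discretize(z, free_cutpoints=()):
    """Map latent values to levels 1..K using cutpoints (0, 1, *free_cutpoints).

    With a three-level outcome there are no free cutpoints; any extra cutpoints
    must exceed 1 and be strictly increasing.
    """
    cuts = np.concatenate(([0.0, 1.0], np.asarray(free_cutpoints, dtype=float)))
    if np.any(np.diff(cuts) <= 0):
        raise ValueError("cutpoints must be strictly increasing")
    # Count cutpoints strictly below z, then shift to 1-based levels
    return np.searchsorted(cuts, z, side="left") + 1

def standardize_cutpoints(cuts):
    """Affinely rescale increasing cutpoints so the first two become 0 and 1."""
    cuts = np.asarray(cuts, dtype=float)
    shift, scale = cuts[0], cuts[1] - cuts[0]
    return (cuts - shift) / scale, shift, scale
```

A binary outcome has only one cutpoint, so the scale cannot be pinned down this way, which matches the restriction noted above.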
$\mathbf{R}_i(\rho)$ is an AR1 correlation function with $(l, m)$th element $\rho^{|t_{il} - t_{im}|}$, $-1 < \rho < 1$, where $t_{il}$ is the time of subject $i$’s $l$th observation. This correlation function can be treated as a special case of the more general formulation used in Wang and Fan (2010). This function is appropriate for equally spaced observations or observations that are unequally spaced due to missingness (Jones 2011). In principle, a (negative) exponential correlation function should be used for a truly continuous time AR1 process (Jones 2011). Here we consider a function meant for equally spaced observations, as this fits our motivating application and allows for negative correlation. Future work could expand our method for use with a variety of correlation functions that accommodate continuous time.
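The AR1 correlation for observations left unequally spaced by missed visits can be sketched as follows, assuming times are recorded in units of the nominal visit spacing (a hypothetical convention; the function name `ar1_corr_times` is also illustrative). Because each entry depends only on the lag, the matrix for visits (0, 1, 3, 4) is exactly the corresponding submatrix of the full equally spaced AR1 matrix. A negative `rho` yields correlations that alternate in sign with the lag, which a (negative) exponential correlation function cannot produce.

```python
import numpy as np

def ar1_corr_times(times, rho):
    """AR1 correlation for integer-valued visit times: R[l, m] = rho ** |t_l - t_m|."""
    t = np.asarray(times)
    lags = np.abs(t[:, None] - t[None, :])
    return rho ** lags
```

Dropping rows and columns for missed visits therefore preserves positive definiteness automatically, since any principal submatrix of a positive definite matrix is positive definite.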
We refer to our proposed model as a Bayesian mixed response autoregressive model (bmrarm), due to the autoregressive error component. The bmrarm is later compared to simpler implementations without the autoregressive covariance structure, in both univariate and multivariate formats, where the multivariate versions are fit using our software. Specifically, these simpler models set the autoregressive parameter to $\rho = 0$ and have either subject specific intercepts and slopes, or just subject specific intercepts. We refer to these models as bmrslope and bmrint, respectively.
4.2. Priors
We place priors that allow us to jointly draw and , where the prior on is a matrix-normal distribution as described in Ding and Cook (2014):
| (18) |
| (19) |
and assign independent uniform priors to the latent cutpoint parameters
| (20) |
where is chosen a priori to be arbitrarily large. In the case of an ordinal outcome with three levels, all cutpoints are restricted and no priors are assigned. Next we place a uniform prior on the autoregressive parameter .
To complete the model specification, we must place a prior on the covariance matrix for the subject specific effects, . A common default prior is . However, when the variance terms in are close to zero and there are a small number of subjects, this prior can have enormous influence resulting in significant bias (Schuurman et al. 2016). An alternative strategy is to use the data dependent priors discussed in Schuurman et al. (2016), but we prefer fully Bayesian approaches that do not use the data in both prior and likelihood components. Instead, we opt for the scaled inverse-Wishart (SIW) prior (O’Malley and Zaslavsky 2008), which as previously discussed, uses the decomposition .
For applications with the SIW prior, Gelman and Hill (2007) assigned and for . However, in our sampler the variance terms of tend to get stuck near 0 for extended periods of time when using priors this vague. Assigning corrected this issue, still offers a more dispersed prior on the variances than and implies a uniform distribution on the correlations (Fig. 2). Future implementations could explore the parameter expansion approaches discussed in Gelman et al. (2008), which addressed the issue of variance parameters getting stuck near zero and could allow for a more flexible SIW prior.
Fig. 2.
(Left) Comparison of prior densities for using the proposed IW and SIW implementations while (the number of outcomes). (Right) Histogram of the implied prior on the correlations between the patient specific effects. The plot is based on , but all off diagonal terms have a similar result
To implement the SIW prior we actually assign and place a prior on , which implies the previously noted prior on (Eq. 15). Each iteration we update and scale it to obtain draws from . In general, we have found increased stability when parameterizing the algorithm so that and are conditionally independent, and implement the model accordingly.
Derivation of the full conditional distributions, details on accommodating missing data, and the implementation of the full MCMC sampler are described in the supplemental information.
5. Forecasts
Simulated forecasts can be generated in analytically equivalent but mechanically distinct ways. We start with the posterior predictive distribution , which is the predictive distribution for the unobserved values conditional on the observed values (Gelman et al. 2013).
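For reference, the standard identity behind both approaches (Gelman et al. 2013) is shown below. It is written in generic notation chosen here, which may differ from the symbols used elsewhere in this paper: θ denotes all unknown parameters, y the observed values, and ỹ the unobserved values.

```latex
\begin{aligned}
p(\tilde{y} \mid y) &= \int p(\tilde{y}, \theta \mid y)\, d\theta \\
                    &= \int p(\tilde{y} \mid \theta, y)\, p(\theta \mid y)\, d\theta .
\end{aligned}
```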
The first line of the above equation suggests that we can build a joint sampler for the unknown parameters and the new observations. This can be accomplished by including outcomes as “NA” in the dataframe supplied to the function call, similar to JAGS (Plummer 2003). To get predictions for an ordinal outcome, users can discretize the values returned in the “bmrarm” object using draws from the cutpoint parameters. The second line suggests that once the samplers have converged to the posterior, those samples can be used to simulate new observations.
Suppose we want to forecast a continuous response for subject i at with the second approach, where the model is based on a single ordinal and single continuous response. To simplify notation, let and there are draws from the posterior distribution . Each set of parameters () can be used to generate a prediction using , which is a univariate normal distribution that results from conditioning a multivariate normal on the (previously) observed continuous and latent outcomes. Those simulated predictions can then be used to obtain summaries such as sample moments and quantiles, as well as an estimate of the posterior predictive distribution:
The ordinal case is slightly more complicated. The posterior predictive distribution is:
This implies that each set of posterior draws () can be used to generate a latent outcome using , and then discretized using to obtain ordinal valued predictions. As in the continuous case, is a univariate normal distribution that results from conditioning a multivariate normal on the (previously) observed continuous and latent outcomes.
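The second (sample-then-simulate) route can be sketched for a single forecast time. The following choices are ours, for illustration only:

- A zero-mean AR(1)-type covariance over four time points stands in for the model's joint covariance of previously observed continuous and latent values.
- The posterior draws are simulated rather than taken from bmrarm.
- The ordinal outcome has four levels, with cutpoints 0 and 1 held fixed and one free cutpoint.

For brevity, the same conditional normal distributions serve as the continuous target in (a) and the latent ordinal target in (b). In the actual model these would be different components of the joint vector.

```python
import numpy as np
from math import erf, sqrt

def conditional_normal(mu, Sigma, y_obs, new_idx):
    """Mean and variance of component new_idx of N(mu, Sigma) given all other components."""
    obs_idx = [k for k in range(len(mu)) if k != new_idx]
    S_oo = Sigma[np.ix_(obs_idx, obs_idx)]
    S_no = Sigma[new_idx, obs_idx]
    w = np.linalg.solve(S_oo, S_no)                   # S_oo^{-1} S_on
    m = mu[new_idx] + w @ (y_obs - mu[obs_idx])
    v = Sigma[new_idx, new_idx] - S_no @ w
    return float(m), float(v)

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def discretize(latent, cutpoints):
    """Level k when cutpoints[k-1] < z <= cutpoints[k], with open ends at -inf and +inf."""
    return np.searchsorted(cutpoints, latent, side="left")

# Hypothetical posterior draws; the first three values are observed, the fourth is forecast.
rng = np.random.default_rng(0)
S = 4000
rho = rng.uniform(0.3, 0.5, S)
sigma2 = rng.gamma(50.0, 1.0 / 50.0, S)
gamma3 = rng.normal(2.4, 0.1, S)                      # free cutpoint; 0 and 1 assumed fixed
y_prev = np.array([0.2, -0.1, 0.8])
t = np.arange(4)
means, variances = np.empty(S), np.empty(S)
for s in range(S):
    Sigma = sigma2[s] * rho[s] ** np.abs(t[:, None] - t[None, :])
    means[s], variances[s] = conditional_normal(np.zeros(4), Sigma, y_prev, 3)
draws = rng.normal(means, np.sqrt(variances))         # one predictive draw per posterior draw

# (a) Continuous target: sample summaries, plus a density estimate averaging per-draw normals
summary = draws.mean(), np.quantile(draws, [0.025, 0.975])

def predictive_density(x):
    return float(np.mean(np.exp(-0.5 * (x - means) ** 2 / variances)
                         / np.sqrt(2.0 * np.pi * variances)))

# (b) Latent ordinal target: discretize each draw with that draw's cutpoints ...
levels = np.array([discretize(draws[s], [0.0, 1.0, gamma3[s]]) for s in range(S)])
probs_sim = np.bincount(levels, minlength=4) / S
# ... or average the exact interval probabilities across posterior draws
probs_exact = np.zeros(4)
for s in range(S):
    edges = [-np.inf, 0.0, 1.0, gamma3[s], np.inf]
    sd = sqrt(variances[s])
    probs_exact += [norm_cdf((edges[k + 1] - means[s]) / sd) - norm_cdf((edges[k] - means[s]) / sd)
                    for k in range(4)]
probs_exact /= S
print(summary, probs_sim, probs_exact)
```

The exact-probability average in (b) removes the extra simulation noise of discretizing one latent draw per posterior draw. The two estimates should agree up to Monte Carlo error.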
6. Simulation studies
We now present several simulation studies where data generation was based on our motivating application. The first study assessed parameter recovery and convergence, the second evaluated forecast accuracy, and the third analyzed methods for model selection. bmrarm and bmrslope with a single ordinal outcome are defined using:
where for bmrarm and for bmrslope. For the design matrices, we used , where is a vector of times since baseline for subject i.
In each of the following simulation studies we mirrored the leishmaniosis data and generated samples from 48 subjects. For each subject, we generated two continuous outcomes and discretized one of them into an ordinal variable with five levels. Subjects had 4–7 (vector valued) responses with probabilities 0.042, 0.042, 0.083, and 0.833, respectively. Additionally, the ordinal outcomes at the first follow-up were discarded for all subjects, in alignment with the actual data. We randomly set nine of the continuous outcomes and four additional ordinal outcomes to missing. True values for each of the parameters can be found in Table 3. We note that the subject specific effects were randomly generated for each dataset based on , so we only report 95% CI coverage. The bmrarm default hyperparameters were used in both simulation studies: , , , and , , .
Table 3.
Parameter recovery based on 400 simulated datasets
| Param | Truth | bmrarm (true model) Mean | bmrarm SD | bmrarm Cover | bmrslope Mean | bmrslope SD | bmrslope Cover |
|---|---|---|---|---|---|---|---|
| | 0.50 | 0.494 | 0.112 | 0.958 | 0.491 | 0.112 | 0.948 |
| | 0.18 | 0.193 | 0.049 | 0.945 | 0.196 | 0.049 | 0.940 |
| | 0.05 | 0.052 | 0.078 | 0.945 | 0.052 | 0.079 | 0.942 |
| | 0.10 | 0.100 | 0.033 | 0.955 | 0.100 | 0.033 | 0.948 |
| | 0.30 | 0.384 | 0.096 | 0.890 | 0.261 | 0.056 | 0.885 |
| | 0.02 | 0.033 | 0.024 | 0.970 | 0.019 | 0.015 | 0.958 |
| | 0.13 | 0.155 | 0.031 | 0.908 | 0.101 | 0.010 | 0.230 |
| | 0.22 | 0.238 | 0.140 | 0.960 | 0.397 | 0.151 | 0.720 |
| | 0.18 | 0.174 | 0.071 | 0.948 | 0.255 | 0.067 | 0.750 |
| | 0.04 | 0.044 | 0.012 | 0.955 | 0.049 | 0.012 | 0.892 |
| | 0.05 | 0.069 | 0.029 | 0.928 | 0.082 | 0.030 | 0.788 |
| | 1.50 | 1.548 | 0.093 | 0.922 | 1.555 | 0.093 | 0.920 |
| | 1.90 | 1.989 | 0.142 | 0.915 | 2.003 | 0.141 | 0.890 |
| Covariances | | | | 0.973 | | | 0.957 |
| | 0.35 | 0.408 | 0.099 | 0.922 | | | |
| Overall | | | | 0.944 | | | 0.902 |
| Random Effects | | | | 0.943 | | | 0.905 |
| Overall w/o RE | | | | 0.948 | | | 0.871 |
Truth = true value of the parameter, Mean = mean of the posterior means, SD = mean of the posterior standard deviations, Cover = the empirical 95% CI coverage, w/o RE = without random effects
6.1. Simulation: convergence and recovery
For the first set of studies we generated data using bmrarm () where and . We then assessed convergence and parameter recovery of both bmrarm and bmrslope.
6.1.1. Convergence assessment
For a randomly selected dataset, we simulated four separate chains with different starting values for the cutpoint parameters. The latent values were set to −0.5 when and when , otherwise they were set to the midpoint of the cutpoints they were bound between. The rest of the parameters were initialized at either 0 or 1 by default, depending on whether they were positively constrained. For each chain we obtained 25,000 draws, of which the first 5,000 were discarded as burn-in. We thinned the samples by 5, primarily to reduce the size of the stored objects. Five of the difficult-to-sample parameters were assessed visually (Figs. 3 and 4). The primary issue we saw was slow mixing of and , which we accounted for by taking a large number of samples; all effective sample sizes were above 200.
Fig. 3.
Trace and density plots for the cutpoint parameters corresponding to the latent outcome. Data comes from the randomly selected replicate dataset
Fig. 4.
Trace and density plots for the residual variance and subject specific intercept variance corresponding to the latent outcome
We also report posterior summaries for the five parameters for each chain (Table 2), along with the Gelman-Rubin diagnostic (Gelman and Rubin 1992), calculated using the coda package (Plummer et al. 2006); both suggest adequate convergence. Although not reported in the table, the Gelman-Rubin diagnostic was applied to all parameters, including the patient specific effects and the latent outcomes, and the maximum upper CI was 1.014.
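A minimal sketch of the potential scale reduction factor for one scalar parameter follows. It computes only the uncorrected point estimate. coda's `gelman.diag` additionally applies a correction for sampling variability and reports the upper confidence limit shown in Table 2, so its values will differ slightly. The chains are simulated stand-ins, not bmrarm output: 4 chains of 4,000 retained draws each, matching 25,000 draws with 5,000 burn-in thinned by 5.

```python
import numpy as np

def psrf(chains):
    """Uncorrected potential scale reduction factor (Gelman and Rubin 1992).

    chains: array of shape (m, n) holding m chains of n retained draws
    for one scalar parameter.
    """
    chains = np.asarray(chains, dtype=float)
    _, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()        # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)      # between-chain variance
    var_hat = (n - 1) / n * W + B / n            # pooled estimate of the posterior variance
    return float(np.sqrt(var_hat / W))

rng = np.random.default_rng(42)
mixed = rng.normal(1.43, 0.077, size=(4, 4000))          # chains exploring the same region
stuck = mixed + np.array([[0.0], [0.2], [0.4], [0.6]])   # chains centered in different places
print(psrf(mixed), psrf(stuck))  # about 1 for mixed; far above the 1.1 threshold for stuck
```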
Table 2.
Posterior estimates and convergence diagnostics for the cutpoint parameters and the variance term corresponding to the ordinal outcome
| Param | Chain | Mean | SD | Lower 95 | Upper 95 | Eff | Gelman-Rubin |
|---|---|---|---|---|---|---|---|
| | 1 | 1.42 | 0.077 | 1.29 | 1.59 | 510.21 | 1.006 |
| | 2 | 1.43 | 0.074 | 1.30 | 1.59 | 531.06 | |
| | 3 | 1.43 | 0.077 | 1.29 | 1.59 | 362.99 | |
| | 4 | 1.43 | 0.077 | 1.29 | 1.60 | 527.56 | |
| | 1 | 1.72 | 0.111 | 1.53 | 1.96 | 405.19 | 1.004 |
| | 2 | 1.73 | 0.107 | 1.54 | 1.95 | 399.55 | |
| | 3 | 1.72 | 0.108 | 1.53 | 1.94 | 305.34 | |
| | 4 | 1.73 | 0.109 | 1.54 | 1.96 | 355.01 | |
| | 1 | 0.29 | 0.065 | 0.18 | 0.43 | 401.08 | 1.004 |
| | 2 | 0.29 | 0.061 | 0.19 | 0.43 | 452.52 | |
| | 3 | 0.29 | 0.064 | 0.18 | 0.43 | 444.41 | |
| | 4 | 0.29 | 0.063 | 0.19 | 0.43 | 453.50 | |
| | 1 | 0.10 | 0.074 | 0.01 | 0.29 | 715.69 | 1.002 |
| | 2 | 0.10 | 0.077 | 0.01 | 0.29 | 648.98 | |
| | 3 | 0.10 | 0.075 | 0.01 | 0.29 | 588.83 | |
| | 4 | 0.10 | 0.078 | 0.01 | 0.30 | 494.28 | |
| | 1 | 0.06 | 0.024 | 0.02 | 0.12 | 233.61 | 1.008 |
| | 2 | 0.06 | 0.023 | 0.03 | 0.12 | 221.16 | |
| | 3 | 0.06 | 0.024 | 0.02 | 0.12 | 229.42 | |
| | 4 | 0.06 | 0.026 | 0.03 | 0.13 | 228.67 | |
Mean = posterior mean, SD = posterior standard deviation, Lower 95 = lower bound of the 95% credible interval, Upper 95 = upper bound of the 95% credible interval, Eff = effective sample size, Gelman-Rubin = upper CI from the Gelman-Rubin diagnostic
6.1.2. Parameter recovery
Since our visual and metric based convergence assessment revealed no notable issues, we evaluated bmrarm posterior estimates from the 400 simulated datasets. Our primary interest was in the mean of the posterior means and the coverage of the 95% credible intervals. For each dataset, we took 25,000 draws, of which 5,000 were discarded as burn-in and thinned to keep one of every five iterations, primarily for space reduction. The median run time for these simulations was 24.8 min; all simulations and the application were run on the University of Iowa High Performance Computing system.
Using the bmrarm package we fit bmrslope to the simulated datasets drawing 10,000 samples, discarding 2,000 as burn-in and thinning by 2. On average, these models took 3.4 min. There is room for improvement in the speed of simpler models as the code was written for the autoregressive case and could cater more directly to implementations without an autoregressive term.
When data were generated from bmrarm and fit with bmrarm, the overall 95% credible interval (CI) coverage for all unknown parameters (excluding the latent outcomes) was 0.944 (Table 3). The estimates of the latent cutpoints ( and ), the autoregressive term (), and the variance term corresponding to the latent outcome () were biased upward, but still had reasonable coverage (≥ 0.890). bmrslope, which omits the autoregressive term, compensated with inflated estimates of the variance terms of , which was paired with poor coverage and downward bias of (Table 3).
6.2. Simulation: forecast quality
Our primary analysis goal was to produce accurate short-term forecasts of disease progression. Thus, it was of interest to assess out-of-sample predictive performance. To compare the forecasting ability of bmrarm to bmrint and bmrslope, we used the 400 datasets that were generated using bmrarm (). As a reminder, bmrint is akin to bmrslope, except (i.e. slopes are not subject specific). For each dataset and subject, we drew 4 additional observations from the appropriate multivariate normal distribution that conditioned on their previous responses. Thus, all observations for a subject can be seen as a draw from a (joint) multivariate distribution, where the final 4 observations were held out when fitting the models and were used for prediction evaluation.
For the ordinal outcome, forecasts were assessed with the (discrete) ranked probability score (RPS; Epstein 1969). To introduce RPS, let $\mathbf{p} = (p_1, p_2, p_3)$ be a vector of forecast probabilities corresponding to a three level ordinal outcome, such as low, medium, and high. Additionally, let $\mathbf{o} = (1, 0, 0)$ be a vector which indicates the true outcome value, which is ‘low’ in this example. We now define the RPS for a single forecast:
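$$\mathrm{RPS}(\mathbf{p}, \mathbf{o}) = \frac{1}{K-1}\sum_{k=1}^{K}\left(\sum_{j=1}^{k} p_j - \sum_{j=1}^{k} o_j\right)^2 \qquad (21)$$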
where K, which is three in our example, is the number of levels for the ordinal outcome. RPS is a strictly proper scoring rule, meaning its expected value is uniquely optimized by using the true probability distribution for the outcome (Wilks 2005). RPS is bounded between 0 and 1, with 0 being a perfect score, and takes the ordering into account by more heavily penalizing forecasts as probability is shifted to categories further from the true outcome. For example, with a true outcome of ‘low’ and forecast probabilities (0.2, 0.7, 0.1), the RPS is 0.325. With forecast probabilities (0.2, 0.1, 0.7), the RPS is 0.565. Despite both forecasts assigning the same probability to the correct outcome, the latter was penalized for assigning a large probability (0.7) to the ‘high’ category.
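A minimal sketch of the score follows. It assumes outcomes coded 0, 1, …, K − 1 and the normalization by K − 1 that keeps the score in [0, 1]. The forecast vectors (0.2, 0.7, 0.1) and (0.2, 0.1, 0.7) are our reconstruction of the worked example. They are the only three-category forecasts that give ‘low’ the same probability in both cases, put 0.7 on ‘high’ in the second, and reproduce the reported scores of 0.325 and 0.565. The averaging helper mirrors the description of the verification package's `rps` function but is not that package's code.

```python
import numpy as np

def rps(probs, outcome):
    """Discrete ranked probability score for a single forecast.

    probs: forecast probabilities for K ordered categories (summing to 1).
    outcome: 0-based index of the observed category.
    Squared differences between the cumulative forecast and the cumulative
    observation are summed and divided by K - 1, so the score lies in [0, 1].
    """
    probs = np.asarray(probs, dtype=float)
    K = probs.size
    observed = np.zeros(K)
    observed[outcome] = 1.0
    return float(np.sum((np.cumsum(probs) - np.cumsum(observed)) ** 2) / (K - 1))

def mean_rps(prob_rows, outcomes):
    """Average RPS over subjects."""
    return float(np.mean([rps(p, o) for p, o in zip(prob_rows, outcomes)]))

# True outcome 'low' (index 0) in both cases
print(rps([0.2, 0.7, 0.1], 0))  # approximately 0.325
print(rps([0.2, 0.1, 0.7], 0))  # approximately 0.565
```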
As seen in Table 4, bmrarm offered the lowest RPS at all four time points, indicating it had the highest predictive power for the ordinal outcome. RPS for bmrarm was closely followed by that of bmrslope, while bmrint performed substantially worse. For this work, we calculated RPS using the rps function from the verification package (NCAR 2015), which took the mean of the individual RPS scores over all subjects.
Table 4.
Evaluation of 1–4 step ahead forecasts for bmrarm and two baseline models
| Outcome | Metric | Steps Ahead | bmrint | bmrslope | bmrarm |
|---|---|---|---|---|---|
| Ordinal | RPS | 1 | 0.122 | 0.084 | 0.082 |
| Ordinal | RPS | 2 | 0.129 | 0.089 | 0.088 |
| Ordinal | RPS | 3 | 0.137 | 0.090 | 0.088 |
| Ordinal | RPS | 4 | 0.146 | 0.088 | 0.087 |
| Continuous | RMSE | 1 | 0.896 | 0.442 | 0.427 |
| Continuous | RMSE | 2 | 1.079 | 0.533 | 0.525 |
| Continuous | RMSE | 3 | 1.267 | 0.604 | 0.594 |
| Continuous | RMSE | 4 | 1.462 | 0.669 | 0.658 |
| Continuous | Coverage | 1 | 0.792 | 0.932 | 0.953 |
| Continuous | Coverage | 2 | 0.706 | 0.901 | 0.956 |
| Continuous | Coverage | 3 | 0.626 | 0.889 | 0.958 |
| Continuous | Coverage | 4 | 0.566 | 0.883 | 0.963 |
RMSE = root mean square error, coverage = 95% prediction interval coverage, RPS = ranked probability score
For the continuous outcome, performance was quantified through the root mean squared error (RMSE) of prediction, using the mean of draws from our posterior predictive distribution as a point estimate. Additionally, we evaluated the proportion of 95% prediction intervals that contained the true outcome. As seen in Table 4, bmrarm had the lowest RMSE at all four time points, and its prediction interval coverage stayed close to the nominal 95% (0.953 to 0.963), whereas coverage for bmrslope and bmrint declined as the forecast horizon increased.
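Both continuous-outcome metrics can be computed from posterior predictive draws as in the sketch below. It assumes equal-tailed (2.5%, 97.5%) quantile intervals; the text does not state which interval type was used, so this is an assumption. The toy data come from a correctly specified normal predictive distribution, so RMSE should be close to the predictive SD and coverage close to 0.95.

```python
import numpy as np

def forecast_metrics(pred_draws, truth, level=0.95):
    """RMSE of posterior predictive means and empirical interval coverage.

    pred_draws: array (S, N) of posterior predictive draws for N held-out values.
    truth: array (N,) of the held-out observations.
    """
    pred_draws = np.asarray(pred_draws, dtype=float)
    truth = np.asarray(truth, dtype=float)
    point = pred_draws.mean(axis=0)                       # mean of draws as point estimate
    rmse = float(np.sqrt(np.mean((point - truth) ** 2)))
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(pred_draws, [tail, 1.0 - tail], axis=0)  # equal-tailed interval
    coverage = float(np.mean((truth >= lower) & (truth <= upper)))
    return rmse, coverage

rng = np.random.default_rng(2)
N, S, sd = 2000, 1000, 0.5
centers = rng.normal(0.0, 1.0, N)                 # hypothetical predictive means
truth = rng.normal(centers, sd)                   # hypothetical held-out values
pred_draws = rng.normal(centers, sd, size=(S, N)) # hypothetical predictive draws
rmse, coverage = forecast_metrics(pred_draws, truth)
print(rmse, coverage)  # roughly 0.5 and 0.95
```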
Using the same randomly selected dataset as Sect. 6.1.1, we visualized 1–4 step ahead point estimates (mean) for all three models for nine canine patients (Fig. 5). The model without subject specific slopes (bmrint) did a poor job of matching patient trajectories. On the other hand, while bmrarm forecasts are pulled towards a patient’s most recent observation, there is sizable overlap between the mean predictions for bmrslope and bmrarm. Of note, forecasts were generated using draws from multivariate normal posterior predictive distributions. Due to the probabilistic nature of the forecasts and the relatively small number of posterior samples (4000), the mean prediction lines do not appear perfectly linear (Fig. 5).
Fig. 5.
1–4 step ahead mean based point estimates for 9 sample canine patients, where lines are colored according to the model. Black circles indicate observations that were used to fit the model, while the black triangles were used to evaluate predictions
6.3. Simulation: model selection
With a single dataset, a standard way to compare models and assess their out-of-sample predictive accuracy is through cross-validation or information criteria, which can be seen as approximations to different types of cross-validation (Gelman et al. 2014). Cross-validation is a more direct route of assessment, but requires refitting models. Given the potentially infeasible computational burden of refitting our model, we opted for the deviance information criterion (DIC), defined in Spiegelhalter et al. (2002).
When using hierarchical models, there is ambiguity in defining the likelihood, and several forms of DIC have been developed. The selection criterion has a different “focus” depending on the form of the likelihood used. As an example, Spiegelhalter et al. (2002) defined the hierarchical model and considered two possible forms of the likelihood. The likelihood could be viewed as with prior or as with prior . In the context of mixed effects models, we can treat as the subject specific effects and will refer to DIC based on as conditional DIC (cDIC) and DIC based on as marginal DIC (mDIC).
Chan and Grant (2016) explored cDIC and mDIC (which they refer to as the observed-data DIC) in the context of latent variable models, with specific interest in stochastic volatility models. Through simulation studies, the authors showed that cDIC favors overfitted models, while mDIC performs well. cDIC is straightforward to calculate and is readily available in commonly used Bayesian software, while mDIC is often difficult to calculate, which limits its adoption. Chan and Grant (2016) developed an importance sampling algorithm to calculate mDIC.
6.3.1. Likelihood and DIC
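To simplify notation for this section, let $\mathbf{Y}$ be a matrix of responses and let $\theta$ denote all unknown parameters. To define DIC, let $D(\theta) = -2\log p(\mathbf{Y} \mid \theta)$ be the deviance and $\bar{\theta}$ be the posterior mean of the parameters. Then the effective number of parameters is $p_D = \bar{D} - D(\bar{\theta})$ and $\mathrm{DIC} = \bar{D} + p_D$, where $\bar{D} = E[D(\theta) \mid \mathbf{Y}]$ is estimated with $S^{-1}\sum_{s=1}^{S} D(\theta^{(s)})$ over $S$ posterior draws. DIC is a tradeoff between goodness of fit and complexity, with lower values indicating a better fit. $p(\mathbf{Y} \mid \theta)$ is the joint likelihood over all subjects and observations, which can be factorized into $\prod_i p(\mathbf{y}_i \mid \theta)$, where $\mathbf{y}_i$ contains the responses of subject $i$. When the autoregressive term is nonzero, the responses within a subject are not conditionally independent given the subject specific effects, and the likelihood cannot be further factorized.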
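The DIC definitions translate directly into code once per-draw log-likelihoods are available. Whether those come from the conditional or the marginalized likelihood is what distinguishes cDIC from mDIC; the function below does not depend on that choice. The sketch checks itself on a toy model of our own (normal data with known unit variance and a flat prior on the mean), for which p_D should be close to 1. The toy model is not part of bmrarm.

```python
import numpy as np

def dic(loglik_draws, loglik_at_posterior_mean):
    """DIC and p_D (Spiegelhalter et al. 2002) from posterior log-likelihoods.

    loglik_draws: log p(Y | theta^(s)) for each of the S posterior draws.
    loglik_at_posterior_mean: log p(Y | theta_bar).
    """
    d_bar = float(np.mean(-2.0 * np.asarray(loglik_draws)))  # posterior mean deviance
    d_hat = -2.0 * loglik_at_posterior_mean                  # deviance at theta_bar
    p_d = d_bar - d_hat                                      # effective number of parameters
    return d_bar + p_d, p_d

rng = np.random.default_rng(3)
y = rng.normal(2.0, 1.0, size=50)
# Exact posterior of the mean under a flat prior with known unit variance
mu_draws = rng.normal(y.mean(), 1.0 / np.sqrt(y.size), size=4000)

def loglik(mu):
    return float(np.sum(-0.5 * np.log(2.0 * np.pi) - 0.5 * (y - mu) ** 2))

dic_value, p_d = dic([loglik(m) for m in mu_draws], loglik(mu_draws.mean()))
print(dic_value, p_d)  # p_d should be close to 1 (one free parameter)
```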
6.3.2. Likelihoods
We now formally define the likelihoods and DIC calculations used in our model selection. Focusing on the continuous outcomes (observed and unobserved), our model is
| (22) |
The model can be formulated in (at least) two ways. The first is as a mixed model where we condition on the subject specific effects
| (23) |
and we refer to as the ‘continuous outcome conditional likelihood’, which is the likelihood contribution from the ith subject. Alternatively, we can view this as a linear model with additional structure added to the residual covariance
| (24) |
and we refer to as the ‘continuous outcome marginalized likelihood’. This relies on a general result that occurs when a normal likelihood is combined with a normal prior on the subject specific effects; it can be confirmed by utilizing the Woodbury matrix identity (Woodbury 1950) to solve
Finally, we need to define the non-augmented likelihood contributions that are used to calculate cDIC and mDIC. Let denote posterior draws from all parameters of interest; then we can write:
When calculating cDIC we use the ‘continuous outcome conditional likelihood’ for , while the ‘continuous outcome marginalized likelihood’ is used when calculating mDIC. Both forms of the likelihood (contributions) are multivariate normal and can be written as a conditional multivariate normal times a marginal multivariate normal (line 3). To find , we once again rely on numeric integration using the function from the OpenMX package. When is missing, is integrated over the real line and does not impact the value of the likelihood. When is missing, it is viewed as a parameter and does not contribute to the likelihood.
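One subject's non-augmented contribution can be sketched under simplifying assumptions of our own:

- A single latent (ordinal) value is observed alongside three continuous values.
- The fixed effects, random-effect covariance, residual covariance, and cutpoints are hypothetical.
- There is no autoregressive term.

Integrating the subject specific effects out of y | b ~ N(mean + Zb, R) with b ~ N(0, G) gives the standard marginal covariance ZGZᵀ + R, and the code checks the Woodbury form of its inverse. With one latent value, the interval probability reduces to a difference of univariate normal CDFs. With several latent values per subject it becomes a multivariate normal rectangle probability, which is why numeric integration is needed in general. Setting the interval to the whole real line reproduces the missing-ordinal case.

```python
import numpy as np
from math import erf, log, pi, sqrt

def mvn_logpdf(x, mean, cov):
    """Multivariate normal log density via a Cholesky factor."""
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, x - mean)
    return float(-0.5 * (z @ z) - np.log(np.diag(L)).sum() - 0.5 * x.size * log(2 * pi))

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def contribution(y_cont, mean, V, interval):
    """log p(y_cont, latent in interval); component 0 of (mean, V) is the latent value.

    Conditional normal probability for the latent value times the marginal
    normal density of the continuous values.
    """
    m_c, V_cc, V_zc = mean[1:], V[1:, 1:], V[0, 1:]
    w = np.linalg.solve(V_cc, V_zc)
    m = mean[0] + w @ (y_cont - m_c)       # conditional mean of the latent value
    s = sqrt(V[0, 0] - V_zc @ w)           # conditional SD of the latent value
    lo, hi = interval
    prob = norm_cdf((hi - m) / s) - norm_cdf((lo - m) / s)
    return mvn_logpdf(y_cont, m_c, V_cc) + log(prob)

# Hypothetical subject: latent value at t=0, continuous values at t=0, 1, 2.
# Columns of Z: latent intercept, latent slope, continuous intercept, continuous slope.
Z = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0, 2.0]])
G = np.array([[0.30, 0.01, 0.10, 0.00],
              [0.01, 0.02, 0.00, 0.005],
              [0.10, 0.00, 0.20, 0.01],
              [0.00, 0.005, 0.01, 0.03]])
R = np.diag([1.0, 0.2, 0.2, 0.2])
mean = np.array([0.5, 0.1, 0.3, 0.5])      # stands in for X beta
V = Z @ G @ Z.T + R                        # covariance after integrating out the effects

# Woodbury: V^{-1} = R^{-1} - R^{-1} Z (G^{-1} + Z' R^{-1} Z)^{-1} Z' R^{-1}
R_inv = np.linalg.inv(R)
V_inv = R_inv - R_inv @ Z @ np.linalg.inv(np.linalg.inv(G) + Z.T @ R_inv @ Z) @ Z.T @ R_inv

y_cont = np.array([0.4, 0.2, 0.9])         # hypothetical observed continuous values
cuts = [-np.inf, 0.0, 1.0, 2.4, np.inf]    # hypothetical cutpoints
by_level = [contribution(y_cont, mean, V, (cuts[k], cuts[k + 1])) for k in range(4)]
missing_ordinal = contribution(y_cont, mean, V, (-np.inf, np.inf))
print(by_level, missing_ordinal)
```

Because the four intervals partition the real line, the exponentiated contributions over all levels sum to the missing-ordinal contribution.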
6.3.3. Simulation studies
To assess cDIC and mDIC, we simulated 400 datasets under bmrarm and bmrslope. We then fit bmrint, bmrslope, and bmrarm to each dataset. As a reminder, bmrint is a multivariate model with subject specific intercepts and no autoregressive term, while bmrslope is a similarly defined model with both subject specific intercepts and slopes. Both bmrint and bmrslope were fit using the bmrarm package. bmrint was not selected by either criterion for any dataset and was excluded from Table 5.
Table 5.
Simulation study comparing selection capabilities of cDIC and mDIC: proportion of times bmrarm was selected by each criterion
| Data generating model | cDIC | ΔcDIC ≤ −2 | mDIC | ΔmDIC ≤ −2 |
|---|---|---|---|---|
| bmrslope | 0.350 | 0.318 | 0.215 | 0.078 |
| bmrarm () | 0 | 0 | 0.98 | 0.948 |
Values are the proportion of datasets for which each criterion selected bmrarm. ‘ΔmDIC ≤ −2’ (‘ΔcDIC ≤ −2’) is the proportion of times the bmrarm model resulted in a decrease of mDIC (cDIC) of at least 2 compared to bmrslope
When we generated data from bmrslope, mDIC did a better job than cDIC at selecting the true model (0.785 vs 0.650; 0.785 = 1 − 0.215). If we require a decrease in DIC of at least 2 to select the more complicated bmrarm method, then mDIC selected the incorrect model only 7.8% of the time (31.8% for cDIC). When data were generated from bmrarm with , cDIC never selected the correct model, while mDIC was correct 98.0% of the time. Even when requiring a decrease of at least 2, mDIC selected the more complicated autoregressive model in 94.8% of simulations. Since mDIC did a superior job of distinguishing between the two models, similar to the results obtained in Chan and Grant (2016), we used mDIC for the leishmaniosis application and as the default information criterion in the package.
7. Application
Recall the leishmaniosis dataset presented in Sect. 2. We analyzed this data using bmrarm, bmrint, bmrslope, and univariate approaches available in the brms package. The primary outcome was LeishVet score (ordinal) and the secondary outcome was log anti-SLA (continuous). The dataset contained 318 observed or partially observed multivariate responses and a handful of potential covariates including age group at baseline, study location, ectoparasiticide treatment group, and a dual path platform (DPP) leishmania serological test (only performed at enrollment). Higher values of DPP indicated more advanced disease.
Summary statistics for both dependent variables and all covariates are reported in Table 6. DPP is extremely right skewed, and its log appeared to have a linear association with the secondary outcome, log SLA (Fig. 7). This makes sense, as both measure anti-leishmania antigen antibodies. Thus we used (and therefore report) log DPP for modeling purposes. We also combined age groups into 0–2 vs 3–11 due to the small sample sizes for the older age groups. Similarly, we only observed one case of LeishVet score = 4 and set this to LeishVet score = 3 prior to fitting the models.
Table 6.
Summary statistics for the 2 observed responses (bolded) and the four covariates
| Variable | Type | Level | Count (%) | Mean (SD) | Missing (%) |
|---|---|---|---|---|---|
| **LeishVet Score** | Ordinal | 0 | 30 (9.3%) | NA | 52 (16.1%) |
| | | 1 | 116 (36.0%) | NA | |
| | | 2 | 115 (35.7%) | NA | |
| | | 3 | 8 (2.5%) | NA | |
| | | 4 | 1 (0.3%) | NA | |
| Age | Ordinal | 0–2 | 21 (43.8%) | NA | 0 (0%) |
| | | 3–5 | 23 (47.9%) | NA | |
| | | 6–8 | 3 (6.2%) | NA | |
| | | 9–11 | 1 (2.1%) | NA | |
| Location | Nominal | B | 19 (39.6%) | NA | 0 (0%) |
| | | M | 17 (35.4%) | NA | |
| | | W | 12 (25.0%) | NA | |
| Treatment | Binary | Blue | 24 (50%) | NA | 0 (0%) |
| | | Yellow | 24 (50%) | NA | |
| **Log anti-SLA** | Continuous | | NA | −0.27 (0.78) | 9 (2.8%) |
| Baseline log DPP | Continuous | | NA | 2.14 (1.45) | 0 (0%) |
Age is the baseline age group, location is the facility where the canine is located, treatment is the ectoparasiticide treatment group, and DPP is a dual path platform leishmania serological test. DPP detects anti-leishmania antibodies similar to anti-SLA ELISA but is more specific
Fig. 7.
Marginal associations between the four covariates and baseline values of the dependent variables. The p-value between log SLA and log DPP is based on Pearson correlation, while p-values for LeishVet score used Fisher’s exact test
We first visualized LeishVet score and log of anti-SLA and saw evidence of a positive association between the two (Fig. 6), as has been previously observed (Proverbio et al. 2014). This motivated the use of multivariate approaches. Marginal associations between covariates and baseline values of the dependent variables are plotted in Fig. 7. Visual inspection indicated a positive association between baseline log SLA and log DPP at enrollment (Pearson correlation p-value < 0.001). We also saw that location W had a large number of dogs with baseline LeishVet score 0 or , and older dogs tended to have higher baseline LeishVet scores. This is consistent with age associated disease progression, which has also been observed in other cohorts (Toepp et al. 2019). Formal tests for association (Fisher’s exact test) between baseline LeishVet score and the categorical covariates (age group, location, treatment) were all non-significant. Nevertheless, we used log DPP, age group, location, and treatment as covariates in the regression models, preferring a priori specification to hypothesis test based model selection.
Fig. 6.
Log anti-SLA plotted against LeishVet score. This image does not account for the dependent nature of the responses. LeishVet score is equal to 4 for only a single observation, thus presenting zero variability
7.1. Regression models
We fit a bivariate bmrarm to the leishmaniosis data. The design matrix, , contributed an intercept, a linear temporal term, an indicator for treatment status, log DPP values, and dummy variables for age group (3–11 vs 0–2) and location. To write the regression equation we use to indicate rows corresponding to subject i, and is a design matrix for the subject specific effects, which contributed an intercept and a linear temporal term. Then, using and to denote the unobserved continuous and observed continuous outcomes, respectively, we can write our equation as:
where . We compared bmrarm to several baseline methods that had the same fixed effect structure but differed in their subject specific effects and covariance structures:
(1) Univariate models fit using the function from the brms package with subject specific intercepts. Log anti-SLA was fit using all default settings, while LeishVet score used family = cumulative("probit").
(2) The same setup as (1) with subject specific intercepts and slopes.
(3) bmrint and bmrslope.
The package default hyperparameters were used for bmrarm, bmrint, and bmrslope: , , , and , , .
7.2. Convergence assessment
Before interpretation and model comparison, we assessed convergence of the bmrarm method, for which we drew 25,000 samples and discarded the first 5,000 as burn-in. Samples were thinned by five to reduce the size of the returned objects. We first evaluated convergence of the sampler, focusing on , , , . Visual inspection of the trace plots (Figs. 8 and 9) indicated inefficient mixing, but no serious issues. The inefficiency was accounted for by taking a large number of draws. The maximum Gelman-Rubin upper CI for any parameter, including the patient specific effects and latent outcomes, was 1.007, well below the commonly used threshold of 1.1 (Roy 2020).
Fig. 8.
Leishmaniosis application: Trace and density plots for the one free cutpoint and variance term corresponding to the latent outcome
Fig. 9.
Leishmaniosis application: Trace and density plots for the subject specific variance terms corresponding to the latent outcome
To further evaluate the appropriateness of bmrarm, we used several posterior predictive checks. Gelman et al. (2013) defined a Bayesian p-value as
Posterior predictive checks can be implemented through a simulation based approach by comparing realized test quantities to predictive test quantities , which are based on the S posterior draws. When is close to 0 or 1, this indicates a lack of fit and requires further investigation of the mechanics of the model.
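A minimal sketch of the simulation-based p-value follows. It assumes a toy normal model with known unit variance and a flat prior on the mean, which is not bmrarm, and uses the sample standard deviation as the test quantity. When the data are far more variable than the model allows, the p-value lands near 0, which is the kind of lack of fit described above.

```python
import numpy as np

def ppp_value(t_rep, t_obs):
    """Posterior predictive p-value: share of draws with T(y_rep) >= T(y).

    t_obs may be a scalar, or an array with one realized value per draw when
    the test quantity also depends on the parameters.
    """
    return float(np.mean(np.asarray(t_rep) >= np.asarray(t_obs)))

rng = np.random.default_rng(11)
S, n = 4000, 48
y_ok = rng.normal(0.0, 1.0, n)    # data consistent with the toy model
y_bad = rng.normal(0.0, 3.0, n)   # data far more variable than the toy model allows

def replicate(y):
    """Toy model y ~ N(mu, 1), flat prior on mu; one replicated dataset per posterior draw."""
    mu = rng.normal(y.mean(), 1.0 / np.sqrt(y.size), S)
    return rng.normal(mu[:, None], 1.0, size=(S, y.size))

p_ok = ppp_value(replicate(y_ok).std(axis=1, ddof=1), y_ok.std(ddof=1))
p_bad = ppp_value(replicate(y_bad).std(axis=1, ddof=1), y_bad.std(ddof=1))
print(p_ok, p_bad)  # p_bad is essentially 0, flagging the misfit in spread
```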
Two aspects of our model were evaluated in this way. The first posterior predictive check assessed whether the replicated datasets had a reasonable number of ordinal state transitions. We specifically compared the number of dogs with 0 transitions, 1–2 transitions, and 3 or more transitions. Results are displayed in Table 7 and indicate a reasonable number of state transitions in the replicated datasets. For this check, predictions were generated simultaneously for all time points and outcomes using the fixed effects, patient specific effects, and residual covariance terms.
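The transition-based test quantities in Table 7 can be computed from observed or replicated ordinal series roughly as follows. The series below are made up. Dropping missing visits before counting is our assumption, since the text does not say how they were handled.

```python
import numpy as np

def count_transitions(levels):
    """Number of changes in ordinal state along one subject's series.

    Missing visits (NaN) are dropped first, so a transition is counted between
    consecutive observed values (an assumption of this sketch).
    """
    x = np.asarray(levels, dtype=float)
    x = x[~np.isnan(x)]
    return int(np.sum(x[1:] != x[:-1]))

def transition_summary(series_by_subject):
    """Percent of subjects with 0, 1-2, and at least 3 transitions."""
    counts = np.array([count_transitions(s) for s in series_by_subject])
    return (float(100.0 * np.mean(counts == 0)),
            float(100.0 * np.mean((counts >= 1) & (counts <= 2))),
            float(100.0 * np.mean(counts >= 3)))

observed = [[1, 1, 1, 1], [0, 1, 1, 2], [1, 2, 1, 2, 1], [2, np.nan, 2, 3]]
print(transition_summary(observed))  # (25.0, 50.0, 25.0)
```

Applying `transition_summary` to each replicated dataset gives the draws of T(y^rep), which can then be compared with the observed summary as in Table 7.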
Table 7.
Frequency of ordinal state transitions between the replicated and observed data
| Outcome | Test | T(y) | 95% CI for T(y^rep) | p-value |
|---|---|---|---|---|
| Ordinal | % with transitions = 0 | 16.7 | [2.1, 20.8] | 0.14 |
| | % with 1 ≤ transitions ≤ 2 | 52.1 | [37.5, 66.7] | 0.52 |
| | % with transitions ≥ 3 | 31.3 | [22.9, 52.1] | 0.85 |
Second, we replicated the continuous outcome at the final time point for each subject using the posterior draws and the subject's previous multivariate responses. Because our ultimate goal was short term forecasting of disease progression, this let us evaluate whether the posterior draws and previous observations combined to make realistic one-step-ahead predictions. We compared the replicated final responses to the observed data using the median, standard deviation, minimum, and maximum as test quantities. There were no serious discrepancies between the summaries of the observed and replicated data (Fig. 10), although the posterior predictive checks suggested that the proposed method may tend to overestimate the maximum observation as well as the spread of the observations.
Fig. 10.
Posterior predictive checks for the continuous outcome for the leishmaniosis dataset. Checks are based on replicating the final outcome for each subject using the posterior draws and the subject's previous responses. Plots were made using the ppc_stat function from the bayesplot package (Gabry and Mahr 2021)
7.3. Model comparison
We now return to the comparison of bmrarm and the four baseline models, which was accomplished using mDIC. The brms specification used different constraints to identify the latent outcomes in the ordinal models. Since we integrated out the latent outcomes and used the non-augmented likelihood for mDIC, it was still appropriate to compare the methods. We first saw that bmrint and bmrslope offered notable improvements over their univariate counterparts (Table 8; mDIC decreases of 7.3 and 5.8, respectively). Additionally, bmrarm was the best performing model, with an mDIC decrease of 3.1 from bmrslope. The posterior mean of was 0.17 and the 95% credible interval did not contain 0, indicating support for its inclusion.
Table 8.
mDIC comparison for the leishmaniosis application
| Type | Model | mDIC | pD | AR term mean (95% CI) | Time (per chain) |
|---|---|---|---|---|---|
| Univariate | Intercept | 949.9 | 20.4 | | 0.7 min |
| | Int + Slope | 907.9 | 24.6 | | 2.6 min |
| Bivariate | bmrint | 942.6 | 21.8 | | 6.7 min |
| | bmrslope | 902.1 | 25.2 | | 9.8 min |
| | bmrarm | 899.0 | 25.5 | 0.17 (0.02, 0.33) | 29.8 min |
7.4. Results
Table 9 contains posterior means, 95% credible intervals (CI), and the associated priors for the fitted bmrarm. As previously observed, the posterior mean for the autoregressive term, , was 0.172 and the credible interval did not contain 0. Posterior estimates of and were positive for both outcomes and credible intervals did not contain 0, except for for the continuous outcome. These positive associations were anticipated, as we expect disease to get worse over time. The posterior mean of was large in magnitude and negative for both outcomes, and its CIs did not contain 0. The estimated effect of baseline log DPP (blDPP), for which higher values indicate more advanced disease, was positive for both outcomes.
Table 9.
Prior distributions and parameter estimates for analysis of the leishmaniosis dataset
| Parameter | Prior | Outcome | Posterior Mean (SD) | Posterior 95% CI |
|---|---|---|---|---|
| | | Ordinal | 0.172 (0.079) | (0.02, 0.33) |
| | | Ordinal | 2.36 (0.168) | (2.065, 2.718) |
| | | Ordinal | 0.472 (0.168) | (0.139, 0.802) |
| | | Ordinal | 0.22 (0.128) | (−0.026, 0.474) |
| | | Ordinal | 0.066 (0.026) | (0.016, 0.118) |
| | | Ordinal | 0.346 (0.147) | (0.063, 0.644) |
| | | Ordinal | 0.085 (0.051) | (−0.014, 0.188) |
| | | Ordinal | 0.102 (0.152) | (−0.197, 0.4) |
| | | Ordinal | −0.975 (0.171) | (−1.319, −0.642) |
| | | Cont | −1.175 (0.164) | (−1.497, −0.85) |
| | | Cont | −0.026 (0.126) | (−0.272, 0.222) |
| | | Cont | 0.077 (0.02) | (0.038, 0.117) |
| | | Cont | 0.289 (0.147) | (−0.003, 0.582) |
| | | Cont | 0.293 (0.052) | (0.189, 0.393) |
| | | Cont | 0.026 (0.153) | (−0.272, 0.333) |
| | | Cont | −0.375 (0.171) | (−0.715, −0.041) |
| | | Ordinal | 0.287 (0.054) | (0.198, 0.408) |
| | | Shared | −0.02 (0.081) | (−0.179, 0.138) |
| | | Cont | 0.142 (0.018) | (0.113, 0.181) |
| | | Ordinal | 0.078 (0.063) | (0.01, 0.244) |
| | | Ordinal | −0.249 (0.286) | (−0.721, 0.362) |
| | | Shared | 0.196 (0.307) | (−0.449, 0.733) |
| | | Shared | −0.012 (0.285) | (−0.55, 0.539) |
| | | Ordinal | 0.011 (0.005) | (0.004, 0.022) |
| | | Shared | 0.225 (0.245) | (−0.285, 0.652) |
| | | Shared | 0.212 (0.227) | (−0.255, 0.619) |
| | | Cont | 0.159 (0.059) | (0.064, 0.295) |
| | | Cont | −0.183 (0.209) | (−0.542, 0.27) |
| | | Cont | 0.012 (0.004) | (0.006, 0.021) |
The posterior mean of the residual correlation () was negligible, consistent with conditional independence of the two outcomes given the fixed and random effects. While this indicated no need for a joint residual covariance matrix in this application, we expect this will not always be the case. All CIs for the correlations in contained 0, which was not unexpected given the small sample size. However, posterior means of 3 of the 4 shared correlation parameters were ≥ 0.196, which supports jointly estimating the patient specific effects.
One- to four-step-ahead forecasts are shown for LeishVet scores in Fig. 11 and for log anti-SLA in Fig. 12; both appear consistent with expectations based on the observed values. Continued longitudinal follow-up of study subjects would allow more transition events to be observed, which we expect to enhance the clinical utility of models of this form; these data will be used in future analyses.
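To see why an autoregressive error term mainly helps short-term forecasts, consider a stationary AR(1) residual $e_t = \rho e_{t-1} + \varepsilon_t$ with $\varepsilon_t \sim N(0, \sigma^2)$. Given the latest residual $e_T$, the $h$-step-ahead forecast adds $\rho^h e_T$ to the mean structure, and the forecast variance is $\sigma^2 (1 - \rho^{2h}) / (1 - \rho^2)$. The sketch below is illustrative only: `ar1_forecast` and its arguments are hypothetical, it treats $\rho$, $\sigma^2$ and the mean structure as known (a fully Bayesian forecast would repeat the calculation for each posterior draw), and it assumes $|\rho| < 1$.

```python
def ar1_forecast(mu_future, resid_last, rho, sigma2, horizons=(1, 2, 3, 4)):
    """h-step-ahead forecast mean and variance under stationary AR(1) errors.

    mu_future[h - 1]: mean structure (fixed + subject-specific effects) at T + h.
    resid_last: most recent residual, y_T - mu_T.
    rho, sigma2: AR coefficient (|rho| < 1) and innovation variance.
    """
    if not abs(rho) < 1:
        raise ValueError("stationary AR(1) requires |rho| < 1")
    out = []
    for h in horizons:
        # The last residual's influence decays geometrically with the horizon.
        mean = mu_future[h - 1] + rho**h * resid_last
        # sigma2 * sum_{j=0}^{h-1} rho^(2j), written in closed form.
        var = sigma2 * (1 - rho ** (2 * h)) / (1 - rho**2)
        out.append((h, mean, var))
    return out


# Hypothetical values: flat mean structure, last residual 0.5, rho = 0.5.
for h, mean, var in ar1_forecast([1.0] * 4, 0.5, 0.5, 1.0):
    print(h, round(mean, 4), round(var, 4))
```

Because the AR contribution shrinks like $\rho^h$, its benefit is concentrated at short horizons, while the forecast variance grows toward the stationary value $\sigma^2/(1-\rho^2)$.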
Fig. 11. LeishVet score probability forecasts using a bmrarm. Black dots indicate observed outcomes, which have probability 1 for their true value.
Fig. 12. Log anti-SLA forecasts using a bmrarm. Red dots indicate observed responses; blue indicates the mean and 95% credible intervals of the predicted outcomes.
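Category probability forecasts like those in Fig. 11 can be obtained from a latent-variable (cumulative probit) formulation of an ordinal outcome, in the spirit of Albert and Chib (1993): $Y = k$ when the latent value falls between cutpoints $\gamma_{k-1}$ and $\gamma_k$, so $P(Y = k) = \Phi((\gamma_k - \mu)/s) - \Phi((\gamma_{k-1} - \mu)/s)$. The sketch below averages these probabilities over posterior draws. It assumes a probit link and a normal latent forecast with mean `mu` and scale `s` for each draw; the function names and example values are hypothetical, not the paper's implementation.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function (handles +/-inf)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probs(mu, cutpoints, scale=1.0):
    """P(Y = k), k = 1..K, for latent Z ~ N(mu, scale^2) cut at the cutpoints
    (with -inf and +inf appended at the ends)."""
    edges = [-math.inf, *cutpoints, math.inf]
    return [
        norm_cdf((upper - mu) / scale) - norm_cdf((lower - mu) / scale)
        for lower, upper in zip(edges[:-1], edges[1:])
    ]


def forecast_probs(mu_draws, cutpoint_draws, scale_draws):
    """Monte Carlo average of category probabilities over posterior draws."""
    per_draw = [
        category_probs(mu, cuts, s)
        for mu, cuts, s in zip(mu_draws, cutpoint_draws, scale_draws)
    ]
    n = len(per_draw)
    return [sum(col) / n for col in zip(*per_draw)]


# Hypothetical draws for one subject's next-visit latent mean, three cutpoints
# (four categories; first cutpoint fixed at 0 for identifiability) and scale.
mu_draws = [0.4, 0.9, 0.6]
cutpoint_draws = [[0.0, 1.0, 2.0], [0.0, 1.1, 2.1], [0.0, 0.9, 1.9]]
scale_draws = [1.0, 1.0, 1.0]
probs = forecast_probs(mu_draws, cutpoint_draws, scale_draws)
print([round(p, 3) for p in probs], round(sum(probs), 6))
```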
8. Conclusions
In this paper we developed a multivariate Bayesian hierarchical model with autoregressive errors that incorporates ordinal and continuous responses, accompanied by a reproducibility package (https://github.com/nickseedorff/bmrarm). We aimed for reasonable sampling efficiency and used flexible priors, such as the scaled inverse-Wishart prior for the covariance of the patient-specific effects. A simulation study indicated that when an autoregressive component is present in the data-generating process, including it in the model can improve short-term forecasts of both the continuous and ordinal outcomes. A second simulation study motivated mDIC as a model selection criterion, and mDIC favored our method over several baseline models.
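For reference, the standard deviance information criterion of Spiegelhalter et al. (2002) is $\mathrm{DIC} = \bar D + p_D$, where $D(\theta) = -2 \log p(y \mid \theta)$, $\bar D$ is its posterior mean and $p_D = \bar D - D(\bar\theta)$ is the effective number of parameters. The sketch below computes it for a toy normal-mean model with known variance. It does not reproduce the mDIC used in this paper; the model, function names and data are hypothetical.

```python
import numpy as np


def normal_loglik(y, mu, sigma):
    """Log-likelihood of i.i.d. N(mu, sigma^2) data."""
    y = np.asarray(y, dtype=float)
    return float(np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                        - (y - mu) ** 2 / (2 * sigma**2)))


def dic(y, mu_draws, sigma):
    """Standard DIC for a toy normal-mean model; returns (DIC, p_D)."""
    mu_draws = np.asarray(mu_draws, dtype=float)
    # Deviance D(theta) = -2 log p(y | theta) at every posterior draw.
    deviances = np.array([-2 * normal_loglik(y, m, sigma) for m in mu_draws])
    d_bar = deviances.mean()
    # Deviance at the posterior mean of the parameter.
    d_hat = -2 * normal_loglik(y, mu_draws.mean(), sigma)
    p_d = d_bar - d_hat
    return float(d_bar + p_d), float(p_d)


# Hypothetical data; with a flat prior the posterior of mu is N(ybar, sigma^2 / n).
rng = np.random.default_rng(0)
sigma = 1.0
y = rng.normal(2.0, sigma, size=50)
mu_draws = rng.normal(y.mean(), sigma / np.sqrt(len(y)), size=5000)
print(dic(y, mu_draws, sigma))
```

Here $p_D$ should come out close to 1, the single free parameter, which is a quick sanity check on any DIC implementation.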
8.1. Limitations and future work
The primary limitations of the method are its high computational cost; the requirement that observations be (roughly) equally spaced; the inability to include binary outcomes; the assumption that the range parameter is equal across all outcomes; the possibility that the SIW prior is overly informative under the default hyperparameters; and the restriction of our derivations to a single ordinal outcome. Future work should aim to further improve computational efficiency, possibly through a parameter-expanded data augmentation routine (Liu and Wu 1999), and to expand the number and types of outcomes that can be incorporated. Allowing different covariates for different outcomes may also help with implementation schemes for binary responses. Relaxing the assumption of equally spaced observations and allowing the range parameter to differ between outcomes are areas ripe for investigation.
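The equal-spacing requirement can be seen from the structure of a discrete-time AR(1) correlation, $\mathrm{corr}(e_i, e_j) = \rho^{|i-j|}$, which depends only on the difference in visit index, not on elapsed time. One common continuous-time alternative is the exponential correlation $\exp(-|t_i - t_j|/\phi)$, which reduces to AR(1) with $\rho = e^{-1/\phi}$ when visits are one time unit apart. The sketch below is illustrative only; we do not claim it matches the range parameterization used in bmrarm, and the function names and values are hypothetical.

```python
import numpy as np


def ar1_corr(n, rho):
    """Discrete-time AR(1) correlation rho^|i - j|, indexed by visit number."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def exp_corr(times, phi):
    """Continuous-time exponential correlation exp(-|t_i - t_j| / phi)."""
    t = np.asarray(times, dtype=float)
    return np.exp(-np.abs(t[:, None] - t[None, :]) / phi)


# Equally spaced hypothetical visits: the two structures coincide.
equal = np.arange(5.0)
print(np.allclose(exp_corr(equal, 2.0), ar1_corr(5, np.exp(-1 / 2.0))))

# Unevenly spaced hypothetical visits: only the continuous-time version
# reflects that the second visit is much closer to the first than the third is.
uneven = [0.0, 0.5, 3.0]
print(np.round(exp_corr(uneven, 2.0), 3))
```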
Despite these challenges and areas for improvement, the proposed method holds promise for application in clinical settings. In particular, improved methods for longitudinal forecasting based on models that can include ordinal outcomes alongside other variable types may aid clinical decision-making. Furthermore, fully Bayesian techniques allow straightforward use of custom utility and loss functions, going beyond forecasts to quantify optimal treatment decisions for individuals.
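As a minimal sketch of how a loss function can be combined with posterior predictive draws, the Bayes action is the one with the smallest posterior expected loss, estimated by averaging the loss over draws. The actions ("monitor", "treat"), the loss values and the draws below are hypothetical and chosen only for illustration; they are not clinical recommendations or output from the model.

```python
def bayes_action(pred_draws, loss):
    """Choose the action with the smallest posterior expected loss.

    pred_draws: posterior predictive draws of a future ordinal category (0-based).
    loss: maps each action to a list of losses indexed by category.
    """
    n = len(pred_draws)
    # Monte Carlo estimate of E[loss(action, Y) | data] for each action.
    expected = {
        action: sum(losses[k] for k in pred_draws) / n
        for action, losses in loss.items()
    }
    best = min(expected, key=expected.get)
    return best, expected


# Hypothetical predictive draws and loss values, for illustration only.
draws = [0, 0, 1, 2, 2, 2, 3, 1, 2, 3]
loss = {"monitor": [0, 1, 4, 8], "treat": [2, 2, 1, 1]}
print(bayes_action(draws, loss))
```

Any utility can be used in place of the loss table (for example, one that also depends on a forecast continuous biomarker), since the expectation is simply taken over the joint predictive draws.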
Supplementary Material
Acknowledgements
Research and data collection reported in this publication were supported by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under Award Number R01AI139267, as well as by an award from the Masters of Foxhounds Association Foundation. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or any other party.
Footnotes
Supplementary Information: The online version contains supplementary material available at https://doi.org/10.1007/s00180-022-01280-x.
References
- Agresti A (2012) Categorical data analysis. Wiley series in probability and statistics. Wiley, Hoboken
- Albert JH, Chib S (1993) Bayesian analysis of binary and polychotomous response data. J Am Stat Assoc 88(422):669–679. 10.1080/01621459.1993.10476321
- Alessie R, Hochguertel S, van Soest A (2004) Ownership of stocks and mutual funds: a panel data analysis. Rev Econ Stat 86(3):783–796
- Alvar J, Vélez ID, Bern C et al. (2012) Leishmaniasis worldwide and global estimates of its incidence. PLoS One 7(5):e35671
- Banerjee S, Carlin BP, Gelfand AE (2015) Hierarchical modeling and analysis for spatial data, 2nd edn. Chapman & Hall/CRC, London
- Bürkner PC (2018) Advanced Bayesian multilevel modeling with the R package brms. R J 10(1):395–411. 10.32614/RJ-2018-017
- Cappellari L, Jenkins SP (2008) The dynamics of social assistance receipt: measurement and modelling issues, with an application to Britain. OECD Social, Employment and Migration Working Papers 67, OECD Publishing. 10.1787/236346714741
- Carpenter B, Gelman A, Hoffman MD et al. (2017) Stan: a probabilistic programming language. J Stat Softw 76(1):1–32. 10.18637/jss.v076.i01
- Catalano PJ (1997) Bivariate modelling of clustered continuous and ordered categorical outcomes. Stat Med 16(8):883–900
- Chan JC, Grant AL (2016) On the observed-data deviance information criterion for volatility modeling. J Financ Econom 14(4):772–802
- Chappuis F, Sundar S, Hailu A et al. (2007) Visceral leishmaniasis: what are the needs for diagnosis, treatment and control? Nat Rev Microbiol 5(11):873–882
- Chi EM, Reinsel GC (1989) Models for longitudinal data with random effects and AR(1) errors. J Am Stat Assoc 84(406):452–459
- Cowles MK (1996) Accelerating Monte Carlo Markov chain convergence for cumulative-link generalized linear models. Stat Comput 6:101–111
- Cowles MK, Carlin BP, Connett JE (1996) Bayesian tobit modeling of longitudinal ordinal clinical trial compliance data with nonignorable missingness. J Am Stat Assoc 91(433):86–98
- Cressie N, Wikle CK (2011) Statistics for spatio-temporal data. Wiley, Hoboken
- Ding S, Cook RD (2014) Dimension folding PCA and PFC for matrix-valued predictors. Stat Sin 24(1):463–492
- Duprey ZH, Steurer FJ, Rooney JA et al. (2006) Canine visceral leishmaniasis, United States and Canada, 2000–2003. Emerg Infect Dis 12(3):440–446
- Epstein ES (1969) A scoring system for probability forecasts of ranked categories. J Appl Meteorol 8(6):985–987
- Feasey N, Wansbrough-Jones M, Mabey DCW et al. (2009) Neglected tropical diseases. Br Med Bull 93(1):179–200. 10.1093/bmb/ldp046
- Gabry J, Mahr T (2021) bayesplot: plotting for Bayesian models. R package version 1.8.0
- Gelman A, Hill J (2007) Data analysis using regression and multilevel/hierarchical models. Analytical methods for social research. Cambridge University Press, New York
- Gelman A, Rubin DB (1992) Inference from iterative simulation using multiple sequences. Stat Sci 7(4):457–472. 10.1214/ss/1177011136
- Gelman A, van Dyk DA, Huang Z et al. (2008) Using redundant parameterizations to fit hierarchical models. J Comput Graph Stat 17(1):95–122. 10.1198/106186008X287337
- Gelman A, Carlin J, Stern H et al. (2013) Bayesian data analysis, 3rd edn. Chapman & Hall/CRC, Boca Raton
- Gelman A, Hwang J, Vehtari A (2014) Understanding predictive information criteria for Bayesian models. Stat Comput 24:997–1016
- Genton MG, Kleiber W (2015) Cross-covariance functions for multivariate geostatistics. Stat Sci 30(2):147–163. 10.1214/14-STS487
- Ghasemzadeh S, Ganjali M, Baghfalaki T (2020) Bayesian quantile regression for joint modeling of longitudinal mixed ordinal and continuous data. Commun Stat Simul Comput 49(2):375–395. 10.1080/03610918.2018.1484482
- Gueorguieva RV, Agresti A (2001) A correlated probit model for joint modeling of clustered binary and continuous responses. J Am Stat Assoc 96(455):1102–1112
- Hadfield JD (2010) MCMC methods for multi-response generalized linear mixed models: the MCMCglmm R package. J Stat Softw 33(2):1–22
- Hasegawa H (2009) Bayesian dynamic panel ordered probit model and its application to subjective well being. Commun Stat Simul Comput 38(6):1321–1347. 10.1080/03610910902903133
- Heckman JJ (1981) The incidental parameters problem and the problem of initial conditions in estimating discrete time-discrete data stochastic process. In: Manski CF, McFadden DL (eds) Structural analysis of discrete data with econometric applications. The MIT Press, Cambridge, pp 179–195
- Heckman JJ (1981) Statistical models for discrete panel data. Structural analysis of discrete data with econometric applications 114:178
- Jin X, Banerjee S, Carlin BP (2007) Order-free co-regionalized areal data models with application to multiple-disease mapping. J Royal Stat Soc Ser B (Stat Method) 69(5):817–838
- Jones RH (2011) Bayesian information criterion for longitudinal and clustered data. Stat Med 30(25):3050–3056. 10.1002/sim.4323
- Kang EL, Cressie N (2011) Bayesian inference for the spatial random effects model. J Am Stat Assoc 106(495):972–983
- LeishVet (2016) Clinical staging, treatment and prognosis. https://www.leishvet.org/fact-sheet/clinical-staging/
- Li Q, Pan J, Belcher J (2016) Bayesian inference for joint modelling of longitudinal continuous, binary and ordinal events. Stat Methods Med Res 25(6):2521–2540. 10.1177/0962280214526199
- Li ZR, McCormick TH, Clark SJ (2020) Using Bayesian latent Gaussian graphical models to infer symptom associations in verbal autopsies. Bayesian Anal 15(3):781–807. 10.1214/19-BA1172
- Liu JS, Wu YN (1999) Parameter expansion for data augmentation. J Am Stat Assoc 94(448):1264–1274
- MacNab YC (2016) Linear models of coregionalization for multivariate lattice data: order-dependent and order-free cMCARs. Stat Methods Med Res 25(4):1118–1144. 10.1177/0962280216660419
- Matyas L, Sevestre P (2008) The econometrics of panel data: fundamentals and recent developments in theory and practice, 3rd edn. Springer, Berlin
- Meng XL, van Dyk DA (1999) Seeking efficient data augmentation schemes via conditional and marginal augmentation. Biometrika 86(2):301–320
- NCAR (2015) Verification: weather forecast verification utilities. R package version 1.42
- Neale MC, Hunter MD, Pritikin JN et al. (2016) OpenMx 2.0: extended structural equation and statistical modeling. Psychometrika 81(2):535–549. 10.1007/s11336-014-9435-8
- O’Malley AJ, Zaslavsky AM (2008) Domain-level covariance analysis for multilevel survey data with structured nonresponse. J Am Stat Assoc 103(484):1405–1418
- Petersen CA, Barr SC (2009) Canine leishmaniasis in North America: emerging or newly recognized? Vet Clin North Am Small Anim Pract 39(6):1065–1074
- Plummer M (2003) JAGS: a program for analysis of Bayesian graphical models using Gibbs sampling
- Plummer M, Best N, Cowles K et al. (2006) CODA: convergence diagnosis and output analysis for MCMC. R News 6(1):7–11
- Proverbio D, Spada E, Bagnagatti de Giorgi G et al. (2014) Relationship between Leishmania IFAT titer and clinicopathological manifestations (clinical score) in dogs. BioMed Res Int. 10.1155/2014/412808
- Pudney S (2008) The dynamics of perception: modelling subjective wellbeing in a short panel. J Royal Stat Soc Ser A (Stat Soc) 171(1):21–40
- R Core Team (2021) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna
- Ribeiro RR, Michalick MSM, da Silva ME et al. (2018) Canine leishmaniasis: an overview of the current status and strategies for control. Biomed Res Int. 10.1155/2018/3296893
- Roy V (2020) Convergence diagnostics for Markov chain Monte Carlo. Annu Rev Stat Appl 7(1):387–412. 10.1146/annurev-statistics-031219-041300
- Schaut RG, Robles-Murguia M, Juelsgaard R et al. (2015) Vectorborne transmission of Leishmania infantum from hounds, United States. Emerg Infect Dis 21(12):2209–2212. 10.3201/eid2112.141167
- Schmidt AM, Gelfand AE (2003) A Bayesian coregionalization approach for multivariate pollutant data. J Geophys Res Atmos. 10.1029/2002JD002905
- Schuurman NK, Grasman RPPP, Hamaker EL (2016) A comparison of inverse-Wishart prior specifications for covariance matrices in multilevel autoregressive models. Multivar Behav Res 51(2–3):185–206. 10.1080/00273171.2015.1065398
- Solano-Gallego L, Cardoso L, Pennisi MG et al. (2017) Diagnostic challenges in the era of canine Leishmania infantum vaccines. Trends Parasitol 33(9):706–717
- Spiegelhalter DJ, Best NG, Carlin BP et al. (2002) Bayesian measures of model complexity and fit. J Royal Stat Soc Ser B 64(4):583–639
- Steele F, Grundy E (2021) Random effects dynamic panel models for unequally spaced multivariate categorical repeated measures: an application to child-parent exchanges of support. J Royal Stat Soc Ser C (Appl Statist) 70(1):3–23. 10.1111/rssc.12446
- Stegmueller D (2013) Modeling dynamic preferences: a Bayesian robust dynamic latent ordered probit model. Polit Anal 21(3):314–333
- Talhouk A, Doucet A, Murphy K (2012) Efficient Bayesian inference for multivariate probit models with sparse inverse correlation matrices. J Comput Graph Stat 21(3):739–757. 10.1080/10618600.2012.679239
- Teimourian M, Baghfalaki T, Ganjali M et al. (2015) Joint modeling of mixed skewed continuous and ordinal longitudinal responses: a Bayesian approach. J Appl Stat 42(10):2233–2256. 10.1080/02664763.2015.1023557
- Therneau TM, Grambsch PM (2000) Modeling survival data: extending the Cox model. Springer, New York
- Toepp AJ, Schaut RG, Scott BD et al. (2017) Leishmania incidence and prevalence in US hunting hounds maintained via vertical transmission. Vet Parasitol Reg Stud Rep 10:75–81
- Toepp AJ, Monteiro GR, Coutinho JF et al. (2019) Comorbid infections induce progression of visceral leishmaniasis. Parasit Vectors 12(1):1–12
- Varin C, Czado C (2009) A mixed autoregressive probit model for ordinal longitudinal data. Biostatistics 11(1):127–138. 10.1093/biostatistics/kxp042
- Wang WL, Fan TH (2010) ECM-based maximum likelihood inference for multivariate linear mixed models with autoregressive errors. Comput Stat Data Anal 54(5):1328–1341. 10.1016/j.csda.2009.11.021
- Wang WL, Fan TH (2012) Bayesian analysis of multivariate t linear mixed models using a combination of IBF and Gibbs samplers. J Multivar Anal 105(1):300–310. 10.1016/j.jmva.2011.10.006
- Wilhelm S, Manjunath BG (2015) tmvtnorm: truncated multivariate normal and Student t distribution. R package version 1.4-10
- Wilks D (2005) Statistical methods in the atmospheric sciences. International Geophysics. Elsevier Science, Amsterdam
- Woodbury M (1950) Inverting modified matrices. Tech. rep., Department of Statistics, Princeton University, Princeton
- Wooldridge JM (2005) Simple solutions to the initial conditions problem in dynamic, nonlinear panel data models with unobserved heterogeneity. J Appl Econom 20(1):39–54. 10.1002/jae.770