Joint Bayesian longitudinal models for mixed outcome types and associated model selection techniques

Nicholas Seedorff; Grant Brown; Breanna Scorza; Christine A Petersen

doi:10.1007/s00180-022-01280-x

Author manuscript; available in PMC: 2024 Jan 30.

Published in final edited form as: Comput Stat. 2022 Sep 18;38(4):1735–1769. doi: 10.1007/s00180-022-01280-x

PMCID: PMC10825672 NIHMSID: NIHMS1893623 PMID: 38292019

Abstract

Motivated by data measuring progression of leishmaniosis in a cohort of US dogs, we develop a Bayesian longitudinal model with autoregressive errors to jointly analyze ordinal and continuous outcomes. Multivariate methods can borrow strength across responses and may produce improved longitudinal forecasts of disease progression over univariate methods. We explore the performance of our proposed model under simulation, and demonstrate that it has improved prediction accuracy over traditional Bayesian hierarchical models. We further identify an appropriate model selection criterion. We show that our method holds promise for use in the clinical setting, particularly when ordinal outcomes are measured alongside other variable types that may aid clinical decision making. This approach is particularly applicable when multiple, imperfect measures of disease progression are available.

Keywords: Bayesian, MCMC, Ordinal regression, Longitudinal data analysis

1. Introduction

Given the complexity of many disease processes, it is common for researchers to measure many facets of the illness under study. For example, consider the well-referenced Primary Biliary Cirrhosis (PBC) longitudinal dataset originally collected by Therneau and Grambsch (2000), which encodes a variety of variables. These include an ordinal variable denoting the histologic stage of disease and biomarker information (e.g. serum bilirubin, albumin) for many subjects. Motivated by this type of multivariate longitudinal data, we consider approaches that jointly analyze ordinal and continuous responses, and develop an associated Bayesian hierarchical model.

The serial correlation that accompanies multivariate longitudinal data can be accounted for in a variety of ways. A sensible starting point for analysis is to use univariate regression models with subject specific effects, where a separate model is estimated for each of the responses. By combining covariance structures, a set of univariate models with subject specific effects could be extended to multivariate models with subject specific effects. While multivariate models are more complex than their univariate counterparts, they have the potential to produce better fits (Li et al. 2016) by borrowing information across responses and can help deal with differing levels of temporal granularity stemming from difficulties in attaining certain outcomes.

Alternative strategies for dealing with serial correlation include non-diagonal error structures, inclusion of lagged outcome variables as covariates, or combinations of the three described approaches. In what follows we develop a model that is based on a combination of patient specific effects and a first-order autoregressive (AR1) error structure. The error structure uses a separable cross-covariance function, an approach sometimes referred to as “intrinsic coregionalizations” (Genton and Kleiber 2015). Motivation for extending models beyond subject specific effects comes from Chi and Reinsel (1989), who saw improved fits when adding an AR1 error structure to linear mixed models, and Wang and Fan (2010), who saw improved prediction capabilities when adding AR error structures to multivariate linear mixed models.

In this manuscript, we propose a multivariate Bayesian hierarchical model with autoregressive errors in order to jointly analyze continuous and ordinal longitudinal data. We start by detailing related works and our contributions in this space. We next present our approach, the associated priors, and implementation details. For reproducibility and to serve as a template for future implementations, we provide a publicly available software package bmrarm (available at https://github.com/nickseedorff/bmrarm) for R (R Core Team 2021). Method development is followed by a simulation study, which assesses parameter recovery, forecast accuracy, and approaches for model comparison. Finally, we present an application using data from a longitudinal veterinary medicine study.

2. Motivating application

2.1. Canine leishmaniosis (CanL)

Our motivating example comes from a longitudinal study of canine leishmaniosis (CanL). Leishmaniosis is a progressive wasting disease of dogs and humans (Duprey et al. 2006) that is usually fatal if left untreated. It is a neglected tropical disease, a class of diseases that tend to disproportionately impact the world’s poorest people (Feasey et al. 2009). Human leishmaniosis is estimated to cause upwards of 20,000 deaths per year, and the case-fatality rate for treated individuals is believed to be between 10 and 20% (Alvar et al. 2012). Leishmaniosis is a vector-borne disease caused by variants of the Leishmania donovani complex (Duprey et al. 2006). In humans, it is transmitted by sand flies (Chappuis et al. 2007), and more than 90% of cases are from six countries: India, Bangladesh, Sudan, South Sudan, Ethiopia and Brazil (Alvar et al. 2012).

CanL is enzootic in the U.S. in certain dog breeds (Toepp et al. 2017; Petersen and Barr 2009), and because a complete cure is rarely achievable, treatment efforts largely focus on maintenance and quality of life. The primary route of transmission to humans is via the sand fly with dogs acting as the primary reservoir, and vertical transmission (mother to offspring) appears to also play a role among dogs (Petersen and Barr 2009; Schaut et al. 2015; Ribeiro et al. 2018). Infection in U.S. dogs is limited to pups exposed in utero from a number of breeds (Duprey et al. 2006; Solano-Gallego et al. 2017).

LeishVet, a scientific association that focuses on leishmaniosis in veterinary medicine, provides clinical staging guidelines through a 4-category ordinal variable (LeishVet 2016). We use a modified version of this ordinal outcome, along with a ‘Stage 0’, indicating no evidence of disease, as our primary outcome and term it ‘LeishVet score’. An abbreviated version of the staging guidelines can be found in Fig. S1, while therapeutic recommendations and additional details can be found on the association’s website (LeishVet 2016).

2.2. Data specifics

Our dataset of interest contains samples from 50 dogs, two of which were dropped because they had only a single observation, providing no longitudinal information. The remaining 48 subjects had an initial observation and were then tracked at 3-month intervals for seven total visits in 2019 and 2020. The primary analysis goal is to accurately forecast disease progression, as substantial time can elapse between observations by a veterinarian. Forty of the 48 subjects (83.3%) had their most recent visit at the 6th time point after baseline. Of the remaining eight dogs, two had their final visit at the 3rd time point after baseline, two at the 4th, and four at the 5th. Additionally, LeishVet score was unobserved for all dogs at the first follow-up visit. Overall, 83.3% of the dogs transitioned disease states at least once and 52.1% transitioned two or more times (Table 1).

Table 1.

Frequency of transitions and unique disease states per canine for the leishmaniosis dataset

Disease state transitions
Count	Number of subjects (%)
0	8 (16.7%)
1	15 (31.2%)
2	10 (20.8%)
3	10 (20.8%)
4	3 (6.3%)
5	2 (4.2%)

Unique disease states
States	Number of subjects (%)
1	8 (16.7%)
2	28 (58.3%)
3	12 (25.0%)


Transitions between qualitative disease states are of primary interest because they capture disease progression. Subjects transitioned different numbers of times and between different numbers of distinct disease states throughout the study.

The dataset contains various assays and biomarkers, including an enzyme-linked immunosorbent assay (ELISA) for antibodies recognizing soluble leishmania antigen (anti-SLA). The higher the value of anti-SLA, the higher the amount of antibodies and the more advanced the disease. For the purposes of this study, the disease state comprises the qualitative LeishVet score (Fig. S1) as well as the quantitative anti-SLA levels (visualized in Fig. 1).

Fig. 1: (Left) LeishVet score across time by subject; LeishVet score was not recorded for any subject at the first follow-up. (Right) Log anti-SLA across time by subject. The linear regression line (blue) indicates a small positive trend in log anti-SLA over time.

There is sizeable missingness for both outcomes. In addition to the “missing” LeishVet scores at the first follow-up visit, there are four LeishVet and nine anti-SLA responses (assumed to be) missing at random; for reference there are 318 observed or partially observed multivariate responses. There are additional variables that could be included in a multivariable analysis such as baseline assessments, age group, location, and ectoparasiticide treatment status. Further details concerning the dataset can be found in Sect. 7.

3. Background and related work

3.1. Generalized linear mixed models

Generalized linear mixed models (GLMMs) with different distributional assumptions can be used to separately model serially correlated continuous and ordinal outcomes. To introduce GLMMs, we start with generalized linear models (GLMs), which have three main components: a random component, a systematic component, and a link function (Agresti 2012). Assume we have a random variable $Y_{i}$, which we believe follows a distribution from the exponential family (random component) and whose mean is $E[Y_{i}] = μ_{i}$. We then use a link function, $g(\cdot)$, to relate the mean of that random variable to a linear combination of covariates (systematic component):

$$g(μ_{i}) = β^{T} x_{i}, \qquad i \in \{1, \dots, N\} \tag{1}$$

where $x_{i}$ is a vector of covariates that commonly includes a constant term and $β$ is an unknown vector of fixed effects. GLMs assume the responses are conditionally independent given the covariates, an assumption often violated in longitudinal studies. To address this, GLMMs allow for inclusion of subject specific effects, denoted here using $α_{i}$ and commonly referred to as random effects. We now assume that responses are conditionally independent given the covariates and random effects, while expressing the systematic component as:

$$g(μ_{il}) = β^{T} x_{il} + α_{i}, \qquad l \in \{1, \dots, n_{i}\},\ i \in \{1, \dots, N\} \tag{2}$$

where $l$ indexes the observations of patient $i$ (so $y_{il}$ is that patient’s $l$th observation), $n_{i}$ is the number of observations for this subject, and $N$ is the number of subjects. The above is referred to as a random intercept model (Agresti 2012), and assuming there is a constant term in $x_{il}$, we can assign a distribution to the random effects such as $α_{i} \sim N(0, σ_{α}^{2})$. Importantly for the later discussion, the random effects are assumed independent of the covariates. For longitudinal data, we may additionally assume that trajectories over time are patient specific, in which case we could fit a random intercept and slope model:

$$g(μ_{il}) = β^{T} x_{il} + u_{il}^{T} α_{i}, \qquad l \in \{1, \dots, n_{i}\},\ i \in \{1, \dots, N\} \tag{3}$$

where $u_{il}^{T} = [1\ t_{il}]$ and $α_{i}$ is a $2 \times 1$ vector to which we could assign the distribution $α_{i} \sim N(0, Σ_{α})$. This serves as a middle ground between fitting a single GLM and fitting a separate GLM for each subject: information is shared across subjects, but each subject has its own trend line. Bayesian hierarchical models provide an analogous approach, where the distinction between fixed and random effects is blurred because all parameters are treated as random and the Bayesian interpretation of probability is employed. Nevertheless, we will use the terms “random effects” and “subject specific effects” interchangeably throughout.
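To make Eq. (3) concrete, the following minimal Python sketch evaluates the linear predictor for one subject under an identity link, assuming a single time covariate so that $x_{il} = [1\ t_{il}]$. The coefficient values are hypothetical; the point is only that each subject's slope is the fixed slope plus that subject's own deviation.

```python
def linear_predictor(beta, x, alpha, t):
    """Eq. (3) with an identity link: beta^T x_il + u_il^T alpha_i, u_il = [1, t_il]."""
    fixed = sum(b * xv for b, xv in zip(beta, x))  # population-level part
    subject = alpha[0] + alpha[1] * t              # subject-specific intercept and slope
    return fixed + subject

beta = [0.5, 0.2]       # hypothetical fixed intercept and fixed time slope
alpha_i = [0.3, -0.1]   # hypothetical deviations for subject i
times = [0, 1, 2, 3]
etas = [linear_predictor(beta, [1.0, t], alpha_i, t) for t in times]
```

Here subject $i$'s trajectory has slope $0.2 + (-0.1) = 0.1$, while the population line has slope 0.2.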

Focusing on related Bayesian methods that allow for subject specific effects, the brms package (Bürkner 2018) for R (R Core Team 2021) can be used to implement both linear mixed models and cumulative link mixed models, which link the cumulative probabilities to the systematic component and can be used for ordinal outcomes. While brms cannot model the residual correlation between continuous and ordinal outcomes, it is based on the Stan programming language (Carpenter et al. 2017) and can therefore straightforwardly use a non-informative prior on $Σ_{α}$. This is a notable advantage over methods that rely on Gibbs samplers for hierarchical modeling, where a seemingly non-informative inverse-Wishart prior on $Σ_{α}$, often chosen for conjugacy, can have enormous influence (Schuurman et al. 2016).

A variety of groups have looked at incorporating both subject specific effects and autocorrelated errors. Chi and Reinsel (1989) developed frequentist linear models with subject specific effects and an AR1 error process within subjects, where they saw improved fits by inclusion of the error structure. They also noted this can sometimes shrink the patient specific effects, thereby reducing the effective number of parameters in the model. Varin and Czado (2009) developed a model for longitudinal ordinal data that included patient specific intercepts and an AR error structure. Models were fit using maximum pairwise likelihood, but notably, their method and software do not allow for subject specific slopes.

3.2. Multivariate mixed models and related methods

Multivariate methods with different distributional assumptions are often used to jointly model responses of different data types. Catalano (1997) developed a modeling approach for clustered ordinal and continuous outcomes using Generalized Estimating Equations (GEEs). Gueorguieva and Agresti (2001) used a Monte Carlo expectation-conditional maximization algorithm to fit joint models for clustered binary and continuous data. Focusing on Bayesian methods, Cowles et al. (1996) used the latent variable approach of Albert and Chib (1993) to propose a multivariate Bayesian tobit model with subject specific intercepts to model longitudinal ordinal compliance data. Noting issues of poor mixing and slow convergence, Cowles (1996) developed a more efficient sampling routine based on a Hastings-within-Gibbs step. One approach to improving sampling efficiency is to use a parameter expansion for data augmentation (PX-DA) routine (Liu and Wu 1999; Meng and Dyk 1999), which Talhouk et al. (2012) and Li et al. (2020) utilized for multivariate binary data. However, implementing a PX-DA routine is complicated by the hierarchical nature of our model, so we instead adopt the Cowles (1996) solution in our proposed framework. Li et al. (2016) developed a joint model for longitudinal continuous, binary, and ordinal events. In extensive simulation studies, Li et al. (2016) saw improved fits for their multivariate approach over univariate strategies when there was nonzero correlation between the subject specific effects of the various outcomes. In other related work, Teimourian et al. (2015) adopted a Bayesian framework to model ordinal and skewed continuous responses, while Ghasemzadeh et al. (2020) used a Bayesian quantile regression approach to jointly model continuous and ordinal data.

In our search for software to fit Bayesian hierarchical multivariate models with ordinal and continuous outcomes, the only R package we identified was MCMCglmm (Hadfield 2010). The MCMCglmm package has two notable limitations with regard to our proposed method. First, it cannot fit models with an autoregressive structure in the residual covariance matrix. Second, when using hierarchical models, the prior on the subject specific effects covariance matrix must be an inverse-Wishart. Since it is difficult to specify a default inverse-Wishart that has limited impact on posterior estimates, MCMCglmm allows users to specify an improper prior with negative degrees of freedom, but the posterior is not guaranteed to be proper in these settings.

3.2.1. Scaled inverse-Wishart priors

When working with Bayesian hierarchical models, it is common to assign an inverse-Wishart prior, such as $Σ_{α} \sim IW(I_{P}, P)$ where $Σ_{α}$ is $P \times P$, due to conjugacy. As described above, Schuurman et al. (2016) noted that this can have enormous influence on posterior estimates when variance terms in $Σ_{α}$ are close to 0. To overcome this issue, Gelman and Hill (2007) recommended the scaled inverse-Wishart (SIW) prior developed in O’Malley and Zaslavsky (2008), which uses the decomposition $Σ_{α} = \mathrm{Diag}(ξ)\, Q\, \mathrm{Diag}(ξ)$. The elements of $ξ$, a vector of scale parameters, and $Q$, an unscaled covariance matrix, are not identifiable, but the elements of $Σ_{α}$ are. For applications, Gelman and Hill (2007) assigned $Q \sim IW(Ψ = I_{P}, ν = P + 1)$ with a uniform prior on each scale parameter, $ξ_{m} \sim U(a_{ξ} = 0, b_{ξ} = 100)$ for $m \in \{1, \dots, P\}$. This implies a uniform distribution on the correlations and an extremely vague prior on the variances.
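A short sketch, with hypothetical values, of why correlations under the SIW decomposition are governed by $Q$ alone: entry $(r, c)$ of $Σ_{α}$ is $ξ_{r} Q_{rc} ξ_{c}$, and the scale factors cancel when covariances are converted to correlations. The code also illustrates the non-identifiability noted above, since rescaling $ξ$ by a constant $c$ and $Q$ by $1/c^{2}$ reproduces the same $Σ_{α}$. For simplicity, $Q$ is a fixed matrix here rather than an inverse-Wishart draw.

```python
import math
import random

def scaled_covariance(Q, xi):
    """Sigma = Diag(xi) Q Diag(xi), i.e. Sigma[r][c] = xi[r] * Q[r][c] * xi[c]."""
    P = len(xi)
    return [[xi[r] * Q[r][c] * xi[c] for c in range(P)] for r in range(P)]

def correlation(S):
    """Convert a covariance matrix into its correlation matrix."""
    P = len(S)
    return [[S[r][c] / math.sqrt(S[r][r] * S[c][c]) for c in range(P)] for r in range(P)]

# Hypothetical unscaled covariance; in the model Q would be an IW(I_P, P + 1) draw.
Q = [[2.0, 0.6], [0.6, 1.0]]

# Scale parameters from U(0, 100), following Gelman and Hill (2007).
rng = random.Random(1)
xi = [rng.uniform(0.0, 100.0) for _ in range(len(Q))]
Sigma = scaled_covariance(Q, xi)

# Non-identifiability: (c * xi, Q / c^2) gives exactly the same Sigma.
c = 3.0
Sigma_alt = scaled_covariance([[q / c**2 for q in row] for row in Q], [c * x for x in xi])
```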

3.3. Autoregressive models (mean structure)

Autoregressive models can incorporate lagged outcome values into the mean structure of current observations. These methods often result in overly complex implementations due to the “initial conditions problem”, which occurs when working with short panel or longitudinal data. With this in mind, we next detail the current state of autoregressive models for categorical outcomes before considering incorporation of serial autocorrelation through the covariance structure.

Models for binary outcomes with lagged components in the mean structure have a rich history in the econometrics literature and have traditionally relied on observed discrete outcomes as dependent variables. In the binary and ordinal situations, this leads to the state dependence models described in Heckman (1981b):

$$z_{il} = k \iff γ_{k-1} < y_{il} \leq γ_{k}, \qquad k \in \{1, \dots, K\} \tag{4}$$

$$y_{il} = ρ_{1} 1_{[z_{i,l-1} = 1]} + \dots + ρ_{K-1} 1_{[z_{i,l-1} = K-1]} + β^{T} x_{il} + α_{i} + ϵ_{il} \tag{5}$$

where $z_{il}$ is the observed $K$-level ordinal outcome for patient $i$ at their $l$th observation, $y_{il}$ is the corresponding latent outcome, with $z_{il} = k$ if and only if $y_{il} \in (γ_{k-1}, γ_{k}]$, $x_{il}$ is a $(p + 1) \times 1$ vector of person and time specific covariates, $α_{i}$ is a subject specific intercept that is assumed to be time invariant and independent of the covariates, and the $ϵ_{il}$ are assumed to be normally distributed with unit variance. The $ρ$ terms relate the discrete response at the previous observation to the current latent value. To model subjective wellbeing, Pudney (2008) developed an alternative formulation, termed a ‘Latent Auto-Regression Model’ (LAR), where dynamic feedback was captured through the latent outcomes:

$$z_{il} = k \iff γ_{k-1} < y_{il} \leq γ_{k}, \qquad k \in \{1, \dots, K\} \tag{6}$$

$$y_{il} = ρ y_{i,l-1} + β^{T} x_{il} + α_{i} + ϵ_{il} \tag{7}$$

Pudney (2008) specifically advocated for use of the LAR formulation when the true data generating mechanism is continuous, but outcomes are discretized due to limitations of the measurement process. Pudney (2008) used a simulated maximum likelihood approach to estimate model parameters, while Hasegawa (2009) and Stegmueller (2013) adopted Bayesian frameworks, relying on the latent variable model proposed in Albert and Chib (1993). Using a Bayesian multivariate approach, Steele and Grundy (2021) applied similar principles to develop a method for analyzing bivariate ordinal outcomes.
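Both the state dependence model in Eq. (5) and the LAR model in Eq. (7) observe only which interval between cutpoints a unit-variance latent variable falls into. The sketch below is an illustration rather than any of the cited authors' code. It shows the mapping in Eq. (4) and the resulting ordinal probit probabilities $P(z_{il} = k) = Φ(γ_{k} - μ_{il}) - Φ(γ_{k-1} - μ_{il})$. The mean follows Eq. (5) with hypothetical values, treating the top category as the reference level because Eq. (5) has no indicator for $K$.

```python
import bisect
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)  # unit-variance errors, as assumed for Eq. (5)

def category(y, cuts):
    """Return k such that gamma_{k-1} < y <= gamma_k (Eq. 4).

    `cuts` lists the finite cutpoints gamma_1 < ... < gamma_{K-1};
    gamma_0 = -inf and gamma_K = +inf are implicit."""
    return bisect.bisect_left(cuts, y) + 1

def category_probs(mu, cuts):
    """P(z = k) = Phi(gamma_k - mu) - Phi(gamma_{k-1} - mu) for k = 1..K."""
    cdf = [0.0] + [STD_NORMAL.cdf(g - mu) for g in cuts] + [1.0]
    return [cdf[k] - cdf[k - 1] for k in range(1, len(cdf))]

# Hypothetical Eq. (5) mean for K = 3: mu = rho_{z_prev} + beta^T x_il + alpha_i,
# with the top category K = 3 as the reference level (no indicator, so rho = 0).
rho = {1: -0.4, 2: 0.9, 3: 0.0}
cuts = [0.0, 1.0]
mu = rho[2] + 0.3 + 0.1  # previous category 2, beta^T x_il = 0.3, alpha_i = 0.1
probs = category_probs(mu, cuts)
```

`bisect_left` is used because the intervals are closed on the right: a latent value exactly equal to $γ_{k}$ belongs to category $k$.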

Implementation of these models is complicated by the ‘Initial Conditions Problem’, which is a deeply researched issue regarding panel and longitudinal data when there are a small number of observations per subject (Matyas and Sevestre 2008). To summarise, a subject’s initial latent value, $y_{i1}$, is rarely independent of their unobserved heterogeneity ($α_{i}$), mandating an expression for the initial observation $y_{i1} \mid x_{i1}, α_{i}$. However, we do not assume the first observation represents the start of the process and cannot typically derive a closed form expression for the marginal probability (marginal of their prior history) when using categorical (Matyas and Sevestre 2008) or multivariate data. Ignoring this problem can lead to severe bias in the parameter estimates (Cappellari and Jenkins 2008).

There are two common approaches to addressing the initial conditions problem: the ‘Heckman’ and ‘Wooldridge’ approaches. The former, presented in Heckman (1981a), relies on an approximation to $y_{i1} \mid x_{i1}, α_{i}$. The latter, detailed in Wooldridge (2005), conditions on the initial observations to model the random effects. The Wooldridge approach is not readily applicable to the LAR setting, so the Heckman solution was adopted in Pudney (2008), Stegmueller (2013), and Steele and Grundy (2021). In its simplest form for a univariate outcome, Heckman (1981a) suggested modeling the initial outcomes using different regression coefficients and a scalar multiple of the subject specific intercepts:

$$y_{i1} = δ^{T} x_{i1} + λ α_{i} + ϵ_{i1} \tag{8}$$

Here $δ$ is a $(p + 1) \times 1$ vector, $λ$ and $α_{i}$ are scalars, and the errors are once again restricted to have unit variance. Expansions of the initial covariates, summary measures of the non-initial covariates, and different covariates could also be included as predictors.
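The following sketch simulates one subject from the univariate LAR model. It draws the first latent value from the Heckman approximation in Eq. (8), later values from Eq. (7), and discretizes with Eq. (6). Every parameter value is hypothetical, and $β^{T} x_{il}$ and $δ^{T} x_{i1}$ are collapsed into precomputed numbers; the aim is only to show that the initial observation is generated differently from the rest.

```python
import bisect
import random

def simulate_lar(n_obs, rho, eta, delta_x, lam, alpha, cuts, rng):
    """Simulate one subject's latent path y and ordinal path z under Eqs. (6)-(8).

    eta[l] stands in for beta^T x_il (l >= 1) and delta_x for delta^T x_i1;
    eta[0] is unused because the first visit follows the Heckman equation."""
    y = [delta_x + lam * alpha + rng.gauss(0.0, 1.0)]  # Eq. (8): initial condition
    for l in range(1, n_obs):
        # Eq. (7): latent autoregression with unit-variance errors
        y.append(rho * y[-1] + eta[l] + alpha + rng.gauss(0.0, 1.0))
    z = [bisect.bisect_left(cuts, v) + 1 for v in y]  # Eq. (6): discretize
    return y, z

rng = random.Random(2024)
n_obs = 7                      # e.g. baseline plus six follow-up visits
alpha_i = rng.gauss(0.0, 0.5)  # hypothetical subject-specific intercept
y, z = simulate_lar(n_obs, rho=0.6, eta=[0.2] * n_obs, delta_x=0.5,
                    lam=1.2, alpha=alpha_i, cuts=[0.0, 1.0, 2.0], rng=rng)
```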

To extend LAR models to the multivariate setting, Steele and Grundy (2021) detailed a flexible framework that allowed for response vectors of any length and, when working with categorical outcomes, incorporated process dynamics through a latent variable approach. Let $y_{i l}$ be a $J \times 1$ vector of either observed or unobserved continuous responses. In the case of unobserved responses, these correspond to associated binary or ordinal response vectors ( $z_{i l}$ ) where

$$z_{ilj} = k \iff γ_{j,k-1} < y_{ilj} \leq γ_{jk}, \qquad k \in \{1, \dots, K_{j}\} \tag{9}$$

The dynamics of the process can then be incorporated through a transition matrix:

$$y_{il} = β^{T} x_{il} + M y_{i,l-1} + α_{i} + ϵ_{il}, \qquad ϵ_{il} \sim N(0, Σ) \tag{10}$$

where $β$ is a $(p + 1) \times J$ matrix, $M$ is a $J \times J$ matrix, and $α_{i}$ is a $J \times 1$ vector corresponding to a different random intercept for each outcome. When dealing with a continuous response vector, $Σ$ was an unconstrained covariance matrix, while the diagonal terms were restricted to be one in the categorical case. Steele and Grundy (2021) stated that the initial values are modeled using the Heckman approximation (Heckman 1981a), but did not discuss how this extends to the multivariate case. We thus assume the initial responses were specified as a linear combination of baseline covariates and the random effect associated with each outcome:

$$y_{i1j} = δ_{j}^{T} x_{i1} + λ_{j} α_{ij} + ϵ_{i1j}, \qquad j \in \{1, 2\} \tag{11}$$

In contrast, Alessie et al. (2004) extended the approximation to the multivariate setting by including a linear combination of the random effects in each regression equation:

$$y_{i1} = δ^{T} x_{i1} + Λ α_{i} + ϵ_{i1}, \qquad ϵ_{i1} \sim N(0, Σ_{1}) \tag{12}$$

where $δ$ is a $(p + 1) \times J$ matrix, $Λ$ is a $J \times J$ matrix, $α_{i}$ is a $J \times 1$ vector, and $Σ_{1}$ is allowed to have different correlations than $Σ$ .
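To make the difference between Eqs. (11) and (12) concrete, the sketch below computes the mean of the initial latent vector for a bivariate outcome under both specifications. With a diagonal $Λ$ whose entries are the $λ_{j}$, Eq. (12) reduces to Eq. (11), whereas off-diagonal entries let each initial response load on the other outcome's random effect. All numbers are hypothetical.

```python
def initial_mean(delta_x, Lam, alpha):
    """Mean of y_i1 under Eq. (12): delta^T x_i1 + Lambda alpha_i, a J-vector."""
    J = len(alpha)
    return [delta_x[j] + sum(Lam[j][m] * alpha[m] for m in range(J)) for j in range(J)]

delta_x = [0.4, -0.2]  # hypothetical values of delta_j^T x_i1, j = 1, 2
alpha_i = [0.3, 0.8]   # hypothetical random intercepts for the two outcomes
lam = [1.5, 0.7]

Lam_diag = [[lam[0], 0.0], [0.0, lam[1]]]  # Eq. (11): own random effect only
Lam_full = [[1.5, 0.2], [-0.4, 0.7]]       # Eq. (12): mixes both random effects

mean_11 = initial_mean(delta_x, Lam_diag, alpha_i)
mean_12 = initial_mean(delta_x, Lam_full, alpha_i)
```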

3.4. Multivariate covariance structure approaches

Because autoregressive approaches that model dependence through the mean structure require additional considerations due to the initial conditions problem, an alternative route is to encode serial dependence through the covariance structure. For these methods, the primary difficulty is in constructing a valid “cross-covariance” function that adequately specifies the relationship across outcomes. This general approach is applicable to spatial and higher dimensional problems, as well as to the special case of temporal covariance (which includes only one dimension over which covariance is defined). To embed our work in an appropriately general context, we next summarize some of the work highlighted by Genton and Kleiber (2015), which describes several such general approaches commonly used in the spatial literature.

As defined in Genton and Kleiber (2015), a cross-covariance matrix function is separable if

$$C_{ij}(s_{1}, s_{2}) = ρ(s_{1}, s_{2}) R_{ij} \tag{13}$$

where $C_{ij}(s_{1}, s_{2})$ is the covariance between outcome $i$ at location $s_{1}$ and outcome $j$ at location $s_{2}$. Note that ‘location’ can refer to a point in time in our longitudinal context. $ρ(\cdot)$ can be any valid correlation function, while $R$ is a covariance matrix that relates the outcomes and does not depend on location. The function is considered separable because neither $ρ$ nor $R$ is a function of both location and outcome.

These models, sometimes referred to as ‘intrinsic coregionalizations’ (Genton and Kleiber 2015), offer simplification because the joint covariance can be written as a Kronecker product. Wang and Fan (2010) used a frequentist version of this approach to formulate multivariate linear mixed models with autoregressive errors for application to longitudinal data, where the full covariance matrix for subject $i$ was specified through $Σ_{i} = R \otimes ρ(t_{i})$. By adding the AR component to the error structure, the authors saw improved predictive performance in both simulation studies and their application. Wang and Fan (2012) extended this to multivariate t linear mixed models using Bayesian methodology. Methods based on separable cross-covariance functions assume the same decay or range for all outcomes, so their scope of application is limited to cases where this assumption is reasonable.
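A minimal sketch of the separable structure $Σ_{i} = R \otimes ρ(t_{i})$, assuming for illustration an exponential temporal correlation with a hypothetical range parameter and a hypothetical $2 \times 2$ matrix $R$. For a response matrix whose rows are visits and whose columns are outcomes, stacking the columns makes each entry of $Σ_{i}$ exactly the product in Eq. (13).

```python
import math

def kron(A, B):
    """Kronecker product A (x) B for matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def exp_corr_matrix(times, phi):
    """Temporal correlation matrix with entries exp(-|t_l - t_l'| / phi)."""
    return [[math.exp(-abs(s - t) / phi) for t in times] for s in times]

R = [[1.0, 0.3], [0.3, 2.0]]               # hypothetical J x J covariance across outcomes
C = exp_corr_matrix([0.0, 1.0, 2.0], 2.0)  # n_i x n_i correlation across visits
Sigma_i = kron(R, C)                       # (J * n_i) x (J * n_i) joint covariance

def cov(j, l, j2, l2, n=3):
    """Cov(outcome j at visit l, outcome j2 at visit l2).

    Columns (outcomes) are stacked, so outcome j at visit l sits at index j * n + l."""
    return Sigma_i[j * n + l][j2 * n + l2]
```

Only $R$ (outcomes) and $ρ(t_{i})$ (time) need to be stored or inverted, which is the computational simplification referred to above.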

Relaxing the rigidity of the separable methods, the most common approach to defining a valid cross-covariance function is through linear combinations of univariate covariance functions, termed the ‘linear model of coregionalization’ (Genton and Kleiber 2015). These models have been broadly applied, especially in Bayesian contexts; for examples, see Schmidt and Gelfand (2003), Jin et al. (2007), and MacNab (2016). Overviews are given in Banerjee et al. (2015) and Cressie and Wikle (2011).

For the linear model of coregionalization, the cross-covariance function is constructed through a linear combination of valid stationary covariance functions:

$$C_{ij}(h) = \sum_{k=1}^{r} ρ_{k}(h) A_{ik} A_{jk} \tag{14}$$

where $h$ represents some measure of spatial distance between the two observations and $A$ is a full rank $p \times r$ matrix such that $R = AA^{T}$. We can let $1 \leq r \leq p$; however, typically $r = p$ and $A$ is a full rank $p \times p$ matrix. $A$ can be uniquely defined either through a Cholesky decomposition of $R$ (Schmidt and Gelfand 2003), or through a decomposition based on Givens angles and an ordering restriction of the associated eigenvalues (Kang and Cressie 2011). While the linear model of coregionalization offers flexibility, it comes at a computational cost since conjugacy with $R$ is lost and algorithms must rely on alternative sampling techniques.
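The sketch below evaluates Eq. (14) for two outcomes, assuming hypothetical exponential correlation functions with different ranges and a hypothetical lower-triangular (Cholesky-type) $A$. At $h = 0$ it recovers $R = AA^{T}$, and when every $ρ_{k}$ is the same function it collapses to the separable form $ρ(h) R$ of Eq. (13).

```python
import math

def lmc_cross_cov(h, A, corr_funcs):
    """C_ij(h) = sum_k rho_k(h) * A[i][k] * A[j][k], following Eq. (14)."""
    p, r = len(A), len(A[0])
    return [[sum(corr_funcs[k](h) * A[i][k] * A[j][k] for k in range(r))
             for j in range(p)] for i in range(p)]

def exp_corr(phi):
    """Exponential correlation function with (hypothetical) range parameter phi."""
    return lambda h: math.exp(-abs(h) / phi)

A = [[1.0, 0.0], [0.5, 1.2]]            # hypothetical lower-triangular factor
funcs = [exp_corr(1.0), exp_corr(4.0)]  # latent processes with different ranges

C0 = lmc_cross_cov(0.0, A, funcs)  # every rho_k(0) = 1, so C(0) = A A^T = R
C2 = lmc_cross_cov(2.0, A, funcs)  # non-separable: decay differs by outcome

# With a single shared correlation function, C(h) = rho(h) R (separable case).
C2_sep = lmc_cross_cov(2.0, A, [exp_corr(2.0)] * 2)
```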

3.5. Statement of contribution

Due to the improvement in predictive performance witnessed in Wang and Fan (2010), the computational simplifications offered by separable covariance functions, and the problems inherent in the mean structure focused approaches, we base our method on the model of intrinsic coregionalization. Explicitly, our primary contributions in this space are:

We develop a method based on the intrinsic model of coregionalization for longitudinal multivariate responses of mixed data type (ordinal and continuous). This extends the work of Wang and Fan (2010) and Wang and Fan (2012) to a nonlinear and mixed outcome type setting. It should be noted that we only consider the AR1 case, which is a special case of the more general formulation used in the referenced manuscripts.
Our methodology uses a scaled inverse-Wishart prior (O’Malley and Zaslavsky 2008) on the covariance matrix for the patient specific effects. This employs a less informative and more flexible prior than those used in Li et al. (2016) and MCMCglmm, but still guarantees a proper posterior.
We allow for residual correlation between the ordinal and continuous outcomes, an option not available in brms or in the work of Li et al. (2016). Combined with the scaled inverse-Wishart prior, our implementation is a novel multivariate method, even without the autoregressive component.

4. Model definition and software implementation

We now develop a model based on the intrinsic model of coregionalization, which we selected in large part based on the increased predictive performance witnessed in Wang and Fan (2010). Since there is high computational demand associated with ordinal regression problems, we opt for a separable covariance structure approach over the more flexible linear model of coregionalization. An alternative strategy could use autoregressive models with lagged outcomes in the mean structure, but these require additional considerations due to the initial conditions problem and were dramatically outperformed by random slope and intercept models when applied to our motivating dataset.

4.1. Model definition

Let $y_{il}^{o}$ be a $J_{o} \times 1$ vector of observed continuous responses at patient $i$’s $l$th observation, $l \in \{1, \dots, n_{i}\}$; the index $j$ is reserved for outcomes. Similarly, let $z_{il}$ be a $J_{u} \times 1$ vector of observed categorical responses, and let $y_{il}^{u}$ be an associated vector of continuous latent variables. We combine all continuous variables into a single vector of length $J = J_{u} + J_{o}$, $y_{il}^{T} = [(y_{il}^{u})^{T}\ (y_{il}^{o})^{T}]$, and stack these into patient specific matrices $Y_{i} = [y_{i1} \cdots y_{in_{i}}]^{T}$. $X_{i} = [x_{i1} \cdots x_{in_{i}}]^{T}$ is a corresponding $n_{i} \times (p + 1)$ matrix of subject and time specific covariates, which includes a column of ones. Additionally, let $t_{i}$ be an $n_{i} \times 1$ vector of times since baseline: if subject $i$ was observed at baseline and at years 1, 3, and 4 post baseline, then $t_{i} = [0\ 1\ 3\ 4]^{T}$. Finally, let $α_{i}$ be a $2 \times J$ matrix of subject specific intercepts and slopes, while $U_{i}$ is the corresponding $n_{i} \times 2$ design matrix for the patient specific effects. Utilizing the latent variable representation from Albert and Chib (1993) and the $\mathrm{vec}(\cdot)$ operator, which stacks the columns of a matrix into a single vector, we define our model as:

$$z_{ilj} = k \iff γ_{j,k-1} < y_{ilj} \leq γ_{jk}, \qquad k \in \{1, \dots, K_{j}\},\ j \in \{1, \dots, J_{u}\} \tag{15}$$

$$\mathrm{vec}(Y_{i}) \mid α_{i} \sim N\big(\mathrm{vec}(X_{i} β + U_{i} α_{i}),\ Σ_{i} = R \otimes C_{ρ}(t_{i})\big) \tag{16}$$

$$\mathrm{vec}(α_{i}) \sim N(0, Σ_{α}) \tag{17}$$

which implies that $Y_{i} \mid α_{i} \sim N_{n_{i} \times J}(X_{i} β + U_{i} α_{i}, R, C_{ρ}(t_{i}))$. Here $β$ is a $(p + 1) \times J$ matrix, and for each ordinal outcome we restrict the first cutpoint to be 0 and the second cutpoint to be 1. These constraints combine naturally with the likelihood to enforce an ordering restriction on the cutpoint parameters: $-\infty = γ_{j0} < γ_{j1} = 0 < γ_{j2} = 1 < \dots < γ_{j,K_{j}-1} < γ_{jK_{j}} = \infty$ for all $j \in \{1, \dots, J_{u}\}$. Fixing two cutpoints allows $R$ to be an unconstrained covariance matrix, but it excludes binary outcomes, which have only one finite cutpoint.
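Under the latent variable representation of Albert and Chib (1993), a Gibbs sampler can update each latent $y_{ilj}$ from a normal distribution truncated to the interval implied by the observed category in Eq. (15). The sketch below draws from such a truncated normal by inverse-CDF sampling, with the constraints $γ_{j1} = 0$ and $γ_{j2} = 1$. It is not the bmrarm implementation. The conditional mean and standard deviation are passed in as hypothetical numbers (in the model they would come from conditioning the multivariate normal in Eq. (16) on the remaining elements), and the cutpoint update (Cowles 1996) is not shown.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()

def draw_latent(z, mean, sd, gammas, rng):
    """Draw y ~ N(mean, sd^2) truncated to (gamma_{z-1}, gamma_z] by inverse-CDF sampling.

    gammas is the full cutpoint vector [gamma_0 = -inf, 0, 1, ..., gamma_K = +inf]."""
    lo, hi = gammas[z - 1], gammas[z]
    p_lo = 0.0 if lo == -math.inf else STD.cdf((lo - mean) / sd)
    p_hi = 1.0 if hi == math.inf else STD.cdf((hi - mean) / sd)
    u = rng.uniform(p_lo, p_hi)
    # Keep inv_cdf's argument strictly inside (0, 1). This simple method is
    # numerically fragile when the interval lies far in a tail.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return mean + sd * STD.inv_cdf(u)

gammas = [-math.inf, 0.0, 1.0, 2.3, math.inf]  # K = 4; gamma_3 = 2.3 is hypothetical
rng = random.Random(7)
draws = [draw_latent(3, mean=0.4, sd=0.8, gammas=gammas, rng=rng) for _ in range(1000)]
```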

$C_{ρ}(t_{i})$ is an AR1 correlation function where $C_{ρ}(t_{il}, t_{il'}) = ρ^{|t_{il} - t_{il'}|}$. This correlation function is a special case of the more general formulation used in Wang and Fan (2010). It is appropriate for equally spaced observations or observations that are unequally spaced due to missingness (Jones 2011). In principle, a (negative) exponential correlation function should be used for a truly continuous time AR1 process (Jones 2011). Here we consider a function meant for equally spaced observations, as this fits our motivating application and allows for negative correlation. Future work could expand our method for use with a variety of correlation functions that accommodate continuous time.
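The sketch below builds $C_{ρ}(t_{i})$ for the example times $t_{i} = [0\ 1\ 3\ 4]^{T}$ from above, treated as visit indices with one missed visit. It illustrates two points from the text. A negative $ρ$ yields alternating-sign correlations, which the power form can only express for integer gaps. For $ρ > 0$, the same values coincide with an exponential correlation $\exp(-|Δt|/φ)$ with $φ = -1/\log ρ$. The parameter values are hypothetical.

```python
import math

def ar1_corr(times, rho):
    """AR1 correlation matrix C[a][b] = rho ** |t_a - t_b| on a visit-index time scale.

    For rho < 0 the power is only real-valued for integer gaps, which is why this
    form suits equally spaced visits (possibly with some visits missing)."""
    n = len(times)
    C = [[1.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            gap = abs(times[a] - times[b])
            if rho < 0 and gap != int(gap):
                raise ValueError("negative rho requires integer time gaps")
            C[a][b] = rho ** gap
    return C

# Visit 2 is missing, but every gap is still an integer.
C_neg = ar1_corr([0, 1, 3, 4], -0.4)  # signs alternate with the gap

# For rho > 0, rho ** gap = exp(-gap / phi) with phi = -1 / log(rho).
rho, gap = 0.6, 1.5
phi = -1.0 / math.log(rho)
```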

We refer to our proposed model as a Bayesian mixed response autoregressive model (bmrarm), due to the autoregressive error component. The bmrarm is eventually compared to simpler implementations without the autoregressive covariance structure, in both univariate and multivariate formats, where the multivariate versions are fit using our software. Specifically, these simpler models use $Σ_{i} = Σ = R \otimes I$ and have either subject specific intercepts and slopes, or just subject specific intercepts. We refer to these models as bmr_slope and bmr_int, respectively.

4.2. Priors

We place priors that allow us to jointly draw $β$ and $R$ , where the prior on $β$ is a matrix-normal distribution as described in Ding and Cook (2014):

β \mid R \sim N_{(p + 1) \times J}(0, R, σ_{β}^{2} I_{p + 1}) \qquad (18)

R \sim \mathrm{IW}(I_{J}, J) \qquad (19)

and assign independent uniform priors to the latent cutpoint parameters

π(γ_{j}) \propto \prod_{k \in \{3, \dots, K_{j} - 1\}} 1_{[1 < γ_{jk} < δ]} \qquad (20)

where $δ$ is chosen a priori to be arbitrarily large. For an ordinal outcome with three levels, both finite cutpoints are fixed and no cutpoint prior is needed. Finally, we place a uniform prior on the autoregressive parameter, $ρ \sim U(-1, 1)$.

To complete the model specification, we must place a prior on the covariance matrix of the subject specific effects, $Σ_{α}$. A common default prior is $\mathrm{IW}(Ψ = I_{2J}, ν = 2J)$. However, when the variance terms in $Σ_{α}$ are close to zero and there are few subjects, this prior can be highly influential and produce substantial bias (Schuurman et al. 2016). An alternative is the data dependent prior discussed in Schuurman et al. (2016), but we prefer fully Bayesian approaches that do not use the data in both the prior and the likelihood. Instead, we use the scaled inverse-Wishart (SIW) prior (O'Malley and Zaslavsky 2008), which, as previously discussed, uses the decomposition $Σ_{α} = \mathrm{Diag}(ξ)\, Q\, \mathrm{Diag}(ξ)$.

For applications with the SIW prior, Gelman and Hill (2007) assigned $Q \sim \mathrm{IW}(Ψ = I, ν = 2J + 1)$ and $ξ_{m} \sim U(a_{ξ} = 0, b_{ξ} = 100)$ for $m \in \{1, \dots, 2J\}$. However, in our sampler the variance terms of $Σ_{α}$ tend to get stuck near 0 for extended periods when priors this vague are used. Assigning $ξ_{m} \sim U(0.2, 5)$ corrected this issue while still giving a more dispersed prior on the variances than $\mathrm{IW}(I, 2J)$, and the resulting prior implies a uniform distribution on the correlations (Fig. 2). Future implementations could explore the parameter expansion approaches discussed in Gelman et al. (2008), which address variance parameters getting stuck near zero and could allow a more flexible SIW prior.
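The claim that this SIW prior leaves the correlations uniform can be checked by simulation. The sketch below is an illustration under stated assumptions, not the package's sampler: it takes $Ψ = I$, $ν = 2J + 1 = 5$ with $J = 2$, and $ξ_{m} \sim U(0.2, 5)$, draws $Q \sim \mathrm{IW}(I, ν)$ by inverting a Wishart draw, forms $Σ_{α} = \mathrm{Diag}(ξ)\, Q\, \mathrm{Diag}(ξ)$, and computes the implied correlation $Σ_{α31} / \sqrt{Σ_{α11} Σ_{α33}}$ used in Fig. 2. It assumes NumPy, and the function names are hypothetical.

```python
import numpy as np


def draw_siw(n_draws, dim, nu, a=0.2, b=5.0, seed=None):
    """Draws of Sigma_alpha = Diag(xi) Q Diag(xi), Q ~ IW(I, nu), xi_m ~ U(a, b)."""
    rng = np.random.default_rng(seed)
    # A Wishart(nu, I) draw is X^T X with nu rows of N(0, I); its inverse is IW(I, nu)
    x = rng.standard_normal((n_draws, nu, dim))
    q = np.linalg.inv(np.swapaxes(x, 1, 2) @ x)
    xi = rng.uniform(a, b, size=(n_draws, dim))
    # (Diag(xi) Q Diag(xi))_{mn} = xi_m * Q_mn * xi_n
    return xi[:, :, None] * q * xi[:, None, :]


def implied_corr(sigma, m, n):
    """Correlation between effects m and n (0-based) for each draw."""
    return sigma[:, m, n] / np.sqrt(sigma[:, m, m] * sigma[:, n, n])


J = 2
sigma_alpha = draw_siw(20_000, dim=2 * J, nu=2 * J + 1, seed=1)
r31 = implied_corr(sigma_alpha, 2, 0)  # Sigma_alpha31 in the 1-based notation of Fig. 2
# A Uniform(-1, 1) variable has mean 0 and P(|r| < 0.5) = 0.5
print(round(r31.mean(), 2), round(np.mean(np.abs(r31) < 0.5), 2))
```

Because the $ξ$ scaling cancels in each correlation, the uniform shape comes entirely from $Q \sim \mathrm{IW}(I, 2J + 1)$; $ξ$ only changes the spread of the variances.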

Fig. 2 — (Left) Comparison of prior densities for $Σ_{α11}$ under the IW and SIW priors described above when $J = 2$ (the number of outcomes). (Right) Histogram of the implied prior on the correlations between the patient specific effects. The plot is based on $Σ_{α31} / \sqrt{Σ_{α11} Σ_{α33}}$, but all off-diagonal terms give similar results

To implement the SIW prior, we set $\mathrm{vec}(α_{i}) = \mathrm{Diag}(ξ)\, \mathrm{vec}(\tilde{α}_{i})$ and place the prior $\mathrm{vec}(\tilde{α}_{i}) \sim N(0, Q)$, which implies the previously noted prior on $\mathrm{vec}(α_{i})$ (Eq. 17). At each iteration we update $\tilde{α}_{i}$ and scale it to obtain draws of $α_{i}$. In general, we have found the sampler to be more stable when the algorithm is parameterized so that $Q$ and $ξ$ are conditionally independent, and we implement the model accordingly.

Derivation of the full conditional distributions, details on accommodating missing data, and the implementation of the full MCMC sampler are described in the supplemental information.

5. Forecasts

Simulated forecasts can be generated in two analytically equivalent but mechanically distinct ways. We start with the posterior predictive distribution $π(y_{\mathrm{new}} \mid Y_{\mathrm{old}})$, the predictive distribution for the unobserved values conditional on the observed values (Gelman et al. 2013):

π(y_{\mathrm{new}} \mid Y_{\mathrm{old}}) = \int_{θ} π(y_{\mathrm{new}}, θ \mid Y_{\mathrm{old}})\, dθ = \int_{θ} π(y_{\mathrm{new}} \mid θ, Y_{\mathrm{old}})\, π(θ \mid Y_{\mathrm{old}})\, dθ

The first integral suggests building a joint sampler for the unknown parameters and the new observations. In our software this is done by supplying the outcomes to be forecast as “NA” in the data frame passed to the function call, similar to JAGS (Plummer 2003). To obtain predictions for an ordinal outcome, users can discretize the $y_{\mathrm{new}}^{u}$ values returned in the “bmrarm” object using the draws of the cutpoint parameters. The second integral suggests that, once the sampler has converged to the posterior, the posterior samples can be used to simulate new observations.

Suppose we want to use the second approach to forecast a continuous response for subject $i$ at time $t_{\mathrm{new}}$, where the model is based on a single ordinal and a single continuous response. To simplify notation, let $θ_{i} = \{β, R, ρ, α_{i}, y_{i}^{u}, γ\}$, and suppose we have draws $θ_{i}^{(s)}$, $s \in \{1, \dots, S\}$, from the posterior distribution $π(θ_{i} \mid z, Y^{o})$. Each set of parameters $θ_{i}^{(s)}$ can be used to generate a prediction from $π(y_{il_{\mathrm{new}}}^{o} \mid θ_{i}^{(s)}, Y_{i}^{o})$, a univariate normal distribution obtained by conditioning a multivariate normal on the (previously) observed continuous and latent outcomes. These simulated predictions can then be summarized with sample moments and quantiles, and they also yield an estimate of the posterior predictive distribution:

π(y_{il_{\mathrm{new}}}^{o} \mid z, Y^{o}) \approx \frac{1}{S} \sum_{s = 1}^{S} π(y_{il_{\mathrm{new}}}^{o} \mid θ_{i}^{(s)}, Y_{i}^{o}), \qquad θ_{i}^{(s)} \sim π(θ_{i} \mid z, Y^{o})
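Each term $π(y_{il_{\mathrm{new}}}^{o} \mid θ_{i}^{(s)}, Y_{i}^{o})$ is the standard conditional of a multivariate normal. The sketch below computes that conditional and averages normal densities over draws to approximate the posterior predictive density above. It assumes NumPy, the function names are hypothetical, and the joint mean and covariance of the subject's observed, latent, and new values are taken as given rather than assembled from the model.

```python
import numpy as np


def conditional_normal(mu, sigma, obs_idx, new_idx, y_obs):
    """Mean and covariance of y[new_idx] given y[obs_idx] = y_obs for y ~ N(mu, sigma)."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    s_oo = sigma[np.ix_(obs_idx, obs_idx)]
    s_no = sigma[np.ix_(new_idx, obs_idx)]
    s_nn = sigma[np.ix_(new_idx, new_idx)]
    weights = np.linalg.solve(s_oo, s_no.T).T  # Sigma_no Sigma_oo^{-1}
    resid = np.asarray(y_obs, dtype=float) - mu[obs_idx]
    return mu[new_idx] + weights @ resid, s_nn - weights @ s_no.T


def predictive_density(y, cond_means, cond_vars):
    """Monte Carlo estimate: average of N(cond_mean_s, cond_var_s) densities at y."""
    m = np.asarray(cond_means, dtype=float)
    v = np.asarray(cond_vars, dtype=float)
    return np.mean(np.exp(-0.5 * (y - m) ** 2 / v) / np.sqrt(2 * np.pi * v))


# Toy bivariate example: correlation 0.5, first coordinate observed at 2.0
mean, cov = conditional_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], [0], [1], [2.0])
print(mean, cov)  # [1.] [[0.75]]
```

In a full implementation, `mu` and `sigma` for draw $s$ would come from $θ_{i}^{(s)}$ (the mean $\mathrm{vec}(X_{i}β + U_{i}α_{i})$ and covariance $R \otimes C_{ρ}(t_{i})$, extended to include $t_{\mathrm{new}}$), and one forecast would be drawn from each conditional.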

The ordinal case is slightly more complicated. The posterior predictive distribution is:

\begin{aligned}
π(z_{il_{\mathrm{new}}} = k \mid z, Y^{o}) &= \int_{θ_{i}} π(z_{il_{\mathrm{new}}} = k, θ_{i} \mid z, Y^{o})\, dθ_{i} \\
&= \int_{θ_{i}} \int_{y_{il_{\mathrm{new}}}^{u}} π(z_{il_{\mathrm{new}}} = k \mid y_{il_{\mathrm{new}}}^{u}, θ_{i})\, π(y_{il_{\mathrm{new}}}^{u} \mid θ_{i}, Y_{i}^{o})\, π(θ_{i} \mid z, Y^{o})\, dy_{il_{\mathrm{new}}}^{u}\, dθ_{i} \\
&= \int_{θ_{i}} \int_{y_{il_{\mathrm{new}}}^{u}} 1_{[γ_{k-1} < y_{il_{\mathrm{new}}}^{u} \leq γ_{k}]}\, π(y_{il_{\mathrm{new}}}^{u} \mid θ_{i}, Y_{i}^{o})\, π(θ_{i} \mid z, Y^{o})\, dy_{il_{\mathrm{new}}}^{u}\, dθ_{i} \\
&\approx \frac{1}{S} \sum_{s = 1}^{S} 1_{[γ_{k-1}^{(s)} < y_{il_{\mathrm{new}}}^{u,(s)} \leq γ_{k}^{(s)}]}, \qquad \{y_{il_{\mathrm{new}}}^{u}, θ_{i}\}^{(s)} \sim π(y_{il_{\mathrm{new}}}^{u} \mid θ_{i}, Y_{i}^{o})\, π(θ_{i} \mid z, Y^{o})
\end{aligned}

This implies that each set of posterior draws ( $θ_{i}^{(s)}$ ) can be used to generate a latent outcome using $π (y_{i l_{n e w}}^{u} ∣ θ_{i}^{(s)}, Y_{i}^{o})$ , and then discretized using $γ^{(s)}$ to obtain ordinal valued predictions. As in the continuous case, $π (y_{i l_{n e w}}^{u} ∣ θ_{i}^{(s)}, Y_{i}^{o})$ is a univariate normal distribution that results from conditioning a multivariate normal on the (previously) observed continuous and latent outcomes.
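Given paired draws of the latent forecast $y_{il_{\mathrm{new}}}^{u,(s)}$ and the free cutpoints $γ^{(s)}$, the Monte Carlo estimate in the last line of the ordinal derivation is just a per-category proportion. A minimal sketch, assuming NumPy and the cutpoint constraints $γ_{1} = 0$, $γ_{2} = 1$ (the function name is hypothetical):

```python
import numpy as np


def ordinal_forecast_probs(y_latent_draws, free_cutpoint_draws, K):
    """Estimate P(z_new = k), k = 1..K, from S paired draws (hypothetical helper).

    y_latent_draws: shape (S,), one latent forecast per posterior draw.
    free_cutpoint_draws: shape (S, K - 3), draws of gamma_3, ..., gamma_{K-1}.
    """
    y = np.asarray(y_latent_draws, dtype=float)
    S = y.shape[0]
    free = np.asarray(free_cutpoint_draws, dtype=float).reshape(S, K - 3)
    finite = np.column_stack([np.zeros(S), np.ones(S), free])  # gamma_1..gamma_{K-1}
    # z = k iff exactly k - 1 finite cutpoints lie strictly below y,
    # i.e. gamma_{k-1} < y <= gamma_k under draw s's own cutpoints
    z = 1 + np.sum(y[:, None] > finite, axis=1)
    return np.bincount(z, minlength=K + 1)[1:] / S


probs = ordinal_forecast_probs([-1.0, 0.5, 1.2, 1.7, 3.0], [[1.5, 1.9]] * 5, K=5)
print(probs)  # [0.2 0.2 0.2 0.2 0.2]
```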

6. Simulation studies

We now present several simulation studies where data generation was based on our motivating application. The first study assessed parameter recovery and convergence, the second evaluated forecast accuracy, and the third analyzed methods for model selection. bmrarm and bmr_slope with a single ordinal outcome are defined using:

z_{il} = k \Leftrightarrow γ_{k-1} < y_{il1} \leq γ_{k}, \quad k \in \{1, \dots, 5\}; \qquad \mathrm{vec}(Y_{i}) = \mathrm{vec}(X_{i} β + U_{i} α_{i}) + ϵ_{i}, \quad ϵ_{i} \sim N(0, Σ_{i})

where $Σ_{i} = R \otimes C_{ρ}(t_{i})$ for bmrarm and $Σ_{i} = Σ = R \otimes I$ for bmr_slope. For the design matrices, we used $X_{i} = U_{i} = [1_{n_{i}}, t_{i}]$, where $t_{i}$ is an $n_{i} \times 1$ vector of times since baseline for subject $i$.

In each of the following simulation studies we mirrored the leishmaniosis data and generated samples from 48 subjects. For each subject, we generated two continuous outcomes and discretized one of them into an ordinal variable with five levels. Subjects had 4, 5, 6, or 7 (vector valued) responses with probabilities 0.042, 0.042, 0.083, and 0.833, respectively. In line with the actual data, the ordinal outcomes at the first follow-up were discarded for all subjects. We also randomly set nine of the continuous outcomes and four additional ordinal outcomes to missing. True values for each of the parameters can be found in Table 3. Because the subject specific effects were randomly generated for each dataset from $Σ_{α}$, we report only their 95% CI coverage. The bmrarm default hyperparameters were used in the simulation studies: $a_{ξ} = 0.2$, $b_{ξ} = 5$, $Ψ = I$, $ν = 5$, $σ_{β}^{2} = 10^{5}$, and $δ = 10^{4}$.

Table 3.

Parameter recovery based on 400 simulated datasets

Param	Truth	bmrarm (true model) Mean	bmrarm SD	bmrarm Cover	bmr_slope Mean	bmr_slope SD	bmr_slope Cover
$β_{11}$	0.50	0.494	0.112	0.958	0.491	0.112	0.948
$β_{21}$	0.18	0.193	0.049	0.945	0.196	0.049	0.940
$β_{12}$	0.05	0.052	0.078	0.945	0.052	0.079	0.942
$β_{22}$	0.10	0.100	0.033	0.955	0.100	0.033	0.948
$Σ_{11}$	0.30	0.384	0.096	0.890	0.261	0.056	0.885
$Σ_{21}$	0.02	0.033	0.024	0.970	0.019	0.015	0.958
$Σ_{22}$	0.13	0.155	0.031	0.908	0.101	0.010	0.230
$Σ_{α 11}$	0.22	0.238	0.140	0.960	0.397	0.151	0.720
$Σ_{α 22}$	0.18	0.174	0.071	0.948	0.255	0.067	0.750
$Σ_{α 33}$	0.04	0.044	0.012	0.955	0.049	0.012	0.892
$Σ_{α 44}$	0.05	0.069	0.029	0.928	0.082	0.030	0.788
$γ_{3}$	1.50	1.548	0.093	0.922	1.555	0.093	0.920
$γ_{4}$	1.90	1.989	0.142	0.915	2.003	0.141	0.890
$Σ_{α}$ Covariances				0.973			0.957
$ρ$	0.35	0.408	0.099	0.922
Overall				0.944			0.902
Random Effects				0.943			0.905
Overall w/o RE				0.948			0.871


Truth = true value of the parameter, Mean = mean of the posterior means, SD = mean of the posterior standard deviations, Cover = the empirical 95% CI coverage, w/o RE = without random effects

6.1. Simulation: convergence and recovery

For the first set of studies we generated data using bmrarm ($Σ_{i} = R \otimes C_{ρ}(t_{i})$) with $ρ = 0.35$ and $C_{ρ}(t_{i})_{ll'} = ρ^{|t_{il} - t_{il'}|}$. We then assessed convergence and parameter recovery for both bmrarm and bmr_slope.

6.1.1. Convergence assessment

For a randomly selected dataset, we ran four separate chains with different starting values for the cutpoint parameters. The latent values were initialized at −0.5 when $z_{il} = 1$ and at $γ_{K-1} + 0.5$ when $z_{il} = K$; otherwise they were set to the midpoint of the two cutpoints bounding them. The remaining parameters were initialized at 0 or 1 by default, depending on whether they were constrained to be positive. For each chain we obtained 25,000 draws, of which 5,000 were discarded as burn-in. We thinned the samples by 5, primarily to reduce the size of the stored objects. Five of the more difficult to sample parameters $(γ_{3}, γ_{4}, Σ_{11}, Σ_{α11}, Σ_{α22})$ were assessed visually (Figs. 3 and 4). The primary issue we saw was slow mixing of $Σ_{α11}$ and $Σ_{α22}$, which we accounted for by taking a large number of samples; all effective sample sizes were above 200.

Fig. 3 — Trace and density plots for the cutpoint parameters corresponding to the latent outcome. Data comes from the randomly selected replicate dataset

Fig. 4 — Trace and density plots for the residual variance and subject specific intercept variance corresponding to the latent outcome

We also report posterior summaries for the five parameters for each chain (Table 2), along with the Gelman-Rubin diagnostic (Gelman and Rubin 1992), which was calculated using the coda package (Plummer et al. 2006); they suggest adequate convergence. While not reported in the table, the Gelman-Rubin diagnostic was applied to all parameters, including the patient specific effects and the latent outcomes, of which the maximum upper CI was 1.014.
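For readers unfamiliar with the diagnostic, a minimal sketch of the basic Gelman-Rubin potential scale reduction factor for one scalar parameter is shown below. It assumes NumPy, and the function name is hypothetical. It returns only the point estimate $\hat{R}$ and omits the degrees-of-freedom correction and upper confidence limit that coda's `gelman.diag()` reports, so it is not a drop-in replacement for the values in Table 2.

```python
import numpy as np


def gelman_rubin(chains):
    """Point estimate of R-hat for an (m chains) x (n draws) array."""
    x = np.asarray(chains, dtype=float)
    _, n = x.shape
    within = x.var(axis=1, ddof=1).mean()           # W: mean within-chain variance
    between = n * x.mean(axis=1).var(ddof=1)        # B: between-chain variance
    var_plus = (n - 1) / n * within + between / n   # pooled estimate of posterior variance
    return np.sqrt(var_plus / within)


rng = np.random.default_rng(0)
mixed = rng.standard_normal((4, 4000))                   # four chains, same target
stuck = mixed + np.array([[0.0], [0.0], [0.0], [3.0]])   # one chain offset by 3
print(round(gelman_rubin(mixed), 3), round(gelman_rubin(stuck), 2))
```

Values near 1 indicate that between-chain and within-chain variability agree; the offset chain inflates $\hat{R}$ well above the usual 1.1 threshold.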

Table 2.

Posterior estimates and convergence diagnostics for the cutpoint parameters and the variance term corresponding to the ordinal outcome

Param	Chain	Mean	SD	Lower 95	Upper 95	Eff	Gelman-Rubin
$γ_{3}$	1	1.42	0.077	1.29	1.59	510.21	1.006
	2	1.43	0.074	1.30	1.59	531.06
	3	1.43	0.077	1.29	1.59	362.99
	4	1.43	0.077	1.29	1.60	527.56
$γ_{4}$	1	1.72	0.111	1.53	1.96	405.19	1.004
	2	1.73	0.107	1.54	1.95	399.55
	3	1.72	0.108	1.53	1.94	305.34
	4	1.73	0.109	1.54	1.96	355.01
$Σ_{11}$	1	0.29	0.065	0.18	0.43	401.08	1.004
	2	0.29	0.061	0.19	0.43	452.52
	3	0.29	0.064	0.18	0.43	444.41
	4	0.29	0.063	0.19	0.43	453.50
$Σ_{α 11}$	1	0.10	0.074	0.01	0.29	715.69	1.002
	2	0.10	0.077	0.01	0.29	648.98
	3	0.10	0.075	0.01	0.29	588.83
	4	0.10	0.078	0.01	0.30	494.28
$Σ_{α 22}$	1	0.06	0.024	0.02	0.12	233.61	1.008
	2	0.06	0.023	0.03	0.12	221.16
	3	0.06	0.024	0.02	0.12	229.42
	4	0.06	0.026	0.03	0.13	228.67


Mean = posterior mean, SD = posterior standard deviation, Lower 95 = lower bound of the 95% credible interval, Upper 95 = upper bound of the 95% credible interval, Eff = effective sample size, Gelman-Rubin = upper CI from the Gelman-Rubin diagnostic

6.1.2. Parameter recovery

Since our visual and metric based convergence assessment revealed no notable issues, we evaluated bmrarm posterior estimates from the 400 simulated datasets. Our primary interest was in the mean of the posterior means and the coverage of the 95% credible intervals. For each dataset, we took 25,000 draws, of which 5,000 were discarded as burn-in and thinned to keep one of every five iterations, primarily for space reduction. The median run time for these simulations was 24.8 min; all simulations and the application were run on the University of Iowa High Performance Computing system.

Using the bmrarm package we fit bmr_slope to the simulated datasets drawing 10,000 samples, discarding 2,000 as burn-in and thinning by 2. On average, these models took 3.4 min. There is room for improvement in the speed of simpler models as the code was written for the autoregressive case and could cater more directly to implementations without an autoregressive term.

When data were generated from, and fit with, bmrarm, the overall 95% credible interval (CI) coverage for all unknown parameters (excluding the latent outcomes) was 0.944 (Table 3). The estimates of the latent cutpoints ($γ_{3}$ and $γ_{4}$), the autoregressive term ($ρ$), and the variance term corresponding to the latent outcome ($Σ_{11}$) were biased upward, but still had reasonable coverage (≥ 0.890). bmr_slope, which implicitly fixes $ρ = 0$, compensated with inflated estimates of the variance terms of $Σ_{α}$, along with poor coverage and downward bias for $Σ_{22}$ (Table 3).

6.2. Simulation: forecast quality

Our primary analysis goal was to produce accurate short-term forecasts of disease progression. Thus, it was of interest to assess out-of-sample predictive performance. To compare the forecasting ability of bmrarm to bmr_int and bmr_slope, we used the 400 datasets that were generated using bmrarm ( $ρ = 0.35$ ). As a reminder, bmr_int is akin to bmr_slope, except $U_{i} = 1_{n_{i}}$ (i.e. slopes are not subject specific). For each dataset and subject, we drew 4 additional observations from the appropriate multivariate normal distribution that conditioned on their previous responses. Thus, all observations for a subject can be seen as a draw from a (joint) multivariate distribution, where the final 4 observations were held out when fitting the models and were used for prediction evaluation.

For the ordinal outcome, forecasts were assessed with the (discrete) ranked probability score (RPS; Epstein 1969). To introduce RPS, let $p$ be a $3 \times 1$ vector of forecast probabilities for a three level ordinal outcome, such as low, medium, and high. Additionally, let $o = [1, 0, 0]^{T}$ be the vector indicating the true outcome, which is ‘low’ in this example. We define the RPS for a single forecast as:

RPS = \frac{1}{K - 1} \sum_{m = 1}^{K} \Big[\Big(\sum_{k = 1}^{m} p_{k}\Big) - \Big(\sum_{k = 1}^{m} o_{k}\Big)\Big]^{2} \qquad (21)

where $K$, which is three in our example, is the number of levels of the ordinal outcome. RPS is a strictly proper scoring rule, meaning its expected value is uniquely minimized by using the true probability distribution for $p$ (Wilks 2005). RPS is bounded between 0 and 1, with 0 being a perfect score, and it accounts for the ordering by penalizing forecasts more heavily as probability is shifted to categories further from the true outcome. For example, with a true outcome of ‘low’ and $p = [0.2, 0.7, 0.1]^{T}$, the RPS is 0.325. When $p = [0.2, 0.1, 0.7]^{T}$, the RPS is 0.565. Although both forecasts assign the same probability to the correct outcome, the latter is penalized for assigning a large probability (0.7) to the ‘high’ category.
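Eq. (21) translates directly into code. The sketch below reproduces the two worked examples and averages the score over several forecasts. It assumes NumPy, and the function names are hypothetical.

```python
import numpy as np


def rps(p, o):
    """Ranked probability score for forecast probabilities p and one-hot outcome o.

    Works on a single forecast (shape (K,)) or a batch (shape (n, K)).
    """
    p = np.asarray(p, dtype=float)
    o = np.asarray(o, dtype=float)
    K = p.shape[-1]
    cum_diff = np.cumsum(p, axis=-1) - np.cumsum(o, axis=-1)
    return np.sum(cum_diff ** 2, axis=-1) / (K - 1)


def one_hot(category, K):
    """Indicator vector for a 1-based ordinal category."""
    o = np.zeros(K)
    o[category - 1] = 1.0
    return o


truth = one_hot(1, 3)                              # true outcome is 'low'
print(round(rps([0.2, 0.7, 0.1], truth), 3))       # 0.325
print(round(rps([0.2, 0.1, 0.7], truth), 3))       # 0.565
# Mean score over several forecasts
forecasts = np.array([[0.2, 0.7, 0.1], [0.2, 0.1, 0.7]])
print(round(rps(forecasts, np.vstack([truth, truth])).mean(), 3))  # 0.445
```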

As seen in Table 4, bmrarm offered the lowest RPS at all four time points, indicating it had the highest predictive power for the ordinal outcome. RPS for bmrarm was closely followed by that of bmr_slope, while bmr_int performed substantially worse. For this work, we calculated RPS using the rps function from the verification package (NCAR 2015), which took the mean of the individual RPS scores over all subjects.

Table 4.

Evaluation of 1–4 step ahead forecasts for bmrarm and two baseline models

Outcome	Metric	Steps ahead	bmr_int	bmr_slope	bmrarm
Ordinal	RPS	1	0.122	0.084	0.082
Ordinal	RPS	2	0.129	0.089	0.088
Ordinal	RPS	3	0.137	0.090	0.088
Ordinal	RPS	4	0.146	0.088	0.087
Continuous	RMSE	1	0.896	0.442	0.427
Continuous	RMSE	2	1.079	0.533	0.525
Continuous	RMSE	3	1.267	0.604	0.594
Continuous	RMSE	4	1.462	0.669	0.658
Continuous	Coverage	1	0.792	0.932	0.953
Continuous	Coverage	2	0.706	0.901	0.956
Continuous	Coverage	3	0.626	0.889	0.958
Continuous	Coverage	4	0.566	0.883	0.963


RMSE = root mean square error, coverage = 95% prediction interval coverage, RPS = ranked probability score

For the continuous outcome, performance was quantified by the root mean squared error (RMSE) of prediction, using the mean of the draws from our posterior predictive distribution as the point estimate. We also evaluated the proportion of 95% prediction intervals that contained the true outcome. As seen in Table 4, bmrarm had the lowest RMSE at all four time points, and its prediction interval coverage stayed close to the nominal 95% (0.953 to 0.963), whereas coverage fell below nominal for bmr_slope (0.883 to 0.932) and especially for bmr_int (0.566 to 0.792).
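Both continuous-outcome metrics can be computed from a matrix of posterior predictive draws. A minimal sketch, assuming NumPy and equal-tailed 95% intervals taken from the empirical quantiles of the draws (the interval construction and the function name are our assumptions):

```python
import numpy as np


def forecast_metrics(draws, y_true, level=0.95):
    """RMSE of the posterior predictive mean and empirical interval coverage.

    draws: shape (S, n), S predictive draws for each of n held-out values.
    y_true: shape (n,), the held-out observations.
    """
    draws = np.asarray(draws, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    point = draws.mean(axis=0)                        # mean-based point forecast
    rmse = np.sqrt(np.mean((point - y_true) ** 2))
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(draws, [tail, 1.0 - tail], axis=0)
    coverage = np.mean((y_true >= lower) & (y_true <= upper))
    return rmse, coverage


# Two held-out values sharing the same 101 predictive draws on [-1, 1]
draws = np.tile(np.linspace(-1.0, 1.0, 101)[:, None], (1, 2))
rmse, cov = forecast_metrics(draws, [0.0, 5.0])
print(round(rmse, 4), cov)  # 3.5355 0.5
```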

Using the same randomly selected dataset as Sect. 6.1.1, we visualized 1–4 step ahead point estimates (means) for all three models for nine canine patients (Fig. 5). The model without subject specific slopes (bmr_int) did a poor job of matching patient trajectories. On the other hand, while bmrarm forecasts are pulled towards a patient’s most recent observation, there is sizable overlap between the mean predictions of bmr_slope and bmrarm. Of note, forecasts were generated using draws from multivariate normal posterior predictive distributions. Because the forecasts are simulated and the number of posterior samples is relatively small (4000), the mean prediction lines do not appear perfectly linear (Fig. 5).

Fig. 5 — 1–4 step ahead mean based point estimates for 9 sample canine patients, where lines are colored according to the model. Black circles indicate observations that were used to fit the model, while the black triangles were used to evaluate predictions

6.3. Simulation: model selection

With a single dataset, a standard way to compare models and assess their out of sample predictive accuracy is through cross-validation or information criteria, which can be seen as approximations to different types of cross-validation (Gelman et al. 2014). Cross-validation is a more direct form of assessment, but requires refitting the models. Given the potentially infeasible computational burden of refitting our model, we opted for the deviance information criterion (DIC) defined in Spiegelhalter et al. (2002).

When using hierarchical models, the likelihood can be defined in more than one way, and several forms of DIC have been developed. The selection criterion has a different “focus” depending on the form of the likelihood used. For example, Spiegelhalter et al. (2002) defined the hierarchical model $p(y, α, ψ) = p(y \mid α)\, p(α \mid ψ)\, p(ψ)$ and considered two possible forms of the likelihood: $p(y \mid α)$ with prior $p(α) = \int_{ψ} p(α \mid ψ) p(ψ)\, dψ$, or $p(y \mid ψ) = \int_{α} p(y \mid α) p(α \mid ψ)\, dα$ with prior $p(ψ)$. In the context of mixed effects models, we can treat $α$ as the subject specific effects, and we refer to DIC based on $p(y \mid α)$ as conditional DIC (cDIC) and DIC based on $p(y \mid ψ)$ as marginal DIC (mDIC).

Chan and Grant (2016) explored cDIC and mDIC (which they refer to as the observed-data DIC) in the context of latent variable models, with particular interest in stochastic volatility models. Through simulation studies, the authors showed that cDIC favors overfitted models, while mDIC performs well. cDIC is straightforward to calculate and readily available in commonly used Bayesian software, whereas mDIC is often difficult to calculate, which limits its adoption. Chan and Grant (2016) developed an importance sampling algorithm to calculate mDIC.

6.3.1. Likelihood and DIC

To simplify notation for this section, let $Y$ be the matrix of responses. To define DIC, let $D(θ) = -2 \log p(Y \mid θ)$ be the deviance and $\bar{θ}$ the posterior mean of the parameters. The effective number of parameters is then $p_{D} = E_{θ \mid Y}[D(θ)] - D(\bar{θ})$ and $\mathrm{DIC} = D(\bar{θ}) + 2 p_{D}$, where $E_{θ \mid Y}[D(θ)]$ is estimated by $\overline{D(θ)}$, the average deviance over the posterior draws. DIC trades off goodness of fit against complexity, with lower values indicating a better model. $p(Y \mid θ)$ is the joint likelihood over all subjects and observations, which factorizes as $\prod_{i = 1}^{N} p(Y_{i} \mid θ)$. When $ρ \neq 0$, a subject's observations are not conditionally independent given the subject specific effects, so the likelihood cannot be factorized further.
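The DIC recipe above needs only the deviance evaluated at each posterior draw and at the posterior mean. The sketch below applies it to a toy model, $y_{k} \sim N(μ, 1)$ with a flat prior so that $μ \mid y \sim N(\bar{y}, 1/n)$; with one free parameter, $p_{D}$ should come out close to 1. It assumes NumPy, the function names are hypothetical, and in bmrarm the deviance would instead come from the likelihood contributions defined in the next subsection.

```python
import numpy as np


def dic(deviance_draws, deviance_at_posterior_mean):
    """Return (DIC, p_D) with p_D = mean deviance - deviance at the posterior mean."""
    d_bar = float(np.mean(deviance_draws))
    p_d = d_bar - deviance_at_posterior_mean
    return deviance_at_posterior_mean + 2.0 * p_d, p_d


def normal_deviance(mu, y):
    """-2 log-likelihood of y under N(mu, 1), vectorized over an array of mu."""
    mu = np.asarray(mu, dtype=float)[..., None]
    return np.sum((y - mu) ** 2 + np.log(2 * np.pi), axis=-1)


rng = np.random.default_rng(42)
y = rng.normal(1.0, 1.0, size=50)                                    # toy data
mu_draws = rng.normal(y.mean(), 1 / np.sqrt(y.size), size=20_000)    # exact posterior draws
dic_value, p_d = dic(normal_deviance(mu_draws, y), normal_deviance(mu_draws.mean(), y))
print(round(p_d, 2))  # close to 1: one free parameter
```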

6.3.2. Likelihoods

We now formally define the likelihoods and DIC calculations used in our model selection. Focusing on the continuous outcomes (observed and unobserved), our model is

\mathrm{vec}(Y_{i}) = \mathrm{vec}(X_{i} β) + (I \otimes U_{i})\, \mathrm{vec}(α_{i}) + ϵ_{i} \qquad (22)

The model can be formulated in (at least) two ways. The first is as a mixed model where we condition on the subject specific effects

\mathrm{vec}(Y_{i}) \sim N\big(\mathrm{vec}(X_{i} β) + (I \otimes U_{i})\, \mathrm{vec}(α_{i}),\; Σ_{i}\big) \qquad (23)

and we refer to $f(Y_{i} \mid β, α_{i}, Σ_{i})$ as the ‘continuous outcome conditional likelihood’, the likelihood contribution from the $i$th subject. Alternatively, we can view this as a linear model with additional structure in the residual covariance

\mathrm{vec}(Y_{i}) \sim N\big(\mathrm{vec}(X_{i} β),\; (I \otimes U_{i})\, Σ_{α}\, (I \otimes U_{i})^{T} + Σ_{i}\big) \qquad (24)

and we refer to $f (Y_{i} ∣ β, Σ_{α}, Σ_{i})$ as the ‘continuous outcome marginalized likelihood’. This relies on a general result that occurs when a normal likelihood is combined with a normal prior on the subject specific effects; it can be confirmed by utilizing the Woodbury matrix identity (Woodbury 1950) to solve

f (Y_{i} ∣ β, Σ_{α}, Σ_{i}) = \int_{α_{i}} f (Y_{i} ∣ β, α_{i}, Σ_{i}) f (α_{i} ∣ Σ_{α}) d α_{i}
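The marginalized covariance in Eq. (24) can also be checked numerically: simulate $\mathrm{vec}(α_{i}) \sim N(0, Σ_{α})$ and $ϵ_{i} \sim N(0, Σ_{i})$, form $(I \otimes U_{i})\, \mathrm{vec}(α_{i}) + ϵ_{i}$, and compare its empirical covariance with the closed form. The sketch below does this under assumed illustrative values: it assumes NumPy, uses a diagonal $Σ_{α}$ and the $R$ and $ρ$ values from the Truth column of Table 3 purely for illustration, and the function name is hypothetical.

```python
import numpy as np


def marginal_cov(U, sigma_alpha, sigma_resid, J):
    """(I_J kron U) Sigma_alpha (I_J kron U)^T + Sigma_i, as in Eq. (24)."""
    K = np.kron(np.eye(J), U)  # maps vec(alpha_i) (length 2J) to vec(U alpha_i) (length n_i J)
    return K @ sigma_alpha @ K.T + sigma_resid


t = np.array([0.0, 1.0, 3.0, 4.0])
U = np.column_stack([np.ones_like(t), t])          # subject intercept and slope
J = 2
R = np.array([[0.30, 0.02], [0.02, 0.13]])
C = 0.35 ** np.abs(t[:, None] - t[None, :])         # AR1 correlation, rho = 0.35
sigma_resid = np.kron(R, C)
sigma_alpha = np.diag([0.22, 0.18, 0.04, 0.05])     # vec(alpha_i) = (int_1, slope_1, int_2, slope_2)

analytic = marginal_cov(U, sigma_alpha, sigma_resid, J)

# Monte Carlo check of the integral over alpha_i
rng = np.random.default_rng(7)
n_sim = 200_000
alpha = rng.multivariate_normal(np.zeros(2 * J), sigma_alpha, size=n_sim)
eps = rng.multivariate_normal(np.zeros(J * t.size), sigma_resid, size=n_sim)
centered = alpha @ np.kron(np.eye(J), U).T + eps    # rows are vec(Y_i) - vec(X_i beta)
empirical = np.cov(centered, rowvar=False)
print(np.abs(empirical - analytic).max())           # small Monte Carlo error
```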

Finally, we need to define the non-augmented likelihood contributions used to calculate cDIC and mDIC. Letting $θ$ denote a posterior draw of all parameters of interest, we can write:

\begin{aligned}
f(z_{i}, Y_{i}^{o} \mid θ) &= \int_{y_{in_{i}}^{u}} \cdots \int_{y_{i1}^{u}} f(z_{i}, Y_{i}^{o}, y_{i}^{u} \mid θ)\, dy_{i1}^{u} \cdots dy_{in_{i}}^{u} \\
&= \int_{y_{in_{i}}^{u}} \cdots \int_{y_{i1}^{u}} f(z_{i} \mid y_{i}^{u})\, f(y_{i}^{u}, Y_{i}^{o} \mid θ)\, dy_{i1}^{u} \cdots dy_{in_{i}}^{u} \\
&= \int_{y_{in_{i}}^{u}} \cdots \int_{y_{i1}^{u}} f(z_{i} \mid y_{i}^{u})\, f(y_{i}^{u} \mid Y_{i}^{o}, θ)\, f(Y_{i}^{o} \mid θ)\, dy_{i1}^{u} \cdots dy_{in_{i}}^{u} \\
&= f(Y_{i}^{o} \mid θ) \int_{γ_{z_{in_{i}} - 1}}^{γ_{z_{in_{i}}}} \cdots \int_{γ_{z_{i1} - 1}}^{γ_{z_{i1}}} f(y_{i}^{u} \mid Y_{i}^{o}, θ)\, dy_{i1}^{u} \cdots dy_{in_{i}}^{u} \\
&= f(Y_{i}^{o} \mid θ)\, f(z_{i} \mid Y_{i}^{o}, θ)
\end{aligned}

When calculating cDIC we use the ‘continuous outcome conditional likelihood’ for $f(y_{i}^{u}, Y_{i}^{o} \mid θ)$, while the ‘continuous outcome marginalized likelihood’ is used when calculating mDIC. Both forms of the likelihood contribution are multivariate normal and can be written as a conditional multivariate normal times a marginal multivariate normal (line 3). To find $f(z_{i} \mid Y_{i}^{o}, θ)$, we once again rely on numerical integration using the omxMnor() function from the OpenMx package. When $z_{il}$ is missing, $y_{il}^{u}$ is integrated over the real line and does not affect the value of the likelihood. When $y_{il}^{o}$ is missing, it is treated as a parameter and does not contribute to the likelihood.
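The factorization above can be sketched for a single subject as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the joint mean and covariance of the latent and observed continuous values (conditional or marginalized, depending on whether cDIC or mDIC is being computed) are taken as given, and the rectangle probability $f(z_{i} \mid Y_{i}^{o}, θ)$ is estimated by plain Monte Carlo instead of the numerical integration provided by omxMnor(). A missing $z_{il}$ is represented by the bounds $(-\infty, \infty)$, so it leaves the probability unchanged. NumPy is assumed and all function names are hypothetical.

```python
import numpy as np


def mvn_logpdf(x, mean, cov):
    """Log density of N(mean, cov) at x via a Cholesky factor."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    chol = np.linalg.cholesky(np.asarray(cov, dtype=float))
    z = np.linalg.solve(chol, diff)
    return -0.5 * (z @ z) - np.log(np.diag(chol)).sum() - 0.5 * diff.size * np.log(2 * np.pi)


def loglik_contribution(mu, sigma, lat_idx, obs_idx, y_obs, lower, upper,
                        n_mc=200_000, seed=None):
    """log f(z_i, Y_i^o | theta) = log f(Y_i^o | theta) + log P(lower < y^u <= upper | Y_i^o)."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    s_oo = sigma[np.ix_(obs_idx, obs_idx)]
    s_uo = sigma[np.ix_(lat_idx, obs_idx)]
    s_uu = sigma[np.ix_(lat_idx, lat_idx)]
    log_f_obs = mvn_logpdf(y_obs, mu[obs_idx], s_oo)
    # Conditional distribution of the latent values given the observed continuous values
    w = np.linalg.solve(s_oo, s_uo.T).T
    cond_mean = mu[lat_idx] + w @ (y_obs - mu[obs_idx])
    cond_cov = s_uu - w @ s_uo.T
    # Monte Carlo estimate of the rectangle probability f(z_i | Y_i^o, theta);
    # returns -inf if no draw lands in the rectangle
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(cond_mean, cond_cov, size=n_mc)
    inside = np.all((draws > lower) & (draws <= upper), axis=1)
    return log_f_obs + np.log(inside.mean())


# Index 0 is the latent value, index 1 the observed continuous value (toy example)
sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
ll = loglik_contribution([0.0, 0.0], sigma, [0], [1], [2.0],
                         lower=[1.0], upper=[np.inf], seed=3)
exact = -2.0 - 0.5 * np.log(2 * np.pi) + np.log(0.5)  # latent | obs ~ N(1, 0.75)
print(round(ll, 2), round(exact, 2))
```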

6.3.3. Simulation studies

To assess cDIC and mDIC, we simulated 400 datasets under bmrarm and bmr_slope. We then fit bmr_int, bmr_slope, and bmrarm to each dataset. As a reminder, bmr_int is a multivariate model with subject specific intercepts and no autoregressive term, while bmr_slope is a similarly defined model with both subject specific intercepts and slopes. Both bmr_int and bmr_slope were fit using the bmrarm package. bmr_int was not selected by either criterion for any dataset and is excluded from Table 5.

Table 5.

Simulation study comparing the selection capabilities of cDIC and mDIC (proportion of datasets in which bmrarm was selected by each criterion)

Data generating model	cDIC	ΔcDIC ≤ −2	mDIC	ΔmDIC ≤ −2
bmr_slope	0.350	0.318	0.215	0.078
bmrarm ( $ρ = 0.35$ )	0	0	0.98	0.948


Values are the proportion of datasets in which each criterion selected bmrarm over bmr_slope. ‘ΔcDIC ≤ −2’ and ‘ΔmDIC ≤ −2’ are the proportions of datasets in which bmrarm decreased cDIC or mDIC, respectively, by at least 2 compared to bmr_slope

When we generated data from bmr_slope, mDIC did a better job than cDIC at selecting the true model (0.785 vs 0.65; 0.785 = 1 − 0.215). If we require a decrease in DIC of at least 2 before selecting the more complicated bmrarm, mDIC selected the incorrect model only 7.8% of the time (31.8% for cDIC). When data were generated from bmrarm with $ρ = 0.35$, cDIC never selected the correct model, while mDIC was correct 98.0% of the time. Even when requiring a decrease of at least 2, mDIC selected the more complicated autoregressive model in 94.8% of simulations. Since mDIC did a superior job of distinguishing between the two models, similar to the results obtained in Chan and Grant (2016), we used mDIC for the leishmaniosis application and as the default information criterion in the package.

7. Application

Recall the leishmaniosis dataset presented in Sect. 2. We analyzed this data using bmrarm, bmr_int, bmr_slope, and univariate approaches available in the brms package. The primary outcome was LeishVet score (ordinal) and the secondary outcome was log anti-SLA (continuous). The dataset contained 318 observed or partially observed multivariate responses and a handful of potential covariates including age group at baseline, study location, ectoparasiticide treatment group, and a dual path platform (DPP) leishmania serological test (only performed at enrollment). Higher values of DPP indicated more advanced disease.

Summary statistics for both dependent variables and all covariates are reported in Table 6. DPP is extremely right skewed, and its log appeared linearly associated with the secondary outcome, log SLA (Fig. 7). This makes sense, as both measure anti-leishmania antigen antibodies. Thus we used (and therefore report) log DPP for modeling purposes. We also combined age groups into 0–2 vs 3–11 due to the small sample sizes of the older age groups. Similarly, we observed only one case of LeishVet score = 4 and recoded it as LeishVet score = 3 prior to fitting the models.

Table 6.

Summary statistics for the two observed responses (LeishVet score and log anti-SLA) and the four covariates

Variable	Type	Level	Count (%)	Mean (SD)	Missing (%)
LeishVet Score	Ordinal	0	30 (9.3%)	NA	52 (16.1%)
		1	116 (36.0%)	NA
		2	115 (35.7%)	NA
		3	8 (2.5%)	NA
		4	1 (0.3%)	NA
Age	Ordinal	0–2	21 (43.8%)	NA	0 (0%)
		3–5	23 (47.9%)	NA
		6–8	3 (6.2%)	NA
		9–11	1 (2.1%)	NA
Location	Nominal	B	19 (39.6%)	NA	0 (0%)
		M	17 (35.4%)	NA
		W	12 (25.0%)	NA
Treatment	Binary	Blue	24 (50%)	NA	0 (0%)
		Yellow	24 (50%)	NA
Log anti-SLA	Continuous		NA	−0.27 (0.78)	9 (2.8%)
Baseline log DPP	Continuous		NA	2.14 (1.45)	0 (0%)


Age is the baseline age group, location is the facility where the canine is located, treatment is the ectoparasiticide treatment group, and DPP is a dual path platform leishmania serological test. DPP detects anti-leishmania antibodies similar to anti-SLA ELISA but is more specific

Fig. 7 — Marginal associations between the four covariates and baseline values of the dependent variables. The $p$-value between log SLA and log DPP is based on Pearson correlation, while $p$-values for LeishVet score used Fisher's exact test

We first visualized LeishVet score and log anti-SLA and saw evidence of a positive association between the two (Fig. 6), as has been previously observed (Proverbio et al. 2014). This motivated the use of multivariate approaches. Marginal associations between the covariates and the baseline values of the dependent variables are plotted in Fig. 7. Visual inspection indicated a positive association between baseline log SLA and log DPP at enrollment (Pearson correlation $p$-value < 0.001). We also saw that location W had a large number of dogs with baseline LeishVet score 0 or and older dogs tended to have higher baseline LeishVet score. This is consistent with age associated disease progression, which has also been observed in other cohorts (Toepp et al. 2019). Formal tests for association (Fisher's exact test) between baseline LeishVet score and the categorical covariates (age group, location, treatment) were all non-significant. Nevertheless, we used log DPP, age group, location, and treatment as covariates in the regression models, preferring a priori specification to model selection based on hypothesis tests.

7.1. Regression models

We fit a bivariate bmrarm to the leishmaniosis data. The design matrix $X$ contributed an intercept, a linear temporal term, an indicator for treatment status, log DPP values, an indicator for the 3–11 age group, and dummy variables for location. To write the regression equation, we use $X_{i}$ to denote the rows corresponding to subject $i$, and $U_{i}$ is the design matrix for the subject specific effects, which contributed an intercept and a linear temporal term. Then, using $y_{il}^{\mathrm{leishvet}}$ and $y_{il}^{\mathrm{logsla}}$ to denote the unobserved (latent) continuous and the observed continuous outcomes, respectively, we can write our equation as:

\begin{bmatrix} y_{i1}^{\mathrm{leishvet}} & y_{i1}^{\mathrm{logsla}} \\ y_{i2}^{\mathrm{leishvet}} & y_{i2}^{\mathrm{logsla}} \\ \vdots & \vdots \\ y_{in_{i}}^{\mathrm{leishvet}} & y_{in_{i}}^{\mathrm{logsla}} \end{bmatrix} = Y_{i} = X_{i} β + U_{i} α_{i} + ϵ_{i}

where $\mathrm{vec}(ϵ_{i}) \sim N(0, R \otimes C_{ρ}(t_{i}))$. We compared bmrarm to several baseline methods, which had the same fixed effect structure but differed in their subject specific effects and covariance structures:

(1) Univariate models fit using the brm() function from the brms package with subject specific intercepts. Log anti-SLA was fit using all default settings, while LeishVet score used family = cumulative("probit").
(2) The same setup as (1) with subject specific intercepts and slopes.
(3) bmr_int and bmr_slope.

The package default hyperparameters were used for bmrarm, bmr_int, and bmr_slope: $a_{ξ} = 0.2$, $b_{ξ} = 5$, $Ψ = I$, $ν = 5$, $σ_{β}^{2} = 10^{5}$, and $δ = 10^{4}$.

7.2. Convergence assessment

Before interpretation and model comparison, we assessed convergence of the bmrarm method, for which we drew 25,000 samples and discarded the first 5,000 as burn-in. Samples were thinned by five to reduce the size of the returned objects. We focused on $γ_{3}$, $Σ_{11}$, $Σ_{α11}$, and $Σ_{α22}$. Visual inspection of the trace plots (Figs. 8 and 9) indicated inefficient mixing but no serious issues, and the inefficiency was accounted for by taking a large number of draws. The maximum Gelman-Rubin upper CI for any parameter, including the patient specific effects and latent outcomes, was 1.007, well below the commonly used threshold of 1.1 (Roy 2020).

Fig. 8. Leishmaniosis application: trace and density plots for the one free cutpoint and the variance term corresponding to the latent outcome

Fig. 9. Leishmaniosis application: trace and density plots for the subject-specific variance terms corresponding to the latent outcome

To further evaluate the appropriateness of bmrarm we used several posterior predictive checks. Gelman et al. (2013) defined a Bayesian $p$ -value as

$$
p_{B} = \Pr\left(T(y^{\text{rep}}, \theta) \geq T(y, \theta) \mid y\right)
$$

Posterior predictive checks can be implemented by simulation: for each of the $S$ posterior draws $\theta^{s}$, a replicated dataset $y^{\text{rep},s}$ is generated, and the realized test quantity $T(y, \theta^{s})$ is compared with the predictive test quantity $T(y^{\text{rep},s}, \theta^{s})$. The estimate of $p_B$ is the proportion of draws for which $T(y^{\text{rep},s}, \theta^{s}) \geq T(y, \theta^{s})$. A value of $p_B$ close to 0 or 1 indicates lack of fit and warrants further investigation of the model.
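The simulation-based estimate of $p_B$ reduces to a proportion over posterior draws. In the minimal sketch below, the function name `bayesian_p_value` is hypothetical, and the toy example uses stand-in posterior draws for a normal mean with known unit variance and a flat prior, with $T(y) = \max(y)$; it is not the paper's model.

```python
import numpy as np


def bayesian_p_value(t_rep, t_obs):
    """Monte Carlo estimate of p_B = Pr(T(y_rep, theta) >= T(y, theta) | y).

    t_rep[s] holds T(y^{rep,s}, theta^s); t_obs holds T(y, theta^s) for each
    draw, or a single number when T does not depend on theta.
    """
    t_rep = np.asarray(t_rep, dtype=float)
    t_obs = np.broadcast_to(np.asarray(t_obs, dtype=float), t_rep.shape)
    return float(np.mean(t_rep >= t_obs))


# Toy example: T(y) = max(y), replicated data drawn from a fitted normal model.
rng = np.random.default_rng(1)
y = rng.normal(size=50)
S = 2000
mu_draws = rng.normal(y.mean(), 1 / np.sqrt(50), size=S)  # stand-in posterior draws
y_rep = rng.normal(mu_draws[:, None], 1.0, size=(S, 50))  # one replicate per draw
p_max = bayesian_p_value(y_rep.max(axis=1), y.max())
print(p_max)
```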

We evaluated two aspects of our model in this way. First, we assessed whether the replicated datasets had a reasonable number of ordinal state transitions, comparing the percentage of dogs with 0 transitions, 1–2 transitions, and 3 or more transitions. For this check, predictions were generated simultaneously for all time points and outcomes using the fixed effects, patient-specific effects, and residual covariance terms. The results (Table 7) indicated a reasonable number of state transitions in the replicated datasets.
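The transition test quantities can be computed per dataset as sketched below. The sketch assumes that a "transition" is any change in ordinal state between consecutive visits, whether an increase or a decrease; the text does not state the exact definition, and the function names and trajectories are hypothetical.

```python
import numpy as np


def count_transitions(states):
    """Number of consecutive visits at which the ordinal state changes (up or down)."""
    s = np.asarray(states)
    return int(np.sum(s[1:] != s[:-1]))


def transition_summary(subject_states):
    """Percentage of subjects with 0, 1-2, and >= 3 transitions."""
    counts = np.array([count_transitions(s) for s in subject_states])
    return {
        "0": 100 * float(np.mean(counts == 0)),
        "1-2": 100 * float(np.mean((counts >= 1) & (counts <= 2))),
        ">=3": 100 * float(np.mean(counts >= 3)),
    }


# Hypothetical ordinal trajectories for three dogs.
dogs = [[1, 1, 1, 1], [1, 2, 2, 2], [1, 2, 1, 2]]
summary = transition_summary(dogs)
print(summary)
```

Applying `transition_summary` to each replicated dataset gives a distribution for each percentage, from which an interval and a $p_B$ like those in Table 7 can be formed.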

Table 7.

Frequency of ordinal state transitions between the replicated and observed data

| Outcome | Test | $T(y)$ | 95% CI for $T(y^{\text{rep}})$ | $p$-value |
|---|---|---|---|---|
| Ordinal | % with 0 transitions | 16.7 | [2.1, 20.8] | 0.14 |
| | % with 1–2 transitions | 52.1 | [37.5, 66.7] | 0.52 |
| | % with ≥ 3 transitions | 31.3 | [22.9, 52.1] | 0.85 |


Second, we replicated the continuous outcome at each subject's final time point using the posterior draws and that subject's previous $n_{i} - 1$ multivariate responses. Because our ultimate goal was short-term forecasting of disease progression, this let us evaluate whether the posterior draws and prior observations combined to give realistic one-step-ahead predictions. We compared the replicated final responses with the observed data using the median, standard deviation, minimum, and maximum as test quantities. There were no serious discrepancies between the summaries of the observed and replicated data (Fig. 10), although the posterior predictive checks suggested that the proposed method may tend to overestimate both the maximum observation and the spread of the observations.

Fig. 10. Posterior predictive checks for the continuous outcome in the leishmaniosis dataset. Checks are based on replicating the final outcome for each subject using the posterior draws and that subject's previous $n_{i} - 1$ responses. Plots were made using the `ppc_stat` function from the **bayesplot** package (Gabry and Mahr 2021)
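Replicating a final observation given the earlier ones rests on conditioning a multivariate normal on its leading components. The sketch below is a univariate simplification: it conditions only the residual of one outcome, given fixed and subject-specific effects, and assumes AR(1) residuals at equally spaced visits with illustrative values of $\sigma^2$ and $\rho$. We do not claim this is how bmrarm generates its replicates; in the bivariate model the same conditioning formula would apply to the stacked residual vector with covariance $R \otimes C_{\rho}(t_i)$. The function name `condition_on_past` is hypothetical.

```python
import numpy as np


def condition_on_past(mu, Sigma, y_past):
    """Mean and variance of the last element of N(mu, Sigma) given all earlier elements."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    y_past = np.asarray(y_past, dtype=float)
    s_cross = Sigma[-1, :-1]                         # Cov(y_n, y_past)
    w = np.linalg.solve(Sigma[:-1, :-1], s_cross)   # Sigma_past^{-1} Cov(y_past, y_n)
    mean = mu[-1] + w @ (y_past - mu[:-1])
    var = Sigma[-1, -1] - s_cross @ w
    return float(mean), float(var)


# AR(1) residuals at five equally spaced visits (illustrative sigma2 and rho).
sigma2, rho, n = 0.142, 0.17, 5
t = np.arange(n)
Sigma = sigma2 * rho ** np.abs(t[:, None] - t[None, :])
resid_past = np.array([0.10, -0.20, 0.30, 0.05])
mean, var = condition_on_past(np.zeros(n), Sigma, resid_past)
print(mean, var)  # equals rho * 0.05 and sigma2 * (1 - rho**2)
```

Under a stationary AR(1) structure only the most recent residual carries information about the next one, so the general conditioning formula collapses to $\rho \epsilon_{n-1}$ with variance $\sigma^2(1 - \rho^2)$, which makes a convenient correctness check.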

7.3. Model comparison

We now return to the comparison of bmrarm with the four baseline models, which we carried out using mDIC. The brms specification used different constraints to identify the latent outcomes in the ordinal models; because we integrated out the latent outcomes and used the non-augmented likelihood for mDIC, it was still appropriate to compare the methods. First, bmr_int and bmr_slope offered notable improvements over their univariate counterparts, lowering mDIC by 7.3 and 5.8, respectively (Table 8). Additionally, bmrarm was the best-performing model, with an mDIC 3.1 lower than that of bmr_slope. The posterior mean of $\rho$ was 0.17, and its 95% credible interval did not contain 0, supporting its inclusion.
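DIC-type criteria combine the posterior mean deviance $\bar{D}$ with the effective number of parameters $p_D = \bar{D} - D(\bar{\theta})$, giving $\text{DIC} = \bar{D} + p_D$ (Spiegelhalter et al. 2002). The sketch below takes deviance values as given; for mDIC, per the text, they would come from the non-augmented likelihood with the latent outcomes integrated out, which this sketch does not compute. The function name and the deviance values are made up for illustration.

```python
import numpy as np


def dic(deviance_draws, deviance_at_mean):
    """DIC and effective number of parameters (Spiegelhalter et al. 2002).

    deviance_draws: -2 * log p(y | theta^s) for each posterior draw s.
    deviance_at_mean: -2 * log p(y | theta_bar) at the posterior mean theta_bar.
    """
    d_bar = float(np.mean(deviance_draws))
    p_d = d_bar - deviance_at_mean
    return d_bar + p_d, p_d


# Made-up deviance values, for illustration only.
value, p_d = dic([100.0, 104.0, 102.0, 98.0], 95.0)
print(value, p_d)
```

Lower values are preferred, so the 3.1-point gap between bmr_slope and bmrarm in Table 8 favors bmrarm.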

Table 8.

mDIC comparison for the leishmaniosis application

| Type | Model | mDIC | pD | Mean $\rho$ (95% CI) | Time (per chain) |
|---|---|---|---|---|---|
| Univariate | Intercept | 949.9 | 20.4 | | 0.7 min |
| | Int + Slope | 907.9 | 24.6 | | 2.6 min |
| Bivariate | bmr_int | 942.6 | 21.8 | | 6.7 min |
| | bmr_slope | 902.1 | 25.2 | | 9.8 min |
| | bmrarm | 899.0 | 25.5 | 0.17 (0.02, 0.33) | 29.8 min |


7.4. Results

Table 9 contains posterior means, 95% credible intervals (CIs), and the associated priors for the fitted bmrarm. As noted above, the posterior mean of the autoregressive term $\rho$ was 0.172, and its CI did not contain 0. Posterior estimates of $\beta_{\text{time}}$ and $\beta_{[\text{age} \geq 3]}$ were positive for both outcomes, and their CIs did not contain 0, except for $\beta_{[\text{age} \geq 3]}$ for the continuous outcome. These positive associations were anticipated, as we expect disease to worsen over time. The posterior mean of $\beta_{[\text{Loc} = W]}$ was negative and large in magnitude for both outcomes, and its CIs did not contain 0. The estimated effect of blDPP, higher values of which indicate more advanced disease, was positive for both outcomes, although its CI contained 0 for the ordinal outcome.
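Each row of Table 9 summarizes the posterior draws of one parameter. A minimal sketch of such a summary follows; it assumes equal-tailed percentile intervals (the text does not say whether equal-tailed or highest-density intervals were used), and the function name and the stand-in draws are hypothetical.

```python
import numpy as np


def summarize(draws, level=0.95):
    """Posterior mean, SD, and equal-tailed credible interval for one parameter."""
    draws = np.asarray(draws, dtype=float)
    tail = (1 - level) / 2
    lower, upper = np.quantile(draws, [tail, 1 - tail])
    return {
        "mean": float(draws.mean()),
        "sd": float(draws.std(ddof=1)),
        "lower": float(lower),
        "upper": float(upper),
        "excludes_zero": bool(lower > 0 or upper < 0),
    }


rng = np.random.default_rng(3)
fake_draws = rng.normal(0.17, 0.08, size=4000)  # stand-in draws, not the fitted rho
print(summarize(fake_draws))
```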

Table 9.

Prior distributions and parameter estimates for analysis of the leishmaniosis dataset

| Parameter | Prior | Outcome | Posterior Mean (SD) | Posterior 95% CI |
|---|---|---|---|---|
| $\rho$ | $U(-1, 1)$ | Ordinal | 0.172 (0.079) | (0.02, 0.33) |
| $\gamma_{3}$ | $U(0, 10^{4})$ | Ordinal | 2.36 (0.168) | (2.065, 2.718) |
| $\beta_{\text{intercept}}$ | $N(0, R, 10^{5} I)$ | Ordinal | 0.472 (0.168) | (0.139, 0.802) |
| $\beta_{[\text{treatment} = Y]}$ | | Ordinal | 0.22 (0.128) | (−0.026, 0.474) |
| $\beta_{\text{time}}$ | | Ordinal | 0.066 (0.026) | (0.016, 0.118) |
| $\beta_{[\text{age} \geq 3]}$ | | Ordinal | 0.346 (0.147) | (0.063, 0.644) |
| $\beta_{\text{blDPP}}$ | | Ordinal | 0.085 (0.051) | (−0.014, 0.188) |
| $\beta_{[\text{Loc} = M]}$ | | Ordinal | 0.102 (0.152) | (−0.197, 0.4) |
| $\beta_{[\text{Loc} = W]}$ | | Ordinal | −0.975 (0.171) | (−1.319, −0.642) |
| $\beta_{\text{intercept}}$ | | Cont | −1.175 (0.164) | (−1.497, −0.85) |
| $\beta_{[\text{treatment} = Y]}$ | | Cont | −0.026 (0.126) | (−0.272, 0.222) |
| $\beta_{\text{time}}$ | | Cont | 0.077 (0.02) | (0.038, 0.117) |
| $\beta_{[\text{age} \geq 3]}$ | | Cont | 0.289 (0.147) | (−0.003, 0.582) |
| $\beta_{\text{blDPP}}$ | | Cont | 0.293 (0.052) | (0.189, 0.393) |
| $\beta_{[\text{Loc} = M]}$ | | Cont | 0.026 (0.153) | (−0.272, 0.333) |
| $\beta_{[\text{Loc} = W]}$ | | Cont | −0.375 (0.171) | (−0.715, −0.041) |
| $R_{11}$ | $IW(I, 4)$ | Ordinal | 0.287 (0.054) | (0.198, 0.408) |
| $\frac{R_{12}}{\sqrt{R_{11} R_{22}}}$ | | Shared | −0.02 (0.081) | (−0.179, 0.138) |
| $R_{22}$ | | Cont | 0.142 (0.018) | (0.113, 0.181) |
| $\Sigma_{\alpha 11}$ | $SIW(I, 5, 0.2, 5)$ | Ordinal | 0.078 (0.063) | (0.01, 0.244) |
| $\frac{\Sigma_{\alpha 12}}{\sqrt{\Sigma_{\alpha 11} \Sigma_{\alpha 22}}}$ | | Ordinal | −0.249 (0.286) | (−0.721, 0.362) |
| $\frac{\Sigma_{\alpha 13}}{\sqrt{\Sigma_{\alpha 11} \Sigma_{\alpha 33}}}$ | | Shared | 0.196 (0.307) | (−0.449, 0.733) |
| $\frac{\Sigma_{\alpha 14}}{\sqrt{\Sigma_{\alpha 11} \Sigma_{\alpha 44}}}$ | | Shared | −0.012 (0.285) | (−0.55, 0.539) |
| $\Sigma_{\alpha 22}$ | | Ordinal | 0.011 (0.005) | (0.004, 0.022) |
| $\frac{\Sigma_{\alpha 23}}{\sqrt{\Sigma_{\alpha 22} \Sigma_{\alpha 33}}}$ | | Shared | 0.225 (0.245) | (−0.285, 0.652) |
| $\frac{\Sigma_{\alpha 24}}{\sqrt{\Sigma_{\alpha 22} \Sigma_{\alpha 44}}}$ | | Shared | 0.212 (0.227) | (−0.255, 0.619) |
| $\Sigma_{\alpha 33}$ | | Cont | 0.159 (0.059) | (0.064, 0.295) |
| $\frac{\Sigma_{\alpha 34}}{\sqrt{\Sigma_{\alpha 33} \Sigma_{\alpha 44}}}$ | | Cont | −0.183 (0.209) | (−0.542, 0.27) |
| $\Sigma_{\alpha 44}$ | | Cont | 0.012 (0.004) | (0.006, 0.021) |


The posterior mean of the residual correlation, $\frac{R_{12}}{\sqrt{R_{11} R_{22}}}$, was negligible (−0.02), suggesting that the two outcomes are approximately conditionally independent given the fixed and subject-specific effects. Although a joint residual covariance matrix therefore appeared unnecessary for this application, we do not expect this to hold in general. All CIs for the correlations in $\Sigma_{\alpha}$ contained 0, which was not unexpected given the small sample size. However, the posterior means of 3 of the 4 shared correlation parameters were at least 0.196, providing some support for jointly estimating the patient-specific effects.
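Correlations such as $\frac{R_{12}}{\sqrt{R_{11} R_{22}}}$ are naturally summarized by converting each posterior covariance draw to a correlation first and then averaging. The posterior mean of the correlation generally differs from the ratio of posterior mean covariances, as the toy draws below show (0.375 versus 0.3). The function name is hypothetical and indices are 0-based, so $\Sigma_{\alpha 13}$ corresponds to `i=0, j=2`.

```python
import numpy as np


def correlation_draws(cov_draws, i, j):
    """Per-draw correlation Sigma_ij / sqrt(Sigma_ii * Sigma_jj) (0-based indices).

    cov_draws: array of shape (S, k, k), one covariance matrix per posterior draw.
    """
    c = np.asarray(cov_draws, dtype=float)
    return c[:, i, j] / np.sqrt(c[:, i, i] * c[:, j, j])


# Two toy covariance draws (not fitted values).
draws = np.array([[[4.0, 1.0], [1.0, 1.0]],
                  [[1.0, 0.5], [0.5, 4.0]]])
r = correlation_draws(draws, 0, 1)
print(r, r.mean())
```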

One- to four-step-ahead forecasts of LeishVet scores and log anti-SLA are presented in Figs. 11 and 12, respectively; they appeared consistent with expectations based on the observed values. Continued longitudinal follow-up of the study subjects would permit observation of more transition events, which we expect to enhance the clinical utility of models of this form; these data will be used in future analyses.
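For a single posterior draw, a $k$-step-ahead forecast of the latent ordinal outcome can be sketched as follows. Under stationary AR(1) residuals with unit spacing, the residual mean decays as $\rho^{k}\epsilon_{n}$ and its variance grows to $\sigma^{2}(1 - \rho^{2k})$; category probabilities then follow from cutting the latent normal at the cutpoints. This sketch ignores the cross-outcome residual correlation, and the function names, mean, residual, and cutpoints are hypothetical (the cutpoints are not the fitted ones). A full forecast would average these probabilities over posterior draws.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ar1_forecast(mean_k, last_resid, rho, sigma2, k):
    """k-step-ahead mean and variance of a latent outcome with stationary AR(1) residuals.

    mean_k: fixed plus subject-specific part of the mean at the target visit.
    last_resid: residual at the last observed visit; sigma2: marginal residual variance.
    """
    mean = mean_k + rho ** k * last_resid
    var = sigma2 * (1.0 - rho ** (2 * k))
    return mean, var


def ordinal_probs(mean, var, cutpoints):
    """P(category c) for a latent N(mean, var) cut at increasing cutpoints."""
    sd = math.sqrt(var)
    edges = [-math.inf, *cutpoints, math.inf]
    return [norm_cdf((hi - mean) / sd) - norm_cdf((lo - mean) / sd)
            for lo, hi in zip(edges[:-1], edges[1:])]


# Illustrative inputs only; the cutpoints are hypothetical.
for k in range(1, 5):
    m, v = ar1_forecast(mean_k=0.5, last_resid=0.2, rho=0.17, sigma2=0.287, k=k)
    probs = ordinal_probs(m, v, cutpoints=[0.0, 1.0, 2.36])
    print(k, round(m, 4), round(v, 4), [round(p, 3) for p in probs])
```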

8. Conclusions

In this paper we developed a multivariate Bayesian hierarchical model with autoregressive errors that incorporates ordinal and continuous responses, accompanied by a reproducibility package (https://github.com/nickseedorff/bmrarm). We aimed for reasonable sampling efficiency and used flexible priors, such as the scaled inverse-Wishart prior for the covariance of the patient-specific effects. A simulation study indicated that when an autoregressive component is present in the data-generating process, including it in the model can improve short-term forecasts of both the continuous and ordinal outcomes. An additional simulation study motivated mDIC as a model selection criterion, and in the leishmaniosis application mDIC favored our method over several baseline models.

8.1. Limitations and future work

The primary limitations of the method are its high computational cost; the requirement that observations be (roughly) equally spaced; its inability to include binary outcomes; the assumption that the range parameter is equal across all outcomes; the possibility that the SIW prior is overly informative under the default hyperparameters; and the fact that our derivations are based on a single ordinal outcome. Future work should aim to further improve the computational efficiency of the method, possibly through parameter expansion for data augmentation (Liu and Wu 1999), and to expand the number and types of outcomes that can be incorporated. Efforts could also be made to allow different covariates for different outcomes, which may help with implementation schemes for binary responses. Relaxing the assumption of equally spaced observations and allowing $\rho$ to differ between outcomes are areas ripe for investigation.

Despite these challenges and areas for improvement, the proposed method holds promise for application in clinical settings. In particular, improved methods for longitudinal forecasting based on models that can include ordinal outcomes alongside other variable types may aid clinical decision-making. Furthermore, the use of fully Bayesian techniques allows the straightforward use of custom utility and loss functions, going beyond forecasts to quantify optimal treatment decisions for individuals.

Supplementary Material

Supplementary Information PDF

NIHMS1893623-supplement-Supplementary_Information_PDF.pdf (716.9 KB, PDF)

Acknowledgements

Research and data collection reported in this publication were supported by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under Award Number R01AI139267, as well as through an award from the Masters of Foxhounds Association Foundation. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or any other party.

Footnotes

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s00180-022-01280-x.

References

Agresti A (2012) Categorical data analysis. Wiley series in probability and statistics. Wiley, Hoboken
Albert JH, Chib S (1993) Bayesian analysis of binary and polychotomous response data. J Am Stat Assoc 88(422):669–679. 10.1080/01621459.1993.10476321
Alessie R, Hochguertel S, van Soest A (2004) Ownership of stocks and mutual funds: a panel data analysis. Rev Econ Stat 86(3):783–796
Alvar J, Vélez ID, Bern C et al. (2012) Leishmaniasis worldwide and global estimates of its incidence. PLoS One 7(5):e35671
Banerjee S, Carlin BP, Gelfand AE (2015) Hierarchical modeling and analysis for spatial data, 2nd edn. Chapman & Hall/CRC, London
Bürkner PC (2018) Advanced Bayesian multilevel modeling with the R package brms. R J 10(1):395–411. 10.32614/RJ-2018-017
Cappellari L, Jenkins SP (2008) The dynamics of social assistance receipt: measurement and modelling issues, with an application to Britain. OECD Social, Employment and Migration Working Papers 67, OECD Publishing. 10.1787/236346714741
Carpenter B, Gelman A, Hoffman MD et al. (2017) Stan: a probabilistic programming language. J Stat Softw 76(1):1–32. 10.18637/jss.v076.i01
Catalano PJ (1997) Bivariate modelling of clustered continuous and ordered categorical outcomes. Stat Med 16(8):883–900
Chan JC, Grant AL (2016) On the observed-data deviance information criterion for volatility modeling. J Financ Econom 14(4):772–802
Chappuis F, Sundar S, Hailu A et al. (2007) Visceral leishmaniasis: what are the needs for diagnosis, treatment and control? Nat Rev Microbiol 5(11):873–882
Chi EM, Reinsel GC (1989) Models for longitudinal data with random effects and AR(1) errors. J Am Stat Assoc 84(406):452–459
Cowles MK (1996) Accelerating Monte Carlo Markov chain convergence for cumulative-link generalized linear models. Stat Comput 6:101–111
Cowles MK, Carlin BP, Connett JE (1996) Bayesian tobit modeling of longitudinal ordinal clinical trial compliance data with nonignorable missingness. J Am Stat Assoc 91(433):86–98
Cressie N, Wikle CK (2011) Statistics for spatio-temporal data. Wiley, Hoboken
Ding S, Cook RD (2014) Dimension folding PCA and PFC for matrix-valued predictors. Stat Sin 24(1):463–492
Duprey ZH, Steurer FJ, Rooney JA et al. (2006) Canine visceral leishmaniasis, United States and Canada, 2000–2003. Emerg Infect Dis 12(3):440–446
Epstein ES (1969) A scoring system for probability forecasts of ranked categories. J Appl Meteorol (1962–1982) 8(6):985–987
Feasey N, Wansbrough-Jones M, Mabey DCW et al. (2009) Neglected tropical diseases. Br Med Bull 93(1):179–200. 10.1093/bmb/ldp046
Gabry J, Mahr T (2021) bayesplot: plotting for Bayesian models. R package version 1.8.0
Gelman A, Hill J (2007) Data analysis using regression and multilevel/hierarchical models. Analytical methods for social research. Cambridge University Press, New York
Gelman A, Rubin DB (1992) Inference from iterative simulation using multiple sequences. Stat Sci 7(4):457–472. 10.1214/ss/1177011136
Gelman A, van Dyk DA, Huang Z et al. (2008) Using redundant parameterizations to fit hierarchical models. J Comput Gr Stat 17(1):95–122. 10.1198/106186008X287337
Gelman A, Carlin J, Stern H et al. (2013) Bayesian data analysis, 3rd edn. Chapman & Hall/CRC, Boca Raton
Gelman A, Hwang J, Vehtari A (2014) Understanding predictive information criteria for Bayesian models. Stat Comput 24:997–1016
Genton MG, Kleiber W (2015) Cross-covariance functions for multivariate geostatistics. Stat Sci 30(2):147–163. 10.1214/14-STS487
Ghasemzadeh S, Ganjali M, Baghfalaki T (2020) Bayesian quantile regression for joint modeling of longitudinal mixed ordinal and continuous data. Commun Stat Simul Comput 49(2):375–395. 10.1080/03610918.2018.1484482
Gueorguieva RV, Agresti A (2001) A correlated probit model for joint modeling of clustered binary and continuous responses. J Am Stat Assoc 96(455):1102–1112
Hadfield JD (2010) MCMC methods for multi-response generalized linear mixed models: the MCMCglmm R package. J Stat Softw 33(2):1–22
Hasegawa H (2009) Bayesian dynamic panel ordered probit model and its application to subjective well being. Commun Stat Simul Comput 38(6):1321–1347. 10.1080/03610910902903133
Heckman JJ (1981) The incidental parameters problem and the problem of initial conditions in estimating discrete time-discrete data stochastic process. In: Manski CF, McFadden DL (eds) Structural analysis of discrete data with econometric applications. The MIT Press, Cambridge, pp 179–195
Heckman JJ (1981) Statistical models for discrete panel data. Structural analysis of discrete data with econometric applications 114:178
Jin X, Banerjee S, Carlin BP (2007) Order-free co-regionalized areal data models with application to multiple-disease mapping. J Royal Stat Soc Ser B (Stat Method) 69(5):817–838
Jones RH (2011) Bayesian information criterion for longitudinal and clustered data. Stat Med 30(25):3050–3056. 10.1002/sim.4323
Kang EL, Cressie N (2011) Bayesian inference for the spatial random effects model. J Am Stat Assoc 106(495):972–983
LeishVet (2016) Clinical staging, treatment and prognosis. https://www.leishvet.org/fact-sheet/clinical-staging/
Li Q, Pan J, Belcher J (2016) Bayesian inference for joint modelling of longitudinal continuous, binary and ordinal events. Stat Methods Med Res 25(6):2521–2540. 10.1177/0962280214526199
Li ZR, McCormick TH, Clark SJ (2020) Using Bayesian latent Gaussian graphical models to infer symptom associations in verbal autopsies. Bayesian Anal 15(3):781–807. 10.1214/19-BA1172
Liu JS, Wu YN (1999) Parameter expansion for data augmentation. J Am Stat Assoc 94(448):1264–1274
MacNab YC (2016) Linear models of coregionalization for multivariate lattice data: order-dependent and order-free cmcars. Stat Methods Med Res 25(4):1118–1144. 10.1177/0962280216660419
Matyas L, Sevestre P (2008) The econometrics of panel data: fundamentals and recent developments in theory and practice, 3rd edn. Springer, Berlin
Meng XL, van Dyk DA (1999) Seeking efficient data augmentation schemes via conditional and marginal augmentation. Biometrika 86(2):301–320
NCAR (2015) verification: weather forecast verification utilities. R package version 1.42
Neale MC, Hunter MD, Pritikin JN et al. (2016) OpenMx 2.0: extended structural equation and statistical modeling. Psychometrika 81(2):535–549. 10.1007/s11336-014-9435-8
O’Malley AJ, Zaslavsky AM (2008) Domain-level covariance analysis for multilevel survey data with structured nonresponse. J Am Stat Assoc 103(484):1405–1418
Petersen CA, Barr SC (2009) Canine leishmaniasis in North America: emerging or newly recognized? Vet Clin North Am Small Anim Pract 39(6):1065–1074
Plummer M (2003) JAGS: a program for analysis of Bayesian graphical models using Gibbs sampling
Plummer M, Best N, Cowles K et al. (2006) CODA: convergence diagnosis and output analysis for MCMC. R News 6(1):7–11
Proverbio D, Spada E, Bagnagatti de Giorgi G et al. (2014) Relationship between Leishmania IFAT titer and clinicopathological manifestations (clinical score) in dogs. BioMed Res Int. 10.1155/2014/412808
Pudney S (2008) The dynamics of perception: modelling subjective wellbeing in a short panel. J Royal Stat Soc Series A (Stat Soc) 171(1):21–40
R Core Team (2021) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna
Ribeiro RR, Michalick MSM, da Silva ME et al. (2018) Canine leishmaniasis: an overview of the current status and strategies for control. Biomed Res Int. 10.1155/2018/3296893
Roy V (2020) Convergence diagnostics for Markov chain Monte Carlo. Annu Rev Stat Appl 7(1):387–412. 10.1146/annurev-statistics-031219-041300
Schaut RG, Robles-Murguia M, Juelsgaard R et al. (2015) Vectorborne transmission of Leishmania infantum from hounds, United States. Emerg Infect Dis 21(12):2209–2212. 10.3201/eid2112.141167
Schmidt AM, Gelfand AE (2003) A Bayesian coregionalization approach for multivariate pollutant data. J Geophys Res Atmos. 10.1029/2002JD002905
Schuurman NK, Grasman RPPP, Hamaker EL (2016) A comparison of inverse-Wishart prior specifications for covariance matrices in multilevel autoregressive models. Multivar Behav Res 51(2–3):185–206. 10.1080/00273171.2015.1065398
Solano-Gallego L, Cardoso L, Pennisi MG et al. (2017) Diagnostic challenges in the era of canine Leishmania infantum vaccines. Trends Parasitol 33(9):706–717
Spiegelhalter DJ, Best NG, Carlin BP et al. (2002) Bayesian measures of model complexity and fit. J Royal Stat Soc Ser B 64(4):583–639
Steele F, Grundy E (2021) Random effects dynamic panel models for unequally spaced multivariate categorical repeated measures: an application to child-parent exchanges of support. J Royal Stat Soc Ser C (Appl Statist) 70(1):3–23. 10.1111/rssc.12446
Stegmueller D (2013) Modeling dynamic preferences: a Bayesian robust dynamic latent ordered probit model. Polit Anal 21(3):314–333
Talhouk A, Doucet A, Murphy K (2012) Efficient Bayesian inference for multivariate probit models with sparse inverse correlation matrices. J Comput Gr Stat 21(3):739–757. 10.1080/10618600.2012.679239
Teimourian M, Baghfalaki T, Ganjali M et al. (2015) Joint modeling of mixed skewed continuous and ordinal longitudinal responses: a Bayesian approach. J Appl Stat 42(10):2233–2256. 10.1080/02664763.2015.1023557
Therneau TM, Grambsch PM (2000) Modeling survival data: extending the Cox model. Springer, New York
Toepp AJ, Schaut RG, Scott BD et al. (2017) Leishmania incidence and prevalence in US hunting hounds maintained via vertical transmission. Vet Parasitol Reg Stud Rep 10:75–81
Toepp AJ, Monteiro GR, Coutinho JF et al. (2019) Comorbid infections induce progression of visceral leishmaniasis. Parasit Vectors 12(1):1–12
Varin C, Czado C (2009) A mixed autoregressive probit model for ordinal longitudinal data. Biostatistics 11(1):127–138. 10.1093/biostatistics/kxp042
Wang WL, Fan TH (2010) ECM-based maximum likelihood inference for multivariate linear mixed models with autoregressive errors. Comput Stat Data Anal 54(5):1328–1341. 10.1016/j.csda.2009.11.021
Wang WL, Fan TH (2012) Bayesian analysis of multivariate t linear mixed models using a combination of IBF and Gibbs samplers. J Multivar Anal 105(1):300–310. 10.1016/j.jmva.2011.10.006
Wilhelm S, Manjunath BG (2015) tmvtnorm: truncated multivariate normal and Student t distribution. R package version 1.4-10
Wilks D (2005) Statistical methods in the atmospheric sciences. International Geophysics. Elsevier Science, Amsterdam
Woodbury M (1950) Inverting modified matrices. Tech. rep., Department of Statistics, Princeton University, Princeton
Wooldridge JM (2005) Simple solutions to the initial conditions problem in dynamic, nonlinear panel data models with unobserved heterogeneity. J Appl Econom 20(1):39–54. 10.1002/jae.770

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Information PDF

NIHMS1893623-supplement-Supplementary_Information_PDF.pdf^{(716.9KB, pdf)}

[R1] Agresti A (2012) Categorical data analysis. Wiley series in probability and statistics. Wiley, Hoboken [Google Scholar]

[R2] Albert JH, Chib S (1993) Bayesian analysis of binary and polychotomous response data. J Am Stat Assoc 88(422):669–679. 10.1080/01621459.1993.10476321 [DOI] [Google Scholar]

[R3] Alessie R, Hochguertel S, van Soest A (2004) Ownership of stocks and mutual funds: a panel data analysis. Rev Econ Stat 86(3):783–796 [Google Scholar]

[R4] Alvar J, Vélez ID, Bern C et al. (2012) Leishmaniasis worldwide and global estimates of its incidence. PLoS One 7(5):e35671. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R5] Banerjee S, Carlin BP, Gelfand AE (2015) Hierarchical modeling and analysis for spatial data, 2nd ed. Chapman & Hall/CRC, London [Google Scholar]

[R6] Bürkner PC (2018) Advanced Bayesian multilevel modeling with the R package brms. R J 10(1):395–411. 10.32614/RJ-2018-017 [DOI] [Google Scholar]

[R7] Cappellari L, Jenkins SP (2008) The dynamics of social assistance receipt: measurement and modelling issues, with an application to Britain. OECD Social, Employment and Migration Working Papers 67, OECD Publishing, 10.1787/236346714741 [DOI] [Google Scholar]

[R8] Carpenter B, Gelman A, Hoffman MD et al. (2017) Stan: a probabilistic programming language. J Stat Softw Artic 76(1):1–32. 10.18637/jss.v076.i01 [DOI] [PMC free article] [PubMed] [Google Scholar]

[R9] Catalno PJ (1997) Bivariate modelling of clustered continuous and ordered categorical outcomes. Stat Med 16(8):883–900. [DOI] [PubMed] [Google Scholar]

[R10] Chan JC, Grant AL (2016) On the observed-data deviance information criterion for volatility modeling. J Financ Econom 14(4):772–802 [Google Scholar]

[R11] Chappuis F, Sundar S, Hailu A et al. (2007) Visceral leishmaniasis: what are the needs for diagnosis, treatment and control? Nat Rev Microbiol 5(11):873–882 [DOI] [PubMed] [Google Scholar]

[R12] Chi EM, Reinsel GC (1989) Models for longitudinal data with random effects and ar(1) errors. J Am Stat Assoc 84(406):452–459 [Google Scholar]

[R13] Cowles MK (1996) Accelerating Monte Carlo Markov chain convergence for cumulative-link generalized linear models. Stat Comput 6:101–111 [Google Scholar]

[R14] Cowles MK, Carlin BP, Connett JE (1996) Bayesian tobit modeling of longitudinal ordinal clinical trial compliance data with nonignorable missingness. J Am Stat Assoc 91(433):86–98 [Google Scholar]

[R15] Cressie N, Wikle CK (2011) Statistics for spatio-temporal data. Wiley, Hoboken [Google Scholar]

[R16] Ding S, Cook RD (2014) Dimension folding pca and pfc for matrix-valued predictors. Stat Sin 24(1):463–492 [Google Scholar]

[R17] Duprey ZH, Steurer FJ, Rooney JA et al. (2006) Canine visceral leishmaniasis, United States and Canada, 2000–2003. Emerg Infect Dis 12(3):440–446 [DOI] [PMC free article] [PubMed] [Google Scholar]

[R18] Epstein ES (1969) A scoring system for probability forecasts of ranked categories (1962–1982). J Appl Meteorol 8(6):985–987 [Google Scholar]

[R19] Feasey N, Wansbrough-Jones M, Mabey DCW et al. (2009) Neglected tropical diseases. Br Med Bull 93(1):179–200. 10.1093/bmb/ldp046 [DOI] [PubMed] [Google Scholar]

[R20] Gabry J, Mahr T (2021) Bayesplot: plotting for bayesian models. R package version 1.8.0 [Google Scholar]

[R21] Gelman A, Hill J (2007) Data analysis using regression and multilevel/hierarchical models. In: Vol analytical methods for social research. Cambridge University Press, New York [Google Scholar]

[R22] Gelman A, Rubin DB (1992) Inference from Iterative simulation using multiple sequences. Stat Sci 7(4):457–472. 10.1214/ss/1177011136 [DOI] [Google Scholar]

[R23] Gelman A, van Dyk DA, Huang Z et al. (2008) Using redundant parameterizations to fit hierarchical models. J Comput Gr Stat 17(1):95–122. 10.1198/106186008X287337 [DOI] [Google Scholar]

[R24] Gelman A, Carlin J, Stern H et al. (2013) Bayesian data analysis, 3rd ed. Chapman & Hall/CRC, Boca Raton [Google Scholar]

[R25] Gelman A, Hwang J, Vehtari A (2014) Understanding predictive information criteria for bayesian models. Stat Comput 24:997–1016 [Google Scholar]

[R26] Genton MG, Kleiber W (2015) Cross-covariance functions for multivariate geostatistics. Stat Sci 30(2):147–163. 10.1214/14-STS487 [DOI] [Google Scholar]

[R27] Ghasemzadeh S, Ganjali M, Baghfalaki T (2020) Bayesian quantile regression for joint modeling of longitudinal mixed ordinal and continuous data. Commun Stat Simul Comput 49(2):375–395. 10.1080/03610918.2018.1484482 [DOI] [Google Scholar]

[R28] Gueorguieva RV, Agresti A (2001) A correlated probit model for joint modeling of clustered binary and continuous responses. J Am Stat Assoc 96(455):1102–1112 [Google Scholar]

[R29] Hadfield JD (2010) Mcmc methods for multi-response generalized linear mixed models: the MCMCglmm R package. J Stat Softw 33(2):1–2220808728 [Google Scholar]

[R30] Hasegawa H (2009) Bayesian dynamic panel ordered probit model and its application to subjective well being. Commun Stat Simul Comput 38(6):1321–1347. 10.1080/03610910902903133 [DOI] [Google Scholar]

[R31] Heckman JJ (1981) The incidental parameters problem and the problem of initial conditions in estimating discrete time-discrete data stochastic process. In: Manski CF, McFadden DL (eds) Structural analysis of discrete data with econometric applications. The MIT Press, Cambridge, pp 179–195 [Google Scholar]

[R32] Heckman James J (1981) Statistical models for discrete panel data. Structural analysis of discrete data with econometric applications 114:178 [Google Scholar]

[R33] Jin X, Banerjee S, Carlin BP (2007) Order-free co-regionalized areal data models with application to multiple-disease mapping. J Royal Stat Soc Ser B (Stat Method) 69(5):817–838 [DOI] [PMC free article] [PubMed] [Google Scholar]

[R34] Jones RH (2011) Bayesian information criterion for longitudinal and clustered data. Stat Med 30(25):3050–3056. 10.1002/sim.4323 [DOI] [PubMed] [Google Scholar]

[R35] Kang EL, Cressie N (2011) Bayesian inference for the spatial random effects model. J Am Stat Assoc 106(495):972–983 [Google Scholar]

[R36] LeishVet (2016) Clinical staging, treatment and prognosis. https://www.leishvet.org/fact-sheet/clinical-staging/

[R37] Li Q, Pan J, Belcher J (2016) Bayesian inference for joint modelling of longitudinal continuous, binary and ordinal events. Stat Methods Med Res 25(6):2521–2540. 10.1177/0962280214526199 [DOI] [PubMed] [Google Scholar]

[R38] Li ZR, McComick TH, Clark SJ (2020) Using bayesian latent gaussian graphical models to infer symptom associations in verbal autopsies. Bayesian Anal 15(3):781–807. 10.1214/19-BA1172 [DOI] [PMC free article] [PubMed] [Google Scholar]

[R39] Liu JS, Wu YN (1999) Parameter expansion for data augmentation. J Am Stat Assoc 94(448):1264–1274 [Google Scholar]

[R40] MacNab YC (2016) Linear models of coregionalization for multivariate lattice data: order-dependent and order-free cmcars. Stat Methods Med Res 25(4):1118–1144. 10.1177/0962280216660419 [DOI] [PubMed] [Google Scholar]

[R41] Matyas L, Sevestre P (2008) The econometrics of panel data: fundamentals and recent developments in theory and practice, 3rd edn. Springer, Berlin [Google Scholar]

[R42] Meng XL, Dyk DAV (1999) Seeking efficient data augmentation schemes via conditional and marginal augmentation. Biometrika 86(2):301–320 [Google Scholar]

[R43] NCAR (2015) Verification: weather forecast verification utilities. R Package Vers 1:42 [Google Scholar]

[R44] Neale MC, Hunter MD, Pritikin JN et al. (2016) OpenMx 2.0: extended structural equation and statistical modeling. Psychometrika 81(2):535–549. 10.1007/s11336-014-9435-8 [DOI] [PMC free article] [PubMed] [Google Scholar]

[R45] O’Malley AJ, Zaslavsky AM (2008) Domain-level covariance analysis for multilevel survey data with structured nonresponse. J Am Stat Assoc 103(484):1405–1418 [Google Scholar]

[R46] Petersen CA, Barr SC (2009) Canine Leishmaniasis in North America: emerging or newly recognized? Vet Clin North Am Small Anim Pract 39(6):1065–1074 [DOI] [PMC free article] [PubMed] [Google Scholar]

[R47] Plummer M (2003) Jags: A program for analysis of bayesian graphical models using gibbs sampling [Google Scholar]

[R48] Plummer M, Best N, Cowles K et al. (2006) Coda: convergence diagnosis and output analysis for mcmc. R News 6(1):7–11 [Google Scholar]

[R49] Proverbio D, Spada E, Bagnagatti de Giorgi G et al. (2014) Relationship between leishmania ifat titer and clinicopathological manifestations (clinical score) in dogs. BioMed Res Int. 10.1155/2014/412808 [DOI] [PMC free article] [PubMed] [Google Scholar]

[R50] Pudney S (2008) The dynamics of perception: modelling subjective wellbeing in a short panel. J Royal Stat Soc Ser A (Stat Soc) 171(1):21–40

[R51] R Core Team (2021) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna

[R52] Ribeiro RR, Michalick MSM, da Silva ME et al. (2018) Canine Leishmaniasis: an overview of the current status and strategies for control. BioMed Res Int. 10.1155/2018/3296893

[R53] Roy V (2020) Convergence diagnostics for Markov chain Monte Carlo. Annu Rev Stat Appl 7(1):387–412. 10.1146/annurev-statistics-031219-041300

[R54] Schaut RG, Robles-Murguia M, Juelsgaard R et al. (2015) Vectorborne transmission of Leishmania infantum from hounds, United States. Emerg Infect Dis 21(12):2209–2212. 10.3201/eid2112.141167

[R55] Schmidt AM, Gelfand AE (2003) A Bayesian coregionalization approach for multivariate pollutant data. J Geophys Res Atmos. 10.1029/2002JD002905

[R56] Schuurman NK, Grasman RPPP, Hamaker EL (2016) A comparison of inverse-Wishart prior specifications for covariance matrices in multilevel autoregressive models. Multivar Behav Res 51(2–3):185–206. 10.1080/00273171.2015.1065398

[R57] Solano-Gallego L, Cardoso L, Pennisi MG et al. (2017) Diagnostic challenges in the era of canine Leishmania infantum vaccines. Trends Parasitol 33(9):706–717

[R58] Spiegelhalter DJ, Best NG, Carlin BP et al. (2002) Bayesian measures of model complexity and fit. J Royal Stat Soc Ser B 64(4):583–639

[R59] Steele F, Grundy E (2021) Random effects dynamic panel models for unequally spaced multivariate categorical repeated measures: an application to child-parent exchanges of support. J Royal Stat Soc Ser C (Appl Statist) 70(1):3–23. 10.1111/rssc.12446

[R60] Stegmueller D (2013) Modeling dynamic preferences: a Bayesian robust dynamic latent ordered probit model. Polit Anal 21(3):314–333

[R61] Talhouk A, Doucet A, Murphy K (2012) Efficient Bayesian inference for multivariate probit models with sparse inverse correlation matrices. J Comput Graph Stat 21(3):739–757. 10.1080/10618600.2012.679239

[R62] Teimourian M, Baghfalaki T, Ganjali M et al. (2015) Joint modeling of mixed skewed continuous and ordinal longitudinal responses: a Bayesian approach. J Appl Stat 42(10):2233–2256. 10.1080/02664763.2015.1023557

[R63] Therneau TM, Grambsch PM (2000) Modeling survival data: extending the Cox model. Springer, New York

[R64] Toepp AJ, Schaut RG, Scott BD et al. (2017) Leishmania incidence and prevalence in US hunting hounds maintained via vertical transmission. Vet Parasitol Reg Stud Rep 10:75–81

[R65] Toepp AJ, Monteiro GR, Coutinho JF et al. (2019) Comorbid infections induce progression of visceral leishmaniasis. Parasit Vectors 12(1):1–12

[R66] Varin C, Czado C (2009) A mixed autoregressive probit model for ordinal longitudinal data. Biostatistics 11(1):127–138. 10.1093/biostatistics/kxp042

[R67] Wang WL, Fan TH (2010) ECM-based maximum likelihood inference for multivariate linear mixed models with autoregressive errors. Comput Stat Data Anal 54(5):1328–1341. 10.1016/j.csda.2009.11.021

[R68] Wang WL, Fan TH (2012) Bayesian analysis of multivariate t linear mixed models using a combination of IBF and Gibbs samplers. J Multivar Anal 105(1):300–310. 10.1016/j.jmva.2011.10.006

[R69] Wilhelm S, Manjunath BG (2015) tmvtnorm: truncated multivariate normal and Student t distribution. R package version 1.4-10

[R70] Wilks D (2005) Statistical methods in the atmospheric sciences. International Geophysics. Elsevier Science, Amsterdam

[R71] Woodbury M (1950) Inverting modified matrices. Department of Statistics, Princeton University, Princeton, Tech. rep

[R72] Wooldridge JM (2005) Simple solutions to the initial conditions problem in dynamic, nonlinear panel data models with unobserved heterogeneity. J Appl Econom 20(1):39–54. 10.1002/jae.770
