Stat Med. Author manuscript; available in PMC: 2023 Aug 30.
Published in final edited form as: Stat Med. 2022 May 20;41(19):3661–3678. doi: 10.1002/sim.9441

The Polytomous Discrimination Index for Prediction Involving Multistate Processes Under Intermittent Observation

Shu Jiang 1, Richard J Cook 2,*
PMCID: PMC9308735  NIHMSID: NIHMS1807290  PMID: 35596238

Summary

With the increasing importance of predictive modeling in health research comes the need for methods to rigorously assess predictive accuracy. We consider the problem of evaluating the accuracy of predictive models for nominal outcomes when outcome data are coarsened at random. We first consider the problem in the context of a multinomial response modeled by polytomous logistic regression. Attention is then directed to the motivating setting in which class membership corresponds to the state occupied in a multistate disease process at a time horizon of interest. Here, class (state) membership may be unknown at the time horizon since disease processes are under intermittent observation. We propose a novel extension to the polytomous discrimination index to address this and evaluate the predictive accuracy of an intensity-based model in the context of a study involving patients with arthritis from a registry at the University of Toronto Centre for Prognosis Studies in Rheumatic Diseases.

Keywords: Coarsening, classification, discrimination, intermittent observation, multistate processes, predictive model, risk scores

1. Introduction

1.1. Introduction to Prediction

Predictive modeling is of increasing importance in the era of personalized and stratified medicine1. Much of the early work on methods for assessing prediction accuracy involved continuous outcomes, where performance metrics include the proportion of explained variation2 and its “leave-one-out” analogue, the PRESS statistic3, which better reflects out-of-sample performance. Any loss function can of course be specified, with overall performance reflected by the expected loss. Harrell’s concordance measure, called the C index, is another popular performance measure, geared towards assessing the discriminatory power of a predictive model4. With dichotomous outcomes, point prediction yields a predicted response on the same scale as the response, while probabilistic prediction yields an estimated probability of the response. Point prediction has considerable appeal in medical research; discrimination measures based on misclassification rates and receiver operating characteristic (ROC) curves fit within this context5,6, with the latter reflecting the utility of an underlying risk score for classifying individuals. Since probabilistic prediction involves estimated probabilities, the Brier score is a natural measure of predictive performance in this setting7. Extensions of these measures have been proposed for right-censored data, where the goal is typically to predict event status (failed/not failed) at some time horizon; because failure status may be unknown for some individuals due to right-censoring, these extensions typically involve either imputation or inverse probability of censoring weights8,9.

Two general approaches have been adopted for measuring predictive performance with polytomous outcomes: the hypervolume under the ROC manifold (HUM)10 and the polytomous discrimination index (PDI)11,12. The former is a generalization of the area under the receiver operating characteristic curve. Consider a nominal outcome with K potential categories and a randomly selected K-tuple containing one individual from each class. The HUM is the probability that the outcomes of all K individuals in this K-tuple are correctly classified. The term volume under the surface (VUS) is often used when the outcome has three categories13,14, whereas HUM is used when K > 3. The PDI for a particular category k is the probability that the subject in category k from a random K-tuple is correctly assigned to that category. This can be computed for each category k, and averaging over all K categories gives an overall PDI.

We consider the challenge of measuring predictive performance based on multistate models for chronic disease processes that can be naturally characterized in terms of distinct stages15. Markov models, in which transition intensities are modulated by multiplicative covariate effects16,17,15, are used routinely in such settings and are considered here. We suppose interest lies in predicting state occupancy at a particular time horizon based on a fitted multistate model. Examples of such problems are numerous, including prediction of nosocomial infections and patient outcomes in the ICU18, and prediction of outcomes following bone marrow transplantation19. Putter et al.20 report a detailed analysis founded on a five-state model used to characterize disease course in breast cancer patients following surgery. The states included one occupied when no events have occurred, and states representing local recurrence, distant metastases, both local recurrence and distant metastases, and death; these authors also discuss the utility of multistate modelling for making predictions conditional on any observed history. More recently, Spitoni et al.21 discussed Brier score and Kullback-Leibler type loss functions for predictions based on multistate processes, along with methods for estimating the corresponding expected losses under right-censoring; they also discuss extensions accommodating dynamic prediction.

The setting of interest involves a registry of patients attending a rheumatology clinic for periodic health assessments, at which information on disease state is collected; see Section 1.2 for full details of the motivating study. Specifically, we consider the problem of predicting state occupancy at a specified time horizon based on data arising from intermittent observation of a continuous-time multistate process. Intermittent observation makes it challenging to estimate the prediction accuracy of a model, since the state occupied at the time horizon of interest may not be known for some individuals. We propose a novel extension to the polytomous discrimination index to accommodate such an intermittent observation scheme, wherein the state occupied may be unknown for a subset of individuals in the validation sample. Our motivating application involves the prediction of sacroiliac joint damage in patients in a psoriatic arthritis clinic, but the problem arises in many other clinical settings. In osteoporosis, for example, individuals are at risk of fractures (detected upon radiographic examination) due to weakened bone integrity. The development and evaluation of predictive models must deal with the fact that fracture status may be observed only intermittently, so the event status at a particular time horizon may be unknown. Similarly, breast cancer prevention studies typically recruit individuals free of breast cancer, but some may develop ductal carcinoma in situ and ultimately progress to invasive breast cancer22; multistate processes can be used effectively to model this progression. Since individuals in prevention studies are screened periodically (e.g., annually or biannually), states are only known at these intermittent observation times.

As a preliminary investigation we consider a simplified version of the problem involving a categorical (polytomous) response analyzed with a multinomial regression model. In this setting the validation sample may not report the categorical response for every individual, but may instead indicate only a set of possible categorical responses. We use the term coarsening to describe the general phenomenon whereby information about measurements is lost; Heitjan and Rubin23 define coarsening to include “as special cases rounded, heaped, censored, partially categorized and missing data”. Coarsening is thus a slightly more general concept than “missing data”, and we use it here to encompass both the case where a set of possible categorical responses is reported rather than a single category, and the case where information on state occupancy for a multistate process is incomplete due to intermittent observation of a continuous-time process.

The remainder of the paper is organized as follows. In Section 1.2, we describe the motivating problem involving data from the University of Toronto Psoriatic Arthritis Cohort where the goal is to predict different forms of sacroiliac joint damage in patients with psoriatic arthritis. In Section 2, we review the polytomous discrimination index for multinomial data and extend it to deal with coarsened data. The purpose of this section is to explore the impact of coarsening and methods for dealing with it in a simplified setting via multinomial representation. In Section 3, we introduce notation for the analysis of multistate processes under intermittent observation, and use the framework proposed in Section 2 to deal with prediction of the state occupied at a specified time horizon. Simulation studies are carried out to investigate the finite sample performance of the proposed estimation procedure under both the multinomial (Section 2) and the multistate (Section 3) settings. Section 4 involves an application to the motivating data where we use human leukocyte antigens to predict the presence of unilateral sacroiliac joint damage or axial disease in patients with psoriatic arthritis and estimate the PDI as a function of time. Concluding remarks and topics for future research are given in Section 5.

1.2. Prediction of Sacroiliac Involvement in Psoriatic Arthritis

The motivating problem arose in a collaboration with researchers at the Centre for Prognosis Studies in Rheumatic Disease in the University Health Network at the University of Toronto. These researchers maintain the Psoriatic Arthritis Cohort (UTPAC), founded in 1976 and now comprising approximately 2000 patients. Upon recruitment, individuals provide biospecimens which are used for genetic testing and proteomic analysis. Recruited patients are scheduled for annual clinic examination and biannual radiographic examination of joint damage. The broad aim of this cohort is to provide a platform for studying the clinical and radiological disease course, and to identify genetic and other risk factors for high disease activity and rapid progression of joint damage.

There has been considerable discussion in the rheumatology literature in recent years regarding the nature of spinal involvement in individuals with psoriatic arthritis. Ankylosing spondylitis is an arthritic condition with axial sacroiliac joint involvement, whereas other arthritic conditions tend toward the development of unilateral sacroiliac joint damage. The aim of the current study is to predict spinal involvement in individual patients, and more specifically whether patients are likely to experience unilateral or bilateral damage of the sacroiliac joints. The latter represents axial disease, which is associated with greater pain and mobility impairment. Accurate prediction of axial disease is important, as those at high risk may be given more intensive (and potentially toxic and expensive) preventive therapy. Prediction of unilateral sacroiliac damage is also important; researchers speculate that it may represent a distinct disease process.

We adopt a multistate model for the analysis. Figure 1(a) shows a simple four-state model that can be used to characterize the onset of unilateral (left- or right-sided) sacroiliac joint damage, as well as axial disease. The extent of joint damage is assessed using the New York Radiological Grading Criteria24, with damage defined here as grade 2 or higher. An individual free of sacroiliac damage makes a 0 → k transition upon the onset of grade 2 or higher damage in the left (k = 1) or right (k = 2) sacroiliac joint, and enters state 3 upon the development of grade 2 or higher damage in the second sacroiliac joint. Figure 1(b) shows sample data for six individuals in the UTPAC, with the length of each line representing the duration of follow-up and the vertical ticks representing visits at which x-rays are taken and damage can be assessed. None of the transition times in Figure 1(a) are observed, because radiological assessments are intermittent; we only know the state occupied at the times these assessments are made. Different line types in Figure 1(b) depict the different states of sacroiliac damage, and gaps denote periods when the damage state is unknown.

FIGURE 1.


Four-state diagram for the onset of sacroiliac damage in psoriatic arthritis (panel (a)) and six sample timelines depicting data obtained from follow-up visits at which information on sacroiliac damage is acquired, for patients from the University of Toronto Psoriatic Arthritis Cohort (panel (b)); gaps in the timelines of panel (b) reflect periods during which the state is unknown.

2. Prediction with Multinomial Outcomes

2.1. The Polytomous Discrimination Index

Before considering the challenges involved in prediction with multistate processes under intermittent observation, we consider an analogous setting involving a coarsened multinomial random variable $Y$ taking values $0, 1, 2, \dots, K$. If $X = (X_1, \dots, X_p)'$ is a $p \times 1$ covariate vector, let $P(Y = k \mid X) = \pi_k(X)$, $k = 1, \dots, K$, and $P(Y = 0 \mid X) = 1 - \sum_{k=1}^{K} \pi_k(X)$. Consider a multinomial regression model of the form

$$\log\left(\frac{\pi_k(X)}{\pi_0(X)}\right) = \bar{X}'\beta_k = \eta_k, \quad k = 1, \dots, K, \qquad (1)$$

where $\bar{X} = (1, X')'$, $\beta_k = (\beta_{k0}, \dots, \beta_{kp})'$ is a $(p+1) \times 1$ vector of regression coefficients, $k = 1, 2, \dots, K$, and $\beta = (\beta_1', \dots, \beta_K')'$ is a $K(p+1) \times 1$ vector. We let $\eta_k = \bar{X}'\beta_k$ be the linear predictor associated with outcome $k$ in (1), and $\eta = (\eta_1, \dots, \eta_K)'$ the $K \times 1$ vector of linear predictors, suppressing the dependence on $X$. For the purpose of predictive modeling we refer to $\eta_1, \dots, \eta_K$ as risk scores and to $\eta$ as the $K$-dimensional multivariate risk score. We initially assume that $\beta$ is known, in order to focus on estimation of the PDI in an idealized setting, and investigate properties when $\beta$ is estimated in Section 2.3. We consider the setting where the prediction for an individual with $X = x$ is $\hat{Y} = \arg\max_k \{\pi_k(x), k = 0, \dots, K\}$. That is, the predicted outcome is the class with the highest probability of occurrence given $X = x$; alternative prediction rules can be adopted if additional costs or utilities are specified, but in their absence we adopt this standard practice6.
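A minimal Python sketch of model (1) and the arg-max prediction rule follows. It is illustrative only: the function names and the convention of storing $\beta_1, \dots, \beta_K$ as the rows of a $K \times (p+1)$ array are our own assumptions, not the authors' code.

```python
import numpy as np


def class_probs(x, beta):
    """Return (pi_0(x), ..., pi_K(x)) under the multinomial logit model (1).

    x    : length-p covariate vector
    beta : (K, p+1) array whose row k-1 is beta_k = (beta_k0, ..., beta_kp);
           class 0 is the reference category (beta_0 = 0).
    """
    xbar = np.concatenate(([1.0], np.asarray(x, dtype=float)))  # Xbar = (1, X')'
    eta = np.concatenate(([0.0], beta @ xbar))  # (eta_0 = 0, eta_1, ..., eta_K)
    eta -= eta.max()                            # guard against overflow in exp
    e = np.exp(eta)
    return e / e.sum()


def predict_class(x, beta):
    """Arg-max rule: the class with the largest pi_k(x)."""
    return int(np.argmax(class_probs(x, beta)))
```

For example, with zero intercepts and the moderate-effect slopes used in Section 2.3, an individual with $x = (1, 1)'$ has $\eta_1 = \log 3$ and $\eta_2 = \log 9$, so the class probabilities are proportional to $(1, 3, 9)$ and the individual is assigned to class 2.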

Let $\mathcal{P}$ denote a population of interest and $\mathcal{P}_k = \{i : i \in \mathcal{P}, Y_i = k\}$ the sub-population of individuals in class $k$, $k = 0, 1, \dots, K$. We then let $\mathbf{i} = \{i_0, i_1, \dots, i_K\}$ denote a $(K+1)$-tuple of individuals from $\mathcal{P}$ wherein $i_k \in \mathcal{P}_k$, $k = 0, 1, \dots, K$, and let $\mathcal{P}^{(K+1)} = \{\mathbf{i} : i_j \in \mathcal{P}_j, j = 0, 1, \dots, K\}$ be the set of all such $(K+1)$-tuples. Next we let $\{X_{i_0}, X_{i_1}, \dots, X_{i_K}\}$ denote the random set of covariate vectors associated with the $(K+1)$-tuple $\mathbf{i} \in \mathcal{P}^{(K+1)}$. Based on (1), let

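$$\pi_k(x_{i_j}) = \frac{\exp(\bar{X}_{i_j}'\beta_k)}{1 + \sum_{l=1}^{K} \exp(\bar{X}_{i_j}'\beta_l)}, \quad k = 0, 1, \dots, K, \text{ with } \beta_0 = 0, \qquad (2)$$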

be the conditional probability of a response in class $k$ given covariate $x_{i_j}$, which we denote more compactly as $\pi_{i_j k}$ in what follows. The polytomous discrimination index (PDI) for category $k$, denoted by $\Delta_k$, is defined as the probability that, among the individuals in a randomly selected $(K+1)$-tuple, the individual in class $k$ is assigned to class $k$. To define this we first let

$$A_k(\mathbf{i}) = I(\pi_{i_k k} > \pi_{i_j k}, \ j \neq k, \ j = 0, \dots, K) \qquad (3)$$

indicate such an assignment for $\mathbf{i} \in \mathcal{P}^{(K+1)}$. Then

$$\Delta_k = E\{A_k(\mathbf{i})\}, \qquad (4)$$

where the expectation is over $\{X_{i_0}, X_{i_1}, \dots, X_{i_K}\}$. The dimension of the integration needed to compute (4) can be reduced by working with the multivariate risk scores, since $\eta$ represents a sufficient dimension reduction of $X$ when $K < p$. Computing $\Delta_k$ directly requires strong assumptions about the multivariate covariate distribution, so it is more commonly estimated empirically, as we next describe.
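When a large sample of covariates and class labels can be generated, one simple way to approximate the expectation in (4), assumed here for illustration rather than taken from the paper, is to draw random $(K+1)$-tuples with one member per class and average the indicator (3). The hypothetical function `monte_carlo_pdi` below takes precomputed probabilities `probs[i, k]` $= \pi_k(x_i)$ and true classes `y`.

```python
import numpy as np


def monte_carlo_pdi(probs, y, n_tuples=100_000, seed=0):
    """Monte Carlo approximation of Delta_k = E{A_k(i)} in (4).

    probs : (n, K+1) array with probs[i, k] = pi_k(x_i)
    y     : length-n array of true classes in {0, ..., K}
    """
    rng = np.random.default_rng(seed)
    probs = np.asarray(probs, dtype=float)
    y = np.asarray(y)
    n_classes = probs.shape[1]
    groups = [np.flatnonzero(y == k) for k in range(n_classes)]
    # each row is a random (K+1)-tuple: column j holds a member of class j
    tuples = np.column_stack([rng.choice(g, size=n_tuples) for g in groups])
    delta = np.empty(n_classes)
    for k in range(n_classes):
        col = probs[tuples, k]                          # pi_{i_j k}, shape (n_tuples, K+1)
        rivals = np.delete(col, k, axis=1).max(axis=1)  # best competitor for class k
        delta[k] = np.mean(col[:, k] > rivals)          # average of A_k over tuples
    return delta, delta.mean()                          # overall: mean over K+1 classes
```

Ties are counted as failures here, which is one convention among several.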

Here we consider a validation sample $\mathcal{S}$ comprising $n$ independent individuals drawn at random from the target population to which the prediction model is to be applied. Let $\mathcal{S}_k = \{i : i \in \mathcal{S}, Y_i = k\}$ be the subset of individuals in the validation sample who are known to be in class $k$, where $|\mathcal{S}_k| = n_k$, $k = 0, 1, \dots, K$. In what follows $\mathbf{i} = (i_0, \dots, i_K)$ is a vector of labels for a $(K+1)$-tuple of individuals constructed from the validation sample with $i_k \in \mathcal{S}_k$, $k = 0, 1, \dots, K$; we let $\mathcal{T} = \{\mathbf{i} : i_k \in \mathcal{S}_k, k = 0, 1, \dots, K\}$ denote the set of all $\prod_{k=0}^{K} n_k$ possible $(K+1)$-tuples based on the validation sample. An estimating function for $\Delta_k$ can then be defined as

$$U_k(\Delta_k) = \sum_{\mathbf{i} \in \mathcal{T}} \{A_k(\mathbf{i}) - \Delta_k\}, \qquad (5)$$

with solution

$$\hat{\Delta}_k = \frac{1}{\prod_{l=0}^{K} n_l} \sum_{\mathbf{i} \in \mathcal{T}} A_k(\mathbf{i}), \qquad (6)$$

k = 0, 1, … , K. The overall polytomous discrimination index Δ is defined as the simple average of the category-specific measures:

$$\hat{\Delta} = \frac{1}{K+1} \sum_{k=0}^{K} \hat{\Delta}_k. \qquad (7)$$
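For small validation samples, (6) and (7) can be computed by direct enumeration of $\mathcal{T}$. The sketch below (hypothetical function `empirical_pdi`, assuming predicted probabilities have already been computed) counts, for each class $k$, the tuples in which the class-$k$ member has strictly the largest $\pi_k$; ties are counted as failures, which is one convention among several.

```python
import itertools

import numpy as np


def empirical_pdi(probs, y):
    """Empirical PDI from equations (6) and (7).

    probs : (n, K+1) array with probs[i, k] = pi_k(x_i) for validation subject i
    y     : length-n array of observed classes in {0, ..., K}
    Returns (array of Delta_k estimates, overall Delta estimate).
    """
    probs = np.asarray(probs, dtype=float)
    y = np.asarray(y)
    n_classes = probs.shape[1]
    groups = [np.flatnonzero(y == k) for k in range(n_classes)]  # S_0, ..., S_K
    hits = np.zeros(n_classes)
    n_tuples = 0
    for tup in itertools.product(*groups):      # all prod_k n_k tuples in T
        n_tuples += 1
        rows = probs[list(tup)]                  # row j: probabilities of member i_j
        for k in range(n_classes):
            col = rows[:, k]
            # A_k(i): the class-k member has strictly the largest pi_k in the tuple
            if col[k] > np.delete(col, k).max():
                hits[k] += 1
    delta_k = hits / n_tuples
    return delta_k, delta_k.mean()              # (7): average over the K+1 classes
```

The loop visits every tuple in $\mathcal{T}$, so its cost grows as $\prod_k n_k$; a validation sample of 500 split 200/200/100 already gives 4 million tuples, at which point vectorization or random sampling of tuples becomes preferable.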

2.2. Estimation with Coarsened Validation Data

We now consider the case in which the true class membership is unknown for some individuals in the validation sample due to coarsening. Methods for dealing with coarsened data are well developed; our focus is not on building a predictive model but on evaluating its predictive accuracy, in terms of the polytomous discrimination index, when some responses are only known to lie in a set of classes. Let $\mathcal{C}_i$ be the coarsened response for individual $i$ where, if $K = 2$ for example, $\mathcal{C}_i \in \{0, 1, 2, (0,1), (0,2), (1,2), (0,1,2)\}$, and the first three elements are realized when there is no coarsening. We omit noninformative outcomes, defined as those with probability one (e.g. $\mathcal{C}_i = (0,1,2)$ when $K = 2$).

As in Section 2.1 we use a subscript on individual labels to denote the class they are in, but here we introduce a superscript $p$ to indicate that these may be pseudo-individuals, conceptualized to represent the possible class membership of an individual whose response is coarsened. For example, if $\mathcal{C}_i = (j, k)$, then there are two pseudo-individuals associated with individual $i$, one assigned to class $j$ and another to class $k$; we label these pseudo-individuals $i_j^p$ and $i_k^p$ respectively, noting that $i_j^p$ and $i_k^p$ both equal $i$ and the subscript represents the class considered for a particular allocation. We further let $\mathcal{S}_k^p = \{i : i \in \mathcal{S}, k \in \mathcal{C}_i\}$ be the set of individuals who are known to be in class $k$ (i.e. $\mathcal{C}_i = k$) or may be in class $k$ (i.e. $k \in \mathcal{C}_i$), $k = 0, 1, \dots, K$. To unify the notation, we label all individuals in $\mathcal{S}_k^p$ by $i_k^p$, whether it is known that $Y_i = k$ or we simply know $Y_i \in \mathcal{C}_i$ with $k \in \mathcal{C}_i$. Note that $I(|\mathcal{C}_i| = 1)$ indicates that the outcome for individual $i$ is observed precisely, in which case $i_k^p = i$ if $Y_i = k$. If $\mathcal{C}_i = (j, k)$, then there is a pseudo-individual $i_j^p \in \mathcal{S}_j^p$ and a pseudo-individual $i_k^p \in \mathcal{S}_k^p$. More generally, if $|\mathcal{C}_i| = m$, then there will be $m$ pseudo-individuals corresponding to individual $i$, each belonging to one of $m$ different sets $\mathcal{S}_l^p$, $l \in \mathcal{C}_i$. Following this construction we let $\mathcal{T}^p = \{\mathbf{i}^p : i_k^p \in \mathcal{S}_k^p, k = 0, 1, \dots, K\}$ denote the set of all possible $(K+1)$-tuples of the pseudo-individuals corresponding to the coarsened validation sample. We assume coarsening at random in the sense of Heitjan and Rubin23.

Let

$$w_{ik} = P(Y_i = k \mid \mathcal{C}_i, X_i) = \frac{P(Y_i = k \mid X_i)}{\sum_{j \in \mathcal{C}_i} P(Y_i = j \mid X_i)}, \qquad (8)$$

be the conditional probability that individual $i$ is in class $k \in \mathcal{C}_i$ given their coarsened response $\mathcal{C}_i$ (with $w_{ik} = 0$ for $k \notin \mathcal{C}_i$). If $\mathcal{C}_i = k$ then $w_{ik} = 1$ and $w_{ij} = 0$ for $j \neq k$. We let $\mathbf{i}^p = (i_0^p, i_1^p, \dots, i_K^p)$ represent a $(K+1)$-tuple of individuals or pseudo-individuals where $i_j^p \in \mathcal{S}_j^p$, and let $D(\mathbf{i}^p) = I(i_j^p \neq i_k^p, \ j \neq k, \ j, k = 0, 1, \dots, K)$ be the indicator that this $(K+1)$-tuple of potential pseudo-individuals comprises distinct real individuals.
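The weights (8) and the indicator $D(\mathbf{i}^p)$ are simple to compute. The following sketch uses hypothetical helper names and represents $\mathcal{C}_i$ as a collection of class labels.

```python
import numpy as np


def coarsened_weights(probs_i, c_i):
    """w_ik = P(Y_i = k | C_i, X_i) from equation (8) for one subject.

    probs_i : length-(K+1) array of model probabilities pi_k(x_i)
    c_i     : iterable of the classes compatible with the coarsened response C_i
    Returns a length-(K+1) array that is zero for classes outside C_i.
    """
    probs_i = np.asarray(probs_i, dtype=float)
    idx = sorted(set(c_i))
    w = np.zeros_like(probs_i)
    w[idx] = probs_i[idx] / probs_i[idx].sum()   # renormalise over C_i
    return w


def distinct(labels):
    """D(i^p): 1 if the (K+1) pseudo-individuals are distinct real subjects."""
    return int(len(set(labels)) == len(labels))
```

For instance, a subject with $\pi(x_i) = (0.5, 0.3, 0.2)$ and $\mathcal{C}_i = (0, 2)$ receives weights $(5/7, 0, 2/7)$.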

Next we let

$$A_k(\mathbf{i}^p) = I(\pi_{i_k^p k} > \pi_{i_j^p k}, \ j \neq k, \ j = 0, \dots, K) \qquad (9)$$

be the indicator that the (pseudo-)individual from class $k$ in $\mathbf{i}^p$ has the highest predicted probability of being in class $k$, and define the estimating function for $\Delta_k$ as

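$$\bar{U}_k(\Delta_k) = \sum_{\mathbf{i}^p \in \mathcal{T}^p} D(\mathbf{i}^p)\, w(\mathbf{i}^p) \{A_k(\mathbf{i}^p) - \Delta_k\}, \qquad (10)$$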

where $w(\mathbf{i}^p) = \prod_{j=0}^{K} w_{i_j^p j}$, which is $w_{i_0^p 0}\, w_{i_1^p 1}\, w_{i_2^p 2}$ when $K = 2$. Note that in the absence of coarsening,

$$\mathcal{S}_j^p \cap \mathcal{S}_k^p = \emptyset \quad \text{for } j \neq k, \qquad (11)$$

$D(\mathbf{i}^p) = 1$ and $w(\mathbf{i}^p) = 1$, and we recover the standard estimator of $\Delta_k$ given in equation (6). More generally, we obtain

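$$\hat{\Delta}_k = \frac{\sum_{\mathbf{i}^p \in \mathcal{T}^p} D(\mathbf{i}^p)\, w(\mathbf{i}^p)\, A_k(\mathbf{i}^p)}{\sum_{\mathbf{i}^p \in \mathcal{T}^p} D(\mathbf{i}^p)\, w(\mathbf{i}^p)}, \qquad (12)$$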

and we again estimate the overall polytomous discrimination index as $\hat{\Delta} = \sum_{k=0}^{K} \hat{\Delta}_k / (K+1)$.
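Putting (8), (9) and (12) together, a brute-force sketch of the weighted estimator might look as follows. It is an illustration under our own data layout (an `(n, K+1)` probability array and a list of sets $\mathcal{C}_i$), with a hypothetical function name; it enumerates every tuple in $\mathcal{T}^p$, which is only practical for small samples.

```python
import itertools

import numpy as np


def weighted_pdi(probs, coarse):
    """Weighted PDI estimator (12) for coarsened validation responses.

    probs  : (n, K+1) array with probs[i, k] = pi_k(x_i)
    coarse : length-n list; coarse[i] is the set C_i of classes compatible with
             subject i's response (a one-element set when Y_i is observed)
    Returns (array of Delta_k estimates, overall estimate averaged over K+1 classes).
    """
    probs = np.asarray(probs, dtype=float)
    n, n_classes = probs.shape
    # w_ik from (8): model probabilities renormalised over C_i, zero elsewhere
    w = np.zeros_like(probs)
    for i, c in enumerate(coarse):
        idx = sorted(c)
        w[i, idx] = probs[i, idx] / probs[i, idx].sum()
    # S_k^p: subjects who are, or may be, in class k
    pseudo = [[i for i in range(n) if k in coarse[i]] for k in range(n_classes)]
    num = np.zeros(n_classes)
    den = 0.0
    for tup in itertools.product(*pseudo):       # every tuple i^p in T^p
        if len(set(tup)) < n_classes:            # D(i^p) = 0: a subject used twice
            continue
        wt = np.prod([w[i, k] for k, i in enumerate(tup)])  # w(i^p)
        den += wt
        rows = probs[list(tup)]
        for k in range(n_classes):
            col = rows[:, k]
            if col[k] > np.delete(col, k).max():  # A_k(i^p) from (9)
                num[k] += wt
    delta_k = num / den
    return delta_k, delta_k.mean()
```

When every $\mathcal{C}_i$ is a singleton, all weights equal one and no tuple can reuse a subject, so the function reduces to the unweighted estimator (6).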

2.3. Simulation Studies Involving Coarsened Multinomial Data

Here we report the results of simulation studies focusing on estimation of the PDI described in Sections 2.1 and 2.2. We consider three classes ($K = 2$) and express the class probabilities given the covariates as in (1). We set $p = 2$ and adopt a bivariate normal covariate model, $X \sim \mathrm{BVN}(\mu, \Sigma)$, with $\mu = (\mu_1, \mu_2)'$ and

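$$\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_2\sigma_1 & \sigma_2^2 \end{pmatrix}; \qquad (13)$$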

we set μ = (0, 0)′ and σ1 = σ2 = 1 and ρ = 0.4. The regression coefficients satisfy (β21, β22)′ = (2β11, 2β12)′ and we set (β11, β12)′ = (log 1.5, log 2)′ to represent moderate covariate effects, and (β11, β12)′ = (log 3, log 4)′ for stronger covariate effects. The intercepts β10 and β20 are chosen to give pre-specified marginal probabilities P(Y = 0) = 0.4, P(Y = 1) = 0.4 and P(Y = 2) = 0.2.
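The paper does not state how the intercepts were found. One possibility, sketched below under that assumption, is to simulate a large covariate sample and iteratively shift $\beta_{10}$ and $\beta_{20}$ by the log-ratio of target to current marginal probabilities until the averages of $\pi_k(X)$ match $(0.4, 0.4, 0.2)$. The function name, sample size, iteration count and seed are arbitrary choices.

```python
import numpy as np


def calibrate_intercepts(slopes, target=(0.4, 0.4, 0.2), rho=0.4, n=200_000,
                         n_iter=200, seed=2022):
    """Simulate X ~ BVN(0, Sigma) with unit variances and correlation rho, then
    adjust (beta_10, beta_20) until the average of pi_k(X) matches `target`.

    slopes : (K, 2) array with rows (beta_k1, beta_k2)
    Returns (x, y, intercepts, achieved marginal probabilities).
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])        # Sigma in (13)
    x = rng.multivariate_normal(np.zeros(2), cov, size=n)
    lin = x @ np.asarray(slopes, dtype=float).T    # slope part of eta_k, shape (n, K)
    target = np.asarray(target, dtype=float)
    b0 = np.zeros(lin.shape[1])

    def probs(b):
        eta = np.column_stack([np.zeros(n), lin + b])  # eta_0 = 0 (reference)
        e = np.exp(eta - eta.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        marg = probs(b0).mean(axis=0)              # current P(Y = k)
        # shift each log-odds intercept toward its target margin
        b0 += np.log(target[1:] / marg[1:]) - np.log(target[0] / marg[0])
    p = probs(b0)
    # draw Y_i from its class probabilities by inverting the cumulative sum
    u = rng.random(n)
    y = np.minimum((u[:, None] > np.cumsum(p, axis=1)).sum(axis=1), p.shape[1] - 1)
    return x, y, b0, p.mean(axis=0)
```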

To avoid the high-dimensional integration in (4) needed to determine $\Delta_k$, $k = 0, 1, 2$, and $\Delta$, we approximate them by Monte Carlo, simulating a dataset of one million individuals with complete covariate values and class membership; the resulting values are reported under the column headed “Value” in Table 1. To examine the validity of the weighted estimating function for coarsened data, we consider three approaches to estimating $\Delta_k$ based on (12). The first uses the true value of $\beta$ and directly evaluates the PDI in validation samples of 500 individuals. The second estimates $\beta$ from an independent training sample of 1000 individuals subject to coarsening, using an EM algorithm25, and uses the resulting estimate of $\beta$ to estimate the PDI in a validation sample of 500 individuals. Here, if $\mathcal{D}$ is the set of indices labeling individuals in the training data and coarsening is at random, the maximum likelihood estimate $\hat{\beta}$ maximizes the observed data log-likelihood

$$\ell(\beta) = \sum_{i \in \mathcal{D}} \log P(Y_i \in \mathcal{C}_i \mid x_i),$$

where $P(Y_i \in \mathcal{C}_i \mid x_i) = \sum_{k \in \mathcal{C}_i} \pi_k(x_i; \beta)$. We then use $\hat{A}_k(\mathbf{i}^p)$, obtained from (9) with the estimate of $\beta$ used to compute the classification probabilities. The third approach estimates $\beta$ by a complete case analysis and uses the resulting estimate $\tilde{\beta}$ to compute the PDI. All three approaches are evaluated under varying degrees of coarsening, 0%, 30% and 60%, where the percentages correspond to the probability that an individual's response is coarsened. In the absence of coarsening $\hat{\beta} = \tilde{\beta}$, so there is only one set of estimation-based results in this setting. We specify $\mathcal{C}_i \perp Y_i \mid Y_i \in \mathcal{C}_i, X_i$ for coarsening at random and generate the coarsened data such that $P(|\mathcal{C}_i| > 1) = 0.3$ for moderate coarsening and 0.6 for more severe coarsening, $i = 1, \dots, n$. If $|\mathcal{C}_i| = 2$ and $Y_i = 1$, for example, then $\mathcal{C}_i = (Y_i, j)$ with $j = 0$ or $j = 2$ defines the possible coarsening for this individual; we choose $\mathcal{C}_i = (0,1)$ or $\mathcal{C}_i = (1,2)$ with equal probability.
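The coarsening mechanism and the observed data log-likelihood can be sketched as below. For $Y_i = 1$ this reproduces the equal-probability choice between $(0,1)$ and $(1,2)$ described above; applying the same rule to $Y_i = 0$ and $Y_i = 2$ is our assumption. Because each pair $\{j, k\}$ is then equally likely under $Y_i = j$ and $Y_i = k$, the mechanism is coarsening at random. The sketch only evaluates $\ell(\beta)$; maximizing it would require the EM algorithm used in the paper or a general-purpose optimizer, neither of which is shown. Function names are hypothetical.

```python
import numpy as np


def coarsen(y, n_classes, prob_coarse, rng):
    """With probability prob_coarse replace Y_i by C_i = {Y_i, j}, where j is drawn
    uniformly from the other classes; otherwise C_i = {Y_i}."""
    sets = []
    for yi in np.asarray(y, dtype=int):
        if rng.random() < prob_coarse:
            others = [j for j in range(n_classes) if j != yi]
            sets.append(frozenset((int(yi), others[rng.integers(len(others))])))
        else:
            sets.append(frozenset((int(yi),)))
    return sets


def observed_loglik(beta, x, sets):
    """l(beta) = sum_i log sum_{k in C_i} pi_k(x_i; beta), class 0 as reference.

    beta : (K, p+1) array of rows (beta_k0, ..., beta_kp)
    x    : (n, p) covariate matrix; sets : list of coarsened responses C_i
    """
    x = np.asarray(x, dtype=float)
    xbar = np.column_stack([np.ones(len(x)), x])
    eta = np.column_stack([np.zeros(len(x)), xbar @ np.asarray(beta).T])
    p = np.exp(eta - eta.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return float(sum(np.log(p[i, sorted(c)].sum()) for i, c in enumerate(sets)))
```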

TABLE 1.

Empirical performance of estimates of $\Delta_k$, $k = 0, 1, 2$, and $\Delta$ with no, moderate (30%) and heavier (60%) coarsening; ASE is the average of bootstrap standard errors based on 500 bootstrap samples created for each simulated dataset; ECP is the empirical coverage probability of nominal 95% confidence intervals constructed from the normal approximation to the estimator using the bootstrap standard error, while ECP-logit is the corresponding empirical coverage probability when the confidence intervals are constructed on the logit scale; training samples comprise 1000 observations; validation samples involve 500 individuals; nsim = 1000.

Columns within each coarsening block (0%, 30% and 60% of individuals with coarsened observations): EST, ESE, ASE, ECP, ECP-logit. A "-" marks the complete-case (CC) entries at 0%, where no separate complete-case analysis arises.

Parameter Value Method | 0%: EST ESE ASE ECP ECP-logit | 30%: EST ESE ASE ECP ECP-logit | 60%: EST ESE ASE ECP ECP-logit
MODERATE COVARIATE EFFECTS; (β11, β12)′ = (log 1.5, log 2)′, (β21, β22)′ = (2β11, 2β12)′
Δ0 0.670 True | 0.665 0.028 0.029 0.967 0.956 | 0.665 0.024 0.024 0.949 0.943 | 0.663 0.022 0.023 0.961 0.954
Δ0 0.670 EM | 0.665 0.028 0.030 0.961 0.956 | 0.665 0.030 0.031 0.950 0.953 | 0.666 0.034 0.032 0.943 0.942
Δ0 0.670 CC | - - - - - | 0.666 0.032 0.033 0.954 0.941 | 0.669 0.045 0.044 0.947 0.949
Δ1 0.404 True | 0.405 0.032 0.032 0.953 0.954 | 0.402 0.029 0.030 0.960 0.951 | 0.400 0.024 0.023 0.944 0.952
Δ1 0.404 EM | 0.406 0.033 0.031 0.926 0.930 | 0.407 0.031 0.030 0.941 0.938 | 0.411 0.031 0.029 0.934 0.946
Δ1 0.404 CC | - - - - - | 0.408 0.037 0.037 0.952 0.950 | 0.418 0.052 0.051 0.947 0.947
Δ2 0.677 True | 0.663 0.034 0.033 0.944 0.946 | 0.663 0.037 0.038 0.958 0.949 | 0.662 0.025 0.026 0.959 0.951
Δ2 0.677 EM | 0.664 0.034 0.032 0.934 0.941 | 0.665 0.037 0.036 0.958 0.954 | 0.665 0.045 0.044 0.942 0.945
Δ2 0.677 CC | - - - - - | 0.664 0.043 0.044 0.957 0.952 | 0.671 0.056 0.056 0.950 0.955
Δ 0.583 True | 0.578 0.023 0.022 0.951 0.957 | 0.576 0.020 0.019 0.941 0.949 | 0.575 0.018 0.019 0.955 0.954
Δ 0.583 EM | 0.578 0.031 0.030 0.962 0.957 | 0.579 0.033 0.031 0.963 0.953 | 0.580 0.037 0.035 0.940 0.951
Δ 0.583 CC | - - - - - | 0.579 0.037 0.036 0.947 0.948 | 0.586 0.040 0.039 0.954 0.943
STRONG COVARIATE EFFECTS; (β11, β12)′ = (log 3, log 4)′, (β21, β22)′ = (2β11, 2β12)′
Δ0 0.828 True | 0.831 0.021 0.021 0.952 0.949 | 0.832 0.018 0.019 0.961 0.953 | 0.832 0.016 0.016 0.944 0.956
Δ0 0.828 EM | 0.832 0.021 0.022 0.955 0.952 | 0.832 0.022 0.023 0.958 0.956 | 0.833 0.025 0.026 0.948 0.959
Δ0 0.828 CC | - - - - - | 0.833 0.024 0.023 0.953 0.942 | 0.834 0.032 0.031 0.959 0.945
Δ1 0.553 True | 0.566 0.034 0.033 0.943 0.951 | 0.565 0.031 0.032 0.948 0.956 | 0.564 0.028 0.028 0.956 0.947
Δ1 0.553 EM | 0.567 0.033 0.033 0.946 0.942 | 0.569 0.034 0.032 0.942 0.951 | 0.569 0.036 0.035 0.948 0.954
Δ1 0.553 CC | - - - - - | 0.567 0.039 0.038 0.948 0.941 | 0.574 0.053 0.053 0.942 0.956
Δ2 0.821 True | 0.825 0.026 0.027 0.962 0.957 | 0.825 0.023 0.023 0.944 0.958 | 0.823 0.020 0.019 0.943 0.948
Δ2 0.821 EM | 0.825 0.025 0.026 0.964 0.951 | 0.826 0.028 0.029 0.960 0.954 | 0.826 0.033 0.033 0.946 0.950
Δ2 0.821 CC | - - - - - | 0.825 0.032 0.033 0.943 0.952 | 0.828 0.040 0.040 0.959 0.952
Δ 0.734 True | 0.741 0.021 0.020 0.947 0.953 | 0.741 0.019 0.019 0.945 0.952 | 0.740 0.016 0.017 0.948 0.957
Δ 0.734 EM | 0.742 0.026 0.027 0.954 0.954 | 0.743 0.028 0.029 0.946 0.950 | 0.743 0.031 0.030 0.946 0.951
Δ 0.734 CC | - - - - - | 0.742 0.030 0.030 0.951 0.957 | 0.745 0.035 0.034 0.943 0.945

Prediction is based on true parameter values (True), as well as maximum likelihood estimates based on an expectation-maximization algorithm (EM) and complete-case analysis (CC); the “EM estimate” is the usual MLE obtained by fitting a standard multinomial regression model when there is no coarsening.

The results of the simulation study involving 1000 replicates are displayed in Table 1, where the mean estimate is reported under EST, along with the empirical standard error (ESE), the average bootstrap standard error from 500 bootstrap samples (ASE), the empirical coverage probability of confidence intervals constructed directly on the scale of the PDI (ECP), and the corresponding empirical coverage probability for confidence intervals constructed on the logit scale (ECP-logit). The proposed weighted estimating function yields estimators with low empirical bias in all settings, and this good performance is maintained at the higher degrees of coarsening. When the training sample involves coarsened data and an EM algorithm is used for estimation25, the empirical bias remains small, while the empirical standard error of the estimators increases with the degree of coarsening. The empirical bias of the estimator based on a complete case analysis is also very small, as one would expect with data coarsened at random, but its empirical standard error is greater. As expected, the PDI is larger with stronger covariate effects. Additional simulation studies with 500 and 2000 individuals in the training sample lead to similar conclusions; these are reported in Section S1.1 of the Supplemental Material.
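The two interval constructions compared in Table 1 can be sketched as follows. The paper does not give the exact logit-scale formula; here, as an assumption, the bootstrap standard deviation of the logit-transformed replicates is used on the logit scale and the limits are mapped back with the inverse logit, which keeps them inside (0, 1). The function name is hypothetical.

```python
import numpy as np


def pdi_intervals(estimate, boot, z=1.959963984540054):
    """Two nominal 95% intervals for a PDI estimate from bootstrap replicates.

    direct : estimate +/- z * SE, with SE the bootstrap standard deviation
    logit  : logit(estimate) +/- z * SD of logit(replicates), back-transformed
    """
    def logit(v):
        return np.log(v / (1.0 - v))

    def expit(v):
        return 1.0 / (1.0 + np.exp(-v))

    boot = np.asarray(boot, dtype=float)
    se = boot.std(ddof=1)
    direct = (estimate - z * se, estimate + z * se)
    se_logit = logit(boot).std(ddof=1)
    centre = logit(estimate)
    on_logit = (expit(centre - z * se_logit), expit(centre + z * se_logit))
    return direct, on_logit
```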

3. Prediction with Multistate Processes Under an Intermittent Observation Scheme

3.1. Notation and model formulation

We now consider a multistate disease process with $K + 1$ states labeled $0, 1, \dots, K$, where $K$ is an absorbing state. Let $Z(t)$ denote the state occupied at time $t$ and $\{Z(s), 0 < s\}$ the associated stochastic process. We consider a $q_2 \times 1$ covariate vector $X$ and let $H(t) = \{Z(s), 0 < s < t; X\}$ denote the history of the process at time $t$. The stochastic nature of the multistate process can be fully characterized via the transition intensities15 for all pairs of states, where

$$\lim_{\Delta t \downarrow 0} \frac{P(Z(t + \Delta t^-) = l \mid Z(t^-) = k, H(t^-))}{\Delta t} = \lambda_{kl}(t \mid H(t^-)), \qquad (14)$$

for $k, l = 0, 1, \dots, K$, $k \neq l$, where $t^-$ denotes an infinitesimal amount of time before $t$. For simplicity we assume that the same vector of covariates is used to model all transition intensities, and restrict attention to Markov processes for which covariates act multiplicatively on baseline transition intensities via

$$\lambda_{kl}(t \mid H(t^-)) = \lambda_{kl}(t) \exp(X'\beta_{kl}),$$

with $\beta_{kl} = (\beta_{kl1}, \dots, \beta_{klq_2})'$ a $q_2 \times 1$ vector of regression coefficients for $k \to l$ transitions17. If $\alpha_{kl}$ is a $q_1 \times 1$ parameter vector indexing $\lambda_{kl}(t)$ and $q = q_1 + q_2$, then the $k \to l$ transition intensity is indexed by the $q \times 1$ parameter vector $\theta_{kl} = (\alpha_{kl}', \beta_{kl}')'$, and $\theta$ is the full vector containing all $\theta_{kl}$ for $k \neq l$, $k, l = 0, 1, \dots, K$. We adopt a common dimension for the parameters indexing the different baseline intensities, but this is for notational convenience and can easily be relaxed.

In what follows we consider a four-state process, illustrated in Figure 1(a), with $0 \to 1$, $0 \to 2$, $1 \to 3$ and $2 \to 3$ transitions possible. The $4 \times 4$ transition probability matrix $P(s, t \mid x)$ can be computed by product integration, as discussed in Section 2.2 of Cook and Lawless15. We let $Q(t \mid x)$ denote the $4 \times 4$ matrix of cumulative transition intensities

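$$Q(t \mid x) = \begin{pmatrix} -\Lambda_{01}(t \mid x) - \Lambda_{02}(t \mid x) & \Lambda_{01}(t \mid x) & \Lambda_{02}(t \mid x) & 0 \\ 0 & -\Lambda_{13}(t \mid x) & 0 & \Lambda_{13}(t \mid x) \\ 0 & 0 & -\Lambda_{23}(t \mid x) & \Lambda_{23}(t \mid x) \\ 0 & 0 & 0 & 0 \end{pmatrix},$$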

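where $\Lambda_{kl}(t \mid x) = \int_0^t \lambda_{kl}(s \mid x)\, ds$. Then let $dQ(t \mid x)$ be a $4 \times 4$ matrix with $(k, l)$ entry $d\Lambda_{kl}(t \mid x) = \lambda_{kl}(t \mid x)\, dt$ if $k \neq l$ and $-d\Lambda_{k\cdot}(t \mid x)$ if $k = l$, with “$\cdot$” representing summation over the corresponding index. Then if $I$ is the $4 \times 4$ identity matrix, the transition probability matrix $P(s, t \mid x)$ with $(k, l)$ entry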

$$p_{kl}(s, t \mid x) = P(Z(t) = l \mid Z(s) = k, x),$$

is obtained by

$$P(s, t \mid x) = \prod_{u \in (s, t]} \{I + dQ(u \mid x)\}.$$
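The product integral can be approximated numerically by a finite product of $I + dQ$ over a fine partition of $(s, t]$. The sketch below does this for the four-state model of Figure 1(a); the `intensity` callable (returning $\lambda_{01}, \lambda_{02}, \lambda_{13}, \lambda_{23}$ at time $u$ for a fixed $x$), the midpoint evaluation and the number of steps are our own illustrative choices.

```python
import numpy as np

TRANSITIONS = [(0, 1), (0, 2), (1, 3), (2, 3)]  # allowed moves in Figure 1(a)


def rate_matrix(u, intensity):
    """4x4 matrix with lambda_kl(u|x) off the diagonal and -lambda_k.(u|x) on it."""
    a = np.zeros((4, 4))
    for (k, l), lam in zip(TRANSITIONS, intensity(u)):
        a[k, l] = lam
    a[np.diag_indices(4)] = -a.sum(axis=1)
    return a


def transition_matrix(s, t, intensity, n_steps=20_000):
    """Approximate P(s, t | x) = prod_(s,t] {I + dQ(u|x)} by a finite product,
    evaluating the intensities at the midpoint of each sub-interval."""
    grid = np.linspace(s, t, n_steps + 1)
    p = np.eye(4)
    for u0, u1 in zip(grid[:-1], grid[1:]):
        dq = rate_matrix(0.5 * (u0 + u1), intensity) * (u1 - u0)
        p = p @ (np.eye(4) + dq)
    return p
```

With constant intensities the result can be checked against closed forms, for example $p_{00}(0, t) = \exp\{-(\lambda_{01} + \lambda_{02})t\}$; each factor $I + dQ$ has unit row sums, so the approximation does too.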

We now consider a sample of $n$ individuals under intermittent observation, labeled $i = 1, \dots, n$, and let $0 = a_{i0} < a_{i1} < \cdots < a_{im_i}$ denote the visit times at which data are available for individual $i$. To formalize this observation process, we let $\tau_i$ denote a loss to follow-up time and $Y_i(t) = I(t \le \tau_i)$ indicate that individual $i$ is still on study. We also let $dA_i(s) = 1$ if individual $i$ has a visit at time $s$, with $dA_i(s) = 0$ otherwise. Let $A_i(t) = \int_0^t Y_i(s)\, dA_i(s)$ record the cumulative number of visits over $(0, t]$, and let $\{A_i(s), 0 < s\}$ denote the counting process for visits, which is terminated upon censoring. We let $Z_i(a_{i0}) = 0$ with probability 1 for $a_{i0} = 0$, and denote the observed process history by $\bar{H}_i(t) = \{Y_i(s), dA_i(s), 0 < s < t; (Z_i(a_{ir}), a_{ir}), r = 0, 1, \dots, A_i(t^-)\}$. We assume a conditionally independent and non-informative censoring time and that the visit process is conditionally independent in the sense of Cook and Lawless15,26; this is akin to the sequential missing at random assumption characterized by Hogan27 for longitudinal data with drop-outs. Then under a Markov model the observed data partial likelihood is

$$L(\theta) = \prod_{i=1}^{n} \prod_{r=1}^{m_i} P(Z_i(a_{ir}) \mid Z_i(a_{i,r-1}), X_i), \qquad (15)$$

and maximum partial likelihood estimates can be obtained by a Fisher-scoring algorithm28 or by direct maximization using the msm function29. This is only a brief discussion of likelihood construction for Markov processes under intermittent observation; predictive models can be constructed from this likelihood by simple model fitting if covariates are specified, or through penalization. Our primary interest, however, is in assessing the predictive accuracy of any particular predictive model using a validation sample, as we describe in the next section.
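For a time-homogeneous special case, where the intensities $\lambda_{kl}(x)$ are constant in time and $P(s, t \mid x) = \exp\{(t - s)R(x)\}$ with $R(x)$ the matrix of intensities (off-diagonal $\lambda_{kl}(x)$, diagonal $-\lambda_{k\cdot}(x)$), the log of (15) can be evaluated as sketched below with `scipy.linalg.expm`. The constant-intensity assumption, the notation $R(x)$ and the function names are ours for illustration; the baseline intensities in the paper need not be constant, and the msm software referenced above is not used here.

```python
import numpy as np
from scipy.linalg import expm

TRANSITIONS = [(0, 1), (0, 2), (1, 3), (2, 3)]  # allowed moves in Figure 1(a)


def generator(log_base, beta, x):
    """R(x): lambda_kl(x) = exp(log_base_kl + x' beta_kl) off the diagonal
    (time-homogeneous special case) and minus the row sums on the diagonal."""
    r = np.zeros((4, 4))
    for m, (k, l) in enumerate(TRANSITIONS):
        r[k, l] = np.exp(log_base[m] + x @ beta[m])
    r[np.diag_indices(4)] = -r.sum(axis=1)
    return r


def panel_loglik(log_base, beta, subjects):
    """Log of the partial likelihood (15).

    subjects : list of (x, times, states) with times[0] = 0 and states[0] = 0
    beta     : (4, q2) array, one row per transition in TRANSITIONS
    """
    total = 0.0
    for x, times, states in subjects:
        r = generator(log_base, beta, np.asarray(x, dtype=float))
        for j in range(1, len(times)):
            p = expm(r * (times[j] - times[j - 1]))  # P(a_{j-1}, a_j | x)
            total += np.log(p[states[j - 1], states[j]])
    return total
```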

3.2. Estimating predictive accuracy with coarsened multistate data

We now consider the problem in which interest lies in predicting the state occupied by an individual at a time horizon $t^\circ > 0$. Consider a 4-tuple $\mathbf{i} = (i_0, \ldots, i_3)$ in which individual $i_j$ is in state $j$ at $t^\circ$, and let $\pi_{i_j k}(t^\circ) = p_{0k}(0, t^\circ \mid X_{i_j})$ be the conditional probability that individual $i_j$, given their covariate vector, is in state $k$ at $t^\circ$. We then define

$$A_k(\mathbf{i}; t^\circ) = I\big(\pi_{i_k k}(t^\circ) > \pi_{i_j k}(t^\circ),\ j \neq k,\ j = 0, \ldots, 3\big) \qquad (16)$$

and define the polytomous discrimination index for category $k$ at $t^\circ$ as $\Delta_k(t^\circ) = E\{A_k(\mathbf{i}; t^\circ)\}$, where again the expectation is taken over the distributions of the covariate vectors for the members of the 4-tuple.

Since the continuous-time multistate disease process is under intermittent observation, the state occupied at $t^\circ$ will be unknown for individuals who were censored in a transient state prior to $t^\circ$ and for those whose recorded states at the visits immediately before and after $t^\circ$ differ. The observed data for individual $i$ are denoted by $\bar{H}_i(\infty)$, with the key elements being $D_i = \{(Z_i(a_{ir}), a_{ir}), r = A_i(t^\circ), A_i(t^\circ) + 1; X_i\}$ under the assumptions of Section 3.1. If $A_i(t^\circ) = m_i$, we let $a_{i,m_i+1} = \infty$ and $Z_i(a_{i,m_i+1}) = 3$. As in Section 2.2, we use a subscript to label individuals according to the state they are in at $t^\circ$, with the superscript $p$ indicating that these may be pseudo-individuals, conceptualized to represent all possible states occupied by an individual whose true state is unknown.
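The states an individual could occupy at $t^\circ$ follow from the states recorded at the visits bracketing $t^\circ$ and the allowed transitions of the model in Figure 1(a). The sketch below is a hypothetical helper, not part of the authors' software, that encodes the reachable states of the $0 \to \{1, 2\} \to 3$ model; treating loss to follow-up as $Z_i(a_{i,m_i+1}) = 3$ at $a_{i,m_i+1} = \infty$ gives the same set as passing `z_after=None`. The resulting sets are of the form tabulated in Table 3.

```python
# States reachable from each state (including staying put) in 0 -> {1, 2} -> 3.
REACHABLE = {0: {0, 1, 2, 3}, 1: {1, 3}, 2: {2, 3}, 3: {3}}


def coarsening_set(z_before, z_after=None):
    """Possible states at the horizon given the states at the bracketing visits.

    z_before: state at the last visit before the horizon.
    z_after:  state at the first visit after the horizon, or None if the
              individual was lost to follow-up before any such visit.
    """
    possible = REACHABLE[z_before]
    if z_after is None:
        return set(possible)
    # keep states j that can be entered from z_before and can still lead to z_after
    return {j for j in possible if z_after in REACHABLE[j]}


print(coarsening_set(0, 1), coarsening_set(2, 3), coarsening_set(0))
```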

The weight $w_{ij}(t^\circ) = P(Z_i(t^\circ) = j \mid D_i)$ under the multistate process is then given by

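$$w_{ij}(t^\circ) = \frac{P(Z_i(t^\circ) = j \mid Z_i(a_{i A_i(t^\circ)}), X_i)\; P(Z_i(a_{i, A_i(t^\circ)+1}) \mid Z_i(t^\circ) = j, X_i)}{P(Z_i(a_{i, A_i(t^\circ)+1}) \mid Z_i(a_{i A_i(t^\circ)}), X_i)}. \qquad (17)$$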
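Equation (17) is a ratio of transition probabilities, and by the Chapman-Kolmogorov equations its denominator equals the sum of its numerator over $j$. The sketch below computes the weights under the additional assumption of a time-homogeneous intensity matrix, so each transition matrix is a matrix exponential; the function name and example intensities are hypothetical. When there is no visit after $t^\circ$ it applies the convention $Z_i(\infty) = 3$, which makes the second factor equal to 1 whenever state 3 is eventually reached from every state.

```python
import numpy as np


def expm(A, squarings=10, n_terms=20):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    M = A / 2.0**squarings
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, n_terms):
        term = term @ M / k
        E = E + term
    for _ in range(squarings):
        E = E @ E
    return E


def state_weights(Q, a_before, z_before, t_o, a_after=None, z_after=None):
    """Weights w_j = P(Z(t_o) = j | D) from (17) for a time-homogeneous Q.

    a_before, z_before: time and state of the last visit before t_o.
    a_after, z_after:   time and state of the first visit after t_o, or None if
                        the individual was lost to follow-up before such a visit.
    """
    before = expm(Q * (t_o - a_before))[z_before]    # P(Z(t_o) = j | Z(a_before))
    if a_after is None:
        # Convention Z(inf) = 3: the second factor of (17) equals 1 for every j.
        return before
    after = expm(Q * (a_after - t_o))[:, z_after]    # P(Z(a_after) = z_after | Z(t_o) = j)
    joint = before * after
    return joint / joint.sum()                       # sum over j = denominator of (17)


# Hypothetical intensities for 0 -> {1, 2} -> 3 with lam13 = lam23 = 2 * lam01.
Q = np.zeros((4, 4))
Q[0, 1] = Q[0, 2] = 0.2
Q[1, 3] = Q[2, 3] = 0.4
np.fill_diagonal(Q, -Q.sum(axis=1))
print(state_weights(Q, 0.5, 0, 1.0, 1.5, 1))  # seen in state 0 at 0.5, state 1 at 1.5
```

In the example call only states 0 and 1 are possible at $t^\circ = 1$, and with these intensities the two receive equal weight.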

We use $S_k^p(t^\circ)$ to denote the set of pseudo-individuals who may occupy state $k$ at $t^\circ$, $k = 0, 1, 2, 3$. Moreover, we let $\mathbf{i}^p = (i_0^p, i_1^p, i_2^p, i_3^p)$ represent a 4-tuple of individuals or pseudo-individuals where $i_j^p \in S_j^p(t^\circ)$, and let $T^p(t^\circ) = \{\mathbf{i}^p : i_k^p \in S_k^p(t^\circ), k = 0, 1, 2, 3\}$ denote the set of all possible 4-tuples at $t^\circ$. We let $\pi_{i_k^p k}(t^\circ) = p_{0k}(0, t^\circ \mid X_{i_k^p})$ and define

$$A_k(\mathbf{i}^p; t^\circ) = I\big(\pi_{i_k^p k}(t^\circ) > \pi_{i_j^p k}(t^\circ),\ j \neq k,\ j = 0, \ldots, 3\big) \qquad (18)$$

as the indicator that the (pseudo-)individual from class $k$ in the 4-tuple has the highest predicted probability of being in class $k$. We then define the weighted estimating function for $\Delta_k(t^\circ)$ as

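$$\bar{U}_k(\Delta_k(t^\circ); t^\circ) = \sum_{\mathbf{i}^p \in T^p(t^\circ)} D(\mathbf{i}^p; t^\circ)\, w(\mathbf{i}^p; t^\circ)\, \{A_k(\mathbf{i}^p; t^\circ) - \Delta_k(t^\circ)\}, \qquad (19)$$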

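where $D(\mathbf{i}^p; t^\circ) = I(\mathbf{i}^p \in T^p(t^\circ), i_j^p \neq i_k^p, j \neq k, j, k = 0, 1, 2, 3)$ indicates that the 4-tuple comprises distinct individuals and $w(\mathbf{i}^p; t^\circ) = \prod_{k=0}^{3} w_{i_k^p k}(t^\circ)$. The estimate $\hat{\Delta}_k(t^\circ)$ is the solution to $\bar{U}_k(\Delta_k(t^\circ); t^\circ) = 0$; since (19) is linear in $\Delta_k(t^\circ)$, this is the weighted proportion

$$\hat{\Delta}_k(t^\circ) = \frac{\sum_{\mathbf{i}^p \in T^p(t^\circ)} D(\mathbf{i}^p; t^\circ)\, w(\mathbf{i}^p; t^\circ)\, A_k(\mathbf{i}^p; t^\circ)}{\sum_{\mathbf{i}^p \in T^p(t^\circ)} D(\mathbf{i}^p; t^\circ)\, w(\mathbf{i}^p; t^\circ)}.$$

An overall PDI, denoted by $\hat{\Delta}(t^\circ)$, can be estimated by averaging the $K + 1$ category-specific PDI values obtained at $t^\circ$.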
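Because the estimating function (19) is linear in $\Delta_k(t^\circ)$, the estimate is a weighted proportion of tuples in which the class-$k$ member ranks highest. The brute-force Python sketch below enumerates every tuple, so it is only feasible for very small samples and is not the authors' implementation. It handles $C$ classes generally (for example, $C = 3$ when the two unilateral damage states are collapsed) and, as in the text, ignores ties. Each row of the hypothetical weight matrix `W` holds an individual's weights $w_{ik}(t^\circ)$, so fully observed individuals have one-hot rows.

```python
import itertools

import numpy as np


def weighted_pdi(probs, W):
    """Brute-force weighted PDI: the solution of the estimating equation (19).

    probs: (n, C) predicted probabilities pi_{ik}(t) for classes k = 0, ..., C-1.
    W:     (n, C) weights w_{ik}(t); W[i, k] > 0 means individual i is a
           (pseudo-)individual for class k.
    Returns the class-specific estimates and their average (overall PDI).
    """
    n, C = probs.shape
    members = [np.flatnonzero(W[:, k] > 0) for k in range(C)]  # S_k^p
    num = np.zeros(C)
    den = 0.0
    for tup in itertools.product(*members):
        if len(set(tup)) < C:              # D(.) = 0: members must be distinct
            continue
        w = np.prod([W[tup[k], k] for k in range(C)])  # product of member weights
        den += w
        for k in range(C):
            pk = probs[list(tup), k]       # class-k probabilities of tuple members
            if all(pk[k] > pk[j] for j in range(C) if j != k):
                num[k] += w                # A_k = 1: class-k member ranks highest
    delta = num / den
    return delta, delta.mean()


probs = np.array([[0.80, 0.10, 0.10],
                  [0.10, 0.80, 0.10],
                  [0.10, 0.10, 0.80],
                  [0.05, 0.90, 0.05]])
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.5, 0.5, 0.0]])            # individual 3: class 0 or 1, unknown
delta, overall = weighted_pdi(probs, W)
print(delta, overall)
```

In this toy example individual 3 is a pseudo-individual for classes 0 and 1 with weight 0.5 each, and the estimates are $(0.75, 0.75, 1.0)$ with overall PDI $2.5/3$.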

3.3. Simulation Studies Involving Multistate Processes

To mimic the data from the motivating study, we specify the parameters as follows. The transition intensities satisfy $\lambda_{01} = \lambda_{02}$, so that the baseline intensity for the onset of unilateral damage is the same for the left and right SI joints, and $\lambda_{13} = \lambda_{23} = 2\lambda_{01}$, so that the baseline intensity for the onset of axial disease is twice that for the onset of unilateral damage. We consider covariates $X = (X_1, X_2)'$ with $X \sim BVN(\mu, \Sigma)$, a bivariate normal distribution with $\mu = (\mu_1, \mu_2)'$ and $\Sigma$ specified as in (13), with $\mu_k = 0$, $\sigma_k^2 = 1$, $k = 1, 2$, and $\rho = 0.5$. We set $P(Z(t^\circ) = 3 \mid Z(0) = 0, X = 0) = 0.5$ at $t^\circ = 1$, so the prevalence of axial disease at $t^\circ$ is 0.5 among individuals with $X_1 = X_2 = 0$. Further, we set $\beta_{01} = \beta_{02}$ so that the covariates have the same effect on the onset of unilateral damage on either side, and $\beta_{13} = \beta_{23}$ so that the covariates have the same effect on the onset of axial disease among individuals with unilateral damage. Specifically, we let $\beta_{01} = \beta_{02} = (\log 1.5, \log 2.0)'$ and β13 = β23 = 01, where R = 0.25, 0.5, 1.0, and 2.0. Given the covariates and transition intensities, we then generate the multistate data $\{Z(s), 0 < s \le 2\}$ given $x$.
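The calibration $P(Z(1) = 3 \mid Z(0) = 0, X = 0) = 0.5$ pins down $\lambda_{01}$. Assuming time-homogeneous baseline intensities (an assumption of this sketch), the entry time to state 3 when $X = 0$ is the sum of two independent exponential sojourns, each with rate $2\lambda_{01}$, so $P(Z(t) = 3 \mid Z(0) = 0, X = 0) = 1 - e^{-2\lambda_{01}t}(1 + 2\lambda_{01}t)$, which can be solved numerically. A minimal Python sketch, with hypothetical function names:

```python
import math


def prob_axial_by(t, lam01):
    """P(Z(t) = 3 | Z(0) = 0, X = 0) when lam02 = lam01 and lam13 = lam23 = 2 * lam01.

    The exit rate from state 0 is lam01 + lam02 = 2 * lam01 and the exit rate from
    state 1 or 2 is also 2 * lam01, so the entry time to state 3 is
    Gamma(shape 2, rate 2 * lam01).
    """
    u = 2.0 * lam01 * t
    return 1.0 - math.exp(-u) * (1.0 + u)


def calibrate_lam01(target=0.5, t=1.0, lo=1e-6, hi=50.0, tol=1e-12):
    """Bisection for lam01 such that prob_axial_by(t, lam01) equals target."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_axial_by(t, mid) < target:  # probability increases with lam01
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lam01 = calibrate_lam01()
print(round(lam01, 4))
```

Under this assumption the solution is $\lambda_{01} \approx 0.84$, with $\lambda_{13} = \lambda_{23} \approx 1.68$.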

For the visit process we consider a follow-up period of 2 units and set the time horizon for prediction to $t^\circ = 1$. Visit times follow a time-homogeneous Poisson process with rate $\rho = 5$ or 10, giving $A(2) \sim$ Poisson with mean 10 or 20, with $a_{i1} < \cdots < a_{im_i}$ the realized visit times. The observed data are thus $\{(a_{ir}, Z_i(a_{ir})), r = 0, 1, \ldots, m_i; X_i\}$, $i = 1, \ldots, n$. In line with our motivating application, we are interested in the accuracy of predictions of unilateral damage (state 1 or 2, regardless of side) and of axial disease (state 3), as well as of no damage (state 0). This results in three class-specific estimates of the PDI at $t^\circ$. Although the unilateral damage states are collapsed when estimating the PDI, the parameters of the multistate process are estimated under the general four-state model to allow flexibility.
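The data-generating steps for one individual can be sketched as follows, assuming time-homogeneous intensities. The axial-disease coefficient vector $\beta_{13} = \beta_{23}$, which depends on R, is a user-supplied argument (the hypothetical name `beta_axial`), and the values in the example call are purely illustrative. The visit process is a homogeneous Poisson process: the number of visits on $(0, 2]$ is Poisson with mean $2\rho$ and, given that number, the visit times are independent uniforms.

```python
import numpy as np


def simulate_panel(x, lam01, beta01, beta_axial, rho=5.0, horizon=2.0, rng=None):
    """Simulate one individual's visit times and observed states (sketch).

    Assumes lam02 = lam01, lam13 = lam23 = 2 * lam01, beta02 = beta01 and
    beta13 = beta23 = beta_axial, all intensities constant in time.
    """
    rng = np.random.default_rng() if rng is None else rng
    r0 = lam01 * np.exp(x @ beta01)             # intensity of 0 -> 1 and of 0 -> 2
    r1 = 2 * lam01 * np.exp(x @ beta_axial)     # intensity of 1 -> 3 and of 2 -> 3
    t_uni = rng.exponential(1 / (2 * r0))       # sojourn in state 0 (exit rate 2 * r0)
    side = int(rng.choice([1, 2]))              # equal intensities: each side w.p. 1/2
    t_axial = t_uni + rng.exponential(1 / r1)   # sojourn in the unilateral state

    def state(t):
        """Right-continuous path Z(t)."""
        if t < t_uni:
            return 0
        return side if t < t_axial else 3

    m = rng.poisson(rho * horizon)              # number of visits on (0, horizon]
    visits = np.sort(rng.uniform(0.0, horizon, size=m))
    times = np.concatenate(([0.0], visits))     # a_0 = 0 with Z(a_0) = 0
    states = np.array([state(t) for t in times])
    return times, states


rng = np.random.default_rng(2022)
times, states = simulate_panel(np.array([0.3, -1.2]), lam01=0.84,
                               beta01=np.log([1.5, 2.0]),
                               beta_axial=np.array([0.5, 0.5]), rng=rng)
print(len(times) - 1, states)
```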

As in the multinomial setting, we considered two approaches for estimating $\Delta_k(t^\circ)$. The first treats the full parameter vector $\theta$ as fixed at its true value and evaluates the PDI directly in a validation sample of 500 individuals. The second estimates $\theta$ from an independent training sample of 1000 individuals by maximizing the log-likelihood based on (15), and then evaluates the PDI in a validation sample of 500 individuals. To visualize the settings with R = 0.25, 0.5, 1 and 2, Figure 2 displays empirical contour plots of the joint density of the covariates conditional on class membership; these are based on a simulated sample of 10,000 individuals and indicate how well the covariate distributions of the three classes are separated. Results from 1000 simulation runs are reported in Table 2. As in the multinomial setting, the column 'Value' gives the true PDI at $t^\circ$ computed by Monte Carlo. As in Section 2, we report the mean estimate (EST), the empirical standard error (ESE) and the average bootstrap standard error based on 500 bootstrap samples (ASE). Under each visit frequency, the first ECP column gives the empirical coverage probability of nominal 95% confidence intervals computed on the scale of $\Delta_k$, and the second gives the coverage of intervals constructed on the logit scale. The proposed weighted estimating function yields estimates with small empirical bias, and the empirical standard errors of the estimators of $\Delta_k(t^\circ)$ and $\Delta(t^\circ)$ are only modestly affected by the estimation of $\theta$. Increasing the frequency of visits also had only a modest effect on the empirical bias and standard error.
As R increases from 0.25 to 2, giving a stronger covariate effect on the transition from the unilateral damage state to the axial disease state, $\Delta_2(t^\circ)$ increases markedly (from 0.651 to 0.871), in accordance with the empirical contour plots in Figure 2. Additional simulation studies with training samples of size n = 500 and 2000 lead to similar conclusions; these results are in Section S1.2 of the Supplemental Material.
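The two coverage summaries differ only in how each interval is formed. One common construction, assumed here because the formulas are not given above, is a Wald interval $\hat{\Delta} \pm 1.96\,\widehat{\mathrm{se}}$ on the probability scale and, for the logit version, a Wald interval for $\mathrm{logit}(\hat{\Delta})$ with delta-method standard error $\widehat{\mathrm{se}}/\{\hat{\Delta}(1 - \hat{\Delta})\}$, back-transformed so it always lies in $(0, 1)$. The ASE would play the role of $\widehat{\mathrm{se}}$, and ECP is the proportion of simulation runs whose interval covers the true value. The function names below are hypothetical.

```python
import math


def wald_ci(est, se, z=1.959964):
    """Normal-approximation interval on the probability (PDI) scale."""
    return est - z * se, est + z * se


def logit_ci(est, se, z=1.959964):
    """Wald interval for logit(est), back-transformed to (0, 1).

    Delta method: se{logit(est)} is approximately se / {est * (1 - est)}.
    """
    def expit(v):
        return 1.0 / (1.0 + math.exp(-v))

    eta = math.log(est / (1.0 - est))
    se_eta = se / (est * (1.0 - est))
    return expit(eta - z * se_eta), expit(eta + z * se_eta)


def covers(ci, truth):
    """Indicator that an interval contains the true PDI; ECP averages this over runs."""
    return ci[0] <= truth <= ci[1]


print(wald_ci(0.95, 0.04), logit_ci(0.95, 0.04))
```

Near the boundary the difference matters: for $\hat{\Delta} = 0.95$ with standard error 0.04 the Wald upper limit exceeds 1, while the logit-based interval stays inside $(0, 1)$.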

FIGURE 2.


Contour plots of the empirical covariate distributions by class membership for the four simulation scenarios under the multistate process, based on random samples of 10,000 individuals.

TABLE 2.

Finite-sample properties of estimates of Δk(t°), k = 0, 1, 2, and Δ(t°) with t° = 1, for different values of R, with an average of 10 or 20 visits over the period (0, 2]. The ASE is approximated using 500 bootstrap samples drawn with replacement within each simulated dataset. Under each visit frequency, the first ECP column is the empirical coverage probability of nominal 95% confidence intervals based on the normal approximation with ASE as the standard error, and the second ECP column is the corresponding coverage when the intervals are constructed using the logit transformation. Training samples have 1000 individuals, validation samples have 500 individuals, and nsim = 1000.

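In all scenarios β01 = β02 = (log 1.5, log 2.0)′ and β13 = β23 = 01; (10) and (20) in the column headers denote E(M) = 10 and E(M) = 20.

| R | Parameter | Value | Method | EST (10) | ESE (10) | ASE (10) | ECP (10) | ECP-logit (10) | EST (20) | ESE (20) | ASE (20) | ECP (20) | ECP-logit (20) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.25 | Δ0(t°) | 0.708 | True | 0.709 | 0.029 | 0.030 | 0.962 | 0.957 | 0.709 | 0.029 | 0.030 | 0.958 | 0.959 |
| 0.25 | Δ0(t°) | 0.708 | MLE | 0.708 | 0.029 | 0.031 | 0.952 | 0.950 | 0.706 | 0.029 | 0.029 | 0.951 | 0.959 |
| 0.25 | Δ1(t°) | 0.416 | True | 0.428 | 0.029 | 0.028 | 0.959 | 0.957 | 0.429 | 0.031 | 0.030 | 0.940 | 0.943 |
| 0.25 | Δ1(t°) | 0.416 | MLE | 0.419 | 0.031 | 0.030 | 0.956 | 0.954 | 0.418 | 0.030 | 0.031 | 0.935 | 0.935 |
| 0.25 | Δ2(t°) | 0.651 | True | 0.637 | 0.026 | 0.026 | 0.941 | 0.943 | 0.639 | 0.027 | 0.028 | 0.962 | 0.958 |
| 0.25 | Δ2(t°) | 0.651 | MLE | 0.638 | 0.027 | 0.028 | 0.942 | 0.940 | 0.637 | 0.027 | 0.027 | 0.946 | 0.951 |
| 0.25 | Δ(t°) | 0.592 | True | 0.591 | 0.021 | 0.020 | 0.942 | 0.949 | 0.592 | 0.021 | 0.020 | 0.942 | 0.943 |
| 0.25 | Δ(t°) | 0.592 | MLE | 0.588 | 0.020 | 0.018 | 0.932 | 0.941 | 0.587 | 0.020 | 0.020 | 0.940 | 0.945 |
| 0.5 | Δ0(t°) | 0.697 | True | 0.690 | 0.030 | 0.031 | 0.952 | 0.946 | 0.689 | 0.030 | 0.031 | 0.953 | 0.959 |
| 0.5 | Δ0(t°) | 0.697 | MLE | 0.690 | 0.030 | 0.029 | 0.938 | 0.944 | 0.691 | 0.030 | 0.031 | 0.940 | 0.958 |
| 0.5 | Δ1(t°) | 0.435 | True | 0.431 | 0.032 | 0.031 | 0.940 | 0.954 | 0.432 | 0.032 | 0.032 | 0.957 | 0.955 |
| 0.5 | Δ1(t°) | 0.435 | MLE | 0.427 | 0.031 | 0.030 | 0.942 | 0.939 | 0.428 | 0.031 | 0.032 | 0.962 | 0.945 |
| 0.5 | Δ2(t°) | 0.702 | True | 0.703 | 0.025 | 0.026 | 0.956 | 0.957 | 0.702 | 0.026 | 0.026 | 0.941 | 0.954 |
| 0.5 | Δ2(t°) | 0.702 | MLE | 0.702 | 0.024 | 0.025 | 0.964 | 0.954 | 0.701 | 0.023 | 0.026 | 0.951 | 0.957 |
| 0.5 | Δ(t°) | 0.613 | True | 0.608 | 0.021 | 0.019 | 0.954 | 0.943 | 0.608 | 0.021 | 0.020 | 0.950 | 0.947 |
| 0.5 | Δ(t°) | 0.613 | MLE | 0.606 | 0.020 | 0.019 | 0.958 | 0.950 | 0.607 | 0.020 | 0.021 | 0.960 | 0.958 |
| 1 | Δ0(t°) | 0.674 | True | 0.659 | 0.032 | 0.031 | 0.942 | 0.943 | 0.659 | 0.032 | 0.033 | 0.952 | 0.951 |
| 1 | Δ0(t°) | 0.674 | MLE | 0.659 | 0.032 | 0.032 | 0.961 | 0.950 | 0.660 | 0.031 | 0.032 | 0.963 | 0.951 |
| 1 | Δ1(t°) | 0.461 | True | 0.456 | 0.033 | 0.032 | 0.956 | 0.958 | 0.457 | 0.031 | 0.032 | 0.954 | 0.957 |
| 1 | Δ1(t°) | 0.461 | MLE | 0.451 | 0.034 | 0.033 | 0.935 | 0.948 | 0.453 | 0.032 | 0.033 | 0.930 | 0.938 |
| 1 | Δ2(t°) | 0.794 | True | 0.789 | 0.021 | 0.022 | 0.949 | 0.941 | 0.789 | 0.022 | 0.022 | 0.958 | 0.944 |
| 1 | Δ2(t°) | 0.794 | MLE | 0.784 | 0.022 | 0.023 | 0.952 | 0.947 | 0.787 | 0.021 | 0.022 | 0.937 | 0.941 |
| 1 | Δ(t°) | 0.643 | True | 0.635 | 0.021 | 0.019 | 0.939 | 0.945 | 0.635 | 0.019 | 0.020 | 0.941 | 0.945 |
| 1 | Δ(t°) | 0.643 | MLE | 0.631 | 0.021 | 0.022 | 0.946 | 0.955 | 0.633 | 0.019 | 0.019 | 0.932 | 0.947 |
| 2 | Δ0(t°) | 0.640 | True | 0.624 | 0.034 | 0.033 | 0.942 | 0.944 | 0.622 | 0.032 | 0.033 | 0.949 | 0.953 |
| 2 | Δ0(t°) | 0.640 | MLE | 0.622 | 0.033 | 0.032 | 0.934 | 0.941 | 0.622 | 0.033 | 0.032 | 0.926 | 0.941 |
| 2 | Δ1(t°) | 0.513 | True | 0.513 | 0.034 | 0.033 | 0.947 | 0.958 | 0.514 | 0.031 | 0.032 | 0.939 | 0.947 |
| 2 | Δ1(t°) | 0.513 | MLE | 0.503 | 0.034 | 0.033 | 0.956 | 0.960 | 0.503 | 0.033 | 0.032 | 0.953 | 0.950 |
| 2 | Δ2(t°) | 0.871 | True | 0.869 | 0.017 | 0.017 | 0.953 | 0.953 | 0.870 | 0.017 | 0.017 | 0.949 | 0.942 |
| 2 | Δ2(t°) | 0.871 | MLE | 0.868 | 0.017 | 0.018 | 0.941 | 0.944 | 0.868 | 0.016 | 0.017 | 0.949 | 0.951 |
| 2 | Δ(t°) | 0.670 | True | 0.669 | 0.019 | 0.020 | 0.940 | 0.945 | 0.669 | 0.018 | 0.019 | 0.950 | 0.954 |
| 2 | Δ(t°) | 0.670 | MLE | 0.665 | 0.018 | 0.020 | 0.956 | 0.952 | 0.664 | 0.017 | 0.018 | 0.951 | 0.945 |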

Prediction is based on true parameter values (True) and maximum likelihood estimates (MLE).

4. Prediction of Sacroiliac Joint Damage in Psoriatic Arthritis

Here we revisit the problem of predicting sacroiliac damage in patients with psoriatic arthritis (PsA) using data from the University of Toronto Psoriatic Arthritis Clinic, where interest lies in predicting whether an individual is in a certain state at a pre-specified time horizon; see Section 1.2. This multistate disease process is depicted in Figure 1(a), and we aim to assess the accuracy of predictions of unilateral sacroiliac damage (state 1 or 2) and axial disease (state 3). We restrict attention to individuals who had no sacroiliac damage at clinic entry (state 0), giving a sample of 953 individuals. Table 3 gives the distribution of the coarsening sets at time horizons $t^\circ$ = 5, 10, 15 and 20 years. The degree of coarsening due to intermittent observation increases with the time horizon: the proportion of individuals whose state is completely unknown, (0, 1, 2, 3), rises from 16.79% at $t^\circ$ = 5 years to 44.70% at $t^\circ$ = 20 years. The baseline covariates used in this analysis included age at PsA diagnosis, gender, and several human leukocyte antigen markers (HLA-A2, HLA-A11, HLA-B38, HLA-C12, HLA-DR8, HLA-DR14, HLA-DQ2, HLA-DQ3 and HLA-DQ5) that have been reported as important risk factors in previous work in this area. We considered four multistate models. Model 1 has time-homogeneous transition intensities and four distinct sets of regression coefficients; Model 2 has time-homogeneous transition intensities with the regression coefficients constrained to be equal for the 0 → 1 and 0 → 2 transitions, and for the 1 → 3 and 2 → 3 transitions. Models 3 and 4 are analogous to Models 1 and 2, respectively, but with piecewise-constant baseline intensities (four pieces) with cut-points at t = 8, 16 and 24 years. The estimated relative risks (RR) for these covariates and their 95% confidence intervals are presented in Table 4 for all four models.
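For Models 3 and 4 the baseline intensities are constant between the cut-points 8, 16 and 24 years, so the product integral of Section 3.1 factors into one matrix exponential per piece, multiplied in time order. A Python sketch of $P(0, t \mid x)$ under this structure follows; the helper names and example intensities are hypothetical, covariate effects are assumed to be already folded into each piece's intensity matrix, and the Taylor-based matrix exponential is for illustration only.

```python
import numpy as np


def expm(A, squarings=10, n_terms=20):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    M = A / 2.0**squarings
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, n_terms):
        term = term @ M / k
        E = E + term
    for _ in range(squarings):
        E = E @ E
    return E


def piecewise_transition_matrix(Q_pieces, cuts, t):
    """P(0, t) when intensities are constant on [0, c1), [c1, c2), ..., [cJ, inf)."""
    edges = [0.0] + list(cuts) + [np.inf]
    P = np.eye(Q_pieces[0].shape[0])
    for j, Q in enumerate(Q_pieces):
        start, end = edges[j], min(edges[j + 1], t)
        if end <= start:                     # horizon reached before this piece
            break
        P = P @ expm(Q * (end - start))      # exact product integral on one piece
    return P


# Hypothetical intensity matrices for the 0 -> {1, 2} -> 3 model.
Qa = np.zeros((4, 4))
Qa[0, 1] = Qa[0, 2] = 0.02
Qa[1, 3] = Qa[2, 3] = 0.04
np.fill_diagonal(Qa, -Qa.sum(axis=1))
Qb = 2 * Qa
P20 = piecewise_transition_matrix([Qa, Qb, Qa, Qb], [8, 16, 24], 20.0)
print(np.round(P20[0], 4))   # state occupancy probabilities at t = 20 from state 0
```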

TABLE 3.

Distribution of coarsened state occupancy data at time horizons to = 5, 10, 15 and 20 years from clinic entry.

| Coarsening set Ci | t° = 5 years | t° = 10 years | t° = 15 years | t° = 20 years |
|---|---|---|---|---|
| 0 | 611 (64.11%) | 407 (42.71%) | 251 (26.33%) | 161 (16.89%) |
| 1 | 7 (0.73%) | 7 (0.73%) | 6 (0.63%) | 2 (0.21%) |
| 2 | 12 (1.26%) | 10 (1.05%) | 12 (1.26%) | 10 (1.05%) |
| 3 | 117 (12.28%) | 188 (19.72%) | 253 (26.55%) | 286 (30.01%) |
| (0, 1) | 6 (0.63%) | 5 (0.53%) | 1 (0.11%) | 0 (0%) |
| (0, 2) | 12 (1.26%) | 6 (0.63%) | 6 (0.63%) | 3 (0.31%) |
| (1, 3) | 11 (1.15%) | 12 (1.26%) | 18 (1.89%) | 23 (2.41%) |
| (2, 3) | 17 (1.78%) | 32 (3.36%) | 35 (3.67%) | 42 (4.41%) |
| (0, 1, 2, 3) | 160 (16.79%) | 286 (30.01%) | 371 (38.93%) | 426 (44.70%) |

TABLE 4.

Estimates of regression coefficients, expressed as relative risks (RR) with 95% confidence intervals (CI), from fitting Models 1 to 4 to data from the University of Toronto Psoriatic Arthritis Cohort; cut-points are at 8, 16 and 24 years from disease onset.

Columns 0 → 1, 0 → 2, 1 → 3 and 2 → 3 are from Models 1 and 3; columns 0 → 1/2 and 1/2 → 3 are from Models 2 and 4. Entries are RR (95% CI).

| Covariate | 0 → 1 | 0 → 2 | 1 → 3 | 2 → 3 | 0 → 1/2 | 1/2 → 3 |
|---|---|---|---|---|---|---|
| Time-homogeneous intensity (Models 1 and 2) | | | | | | |
| gender (male vs. female) | 1.96 (0.84, 4.56) | 1.33 (0.75, 2.35) | 1.30 (0.33, 5.14) | 1.62 (0.74, 3.58) | 1.52 (1.05, 2.22) | 1.34 (0.85, 2.11) |
| age (years) | 1.00 (0.97, 1.03) | 0.96 (0.94, 0.99) | 1.08 (1.02, 1.14) | 0.98 (0.95, 1.00) | 0.97 (0.96, 0.99) | 0.99 (0.97, 1.01) |
| HLA-A2 | 3.15 (1.24, 8.05) | 1.58 (0.87, 2.85) | 4.77 (0.89, 28.32) | 0.79 (0.43, 1.44) | 1.99 (1.33, 2.97) | 1.34 (0.84, 2.14) |
| HLA-A11 | 1.73 (0.67, 4.41) | 0.60 (0.19, 1.92) | 11.51 (2.06, 64.34) | 1.33 (0.47, 3.74) | 0.99 (0.55, 1.80) | 1.94 (1.00, 3.77) |
| HLA-B38 | 1.04 (0.30, 3.60) | 1.11 (0.42, 2.95) | 2.25 (0.23, 22.18) | 2.52 (0.84, 7.59) | 1.05 (0.53, 2.07) | 2.52 (1.09, 5.83) |
| HLA-C12 | 1.05 (0.38, 2.96) | 1.55 (0.72, 3.32) | 2.46 (0.32, 19.07) | 1.52 (0.66, 3.48) | 1.35 (0.76, 2.34) | 1.56 (0.85, 2.86) |
| HLA-DR8 | 0.45 (0.09, 2.12) | 1.66 (0.63, 4.37) | 1.30 (0.09, 18.64) | 0.75 (0.24, 2.37) | 1.05 (0.48, 2.30) | 0.55 (0.22, 1.38) |
| HLA-DR14 | 1.05 (0.30, 3.68) | 1.25 (0.43, 3.64) | 0.19 (0.02, 2.15) | 0.58 (0.17, 1.99) | 1.17 (0.56, 2.42) | 0.37 (0.15, 0.89) |
| HLA-DQ2 | 0.15 (0.05, 0.46) | 1.27 (0.70, 2.32) | 0.17 (0.04, 0.73) | 1.18 (0.57, 2.46) | 0.65 (0.42, 1.00) | 0.71 (0.45, 1.14) |
| HLA-DQ3 | 0.58 (0.26, 1.29) | 1.31 (0.72, 2.38) | 0.13 (0.03, 0.57) | 1.83 (0.92, 3.63) | 0.94 (0.62, 1.42) | 1.24 (0.76, 2.02) |
| HLA-DQ5 | 0.54 (0.22, 1.30) | 0.94 (0.49, 1.80) | 2.31 (0.71, 7.55) | 1.30 (0.58, 2.90) | 0.74 (0.48, 1.15) | 1.29 (0.78, 2.14) |
| Piecewise-constant intensity (Models 3 and 4) | | | | | | |
| gender (male vs. female) | 1.56 (0.78, 3.10) | 1.26 (0.75, 2.13) | 0.77 (0.24, 2.46) | 1.88 (0.97, 3.63) | 1.35 (0.93, 1.96) | 1.36 (0.86, 2.16) |
| age (years) | 1.00 (0.97, 1.02) | 0.95 (0.93, 0.97) | 1.06 (1.00, 1.11) | 0.97 (0.95, 1.00) | 0.97 (0.95, 0.98) | 0.99 (0.97, 1.01) |
| HLA-A2 | 2.70 (1.17, 6.22) | 1.50 (0.85, 2.66) | 3.39 (0.66, 17.46) | 0.80 (0.43, 1.49) | 1.87 (1.25, 2.79) | 1.35 (0.84, 2.15) |
| HLA-A11 | 1.85 (0.78, 4.34) | 0.44 (0.13, 1.55) | 13.62 (2.52, 73.63) | 1.19 (0.42, 3.40) | 0.95 (0.52, 1.73) | 1.82 (0.93, 3.55) |
| HLA-B38 | 0.88 (0.24, 3.16) | 1.34 (0.53, 3.40) | 1.49 (0.14, 15.51) | 2.77 (0.90, 8.52) | 1.08 (0.55, 2.13) | 2.32 (1.00, 5.41) |
| HLA-C12 | 1.04 (0.35, 3.13) | 1.49 (0.69, 3.24) | 2.92 (0.35, 24.27) | 1.48 (0.63, 3.49) | 1.33 (0.75, 2.33) | 1.56 (0.84, 2.87) |
| HLA-DR8 | 0.43 (0.09, 1.95) | 1.52 (0.57, 4.09) | 1.12 (0.09, 14.24) | 0.85 (0.27, 2.69) | 0.92 (0.41, 2.05) | 0.60 (0.24, 1.49) |
| HLA-DR14 | 1.03 (0.29, 3.69) | 1.14 (0.31, 3.18) | 0.26 (0.02, 3.61) | 0.57 (0.17, 1.98) | 1.18 (0.58, 2.40) | 0.45 (0.18, 1.12) |
| HLA-DQ2 | 0.14 (0.05, 0.41) | 1.48 (0.81, 2.68) | 0.14 (0.04, 0.52) | 1.38 (0.70, 2.74) | 0.73 (0.47, 1.13) | 0.75 (0.46, 1.22) |
| HLA-DQ3 | 0.65 (0.32, 1.34) | 1.12 (0.63, 2.01) | 0.15 (0.04, 0.57) | 1.86 (0.95, 3.68) | 0.92 (0.61, 1.39) | 1.23 (0.75, 2.03) |
| HLA-DQ5 | 0.47 (0.19, 1.13) | 1.13 (0.59, 2.17) | 1.73 (0.47, 6.33) | 1.53 (0.71, 3.30) | 0.80 (0.51, 1.25) | 1.29 (0.76, 2.19) |

The mean PDI for each class, as well as the overall PDI, was computed at $t^\circ$ = 5, 10, 15 and 20 years, and the point estimates at the different time horizons are joined by lines in Figure S1 of the Supplemental Material. All four models discriminated much better than the null model, whose PDI is 1/3 with three outcome classes (lower dashed horizontal line). The time-homogeneous Model 1 with unconstrained regression coefficients tended to give the best class-specific and overall discrimination. To assess the uncertainty in $\hat{\Delta}_k(t^\circ)$ and in the overall estimate $\hat{\Delta}(t^\circ)$ for the four models, pointwise 95% confidence intervals were computed at each time horizon using the nonparametric bootstrap. Because Model 1 performed best, Figure 3 shows these plots for the time-homogeneous models (Models 1 and 2); the plots for the piecewise-constant models (Models 3 and 4) are in Figure S2 of the Supplemental Material.
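Pointwise bootstrap intervals can be formed by resampling individuals with replacement, recomputing the estimate at each horizon, and taking empirical quantiles. The interval type is not specified above, so the percentile interval below is an assumption, and `statistic` is a hypothetical callback that must keep all pseudo-individual rows of a resampled individual together. How repeated copies of the same individual enter the distinctness indicator $D(\cdot)$ is a further design choice that this sketch does not settle.

```python
import numpy as np


def bootstrap_percentile_ci(n, statistic, n_boot=500, level=0.95, rng=None):
    """Percentile bootstrap interval from resampling n individuals with replacement.

    statistic(idx) recomputes the estimate (e.g. a PDI at one horizon) from the
    individuals indexed by idx.
    """
    rng = np.random.default_rng() if rng is None else rng
    reps = np.array([statistic(rng.integers(0, n, size=n)) for _ in range(n_boot)])
    alpha = 1.0 - level
    lo, hi = np.quantile(reps, [alpha / 2, 1 - alpha / 2])
    return lo, hi


# Toy usage with the sample mean standing in for a PDI estimate.
data = np.arange(100.0)
lo, hi = bootstrap_percentile_ci(100, lambda idx: data[idx].mean(),
                                 n_boot=200, rng=np.random.default_rng(1))
print(lo, hi)
```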

FIGURE 3.


Estimates of the polytomous discrimination index as a function of t°, with pointwise empirical 95% confidence intervals (dashed lines), for a time-homogeneous model with no constraints on covariate effects (Model 1) and a time-homogeneous model with constraints (Model 2); the constraints force the regression coefficients for the 0 → 1 and 0 → 2 transitions (onset of unilateral damage) to be equal, and likewise those for the 1 → 3 and 2 → 3 transitions (development of axial disease). Δ0(t°) corresponds to prediction of no SI-joint involvement, Δ1(t°) to prediction of unilateral SI-joint damage, Δ2(t°) to prediction of axial disease, and Δ(t°) is the overall measure.

5. Discussion

Our primary goal is to describe how to estimate the accuracy of a prediction model for state occupancy of a multistate process at a specified time horizon when the disease process is under intermittent observation. As an initial investigation, we considered prediction of a multinomial response where outcomes may be coarsened for some individuals, so that it is known only that the outcome is one of a set of possible categories. We proposed a weighted estimator of the polytomous discrimination index based on a pseudo-sample of individuals representing the different response categories to which individuals with grouped outcomes may belong; this approach performed well with moderate to heavy rates of coarsening completely at random. Building on the multinomial setup in Section 2, we then described estimation of the polytomous discrimination index for multistate disease processes, where interest lies in predicting whether an individual is in a certain state at a pre-specified time horizon $t^\circ$. Because the state occupied by some individuals at $t^\circ$ may be unknown under the intermittent observation scheme, we proposed a weighted estimator of the polytomous discrimination index based on a pseudo-sample of individuals, which performed well empirically in simulation studies. We did not consider ties in the predicted probabilities when estimating the PDI, but they can be handled easily as discussed in Van Calster et al.12. The proposed method was then applied to a motivating study involving data from the University of Toronto Psoriatic Arthritis Clinic, where we fitted four models and assessed their predictive accuracy at several time horizons via 5-fold internal cross-validation. All four models had PDI values well above that of the null model, and the models with time-homogeneous transition intensities, Model 1 in particular, often exhibited superior performance.

For the multinomial setting, coarsening completely at random23 implies that the presence and nature of coarsening are completely independent of the response category. If this is not satisfied, then joint modeling, inverse probability weighting30, or augmented inverse probability weighting31 can be employed. In the multistate setting, response-dependent visit times can lead to bias both in fitting a predictive model and in assessing its predictive accuracy. Joint modeling of the multistate process and the visit process can help mitigate bias in model fitting. However, the use of joint disease and visit process models for prediction seems less natural, and such models would not tend to be transportable to other clinic settings where visit schedules may differ; we are currently exploring the use of inverse-intensity weighting for this setting.

Complex disease processes often feature heterogeneity beyond that explained by covariates. Jiang and Cook32 describe finite mixture models of multistate processes under intermittent observation and develop score tests for effects of biomarkers on class membership. There, the use of score tests was motivated by the need to screen a large number of genetic markers for association with the disease course, combined with the difficulty of fitting such mixture models. Once a list of candidate genetic markers is identified by this approach, it is natural to incorporate them into a predictive model for the disease course. In this case, one could model covariate effects on class membership, as done in Jiang and Cook32, as well as on the intensity functions of the multistate process in the different classes. Such a rich predictive model could then be used to predict state occupancy at $t^\circ$, and our proposed method for estimating the PDI can be readily adapted to this setting.

Multistate models with hidden states may also be of interest in some disease settings. States are often defined by distinct conditions, so that the definition of each state is clear. There is no ambiguity in our motivating setting: whether an individual has sacroiliac joint damage on the left or right side (or both) is typically clear from radiographic examination. In other settings it may be difficult to determine which state an individual occupies upon examination. If it can be determined that they are in a strict subset of the possible states, this remains informative, and the likelihood can be modified to deal with this at the training stage. This would represent a hybrid coarsening process involving aspects of the settings of Sections 2 and 3, which can be handled using the msm function as described in Section 3.4 of Jackson29, but we do not consider this here. In other longitudinal observation schemes the states occupied may be subject to misclassification. In such cases hidden Markov models could be considered, but the general approach to estimating the PDI that we discuss here remains applicable. Finally, as one reviewer pointed out, life history processes are often observed subject to left truncation. When left truncation arises, it must be addressed both in model building and in the assessment of predictive accuracy; this, along with the development of robust standard errors for the estimators we develop, is worthy of future research.

Supplementary Material

supinfo

Acknowledgements

This work has been supported by the National Cancer Institute (U01 CA195547 for SJ) and grants from the Natural Sciences and Engineering Research Council of Canada (RGPIN-2017-04207 for RJC) and the Canadian Institutes of Health Research (FRN 159834 for RJC). Richard Cook is a Mathematics Faculty Research Chair and University Professor at the University of Waterloo. The authors thank Drs. Dafna Gladman and Vinod Chandran for helpful discussions regarding the research at the Centre for Prognosis Studies in Rheumatic Disease at the University of Toronto.

Footnotes

Conflict of interest

The authors declare no potential conflicts of interest.

Supplemental Materials

Web Appendices and Tables referenced in Sections 2, 3 and 4 are available with this paper in the online supplemental material. Software in the form of R code, together with a sample input data set and complete documentation, is available in a GitHub repository (https://github.com/jj113/PDImultistate).

Data availability statement

The data from the University of Toronto Psoriatic Arthritis Clinic are confidential and held by the Centre for Prognosis Studies in the Rheumatic Diseases. The R code that supports the findings of this study is available upon request from the first author.

References

  • 1. Steyerberg EW. Clinical Prediction Models. New York: Springer; 2019.
  • 2. Schemper M. Predictive accuracy and explained variation. Statistics in Medicine 2003; 22(14): 2299–2308.
  • 3. Kutner MH, Nachtsheim CJ, Neter J, Li W. Applied Linear Statistical Models. New York, NY: McGraw Hill Irwin; 2005: 409.
  • 4. Harrell FE, Califf RM, Pryor DB, Lee KL, Rosati RA. Evaluating the yield of medical tests. Journal of the American Medical Association 1982; 247(18): 2543–2546.
  • 5. Hanley JA, et al. Receiver operating characteristic (ROC) methodology: the state of the art. Crit Rev Diagn Imaging 1989; 29(3): 307–335.
  • 6. Pepe MS. The Statistical Evaluation of Medical Tests for Classification and Prediction. New York: Oxford University Press; 2003.
  • 7. Brier GW. The statistical theory of turbulence and the problem of diffusion in the atmosphere. Journal of Meteorology 1950; 7(4): 283–290.
  • 8. Uno H, Cai T, Tian L, Wei LJ. Evaluating prediction rules for t-year survivors with censored regression models. Journal of the American Statistical Association 2007; 102(478): 527–537.
  • 9. Uno H, Cai T, Pencina MJ, D’Agostino RB, Wei LJ. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Statistics in Medicine 2011; 30(10): 1105–1117.
  • 10. Li J, Jiang B, Fine JP. Multicategory reclassification statistics for assessing improvements in diagnostic accuracy. Biostatistics 2013; 14(2): 382–394.
  • 11. Li J, Feng Q, Fine JP, Pencina MJ, Van Calster B. Nonparametric estimation and inference for polytomous discrimination index. Statistical Methods in Medical Research 2018; 27(10): 3092–3103.
  • 12. Van Calster B, Van Belle V, Vergouwe Y, Timmerman D, Van Huffel S, Steyerberg EW. Extending the c-statistic to nominal polytomous outcomes: the Polytomous Discrimination Index. Statistics in Medicine 2012; 31(23): 2610–2626.
  • 13. Mossman D. Three-way ROCs. Medical Decision Making 1999; 19(1): 78–89.
  • 14. Dreiseitl S, Ohno-Machado L, Binder M. Comparing three-class diagnostic tests by three-way ROC analysis. Medical Decision Making 2000; 20(3): 323–331.
  • 15. Cook RJ, Lawless JF. Multistate Models for the Analysis of Life History Data. New York: CRC Press; 2018.
  • 16. Aalen O, Borgan O, Gjessing H. Survival and Event History Analysis: A Process Point of View. New York: Springer Science & Business Media; 2008.
  • 17. Andersen PK, Borgan O, Gill RD, Keiding N. Statistical Models Based on Counting Processes. New York: Springer Science & Business Media; 2012.
  • 18. Escolano S, Golmard JL, Korinek AM, Mallet A. A multi-state model for evolution of intensive care unit patients: prediction of nosocomial infections and deaths. Statistics in Medicine 2000; 19(24): 3465–3482.
  • 19. Keiding N, Klein JP, Horowitz MM. Multi-state models and outcome prediction in bone marrow transplantation. Statistics in Medicine 2001; 20(12): 1871–1885.
  • 20. Putter H, van der Hage J, de Bock GH, Elgalta R, van de Velde CJ. Estimation and prediction in a multi-state model for breast cancer. Biometrical Journal: Journal of Mathematical Methods in Biosciences 2006; 48(3): 366–380.
  • 21. Spitoni C, Lammens V, Putter H. Prediction errors for state occupation and transition probabilities in multi-state models. Biometrical Journal 2018; 60(1): 34–48.
  • 22. Bergholtz H, Lien TG, Swanson DM, et al. Contrasting DCIS and invasive breast cancer by subtype suggests basal-like DCIS as distinct lesions. NPJ Breast Cancer 2020; 6(1): 1–9.
  • 23. Heitjan DF, Rubin DB. Ignorability and coarse data. The Annals of Statistics 1991: 2244–2253.
  • 24. Geijer M, Gadeholt Göthlin G, Göthlin J. The validity of the New York radiological grading criteria in diagnosing sacroiliitis by computed tomography. Acta Radiologica 2009; 50(6): 664–673.
  • 25. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological) 1977; 39(1): 1–22.
  • 26. Cook RJ, Lawless JF. Independence conditions and the analysis of life history studies with intermittent observation. Biostatistics 2021; 22(3): 455–481.
  • 27. Hogan JW, Roy J, Korkontzelou C. Handling drop-out in longitudinal studies. Statistics in Medicine 2004; 23(9): 1455–1497.
  • 28. Kalbfleisch J, Lawless JF. The analysis of panel data under a Markov assumption. Journal of the American Statistical Association 1985; 80(392): 863–871.
  • 29. Jackson CH. Multi-state models for panel data: the msm package for R. Journal of Statistical Software 2011; 38(8): 1–29.
  • 30. Robins JM, Rotnitzky A, Zhao LP. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association 1994; 89(427): 846–866.
  • 31. Bang H, Robins JM. Doubly robust estimation in missing data and causal inference models. Biometrics 2005; 61(4): 962–973.
  • 32. Jiang S, Cook RJ. Score tests based on a finite mixture model of Markov processes under intermittent observation. Statistics in Medicine 2019; 38(16): 3013–3025.
