Author manuscript; available in PMC: 2020 Jun 30.
Published in final edited form as: Stat Med. 2020 Mar 23;39(14):1952–1964. doi: 10.1002/sim.8523

Nonparametric Estimation of Broad Sense Agreement Between Ordinal and Censored Continuous Outcomes

Tian Dai 1, Ying Guo 1, Limin Peng 1, Amita Manatunga 1
PMCID: PMC7269873  NIHMSID: NIHMS1069266  PMID: 32207170

Abstract

The concept of broad sense agreement (BSA) has recently been proposed for studying the relationship between a continuous measurement and an ordinal measurement (Peng et al. [1]). They developed a non-parametric procedure for estimating the BSA index, which is only applicable to completely observed data. In this work, we consider the problem of evaluating BSA index when the continuous measurement is subject to censoring. We propose a nonparametric estimation method built upon a derivation of a new functional representation of the BSA index, which allows for accommodating censoring by plugging in the nonparametric survival function estimators. We establish the consistency and asymptotic normality for the proposed BSA estimator. We also investigate an alternative approach based on the strategy of multiple imputation, which is shown to have better empirical performance with small sample sizes than the plug-in method. Extensive simulation studies are conducted to evaluate our proposals. We illustrate our methods via an application to a Surgical Intensive Care Unit study.

Keywords: Nonparametric Estimation, Agreement, Censored Observations, Multiple Imputation

1 |. INTRODUCTION

In biomedical research, agreement studies are often carried out to evaluate the similarity of measurements obtained by different raters, or to assess whether a new instrument can adequately reproduce the result of a “gold standard” instrument. Various methods have been developed for studying the agreement between categorical measurements ([2],[3],[4], [5],[6],[7],[8] among others) and the agreement between continuous outcomes ([9],[10],[11],[12],[13],[14],[15] among others). These methods are confined to the applications of comparing measurements on the same scale.

Peng et al. [1] proposed the concept of broad sense agreement (BSA), which lays the foundation for a new framework for assessing the correspondence between a continuous scale and an ordinal scale. Peng et al. [1] proposed a sensible BSA index/measure and developed a nonparametric estimation procedure for it. The BSA index is a chance-corrected agreement measure that lies between −1 and 1. A higher value of BSA indicates stronger alignment between the ordinal scale and the continuous scale. A motivating example from Peng et al. [1] is the Melanoma and Depression study (Musselman et al. [16]), in which depression was measured by the clinician-administered Hamilton Depression Scale (HAM-D) and a self-reported dimensional scale (Carroll-D). The clinician-administered HAM-D provided a well-defined depression grade: no depression, mild depression, and severe depression, while the self-reported Carroll-D provided a continuous score. The BSA was applied to assess whether the less time-consuming Carroll-D could provide results consistent with HAM-D in determining the grade of depression. The estimated BSA from the study was 0.941, which was close to the upper bound of 1. This high value of BSA indicated a high capability of finding interpretable cut-off points of Carroll-D that would lead to highly consistent ordinal categorical depression grades based on the HAM-D. A more detailed review of BSA is presented in Section 2.1.

In this work, we aim to extend the BSA framework by developing BSA estimators that can accommodate censored continuous measures. For example, in intensive care unit studies, disease severity scores are often used by clinicians for classifying patients into different risk groups. A number of studies have been conducted to evaluate the relationship between disease severity scoring and the risk of mortality ([17],[18]). However, it remains unknown to what extent the risk grouping method based on disease severity scores (ordinal) is concordant with disease-related survival times. One major challenge in this example is that disease-related survival times are often subject to censoring due to ICU discharge.

Simulation results in section 3 show that directly applying Peng et al. [1]’s estimation procedure with censored observations will result in considerably biased estimates. To the best of our knowledge, this work is the first to address censored data in the BSA framework.

The presence of censoring in the continuous measurement can greatly complicate the estimation and inference of the BSA index. To address this challenge, we first derive a new functional representation of the BSA index, which delineates the dependency of the BSA index on the distributions of the ordinal and continuous measurements. This new representation facilitates the development of a new plug-in type estimator of the BSA index, which handles censoring by plugging in existing nonparametric survival distribution estimators for censored data. We show that the proposed estimator is consistent and asymptotically normal and works well with moderate to large sample sizes. The proposed estimator also provides computational advantages over Peng et al. [1]’s estimation procedure, and can be used as an alternative approach for cases without censoring.

One limitation of the new plug-in estimator is that its empirical performance may be less satisfactory when the sample size is small. To achieve improved estimation accuracy with small datasets, we investigate an alternative nonparametric estimator based on multiple imputation techniques. The imputation-based estimator demonstrates good small sample performance at the cost of additional computational intensity.

The remainder of this paper is organized as follows. In Section 2, we first provide a short review of the existing estimation method for the BSA measure and then present the proposed plug-in and imputation-based estimation methods. We establish the asymptotic properties and develop inference procedures for the proposed methods. In Section 3, we conduct simulation studies to evaluate the performance of our proposed estimators in comparison with the naive methods that either exclude censored observations or treat censored observations as observed events. The results show that the proposed estimators successfully correct the bias of the naive methods in the presence of censoring. We then illustrate our proposed methods via an application to surgical ICU data in Section 4. Finally, we conclude with some remarks in Section 5.

1.1 |. Methods

In this section, we provide a brief review of the broad sense agreement (BSA) concept, the BSA index and its estimation procedure in Peng et al. [1]. Let X and Y denote a continuous and an ordinal measure of a common outcome from the same subject. Let $D_X$ and $D_Y$ be the domains of X and Y, respectively. Peng et al. [1] defined X and Y to be in perfect broad sense agreement (disagreement) if and only if there exists an increasing (decreasing) step function ψ from $D_X$ to $D_Y$ such that Y = ψ(X) with probability 1. This definition implies that one is able to identify a set of cut-points for the continuous X such that the discretized X is in perfect concordance (discordance) with the ordinal Y. Therefore, the BSA framework built upon this concept provides useful information on the degree of concordance between a continuous measure and an ordinal measure.


Based on the above definition, Peng et al. [1] further introduced a BSA index, denoted as ρbsa, to characterize the extent to which the relationship between X and Y departs from the perfect BSA scenario. Let Y be an ordinal variable that takes values 1 < ⋯ < L, and let X be a continuous variable. Perfect broad sense agreement entails a scenario where, for randomly selected X with Y = l, denoted by $X^{(*l)}$, it must be satisfied that $X^{(*1)} < \cdots < X^{(*L)}$, i.e. the ordering of the continuous measurements should perfectly match the ordering of their corresponding ordinal measurements. The BSA measure is defined based on the mean square distance between the observed ranks of $\{X^{(*1)}, \ldots, X^{(*L)}\}$ and their expected ranks under the scenario of perfect BSA. Specifically, we denote the observed ranks of $\{X^{(*1)}, \ldots, X^{(*L)}\}$ by $(R_1, \ldots, R_L)$. Based on the previous description, the expected ranks of $\{X^{(*1)}, \ldots, X^{(*L)}\}$ under perfect BSA are $(1, \ldots, L)$. Therefore, the BSA measure takes the form,

$$\rho_{bsa} = 1 - \frac{E\left\{\sum_{l=1}^{L}(l - R_l)^2\right\}}{E\left\{\sum_{l=1}^{L}(l - R_l)^2 \mid X \perp Y\right\}}. \tag{1}$$

The ρbsa is a scaled global measure that takes values from −1 to 1, with larger values indicating better BSA; a value of 1 (−1) indicates perfect broad sense agreement (disagreement). To demonstrate these extreme cases, recall that under perfect broad sense agreement, it must be satisfied that $X^{(*1)} < \cdots < X^{(*L)}$. In the contrary case of perfect broad sense disagreement, the ordering is reversed; that is, $X^{(*1)} > \cdots > X^{(*L)}$. In other words, when X and Y are in perfect broad sense agreement (or disagreement), $(R_1, \ldots, R_L) = (1, \ldots, L)$ (or $(L, \ldots, 1)$) with probability 1 across random samples of X. Therefore, when there is perfect broad sense agreement, the BSA index ρbsa = 1 because $E\{\sum_{l=1}^{L}(l - R_l)^2\}$ equals 0. For the perfect broad sense disagreement scenario, it can be shown that ρbsa = −1 through some derivation. The details are presented in Appendix A of Peng et al. [1].

Peng et al. [1] developed a nonparametric estimation procedure for the proposed BSA index, i.e. ρbsa. Suppose the observed data consist of n complete random samples of (X, Y), denoted by $\{X_i, Y_i\}_{i=1}^{n}$, which can be arranged based on the ordering of Y as follows,

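$$\begin{aligned}
&(X_1, Y_1 = 1), \ldots, (X_{n_1}, Y_{n_1} = 1), \\
&(X_{n_1+1}, Y_{n_1+1} = 2), \ldots, (X_{n_1+n_2}, Y_{n_1+n_2} = 2), \\
&\qquad\vdots \\
&(X_{\sum_{l=1}^{L-1} n_l + 1}, Y_{\sum_{l=1}^{L-1} n_l + 1} = L), \ldots, (X_{\sum_{l=1}^{L} n_l}, Y_{\sum_{l=1}^{L} n_l} = L),
\end{aligned}$$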

where $n_l = \sum_{i=1}^{n} I(Y_i = l)$ is the number of subjects in the lth level of Y and $\sum_{l=1}^{L} n_l = n$. Peng et al. [1] showed that under the independence assumption between X and Y, the expected mean square distance can be determined based on the number of Y levels, i.e., $E\{\sum_{l=1}^{L}(l - R_l)^2 \mid X \perp Y\} = C_L = (L^3 - L)/6$.
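As a quick numerical illustration of the constant $C_L$, the sketch below enumerates every rank vector for small L, assuming (as independence between X and Y with no ties would imply) that all L! rank permutations are equally likely. The function names are hypothetical and chosen only for this example.

```python
from itertools import permutations

def squared_rank_distance(ranks):
    """Sum over levels l = 1..L of (l - R_l)^2 for one rank vector."""
    return sum((level - r) ** 2 for level, r in enumerate(ranks, start=1))

def chance_distance(L):
    """Average squared rank distance over all L! equally likely rank vectors."""
    perms = list(permutations(range(1, L + 1)))
    return sum(squared_rank_distance(p) for p in perms) / len(perms)

if __name__ == "__main__":
    for L in range(2, 7):
        # The two printed numbers should coincide: enumeration vs. (L^3 - L)/6.
        print(L, chance_distance(L), (L**3 - L) / 6)
```

The same helper also illustrates the two extreme cases: the rank vector (1, ..., L) has distance 0, while the reversed vector (L, ..., 1) has distance $2C_L$, which is what yields ρbsa = −1 under perfect disagreement.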

Adopting the idea of stratified sampling without replacement, Peng et al. [1] proposed to estimate $E\{\sum_{l=1}^{L}(l - R_l)^2\}$ by the sample mean of the mean square distance based on stratified samples without replacement, i.e.

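$$W_n = \left(\prod_{l=1}^{L} n_l\right)^{-1} \sum_{j_1=1}^{n_1} \cdots \sum_{j_L=1}^{n_L} \sum_{l=1}^{L} \left\{ l - \sum_{r=1}^{L} I\left(\check{X}_{l,j_l} \ge \check{X}_{r,j_r}\right) \right\}^2.$$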

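Note that $(\check{X}_{1,j_1}, \ldots, \check{X}_{L,j_L})$ is a random realization of $(X^{(*1)}, \ldots, X^{(*L)})$ and $\sum_{r=1}^{L} I(\check{X}_{l,j_l} \ge \check{X}_{r,j_r})$ is the rank of $\check{X}_{l,j_l}$ among $(\check{X}_{1,j_1}, \ldots, \check{X}_{L,j_L})$, with $\check{X}_{l,j_l} = X_{\sum_{r=1}^{l-1} n_r + j_l}$, where $1 \le j_l \le n_l$ for $1 \le l \le L$.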

Therefore, Peng et al. [1]’s estimator of ρbsa is given by

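$$\hat{\rho}_{bsa} = 1 - \frac{W_n}{C_L} = 1 - \frac{\left(\prod_{l=1}^{L} n_l\right)^{-1} \sum_{j_1=1}^{n_1} \cdots \sum_{j_L=1}^{n_L} \sum_{l=1}^{L} \left\{ l - \sum_{r=1}^{L} I\left(\check{X}_{l,j_l} \ge \check{X}_{r,j_r}\right) \right\}^2}{(L^3 - L)/6}. \tag{2}$$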

The asymptotic variance of $\hat{\rho}_{bsa}$ is estimated using the jackknife method ([1]).
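To make estimator (2) concrete, here is a minimal brute-force sketch for complete (uncensored) data. It enumerates every tuple that takes one observation from each level of Y, exactly as in the definition of $W_n$, so its cost grows with the product of the stratum sizes. The function names are hypothetical, and the delete-one jackknife shown is the generic textbook form, which is an assumption rather than a confirmed reproduction of the variance estimator in [1].

```python
from itertools import product
from math import sqrt

def bsa_complete(x, y, L):
    """Estimator (2): 1 - W_n / C_L, where W_n averages the squared rank
    distance over all tuples with one X drawn from each level of Y."""
    strata = [[xi for xi, yi in zip(x, y) if yi == level]
              for level in range(1, L + 1)]
    if any(len(s) == 0 for s in strata):
        raise ValueError("every level of Y needs at least one observation")
    total, count = 0.0, 0
    for tup in product(*strata):
        for level, x_l in enumerate(tup, start=1):
            rank = sum(x_l >= x_r for x_r in tup)  # rank of x_l within the tuple
            total += (level - rank) ** 2
        count += 1
    w_n = total / count
    c_l = (L**3 - L) / 6
    return 1.0 - w_n / c_l

def jackknife_se(x, y, L, estimator=bsa_complete):
    """Generic delete-one jackknife standard error (assumed form).
    x and y must be lists; a stratum emptied by deletion raises ValueError."""
    n = len(x)
    loo = [estimator(x[:i] + x[i + 1:], y[:i] + y[i + 1:], L) for i in range(n)]
    mean = sum(loo) / n
    return sqrt((n - 1) / n * sum((v - mean) ** 2 for v in loo))

if __name__ == "__main__":
    x = [1.0, 3.0, 2.0, 4.0]
    y = [1, 1, 2, 2]
    print(bsa_complete(x, y, L=2), jackknife_se(x, y, L=2))
```

The nested enumeration is what makes this estimator expensive for larger samples, which is consistent with the computational comparison reported later in the paper.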

1.2 |. A new functional representation of BSA

Peng et al. [1]’s estimation procedure for the BSA index requires completely observed data. In biomedical data, the continuous measure may be censored for many reasons, such as loss to follow-up. Simulation results in section 3 show that Peng et al. [1]’s estimator will result in considerably biased estimates if directly applied to censored observations. We note that the estimation of the denominator term of the BSA index is fairly simple and does not require any information from the continuous X. The key to estimating BSA lies in the estimation of the numerator term $E\{\sum_{l=1}^{L}(l - R_l)^2\}$.

We first derive a new functional representation of the BSA index. Specifically, denote the conditional survival function of [X | Y = l] as $S_l(x) = \Pr(X > x \mid Y = l)$, l = 1, ⋯, L. We can show that the BSA measure in Equation (1) can be written as a functional of the L conditional survival functions $\{S_1(\cdot), \ldots, S_L(\cdot)\}$. That is,

$$\rho_{bsa} = \frac{-2\sum_{r=1}^{L-1}\sum_{l=r+1}^{L}(l - r)\int_0^{\infty} S_l(x)\,dS_r(x)}{(L^3 - L)/6} - 1. \tag{3}$$

Detailed derivation of Equation (3) can be found in [19]. This result holds for continuous measurements X with a finite lower bound. In the paper, without loss of generality, we assume the lower bound is zero.

We note that the new representation of ρbsa only consists of several conditional survival functions, which can be easily estimated with censored data based on existing software. Equation (3) naturally motivates us to construct an estimator of the BSA measure by plugging in the estimators of {S1(·), …, SL(·)}.
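For readers who want to see why the BSA index reduces to pairwise integrals of survival functions, the following is a sketch of one possible route from Equation (1) to Equation (3). It is not necessarily the derivation given in [19]; it assumes that $X^{(*1)}, \ldots, X^{(*L)}$ are drawn independently from the L conditional distributions and that ties occur with probability zero, and it writes $C_L = (L^3 - L)/6$.

```latex
\begin{align*}
R_l &= 1 + \sum_{r \neq l} I\{X^{(*l)} > X^{(*r)}\}, \\
\sum_{l=1}^{L} (l - R_l)^2
  &= 2\sum_{l=1}^{L} l^2 - 2\sum_{l=1}^{L} l R_l
  \quad\text{(since $(R_1,\ldots,R_L)$ is a permutation of $(1,\ldots,L)$)}, \\
\sum_{l=1}^{L} l R_l
  &= \frac{L(L+1)}{2}
   + \sum_{r<l} \Big[ r + (l - r)\, I\{X^{(*l)} > X^{(*r)}\} \Big], \\
\sum_{r<l} r &= \sum_{r=1}^{L-1} r (L - r) = C_L,
  \qquad 2\sum_{l=1}^{L} l^2 - L(L+1) = 4 C_L, \\
\sum_{l=1}^{L} (l - R_l)^2
  &= 2 C_L - 2 \sum_{r=1}^{L-1} \sum_{l=r+1}^{L} (l - r)\, I\{X^{(*l)} > X^{(*r)}\}, \\
\Pr\{X^{(*l)} > X^{(*r)}\} &= -\int_0^{\infty} S_l(x)\, dS_r(x), \\
\rho_{bsa}
  &= 1 - \frac{E\{\sum_{l=1}^{L} (l - R_l)^2\}}{C_L}
   = \frac{-2 \sum_{r=1}^{L-1} \sum_{l=r+1}^{L} (l - r) \int_0^{\infty} S_l(x)\, dS_r(x)}{C_L} - 1 .
\end{align*}
```

Read this way, the index is a weighted sum of the pairwise probabilities $\Pr\{X^{(*l)} > X^{(*r)}\}$ for r < l, with weights proportional to the distance l − r between the two ordinal levels.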

Suppose the observed data consist of n random samples $\{\tilde{X}_i, Y_i, \delta_i\}_{i=1}^{n}$, where $\tilde{X}_i$ is the observed continuous measurement for subject i, which is defined as the minimum of the continuous measurement $X_i$ and the censoring variable $C_i$, and $\delta_i$ is the censoring indicator, which equals zero if the observation is censored, and one if not censored. We assume that C and X are independent conditionally on Y. Let $G_l(x) = \Pr(C > x \mid Y = l)$ denote the conditional survival function of the censoring variable. For any l ∈ {1, ⋯, L}, $S_l(x)$ can be consistently estimated by the Kaplan-Meier estimator $\hat{S}_l(x)$ stratified by Y for any $x \in [0, \tau_l^*]$, where $\tau_l^* = \sup\{x : S_l(x) G_l(x) > 0\}$ (Breslow and Crowley, 1974; Wang, 1987; Cai, 1997). Given (3), we propose a nonparametric plug-in estimator for ρbsa:

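$$\hat{\rho}_{bsa} = \frac{-2\sum_{r=1}^{L-1}\sum_{l=r+1}^{L}(l - r)\int_0^{\hat{\tau}_l \wedge \hat{\tau}_r} \hat{S}_l(x)\,d\hat{S}_r(x)}{(L^3 - L)/6} - 1, \tag{4}$$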

where $\hat{\tau}_l = \max(\tilde{X}_{l,1}, \ldots, \tilde{X}_{l,n_l})$ and $a \wedge b = \min(a, b)$. Note that the Kaplan-Meier estimators can be readily obtained by standard software. Given that $\hat{S}_l(x)$ and $\hat{S}_r(x)$ are piecewise-constant functions, the integrals in (4) can be easily computed through a finite summation. Thus, $\hat{\rho}_{bsa}$ is a computationally simple nonparametric estimator. [19] shows that the proposed nonparametric plug-in estimator for ρbsa in (4) is equivalent to the estimator presented in Peng et al. [1] when there is no censoring. Through simulation studies, [19] also shows that the nonparametric plug-in estimator is computationally much faster than Peng et al. [1]’s estimator.
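The finite-summation form of (4) can be sketched as follows. This is a minimal standard-library illustration with a hand-written Kaplan-Meier routine, not the authors' implementation; the function names are hypothetical, every level of Y is assumed to be non-empty, and observations in different strata are assumed not to tie.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier curve as [(t, S(t))] at distinct event times.
    events: 1 = observed, 0 = censored."""
    data = sorted(zip(times, events))
    n_at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve

def step_value(curve, x):
    """Evaluate the right-continuous step function S at x."""
    value = 1.0
    for t, s in curve:
        if t > x:
            break
        value = s
    return value

def bsa_plugin(x_obs, y, delta, L):
    """Plug-in estimator (4) built from stratified Kaplan-Meier curves."""
    strata = {lev: [(x, d) for x, yy, d in zip(x_obs, y, delta) if yy == lev]
              for lev in range(1, L + 1)}
    km = {lev: kaplan_meier(*zip(*strata[lev])) for lev in strata}
    tau = {lev: max(x for x, _ in strata[lev]) for lev in strata}
    total = 0.0
    for r in range(1, L):
        for lev in range(r + 1, L + 1):
            upper, prev, integral = min(tau[lev], tau[r]), 1.0, 0.0
            for t, s in km[r]:            # jumps of S_r carry the mass dS_r
                if t <= upper:
                    integral += step_value(km[lev], t) * (s - prev)
                prev = s
            total += (lev - r) * integral
    return -2.0 * total / ((L**3 - L) / 6) - 1.0

if __name__ == "__main__":
    x = [1.0, 3.0, 2.0, 4.0]
    y = [1, 1, 2, 2]
    print(bsa_plugin(x, y, [1, 1, 1, 1], L=2))  # complete data
    print(bsa_plugin(x, y, [1, 0, 1, 1], L=2))  # second observation censored
```

Because only the jump points of each Kaplan-Meier curve are visited, the work here grows with the number of pairs of levels and the stratum sizes rather than with the product of all stratum sizes.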

In the following, we establish asymptotic properties for the proposed estimator $\hat{\rho}_{bsa}$ via the Hadamard differentiability of the functional T and the statistical properties of stratified Kaplan-Meier estimators of the conditional survival functions. For simplicity, we denote $\hat{S} = (\hat{S}_1, \ldots, \hat{S}_L)$ and $S = (S_1, \ldots, S_L)$. First, we show the functional T is Hadamard differentiable in Lemma 1.

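Lemma 1 Let $\mathcal{S}_0$ be the collection of finite-dimensional vectors of survival functions S on $\mathbb{R}^L$ with finite support $[0, \tau_1] \times \cdots \times [0, \tau_L]$. Let $\|\tilde{S}_l - S_l\|_{\infty} \equiv \sup_x |\tilde{S}_l(x) - S_l(x)| = \|\tilde{S}_l - S_l\|_{\mathcal{S}}$, where $\mathcal{S} = \{1_{(-\infty, x]} : x \in \mathbb{R}\}$. Then the functional $T : \mathcal{S}_0 \to \mathbb{R}$,

$$T(S) = \frac{-2\sum_{r=1}^{L-1}\sum_{l=r+1}^{L}(l - r)\int_0^{\infty} S_l\,d(S_r)}{(L^3 - L)/6} - 1,$$

is Hadamard differentiable at S with respect to the Kolmogorov distance $d_K = \|\cdot\|_{\infty}$.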

The proof of Lemma 1 is provided in Appendix 6.1.

Next, we show the statistical properties of $\hat{S}$. Breslow and Crowley (1974) show that if both $S_l$ and $G_l$ are continuous distributions, then on the support $[0, \tau_l^*]$, where $\tau_l^* = \sup\{x : S_l(x) G_l(x) > 0\}$, the Kaplan-Meier estimator $\hat{S}_l$ is uniformly consistent and $\sqrt{n}(\hat{S}_l(x) - S_l(x))$, $x \in [0, \tau_l^*]$, converges weakly to a Gaussian process $W_l$ with mean 0 and covariance function $\mathrm{Cov}(W_l(s), W_l(t)) = S_l(s) S_l(t) a_l(s)$, $s \le t$, where $a_l(s) = -\int_0^{s} \frac{dS_l}{S_l^2 G_l}$. We further have that $\sqrt{n}\{\hat{S}(x) - S(x)\}$ converges weakly to a tight, zero-mean Gaussian process $W(x) = (W_1(x), \ldots, W_L(x))$. Based on Lemma 1 and the statistical properties of $\hat{S}$, we establish the asymptotic properties of the estimator $\hat{\rho}_{bsa}$ in the following theorem.

Theorem 1 Assume $\hat{\tau}_l \to_p \tau_l$, $\tau_l^* = \tau_l$, and $\hat{S}_l(\hat{\tau}_l) \to_p S_l(\tau_l) = 0$ for any l, r ∈ {1, ⋯, L}. Then the proposed estimator $\hat{\rho}_{bsa}$ has the following asymptotic properties:

(i) The estimator $\hat{\rho}_{bsa}$ is strongly consistent. That is, $|\hat{\rho}_{bsa} - \rho_{bsa}| \to 0$ with probability 1.

(ii) The estimator $\hat{\rho}_{bsa}$ has the following weak convergence result,

$$\sqrt{n}(\hat{\rho}_{bsa} - \rho_{bsa}) \to_d T'_S(W),$$

where W is the zero-mean Gaussian process and $T'_S(W)$ follows a zero-mean normal distribution.

The proof of Theorem 1 is provided in Appendix 6.2. We note that the assumption $\tau_l^* = \tau_l$ implies that the upper bound of the survival time is less than or equal to the upper bound of the censoring time. This condition is required because the asymptotic properties of the Kaplan-Meier estimator of the survival function are only valid within $[0, \tau_l^*]$. Our BSA plug-in estimator, which is derived based on the survival function estimators, inherits this limitation. When this assumption is not valid, that is, when the upper bound of the censoring time is less than that of the survival time, the proposed BSA estimator estimates the broad sense agreement between the survival outcome and the ordinal outcome within the upper bound of the censoring time instead of over the entire range of the survival time distribution.

Since the BSA estimator is a Hadamard differentiable function of the survival function estimator $\hat{S}$, it is theoretically possible to derive an analytical form for the asymptotic variance of the BSA estimator from the asymptotic covariance of $\hat{S}$ using the functional delta method. However, obtaining an analytical expression for the variance of the BSA estimator is technically challenging, since the covariance of $\hat{S}$ is already complicated and the variance of the BSA estimator involves the covariance of $\hat{S}$ through a complicated function. Therefore, we propose to estimate the variance of the BSA estimator using a resampling method. The resampling method demonstrated good performance in the simulation studies in section 2.

1.3 |. An alternative estimator based on multiple imputation

Multiple imputation (MI) is a popular technique for handling missing data. Censoring in survival data is a special form of missing data problem. Taylor et al. [20] proposed a multiple imputation method to handle missing event times for censored observations. The idea was to impute missing event time from the estimated distribution of event times amongst those at risk after the censoring time. Their study demonstrated that nonparametric multiple imputation successfully recovered the missing time information in the estimation of survival distributions. Hsu et al. [21] extended Taylor et al. [20]’s work by incorporating auxiliary variables to define the risk sets and performing multiple imputation only within the risk sets. They showed that with either time-independent or time-dependent auxiliary variables, the multiple imputation approach demonstrated similar results in terms of reduced bias due to dependent censoring and improved efficiency as an inverse probability of censoring weighted method.

In this section, we adopt the ideas from Taylor et al. [20] and Hsu et al. [21] by considering the censoring of the continuous X as a missing data problem and propose an alternative estimator for ρbsa based on imputing the censored observations of X. In the following, we present the algorithm to impute a censored observation with $\tilde{X} = x_j$, Y = l, δ = 0.

Step 1: Identify the risk set for the censored observation $(x_j, l, 0)$. Denote the risk set as $R(j^+ \mid l) = \{i : \tilde{X}_i > x_j,\ Y_i = l,\ i = 1, \ldots, n\}$, which includes all the observations whose Y takes the same value l and whose observed time $\tilde{X}$ is longer than the censored time $x_j$. Note that observations with Y values other than l are excluded from the risk set.

Step 2: Estimate the distribution of event times among those at risk at the censored time $x_j$ using the Kaplan-Meier (KM) estimator based on the risk set $R(j^+ \mid l)$. The KM estimator of the survival function of $\tilde{X}$ given $\tilde{X} = x_j$ and Y = l is denoted as $\hat{S}_{j^+ \mid l}(x)$. It is easy to see that $\hat{S}_{j^+ \mid l}(x)$ jumps only at observed X values in $R(j^+ \mid l)$.

Step 3: Impute an event time for $x_j$ by drawing a random sample from the empirical distribution of event times estimated in Step 2. To impute $x_j$, generate a random value α from the uniform distribution U(0, 1). Find two neighboring observations $x_s$, $x_t$ from the sorted risk set $R(j^+ \mid l)$, so that the interval $(1 - \hat{S}_{j^+ \mid l}(x_s),\ 1 - \hat{S}_{j^+ \mid l}(x_t))$ includes α. The imputed value for $x_j$ is defined as $x_s$.
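Steps 1 to 3 can be sketched for a single censored observation as below. This is only an illustration under stated assumptions: the draw uses the usual inverse-CDF convention (the smallest event time whose estimated cumulative probability reaches α), which is one reading of how the neighboring observations in Step 3 are resolved; an empty risk set leaves the censored value unchanged; and when the Kaplan-Meier curve does not reach zero, the largest time in the risk set is returned. All function names are hypothetical.

```python
import random

def km_jumps(times, events):
    """Kaplan-Meier curve as [(t, S(t))] at distinct event times."""
    data = sorted(zip(times, events))
    n_at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve

def impute_censored(xj, level, x_obs, y, delta, rng=random):
    """Impute one censored time xj observed at ordinal level `level`."""
    # Step 1: risk set = same level of Y and observed time beyond xj.
    risk = [(x, d) for x, yy, d in zip(x_obs, y, delta)
            if yy == level and x > xj]
    if not risk:
        return xj  # assumption: nothing to draw from, keep the censored value
    # Step 2: Kaplan-Meier curve within the risk set.
    curve = km_jumps([x for x, _ in risk], [d for _, d in risk])
    # Step 3: inverse-CDF draw from the estimated distribution (assumed convention).
    alpha = rng.random()
    for t, s in curve:
        if 1.0 - s >= alpha:
            return t
    return max(x for x, _ in risk)  # curve never reaches zero

if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [1, 1, 1, 1, 1]
    d = [0, 1, 1, 1, 1]
    print(impute_censored(1.0, 1, x, y, d, random.Random(0)))
```

Passing a seeded `random.Random` instance keeps each imputed data set reproducible, which is convenient when M imputations are generated in a loop.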

Repeat the above algorithm for all censored observations until they have been imputed. This procedure imputes the censored observations with observed values unless the largest observation is censored, in which case some imputed values may include this largest value. With M sets of imputations, there are M enhanced data sets and hence M BSA estimates with associated jackknife variances, say $\hat{\rho}_{bsa}^{m}$ and $\hat{U}_{bsa}^{m}$, respectively. The imputation-based estimator of BSA is defined as the average of the M imputation-based BSA estimates:

$$\bar{\rho}_{bsa}^{M} = \sum_{m=1}^{M} \hat{\rho}_{bsa}^{m} / M.$$

The variance estimator of $\bar{\rho}_{bsa}^{M}$ is $W = \bar{U} + (1 + M^{-1})B$ ([22]), which consists of two components: the average within-imputation variance, which is the average of the variance estimates from the imputed data sets, i.e., $\bar{U} = \sum_{m=1}^{M} \hat{U}_{bsa}^{m}/M$, and the between-imputation variance, which is the sample variance of the imputed-data BSA estimates, i.e., $B = \sum_{m=1}^{M} \{\hat{\rho}_{bsa}^{m} - \bar{\rho}_{bsa}^{M}\}^2/(M - 1)$. The confidence interval can be constructed using Fisher’s z transformation.
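The combining rules above can be sketched in a few lines. The pooling function follows the formulas for $\bar{U}$, B and W directly. The confidence interval is an assumption about how Fisher's z transformation is applied, since the text does not spell it out: here the standard error is carried to the z scale by the delta method, se / (1 − ρ²), and the limits are transformed back with tanh. Function names are hypothetical.

```python
from math import atanh, sqrt, tanh
from statistics import mean, variance

def pool_imputations(estimates, within_vars):
    """Combine M imputed-data estimates: returns (rho_bar, W)
    with W = U_bar + (1 + 1/M) * B. Requires M >= 2."""
    m = len(estimates)
    rho_bar = mean(estimates)
    u_bar = mean(within_vars)        # average within-imputation variance
    b = variance(estimates)          # between-imputation variance, divisor M - 1
    return rho_bar, u_bar + (1.0 + 1.0 / m) * b

def fisher_z_interval(rho, se, z_crit=1.96):
    """Interval on Fisher's z scale, back-transformed to the rho scale.
    Assumed delta-method standard error; requires |rho| < 1."""
    z = atanh(rho)
    se_z = se / (1.0 - rho**2)
    return tanh(z - z_crit * se_z), tanh(z + z_crit * se_z)

if __name__ == "__main__":
    rho_bar, w = pool_imputations([0.2, 0.3, 0.4], [0.01, 0.01, 0.01])
    print(rho_bar, w, fisher_z_interval(rho_bar, sqrt(w)))
```

As a sanity check on the assumed interval form, applying `fisher_z_interval` to the plug-in estimate 0.298 with standard error 0.122 from the surgical ICU application gives limits that agree with the reported (0.045, 0.515) to three decimals.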

To incorporate the full uncertainty in the imputation, a bootstrapping stage can be added when estimating the survival distributions. For Y = l, consider the bootstrap sample $\{(X_1^{(B)}, l, \delta_1^{(B)}), \ldots, (X_{n_l}^{(B)}, l, \delta_{n_l}^{(B)})\}$ selected with replacement from the original data set. The imputed risk set for the censored time $(x_j, l, 0)$ can be redefined as $R^{(B)}(j^+ \mid l) = \{i : x_i^{(B)} > x_j,\ Y_i = l,\ i = 1, \ldots, n\}$ to include those observations that are at risk at time $x_j$ in the bootstrap sample.

2 |. SIMULATION RESULTS

We conducted Monte Carlo simulations to evaluate the finite sample performance of the proposed methods. Specifically, we compared empirical bias, standard deviation and coverage rates of 95% confidence intervals when sample size, censoring pattern and censoring rate varied in the simulation. We considered two sample sizes: N=60, 120 representing small and moderate sample sizes, respectively. We compared the performance of the proposed methods with two naive methods: Peng et al. [1]’s estimator based on a partial dataset that excludes censored subjects, i.e. {(Xi,Yi,δi) : δi = 1, i = 1, ⋯, n } and Peng et al. [1]’s estimator using the whole dataset while ignoring the censoring to X.

Simulated data was generated as follows. We assumed L = 3 and generated Y from {1, 2, 3} with equal probability. Continuous variable X was generated from a non-normal distribution, Y + Weibull(2, ξ), where the scale parameter ξ was chosen to simulate different levels of BSA between X and Y. Censoring time was generated independently of continuous variable X conditional on Y from a non-normal distribution, πCY +Weibull (2, ξC), where ξC was selected to achieve different censoring percentages and πC controlled the balance of censoring proportions among Y levels. If πC takes the value of 1, censoring rates were the same for different Y. If πC = 0.5, censoring rate was higher for larger Y values. Results in the following tables were based on 500 simulated datasets.
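The data-generating mechanism described above can be sketched with the standard library alone. Weibull(2, ξ) is read here as shape 2 and scale ξ, following the statement that ξ is the scale parameter. The numerical values of ξ and ξC used in the paper are not given in this section, so the values in the usage lines are placeholders, and the function name is hypothetical.

```python
import random

def simulate_bsa_data(n, xi, xi_c, pi_c, L=3, seed=None):
    """Generate (observed time, Y, delta) triples.
    X = Y + Weibull(shape 2, scale xi); C = pi_c * Y + Weibull(shape 2, scale xi_c)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = rng.randint(1, L)                       # levels with equal probability
        x = y + rng.weibullvariate(xi, 2)           # weibullvariate(scale, shape)
        c = pi_c * y + rng.weibullvariate(xi_c, 2)  # censoring time
        rows.append((min(x, c), y, int(x <= c)))    # delta = 1 if event observed
    return rows

if __name__ == "__main__":
    # Placeholder parameter values, chosen only to exercise the function.
    data = simulate_bsa_data(n=120, xi=1.0, xi_c=1.5, pi_c=1.0, seed=1)
    for level in (1, 2, 3):
        deltas = [d for _, y, d in data if y == level]
        if deltas:
            print(level, 1 - sum(deltas) / len(deltas))  # censoring proportion
```

In practice ξC would be tuned until the printed censoring proportions match a target pattern, and setting `pi_c` below 1 shifts more censoring toward the larger Y levels, as described in the text.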

Tables 1 and 2 show that when the censoring proportions of X are homogeneous across Y levels, the naive methods that either exclude censored observations or treat censored observations as complete observations (referred to as complete obs. and all obs. respectively) produce similar bias for all censoring proportions, and their bias increases dramatically as the censoring proportion of X increases. For example, with moderate BSA, i.e., ρbsa = 0.550, the bias of the naive estimators increases from 10% to 42% of the true value of BSA as the censoring proportion increases from 27% to 70%. This leads to a considerably different interpretation of the magnitude of BSA, where ρbsa = 0.550 is considered moderate BSA and ρbsa = 0.8 indicates fairly strong BSA between X and Y. The results also demonstrate that both the plug-in method and the MI-based methods substantially reduce the bias compared to the naive methods. The plug-in method performs best when there is strong BSA and the MI-based methods outperform the plug-in method in moderate BSA scenarios.

TABLE 1.

Simulation results for estimating moderate broad sense agreement (ρbsa = 0.550) when C ~ Weibull+ Y: empirical biases (EmpBias), empirical standard deviation (EmpSD), estimated standard deviation (EstSD), and coverage probabilities of 95% confidence intervals (Cov95).

Equal censoring proportions across Y: C ~ Weibull+ Y
N=60 N=120
Censoring Methods EmpBias EmpSD EstSD Cov95 EmpBias EmpSD EstSD Cov95
Low Comp obs.a 0.054 0.115 0.115 89.8 0.061 0.077 0.880 88.0
(27%,27%,27%) All obs. b 0.060 0.097 0.095 87.8 0.059 0.066 0.067 86.2
Plug-in c 0.050 0.115 0.114 90.8 0.029 0.078 0.081 93.4
MI d −0.020 0.131 0.114 91.0 −0.005 0.083 0.080 94.0
BMI e 0.022 0.132 0.120 93.0 −0.006 0.083 0.083 96.0
Moderate Comp obs. 0.149 0.115 0.117 75.0 0.149 0.074 0.080 65.6
(50%,50%,50%) All obs. 0.143 0.080 0.081 67.4 0.146 0.052 0.056 37.8
Plug-in 0.074 0.115 0.144 94.0 0.047 0.092 0.105 93.2
MI −0.020 0.144 0.119 90.0 −0.006 0.110 0.083 87.2
BMI 0.028 0.144 0.141 93.8 −0.017 0.109 0.100 91.9
Heavy Comp obs. 0.232 0.125 0.129 72.3 0.246 0.075 0.078 35.8
(70%,70%,70%) All obs. 0.239 0.060 0.061 21.0 0.243 0.041 0.042 1.0
Plug-in 0.101 0.143 0.161 91.6 0.075 0.117 0.139 92.6
MI −0.010 0.207 0.116 75.8 −0.032 0.157 0.089 72.5
BMI −0.014 0.193 0.159 88.8 −0.049 0.155 0.137 90.6
Unequal censoring proportions across Y: C ~ Weibull+0.5Y
N=60 N=120
Censoring Methods EmpBias EmpSD EstSD Cov95 EmpBias EmpSD EstSD Cov95
Low Comp obs. 0.035 0.125 0.123 93.2 0.034 0.080 0.083 94.4
(25%, 32%, 38%) All obs. −0.049 0.114 0.113 91.8 −0.047 0.076 0.078 92.2
Plug-in 0.060 0.106 0.114 92.6 0.031 0.076 0.082 94.2
MI −0.008 0.120 0.116 94.2 −0.001 0.086 0.080 91.9
BMI −0.007 0.118 0.120 95.6 −0.003 0.086 0.083 94.0
Moderate Comp obs. 0.065 0.116 0.125 91.4 0.053 0.083 0.087 91.6
(33%, 41%, 49%) All obs. −0.052 0.108 0.113 93.2 −0.056 0.077 0.079 91.4
Plug-in 0.075 0.107 0.120 90.8 0.039 0.085 0.086 90.6
MI 0.002 0.124 0.115 92.8 −0.009 0.084 0.084 94.6
BMI −0.003 0.124 0.126 96.0 −0.011 0.087 0.091 96.6
Heavy Comp obs. 0.138 0.155 0.176 87.9 0.134 0.100 0.107 78.8
(60%, 70%, 78%) All obs. −0.060 0.109 0.114 93.4 −0.061 0.073 0.078 89.4
Plug-in 0.121 0.134 0.158 92.2 0.077 0.107 0.122 92.6
MI −0.006 0.201 0.123 79.4 −0.022 0.158 0.090 77.2
BMI −0.017 0.200 0.166 89.8 −0.030 0.151 0.130 87.2
a Naive estimator using datasets which exclude censored subjects.
b Naive estimator using censored observations as observed observations.
c The proposed plug-in estimator.
d Kaplan-Meier-based imputation without bootstrap procedure.
e Kaplan-Meier-based imputation with bootstrap procedure.

TABLE 2.

Simulation results for estimating strong broad sense agreement (ρbsa = 0.827) when C ~ Weibull+ Y: empirical biases (EmpBias), empirical standard deviation (EmpSD), estimated standard deviation (EstSD), and coverage probabilities of 95% confidence intervals (Cov95).

Equal censoring proportions across Y: C ~ Weibull+ Y
N=60 N=120
Censoring Methods EmpBias EmpSD EstSD Cov95 EmpBias EmpSD EstSD Cov95
Low Comp obs.a 0.047 0.048 0.049 87.2 0.045 0.036 0.035 77.8
(27%,27%,27%) All obs. b 0.047 0.040 0.041 86.4 0.046 0.030 0.029 69.4
Plug-in c 0.018 0.067 0.076 94.4 0.011 0.050 0.052 92.4
MI d −0.024 0.077 0.066 92.2 −0.018 0.055 0.047 90.4
BMI e −0.027 0.074 0.077 95.4 −0.015 0.052 0.054 92.6
Moderate Comp obs. 0.099 0.046 0.043 76.2 0.098 0.031 0.030 37.4
(50%,50%,50%) All obs. 0.099 0.031 0.029 33.4 0.098 0.021 0.021 8.2
Plug-in −0.020 0.120 0.130 87.6 −0.009 0.094 0.100 89.2
MI −0.067 0.136 0.077 73.4 −0.056 0.100 0.055 71.3
BMI −0.081 0.131 0.110 84.0 −0.067 0.098 0.087 81.9
Heavy Comp obs. 0.137 0.042 0.034 90.1 0.139 0.026 0.024 31.0
(69%,69%,69%) All obs. 0.137 0.019 0.019 4.4 0.139 0.013 0.013 0.0
Plug-in 0.080 0.141 0.158 80.6 −0.074 0.136 0.142 74.6
MI −0.099 0.154 0.078 59.1 −0.111 0.129 0.061 44.1
BMI −0.114 0.143 0.120 74.0 −0.120 0.117 0.102 62.4
Unequal censoring proportions across Y: C ~ Weibull+0.5Y
N=60 N=120
Censoring Methods EmpBias EmpSD EstSD Cov95 EmpBias EmpSD EstSD Cov95
Low Comp obs. 0.021 0.056 0.058 94.2 0.021 0.041 0.039 92.0
(20%, 30%, 39%) All obs. −0.086 0.079 0.076 77.0 −0.085 0.051 0.052 54.2
Plug-in 0.031 0.053 0.060 93.8 0.022 0.040 0.041 90.4
MI −0.010 0.066 0.061 93.8 −0.004 0.050 0.042 90.3
BMI −0.012 0.066 0.067 95.6 −0.005 0.050 0.046 88.2
Moderate Comp obs. 0.034 0.059 0.063 96.0 0.035 0.040 0.042 87.8
(33%, 44%, 56%) All obs. −0.118 0.078 0.081 57.8 −0.122 0.056 0.057 27.0
Plug-in 0.023 0.073 0.081 94.8 0.020 0.047 0.052 96.0
MI −0.026 0.092 0.068 86.4 −0.019 0.068 0.049 87.2
BMI −0.032 0.092 0.084 91.0 −0.027 0.072 0.063 91.5
Heavy Comp obs. 0.069 0.077 0.083 97.6 0.072 0.050 0.048 80.2
(54%, 70%, 83%) All obs. −0.155 0.085 0.084 37.6 −0.163 0.056 0.059 6.4
Plug-in 0.019 0.104 0.124 94.7 0.015 0.079 0.088 92.2
MI −0.047 0.144 0.074 75.3 −0.038 0.094 0.054 68.1
BMI −0.069 0.144 0.124 87.8 −0.046 0.090 0.090 91.5
a Naive estimator using datasets which exclude censored subjects.
b Naive estimator using censored observations as observed observations.
c The proposed plug-in estimator.
d Kaplan-Meier-based imputation without bootstrap procedure.
e Kaplan-Meier-based imputation with bootstrap procedure.

When the censoring rates are heterogeneous across Y levels, the plug-in estimator does not perform very well in the small sample scenario with moderate BSA (Table 1) and has similar or slightly larger bias compared to the naive methods. However, its performance improves substantially in the moderate sample size case, where it generally shows lower bias than the naive methods. The MI methods always outperform the naive methods, demonstrating much lower bias.

For both homogeneous and heterogeneous censoring rate scenarios, as the sample size increases, the bias of the plug-in method decreases while the biases of the naive methods remain the same or even increase. The proposed variance estimator of the plug-in method seems to overestimate the variance when there is heavy censoring. The coverage probabilities of the estimated confidence intervals are close to the nominal level (95%) in most cases. The only exception is when there is heavy homogeneous censoring in the high BSA data, in which case all methods fail. The variance estimator of the MI method without the bootstrap procedure always underestimates the variance. After adding the bootstrap procedure, the variance estimation improves substantially and the constructed confidence intervals have reasonably good coverage probabilities, especially for small to moderate censoring rates.

In Table 4, we compare the computation times of the different estimation methods for estimating the BSA index and its jackknife standard error in one simulation. The table shows that the computing costs of Peng et al. [1]’s estimator and the MI-based estimators increase dramatically as the sample size increases. For a sample size of 240, the computing time for Peng et al. [1]’s estimator is 4077 seconds while the computing time for the proposed plug-in method is only 5 seconds. The computing times of the two MI-based methods are very similar and are nine times (which is the number of imputations in each simulation) longer than that of Peng et al. [1]’s estimator. Given the results in Table 4, the plug-in method is more appropriate for datasets with large sample sizes and the MI-based method may be used for data with small sample sizes.

TABLE 4.

Surgical ICU patient study: estimates of ρbsa by different methods and associated standard errors (SE) and 95% confidence intervals (95% CI).

Method ρ^bsa SE 95%CI
Comp obs.a −0.023 0.103 (−0.221, 0.177)
All obs.b 0.053 0.153 (−0.243, 0.340)
Plug-in c 0.298 0.122 (0.045, 0.515)
MI d 0.276 0.109 (0.052, 0.474)
BMI e 0.269 0.120 (0.022, 0.485)
a Naive estimator using datasets which exclude censored subjects.
b Naive estimator using censored observations as observed observations.
c The proposed plug-in estimator.
d Kaplan-Meier-based imputation without bootstrap procedure.
e Kaplan-Meier-based imputation with bootstrap procedure.

3 |. AN APPLICATION TO A SURGICAL ICU PATIENTS STUDY

Acute Physiology and Chronic Health Evaluation II, often known as APACHE II, is a severity-of-disease classification system which has been extensively used in intensive care units (ICUs) to assess the morbidity of patients and stratify the risk of death ([23]). Many studies have shown a significant correlation between the APACHE-II score and the probability of hospital-related mortality as well as hospital-acquired infections ([24],[25]). However, we sometimes observe that severely sick patients die or acquire infections shortly after ICU or hospital discharge. In this case, survival endpoints, such as progression free survival (PFS), can be adopted as an alternative way to assess the risk of hospital-related mortality and infections.

In this study, 150 patients requiring postoperative surgical ICU (SICU) care were enrolled. PFS was measured as the time from the first day in the SICU to death or first severe infection. PFS was censored at the hospital discharge date if no events happened during the hospital stay. Six patients who had preexisting bloodstream infections (BSI) or lower respiratory tract infections (LRI) were excluded from the analysis. For the remaining 144 patients, 24 hospital-related deaths and 66 hospital-acquired infections were observed; the infections included 30 BSI and 36 LRI. Eighty-three patients were observed to be event-free during their hospital stays after SICU admission. Two risk groups were determined based on clinical guidelines ([26], [17]) using the APACHE-II score calculated upon admission to the SICU: APACHE-II scores of 0–24 correspond to the low risk group, and APACHE-II scores of 25 or above correspond to the high risk group. Our main interest was to study the relationship between the APACHE-II risk groups and PFS, where PFS was defined as a composite endpoint of time to first infection or death.

Table 4 presents the estimated $\rho_{bsa}$ and the associated standard errors and confidence intervals. We assume that hospital discharge is a random censoring event for PFS. Fisher’s z-transformation is used to compute confidence intervals. Naive estimates of $\rho_{bsa}$, obtained either from the whole dataset without adjusting for censoring or from the uncensored subjects only, are close to 0, which suggests no beyond-chance agreement between risk group and PFS. However, the estimates obtained using the plug-in method or either of the multiple imputation methods are about 0.27–0.30, which indicates fair agreement. The associated confidence intervals exclude 0 for all three proposed methods. We conclude that there is significant broad sense agreement between risk group and PFS, but the magnitude of the BSA is in the low range.

Since the censoring is related to hospital discharge, the upper bound of the censoring time may be less than the upper bound of the PFS time, depending on the hospital discharge policy. If that is the case, the estimated BSA measures the broad sense agreement between PFS and APACHE-II within the upper bound of the hospital discharge time. We also note that the independent censoring assumption may be questionable for this data example, since censoring is related to hospital discharge. This may affect the accuracy of the proposed estimator and inference procedure. Further investigation is needed to understand the impact of violating the independent censoring assumption. Another limitation of the application is that APACHE was developed mainly to address in-hospital morbidity and mortality, while the PFS measured in the current study captures both in-hospital morbidity and mortality (before discharge) and potentially out-of-hospital events. For future studies with the APACHE score, it will be helpful to define and measure a more meaningful endpoint that is restricted to events before discharge.
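The text states that Fisher’s z-transformation is used for the confidence intervals but does not spell out the formula here. The sketch below assumes the usual delta-method version: transform the estimate with atanh, scale the standard error by 1/(1 − ρ̂²), build a normal interval on the z scale, and map back with tanh. The function name `fisher_z_ci` is hypothetical, and the inputs are the rounded estimates and standard errors from Table 4.

```python
import math

def fisher_z_ci(rho_hat, se, z_crit=1.959964):
    """Confidence interval via Fisher's z-transformation (assumed delta-method form).

    z = atanh(rho_hat); the SE on the z scale is se / (1 - rho_hat^2)
    because d atanh(r) / dr = 1 / (1 - r^2). The limits are mapped back with tanh.
    """
    z = math.atanh(rho_hat)
    se_z = se / (1.0 - rho_hat ** 2)
    return math.tanh(z - z_crit * se_z), math.tanh(z + z_crit * se_z)

# Point estimates and standard errors copied from Table 4.
table4 = {
    "Comp obs.": (-0.023, 0.103),
    "All obs.": (0.053, 0.153),
    "Plug-in": (0.298, 0.122),
    "MI": (0.276, 0.109),
    "BMI": (0.269, 0.120),
}

intervals = {m: fisher_z_ci(r, s) for m, (r, s) in table4.items()}
for method, (lower, upper) in intervals.items():
    print(f"{method:10s} ({lower:.3f}, {upper:.3f})")
```

Under this assumed formula, the intervals computed from the rounded table entries agree with the 95% CI column of Table 4 up to rounding, which suggests (but does not prove) that this is the variant used. One advantage of the transformation is that the limits always stay inside (−1, 1).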

4 |. CONCLUSION

In this paper, we propose novel estimation methods for the broad sense agreement index when the continuous measurement is subject to censoring. Specifically, we propose a plug-in method based on the conditional survival distributions, which is computationally efficient and has desirable theoretical properties. In addition, we propose another estimation method for smaller data sets using multiple imputation techniques. We develop inference procedures for the proposed estimators and demonstrate the small-sample performance of the proposed methods via simulation studies.

The application of the proposed plug-in estimator is not limited to the case when the continuous measurement is subject to censoring. In fact, the new plug-in estimator can be a useful alternative to Peng et al. [1]’s estimator when the sample size is large. In particular, the plug-in method can provide a much more computationally efficient estimation of the BSA by dramatically reducing the computation time required by Peng et al. [1]’s estimator.

5 |. DATA AVAILABILITY STATEMENT

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

TABLE 3.

Comparing computation times (in seconds) for the estimation of BSA index and jackknife standard error in one simulation using different estimation methods.

N     Peng et al. [1]’s method   Plug-in method   MI method   BMI method
60    12                         0.9              117         118
120   247                        2                2480        2486
240   4077                       5                40593       40611

Funding information

NIH, Grant/Award Numbers: R01MH118771, R01MH079448, R01MH105561, UL1TR002378

6 |. APPENDIX

6.1 |. Proof of Lemma 1

Proof. First, we give the definition of Hadamard differentiability (Gill, 1989; Wellner, 1989). A function $T:\mathcal{S}\to\mathbb{R}$ is Hadamard differentiable at $S$ with respect to the Kolmogorov distance if there exists a continuous and linear map $\dot T(S;\cdot)$ satisfying $|T(S_x)-T(S)-\dot T(S;S_x-S)|/|x|=o(1)$ for all $S_x$ satisfying $\|x^{-1}(S_x-S)-\Delta\|_\infty\to 0$ for some function $\Delta$.

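For any $S_{lx}\to S_l$, where $l=1,\dots,L$, define $\alpha_{lx}\equiv(S_{lx}-S_l)/x$. For Hadamard differentiability we have $\alpha_{lx}\to\alpha_l$ with respect to $\|\cdot\|_\infty$ for some (bounded) function $\alpha_l$. Denote $\alpha=(\alpha_1,\dots,\alpha_L)$. Define

$$\dot T(S;\alpha)\equiv\frac{2\sum_{r=1}^{L-1}\sum_{l=r+1}^{L}(l-r)\left(\int\alpha_r\,dS_l-\int\alpha_l\,dS_r\right)}{(L^3-L)/6}.$$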

We have

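$$\begin{aligned}
\frac{T(S_{1x},\dots,S_{Lx})-T(S_1,\dots,S_L)}{x}-\dot T(S;\alpha_{1x},\dots,\alpha_{Lx})
&=-\frac{2\sum_{r=1}^{L-1}\sum_{l=r+1}^{L}(l-r)\,x\int\alpha_{lx}\,d\alpha_{rx}}{(L^3-L)/6}\\
&=-\frac{2\sum_{r=1}^{L-1}\sum_{l=r+1}^{L}(l-r)\left[\int\alpha_l\,d(S_{rx}-S_r)+\int(\alpha_{lx}-\alpha_l)\,d(S_{rx}-S_r)\right]}{(L^3-L)/6}.
\end{aligned}$$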

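Since $\dot T$ is continuous, it suffices to show that the right side converges to 0. For any $l,r\in\{1,\dots,L\}$ with $l>r$,

$$\left|\int(\alpha_{lx}-\alpha_l)\,d(S_{rx}-S_r)\right|\le\|\alpha_{lx}-\alpha_l\|_\infty\left\{\int|dS_{rx}|+\int|dS_r|\right\}\le 2\,\|\alpha_{lx}-\alpha_l\|_\infty\to 0.$$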

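Fix $\epsilon>0$. Since the limit function $\alpha_l$ is right continuous with left limits, there is a step function with a finite number $m$ of jumps, say $\tilde\alpha_l$, which satisfies $\|\alpha_l-\tilde\alpha_l\|_\infty<\epsilon$. Thus,

$$\begin{aligned}
\left|\int\alpha_l\,d(S_{rx}-S_r)\right|
&\le\left|\int(\alpha_l-\tilde\alpha_l)\,d(S_{rx}-S_r)\right|+\left|\int\tilde\alpha_l\,d(S_{rx}-S_r)\right|\\
&\le 2\,\|\alpha_l-\tilde\alpha_l\|_\infty+\sum_{j=1}^{m}\left|\tilde\alpha_l(x_{j-1})\,(S_{rx}-S_r)[x_{j-1},x_j)\right|\\
&\le 2\epsilon+2m\,\|\tilde\alpha_l\|_\infty\,\|S_{rx}-S_r\|_\infty\to 2\epsilon,
\end{aligned}$$

where $(S_{rx}-S_r)[x_{j-1},x_j)$ denotes the increment of $S_{rx}-S_r$ over the interval $[x_{j-1},x_j)$.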

Since ϵ is arbitrary, this completes the proof that T is Hadamard differentiable.
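As a side remark on the normalizing constant $(L^3-L)/6$ that appears in $T$ and $\dot T$, the following short derivation may help. It assumes the reading of $T$ used in the proof of Theorem 1, with $dS_r$ the (negative) increment of a survival function, so that $-\int S_l\,dS_r=P(X_l>X_r)$ for independent $X_l\sim S_l$ and $X_r\sim S_r$; the symbols $X_l$, $X_r$ and $w$ are introduced here only for illustration, and the sign convention should be checked against the definition of the functional given earlier in the paper.

$$\sum_{r=1}^{L-1}\sum_{l=r+1}^{L}(l-r)=\sum_{d=1}^{L-1}d\,(L-d)=L\cdot\frac{(L-1)L}{2}-\frac{(L-1)L(2L-1)}{6}=\frac{(L-1)L(L+1)}{6}=\frac{L^3-L}{6}.$$

Hence

$$w\equiv\frac{\sum_{r=1}^{L-1}\sum_{l=r+1}^{L}(l-r)\,P(X_l>X_r)}{(L^3-L)/6}\in[0,1],\qquad T(S_1,\dots,S_L)=2w-1\in[-1,1],$$

so the constant simply turns the weights $(l-r)$ into a weighted average, and $T$ equals 1 when every higher category has stochastically separated, larger values and $-1$ in the fully reversed case.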

6.2 |. Proof of Theorem 1

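Proof. Before proving Theorem 1, we first show

$$\sqrt{n}\,\left|\hat\rho_{bsa}-T(\hat S_1,\dots,\hat S_L)\right|\to_p 0. \tag{5}$$

Consider the difference between $\sqrt{n}\,\hat\rho_{bsa}$ and $\sqrt{n}\,T(\hat S_1,\dots,\hat S_L)$,

$$\begin{aligned}
\sqrt{n}\,\left|\hat\rho_{bsa}-T(\hat S_1,\dots,\hat S_L)\right|
&=\sqrt{n}\,\left|-\frac{2\sum_{r=1}^{L-1}\sum_{l=r+1}^{L}(l-r)\int_0^{\hat\tau_l\wedge\hat\tau_r}\hat S_l\,d\hat S_r}{(L^3-L)/6}-1+\frac{2\sum_{r=1}^{L-1}\sum_{l=r+1}^{L}(l-r)\int_0^{\infty}\hat S_l\,d\hat S_r}{(L^3-L)/6}+1\right|\\
&=\frac{2\sqrt{n}}{(L^3-L)/6}\left|\sum_{r=1}^{L-1}\sum_{l=r+1}^{L}(l-r)\int_0^{\hat\tau_l\wedge\hat\tau_r}\hat S_l\,d\hat S_r-\sum_{r=1}^{L-1}\sum_{l=r+1}^{L}(l-r)\int_0^{\infty}\hat S_l\,d\hat S_r\right|\\
&\le\frac{2\sqrt{n}}{(L^3-L)/6}\sum_{r=1}^{L-1}\sum_{l=r+1}^{L}(l-r)\left|\int_{\hat\tau_l\wedge\hat\tau_r}^{\infty}\hat S_l\,d\hat S_r\right|.
\end{aligned}\tag{6}$$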

Given the assumptions that $S_l(\hat\tau_l)\to_p S_l(\tau_l)=0$ and $S_r(\hat\tau_r)\to_p S_r(\tau_r)=0$, and the strong uniform consistency of the KM estimators $\hat S_l$ and $\hat S_r$, the right-hand side of inequality (6) converges to zero in probability.
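The plug-in quantity compared with $T(\hat S_1,\dots,\hat S_L)$ in (6) can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`kaplan_meier`, `surv_at`, `plug_in_bsa`) are hypothetical, $\hat\tau_l$ is taken to be the largest observed time in group $l$ (an assumption; the paper defines it earlier), ties between groups are ignored, and $d\hat S_r$ is read as the negative jump of the Kaplan-Meier curve so that the index lies in $[-1,1]$.

```python
import numpy as np

def kaplan_meier(time, event):
    """Kaplan-Meier estimate: returns (jump_times, surv), surv[k] = S(jump_times[k])."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    jump_times = np.unique(time[event])          # sorted distinct event times
    surv = np.empty(len(jump_times))
    s = 1.0
    for k, t in enumerate(jump_times):
        at_risk = np.sum(time >= t)
        deaths = np.sum((time == t) & event)
        s *= 1.0 - deaths / at_risk
        surv[k] = s
    return jump_times, surv

def surv_at(jump_times, surv, t):
    """Evaluate the right-continuous KM step function at the points t."""
    padded = np.concatenate(([1.0], surv))       # S = 1 before the first jump
    return padded[np.searchsorted(jump_times, t, side="right")]

def plug_in_bsa(groups):
    """Plug-in BSA sketch; groups is a list of (time, event) for levels 1..L."""
    L = len(groups)
    km = [kaplan_meier(t, e) for t, e in groups]
    tau = [np.max(t) for t, _ in groups]         # assumed definition of tau-hat
    total = 0.0
    for r in range(L - 1):
        jt_r, s_r = km[r]
        d_s_r = np.diff(np.concatenate(([1.0], s_r)))   # jumps of S_r, all <= 0
        for l in range(r + 1, L):
            keep = jt_r <= min(tau[l], tau[r])   # integrate up to tau_l ^ tau_r
            s_l = surv_at(km[l][0], km[l][1], jt_r[keep])
            total += (l - r) * np.sum(s_l * d_s_r[keep])
    return -2.0 * total / ((L ** 3 - L) / 6.0) - 1.0

# Usage: two fully observed, perfectly separated groups give an index of 1.
low = ([1, 2, 3], [1, 1, 1])
high = ([4, 5, 6], [1, 1, 1])
print(plug_in_bsa([low, high]))
```

With complete data each integral reduces to minus the proportion of between-group pairs in which the higher-category value is larger, which is why perfectly separated groups return 1 and the reversed ordering returns −1. The double loop only touches the jump times of each Kaplan-Meier curve, which is in line with the low computing cost reported for the plug-in method.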

Next, we prove (i) and (ii) of the theorem. The result $T(\hat S_1,\dots,\hat S_L)\to_p\rho_{bsa}=T(S_1,\dots,S_L)$ follows from the continuity of the Hadamard differentiable functional $T$ and the uniform strong consistency of the Kaplan-Meier estimators $\hat S_1,\dots,\hat S_L$. Then $\hat\rho_{bsa}\to_p\rho_{bsa}$ via Equation (5). Thus, statement (i) is true.

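According to Equation (5), to prove (ii) it suffices to show that

$$\sqrt{n}\,\left\{T(\hat S_1,\dots,\hat S_L)-T(S_1,\dots,S_L)\right\}\to_d T'_{\{S_1,\dots,S_L\}}(W),\quad\text{as } n\to\infty, \tag{7}$$

where $T'_{\{S_1,\dots,S_L\}}$ denotes the Hadamard derivative of $T$ at $(S_1,\dots,S_L)$.

It has been shown that $\sqrt{n}\,\{\hat S-S\}$ converges weakly to a tight, zero-mean Gaussian process $W$. The functional $T$ has been proven to be Hadamard differentiable. Then, statement (7) is true according to the functional delta method (van der Vaart and Wellner, 1996). Because $W$ is a tight Gaussian process and the Hadamard derivative is continuous and linear, $T'_{\{S_1,\dots,S_L\}}(W)$ is normally distributed. Thus, statement (ii) of the theorem is proven.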

References

  • [1] Peng L, Li R, Guo Y, Manatunga A. A Framework for Assessing Broad Sense Agreement Between Ordinal and Continuous Measurements. Journal of the American Statistical Association 2011;106(496):1592–1601.
  • [2] Cohen J. A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement 1960;20(1):37–46.
  • [3] Fleiss JL. Measuring Nominal Scale Agreement Among Many Raters. Psychological Bulletin 1971;76(5):378–382.
  • [4] Kraemer HC. Extension of the Kappa Coefficient. Biometrics 1980;36(2):207–216.
  • [5] Williamson JM, Lipsitz S, Manatunga AK. Modeling Kappa for Measuring Dependent Categorical Agreement Data. Biostatistics 2000;1(2):191–202.
  • [6] Barnhart HX, Williamson JM. Weighted Least-Squares Approach for Comparing Correlated Kappa. Biometrics 2002;58(4):1012–1019.
  • [7] Guo Y, Manatunga AK. Modeling the Agreement of Discrete Bivariate Survival Times using Kappa Coefficient. Lifetime Data Analysis 2005;11(3):309–332.
  • [8] Guo Y, Manatunga AK. A note on assessing agreement for frailty models. Statistics and Probability Letters 2010;80(7–8):527–533.
  • [9] Lin LI. A concordance correlation coefficient to evaluate reproducibility. Biometrics 1989;45(1):255–268.
  • [10] Lin LIK. Assay Validation Using the Concordance Correlation Coefficient. Biometrics 1992;48(2):599–604.
  • [11] Lin L, Hedayat A, Sinha B, Yang M. Statistical Methods in Assessing Agreement Models, Issues and Tools. Journal of the American Statistical Association 2002;97(457):257–270.
  • [12] Lin L, Hedayat A, Wenting W. A Unified Approach for Assessing Agreement for Continuous and Categorical Data. Journal of Biopharmaceutical Statistics 2007;17(4):629–652.
  • [13] Quiroz J. Assessment of Equivalence Using a Concordance Correlation Coefficient in a Repeated Measurement Design. Journal of Biopharmaceutical Statistics 2005;15(6):913–928.
  • [14] King TS, Chinchilli VM, Carrasco JL, Wang K. A Class of Repeated Measures Concordance Correlation Coefficients. Journal of Biopharmaceutical Statistics 2007;17(4):653–672.
  • [15] Janson H, Olsson U. A Measure of Agreement for Interval or Nominal Multivariate Observations. Educational and Psychological Measurement 2001;61(2):277–289.
  • [16] Musselman DL, Lawson DH, Gumnick JF, Manatunga AK, Penna S, Goodkin RS, et al. Paroxetine for the Prevention of Depression Induced by High-Dose Interferon Alfa. The New England Journal of Medicine 2001;344(13):961–966.
  • [17] Bouch DC, Thompson JP. Severity scoring systems in the critically ill. Continuing Education in Anaesthesia Critical Care 2008;8(5):181–185.
  • [18] Saleh A, Ahmed M, Sultan I, Abdel-Lateif A. Comparison of the mortality prediction of different ICU scoring systems (APACHE II and III, SAPS II, and SOFA) in a single-center ICU subpopulation with acute respiratory distress syndrome. Egyptian Journal of Chest Diseases and Tuberculosis 2015;64(4):843–848.
  • [19] Wei B, Dai T, Peng L, Guo Y, Manatunga AK. An Alternative Representation of Broad Sense Agreement for Complete Data. Technical Report 2018;18–01.
  • [20] Taylor JM, Hsu CH, Murray S. Survival estimation and testing via multiple imputation. Statistics & Probability Letters 2002;58(3):221–232.
  • [21] Hsu CH, Taylor JM, Murray S, Commenges D. Survival analysis using auxiliary variables via non-parametric multiple imputation. Statistics in Medicine 2006;25(20):3503–3517.
  • [22] Rubin DB, Schenker N. Multiple imputations in health-care database: an overview and some applications. Statistics in Medicine 1991;10(4):585–598.
  • [23] Knaus WA, Draper EA, Wagner DP, Zimmerman JE. APACHE II: a severity of disease classification system. Critical Care Medicine 1985;13(10):818–829.
  • [24] Wisplinghoff H, Seifert H, Wenzel RP, Edmond MB. Inflammatory response and clinical course of adult patients with nosocomial bloodstream infections caused by Candida spp. Clinical Microbiology and Infection 2006;12(2):170–177.
  • [25] Naved SA, Siddiqui S, Khan FH. APACHE-II Score Correlation With Mortality And Length Of Stay In An Intensive Care Unit. Journal of the College of Physicians and Surgeons–Pakistan: JCPSP 2011;21(1):4–8.
  • [26] Greenwood B, Szumita PM, Levy H, Lilly CM. Error rates among clinical pharmacists in calculating the APACHE II score. Pharmacotherapy 2007;27(2):285–289.
