Author manuscript; available in PMC: 2020 Nov 20.
Published in final edited form as: Stat Theory Relat Fields. 2019 Mar 19;2019:10.1080/24754269.2019.1587692. doi: 10.1080/24754269.2019.1587692

Interval Estimation for Minimal Clinically Important Difference and its Classification Error via a Bootstrap Scheme

Zehua Zhou 1, Jiwei Zhao 1,*, Melissa Kluczynski 2
PMCID: PMC7678023  NIHMSID: NIHMS1536634  PMID: 33225201

Abstract

With improved knowledge of clinical relevance and more convenient access to patient-reported outcome data, clinical researchers increasingly prefer the minimal clinically important difference (MCID) to statistical significance as a standard for assessing the effectiveness of an intervention or treatment in clinical trials. A practical method for determining the MCID is based on the diagnostic measurement; under this approach, the MCID can be formulated as the solution of a large-margin classification problem. However, this method produces only a point estimate and thus offers no way to evaluate its performance. In this paper we introduce an m-out-of-n bootstrap approach that provides interval estimates for the MCID and for its classification error, an associated accuracy measure for performance assessment. Extensive simulation studies illustrate the advantages of the proposed method. The chondral lesions and meniscus procedures (ChAMP) trial is our motivating example and is used to illustrate the method.

Keywords: Minimal clinically important difference, Classification error, Confidence interval, Non-convex optimization, Bootstrap, m-out-of-n bootstrap

1. Introduction

Statistical significance is widely reported in clinical studies to infer treatment effects. For instance, in a randomized controlled trial comparing debridement with observation of chondral lesions encountered during partial meniscectomy (Bisson et al., 2017), the difference between patient outcomes before surgery and one year after surgery is used to assess whether a statistically significant effect exists.

Although this framework based on a p-value threshold objectifies the research outcome, relying on it alone can have two potentially serious consequences. First, statistical significance only signifies the existence of a treatment effect, regardless of the effect size; significance may simply result from a huge sample size and hence be clinically irrelevant to patients. Second, a clinically important effect could be classified as statistically non-significant for various reasons, such as a small sample size, and hence be unfairly ignored. In brief, statistical significance does not necessarily imply clinical importance, and vice versa.

Over the years, clinical investigators have come to realize that determining a treatment's clinical importance is more valuable and reliable than merely seeking its statistical significance. In addition, the development of various patient-rated instruments has produced large amounts of patient-reported outcome (PRO) data, giving researchers the opportunity to study clinical relevance. To study clinical importance, Jaeschke et al. (1989) proposed the concept of the minimal clinically important difference (MCID). It is defined as the smallest change in an outcome that an individual patient would identify as important, and therefore offers a threshold above which the outcome is experienced as relevant by patients. This avoids the problem of mere statistical significance (Wright et al., 2012). The MCID provides an objective reference for clinicians and health policy makers regarding the effectiveness of a treatment, and hence has quickly gained popularity (McGlothlin and Lewis, 2014; Erdogan et al., 2016).

A variety of methods have been proposed to calculate the MCID. The anchor-based method compares changes in scores against an anchor as the reference; a popular choice is an anchor question in a questionnaire. For instance, the short form 36 (SF-36) health survey (Ware Jr and Sherbourne, 1992) serves this role in the ChAMP trial (Bisson et al., 2017; Kluczynski et al., 2017; Bisson et al., 2018, 2019). Hedayat et al. (2015) adopted this anchor-based method and formulated the MCID as the threshold value of the post-treatment change that minimizes the probability of disagreement between the satisfaction status predicted from the MCID and the PRO.

Although the proposal in Hedayat et al. (2015) is statistically rigorous and paves the way for potential extensions, it has some limitations. First, it relies on a testing data set with a very large sample size for implementation and assessment; the sample size of a typical clinical study is much smaller, so such a testing data set is rarely available in real applications. Second, Hedayat et al. (2015) provides only a point estimate of the MCID, which is not informative enough in most clinical studies (Cook, 2008; Erdogan et al., 2016). Without an interval estimate, it is unknown how accurate the point estimate is. Furthermore, without an interval estimate there is no way to compare MCIDs derived for different population subgroups, so the population heterogeneity cannot be learned.

In this paper, we aim to solve the problems mentioned above and fill this gap in the literature. We first introduce the concept of the classification error to gauge the effectiveness of the MCID. This concept also allows us to compare MCIDs derived for different population subgroups or computed by different methods. Second, using the m-out-of-n bootstrap technique, we obtain interval estimates of the MCID and of its classification error. The interval estimates make it possible to conduct statistical inference on the MCID and to learn the population heterogeneity it reflects.

Our proposal has two distinct features. First, unlike Hedayat et al. (2015), our framework does not rely on a testing data set with a large sample size, and hence can be conveniently used in various clinical studies. Second, although the bootstrap has been a well-established statistical technique since Efron (1979), its conventional version cannot be applied directly in our context because the conditions required for its validity are not met. Instead, we adopt the m-out-of-n bootstrap, whose theoretical properties are justified in Shao (1994), Shao (1996), and Bickel et al. (1997), among others.

In the remainder of this paper, we first introduce our motivating example, the ChAMP trial, in Section 2. In Section 3 we review the concept of the MCID and introduce the classification error. Our methodology, including both the simple linear and the nonparametric kernel MCID and the bootstrap scheme for computing confidence intervals, is presented in Section 4. We show the finite-sample performance of the proposed method through simulation studies in Section 5 and apply the method to the ChAMP trial in Section 6. Mathematical details are collected in the Appendix.

2. Motivating Example: ChAMP Trial

Our motivating example is the chondral lesions and meniscus procedures (ChAMP) trial, which examines whether the presence of chondral lesions surrounding the knee cartilage affects patients' recovery from arthroscopic partial meniscectomy (APM) (Bisson et al., 2017). In orthopaedics, APM is one of the most common treatment options for repairing knee damage, especially in patients with a meniscus tear. During the operation, however, surgeons often find additional knee damage in the form of chondral lesions. The effect of these chondral lesions on patients' post-operative outcomes is unclear, and whether these lesions need to be treated by debridement remains an open question. The ChAMP trial was therefore designed to help clinicians better understand this relation and to provide reasonable guidance on preoperative evaluation and treatment choice.

This study enrolled eligible patients who were ≥ 30 years old, were diagnosed with a symptomatically consistent meniscus tear by magnetic resonance imaging, and underwent APM. Of the enrolled subjects, 190 patients with surgically significant chondral lesions were randomized to receive debridement (CL-Deb group; n = 98) or observation (CL-noDeb group; n = 92). Outcome measures included the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) and the SF-36 health survey. Each outcome was evaluated at baseline and 1 year postoperatively. Demographic data such as age and sex at baseline and surgical data, including the location and type of meniscal tears, were also collected.

The major goal of the study is to assess whether and how the debridement group differs from the observation group in terms of the change in the WOMAC pain score from enrollment to one year after surgery, and how this change relates to other covariates and clinical biomarkers. In our investigation, we focus on the interval estimation of the MCID and its classification error. We use an anchor-based method to compute the MCID, with the anchor question taken from the SF-36 health survey.

3. The MCID and the Classification Error

In the ChAMP trial, we denote each patient's reported outcome in the SF-36 health survey as a binary variable Y, where Y = 1 if the patient reports a better health condition after surgery and Y = −1 otherwise. The difference in each patient's WOMAC pain score from baseline to one year after surgery, denoted X, is treated as the patient's diagnostic measurement. Let the p-dimensional covariate $Z \in \mathbb{R}^p$ be the patient's clinical profile.

It is reasonable and of interest to consider the MCID $c^*$ as a function of the patient's clinical profile, $c^*(z)$; the population heterogeneity can then be learned from $c^*(z)$. Following Hedayat et al. (2015), $c^*(z)$ is defined as the minimizer of

$$P\big[Y \neq \mathrm{sign}\{X - c(Z)\}\big] = \frac{1}{2} E\big[1 - Y\,\mathrm{sign}\{X - c(Z)\}\big], \qquad (1)$$

where E is the expectation taken with respect to (X, Y, Z) and sign(∙) is the standard sign function. Given independent and identically distributed observations {(xi, yi, zi), i = 1, ... , n}, the empirical version of the objective function in (1) becomes

$$\frac{1}{2n}\sum_{i=1}^{n}\big[1 - y_i\,\mathrm{sign}\{x_i - c(z_i)\}\big], \qquad (2)$$

which involves the 0-1 loss function $L_{01}(u) = \frac{1}{2}\{1 - \mathrm{sign}(u)\}$. Direct minimization of (2) is infeasible. In this paper, we follow Hedayat et al. (2015) and approximate $L_{01}$ by the non-smooth ramp loss function, defined as

$$L_\delta(u) = \begin{cases} 1, & u \le 0,\\ 1 - u/\delta, & 0 < u \le \delta,\\ 0, & u > \delta, \end{cases}$$

where δ > 0 is a scale factor. As δ → 0, $L_\delta(\cdot) \to L_{01}(\cdot)$. Our objective function then becomes

$$\frac{1}{n}\sum_{i=1}^{n} L_\delta\{y_i(x_i - c(z_i))\}. \qquad (3)$$

Because the non-smooth ramp loss is non-convex, the optimization problem in (3) requires non-convex minimization. Note that we can write $L_\delta(u) = L_1(u) - L_2(u)$, where both $L_1(u) = \frac{1}{\delta}(\delta - u)_+$ and $L_2(u) = \frac{1}{\delta}(-u)_+$ are convex functions. Hence we apply the difference of convex (DC) algorithm (Thi Hoai An and Dinh Tao, 1997) to minimize (3), which takes the form

$$\frac{1}{n}\sum_{i=1}^{n} L_1\{y_i(x_i - c(z_i))\} - \frac{1}{n}\sum_{i=1}^{n} L_2\{y_i(x_i - c(z_i))\}. \qquad (4)$$

Figure 1 below illustrates the relation among the L01, Lδ, L1 and L2 loss functions.

Figure 1: The illustration of the 0-1 loss $L_{01}$, its surrogate non-smooth ramp loss $L_\delta$, and the DC decomposition $L_\delta = L_1 - L_2$.
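For readers who prefer code to formulas, the short sketch below (our own illustration in Python, not taken from the paper) evaluates the non-smooth ramp loss and its DC decomposition $L_\delta = L_1 - L_2$; the printed values show how $L_\delta$ behaves like the 0-1 loss once δ is small.

```python
import numpy as np

def L1(u, delta):
    # convex piece L1(u) = (delta - u)_+ / delta
    return np.maximum(delta - u, 0.0) / delta

def L2(u, delta):
    # convex piece L2(u) = (-u)_+ / delta
    return np.maximum(-u, 0.0) / delta

def ramp_loss(u, delta):
    # L_delta(u) = 1 if u <= 0, 1 - u/delta if 0 < u <= delta, 0 if u > delta
    return L1(u, delta) - L2(u, delta)

u = np.array([-1.0, -0.001, 0.005, 0.02, 1.0])
print(ramp_loss(u, delta=0.01))  # [1.  1.  0.5 0.  0. ]
```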

Once the minimizer $\hat{c}$ is obtained, we can validate whether debridement or observation of the chondral lesions is prescribed correctly for each patient. This is essential because it provides guidance for future surgical practice with new patients. To re-evaluate whether the treatment is offered appropriately, we need a statistical measure that quantifies the discrepancy between a patient's PRO $Y_0$ and the dichotomization $\mathrm{sign}\{X_0 - \hat{c}(Z_0)\}$ implied by the learned MCID. To avoid confusion, we use the generic notation $(Y_0, X_0, Z_0)$ for any patient to be validated. This results in

$$E_0\big[1\{Y_0 \neq \mathrm{sign}(X_0 - \hat{c}(Z_0))\}\big], \qquad (5)$$

where $E_0$ is the expectation taken with respect to $(Y_0, X_0, Z_0)$. While other measures could be studied, the quantity in (5), usually called the classification error or the test error, is widely used in the statistical machine learning literature.

Estimating the classification error is not a trivial task (Laber and Murphy, 2011). In this paper, we concentrate on constructing confidence intervals for both the MCID and the classification error.
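As a concrete illustration of (5), the empirical classification error over a validation set is simply the disagreement rate between the PRO and the MCID-based dichotomization. The sketch below is our own; the function name and the tie-breaking convention at $X_0 = \hat{c}(Z_0)$ are assumptions, since the paper does not specify how ties are handled.

```python
import numpy as np

def classification_error(y, x, c_hat_values):
    """Empirical version of (5): y holds the PROs (+1/-1), x the diagnostic
    measurements, and c_hat_values the fitted MCID evaluated at each z."""
    pred = np.where(x - c_hat_values > 0, 1, -1)  # ties (x == c_hat) sent to -1 here
    return np.mean(y != pred)
```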

4. Methodology

In this section, we first present detailed algorithms for calculating the MCID. We consider both the simple linear MCID and its nonparametric kernel counterpart. We then introduce our bootstrap scheme for constructing confidence intervals for the MCID and the classification error.

4.1. Algorithms for MCID

It is of particular interest to clinicians if the MCID has a comprehensible structure, for instance $c(Z) = \alpha + \beta^T Z$, where Z could include the treatment variable, demographic variables and clinical biomarkers. This is the simple linear MCID we consider below. Although easily interpretable, a linear structure suffers from possible model misspecification and may therefore yield a suboptimal solution. We therefore also consider a nonparametric kernel MCID based on the reproducing kernel Hilbert space framework.

4.1.1. A Simple Linear MCID

We assume $c(Z) = \alpha + \beta^T Z$. We add a penalty term $\frac{\lambda}{2}\beta^T\beta$ to (4) to avoid overfitting. Let $\omega = (\alpha, \beta^T)^T$; then the objective function in (4) is $s(\omega) = s_1(\omega) - s_2(\omega)$, where

$$s_1(\omega) = \frac{1}{n}\sum_{i=1}^{n}\Big[\frac{1}{\delta}\{\delta - y_i(x_i - \alpha - \beta^T z_i)\}_+\Big] + \frac{\lambda}{2}\beta^T\beta, \qquad s_2(\omega) = \frac{1}{n}\sum_{i=1}^{n}\Big[\frac{1}{\delta}\{-y_i(x_i - \alpha - \beta^T z_i)\}_+\Big].$$

The minimization of (4) proceeds iteratively. Let $\hat\omega^{(k)}$ be the estimator of ω at the kth iteration. We first approximate $s_2(\omega)$ by its affine minorization $s_2(\hat\omega^{(k)}) + \langle \omega - \hat\omega^{(k)}, \partial s_2(\hat\omega^{(k)})\rangle$, where $\partial s_2(\hat\omega^{(k)})$ is the subgradient of $s_2(\omega)$ at $\hat\omega^{(k)}$,

$$\partial s_2(\hat\omega^{(k)}) = \begin{pmatrix} \dfrac{1}{n\delta}\sum_{i=1}^{n} y_i\, 1\{y_i(x_i - \alpha^{(k)} - \beta^{(k)T} z_i) < 0\} \\[2ex] \dfrac{1}{n\delta}\sum_{i=1}^{n} y_i z_i\, 1\{y_i(x_i - \alpha^{(k)} - \beta^{(k)T} z_i) < 0\} \end{pmatrix}.$$

Hence

$$\hat\omega^{(k+1)} = \arg\min_{\omega}\; s_1(\omega) - \omega^T \partial s_2(\hat\omega^{(k)}) = \arg\min_{\omega}\; s_1(\omega) - \frac{1}{n\delta}\sum_{i=1}^{n} y_i(\alpha + \beta^T z_i)\, 1\{y_i(x_i - \alpha^{(k)} - \beta^{(k)T} z_i) < 0\}. \qquad (6)$$

To solve (6), we derive its dual problem using the slack variable technique; the details are given in Appendix A. We arrive at the dual problem

$$\min_{\tau}\; \tau^T Q \tau - \{b + 2 Q\, t_1(\alpha^{(k)}, \beta^{(k)})\}^T \tau, \qquad (7)$$

subject to $0 \le \tau_i \le 1$ and $\sum_{i=1}^{n} \frac{1}{\delta} y_i\{\tau_i - t_{1,i}(\alpha^{(k)}, \beta^{(k)})\} = 0$. This optimization problem has only simple box constraints and one linear equality constraint, and hence can be solved by any standard quadratic programming method.
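As a sketch of how (7) might be solved in practice, the snippet below (our illustration; it assumes Q, b, $t_1(\alpha^{(k)},\beta^{(k)})$, y and δ have already been assembled as defined above and in Appendix A) hands the box and equality constraints to a general-purpose solver. A dedicated QP routine would exploit the quadratic structure more efficiently; SLSQP is used here only because it accepts the constraints without reformulation.

```python
import numpy as np
from scipy.optimize import minimize

def solve_dual_qp(Q, b, t1, y, delta):
    """Sketch: minimize tau'Q tau - (b + 2*Q*t1)'tau subject to
    0 <= tau_i <= 1 and (1/delta) * sum_i y_i * (tau_i - t1_i) = 0."""
    n = len(y)
    lin = b + 2.0 * Q @ t1                         # linear coefficient vector

    def objective(tau):
        return tau @ Q @ tau - lin @ tau

    def gradient(tau):
        return (Q + Q.T) @ tau - lin

    eq_constraint = {"type": "eq",
                     "fun": lambda tau: np.sum(y * (tau - t1)) / delta}
    res = minimize(objective, x0=np.full(n, 0.5), jac=gradient,
                   bounds=[(0.0, 1.0)] * n, constraints=[eq_constraint],
                   method="SLSQP")
    return res.x  # the estimated dual variables tau_hat
```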

We conclude the algorithm by presenting how $\hat\alpha$ and $\hat\beta$ are computed. The Karush-Kuhn-Tucker (KKT) conditions associated with the optimization problem (7) are

$$\frac{1}{n\lambda\delta}\sum_{i=1}^{n} y_i\{t_{1,i}(\alpha^{(k)}, \beta^{(k)}) - \tau_i\} z_i = \beta, \qquad (8)$$
$$1 - \tau_i \ge 0, \qquad (9)$$
$$\tau_i\Big\{\xi_i - 1 + \frac{1}{\delta} y_i(x_i - \alpha - \beta^T z_i)\Big\} = 0, \qquad (10)$$
$$\xi_i \ge 0, \qquad (11)$$
$$\xi_i - 1 + \frac{1}{\delta} y_i(x_i - \alpha - \beta^T z_i) \ge 0, \qquad (12)$$
$$(1 - \tau_i)\xi_i = 0. \qquad (13)$$

Therefore we have three scenarios to discuss depending on the magnitude of $\tau_i$: if $\tau_i = 0$, by (12) and (13) we get $\xi_i = 0$ and $\frac{1}{\delta} y_i(x_i - \alpha - \beta^T z_i) - 1 \ge 0$; if $1 > \tau_i > 0$, by (10) and (13) we have $\xi_i = 0$ and $\frac{1}{\delta} y_i(x_i - \alpha - \beta^T z_i) - 1 = 0$; if $\tau_i = 1$, by (10), (12) and (13) we have $\frac{1}{\delta} y_i(x_i - \alpha - \beta^T z_i) - 1 = -\xi_i \le 0$. Therefore we can summarize the KKT conditions more concisely as

$$\frac{1}{\delta} y_i(x_i - \alpha - \beta^T z_i) - 1 = \xi_i \ge 0 \quad \text{if } \tau_i < 1, \qquad \frac{1}{\delta} y_i(x_i - \alpha - \beta^T z_i) - 1 = -\xi_i \le 0 \quad \text{if } \tau_i > 0,$$

which implies

$$\frac{1}{\delta} y_i(x_i - \alpha - \beta^T z_i) - 1 = 0 \quad \text{if } 0 < \tau_i < 1. \qquad (14)$$

Hence through (8) we can estimate β(k+1) as

$$\hat\beta^{(k+1)} = \frac{1}{n\lambda\delta}\sum_{i=1}^{n} y_i\{t_{1,i}(\alpha^{(k)}, \beta^{(k)}) - \hat\tau_i\} z_i,$$

and through (14) we can estimate α(k+1) as

$$\hat\alpha^{(k+1)} = \frac{1}{|\{i: 0 < \tau_i < 1\}|}\sum_{i:\, 0 < \tau_i < 1}\big(x_i - \delta y_i - \hat\beta^{(k+1)T} z_i\big).$$

Finally, we obtain the estimators $\hat\alpha$ and $\hat\beta$ once the iterative process (6) converges. For any new patient with profile $z_{new}$, the predicted linear MCID is

$$\hat{c}(z_{new}) = \hat\alpha + \hat\beta^T z_{new}.$$
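The closed-form updates above translate directly into code. The sketch below (our notation; the tolerance guarding the strict inequalities $0 < \tau_i < 1$ is an implementation choice of ours) recovers $\hat\beta^{(k+1)}$ from (8) and $\hat\alpha^{(k+1)}$ from (14) given the dual solution $\hat\tau$:

```python
import numpy as np

def update_beta_alpha(tau, t1, x, y, Z, lam, delta, tol=1e-8):
    """Primal recovery for the linear MCID; Z is the n x p profile matrix."""
    n = len(y)
    # (8): beta^(k+1) = (1/(n*lam*delta)) * sum_i y_i * (t1_i - tau_i) * z_i
    beta = (y * (t1 - tau)) @ Z / (n * lam * delta)
    # (14): for 0 < tau_i < 1, y_i*(x_i - alpha - beta'z_i) = delta,
    # so alpha = x_i - delta*y_i - beta'z_i; average over those i
    on_margin = (tau > tol) & (tau < 1.0 - tol)
    # (if no tau_i lies strictly between 0 and 1, a fallback would be needed)
    alpha = np.mean(x[on_margin] - delta * y[on_margin] - Z[on_margin] @ beta)
    return alpha, beta
```

The predicted MCID for a new profile is then simply `alpha + beta @ z_new`, matching the display above.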

4.1.2. A Nonparametric Kernel MCID

Define a feature vector $\varphi_i = \varphi(z_i)$ for the profile of the ith patient in the enlarged feature space. We specify a continuous, symmetric and positive semi-definite kernel function K corresponding to the inner product in the mapping φ, that is, $K(z_i, z_j) = \langle \varphi_i, \varphi_j \rangle$. Then we write $c(z) = w + h(z)$ with $w \in \mathbb{R}$ and $h(z) \in \mathcal{H}_K$, where $\mathcal{H}_K$ is the reproducing kernel Hilbert space (RKHS) with kernel function $K(\cdot,\cdot)$. The norm in $\mathcal{H}_K$, denoted by $\|\cdot\|_K$, is induced by the following inner product:

$$\langle f, g \rangle_K = \sum_{i=1}^{n}\sum_{j=1}^{m} v_i u_j K(z_i, z_j),$$

where $f(\cdot) = \sum_{i=1}^{n} v_i K(\cdot, z_i)$ and $g(\cdot) = \sum_{j=1}^{m} u_j K(\cdot, z_j)$.

By the representer theorem (Kimeldorf and Wahba, 1971), the nonparametric kernel MCID can be expressed as $c(z) = w + \sum_{j=1}^{n} v_j K(z, z_j)$. Let $\eta = (w, v^T)^T = (w, v_1, \ldots, v_n)^T$; then the objective function in (4) is $h(\eta) = h_1(\eta) - h_2(\eta)$, where

$$h_1(\eta) = \frac{1}{n}\sum_{i=1}^{n}\Big[\frac{1}{\delta}\Big\{\delta - y_i\Big(x_i - w - \sum_{j=1}^{n} v_j K(z_i, z_j)\Big)\Big\}_+\Big] + \frac{\lambda}{2}\sum_{i,j=1}^{n} v_i v_j K(z_i, z_j), \qquad h_2(\eta) = \frac{1}{n}\sum_{i=1}^{n}\Big[\frac{1}{\delta}\Big\{-y_i\Big(x_i - w - \sum_{j=1}^{n} v_j K(z_i, z_j)\Big)\Big\}_+\Big].$$

Similar to the linear case, the minimization of (4) proceeds iteratively. Let $\hat\eta^{(k)}$ be the estimator of η at the kth iteration. We first approximate $h_2(\eta)$ by its affine minorization $h_2(\hat\eta^{(k)}) + \langle \eta - \hat\eta^{(k)}, \partial h_2(\hat\eta^{(k)})\rangle$, where $\partial h_2(\hat\eta^{(k)})$ is the subgradient of $h_2(\eta)$ at $\hat\eta^{(k)}$,

$$\partial h_2(\hat\eta^{(k)}) = \begin{pmatrix} \dfrac{1}{n\delta}\sum_{i=1}^{n} y_i\, 1\big\{y_i\big(x_i - w^{(k)} - \sum_{j=1}^{n} v_j^{(k)} K(z_i, z_j)\big) < 0\big\} \\[2ex] \dfrac{1}{n\delta}\sum_{i=1}^{n} y_i K(z_i, z_1)\, 1\big\{y_i\big(x_i - w^{(k)} - \sum_{j=1}^{n} v_j^{(k)} K(z_i, z_j)\big) < 0\big\} \\ \vdots \\ \dfrac{1}{n\delta}\sum_{i=1}^{n} y_i K(z_i, z_n)\, 1\big\{y_i\big(x_i - w^{(k)} - \sum_{j=1}^{n} v_j^{(k)} K(z_i, z_j)\big) < 0\big\} \end{pmatrix}.$$

Consequently

$$\hat\eta^{(k+1)} = \arg\min_{\eta}\; h_1(\eta) - \eta^T \partial h_2(\hat\eta^{(k)}) = \arg\min_{\eta}\; h_1(\eta) - \frac{1}{n\delta}\sum_{i=1}^{n} y_i\Big(w + \sum_{j=1}^{n} v_j K(z_i, z_j)\Big)\, 1\Big\{y_i\Big(x_i - w^{(k)} - \sum_{j=1}^{n} v_j^{(k)} K(z_i, z_j)\Big) < 0\Big\}. \qquad (15)$$

Similar to the linear case, we use the slack variable technique and we reach the dual problem

$$\min_{\tau}\; \tau^T Q' \tau - \{d + 2 Q'\, t_2(w^{(k)}, v^{(k)})\}^T \tau, \qquad (16)$$

subject to $0 \le \tau_i \le 1$ and $\sum_{i=1}^{n} \frac{1}{\delta} y_i\{\tau_i - t_{2,i}(w^{(k)}, v^{(k)})\} = 0$. Again, this optimization problem has only simple box constraints and one linear equality constraint, and hence can be solved by any standard quadratic programming method.

We conclude the algorithm by presenting how $\hat{w}$ and $\hat{v}$ are computed. The Karush-Kuhn-Tucker (KKT) conditions associated with the optimization problem (16) are

$$\frac{1}{n\lambda\delta}\, y_i\{t_{2,i}(w^{(k)}, v^{(k)}) - \tau_i\}\, \varphi_i = v_i, \qquad (17)$$
$$1 - \tau_i \ge 0, \qquad (18)$$
$$\tau_i\Big\{\xi_i - 1 + \frac{1}{\delta} y_i\Big(x_i - w - \sum_{j=1}^{n} v_j K(z_i, z_j)\Big)\Big\} = 0, \qquad (19)$$
$$\xi_i \ge 0, \qquad (20)$$
$$\xi_i - 1 + \frac{1}{\delta} y_i\Big(x_i - w - \sum_{j=1}^{n} v_j K(z_i, z_j)\Big) \ge 0, \qquad (21)$$
$$(1 - \tau_i)\xi_i = 0. \qquad (22)$$

Three scenarios can be discussed based on the magnitude of $\tau_i$: if $\tau_i = 0$, by (21) and (22) we have $\xi_i = 0$ and $\frac{1}{\delta} y_i(x_i - w - \sum_{j=1}^{n} v_j K(z_i, z_j)) - 1 \ge 0$; if $1 > \tau_i > 0$, by (19) and (22) we obtain $\xi_i = 0$ and $\frac{1}{\delta} y_i(x_i - w - \sum_{j=1}^{n} v_j K(z_i, z_j)) - 1 = 0$; if $\tau_i = 1$, by (19), (21) and (22) we obtain $\frac{1}{\delta} y_i(x_i - w - \sum_{j=1}^{n} v_j K(z_i, z_j)) - 1 = -\xi_i \le 0$. Now we can summarize the KKT conditions more concisely as

$$\frac{1}{\delta} y_i\Big(x_i - w - \sum_{j=1}^{n} v_j K(z_i, z_j)\Big) - 1 = \xi_i \ge 0 \quad \text{if } \tau_i < 1, \qquad \frac{1}{\delta} y_i\Big(x_i - w - \sum_{j=1}^{n} v_j K(z_i, z_j)\Big) - 1 = -\xi_i \le 0 \quad \text{if } \tau_i > 0,$$

which implies

$$\frac{1}{\delta} y_i\Big(x_i - w - \sum_{j=1}^{n} v_j K(z_i, z_j)\Big) - 1 = 0 \quad \text{if } 0 < \tau_i < 1. \qquad (23)$$

Thus, for $i = 1, \ldots, n$, $v_i^{(k+1)}$ can be estimated via (17) as

$$\hat{v}_i^{(k+1)} = \frac{1}{n\lambda\delta}\, y_i\{t_{2,i}(w^{(k)}, v^{(k)}) - \hat\tau_i\}\, \varphi_i,$$

and w(k+1) can be estimated via (23) as

$$\hat{w}^{(k+1)} = \frac{1}{|\{i: 0 < \tau_i < 1\}|}\sum_{i:\, 0 < \tau_i < 1}\Big(x_i - \delta y_i - \sum_{j=1}^{n} \hat{v}_j K(z_i, z_j)\Big).$$

We obtain the estimators $\hat{w}$ and $\hat{v}$ after the iterative process (15) converges. For any new patient with profile $z_{new}$, the predicted nonparametric kernel MCID is $\hat{c}(z_{new}) = \hat{w} + \sum_{j=1}^{n} \hat{v}_j K(z_{new}, z_j)$. The most common choice of kernel function is the Gaussian radial basis function $K(z_1, z_2) = \exp(-\|z_1 - z_2\|^2 / 2\sigma^2)$, where σ is a positive scale parameter. If $K(z_1, z_2) = z_1^T z_2$, the nonparametric kernel case reduces to the simple linear case.
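To make the kernel case concrete, here is a small sketch (our own, with assumed names for the fitted quantities $\hat{w}$ and $\hat{v}$) of the Gaussian kernel, the median-heuristic choice of σ described in Section 5, and the resulting prediction $\hat{c}(z_{new}) = \hat{w} + \sum_j \hat{v}_j K(z_{new}, z_j)$:

```python
import numpy as np
from scipy.spatial.distance import pdist

def gaussian_kernel(z1, z2, sigma):
    # K(z1, z2) = exp(-||z1 - z2||^2 / (2 * sigma^2))
    return np.exp(-np.sum((z1 - z2) ** 2) / (2.0 * sigma ** 2))

def median_heuristic_sigma(Z):
    # scale parameter sigma set to the median pairwise Euclidean distance
    return np.median(pdist(Z))

def predict_kernel_mcid(z_new, w_hat, v_hat, Z_train, sigma):
    # c_hat(z_new) = w_hat + sum_j v_hat_j * K(z_new, z_j)
    k = np.array([gaussian_kernel(z_new, z_j, sigma) for z_j in Z_train])
    return w_hat + v_hat @ k
```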

4.2. Bootstrap Procedure for MCID and the Classification Error

A point estimate alone does not allow one to quantify its uncertainty, which limits the usefulness of the MCID and its classification error in real applications. Hedayat et al. (2015) postulate a testing data set with a very large sample size to quantify the effectiveness of their MCID in numerical studies, but such a testing data set is usually unavailable in clinical studies. Resampling methods, with the bootstrap as a representative, can instead serve as a tool to construct confidence intervals for an estimand in these situations.

To use the bootstrap appropriately, we have to be cautious about the regularity conditions under which its theoretical properties are justified (Bickel et al., 1997). If these conditions are not satisfied, the conventional bootstrap has to be properly modified. The objective functions to be minimized in (4) and in (5) are generally non-smooth. If a non-negligible probability mass is concentrated at the discontinuity points of the objective function, that is, $P(X_0 - \hat{c}(Z_0) = 0) > 0$, we are in the so-called irregular case, in which Shao (1994) showed that the conventional bootstrap is inconsistent. Instead, we adopt the m-out-of-n bootstrap, a general remedy for bootstrap inconsistency due to non-smoothness, theoretically justified in Shao (1994, 1996), Bickel et al. (1997), and references therein.

The m-out-of-n bootstrap is the conventional nonparametric bootstrap except that the resample size, historically denoted m, is of a smaller order than the original sample size n; that is, m = m_n → ∞ and m/n → 0 (or m log log n/n → 0) as n → ∞ (Shao, 1994). The intuition is to let the empirical distribution tend to the true generative distribution at a faster rate, so that the bootstrap samples are drawn as if they were from the true generative distribution. The requirement m/n → 0 (or m log log n/n → 0) is consistent with Hedayat et al. (2015), who required a very large sample size for their testing data; to some extent, in the m-out-of-n bootstrap the n − m subjects left out play the role of the testing sample (Shao, 1996). In practice m is usually chosen as m = n^κ for some κ < 1; in our numerical studies we use κ = 0.9. Our algorithm is detailed below.
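A minimal sketch of the resampling step is given below (our code; `fit_mcid` is a placeholder for either the linear or the kernel estimator of Section 4.1, and rounding $n^\kappa$ to the nearest integer is our convention):

```python
import numpy as np

def m_out_of_n_bootstrap(x, y, Z, fit_mcid, B=1000, kappa=0.9, seed=0):
    """m-out-of-n bootstrap: draw B resamples of size m = n^kappa << n and
    refit the MCID on each; fit_mcid is a user-supplied fitting routine."""
    rng = np.random.default_rng(seed)
    n = len(y)
    m = int(round(n ** kappa))            # e.g. n = 157 gives m of about 95
    fits = []
    for _ in range(B):
        idx = rng.choice(n, size=m, replace=True)   # resample with replacement
        fits.append(fit_mcid(x[idx], y[idx], Z[idx]))
    return fits
```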

For b = 1, ..., B, we generate a bootstrap sample of size m, $\{(x_j^{(b)}, y_j^{(b)}, z_j^{(b)}), j = 1, \ldots, m\}$, and derive the MCID $\hat{c}^{(b)}$ using the methods in Section 4. To be more specific, the simple linear MCID is $\hat{c}_l^{(b)} = \hat\alpha^{(b)} + \hat\beta^{(b)T} z_0$ and the nonparametric kernel MCID is $\hat{c}_n^{(b)} = \hat{w}^{(b)} + \sum_{i=1}^{n} \hat{v}_i^{(b)} K(z_0, z_i^{(b)})$, where $z_0$ denotes the profile of a new patient. Accordingly, the classification error based on the MCID $\hat{c}^{(b)}$ is computed as

$$\widehat{\mathrm{err}}^{(b)} = \frac{1}{m}\sum_{j=1}^{m} 1\big\{y_j^{(b)} \neq \mathrm{sign}\big(x_j^{(b)} - \hat{c}(z_j^{(b)})\big)\big\}.$$

Similarly, $\widehat{\mathrm{err}}^{(b)}$ can be written as $\widehat{\mathrm{err}}_l^{(b)}$ and $\widehat{\mathrm{err}}_n^{(b)}$ for the simple linear and nonparametric kernel cases, respectively.

We repeat the above procedure B times in total. For the simple linear MCID, we obtain $\{(\hat\alpha^{(b)}, \hat\beta^{(b)}, \hat{c}_l^{(b)}, \widehat{\mathrm{err}}_l^{(b)}), b = 1, \ldots, B\}$. Let $\hat{l}_\alpha$ and $\hat{u}_\alpha$ be the α/2-th and (1 − α/2)-th quantiles of $\{\hat\alpha^{(b)}, b = 1, \ldots, B\}$, $\hat{l}_\beta$ and $\hat{u}_\beta$ the corresponding quantiles of $\{\hat\beta^{(b)}\}$, $\hat{l}_{c_l}$ and $\hat{u}_{c_l}$ those of $\{\hat{c}_l^{(b)}\}$, and $\hat{l}_{\mathrm{err}_l}$ and $\hat{u}_{\mathrm{err}_l}$ those of $\{\widehat{\mathrm{err}}_l^{(b)}\}$. The (1 − α) confidence interval of α is then $[\hat{l}_\alpha, \hat{u}_\alpha]$, that of β is $[\hat{l}_\beta, \hat{u}_\beta]$, that of $c_l$ is $[\hat{l}_{c_l}, \hat{u}_{c_l}]$, and that of err is $[\hat{l}_{\mathrm{err}_l}, \hat{u}_{\mathrm{err}_l}]$. For the nonparametric kernel MCID, we have $\{(\hat{c}_n^{(b)}, \widehat{\mathrm{err}}_n^{(b)}), b = 1, \ldots, B\}$; similarly, the (1 − α) confidence interval of $c_n$ is $[\hat{l}_{c_n}, \hat{u}_{c_n}]$ and that of err is $[\hat{l}_{\mathrm{err}_n}, \hat{u}_{\mathrm{err}_n}]$.
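In code, the interval construction reduces to taking empirical quantiles of the B bootstrap replicates. The sketch below is our own; the fitted object with a `predict` method is the same placeholder as in the previous sketch, and any scalar replicate series (intercept, slope, MCID at $z_0$, or classification error) can be passed to `percentile_ci`.

```python
import numpy as np

def bootstrap_err(fit, x_b, y_b, Z_b):
    # classification error of one bootstrap fit, evaluated on its own resample
    pred = np.where(x_b - fit.predict(Z_b) > 0, 1, -1)
    return np.mean(y_b != pred)

def percentile_ci(replicates, alpha=0.05):
    # (1 - alpha) percentile interval from the B bootstrap replicates
    return (np.quantile(replicates, alpha / 2),
            np.quantile(replicates, 1 - alpha / 2))
```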

5. Simulation Studies

In this section, we apply the proposed method to provide confidence intervals for the MCID and the classification error via extensive numerical studies based on simulated data.

We consider two scenarios. In the first scenario, we generate a random sample of independent and identically distributed observations $\{(X_i, Y_i, Z_i), i = 1, \ldots, n\}$: the patient's clinical profile $Z_i$ is drawn from a bivariate normal distribution $N_2(\mu, I_2)$ with $\mu = (0, 0)^T$ and $I_2 = \mathrm{diag}(1, 1)$; then $X_i$ is drawn from $N(\alpha + \beta^T z_i, 1)$ with α = 0 and $\beta = (1, 2)^T$; finally the binary patient-reported outcome $Y_i \in \{-1, 1\}$ is generated from $\mathrm{Bern}(F(x_i))$, where $F(x_i) = P(X \le x_i)$. Under this scenario, the linear MCID is the underlying truth. We also generate a new observation $(Y_{new}, X_{new}, Z_{new})$ with $Y_{new} = 1$, $X_{new} = -0.3376$ and $Z_{new} = (-1.3577, -1.3643)$ from the same distribution as the $(Y_i, X_i, Z_i)$'s. The true value of the MCID for this patient is −4.0862.
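Scenario 1 can be reproduced along the following lines (a sketch under our reading of the setup; in particular we take F to be the marginal CDF of X, which equals $N(0, 1 + \beta^T\beta)$ under the Scenario 1 settings, and that reading is an assumption on our part):

```python
import numpy as np
from scipy.stats import norm

def simulate_scenario1(n, seed=0):
    """Scenario 1 sketch: Z ~ N2(0, I2), X ~ N(alpha + beta'Z, 1),
    Y = 1 with probability F(x) and -1 otherwise."""
    rng = np.random.default_rng(seed)
    alpha, beta = 0.0, np.array([1.0, 2.0])
    Z = rng.multivariate_normal(mean=np.zeros(2), cov=np.eye(2), size=n)
    X = rng.normal(loc=alpha + Z @ beta, scale=1.0)
    # F taken as the marginal CDF of X, i.e. N(0, 1 + beta'beta) here (assumption)
    F = norm.cdf(X, loc=0.0, scale=np.sqrt(1.0 + beta @ beta))
    Y = 2 * rng.binomial(1, F) - 1     # map {0, 1} to {-1, +1}
    return X, Y, Z
```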

The data in the second scenario are generated similarly, except that $X_i$ is drawn from $N(\alpha + \beta^T z_i - \beta^T z_i^2, 1)$, so the linear structure for the MCID is misspecified in this setting. As in the first scenario, we also generate a new observation $(Y_{new}, X_{new}, Z_{new})$ with $Y_{new} = 1$, $X_{new} = -2.1809$ and $Z_{new} = (-1.3577, -1.3643)$. The true value of the MCID for this new patient is −9.6519.

In each of the two scenarios, we apply both the simple linear and the nonparametric kernel MCID methods. The kernel is the Gaussian kernel $K(z_1, z_2) = \exp(-\|z_1 - z_2\|^2 / 2\sigma^2)$, with the scale parameter set to the median of the pairwise Euclidean distances among the observed profiles used to estimate the prediction rule (Hedayat et al., 2015). The m-out-of-n bootstrap samples are generated 1,000 times in each case. For simplicity we set δ = 0.01 in the numerical studies and use multifold cross-validation to determine the tuning parameter λ. We report results for two sample sizes, n = 500 and n = 1,000.

Based on 500 simulation replications, our results are summarized in Tables 1 and 2. In Scenario 1, the confidence interval is much shorter and the coverage much closer to the nominal level when the correct linear structure and a larger sample size are used; the interval is longer and the coverage more conservative when the nonparametric kernel is used. In Scenario 2, the coverage using the nonparametric kernel is again conservative. More importantly, the estimate of the MCID based on the incorrect simple linear structure is biased and gives very poor coverage. This issue could not be uncovered without a confidence interval, as in Hedayat et al. (2015), which reinforces the necessity and importance of developing interval estimation for the MCID.

Table 1:

Confidence interval for MCID in simulation studies. CP=Coverage Probability.

MCID
Sample Size Method Lower Upper Length CP
Scenario 1
n = 500 Linear(Correct) −4.3040 −3.3198 0.9842 0.934
Kernel −4.6326 −2.3341 2.2985 0.928
n = 1000 Linear(Correct) −4.2669 −3.6130 0.6539 0.952
Kernel −4.5890 −1.5640 3.0250 0.982
Scenario 2
n = 500 Linear(Incorrect) −7.5186 −4.1525 3.3661 0.004
Kernel −10.7414 −7.4439 3.2974 0.990
n = 1000 Linear(Incorrect) −7.5371 −4.8906 2.6465 0.000
Kernel −10.5402 −4.7963 5.7439 0.998

Table 2:

Confidence interval for the classification error in simulation studies. CP=Coverage Probability.

Classification Error
Sample Size Method Lower Upper Length CP
Scenario 1
n = 500 Linear(Correct) 0.1951 0.3068 0.1117 1.000
Kernel 0.1985 0.3198 0.1213 0.996
n = 1000 Linear(Correct) 0.2080 0.2871 0.0791 0.992
Kernel 0.2063 0.3428 0.1365 1.000
Scenario 2
n = 500 Linear(Incorrect) 0.3595 0.4861 0.1266 0.000
Kernel 0.2041 0.3335 0.1294 0.996
n = 1000 Linear(Incorrect) 0.3762 0.4687 0.0924 0.000
Kernel 0.2086 0.3883 0.1797 1.000

We notice that the coverage for the classification error is quite conservative; Xu et al. (2015) noted a similar phenomenon. Due to the computational burden, we only explore our method up to a sample size of 1,000; with a much larger sample size, the coverage would become more accurate.

6. ChAMP Trial Analysis

In the ChAMP trial, 190 patients with chondral lesions undergoing APM were randomized to either the treatment (debridement) group (n = 98) or the control (observation, no debridement) group (n = 92). It is of interest to investigate whether debridement of the chondral lesions promotes recovery from the surgery repairing the knee damage.

As mentioned in the previous section, the binary variable Y is derived from the anchor question in the SF-36 health survey. The patient's diagnostic measurement X is the difference in the WOMAC pain score between baseline and one year after surgery; the score ranges from 0 (extreme problem) to 100 (no problem). The patient's clinical profile Z includes age (continuous), treatment assignment (binary), sex (binary) and knee damage (four-level categorical). The knee damage variable is the total number of types of meniscus tears the patient suffers from and reflects the severity of the knee damage.

After excluding missing data, our analysis contains 157 patients, of whom 80 are in the debridement group and 77 in the observation group. In our analysis, we first compute the MCID for the whole population, for the subpopulation within the debridement group (treatment = 1), for the subpopulation within the observation group (treatment = 0), and for their difference, where c(z) includes only an intercept term; we call this “No model” in Table 3. Then, based on the structure

c(z) = α + β1 treatment,

which we call “Model 1”, the structure

c(z) = α + β1 treatment + β2 age + β3 sex + β4 damage,

which we call “Model 2”, and the nonparametric c(z), which we call “Model 3” in Table 3, we implement the proposed estimation procedure separately. The results are summarized in Table 3. Although we concentrate on interval estimation, we also list the “point estimation” column mainly for comparison with the results in Hedayat et al. (2015). We generated the m-out-of-n bootstrap samples B = 1,000 times with m = n^0.9 ≈ 95. A sensitivity analysis on the value of B in Table 4 demonstrates that the results are not sensitive as it varies from roughly 500 to 1,500.

Table 3:

Interval Estimation for the ChAMP Trial Analysis.

MCID Classification Error
Point Interval Estimation Interval Estimation
Estimation Lower Upper Length Lower Upper Length
No model: c(z) = c
MCID_all 1.4199 −0.9868 2.3224 3.3092 0.3050 0.4421 0.1371
MCID_trt=1 2.0216 −0.9944 3.8266 4.8210
MCID_trt=0 −0.0843 −0.9868 1.7207 2.7076
MCID_diff 2.1059 −1.5042 3.9109 5.4151
Model 1: c(z) = α + β1 treatment
α −0.0009 −2.3885 1.6182 4.0067 0.2842 0.5263 0.2421
β1 1.9008 −1.4109 3.4050 4.8159
MCID_all 0.9676 −1.0497 1.8041 2.8538
MCID_trt=1 1.8999 −0.8843 2.5207 3.4050
MCID_trt=0 −0.0009 −2.3885 1.6182 4.0067
MCID_diff 1.9008 −1.4109 3.4050 4.8159
Model 2: c(z) = α + β1 treatment + β2 age + β3 sex + β4 damage
α −6.7261 −21.4810 11.3416 32.8226 0.2737 0.5158 0.2421
β1 2.4068 −1.9171 3.9081 5.8252
β2 0.1680 −0.1918 0.3768 0.5686
β3 −0.9713 −3.4509 2.2291 5.6800
β4 −1.1430 −1.8663 1.3246 3.1909
MCID_all 0.2467 −1.6964 2.0215 3.7178
MCID_trt=1 1.6050 −1.6592 3.3212 4.9804
MCID_trt=0 −1.1645 −3.1225 1.7468 4.8693
MCID_diff 2.7694 −1.6298 4.0097 5.6395
Model 3: c(z) is nonparametric
MCID_all −0.3465 −1.5073 1.5785 3.0858 0.0526 0.4424 0.3897
MCID_trt=1 −0.2987 −1.4469 1.8478 3.2947
MCID_trt=0 −0.3962 −1.7511 1.5378 3.2890
MCID_diff 0.0975 −0.5132 1.2298 1.7430

Table 4:

Sensitivity Analysis for the ChAMP Trial

Bootstrap Resampling Size (B)
600 900 1,200 1,500
No model: c(z) = c
MCID_all Lower −0.9868 −0.9868 −0.9868 −0.9868
Upper 2.3600 2.3224 2.3224 2.3224
MCID_trt=1 Lower −0.9868 −0.9868 −1.2877 −1.2877
Upper 3.8266 3.8266 3.8266 3.8266
MCID_trt=0 Lower −0.9868 −0.9868 −0.9868 −0.9868
Upper 1.7207 1.7207 1.7207 1.7207
MCID_diff Lower −1.5042 −1.5042 −1.5042 −1.5042
Upper 3.9109 3.9109 3.9109 3.9109
Classification Lower 0.3053 0.2997 0.3053 0.2947
Error Upper 0.4526 0.4476 0.4526 0.4526
Model 1: c(z) = α + β1 treatment
α Lower −2.2125 −2.3885 −2.3885 −2.3885
Upper 1.6182 1.6182 1.6182 1.6182
β1 Lower −1.2034 −1.3565 −1.4084 −1.2992
Upper 3.4050 3.4050 3.4050 3.3092
MCID_all Lower −0.8760 −1.0214 −1.0473 −1.0458
Upper 1.7648 1.7624 1.8041 1.9027
MCID_trt=1 Lower −0.7909 −0.8843 −0.8843 −0.8843
Upper 2.5223 2.5207 2.5208 2.5208
MCID_trt=0 Lower −2.2125 −2.3885 −2.3885 −2.3885
Upper 1.6182 1.6182 1.6182 1.6182
MCID_diff Lower −1.2034 −1.3565 −1.4084 −1.2992
Upper 3.4050 3.4050 3.4050 3.3092
Classification Lower 0.2842 0.2842 0.2842 0.2842
Error Upper 0.5161 0.5263 0.5263 0.5263
Model 2: c(z) = α + β1 treatment + β2 age + β3 sex + β4 damage
α Lower −22.5595 −21.4412 −21.3968 −21.1874
Upper 9.7443 10.3888 10.7200 11.2746
β1 Lower −1.9235 −1.8822 −1.9171 −1.9289
Upper 4.0760 3.8862 3.9378 3.9518
β2 Lower −0.1830 −0.1842 −0.1830 −0.2019
Upper 0.4176 0.3760 0.3751 0.3715
β3 Lower −3.5843 −3.4407 −3.5837 −3.4926
Upper 2.3425 2.2252 2.2467 2.3068
β4 Lower −1.7888 −1.8370 −1.8925 −1.8103
Upper 1.3664 1.3070 1.3238 1.3833
MCID_all Lower −1.7405 −1.7091 −1.6945 −1.7689
Upper 2.0215 2.0263 1.9981 2.0001
MCID_trt=1 Lower −1.7345 −1.6240 −1.5910 −1.5944
Upper 3.3077 3.3142 3.3077 3.2833
MCID_trt=0 Lower −3.2409 −3.1269 −3.0683 −3.1346
Upper 1.8563 1.7076 1.6739 1.6121
MCID_diff Lower −1.6303 −1.5644 −1.6227 −1.6262
Upper 4.2119 3.9789 4.0669 4.0933
Classification Lower 0.2734 0.2737 0.2737 0.2737
Error Upper 0.5055 0.5158 0.5053 0.5158
Model 3: c(z) is nonparametric
MCID_all Lower −1.4763 −1.4921 −1.5270 −1.5201
Upper 1.5442 1.5951 1.5785 1.6139
MCID_trt=1 Lower −1.4830 −1.4605 −1.5064 −1.4834
Upper 1.7903 1.8483 1.8195 1.8338
MCID_trt=0 Lower −1.7379 −1.7465 −1.7623 −1.7518
Upper 1.5213 1.5583 1.5378 1.6119
MCID_diff Lower −0.4841 −0.5003 −0.5468 −0.5488
Upper 1.2007 1.2294 1.2000 1.1966
Classification Lower 0.0526 0.0526 0.0526 0.0526
Error Upper 0.4526 0.4476 0.4421 0.4421

From Table 3, the point estimate of the MCID for the treatment = 1 subgroup, 2.0216, looks quite different from that for the treatment = 0 subgroup, −0.0843. Without interval estimation, one would believe they are different but would have no evidence on the degree of difference. With interval estimation, we see that the confidence interval for each of them covers zero, and the confidence interval for the difference also covers zero. A similar phenomenon can be observed under “Model 1”, “Model 2” and “Model 3”.

Across the first three models, the MCID quantities have slightly different estimates, but each confidence interval covers zero. “Model 3” has different MCID estimates from its linear counterparts, indicating that the linear specification of the MCID may be insufficient.

For the treatment effect β1 in “Model 1” and “Model 2”, although the estimates differ, each confidence interval also covers zero. Although “Model 3” is more flexible than its linear counterparts, it lacks a clear interpretation of the treatment effect. The different results from “No model” and “Model 3” also suggest that there may be other unobserved covariates that play a role in quantifying the population heterogeneity of the MCID.

The classification errors for the first three models are roughly similar, with that of “Model 1” slightly greater than that of “Model 2”. This is reasonable since “Model 2” adjusts for more covariates and hence should have greater explanatory ability. “Model 3” has a smaller lower bound and upper bound for its confidence interval than “Model 2”: a nonparametric kernel model entails fewer model assumptions than its linear counterpart and is therefore regarded as more flexible.

In all, besides the point estimate of the MCID in each population of interest, the proposed method also provides an interval estimate, which gives a better understanding of the scale of the MCID and facilitates comparison with historical values or with values from other populations or other diseases, so that better, more convincing health policy decisions can be made. The ChAMP study shows that the MCID differs somewhat between the debridement and observation groups, but the difference is not significant. One of the major findings of the ChAMP trial is that debriding the chondral lesions does not have a statistically significant effect, and it therefore recommends not debriding chondral lesions in future surgical practice (Bisson et al., 2017). Using the proposed MCID method, we reach the same conclusion. This could have a big impact on orthopaedic surgical practice, since the additional debridement of chondral lesions brings a significant medical cost to patients.

Acknowledgment

This work was supported by the National Center for Advancing Translational Sciences of the National Institutes of Health under award number UL1TR001412. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Appendix A.

To solve (6), we need to derive its dual problem by replacing the loss function in s1 with slack variables ξi, i = 1, . . . , n, and adding two sets of constraints. This leads to

$$\min_{\omega, \xi}\; \frac{1}{n}\sum_{i=1}^{n}\Big\{\xi_i - \frac{1}{\delta}\, t_{1,i}(\alpha^{(k)}, \beta^{(k)})\, y_i(\alpha + \beta^T z_i)\Big\} + \frac{\lambda}{2}\beta^T\beta, \qquad (24)$$

subject to $\xi_i \ge 0$ and $\xi_i \ge 1 - \frac{1}{\delta} y_i(x_i - \alpha - \beta^T z_i)$, $i = 1, \ldots, n$, where $t_{1,i}(\alpha^{(k)}, \beta^{(k)}) = 1\{y_i(x_i - \alpha^{(k)} - \beta^{(k)T} z_i) < 0\}$. The primal Lagrangian is then

$$L_p = \frac{1}{n}\sum_{i=1}^{n}\Big\{\xi_i - \frac{1}{\delta}\, t_{1,i}(\alpha^{(k)}, \beta^{(k)})\, y_i(\alpha + \beta^T z_i)\Big\} + \frac{\lambda}{2}\beta^T\beta - \frac{1}{n}\sum_{i=1}^{n}\tau_i\Big\{\xi_i - 1 + \frac{1}{\delta} y_i(x_i - \alpha - \beta^T z_i)\Big\} - \frac{1}{n}\sum_{i=1}^{n}\gamma_i \xi_i,$$

where $\tau = (\tau_1, \ldots, \tau_n)^T$ and $\gamma = (\gamma_1, \ldots, \gamma_n)^T$ are vectors of non-negative Lagrange multipliers corresponding to the two sets of constraints in (24). Setting the derivatives of the Lagrangian with respect to the primal variables ω and ξ to 0, we get

$$0 = \frac{1}{\delta}\sum_{i=1}^{n} y_i\{\tau_i - t_{1,i}(\alpha^{(k)}, \beta^{(k)})\}, \qquad \beta = \frac{1}{n\lambda\delta}\sum_{i=1}^{n} y_i\{t_{1,i}(\alpha^{(k)}, \beta^{(k)}) - \tau_i\} z_i, \qquad 1 = \tau_i + \gamma_i,\; i = 1, \ldots, n.$$

Plugging these back into $L_p$, we have

$$\begin{aligned}
L_p &= -\frac{1}{n\delta}\sum_{i=1}^{n}\tau_i y_i x_i + \frac{1}{n}\sum_{i=1}^{n}\tau_i - \frac{\lambda}{2}\beta^T\beta\\
&\propto -2\sum_{i=1}^{n}\delta y_i \tau_i x_i + 2\sum_{i=1}^{n}\delta^2 \tau_i - \frac{1}{n\lambda}\Big\{\sum_{i=1}^{n} y_i \tau_i z_i^T \sum_{i=1}^{n} y_i \tau_i z_i - 2\sum_{i=1}^{n} y_i t_{1,i}(\alpha^{(k)}, \beta^{(k)}) z_i^T \sum_{i=1}^{n} y_i \tau_i z_i\Big\}\\
&= -\frac{1}{n\lambda}\sum_{i=1}^{n} y_i \tau_i z_i^T \sum_{i=1}^{n} y_i \tau_i z_i + 2\sum_{i=1}^{n}(\delta^2 - \delta y_i x_i)\tau_i + \frac{2}{n\lambda}\sum_{i=1}^{n} y_i t_{1,i}(\alpha^{(k)}, \beta^{(k)}) z_i^T \sum_{i=1}^{n} y_i \tau_i z_i\\
&= -\tau^T Q \tau + b^T \tau + 2\, t_1(\alpha^{(k)}, \beta^{(k)})^T Q \tau,
\end{aligned}$$

where Q is a square matrix with [i, j]th element $\langle y_i z_i, y_j z_j\rangle / (n\lambda)$, $t_1(\alpha^{(k)}, \beta^{(k)}) = \{t_{1,i}(\alpha^{(k)}, \beta^{(k)})\}_{i=1}^{n}$, and $b = 2\{\delta^2 - \delta y_i x_i\}_{i=1}^{n}$.

Appendix B.

To solve (15), we need to derive its dual problem by replacing the loss function in h1 with slack variables ξi,i=1,,n, and adding two sets of constraints. This results in

$$\min_{\eta, \xi}\; \frac{1}{n}\sum_{i=1}^{n}\Big\{\xi_i - \frac{1}{\delta}\, t_{2,i}(w^{(k)}, v^{(k)})\, y_i\Big(w + \sum_{j=1}^{n} v_j K(z_i, z_j)\Big)\Big\} + \frac{\lambda}{2}\sum_{i,j=1}^{n} v_i v_j K(z_i, z_j), \qquad (25)$$

subject to $\xi_i \ge 0$ and $\xi_i \ge 1 - \frac{1}{\delta} y_i(x_i - w - \sum_{j=1}^{n} v_j K(z_i, z_j))$, where $t_{2,i}(w^{(k)}, v^{(k)}) = 1\{y_i(x_i - w^{(k)} - \sum_{j=1}^{n} v_j^{(k)} K(z_i, z_j)) < 0\}$. The primal Lagrangian is then

$$L_p = \frac{1}{n}\sum_{i=1}^{n}\Big\{\xi_i - \frac{1}{\delta}\, t_{2,i}(w^{(k)}, v^{(k)})\, y_i\Big(w + \sum_{j=1}^{n} v_j K(z_i, z_j)\Big)\Big\} + \frac{\lambda}{2}\sum_{i,j=1}^{n} v_i v_j K(z_i, z_j) - \frac{1}{n}\sum_{i=1}^{n}\tau_i\Big\{\xi_i - 1 + \frac{1}{\delta} y_i\Big(x_i - w - \sum_{j=1}^{n} v_j K(z_i, z_j)\Big)\Big\} - \frac{1}{n}\sum_{i=1}^{n}\gamma_i \xi_i,$$

where $\tau = (\tau_1, \ldots, \tau_n)^T$ and $\gamma = (\gamma_1, \ldots, \gamma_n)^T$ are vectors of non-negative Lagrange multipliers corresponding to the two sets of constraints in (25). Setting the derivatives of the Lagrangian with respect to the primal variables η and ξ to 0, we get

$$0 = \frac{1}{\delta}\sum_{i=1}^{n} y_i\{\tau_i - t_{2,i}(w^{(k)}, v^{(k)})\}, \qquad v = \frac{1}{n\lambda\delta}\sum_{i=1}^{n} y_i\{t_{2,i}(w^{(k)}, v^{(k)}) - \tau_i\}\, \varphi_i, \qquad 1 = \tau_i + \gamma_i,\; i = 1, \ldots, n.$$

Plugging these back into $L_p$, we have

$$\begin{aligned}
L_p &= -\frac{1}{n\delta}\sum_{i=1}^{n}\tau_i y_i x_i + \frac{1}{n}\sum_{i=1}^{n}\tau_i - \frac{\lambda}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} v_i v_j K(z_i, z_j)\\
&\propto -2\sum_{i=1}^{n}\delta y_i \tau_i x_i + 2\sum_{i=1}^{n}\delta^2 \tau_i - \frac{1}{n\lambda}\Big\{\sum_{i=1}^{n} y_i \tau_i \varphi_i^T \sum_{i=1}^{n} y_i \tau_i \varphi_i - 2\sum_{i=1}^{n} y_i t_{2,i}(w^{(k)}, v^{(k)}) \varphi_i^T \sum_{i=1}^{n} y_i \tau_i \varphi_i\Big\}\\
&= -\frac{1}{n\lambda}\sum_{i=1}^{n} y_i \tau_i \varphi_i^T \sum_{i=1}^{n} y_i \tau_i \varphi_i + 2\sum_{i=1}^{n}(\delta^2 - \delta y_i x_i)\tau_i + \frac{2}{n\lambda}\sum_{i=1}^{n} y_i t_{2,i}(w^{(k)}, v^{(k)}) \varphi_i^T \sum_{i=1}^{n} y_i \tau_i \varphi_i\\
&= -\tau^T Q' \tau + d^T \tau + 2\, t_2(w^{(k)}, v^{(k)})^T Q' \tau,
\end{aligned}$$

where Q′ is a square matrix with [i, j]th element $\langle y_i \varphi_i, y_j \varphi_j\rangle / (n\lambda)$, $t_2(w^{(k)}, v^{(k)}) = \{t_{2,i}(w^{(k)}, v^{(k)})\}_{i=1}^{n}$, and $d = 2\{\delta^2 - \delta y_i x_i\}_{i=1}^{n}$.

References

1. Bickel P, Götze F, and van Zwet W (1997), “Resampling fewer than n observations: gains, losses, and remedies for losses,” Statistica Sinica, 7, 1–31.
2. Bisson LJ, Kluczynski MA, Wind WM, Fineberg MS, Bernas GA, Rauh MA, Marzo JM, Zhou Z, and Zhao J (2017), “Patient outcomes after observation versus debridement of unstable chondral lesions during partial meniscectomy: the chondral lesions and meniscus procedures (ChAMP) randomized controlled trial,” The Journal of Bone & Joint Surgery, 99, 1078–1085.
3. Bisson LJ (2018), “How Does the Presence of Unstable Chondral Lesions Affect Patient Outcomes After Partial Meniscectomy? The ChAMP Randomized Controlled Trial,” The American Journal of Sports Medicine, 46, 590–597.
4. Bisson LJ, Phillips P, Matthews J, Zhou Z, Zhao J, Wind WM, Fineberg MS, Bernas GA, Rauh MA, Marzo JM, and Kluczynski MA (2019), “The association between bone marrow lesions and unstable chondral lesions and pain in patients without radiographic evidence of degenerative joint disease after arthroscopic partial meniscectomy,” Orthopaedic Journal of Sports Medicine, in press.
5. Cook CE (2008), “Clinimetrics corner: the minimal clinically important change score (MCID): a necessary pretense,” Journal of Manual & Manipulative Therapy, 16, 82E–83E.
6. Efron B (1979), “Bootstrap Methods: Another Look at the Jackknife,” The Annals of Statistics, 1–26.
7. Erdogan BD, Leung YY, Pohl C, Tennant A, and Conaghan PG (2016), “Minimal clinically important difference as applied in rheumatology: an OMERACT Rasch Working Group systematic review and critique,” The Journal of Rheumatology, 43, 194–202.
8. Hedayat A, Wang J, and Xu T (2015), “Minimum clinically important difference in medical studies,” Biometrics, 71, 33–41.
9. Jaeschke R, Singer J, and Guyatt GH (1989), “Measurement of health status: ascertaining the minimal clinically important difference,” Controlled Clinical Trials, 10, 407–415.
10. Kimeldorf G and Wahba G (1971), “Some results on Tchebycheffian spline functions,” Journal of Mathematical Analysis and Applications, 33, 82–95.
11. Kluczynski MA, Marzo JM, Wind WM, Fineberg MS, Bernas GA, Rauh MA, Zhou Z, Zhao J, and Bisson LJ (2017), “The effect of body mass index on clinical outcomes in patients without radiographic evidence of degenerative joint disease after arthroscopic partial meniscectomy,” Arthroscopy: The Journal of Arthroscopic & Related Surgery, 33, 2054–2063.
12. Laber EB and Murphy SA (2011), “Adaptive confidence intervals for the test error in classification,” Journal of the American Statistical Association, 106, 904–913.
13. McGlothlin AE and Lewis RJ (2014), “Minimal clinically important difference: defining what really matters to patients,” JAMA, 312, 1342–1343.
14. Shao J (1994), “Bootstrap sample size in nonregular cases,” Proceedings of the American Mathematical Society, 122, 1251–1262.
15. Shao J (1996), “Bootstrap model selection,” Journal of the American Statistical Association, 91, 655–665.
16. Thi Hoai An L and Dinh Tao P (1997), “Solving a class of linearly constrained indefinite quadratic problems by DC algorithms,” Journal of Global Optimization, 11, 253–285.
17. Ware JE Jr and Sherbourne CD (1992), “The MOS 36-item short-form health survey (SF-36): I. Conceptual framework and item selection,” Medical Care, 473–483.
18. Wright A, Hannon J, Hegedus EJ, and Kavchak AE (2012), “Clinimetrics corner: a closer look at the minimal clinically important difference (MCID),” Journal of Manual & Manipulative Therapy, 20, 160–166.
19. Xu Y, Yu M, Zhao Y-Q, Li Q, Wang S, and Shao J (2015), “Regularized outcome weighted subgroup identification for differential treatment effects,” Biometrics, 71, 645–653.
