Journal of Applied Statistics
2024 Sep 5;52(4):894–913. doi: 10.1080/02664763.2024.2399586

Framework for constructing an optimal weighted score based on agreement

Zhiping Qiu, Amita Manatunga, Limin Peng, Ying Guo, Tanja Jovanovic
PMCID: PMC11873963  PMID: 40040675

Abstract

In many medical studies, questionnaires or instruments with item ratings are used to measure health outcomes that reflect disease status. Combining these item ratings into a score that reflects the disease status is therefore an important problem. This paper proposes a new weighted method in which the weights are determined by maximizing the broad sense agreement [L. Peng, R. Li, Y. Guo, and A. Manatunga, A framework for assessing broad sense agreement between ordinal and continuous measurements, J. Am. Stat. Assoc. 106 (2011), pp. 1592–1601] between the new score and the disease status measured on an ordinal scale. Theoretical and simulation results demonstrate the validity of the proposed optimal weights. We illustrate our method with an application to a post-traumatic stress disorder (PTSD) study.

KEYWORDS: Broad sense agreement, optimal weight, PTSD, smooth approximation, weighted score

2020 MATHEMATICS SUBJECT CLASSIFICATION: 49-00

1. Introduction

In many mental health studies, such as studies of attention-deficit hyperactivity disorder, major depression, and posttraumatic stress disorder (PTSD), instruments or questionnaires with multiple item ratings are commonly used for tentative screening and diagnosis [2,3,21]. These instruments are typically based on a structured questionnaire with an assigned rating system. Combining the item ratings into a global score that reflects the actual disease status is therefore often necessary and important. For example, PTSD is a complex, heterogeneous disorder and one of the most prevalent mental health disorders among combat veterans [6]. Establishing a diagnosis such as PTSD is challenging due to the variability of the disease phenotype and the heterogeneity of the patient population [4]. The best available diagnosis of PTSD (yes vs. no) is obtained from the Clinician-Administered PTSD Scale (CAPS), which is considered the standard criterion measure in PTSD assessment [4]. However, administering the CAPS requires a structured interview by trained clinicians with working knowledge of PTSD and is very time-consuming, so wide application of this instrument to large populations is costly and challenging. In contrast, the PTSD Symptom Scale (PSS) is a 17-item self-reported questionnaire that assesses PTSD symptoms and requires less assessment time than the CAPS [3]. The self-reported PSS consists of four subscales (PSS intrusive, PSS avoidance, PSS negative, and PSS hyperarousal), and how to combine these four subscales to construct a score that reflects the degree of disease severity is thus an important question.

In the literature, a commonly used naive approach is to sum over all items to construct a total score [1,3,10]. This approach gives each subscale in the questionnaire equal importance with respect to disease severity, and the corresponding total score may not best reflect the disease status. Furthermore, disease severity is commonly observed as an ordinal variable in practice. For example, in the melanoma and depression study [14], the degree of disease severity is often characterized as no depression, mild depression, or severe depression. Another example, considered in this manuscript, is PTSD status measured by CAPS (0 = PTSD negative, 1 = PTSD positive), which is considered the gold standard for PTSD diagnosis. The question, then, is how to develop a general and objective analytic framework for determining the contribution of each subscale to a constructed score when ordinal disease severity is available. To this end, a fundamental question is: what are the most desirable weights reflecting the contributions of the subscales to disease severity? A common viewpoint in practice is that meaningful weights should produce high agreement or association between the new weighted score and the established disease categories.

This paper aims to develop an analytical strategy for constructing a new score for an instrument by assigning and estimating weights for the ratings of items within the instrument. The new score is a weighted sum of item ratings that achieves the highest agreement, as measured by the broad sense agreement (BSA) [14], with the actual disease status given by a gold or best available standard. In addition, we develop an inferential procedure for the weights, along with a statistical test of whether a simple equal weighting of items is adequate.

The remainder of the article is organized as follows. In Section 2, we present our framework for constructing new weighted scores. Section 3 presents the estimator of the optimal weights and investigates its asymptotic properties. In Section 4, we conduct simulation studies to evaluate the proposed method. An application of our method to the Grady Trauma Study is presented in Section 5. Concluding remarks are given in Section 6. Finally, proofs are relegated to the Appendix.

2. The proposed framework for constructing weighted scores

We consider a continuous instrument with items grouped according to the item interpretations. Let $Z_1, \ldots, Z_M$ be the continuous scores from item groups $1, \ldots, M$. In addition, we suppose the disease status of interest is captured by a categorical measurement $Y$, which takes values $1, \ldots, L$ and is considered the gold standard. For instance, in the cross-sectional Grady trauma study of PTSD presented in Section 5, $Z_1, \ldots, Z_4$ denote the self-reported subtotal scores of PSS intrusive, PSS avoidance, PSS negative, and PSS hyperarousal, respectively, and $Y$ is the corresponding CAPS-based PTSD diagnosis, with $Y = 0$ for PTSD negative and $Y = 1$ otherwise.

In the literature, a common approach constructs a total score as the arithmetic average of $Z_1, \ldots, Z_M$. This approach ignores the differing importance of $Z_1, \ldots, Z_M$ to disease severity and thus may not best reflect the disease status. Here, we propose a weighted sum of $Z_1, \ldots, Z_M$ based on the broad sense agreement (BSA) proposed by Peng et al. [14]. We first give a brief review of the BSA.

2.1. Review of broad sense of agreement

Let $X$ and $Y$ be continuous and ordinal variables, respectively, that measure a common trait on the same subject, with domains $D_X$ and $D_Y = \{1, 2, \ldots, L\}$. For example, in the cross-sectional Grady trauma study of PTSD, $X$ may represent the average of the subscores of PSS intrusive, PSS avoidance, PSS negative, and PSS hyperarousal, and $Y$ is the corresponding PTSD diagnosis. Let $X^{(l)}$ denote a randomly selected $X$ given $Y = l$. Following Peng et al. [14], $X$ and $Y$ are in perfect BSA if $X^{(1)} < \cdots < X^{(L)}$ with probability 1, and in perfect broad sense disagreement if $X^{(1)} > \cdots > X^{(L)}$ with probability 1. Equivalently, perfect BSA entails that there exists an increasing (or decreasing) step function $\Psi$ from $D_X$ to $D_Y$ such that $Y = \Psi(X)$ with probability 1. Let $\|v\|$ denote the Euclidean norm of a vector $v$, and let $R(x_1, \ldots, x_L)$ denote the mapping from $(x_1, \ldots, x_L)^T$ to its rank vector $(r_1, \ldots, r_L)^T$ with $r_l = \sum_{j=1}^{L} I(x_l \ge x_j)$, $l = 1, 2, \ldots, L$. Peng et al. [14] define the BSA measure between $X$ and $Y$ as follows:

\[
\rho_{\mathrm{bsa}} = 1 - \frac{E\{\|S - R(X^{(1)}, \ldots, X^{(L)})\|^2\}}{E\{\|S - R(X^{(1)}, \ldots, X^{(L)})\|^2 \mid X \perp Y\}}, \quad (1)
\]

where $S = (1, 2, \ldots, L)^T$, $E\{\cdot\}$ denotes expectation, and $E\{\cdot \mid X \perp Y\}$ denotes expectation under independence of $X$ and $Y$. The BSA definition (1) quantifies the extent of departure from perfect BSA via the distance between the observed ranks of the $X$'s and the ranks expected under perfect BSA, within a group of randomly selected observations, one from each category of $Y$. Thus, the measure has a clear meaning and is easy to interpret. Peng et al. [14] further showed that $-1 \le \rho_{\mathrm{bsa}} \le 1$, with $\rho_{\mathrm{bsa}} = 1$ (or $-1$) corresponding to the perfect BSA agreement (or disagreement) scenario, and that $E\{\|S - R(X^{(1)}, \ldots, X^{(L)})\|^2 \mid X \perp Y\} = (L^3 - L)/6$.
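To make the normalizing constant concrete, the following Python sketch (our own illustration, not code from the paper) implements the rank mapping $R$ and verifies, by exact enumeration over permutations, that the expected squared rank distance under independence equals $(L^3 - L)/6$.

```python
import itertools
import numpy as np

def rank_map(x):
    # R maps (x_1, ..., x_L) to ranks r_l = sum_j I(x_l >= x_j)
    x = np.asarray(x, dtype=float)
    return np.array([(x <= v).sum() for v in x])

def expected_sq_dist_independent(L):
    # Under X independent of Y, the rank vector of (X^(1), ..., X^(L)) is a
    # uniformly random permutation of (1, ..., L); average ||S - R||^2 exactly.
    S = np.arange(1, L + 1)
    return float(np.mean([np.sum((S - np.array(p)) ** 2)
                          for p in itertools.permutations(S)]))

checks = all(expected_sq_dist_independent(L) == (L**3 - L) / 6 for L in (2, 3, 4, 5))
```

For instance, `rank_map([0.3, 0.1, 0.2])` returns the rank vector `(3, 1, 2)`, and `checks` confirms the constant for $L = 2, \ldots, 5$.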

2.2. The proposed method to construct a score

Let $Z = (Z_1, \ldots, Z_M)^T$, and let $V \equiv (Z^T, Y)^T$ denote an observation from the distribution $P$ on the set $\mathcal{T} \subset \mathbb{R}^M \times \mathbb{R}$. We propose assigning weights to the continuous group scores, yielding $X(w) \equiv (w_1, \ldots, w_{M-1}, 1 - \sum_{k=1}^{M-1} w_k) Z$, where $w \equiv (w_1, \ldots, w_{M-1})^T$, and $w_1, \ldots, w_{M-1}$ and $1 - \sum_{k=1}^{M-1} w_k$ denote the weights for item groups $1, \ldots, M$, respectively. Define $\mathcal{A} = \{(w_1, \ldots, w_{M-1})^T : 0 \le w_k \le 1, \ k = 1, \ldots, M-1, \ \sum_{k=1}^{M-1} w_k \le 1\}$. We can express the BSA between the categorical measurement $Y$ and the continuous weighted score $X(w)$ as a function of the weights $w$:

\[
\rho_{\mathrm{bsa}}(w) = 1 - \frac{E\{\|S - R(X^{(1)}(w), \ldots, X^{(L)}(w))\|^2\}}{C_L}, \quad w \in \mathcal{A}, \quad (2)
\]

where $X^{(l)}(w) = (w_1, \ldots, w_{M-1}, 1 - \sum_{k=1}^{M-1} w_k) Z^{(l)}$, $Z^{(l)}$ denotes a randomly selected $Z$ given $Y = l$, and $C_L = (L^3 - L)/6$. If the categories do not have an intrinsic natural order, the methods proposed in [8] may be used to sort the multiple categories. Following our view that a meaningful severity score should produce high agreement with the ordinal scale $Y$, we define the optimal weights as the vector $w \in \mathcal{A}$ that maximizes $\rho_{\mathrm{bsa}}(w)$, i.e.

\[
w_0 = \arg\max_{w \in \mathcal{A}} \rho_{\mathrm{bsa}}(w). \quad (3)
\]

3. Estimation procedure and inference

This section presents a nonparametric approach for estimating the optimal weights $w_0$ in a clinical application. Let $Z_{ik}$ be the observed continuous score of item group $k$ for subject $i$, and let $Y_i$ denote the observed ordinal disease status for subject $i$. Define $Z_i = (Z_{i1}, Z_{i2}, \ldots, Z_{iM})^T$, and let $X_i(w) \equiv (w_1, \ldots, w_{M-1}, 1 - \sum_{k=1}^{M-1} w_k) Z_i$ be the disease severity score. The observed data can then be summarized as $V_i = (Z_i^T, Y_i)^T$, $i = 1, \ldots, n$, a sample of independent observations of $(Z^T, Y)^T$. Following [14], without loss of generality, we first arrange the observations as follows:

\[
(Z_1^T, Y_1 = 1), \ldots, (Z_{n_1}^T, Y_{n_1} = 1),\ (Z_{n_1+1}^T, Y_{n_1+1} = 2), \ldots, (Z_{n_1+n_2}^T, Y_{n_1+n_2} = 2),\ \ldots,\ (Z_{\sum_{l=1}^{L-1} n_l + 1}^T, Y_{\sum_{l=1}^{L-1} n_l + 1} = L), \ldots, (Z_{\sum_{l=1}^{L} n_l}^T, Y_{\sum_{l=1}^{L} n_l} = L),
\]

where $n_l = \sum_{i=1}^{n} I(Y_i = l)$ and $\sum_{l=1}^{L} n_l = n$.

Define $\tilde{Z}_{s, t_s} = Z_{\sum_{l=0}^{s-1} n_l + t_s}$, $t_s = 1, 2, \ldots, n_s$, $s = 1, 2, \ldots, L$ (with $n_0 = 0$), and

\[
W_n(w) = \Big(\prod_{l=1}^{L} n_l\Big)^{-1} \sum_{j_1=1}^{n_1} \cdots \sum_{j_L=1}^{n_L} \|S - R(\tilde{X}_{1,j_1}(w), \ldots, \tilde{X}_{L,j_L}(w))\|^2,
\]

where $\tilde{X}_{l,j_l}(w) = (w_1, \ldots, w_{M-1}, 1 - \sum_{k=1}^{M-1} w_k) \tilde{Z}_{l,j_l}$. Since $\tilde{Z}_{l,j_l}$ is a realization of $Z^{(l)}$, $l = 1, 2, \ldots, L$, it can be shown that $W_n(w)$ is an unbiased estimator of $E\{\|S - R(X^{(1)}(w), \ldots, X^{(L)}(w))\|^2\}$ in (2). This naturally leads to the estimator of the optimal weights $w_0$:

\[
\hat{w} = \arg\max_{w \in \mathcal{A}} \hat{\rho}_{\mathrm{bsa}}(w), \quad (4)
\]

where $\hat{\rho}_{\mathrm{bsa}}(w) = 1 - W_n(w)/C_L$. Note that direct calculation of $W_n(w)$ involves exhaustive enumeration of the stratified tuples, which can impose a prohibitive computational burden. To circumvent this issue, we write

\[
\|S - R(\tilde{X}_{1,j_1}(w), \ldots, \tilde{X}_{L,j_L}(w))\|^2 = 2\sum_{l=1}^{L} l^2 - 2\sum_{l=1}^{L} \sum_{r=1}^{L} l \cdot I\big(\tilde{X}_{r,j_r}(w) \le \tilde{X}_{l,j_l}(w)\big).
\]

This leads to an equivalent expression of Wn(w),

\[
W_n(w) = \frac{L(L+1)(2L+1)}{3} - L(L+1) - 2\sum_{l=1}^{L} \sum_{\substack{r=1 \\ r \ne l}}^{L} \frac{l}{n_r n_l} \sum_{j_r=1}^{n_r} \sum_{j_l=1}^{n_l} I\{\tilde{X}_{r,j_r}(w) \le \tilde{X}_{l,j_l}(w)\}, \quad (5)
\]

where the term $-L(L+1)$ collects the diagonal $r = l$ self-comparisons, whose indicators equal one.
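As an illustration (our own sketch, not the authors' code), the pairwise form in (5) can be computed directly and cross-checked against the defining stratified average: `W_n` below uses the pairwise sums, while `W_n_direct` enumerates every tuple with one observation per category.

```python
import itertools
import numpy as np

def W_n(scores_by_category):
    # Pairwise-sum form: scores_by_category[l-1] holds the weighted scores
    # X~_{l,.}(w) for category l. The diagonal r = l self-comparisons
    # contribute exactly L(L+1), already subtracted in the constant.
    L = len(scores_by_category)
    const = L * (L + 1) * (2 * L + 1) / 3 - L * (L + 1)
    s = 0.0
    for l, Xl in enumerate(scores_by_category, start=1):
        for r, Xr in enumerate(scores_by_category, start=1):
            if r != l:
                # mean of I{X~_{r,j_r} <= X~_{l,j_l}} over all cross pairs
                s += l * (Xr[None, :] <= Xl[:, None]).mean()
    return const - 2.0 * s

def W_n_direct(scores_by_category):
    # Defining form: average ||S - R(...)||^2 over every stratified tuple,
    # one observation drawn from each category.
    L = len(scores_by_category)
    S = np.arange(1, L + 1)
    dists = []
    for t in itertools.product(*scores_by_category):
        t = np.array(t)
        ranks = (t[None, :] <= t[:, None]).sum(axis=1)
        dists.append(np.sum((S - ranks) ** 2))
    return float(np.mean(dists))

rng = np.random.default_rng(1)
groups = [rng.standard_normal(n) for n in (4, 3, 5)]
agree = bool(np.isclose(W_n(groups), W_n_direct(groups)))

# Perfectly ordered categories give W_n = 0, hence rho_hat = 1 - 0/C_3 = 1.
rho_perfect = 1.0 - W_n([np.array([1.0, 2.0]), np.array([3.0, 4.0]),
                         np.array([5.0, 6.0])]) / 4.0
```

The group sizes and toy values are illustrative assumptions; `agree` confirms the two forms coincide, and `rho_perfect` recovers the perfect-agreement value 1.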

However, $\hat{\rho}_{\mathrm{bsa}}(w)$ is not a continuous function of $w$, since $W_n(w)$ involves indicator functions. To reduce the numerical issues caused by this discontinuity, we propose to use the sigmoid function $s(x) = \{1 + \exp(-x)\}^{-1}$ as an approximation to the indicator functions in $W_n(w)$. One problem with using $s(x)$ in place of the indicator function is that the approximation is poor when $x$ is close to 0. As a remedy, we use the family of functions $s_n(x) = \{1 + \exp(-x/\sigma_n)\}^{-1}$ to approximate the indicator function $I(x \ge 0)$ in $W_n(w)$, where $\sigma_n$ is a sequence of strictly positive and decreasing numbers satisfying $\lim_{n \to \infty} \sigma_n = 0$ [9,15,16,20]. We thus obtain a smooth version of $W_n(w)$:

\[
W_{n,s}(w) = \frac{L(L+1)(2L+1)}{3} - L(L+1) - 2\sum_{l=1}^{L} \sum_{\substack{r=1 \\ r \ne l}}^{L} \frac{l}{n_r n_l} \sum_{j_r=1}^{n_r} \sum_{j_l=1}^{n_l} s_n\big(\tilde{X}_{l,j_l}(w) - \tilde{X}_{r,j_r}(w)\big).
\]

Replacing $W_n(w)$ by $W_{n,s}(w)$ in $\hat{\rho}_{\mathrm{bsa}}(w)$, we obtain a smooth estimator of the BSA between $X(w)$ and $Y$:

\[
\hat{\rho}_{\mathrm{bsa},s}(w) = 1 - \frac{W_{n,s}(w)}{C_L}. \quad (6)
\]

The corresponding estimator of optimal weights based on the smooth BSA is given by:

\[
\hat{w}_s = \arg\max_{w \in \mathcal{A}} \hat{\rho}_{\mathrm{bsa},s}(w). \quad (7)
\]

The maximization in (7) involves the tuning parameter $\sigma_n$. In the simulation studies and real data analysis below, we adopt a rule of thumb for choosing $\sigma_n$ [20]. Specifically, we first initialize $\sigma_n^{(0)} = \sigma_n^{*}$, where $\sigma_n^{*}$ is user-specified and data-independent, satisfying $\sigma_n^{*} \to 0$ as $n \to \infty$; we use $\sigma_n^{*} = 1/n$ in our data analysis. Next, we calculate $\hat{w}$ by maximizing (6) with $\sigma_n^{(0)}$. Denote by $\sigma_n^{(1)}$ the largest constant such that 95% of the values $|(X_i(\hat{w}) - X_j(\hat{w}))/\sigma_n^{(1)}|$ exceed 5. Lastly, set $\sigma_n = \min(\sigma_n^{(0)}, \sigma_n^{(1)})$. In addition, the objective function $\hat{\rho}_{\mathrm{bsa},s}(w)$ is differentiable with respect to $w$, so the maximization in (7) can be solved efficiently by many existing algorithms, such as the Newton-Raphson algorithm, and can be carried out directly with the R function optim().
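A minimal sketch of the smooth estimation step, under illustrative assumptions: the toy data (two items, three categories, item 1 informative and item 2 pure noise), the sample sizes, and the fixed $\sigma$ value are ours, and a one-dimensional grid search stands in for optim(), since with $M = 2$ there is a single free weight.

```python
import numpy as np

def stable_sigmoid(t):
    # s_n with the argument already divided by sigma; clipped to avoid overflow
    return 1.0 / (1.0 + np.exp(-np.clip(t, -500.0, 500.0)))

def smooth_W(groups, w_free, sigma):
    # groups[l] is an (n_l, M) array of item scores for category l+1;
    # w_free holds the first M-1 weights, the Mth weight is 1 - sum(w_free)
    L = len(groups)
    w = np.append(w_free, 1.0 - np.sum(w_free))
    X = [g @ w for g in groups]                       # weighted scores per category
    const = L * (L + 1) * (2 * L + 1) / 3 - L * (L + 1)
    s = 0.0
    for l in range(L):
        for r in range(L):
            if r != l:
                diff = X[l][:, None] - X[r][None, :]  # X~_l - X~_r over all pairs
                s += (l + 1) * stable_sigmoid(diff / sigma).mean()
    return const - 2.0 * s

def smooth_bsa(groups, w_free, sigma):
    L = len(groups)
    return 1.0 - smooth_W(groups, w_free, sigma) / ((L**3 - L) / 6)

# Toy data: item 1 tracks the category (small noise), item 2 is pure noise.
rng = np.random.default_rng(0)
groups = [np.column_stack([l + 0.1 * rng.standard_normal(50),
                           rng.standard_normal(50)]) for l in (1, 2, 3)]

grid = np.linspace(0.0, 1.0, 101)
w_hat = grid[np.argmax([smooth_bsa(groups, np.array([g]), 0.01) for g in grid])]
```

As expected, the estimated weight concentrates on the informative item, and the smooth BSA at the optimum is close to 1.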

We investigate the asymptotic properties of $\hat{w}_s$ and summarize its consistency and asymptotic normality in Theorems 3.1 and 3.2. We first introduce some notation and regularity conditions. Let $\Omega_{n,L} = \{(j_1, \ldots, j_L) : 1 \le j_l \le n, \ j_1, \ldots, j_L \text{ distinct}\}$, $p_l = \Pr(Y = l)$ for $l = 1, \ldots, L$, and $\gamma_L = 2/(C_L L!)$. For $(m_1, \ldots, m_L) \in \Omega_{n,L}$, define

\[
\Psi(V_{m_1}, \ldots, V_{m_L}, w) = I\big((Y_{m_1}, \ldots, Y_{m_L}) \in \Theta_L\big) \times \sum_{k=1}^{L} \Big\{Y_{m_k}^2 - Y_{m_k} \sum_{r=1}^{L} I\big(X_{m_r}(w) \le X_{m_k}(w)\big)\Big\},
\]
\[
h(V_{m_1}, \ldots, V_{m_L}, w) = 1 - \Psi(V_{m_1}, \ldots, V_{m_L}, w)\, \gamma_L \Big(\prod_{l=1}^{L} p_l\Big)^{-1},
\]

where $\Theta_L$ denotes the set of all $L!$ permutations of $\{1, 2, \ldots, L\}$. For each $v \in \mathcal{T}$ and each $w \in \mathcal{A}$, define

\[
\tau(v, w) = E h(v, V_2, \ldots, V_L, w) + E h(V_1, v, V_3, \ldots, V_L, w) + \cdots + E h(V_1, V_2, \ldots, V_{L-1}, v, w).
\]

Let $\nabla_m$ denote the $m$th-order partial derivative operator with respect to $w$. The following regularity conditions are needed to derive the asymptotic properties of $\hat{w}_s$:

  • (C1) $p_l > 0$ for $l = 1, \ldots, L$.

  • (C2) $w_0$ is an interior point of the set $\mathcal{A}$ and is the unique maximizer of $\rho_{\mathrm{bsa}}(w)$.

  • (C3) The support of $Z$ is not contained in any proper linear subspace of $\mathbb{R}^M$.

  • (C4) The tuning parameter $\sigma_n$ satisfies $n\sigma_n^2 \to 0$ as $n \to \infty$.

  • (C5) Let $\mathcal{N}$ denote a neighborhood of $w_0$.

    • (i) For each $v \in \mathcal{T}$, all mixed second partial derivatives of $\tau(v, w)$ with respect to $w$ exist on $\mathcal{N}$, and the partial derivatives of the conditional density of $Z$ given $X(w_0)$ exist and are bounded.

    • (ii) For all $v \in \mathcal{T}$ and $w \in \mathcal{A}$, there exists an integrable function $M(v)$ such that $\|\nabla_2 \tau(v, w) - \nabla_2 \tau(v, w_0)\| \le M(v) \|w - w_0\|$.

    • (iii) $E\|\nabla_1 \tau(V, w_0)\|^2 < \infty$, where $\|\cdot\|$ denotes the matrix norm $\|(c_{ij})\| = (\sum_{i,j} c_{ij}^2)^{1/2}$.

    • (iv) $E|\nabla_2|\tau(V, w_0) < \infty$, where $|\nabla_2|\tau(v, w) = \sum_{i_1, i_2} |\partial^2 \tau(v, w)/\partial w_{i_1} \partial w_{i_2}|$.

    • (v) The matrix $E[\nabla_2 \tau(V, w_0)]$ is negative definite.

The conditions above are commonly adopted in the literature [5,7,18–20] and ensure the consistency and asymptotic normality of $\hat{w}_s$, which we establish in the following two theorems.

Theorem 3.1

Assume that the regularity conditions (C1)–(C4) hold. Then

\[
\hat{w}_s \to_p w_0.
\]

Theorem 3.2

Assume that the regularity conditions (C1)–(C5) hold. Then

\[
\sqrt{n}(\hat{w}_s - w_0) \to_d N\big(0, [\dot{\Sigma}(w_0)]^{-1} \Sigma(w_0) [\dot{\Sigma}(w_0)]^{-1}\big),
\]

where '$\to_d$' denotes convergence in distribution, $\dot{\Sigma}(w_0) = \frac{1}{L} E[\nabla_2 \tau(V, w_0)]$, and $\Sigma(w_0) = E\big\{\nabla_1 \tau(V, w_0) [\nabla_1 \tau(V, w_0)]^T\big\}$.

As suggested by a referee, we next give the asymptotic distribution of the estimated agreement parameter $\hat{\rho}_{\mathrm{bsa},s}(\hat{w}_s)$.

Theorem 3.3

Assume that the regularity conditions (C1)–(C5) hold. Then

\[
\sqrt{n}\{\hat{\rho}_{\mathrm{bsa},s}(\hat{w}_s) - \rho_{\mathrm{bsa}}(w_0)\} \to_d N\big(0, [\nabla_1 \rho_{\mathrm{bsa}}(w_0)]^T [\dot{\Sigma}(w_0)]^{-1} \Sigma(w_0) [\dot{\Sigma}(w_0)]^{-1} [\nabla_1 \rho_{\mathrm{bsa}}(w_0)]\big).
\]

The proofs of Theorems 3.1–3.3 are provided in the Appendix. Because the sandwich matrix $[\dot{\Sigma}(w_0)]^{-1} \Sigma(w_0) [\dot{\Sigma}(w_0)]^{-1}$ is complicated, we propose estimating the asymptotic variance of $\hat{w}_s$ by the jackknife method. The jackknife variance estimator of $\hat{w}_s$ is given by:

\[
\frac{n-1}{n} \sum_{i=1}^{n} \Big(\hat{w}_s^{(-i)} - n^{-1} \sum_{k=1}^{n} \hat{w}_s^{(-k)}\Big) \Big(\hat{w}_s^{(-i)} - n^{-1} \sum_{k=1}^{n} \hat{w}_s^{(-k)}\Big)^T, \quad (8)
\]

where $\hat{w}_s^{(-i)}$ denotes the estimator of $w_0$ based on the original data excluding $V_i$.
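The jackknife recipe in (8) can be sketched generically. The estimator plugged in below (a simple mean) is our stand-in assumption so the formula can be checked against a closed form; in the paper's setting one would pass the smooth BSA weight estimator instead.

```python
import numpy as np

def jackknife_cov(data, estimator):
    # data: (n, ...) array of observations V_i; estimator maps a data array
    # to a parameter vector (standing in for w_hat_s on the reduced data)
    n = len(data)
    leave_one_out = np.array([estimator(np.delete(data, i, axis=0))
                              for i in range(n)])
    centered = leave_one_out - leave_one_out.mean(axis=0)
    return (n - 1) / n * centered.T @ centered

# Sanity check with the sample mean: its jackknife variance has the
# closed form s^2 / n, where s^2 is the sample variance.
data = np.arange(10.0).reshape(-1, 1)
cov = jackknife_cov(data, lambda d: np.array([d.mean()]))
```

For the mean, the leave-one-out estimates satisfy the classical identity, so `cov[0, 0]` matches the sample variance divided by $n$.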

4. Simulation studies

We conduct simulation studies to examine the performance of our proposed method. We consider the following three scenarios.

For the first scenario, $Y$ is an ordinal random variable taking values $1, 2, \ldots, L$ with equal probabilities. Let $1_M$ denote the $M \times 1$ vector with all elements equal to 1, and let $\Sigma = (\sigma_{ij})_{M \times M}$ denote the $M \times M$ matrix with $\sigma_{ij} = \rho^{|i-j|}$ if $i \ne j$ and $\sigma_{ij} = 1$ otherwise. The conditional distribution of $Z_i = (Z_{i1}, Z_{i2}, \ldots, Z_{iM})^T$ given $Y = l$ is specified in two cases: the multivariate normal distribution $N_M(l 1_M, \Sigma)$, or the multivariate $t$ distribution $t_5(l 1_M, \Sigma)$. We set $L = 3$, $M = 2$ or 4, and $\rho = 0$, 0.3, or 0.7. Note that $\rho = 0$ means that the components of $Z_i$ are independent, and as $\rho$ grows, the correlation between the components of $Z_i$ becomes stronger. The simulation results based on 1000 replications are presented in Table 1, where $w_0$ denotes the true weight, and BIAS, SE, ASE, and CP denote the empirical bias, the empirical standard deviation, the average estimated standard deviation, and the empirical coverage probability of the 95% confidence interval, respectively. The results show that the proposed approach yields very small empirical biases in all cases. The SEs and the corresponding ASEs are close for all sample sizes, and the CPs are close to the nominal value of 95%. Moreover, the different values of $\rho$ and $L$ have no obvious impact on the performance.

Table 1.

Simulation results for the first scenario.

      Normal t
ρ M n Est w0 BIAS SE ASE CP w0 BIAS SE ASE CP
0 2 200 w^1 0.500 −0.001 0.052 0.055 96.2% 0.500 0.000 0.067 0.075 95.1%
    w^2 0.500 0.001 0.052 0.055 96.1% 0.500 0.000 0.067 0.075 95.4%
  400 w^1 0.500 −0.000 0.035 0.037 96.7% 0.500 0.002 0.048 0.050 93.2%
    w^2 0.500 0.000 0.035 0.037 96.4% 0.500 −0.002 0.048 0.050 95.3%
  4 200 w^1 0.250 −0.002 0.040 0.043 97.4% 0.250 −0.002 0.052 0.058 96.0%
    w^2 0.250 0.000 0.041 0.043 97.6% 0.250 −0.002 0.052 0.058 97.2%
    w^3 0.250 −0.000 0.039 0.043 97.4% 0.250 0.004 0.051 0.058 96.7%
    w^4 0.250 0.002 0.041 0.043 97.1% 0.250 0.000 0.054 0.057 97.1%
  400 w^1 0.250 0.000 0.028 0.029 96.7% 0.250 −0.002 0.038 0.041 97.2%
    w^2 0.250 −0.000 0.028 0.030 97.8% 0.250 0.006 0.038 0.041 95.1%
    w^3 0.250 0.000 0.029 0.029 97.2% 0.250 −0.001 0.037 0.040 98.2%
    w^4 0.250 −0.000 0.029 0.029 97.5% 0.250 −0.003 0.036 0.040 96.1%
0.3 2 200 w^1 0.500 −0.003 0.077 0.079 95.9% 0.500 0.000 0.098 0.107 94.2%
    w^2 0.500 0.003 0.077 0.079 95.0% 0.500 0.000 0.098 0.107 94.5%
  400 w^1 0.500 0.001 0.051 0.054 96.1% 0.500 −0.007 0.064 0.075 96.0%
    w^2 0.500 −0.001 0.051 0.054 96.4% 0.500 0.007 0.064 0.075 95.0%
  4 200 w^1 0.294 0.001 0.058 0.061 96.2% 0.294 −0.001 0.073 0.088 96.3%
    w^2 0.206 −0.003 0.064 0.069 97.4% 0.206 0.001 0.08 0.097 96.8%
    w^3 0.206 −0.000 0.066 0.068 97.0% 0.206 0.004 0.082 0.096 96.2%
    w^4 0.294 0.002 0.059 0.061 95.9% 0.294 −0.004 0.073 0.088 96.4%
  400 w^1 0.294 −0.000 0.039 0.042 97.6% 0.294 0.000 0.050 0.058 97.8%
    w^2 0.206 0.001 0.045 0.046 96.8% 0.206 0.000 0.060 0.064 96.6%
    w^3 0.206 −0.001 0.044 0.046 97.1% 0.206 0.002 0.059 0.064 95.1%
    w^4 0.294 0.001 0.039 0.042 97.1% 0.294 −0.002 0.051 0.059 96.6%
0.7 2 200 w^1 0.500 −0.009 0.141 0.152 95.8% 0.500 −0.005 0.179 0.198 93.3%
    w^2 0.500 0.009 0.141 0.152 93.7% 0.500 0.005 0.179 0.198 93.6%
  400 w^1 0.500 0.002 0.098 0.104 95.6% 0.500 −0.005 0.120 0.132 94.7%
    w^2 0.500 −0.002 0.098 0.104 95.9% 0.500 0.005 0.120 0.132 93.4%
  4 200 w^1 0.385 0.002 0.119 0.134 97.2% 0.385 0.003 0.152 0.185 95.5%
    w^2 0.115 0.001 0.153 0.172 96.7% 0.115 −0.002 0.191 0.235 96.3%
    w^3 0.115 −0.003 0.157 0.175 95.8% 0.115 −0.008 0.193 0.235 97.2%
    w^4 0.385 −0.000 0.117 0.135 97.2% 0.385 0.007 0.146 0.189 95.8%
  400 w^1 0.385 −0.002 0.082 0.088 97.0% 0.385 0.005 0.105 0.132 96.2%
    w^2 0.115 0.006 0.108 0.114 95.4% 0.115 −0.005 0.133 0.172 95.7%
    w^3 0.115 −0.003 0.103 0.115 97.6% 0.115 0.001 0.137 0.176 94.9%
    w^4 0.385 −0.001 0.082 0.088 97.5% 0.385 −0.001 0.109 0.136 95.4%

For the second scenario, $Y$ is an ordinal random variable taking values $1, 2, \ldots, L$ with unequal probabilities $1/\sum_{l=1}^{L} l$, $2/\sum_{l=1}^{L} l$, \ldots, $L/\sum_{l=1}^{L} l$, respectively. The other configurations are the same as in the first scenario. The simulation results are reported in Table 2. Again, we find that the BIASes are very small, the SEs and the corresponding ASEs are close, and the CPs are close to the nominal value of 95%.

Table 2.

Simulation results for the second scenario.

      Normal t
ρ M n Est w0 BIAS SE ASE CP w0 BIAS SE ASE CP
0 2 200 w^1 0.500 −0.001 0.061 0.061 96.1% 0.500 −0.003 0.073 0.086 96.4%
    w^2 0.500 0.001 0.061 0.061 95.7% 0.500 0.003 0.073 0.086 96.2%
  400 w^1 0.500 −0.000 0.042 0.042 96.0% 0.500 0.000 0.052 0.056 94.6%
    w^2 0.500 0.000 0.042 0.042 96.3% 0.500 0.000 0.052 0.056 95.6%
  4 200 w^1 0.250 0.000 0.044 0.048 97.5% 0.250 0.000 0.060 0.068 96.1%
    w^2 0.250 −0.000 0.044 0.047 97.2% 0.250 −0.001 0.059 0.067 96.9%
    w^3 0.250 −0.000 0.046 0.047 96.7% 0.250 0.001 0.058 0.068 97.6%
    w^4 0.250 0.000 0.045 0.048 97.5% 0.250 0.000 0.057 0.068 96.6%
  400 w^1 0.250 −0.000 0.031 0.032 98.0% 0.250 0.000 0.042 0.046 96.7%
    w^2 0.250 −0.000 0.031 0.032 96.9% 0.250 −0.003 0.042 0.046 98.2%
    w^3 0.250 0.001 0.031 0.032 97.3% 0.250 0.003 0.040 0.045 96.7%
    w^4 0.250 0.000 0.032 0.032 96.6% 0.250 0.000 0.042 0.045 96.9%
0.3 2 200 w^1 0.500 −0.003 0.087 0.090 96.1% 0.500 −0.007 0.108 0.122 94.5%
    w^2 0.500 0.003 0.087 0.090 95.2% 0.500 0.007 0.108 0.122 94.7%
  400 w^1 0.500 0.000 0.058 0.061 96.6% 0.500 −0.003 0.076 0.088 94.4%
    w^2 0.500 0.000 0.058 0.061 96.3% 0.500 0.003 0.076 0.088 94.6%
  4 200 w^1 0.294 −0.003 0.063 0.068 97.5% 0.294 0.000 0.084 0.097 96.8%
    w^2 0.206 0.002 0.073 0.077 96.3% 0.206 0.000 0.091 0.108 96.2%
    w^3 0.206 0.002 0.074 0.077 96.8% 0.206 0.001 0.092 0.107 96.5%
    w^4 0.294 −0.000 0.062 0.069 97.6% 0.294 −0.001 0.084 0.098 96.4%
  400 w^1 0.294 −0.003 0.044 0.047 97.3% 0.294 0.007 0.056 0.068 97.0%
    w^2 0.206 0.001 0.049 0.052 97.6% 0.206 −0.003 0.062 0.073 97.3%
    w^3 0.206 0.001 0.050 0.052 96.5% 0.206 −0.001 0.063 0.074 97.0%
    w^4 0.294 0.002 0.044 0.046 95.4% 0.294 −0.003 0.056 0.066 97.7%
0.7 2 200 w^1 0.500 −0.005 0.155 0.178 95.3% 0.500 0.010 0.193 0.229 94.9%
    w^2 0.500 0.005 0.155 0.178 95.0% 0.500 −0.010 0.193 0.229 95.4%
  400 w^1 0.500 0.002 0.110 0.115 95.1% 0.500 0.003 0.138 0.159 95.4%
    w^2 0.500 −0.002 0.110 0.115 95.7% 0.500 −0.003 0.138 0.159 94.7%
  4 200 w^1 0.385 0.008 0.128 0.160 97.4% 0.385 0.005 0.164 0.218 95.9%
    w^2 0.115 −0.014 0.180 0.206 97.9% 0.115 −0.008 0.210 0.270 97.7%
    w^3 0.115 0.003 0.183 0.206 95.7% 0.115 0.007 0.207 0.267 96.2%
    w^4 0.385 0.004 0.132 0.161 97.1% 0.385 −0.004 0.169 0.214 96.5%
  400 w^1 0.385 0.001 0.093 0.103 97.2% 0.385 0.016 0.112 0.150 95.5%
    w^2 0.115 −0.007 0.115 0.134 96.5% 0.115 −0.014 0.141 0.189 97.0%
    w^3 0.115 0.006 0.121 0.134 96.0% 0.115 0.006 0.150 0.193 97.3%
    w^4 0.385 0.001 0.095 0.102 96.8% 0.385 −0.009 0.113 0.152 97.9%

In the last scenario, we consider a heteroscedastic case. The settings are the same as in the first and second scenarios except that $\Sigma = (\sigma_{ij})_{M \times M}$ has diagonal elements $\sigma_{ii} = i$ and off-diagonal elements $\sigma_{ij} = \rho^{|i-j|}$ for $i \ne j$. That is, when $k \ne l$, the variables $Z_{ik}$ and $Z_{il}$ have different variances. The simulation results are presented in Tables 3 and 4: Table 3 covers the case where $Y$ takes values $1, 2, \ldots, L$ with equal probabilities, and Table 4 the unequal-probabilities case. As in the first two scenarios, we consistently observe good performance of the proposed method in the heteroscedastic case. In summary, the results indicate the broad utility of the proposed approach.

Table 3.

Simulation results for the third scenario with $Y$ taking values $1, 2, \ldots, L$ with equal probabilities.

    Normal t
ρ M n Est w0 BIAS SE ASE CP w0 BIAS SE ASE CP
0 2 200 w^1 0.667 −0.001 0.054 0.055 96.7% 0.667 0.000 0.066 0.076 94.8%
    w^2 0.333 0.001 0.054 0.055 94.6% 0.333 0.000 0.066 0.076 95.1%
  400 w^1 0.667 −0.000 0.035 0.037 95.9% 0.667 −0.001 0.046 0.051 94.5%
    w^2 0.333 0.000 0.035 0.037 96.2% 0.333 0.001 0.046 0.051 93.5%
  4 200 w^1 0.480 0.002 0.050 0.056 97.0% 0.480 −0.002 0.063 0.082 97.0%
    w^2 0.240 0.002 0.044 0.047 97.0% 0.240 0.002 0.056 0.071 96.0%
    w^3 0.160 −0.003 0.037 0.041 98.1% 0.160 0.000 0.048 0.061 97.1%
    w^4 0.120 −0.001 0.034 0.036 96.9% 0.120 0.000 0.043 0.055 95.8%
  400 w^1 0.480 −0.000 0.035 0.037 96.7% 0.480 −0.004 0.046 0.053 97.2%
    w^2 0.240 −0.000 0.030 0.032 97.0% 0.240 0.002 0.036 0.046 96.4%
    w^3 0.160 0.000 0.027 0.028 95.9% 0.160 0.001 0.034 0.040 94.7%
    w^4 0.120 −0.000 0.024 0.024 96.0% 0.120 0.001 0.03 0.035 95.5%
0.3 2 200 w^1 0.708 −0.000 0.067 0.072 95.2% 0.708 0.000 0.082 0.084 97.0%
    w^2 0.292 0.000 0.067 0.072 95.2% 0.292 0.000 0.082 0.084 96.8%
  400 w^1 0.708 0.001 0.047 0.048 96.4% 0.708 0.001 0.056 0.058 96.9%
    w^2 0.292 −0.001 0.047 0.048 95.7% 0.292 −0.001 0.056 0.058 97.9%
  4 200 w^1 0.525 0.004 0.064 0.072 96.1% 0.525 −0.002 0.081 0.086 97.2%
    w^2 0.193 −0.003 0.058 0.064 96.3% 0.193 0.001 0.069 0.076 97.2%
    w^3 0.152 −0.002 0.046 0.052 97.5% 0.152 0.001 0.061 0.063 96.0%
    w^4 0.131 0.001 0.040 0.044 96.3% 0.131 −0.002 0.049 0.052 96.8%
  400 w^1 0.525 0.000 0.044 0.048 97.1% 0.525 −0.003 0.055 0.059 97.0%
    w^2 0.193 −0.000 0.038 0.043 97.7% 0.193 0.003 0.049 0.053 96.1%
    w^3 0.152 −0.000 0.030 0.034 96.5% 0.152 −0.001 0.039 0.043 97.2%
    w^4 0.131 0.000 0.028 0.029 96.7% 0.131 0.000 0.034 0.036 96.7%
0.7 2 200 w^1 0.812 −0.004 0.090 0.102 95.7% 0.812 0.004 0.120 0.125 95.6%
    w^2 0.188 0.004 0.090 0.102 96.0% 0.188 −0.004 0.120 0.125 97.0%
  400 w^1 0.812 −0.001 0.065 0.066 95.9% 0.812 0.003 0.080 0.083 96.0%
    w^2 0.188 0.001 0.065 0.066 94.2% 0.188 −0.003 0.080 0.083 96.4%
  4 200 w^1 0.650 −0.006 0.095 0.116 96.9% 0.650 0.006 0.126 0.134 96.6%
    w^2 0.118 0.004 0.084 0.101 96.1% 0.118 -0.006 0.115 0.121 97.8%
    w^3 0.115 −0.002 0.065 0.076 95.6% 0.115 −0.004 0.085 0.089 97.2%
    w^4 0.117 0.003 0.053 0.062 95.5% 0.117 0.004 0.066 0.073 96.2%
  400 w^1 0.650 −0.002 0.069 0.077 96.1% 0.650 −0.004 0.082 0.084 96.7%
    w^2 0.118 0.001 0.059 0.069 95.9% 0.118 0.005 0.078 0.079 97.2%
    w^3 0.115 0.001 0.045 0.050 95.7% 0.115 0.003 0.057 0.057 97.7%
    w^4 0.117 −0.000 0.038 0.042 96.5% 0.117 −0.004 0.048 0.049 96.6%

Table 4.

Simulation results for the third scenario with $Y$ taking values $1, 2, \ldots, L$ with unequal probabilities.

    Normal t
ρ M n Est w0 BIAS SE ASE CP w0 BIAS SE ASE CP
0 2 200 w^1 0.667 −0.000 0.060 0.065 95.4% 0.667 0.001 0.074 0.078 97.2%
    w^2 0.333 0.000 0.060 0.065 94.8% 0.333 −0.001 0.074 0.078 96.8%
  400 w^1 0.667 0.000 0.041 0.043 96.0% 0.667 −0.002 0.052 0.052 95.2%
    w^2 0.333 −0.000 0.041 0.043 96.2% 0.333 0.002 0.052 0.052 96.8%
  4 200 w^1 0.480 −0.000 0.058 0.065 96.7% 0.480 −0.002 0.076 0.078 97.4%
    w^2 0.240 0.000 0.051 0.055 97.1% 0.240 −0.004 0.061 0.067 97.0%
    w^3 0.160 0.000 0.044 0.047 96.8% 0.160 0.002 0.055 0.057 97.0%
    w^4 0.120 −0.000 0.039 0.041 96.9% 0.120 0.004 0.052 0.050 96.0%
  400 w^1 0.480 −0.001 0.041 0.043 97.3% 0.480 0.004 0.052 0.053 97.0%
    w^2 0.240 0.002 0.034 0.037 96.7% 0.240 −0.003 0.042 0.046 97.6%
    w^3 0.160 −0.001 0.030 0.033 97.0% 0.160 −0.001 0.036 0.039 97.5%
    w^4 0.120 −0.000 0.027 0.028 97.1% 0.120 0.001 0.033 0.034 96.7%
0.3 2 200 w^1 0.708 0.001 0.073 0.087 96.6% 0.708 0.001 0.096 0.099 96.8%
    w^2 0.292 −0.000 0.070 0.082 96.5% 0.292 −0.001 0.096 0.099 96.4%
  400 w^1 0.708 −0.000 0.054 0.063 96.9% 0.708 −0.004 0.064 0.067 97.6%
    w^2 0.292 0.000 0.053 0.056 95.6% 0.292 0.004 0.064 0.067 96.2%
  4 200 w^1 0.525 0.001 0.071 0.083 96.7% 0.525 0.000 0.092 0.099 96.8%
    w^2 0.193 −0.001 0.067 0.075 96.9% 0.193 −0.001 0.080 0.089 97.6%
    w^3 0.152 0.001 0.052 0.059 96.4% 0.152 0.001 0.065 0.071 97.2%
    w^4 0.131 −0.002 0.047 0.050 96.5% 0.131 −0.001 0.055 0.059 97.8%
  400 w^1 0.525 −0.001 0.049 0.055 96.9% 0.525 0.001 0.068 0.065 96.8%
    w^2 0.193 0.001 0.044 0.049 97.5% 0.193 −0.001 0.059 0.058 96.0%
    w^3 0.152 −0.001 0.036 0.039 96.5% 0.152 0.001 0.044 0.048 97.2%
    w^4 0.131 0.000 0.030 0.033 97.4% 0.131 −0.002 0.038 0.040 97.4%
0.7 2 200 w^1 0.812 −0.002 0.108 0.120 95.0% 0.812 0.000 0.135 0.138 96.8%
    w^2 0.188 0.002 0.108 0.120 95.8% 0.188 0.000 0.135 0.138 96.6%
  400 w^1 0.812 −0.001 0.072 0.080 95.8% 0.812 −0.007 0.090 0.092 96.8%
    w^2 0.188 0.001 0.072 0.080 95.4% 0.188 0.007 0.090 0.092 95.2%
  4 200 w^1 0.650 0.000 0.116 0.139 96.9% 0.650 0.000 0.141 0.154 96.4%
    w^2 0.118 −0.006 0.101 0.122 96.9% 0.118 0.000 0.123 0.135 96.8%
    w^3 0.115 0.004 0.078 0.092 95.8% 0.115 0.001 0.090 0.099 96.8%
    w^4 0.117 0.002 0.062 0.075 96.4% 0.117 −0.001 0.074 0.080 97.0%
  400 w^1 0.650 0.003 0.082 0.092 96.7% 0.650 0.001 0.105 0.103 96.8%
    w^2 0.118 −0.004 0.073 0.078 96.3% 0.118 0.000 0.089 0.09 96.0%
    w^3 0.115 0.001 0.050 0.058 96.0% 0.115 0.001 0.062 0.069 97.2%
    w^4 0.117 −0.001 0.043 0.049 95.8% 0.117 −0.002 0.051 0.055 97.2%

5. Data example

We apply our method to a cross-sectional Grady trauma study of PTSD. This research was approved by the institutional review board for the Trauma-Related Health Sequelae study at Grady Memorial Hospital, Emory University (IRB numbers: 00002114, 00078593). The current PTSD diagnosis ($Y$) is obtained from the CAPS (0 = PTSD negative, 1 = PTSD positive), considered the gold standard for PTSD diagnosis. Data collected from the 17-item self-reported version of the PSS are used in our analysis. The self-reported PSS consists of four subscales: PSS intrusive ($Z_1$), PSS avoidance ($Z_2$), PSS negative ($Z_3$), and PSS hyperarousal ($Z_4$). Specifically, PSS intrusive and PSS negative each include four items with a subtotal score ranging from 0 to 12, PSS avoidance consists of three items with a subtotal score ranging from 0 to 9, and PSS hyperarousal includes six items with a subtotal score ranging from 0 to 18. A total of 942 participants with records on all variables (CAPS and the four PSS subscales) are included in the analysis. Among these participants, 433 (46.0%) have a positive PTSD diagnosis.

Our goal in this analysis is to develop a new score based on the four PSS subscales with the highest agreement with the current PTSD diagnosis. To make the contributions of the four PSS scores comparable, we standardize them using the Z-score approach: each scaled PSS score is calculated by subtracting the mean from the original score and then dividing by the original score's standard deviation. As a preliminary step, we present descriptive statistics of the four PSS subscales (intrusive, avoidance, negative, and hyperarousal) for the positive vs. negative PTSD groups. As seen in Figure 1, the PSS subscales differ between those with and without PTSD.
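The standardization step is a one-liner; as a small illustration (ours, not the paper's code), we use the sample (ddof = 1) standard deviation, which is an assumption since the text does not specify the divisor.

```python
import numpy as np

def z_score(x):
    # Standardize a subscale: subtract the sample mean, divide by the sample SD
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

# Hypothetical subtotal scores for one subscale
scaled = z_score([3.0, 6.0, 9.0, 12.0])
```

After scaling, each subscale has mean 0 and unit standard deviation, so the estimated weights are comparable across subscales.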

Figure 1.

The box plot of PSS intrusive, PSS avoidance, PSS negative, and PSS hyperarousal for the positive PTSD group ('1') vs. the negative PTSD group ('0').

Next, we apply our method to the four PSS subscales and estimate the optimal weights by (7). As shown in Table 5, the estimated optimal weights for PSS intrusive, PSS avoidance, PSS negative, and PSS hyperarousal are $\hat{w}_s = (0.2281, 0.1199, 0.0991, 0.5529)^T$, and the corresponding broad sense agreement is $\hat{\rho}_{\mathrm{bsa},s}(\hat{w}_s) = 0.4895$. The weight estimates suggest a dominant role of PSS hyperarousal in contributing to the PTSD diagnosis.

Table 5.

Optimal weights estimation of four PSS subscales for a cross-sectional Grady trauma study of PTSD.

PSS subscale w^s SE CI
PSS intrusive 0.2281 0.1162 (0.0004, 0.4557)
PSS avoidance 0.1199 0.0936 (0.0000, 0.3034)
PSS negative 0.0991 0.0946 (0.0000, 0.2845)
PSS hyperarousal 0.5529 0.1040 (0.3668, 0.7567)

Note: w^s: the estimated weights, SE: the standard error of w^s, CI: 95% confidence interval.

We also calculate the BSA between the CAPS-based PTSD diagnosis and the simple average of the four PSS subscales; the resulting broad sense agreement is 0.4772, which is smaller than $\hat{\rho}_{\mathrm{bsa},s}(\hat{w}_s) = 0.4895$. We further test the hypothesis $H_0: w = (1/4, 1/4, 1/4, 1/4)^T$ vs. $H_1: w \ne (1/4, 1/4, 1/4, 1/4)^T$. The test yields a P-value of 0.0013, which suggests that equal weighting across the four subscales may not be appropriate.

To gain further insight, we fit a logistic regression predicting the current PTSD diagnosis from PSS intrusive, PSS avoidance, PSS negative, and PSS hyperarousal. The results show that PSS intrusive (estimate = 0.2117, se = 0.1046, p-value = 0.043) and PSS hyperarousal (estimate = 0.6710, se = 0.1062, p-value < 0.001) are significantly associated with PTSD diagnosis, while the associations of PSS avoidance (estimate = 0.1150, se = 0.0967, p-value = 0.234) and PSS negative (estimate = 0.1545, se = 0.1018, p-value = 0.129) with PTSD diagnosis are not significant. This is consistent with the weight estimates, which indicate relatively minor contributions of PSS avoidance and PSS negative to the PTSD diagnosis.

6. Concluding remarks

In this article, we propose a new method to derive a weighted score that achieves the highest agreement between the score and the disease status. The proposed method offers an objective, scientific way of constructing a new score that best aligns with the disease status. The weights determined by the proposed method indicate the strength of each item subscale's contribution to the overall score. For a new patient, rather than totaling the scores of the individual items, one can use the weighted score to better represent the severity of the disease status.

A valuable by-product of our method is a test of the hypothesis that the contribution of an item to the score is zero, or that the contributions of different items are equal. In practice, the computation becomes intensive when too many items exist. In this case, we recommend grouping items based on clinical guidance to reduce the number of item groups M. Another important question is the number of subjects required to estimate the weighted score when the ordinal disease severity is available. We note that the estimated weights asymptotically follow a multivariate normal distribution centered at the true weights with some variance-covariance matrix. Calculating the sample size for the weights is complicated because the estimated weights are correlated and the estimated variance cannot be written explicitly. Simulation studies offer better insight into the confidence interval widths of the weights attainable at specified sample sizes, which may interest future studies.
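As a rough illustration of the sample-size point, if one treats a single weight estimate as approximately normal with a standard error scaling like $1/\sqrt{n}$, a back-of-the-envelope calculation gives the sample size needed for a target confidence-interval width. All numbers below are hypothetical, and this ignores the correlation among weights discussed above:

```python
import math

def required_n(se0, n0, target_width, z=1.959964):
    """Approximate sample size so that a 95% normal-theory CI for a
    weight attains the target width, assuming the standard error
    scales like 1/sqrt(n). se0 is the SE observed at sample size n0."""
    width0 = 2 * z * se0
    return math.ceil(n0 * (width0 / target_width) ** 2)

# Hypothetical example: SE of 0.10 observed at n0 = 300 subjects;
# halving the CI width requires roughly four times the sample size.
print(required_n(0.10, 300, 2 * 1.959964 * 0.10 / 2))  # 1200
```

This 1/sqrt(n) heuristic is only a first approximation; the simulation studies suggested in the text remain the more reliable planning tool.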

Appendix.

Proof of Theorem 3.1

We divide the proof of this Theorem into two steps.

Step 1. First, we prove that $\hat{w}\stackrel{p}{\to}w_0$. According to Theorem 2.1 in [11] and condition (C3), what remains for consistency is to show

$$\sup_{w\in\mathcal{A}}\big|\hat{\rho}_{bsa}(w)-\rho_{bsa}(w)\big|\stackrel{p}{\longrightarrow}0. \tag{A1}$$

By arguments similar to the proofs in [14,22], we have

$$\begin{aligned}
W_n(w)&=\Big(\prod_{l=1}^{L}n_l\Big)^{-1}\sum_{j_1=1}^{n_1}\cdots\sum_{j_L=1}^{n_L}\big\|S-R\big(\tilde{X}_{1,j_1}(w),\ldots,\tilde{X}_{L,j_L}(w)\big)\big\|^2\\
&=\Big(\prod_{l=1}^{L}n_l\Big)^{-1}\frac{1}{L!}\sum_{(m_1,\ldots,m_L)\in\Omega_{n,L}}I\big((Y_{m_1},\ldots,Y_{m_L})\in\Theta_L\big)\sum_{k=1}^{L}\Big\{Y_{m_k}-\sum_{r=1}^{L}I\big(X_{m_k}(w)\ge X_{m_r}(w)\big)\Big\}^2\\
&=2\Big(\prod_{l=1}^{L}n_l\Big)^{-1}\frac{1}{L!}\sum_{(m_1,\ldots,m_L)\in\Omega_{n,L}}I\big((Y_{m_1},\ldots,Y_{m_L})\in\Theta_L\big)\sum_{k=1}^{L}\Big\{Y_{m_k}^2-Y_{m_k}\sum_{r=1}^{L}I\big(X_{m_k}(w)\ge X_{m_r}(w)\big)\Big\}\\
&=2\Big(\prod_{l=1}^{L}n_l\Big)^{-1}\frac{1}{L!}\sum_{(m_1,\ldots,m_L)\in\Omega_{n,L}}\Psi(V_{m_1},\ldots,V_{m_L},w).
\end{aligned}$$

Therefore,

$$\hat{\rho}_{bsa}(w)=\frac{1}{P(n,L)}\sum_{(m_1,\ldots,m_L)\in\Omega_{n,L}}\Big\{1-\Psi(V_{m_1},\ldots,V_{m_L},w)\,M_{n,L}\Big\},$$

where $M_{n,L}=2\binom{n}{L}\big/\big\{C_L\prod_{l=1}^{L}n_l\big\}$.

Define

$$\begin{aligned}
\hat{\rho}^{[1]}_{bsa}(w)&=\frac{1}{P(n,L)}\sum_{(m_1,\ldots,m_L)\in\Omega_{n,L}}h(V_{m_1},\ldots,V_{m_L},w),\\
\hat{\rho}^{[2]}_{bsa}(w)&=\frac{1}{P(n,L)}\sum_{(m_1,\ldots,m_L)\in\Omega_{n,L}}\Psi(V_{m_1},\ldots,V_{m_L},w)\Big\{M_{n,L}-\gamma_L\Big(\prod_{l=1}^{L}p_l\Big)^{-1}\Big\}.
\end{aligned}$$

Then $\hat{\rho}_{bsa}(w)=\hat{\rho}^{[1]}_{bsa}(w)-\hat{\rho}^{[2]}_{bsa}(w)$. Thus it follows that

$$\sup_{w\in\mathcal{A}}\big|\hat{\rho}_{bsa}(w)-\rho_{bsa}(w)\big|\le\sup_{w\in\mathcal{A}}\big|\hat{\rho}^{[1]}_{bsa}(w)-\rho_{bsa}(w)\big|+\sup_{w\in\mathcal{A}}\big|\hat{\rho}^{[2]}_{bsa}(w)\big|. \tag{A2}$$

For the second term on the right-hand side of (A2), by the uniform boundedness of $\Psi(V_{m_1},\ldots,V_{m_L},w)$ and the convergence $M_{n,L}\stackrel{p}{\to}\gamma_L(\prod_{l=1}^{L}p_l)^{-1}$ shown in [14], we have $\sup_{w\in\mathcal{A}}|\hat{\rho}^{[2]}_{bsa}(w)|\stackrel{p}{\to}0$. For the first term on the right-hand side of (A2), let

$$\mathcal{F}=\big\{h(V_{m_1},\ldots,V_{m_L},w):w\in\mathcal{A}\big\}.$$

By subgraph-set arguments analogous to those in [18] and Lemma 2.14 in [13], we can show that $\mathcal{F}$ is Euclidean for a constant envelope. Thus, applying Corollary 7 in [19], we get

$$\sup_{w\in\mathcal{A}}\big|\hat{\rho}^{[1]}_{bsa}(w)-\rho_{bsa}(w)\big|\stackrel{p}{\longrightarrow}0.$$

Combining these arguments, we conclude the proof of (A1) and hence that $\hat{w}\stackrel{p}{\to}w_0$.

Step 2. Next, we prove that $\hat{w}_s\stackrel{p}{\to}w_0$. By Theorem 2.1 in [11], it is sufficient to show that

$$\sup_{w\in\mathcal{A}}\big|\hat{\rho}_{bsa,s}(w)-\hat{\rho}_{bsa}(w)\big|\stackrel{p}{\longrightarrow}0. \tag{A3}$$

Similarly, since $\hat{\rho}_{bsa,s}(w)=\hat{\rho}^{[1]}_{bsa,s}(w)-\hat{\rho}^{[2]}_{bsa,s}(w)$, where

$$\begin{aligned}
\hat{\rho}^{[1]}_{bsa,s}(w)&=\frac{1}{P(n,L)}\sum_{(m_1,\ldots,m_L)\in\Omega_{n,L}}h_s(V_{m_1},\ldots,V_{m_L},w),\\
\hat{\rho}^{[2]}_{bsa,s}(w)&=\frac{1}{P(n,L)}\sum_{(m_1,\ldots,m_L)\in\Omega_{n,L}}\Psi_s(V_{m_1},\ldots,V_{m_L},w)\Big\{M_{n,L}-\gamma_L\Big(\prod_{l=1}^{L}p_l\Big)^{-1}\Big\},
\end{aligned}$$

and

$$\begin{aligned}
h_s(V_{m_1},\ldots,V_{m_L},w)&=1-\Psi_s(V_{m_1},\ldots,V_{m_L},w)\,\gamma_L\Big(\prod_{l=1}^{L}p_l\Big)^{-1},\\
\Psi_s(V_{m_1},\ldots,V_{m_L},w)&=I\big((Y_{m_1},\ldots,Y_{m_L})\in\Theta_L\big)\sum_{k=1}^{L}\Big\{Y_{m_k}^2-Y_{m_k}\sum_{r=1}^{L}s_n\big(X_{m_k}(w)-X_{m_r}(w)\big)\Big\}.
\end{aligned}$$

Hence

$$\sup_{w\in\mathcal{A}}\big|\hat{\rho}_{bsa}(w)-\hat{\rho}_{bsa,s}(w)\big|\le\sup_{w\in\mathcal{A}}\big|\hat{\rho}^{[1]}_{bsa}(w)-\hat{\rho}^{[1]}_{bsa,s}(w)\big|+\sup_{w\in\mathcal{A}}\big|\hat{\rho}^{[2]}_{bsa}(w)-\hat{\rho}^{[2]}_{bsa,s}(w)\big|. \tag{A4}$$

By the boundedness of $\Psi_s(V_{m_1},\ldots,V_{m_L},w)$ and $\Psi(V_{m_1},\ldots,V_{m_L},w)$ again, and by derivations similar to those above, we have $\sup_{w\in\mathcal{A}}|\hat{\rho}^{[2]}_{bsa}(w)-\hat{\rho}^{[2]}_{bsa,s}(w)|\stackrel{p}{\to}0$. For the first term on the right-hand side of (A4), note that

$$\begin{aligned}
\hat{\rho}^{[1]}_{bsa}(w)-\hat{\rho}^{[1]}_{bsa,s}(w)&=-\frac{1}{P(n,L)}\sum_{(m_1,\ldots,m_L)\in\Omega_{n,L}}I\big((Y_{m_1},\ldots,Y_{m_L})\in\Theta_L\big)\gamma_L\Big(\prod_{l=1}^{L}p_l\Big)^{-1}\\
&\quad\times\sum_{k=1}^{L}\bigg[\Big\{Y_{m_k}^2-Y_{m_k}\sum_{r=1}^{L}I\big(X_{m_k}(w)\ge X_{m_r}(w)\big)\Big\}-\Big\{Y_{m_k}^2-Y_{m_k}\sum_{r=1}^{L}s_n\big(X_{m_k}(w)-X_{m_r}(w)\big)\Big\}\bigg]\\
&=\frac{1}{P(n,L)}\sum_{(m_1,\ldots,m_L)\in\Omega_{n,L}}I\big((Y_{m_1},\ldots,Y_{m_L})\in\Theta_L\big)\gamma_L\Big(\prod_{l=1}^{L}p_l\Big)^{-1}\\
&\quad\times\sum_{k=1}^{L}\sum_{r=1}^{L}Y_{m_k}\Big[I\big(X_{m_k}(w)\ge X_{m_r}(w)\big)-s_n\big(X_{m_k}(w)-X_{m_r}(w)\big)\Big].
\end{aligned}$$

Thus there exists a large constant M such that for any η>0,

$$\big|\hat{\rho}^{[1]}_{bsa}(w)-\hat{\rho}^{[1]}_{bsa,s}(w)\big|\le M\frac{1}{P(n,L)}\sum_{(m_1,\ldots,m_L)\in\Omega_{n,L}}I\big((Y_{m_1},\ldots,Y_{m_L})\in\Theta_L\big)\sum_{k=1}^{L}\sum_{r=1}^{L}\Big|I\big(X_{m_k}(w)\ge X_{m_r}(w)\big)-s_n\big(X_{m_k}(w)-X_{m_r}(w)\big)\Big|=T_{n1}(w)+T_{n2}(w),$$

where

$$\begin{aligned}
T_{n1}(w)&=M\frac{1}{P(n,L)}\sum_{(m_1,\ldots,m_L)\in\Omega_{n,L}}I\big((Y_{m_1},\ldots,Y_{m_L})\in\Theta_L\big)\\
&\quad\times\sum_{k=1}^{L}\sum_{r=1}^{L}\Big|I\big(X_{m_k}(w)\ge X_{m_r}(w)\big)-s_n\big(X_{m_k}(w)-X_{m_r}(w)\big)\Big|\,I\big(|X_{m_k}(w)-X_{m_r}(w)|\ge\eta\big),\\
T_{n2}(w)&=M\frac{1}{P(n,L)}\sum_{(m_1,\ldots,m_L)\in\Omega_{n,L}}I\big((Y_{m_1},\ldots,Y_{m_L})\in\Theta_L\big)\\
&\quad\times\sum_{k=1}^{L}\sum_{r=1}^{L}\Big|I\big(X_{m_k}(w)\ge X_{m_r}(w)\big)-s_n\big(X_{m_k}(w)-X_{m_r}(w)\big)\Big|\,I\big(|X_{m_k}(w)-X_{m_r}(w)|<\eta\big).
\end{aligned}$$

On the set $\{|x|\ge\eta\}$, we have $|s_n(x)-I(x\ge0)|\le\exp(-|x|/\sigma_n)\le\exp(-\eta/\sigma_n)$. Thus, as $\sigma_n\to0$, $s_n(x)\to I(x\ge0)$ uniformly on the set $\{|x|\ge\eta\}$. Therefore, $T_{n1}(w)$ converges to 0 uniformly over $\mathcal{A}$. For the second term,

$$T_{n2}(w)\le M\frac{1}{P(n,L)}\sum_{(m_1,\ldots,m_L)\in\Omega_{n,L}}\sum_{k=1}^{L}\sum_{r=1}^{L}I\big(|X_{m_k}(w)-X_{m_r}(w)|<\eta\big).$$

Since the class of indicator functions is manageable, we have

$$\frac{1}{P(n,L)}\sum_{(m_1,\ldots,m_L)\in\Omega_{n,L}}\sum_{k=1}^{L}\sum_{r=1}^{L}I\big(|X_{m_k}(w)-X_{m_r}(w)|<\eta\big)$$

converges uniformly almost surely to $\sum_{k=1}^{L}\sum_{r=1}^{L}P(|X_{m_k}(w)-X_{m_r}(w)|<\eta)$ by the uniform convergence of U-processes [12]. Letting $\eta\to0$, $\sum_{k=1}^{L}\sum_{r=1}^{L}P(|X_{m_k}(w)-X_{m_r}(w)|<\eta)\to0$ uniformly over $\mathcal{A}$ as $n\to\infty$. This completes the proof of Theorem 3.1.

Proof of Theorem 3.2

Without loss of generality, we prove only the case L = 3. Define the functions $\Gamma_{sn}(w)=\hat{\rho}_{bsa,s}(w)-\hat{\rho}_{bsa,s}(w_0)$ and $\Gamma_0(w)=\rho_{bsa}(w)-\rho_{bsa}(w_0)$. According to Theorem 2 of [18], to prove Theorem 3.2 we only need to show

$$\Gamma_{sn}(w)=\frac{1}{2}(w-w_0)^T\Sigma(w_0)(w-w_0)+\frac{1}{\sqrt{n}}(w-w_0)^TW_n(w_0)+o_p(|w-w_0|^2)+o_p(1/n), \tag{A5}$$

uniformly in an $o_p(1)$ neighborhood of $w_0$, where $W_n(w_0)\stackrel{d}{\to}N(0,\Sigma(w_0))$.

First, let

$$\tau_s(v,w)=Eh_s(v,V_2,V_3,w)+Eh_s(V_1,v,V_3,w)+Eh_s(V_1,V_2,v,w),$$

where $h_s(v_{m_1},v_{m_2},v_{m_3},w)=1-\Psi_s(v_{m_1},v_{m_2},v_{m_3},w)M_{n,3}$. Under condition (C4), by derivations similar to (A.1) in [20] (rewriting $\tau_s(v,w)$ as an integral, changing variables, and applying a Taylor expansion), together with the convergence $M_{n,3}\stackrel{p}{\to}\gamma_3(\prod_{l=1}^{3}p_l)^{-1}$ shown in [14], we can get

$$\nabla_1\tau_s(v,w)=\nabla_1\tau(v,w)+O(\sigma_n),\quad\text{and}\quad\nabla_2\tau_s(v,w)=\nabla_2\tau(v,w)+O(\sigma_n). \tag{A6}$$

Define $f_s(v_{m_1},v_{m_2},v_{m_3},w)=h_s(v_{m_1},v_{m_2},v_{m_3},w)-h_s(v_{m_1},v_{m_2},v_{m_3},w_0)$. Using the same notation as in [19], let $P_n$ denote the empirical measure that places mass $n^{-1}$ at each $V_i$, and let $U_n^k$ denote the random probability measure putting mass $1/\{n(n-1)\cdots(n-k+1)\}$ on each ordered k-tuple, k = 2, 3. Then, by a technique similar to the Hoeffding decomposition [17], it follows that

$$\Gamma_{sn}(w)=\Gamma_{s0}(w)+P_nf_{s1}(\cdot,w)+U_n^2f_{s2}(\cdot,\cdot,w)+U_n^3f_{s3}(\cdot,\cdot,\cdot,w), \tag{A7}$$

where

$$\begin{aligned}
\Gamma_{s0}(w)&=Ef_s(V_1,V_2,V_3,w),\\
f_{s1}(v,w)&=Ef_s(v,V_2,V_3,w)+Ef_s(V_1,v,V_3,w)+Ef_s(V_1,V_2,v,w)-3\Gamma_{s0}(w),\\
f_{s2}(v_1,v_2,w)&=Ef_s(v_1,v_2,V_3,w)+Ef_s(v_2,V_3,v_1,w)+Ef_s(V_3,v_1,v_2,w)\\
&\quad-Ef_s(v_1,V_2,V_3,w)-Ef_s(v_2,V_3,V_1,w)-Ef_s(V_3,v_1,V_2,w)\\
&\quad-Ef_s(V_1,v_2,V_3,w)-Ef_s(V_2,V_3,v_1,w)-Ef_s(V_3,V_1,v_2,w)+3\Gamma_{s0}(w),\\
f_{s3}(v_1,v_2,v_3,w)&=f_s(v_1,v_2,v_3,w)-Ef_s(v_1,v_2,V_3,w)-Ef_s(v_1,V_2,v_3,w)-Ef_s(V_1,v_2,v_3,w)\\
&\quad+Ef_s(v_1,V_2,V_3,w)+Ef_s(V_1,v_2,V_3,w)+Ef_s(V_1,V_2,v_3,w)-\Gamma_{s0}(w).
\end{aligned}$$

In the following, we show

$$\Gamma_{s0}(w)=\frac{1}{2}(w-w_0)^T\Sigma(w_0)(w-w_0)+O(\sigma_n|w-w_0|)+o(|w-w_0|^2),\quad\text{uniformly in an }o_p(1)\text{ neighborhood of }w_0, \tag{A8}$$

$$P_nf_{s1}(\cdot,w)=\frac{1}{\sqrt{n}}(w-w_0)^TW_n(w_0)+O(\sigma_n|w-w_0|)+o_p(|w-w_0|^2),\quad\text{uniformly in an }o_p(1)\text{ neighborhood of }w_0, \tag{A9}$$

and

$$U_n^2f_{s2}(\cdot,\cdot,w)+U_n^3f_{s3}(\cdot,\cdot,\cdot,w)=o_p(1/n), \tag{A10}$$

uniformly in an $o_p(1)$ neighborhood of $w_0$.

For Equation (A8), recall the definition of $\tau_s(v,w)$. Expanding $\tau_s(v,w)$ about $w_0$ and using Equation (A6), it follows that

$$\begin{aligned}
\tau_s(v,w)&=\tau_s(v,w_0)+(w-w_0)^T\nabla_1\tau_s(v,w_0)+\frac{1}{2}(w-w_0)^T\nabla_2\tau_s(v,w^*)(w-w_0)\\
&=\tau_s(v,w_0)+(w-w_0)^T\nabla_1\tau(v,w_0)+\frac{1}{2}(w-w_0)^T\nabla_2\tau(v,w_0)(w-w_0)\\
&\quad+\frac{1}{2}(w-w_0)^T\big[\nabla_2\tau(v,w^*)-\nabla_2\tau(v,w_0)\big](w-w_0)+O(\sigma_n|w-w_0|)+o(|w-w_0|^2), \tag{A11}
\end{aligned}$$

where $w^*$ lies between $w$ and $w_0$. Replacing $v$ with $V$, taking expectations on both sides of (A11), and applying the Lipschitz condition (C4), we have

$$3\Gamma_{s0}(w)=(w-w_0)^TE\nabla_1\tau(V,w_0)+\frac{1}{2}(w-w_0)^TE\nabla_2\tau(V,w_0)(w-w_0)+O(\sigma_n|w-w_0|)+o(|w-w_0|^2).$$

Note that $w_0$ is the maximizer of $\Gamma_0(w)$, so $E\nabla_1\tau(V,w_0)=0$. This finishes the proof of Equation (A8).

For Equation (A9), note that, by the definition of $\tau_s(v,w)$, we have

$$f_{s1}(v,w)=\tau_s(v,w)-\tau_s(v,w_0)-3\Gamma_{s0}(w). \tag{A12}$$

Thus according to (A8) and (A11), we have

$$P_nf_{s1}(\cdot,w)=\frac{1}{\sqrt{n}}(w-w_0)^TW_n(w_0)+\frac{1}{2}(w-w_0)^TD_n(w_0)(w-w_0)+R_n(w)+o_p(|w-w_0|^2)+O(\sigma_n|w-w_0|), \tag{A13}$$

where

$$\begin{aligned}
W_n(w_0)&=\sqrt{n}\,P_n\nabla_1\tau(\cdot,w_0),\\
D_n(w_0)&=P_n\nabla_2\tau(\cdot,w_0)-E\nabla_2\tau(V,w_0),\\
R_n(w)&=\frac{1}{2}(w-w_0)^TP_n\big(\nabla_2\tau(\cdot,w^*)-\nabla_2\tau(\cdot,w_0)\big)(w-w_0).
\end{aligned}$$

From (iii) of the regularity condition (C5) and the fact that $E\nabla_1\tau(V,w_0)=0$, it follows that $W_n(w_0)$ converges in distribution to $N(0,\Sigma(w_0))$. By (iv) of the regularity condition (C5) and a weak law of large numbers, $D_n(w_0)$ converges to zero in probability as $n$ tends to infinity. Finally, from (ii) of the regularity condition (C5), we get $|R_n(w)|\le|w-w_0|^3P_nM(\cdot)$. Thus, by the integrability of $M(v)$ and a weak law of large numbers, we have $|R_n(w)|=o_p(|w-w_0|^2)$ uniformly over $o_p(1)$ neighborhoods of $w_0$. This finishes the proof of (A9).

Lastly, we consider Equation (A10). Define $\tilde{\Psi}_s(v_{m_1},v_{m_2},v_{m_3},w)=\Psi_s(v_{m_1},v_{m_2},v_{m_3},w)-\Psi_s(v_{m_1},v_{m_2},v_{m_3},w_0)$, and

$$\begin{aligned}
\Psi_{s2}(v_1,v_2,w)&=E\tilde{\Psi}_s(v_1,v_2,V_3,w)+E\tilde{\Psi}_s(v_2,V_3,v_1,w)+E\tilde{\Psi}_s(V_3,v_1,v_2,w)\\
&\quad-E\tilde{\Psi}_s(v_1,V_2,V_3,w)-E\tilde{\Psi}_s(v_2,V_3,V_1,w)-E\tilde{\Psi}_s(V_3,v_1,V_2,w)\\
&\quad-E\tilde{\Psi}_s(V_1,v_2,V_3,w)-E\tilde{\Psi}_s(V_2,V_3,v_1,w)-E\tilde{\Psi}_s(V_3,V_1,v_2,w)+3E\tilde{\Psi}_s(V_1,V_2,V_3,w),\\
\Psi_{s3}(v_1,v_2,v_3,w)&=\tilde{\Psi}_s(v_1,v_2,v_3,w)-E\tilde{\Psi}_s(v_1,v_2,V_3,w)-E\tilde{\Psi}_s(v_1,V_2,v_3,w)-E\tilde{\Psi}_s(V_1,v_2,v_3,w)\\
&\quad+E\tilde{\Psi}_s(v_1,V_2,V_3,w)+E\tilde{\Psi}_s(V_1,v_2,V_3,w)+E\tilde{\Psi}_s(V_1,V_2,v_3,w)-E\tilde{\Psi}_s(V_1,V_2,V_3,w).
\end{aligned}$$

Let $\mathcal{F}_1=\{\Psi_{s2}(v_1,v_2,w):w\in\mathcal{A}\}$ and $\mathcal{F}_2=\{\Psi_{s3}(v_1,v_2,v_3,w):w\in\mathcal{A}\}$. Clearly, for $w\in\mathcal{A}$, $E\Psi_{s2}(V_1,v_2,w)=E\Psi_{s2}(v_1,V_2,w)=0$ and $E\Psi_{s3}(V_1,v_2,v_3,w)=E\Psi_{s3}(v_1,V_2,v_3,w)=E\Psi_{s3}(v_1,v_2,V_3,w)=0$. That is, $U_n^2\Psi_{s2}(\cdot,\cdot,w)$ and $U_n^3\Psi_{s3}(\cdot,\cdot,\cdot,w)$ are degenerate U-statistics of orders 2 and 3 on $S\times S$ and $S\times S\times S$, respectively.

Note that $\Psi_{s2}(v_1,v_2,w_0)=0$ and $\Psi_{s3}(v_1,v_2,v_3,w_0)=0$. By the continuity of $\Psi_{s2}$ and $\Psi_{s3}$, it follows from the dominated convergence theorem that $E\Psi_{s2}^2(V_1,V_2,w)\to0$ and $E\Psi_{s3}^2(V_1,V_2,V_3,w)\to0$ as $w\to w_0$. Moreover, by Lemma 2.14 in [13], we can show that $\mathcal{F}_1$ and $\mathcal{F}_2$ are Euclidean for a constant envelope. Therefore, by Theorem 3 in [18], it follows that $U_n^2\Psi_{s2}(\cdot,\cdot,w)=o_p(1/n)$ and $U_n^3\Psi_{s3}(\cdot,\cdot,\cdot,w)=o_p(1/n)$, uniformly in an $o_p(1)$ neighborhood of $w_0$. Lastly, using the convergence $M_{n,3}\stackrel{p}{\to}\gamma_3(\prod_{l=1}^{3}p_l)^{-1}$ again and noting that $U_n^2f_{s2}(\cdot,\cdot,w)=-M_{n,3}U_n^2\Psi_{s2}(\cdot,\cdot,w)$ and $U_n^3f_{s3}(\cdot,\cdot,\cdot,w)=-M_{n,3}U_n^3\Psi_{s3}(\cdot,\cdot,\cdot,w)$, we conclude the proof of (A10).

Proof of Theorem 3.3

By a Taylor expansion, we have

$$\hat{\rho}_{bsa,s}(\hat{w}_s)=\hat{\rho}_{bsa,s}(w_0)+\big[\nabla\hat{\rho}_{bsa,s}(w_0)\big]^T(\hat{w}_s-w_0)+o_p(|\hat{w}_s-w_0|).$$

Thus, it follows that

$$\begin{aligned}
\sqrt{n}\big\{\hat{\rho}_{bsa,s}(\hat{w}_s)-\rho_{bsa}(w_0)\big\}&=\sqrt{n}\big\{\hat{\rho}_{bsa,s}(\hat{w}_s)-\hat{\rho}_{bsa,s}(w_0)\big\}+\sqrt{n}\big\{\hat{\rho}_{bsa,s}(w_0)-\rho_{bsa}(w_0)\big\}\\
&=\big[\nabla\hat{\rho}_{bsa,s}(w_0)\big]^T\sqrt{n}\{\hat{w}_s-w_0\}+\sqrt{n}\big\{\hat{\rho}_{bsa,s}(w_0)-\rho_{bsa}(w_0)\big\}+o_p\big(\sqrt{n}\{\hat{w}_s-w_0\}\big).
\end{aligned}$$

By some calculations, we get

$$\nabla\hat{\rho}_{bsa,s}(w_0)=\nabla\rho_{bsa}(w_0)+O_p(\sigma_n),\quad\text{and}\quad\hat{\rho}_{bsa,s}(w_0)=\rho_{bsa}(w_0)+O_p(\sigma_n).$$

Therefore, by Slutsky's theorem, we finish the proof of Theorem 3.3.

Funding Statement

This research project was supported by grants from the National Institutes of Health (R01MH079448, R01HL113548, and R01MH105561). Qiu's work was supported by the National Natural Science Foundation of China (NSFC) (12071164) and the Center for Applied Mathematics of Fujian Province (FJNU).

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

1. Agresti A., Categorical Data Analysis, John Wiley & Sons, New York, 2012.
2. Edwards M.C., Gardner E.S., Chelonis J.J., Schulz E.G., Flake R.A., and Diaz P.F., Estimates of the validity and utility of the Conners' Continuous Performance Test in the assessment of inattentive and/or hyperactive-impulsive behaviors in children, J. Abnorm. Child Psychol. 35 (2007), pp. 393–404.
3. Foa E.B., Riggs D.S., Dancu C.V., and Rothbaum B.O., Reliability and validity of a brief instrument for assessing post-traumatic stress disorder, J. Trauma. Stress 6 (1993), pp. 459–473.
4. Gillespie C.F., Bradley B., Mercer K., Smith A.K., Conneely K., Gapen M., Weiss T., Schwartz A.C., Cubells J.F., and Ressler K.J., Trauma exposure and stress-related disorders in inner city primary care patients, Gen. Hosp. Psychiatry 31 (2009), pp. 505–514.
5. Han A.K., Non-parametric analysis of a generalized regression model: The maximum rank correlation estimator, J. Econom. 35 (1987), pp. 303–316.
6. Kessler R.C., Berglund P., Demler O., Jin R., Merikangas K.R., and Walters E.E., Lifetime prevalence and age-of-onset distributions of DSM-IV disorders in the National Comorbidity Survey Replication, Arch. Gen. Psychiatry 62 (2005), pp. 593–602.
7. Khan S. and Tamer E., Partial rank estimation of duration models with general forms of censoring, J. Econom. 136 (2007), pp. 251–280.
8. Li J., Chow Y., Wong W.K., and Wong T.Y., Sorting multiple classes in multi-dimensional ROC analysis: Parametric and nonparametric approaches, Biomarkers 19 (2014), pp. 1–8.
9. Ma S. and Huang J., Regularized ROC method for disease classification and biomarker selection with microarray data, Bioinformatics 21 (2005), pp. 4356–4362.
10. McDowell I., Measuring Health: A Guide to Rating Scales and Questionnaires, Oxford University Press, New York, 2006.
11. Newey W.K. and McFadden D., Large sample estimation and hypothesis testing, Handb. Econom. 4 (1994), pp. 2111–2245.
12. Nolan D. and Pollard D., U-processes: Rates of convergence, Ann. Stat. 15 (1987), pp. 780–799.
13. Pakes A. and Pollard D., Simulation and the asymptotics of optimization estimators, Econometrica 57 (1989), pp. 1027–1057.
14. Peng L., Li R., Guo Y., and Manatunga A., A framework for assessing broad sense agreement between ordinal and continuous measurements, J. Am. Stat. Assoc. 106 (2011), pp. 1592–1601.
15. Qiu Z., Qin J., and Zhou Y., Composite estimating equation method for the accelerated failure time model with length-biased sampling data, Scand. J. Statist. 43 (2016), pp. 396–415.
16. Qiu Z., Wan A.T., Zhou Y., and Gilbert P.B., Smoothed rank regression for the accelerated failure time competing risks model with missing cause of failure, Stat. Sin. 29 (2019), pp. 23–46.
17. Serfling R.J., Approximation Theorems of Mathematical Statistics, John Wiley & Sons, New York, 1980.
18. Sherman R.P., The limiting distribution of the maximum rank correlation estimator, Econometrica 61 (1993), pp. 123–137.
19. Sherman R.P., Maximal inequalities for degenerate U-processes with applications to optimization estimators, Ann. Stat. 22 (1994), pp. 439–459.
20. Song X., Ma S., Huang J., and Zhou X.-H., A semiparametric approach for the nonparametric transformation survival model with multiple covariates, Biostatistics 8 (2007), pp. 197–211.
21. Spitzer R.L., Kroenke K., Williams J.B., and the Patient Health Questionnaire Primary Care Study Group, Validation and utility of a self-report version of PRIME-MD: The PHQ Primary Care Study, JAMA 282 (1999), pp. 1737–1744.
22. Wei B., Dai T., Peng L., Guo Y., and Manatunga A., A new functional representation of broad sense agreement, Statist. Probab. Lett. 158 (2020), p. 108619.
