Author manuscript; available in PMC: 2020 Sep 12.
Published in final edited form as: Biometrics. 2019 Dec 25;76(3):853–862. doi: 10.1111/biom.13199

Learning-based Biomarker-assisted Rules for Optimized Clinical Benefit under a Risk-constraint

Yanqing Wang 1, Ying-Qi Zhao 2, Yingye Zheng 2,*
PMCID: PMC7292743  NIHMSID: NIHMS1572026  PMID: 31833561

Summary:

Novel biomarkers, in combination with currently available clinical information, have been sought to improve clinical decision making in many branches of medicine, including screening, surveillance, and prognosis. Statistical methods are needed to integrate such diverse information to develop targeted interventions that balance benefit and harm. In the specific setting of disease detection, we propose novel approaches to construct a multiple-marker-based decision rule by directly optimizing a benefit function, while controlling harm at a maximally tolerable level. These new approaches include plug-in and direct-optimization-based algorithms, and they allow for the construction of both nonparametric and parametric rules. A study of asymptotic properties of the proposed estimators is provided. Simulation results demonstrate good clinical utilities for the resulting decision rules under various scenarios. The methods are applied to a biomarker study in prostate cancer surveillance.

Keywords: Biomarker, Clinical decision rules, False positive fraction, Machine learning, True positive fraction

1. Introduction

Novel biomarkers, in combination with clinical information, can be used for developing an individualized clinical decision rule (ICDR) with great potential to enhance clinical decision making. It is well recognized that while there are survival benefits associated with detecting and treating aggressive and early-stage diseases, widespread screening often leads to over-detection of low-risk individuals for whom treatment is unnecessary. It is therefore important that a disease management strategy aims to optimize outcomes of the target population while controlling for financial and human costs. Statistical methods for deriving biomarker-assisted decision rules that incorporate clinical consequences are urgently needed.

The Canary Prostate Active Surveillance Study (PASS) is a prospective cohort enrolling over 1,550 men with clinically localized prostate cancer who have chosen to manage their cancers using active surveillance (Newcomb et al., 2010). An array of blood- and urine-based biomarkers have emerged (Parekh et al., 2015) with the potential to improve the clinical management of patients under active surveillance. In particular, a clinical decision rule is sought to maximize the identification of aggressive cancers (Gleason score 7 and above) while controlling for over-treatment of patients for whom surgery will not render benefit (‘rule in’ rule). Meanwhile, a second decision rule is sought to maximize the number of low-risk individuals spared intrusive biopsies while controlling the risk of missing aggressive cancers (‘rule out’ rule). The development of clinical decision rules that maximize benefit while controlling for harm for patients under active surveillance remains challenging.

When multiple biomarkers/predictors are available, the standard ‘regression-based’ approach for deriving a decision rule is often conducted in two steps. A working regression model relating the outcome to the multiple biomarkers is first fitted to obtain a model score, a function of the linear combination of the covariates in the model. A cutoff of the score is then calculated according to the performance criterion one intends to control. Such a rule is simple to derive and, under correct specification of the working model, it naturally leads to optimized results (McIntosh and Pepe, 2002). A limitation of the regression-based approach is that, when the working model is seriously misspecified, the intended performance may be suboptimal. More recent work searches for biomarker combinatorial algorithms based on functionals of the receiver operating characteristic (ROC) curve, in particular the area under the curve (AUC) (Pepe et al., 2006; Ma and Huang, 2007). However, these approaches do not lead directly to a decision rule (with a cutoff for positivity), nor do they incorporate the constraint that harm places on the benefit function, which is the focus of our research here.

In this manuscript, we introduce a novel criterion, called the risk-constrained clinical benefit (RCCB) function, to aid the development and evaluation of decision rules used in cancer screening and surveillance. The goal is to optimize a screening benefit function while controlling harm at a tolerable bound. We therefore need to optimize a constrained benefit function, which involves two different measures of the clinical utility resulting from the decision rule. In keeping with traditional measures in the medical diagnostic literature, we use the true positive fraction (TPF) and false positive fraction (FPF) to quantify these two aspects of clinical consequences. In the PASS example, ‘rule in’ corresponds to maximizing the TPF while controlling the FPF, and ‘rule out’ corresponds to minimizing the FPF with a constraint on the TPF. The resulting decision rule is practically very appealing, as it directly targets a clinically desired outcome. Providing solutions to such complex objectives in the disease classification setting can be computationally challenging, as they involve a non-differentiable and discontinuous function, especially with an added constraint of similar form.

We propose novel algorithms for the derivation of ICDRs by directly optimizing the RCCB function. First, we consider a plug-in approach, which allows combining biomarkers both parametrically, e.g., based on a logistic regression for a binary disease outcome; or nonparametrically, e.g., based on learning algorithms such as the random forest (Breiman, 2001). To improve the robustness of a linear ICDR, we further propose a direct-optimization approach to construct a linear decision rule. In contrast with the regression-model-based approaches, the advantage of such a linear decision rule is that it does not rely on any underlying model assumption. The proposed direct-optimization approach leads to asymptotically tractable and computationally efficient estimates. We additionally describe a kernel-based nonparametric rule under the direct-optimization framework.

Our proposed work, with the objective of maximizing diagnostic accuracy in a classification problem, draws an analogy with the framework of deriving personalized treatment regimens (PTRs) for optimizing the mean outcome. This opens up a path for adapting an array of powerful learning algorithms developed for PTRs to the setting of medical diagnosis. There is much literature in the area of treatment selection (Zhao et al., 2012; Chen et al., 2017; Qiu et al., 2018; Cai et al., 2019). Consideration of competing outcomes, instead of a single outcome, in this setting has been limited and is relatively new (Laber et al., 2014; Luedtke and van der Laan, 2016; Wang et al., 2018; Linn et al., 2015). For example, Wang et al. (2018) recently proposed machine learning approaches to identify optimal treatment strategies that maximize a clinical reward function under a risk constraint. Compared to existing work, our manuscript provides several further developments. First, we extend the framework of learning-based ICDRs to the specific settings of disease screening and diagnosis. Such settings have different clinical goals, with improving diagnostic accuracy (e.g., sensitivity and specificity) as the key objective. We provide estimating algorithms that are specifically designed for our objective functions. For example, it is more difficult to develop algorithms that guarantee convergence for binary endpoints, especially for data with small sample sizes. Hence, instead of using an existing quadratic programming package, as employed in Wang et al. (2018), we implemented a subgradient approach. In addition, we studied theoretical properties, including the consistency and weak convergence of the proposed estimators.

The manuscript is organized as follows: we introduce the RCCB function in Section 2. We propose new methods for estimating the optimal decision rule that maximizes the RCCB function in Section 3. In Section 4, we discuss the asymptotic properties of the proposed estimators. Simulation studies evaluating the proposed procedures are presented in Section 5. In Section 6, we illustrate our methods with a biomarker study for the active surveillance of prostate cancer from the PASS.

2. The Risk-Constrained Clinical Benefit (RCCB) Functions

2.1. Definition

We consider a study consisting of n subjects observed for a binary outcome D, with D = 1 indicating the presence of a specific disease outcome. Let Z denote the biomarkers and other clinical predictors used for deriving prediction rules. Denote by R(Z) an ICDR, with R(Z) = 1 indicating that an intervention (e.g., biopsy or surgery) should be prescribed. In this manuscript, we consider two types of rules: a nonparametric rule, denoted $R(Z) = I\{f(Z) > 0\}$, or a parametric rule, denoted $R(Z) = I\{f(Z, \beta) > 0\}$ and indexed by a parameter β.

We emphasize here that R(Z) is best sought based on the clinical goals of the study. When using risk markers for disease screening, this often relates to ‘ruling in’ or ‘ruling out’ patients for more aggressive interventions. The consequences of such a prediction rule can be summarized by a pair of key quantities, TPF(R) and FPF(R), defined as $\mathrm{TPF}(R) = \mathrm{pr}\{R(Z) = 1 \mid D = 1\}$ and $\mathrm{FPF}(R) = \mathrm{pr}\{R(Z) = 1 \mid D = 0\}$. TPF(R) reflects the gain of the rule, the extent to which individuals who are in need of intervention are identified; FPF(R) reflects the cost of the rule in terms of over-treating individuals who should be spared the procedure. There are many practical situations where decision rules are sought to maximize the benefit within a range of acceptable risk. The RCCB function, denoted by $\Phi_{\mathrm{RCCB}}(R)$, can take the form of

\[ \max_R \; \mathrm{TPF}(R) \quad \text{subject to} \quad \mathrm{FPF}(R) \le \alpha, \]

when the primary goal is to ‘rule in’ patients for further intervention while controlling for the error of over-treatment; or

\[ \max_R \; \bigl\{1 - \mathrm{FPF}(R)\bigr\} \quad \text{subject to} \quad \mathrm{TPF}(R) \ge 1 - \gamma, \]

when the focus is on ‘rule out’ patients for an invasive follow-up procedure while controlling for the error of missing individuals who are in need of the intervention. Such value functions take into account the trade-off between benefit and risk of clinical action and provide a common scale of performance criterion upon which decisions can be compared. Next, we formulate estimation procedures for deriving decision rules under these constrained objective functions. Throughout, we will use the ‘rule in’ criterion as an example. The solution for the ‘rule out’ criterion follows similarly.
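To fix ideas, the empirical analogues of TPF(R) and FPF(R) for a candidate rule are simple sample proportions among cases and controls. The short R sketch below is an illustration of these definitions only; the function and variable names are ours and are not taken from the paper's released code.

# Empirical TPF and FPF of a rule R(Z) = I{f(Z) > 0}.
# d: 0/1 disease indicator; score: the values f(Z_i) for the same subjects.
empirical_tpf_fpf <- function(d, score) {
  r <- as.numeric(score > 0)          # R(Z_i)
  c(TPF = mean(r[d == 1]),            # estimate of pr{R(Z) = 1 | D = 1}
    FPF = mean(r[d == 0]))            # estimate of pr{R(Z) = 1 | D = 0}
}

These two proportions are exactly the quantities that the ‘rule in’ and ‘rule out’ criteria above trade off against each other.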

2.2. Optimal Decision Rule

Denote the density function of Z as p(Z), the conditional probability of D = 1 given Z as pr(D = 1 | Z), and $p_k = \mathrm{pr}(D = k)$ for k = {0, 1}. We assume that both pr(D = 1 | Z) and p(Z) are continuous functions of Z, and that $\epsilon_1 < \mathrm{pr}(D = k \mid Z) < 1 - \epsilon_1$ and $\epsilon_2 < p_k < 1 - \epsilon_2$ almost surely for some positive constants $\epsilon_1$ and $\epsilon_2$. For notational simplicity, we further define $\eta_1(Z) = \mathrm{pr}(D = 1 \mid Z)/p_1$ and $\eta_0(Z) = \mathrm{pr}(D = 0 \mid Z)/p_0$. Write $\mathrm{TPF}(R) = \mathrm{pr}\{R(Z) = 1 \mid D = 1\} = E_Z[I\{R(Z) = 1\}\eta_1(Z)]$ and $\mathrm{FPF}(R) = \mathrm{pr}\{R(Z) = 1 \mid D = 0\} = E_Z[I\{R(Z) = 1\}\eta_0(Z)]$.

For a ‘rule in’ case, we are interested in maximizing TPF(R) while controlling FPF(R) at the level of α. Then, the optimization problem becomes

\[ \max_R \; E_Z\bigl[I\{R(Z)=1\}\eta_1(Z)\bigr], \quad \text{subject to} \quad E_Z\bigl[I\{R(Z)=1\}\eta_0(Z)\bigr] \le \alpha. \]

Applying the Karush–Kuhn–Tucker (KKT) conditions (Boyd and Vandenberghe, 2004), we can instead maximize

\[ E_Z\bigl[I\{R(Z)=1\}\{\eta_1(Z) - \lambda\,\eta_0(Z)\}\bigr], \tag{1} \]

such that $E_Z[I\{R(Z)=1\}\eta_0(Z)] - \alpha \le 0$, $\lambda \ge 0$, and $\lambda\bigl(E_Z[I\{R(Z)=1\}\eta_0(Z)] - \alpha\bigr) = 0$. Hence it is equivalent to maximize Equation (1) with respect to R(Z) under two situations: λ = 0 with the constraint $\mathrm{FPF}(R) - \alpha \le 0$, or λ > 0 with $\mathrm{FPF}(R) - \alpha = 0$. When λ = 0, the optimal decision rule is $R^*(Z) = I\{\eta_1(Z) > 0\}$. Since $\eta_1(Z) > 0$ almost surely for all Z, we have $R^*(Z) \equiv 1$ and hence $\mathrm{FPF}(R^*) = 1$. With α ∈ (0, 1), we therefore only discuss the case of λ > 0.

From Equation (1), it is clear that the maximum can be achieved with $R_\lambda(Z) = I\{\eta_1(Z) - \lambda\,\eta_0(Z) > 0\}$, where λ can be solved via

\[ \mathrm{FPF}(R_\lambda) = E_Z\bigl[I\{\eta_1(Z) - \lambda\,\eta_0(Z) > 0\}\,\eta_0(Z)\bigr] = \alpha. \tag{2} \]

Since pr(D = 1 | Z) and p(Z) are assumed to be continuous functions of Z, the left side of Equation (2) is a strictly decreasing function of λ and, as λ increases from 0 to +∞, its value decreases from 1 to 0. Hence, Equation (2) has a unique solution λ*, and the optimal ICDR is

\[ R_{\lambda^*}(Z) = I\{\eta_1(Z) - \lambda^*\eta_0(Z) > 0\}. \tag{3} \]

Remark 1: We assume the continuity of Z for convenience. When pr(D = 1 | Z) and/or p(Z) are not continuous, with positive probability there may be no value of λ satisfying Equation (2), or there may be multiple such values, so the solution may not be unique. In such settings, we suggest solving for λ via $\lambda^* = \inf\{\lambda: \mathrm{FPF}(R_\lambda) = \tilde\alpha\}$, where $\tilde\alpha = \sup\{\mathrm{FPF}(R_\lambda): \mathrm{FPF}(R_\lambda) \le \alpha\}$.
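Because FPF(R_λ) is non-increasing in λ, its empirical counterpart can be inverted without a numerical root finder: the rule I{η_1(Z) − λη_0(Z) > 0} is equivalent to thresholding the ratio η_1(Z)/η_0(Z), so λ can be read off an upper quantile of the control ratios. The R sketch below is our own illustration of this step, not the authors' implementation; it assumes no ties among the control ratios (as in the continuous-marker setting above) and returns the smallest λ whose empirical FPF does not exceed α, in the spirit of Remark 1.

# Solve for lambda so that the empirical FPF of R_lambda(Z) = I{eta1(Z) - lambda*eta0(Z) > 0}
# is as close to alpha as possible without exceeding it.
solve_lambda <- function(eta1, eta0, d, alpha) {
  ratio <- eta1 / eta0                          # R_lambda flags a subject iff ratio > lambda
  ctrl  <- sort(ratio[d == 0], decreasing = TRUE)
  n0    <- length(ctrl)
  k     <- floor(alpha * n0)                    # largest tolerable number of false positives
  if (k >= n0) return(0)                        # constraint is vacuous: flag everyone
  ctrl[k + 1]                                   # exactly k control ratios exceed this value
}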

3. Estimation

3.1. The Plug-In Approach

Based on the analytical form of $R_{\lambda^*}(Z)$ in Equation (3), a straightforward approach is to estimate $\eta_0(Z)$, $\eta_1(Z)$, and $\lambda^*$, and then plug in these estimates to obtain the estimated ICDR. To remove the dependence between the estimation of $\lambda^*$ and that of $\eta_0(Z)$ and $\eta_1(Z)$, we randomly split the data into two sets with sample sizes $n_1$ and $n_2$, respectively, where $n_1/n$ is bounded away from 0 and 1. The first set is used to estimate the conditional models $\mathrm{pr}(D = k \mid Z)$ for k = {0, 1}, either with a parametric approach such as logistic regression (plug-in-lg) or with a nonparametric approach such as the random forest (plug-in-rf). We then use the data in the second set to obtain $\widehat{\mathrm{pr}}(D_{i'} = k \mid Z_{i'})$, for $i' = 1, \cdots, n_2$. Let $\hat R_\lambda(Z) = I\{\hat\eta_1(Z) - \lambda\hat\eta_0(Z) > 0\}$, where $\hat\eta_0(Z_{i'}) = \widehat{\mathrm{pr}}(D_{i'} = 0 \mid Z_{i'})/\hat p_0$ with $\hat p_0 = n_2^{-1}\sum_{i'=1}^{n_2}(1 - D_{i'})$, and $\hat\eta_1(Z_{i'}) = \widehat{\mathrm{pr}}(D_{i'} = 1 \mid Z_{i'})/\hat p_1$ with $\hat p_1 = 1 - \hat p_0$. An estimator for $\mathrm{FPF}(R_\lambda)$ is given by $\widehat{\mathrm{FPF}}(\hat R_{\hat\lambda}) = n_2^{-1}\hat p_0^{-1}\sum_{i'=1}^{n_2}\hat R_{\hat\lambda}(Z_{i'})(1 - D_{i'})$. Since $\hat\lambda$ is the solution to $\widehat{\mathrm{FPF}}(\hat R_\lambda) - \alpha = 0$, we have $\widehat{\mathrm{FPF}}(\hat R_{\hat\lambda}) = \alpha$. An estimator of $\mathrm{TPF}(R_\lambda)$ is $\widehat{\mathrm{TPF}}(\hat R_{\hat\lambda}) = n_2^{-1}\hat p_1^{-1}\sum_{i'=1}^{n_2}\hat R_{\hat\lambda}(Z_{i'})D_{i'}$. Note that one may also employ K-fold cross-validation here to improve the robustness of the estimation. For a new subject with biomarkers $\tilde Z_i$, the estimated plug-in decision rule is $\hat R(\tilde Z_i) = I\{\hat\eta_1(\tilde Z_i) - \hat\lambda\hat\eta_0(\tilde Z_i) > 0\}$. Since the performance of the plug-in-lg and plug-in-rf estimators may vary across studies because of their different assumptions on the data structure, one may consider a cross-validation-based, data-driven approach (plug-in-cv) that automatically selects between the two ways of estimating $\mathrm{pr}(D = 1 \mid Z)$.
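The following R sketch outlines how the plug-in-lg and plug-in-rf rules could be assembled, reusing the solve_lambda() helper sketched in Section 2.2. It is a minimal illustration under our own conventions (a 50/50 split, the randomForest package, and our function names), not the authors' released code.

library(randomForest)

# Z: data.frame (or named matrix) of markers and clinical predictors; D: 0/1 outcome vector.
plug_in_rule <- function(Z, D, alpha, method = c("lg", "rf")) {
  method <- match.arg(method)
  n  <- nrow(Z)
  s1 <- sample(seq_len(n), floor(n / 2))               # split 1: fit pr(D = 1 | Z)
  s2 <- setdiff(seq_len(n), s1)                        # split 2: calibrate lambda-hat
  if (method == "lg") {
    fit  <- glm(D ~ ., data = data.frame(D = D[s1], Z[s1, , drop = FALSE]),
                family = binomial)
    prD1 <- predict(fit, newdata = data.frame(Z[s2, , drop = FALSE]), type = "response")
  } else {
    fit  <- randomForest(x = Z[s1, , drop = FALSE], y = factor(D[s1]))
    prD1 <- predict(fit, newdata = Z[s2, , drop = FALSE], type = "prob")[, "1"]
  }
  p1  <- mean(D[s2]); p0 <- 1 - p1                     # p-hat_1 and p-hat_0 from split 2
  lam <- solve_lambda(prD1 / p1, (1 - prD1) / p0, D[s2], alpha)
  list(fit = fit, method = method, p1 = p1, p0 = p0, lambda = lam)
}

# Decision for new subjects: R-hat(z) = I{eta1-hat(z) - lambda-hat * eta0-hat(z) > 0}.
predict_rule <- function(rule, Znew) {
  prD1 <- if (rule$method == "lg") {
    predict(rule$fit, newdata = data.frame(Znew), type = "response")
  } else {
    predict(rule$fit, newdata = Znew, type = "prob")[, "1"]
  }
  as.numeric(prD1 / rule$p1 - rule$lambda * (1 - prD1) / rule$p0 > 0)
}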

Although the nonparametric plug-in rule is expected to be robust, in clinical settings a simple parametric rule such as a linear rule might be appealing due to ease of implementation and interpretability. However, the performance of a linear plug-in rule might not be satisfactory if the working regression model is misspecified. In Section 3.2, we propose a direct-optimization-based algorithm to derive a more robust linear ICDR.

3.2. The Direct-Optimization-Based Approach (DOBA)

We now consider a parametric ICDR $R(Z) = I\{f(Z, \beta) > 0\}$ taking a linear form. With a slight abuse of notation, let $Z = (1, Z^{T})^{T}$ denote the covariate vector augmented with 1 as its first element. Then $f(Z, \beta)$ takes the form $Z^{T}\beta = \beta_0 + Z^{T}\beta_1$, with $\beta = (\beta_0, \beta_1^{T})^{T}$. We propose to estimate β through a DOBA. Our goal is to solve for the parameters by maximizing the constrained objective function $\mathrm{TPF}(\beta) \equiv E_Z\{I(Z^{T}\beta > 0)\eta_1(Z)\}$, subject to $\mathrm{FPF}(\beta) \equiv E_Z\{I(Z^{T}\beta > 0)\eta_0(Z)\} \le \alpha$. Again applying the KKT conditions, the objective is equivalent to maximizing $E_Z[I(Z^{T}\beta > 0)\{\eta_1(Z) - \lambda\eta_0(Z)\}]$ with respect to β. Using the same argument as in Section 2.2, we only need to consider the case of λ > 0. For a fixed value of λ, let $\beta_\lambda^* = \arg\min_\beta L(\beta;\lambda)$, where $L(\beta;\lambda) = E_Z\{I(Z^{T}\beta < 0)\eta_1(Z)\} + \lambda E_Z\{I(Z^{T}\beta > 0)\eta_0(Z)\}$. Then λ* is the solution to $\mathrm{FPF}(\beta_\lambda^*) = \alpha$, and subsequently $\beta^* = \beta_\lambda^*|_{\lambda=\lambda^*}$.

Direct optimization of L(β; λ) is computationally difficult since the indicator function $I(Z^{T}\beta > 0)$ is not differentiable with respect to β. We propose to approximate the 0–1 loss with a convex loss function, denoted ϕ(t). The resulting objective function is $L_\phi(\beta;\lambda) = E_Z\{\phi(Z^{T}\beta)\eta_1(Z)\} + \lambda E_Z\{\phi(-Z^{T}\beta)\eta_0(Z)\}$. Denote $\beta_\lambda^\phi = \arg\min_\beta L_\phi(\beta;\lambda)$ for a fixed λ; then $\lambda^\phi$ is the solution to $\mathrm{FPF}(\beta_\lambda^\phi) - \alpha = 0$, and $\beta^\phi = \beta_\lambda^\phi|_{\lambda=\lambda^\phi}$. We can set ϕ(t) to be the logistic loss, $\phi(t) = \log(1 + e^{-t})$, or the hinge loss, $\phi(t) = \max(1 - t, 0)$ (Cortes and Vapnik, 1995). Both have demonstrated good computational and theoretical properties in the machine learning literature (Bartlett et al., 2006; Nguyen and Sanner, 2013).

Estimation can be based on the following empirical function

\[ L_{n,\phi}(\beta;\lambda) = n^{-1}\sum_{i=1}^{n}\bigl\{\phi(Z_i^{T}\beta)\,\hat p_1^{-1}D_i + \lambda\,\phi(-Z_i^{T}\beta)\,\hat p_0^{-1}(1-D_i)\bigr\}, \tag{4} \]

with the constraint that $\widehat{\mathrm{FPF}}(\beta) - \alpha = 0$, where $\widehat{\mathrm{FPF}}(\beta) = n^{-1}\sum_{i=1}^{n} I(Z_i^{T}\beta > 0)\,\hat p_0^{-1}(1-D_i)$ and $\hat p_0 = n^{-1}\sum_{i=1}^{n}(1-D_i)$. We keep the 0–1 loss function in the constraint to preserve the accuracy of the estimated FPF. For identifiability, we set $\beta_0 \in \{-1, 0, 1\}$. Since $L_{n,\phi}(\beta;\lambda)$ is a convex function of $\beta_1$ when $\beta_0$ and λ are fixed, the unique estimate of $\beta_1$ can be obtained by solving $\partial L_{n,\phi}(\beta;\lambda)/\partial\beta_1 = 0$. For a fixed λ, $\hat\beta_\lambda^\phi = \arg\min_\beta L_{n,\phi}(\beta;\lambda)$. Subsequently, $\hat\lambda^\phi$ is obtained by identifying the corresponding $\hat\beta_\lambda^\phi$ that satisfies $\widehat{\mathrm{FPF}}(\hat\beta_\lambda^\phi) = \alpha$. The final estimator is $\hat\beta^\phi = \hat\beta_\lambda^\phi|_{\lambda=\hat\lambda^\phi}$. Finally, an estimator of TPF(β) under the constraint is given by $\widehat{\mathrm{TPF}}(\hat\beta^\phi) = n^{-1}\hat p_1^{-1}\sum_{i=1}^{n} I(Z_i^{T}\hat\beta^\phi > 0)D_i$, with $\hat p_1 = 1 - \hat p_0$.
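To make the estimation steps concrete, the sketch below minimizes the empirical surrogate objective in Equation (4) with the (smooth) logistic loss by plain gradient descent for fixed λ and β_0, and then picks the λ on a grid whose fitted rule has empirical FPF closest to α. This is only a schematic illustration under our own simplifying choices (gradient descent, a grid over λ, a user-supplied β_0, and our function names); the authors instead use a subgradient algorithm and search β_0 over {−1, 0, 1}.

# Logistic surrogate phi(t) = log(1 + exp(-t)) and its derivative
# (phi itself is shown for reference; only its derivative enters the gradient).
phi  <- function(t) log1p(exp(-t))
dphi <- function(t) -1 / (1 + exp(t))

doba_logistic <- function(Z, D, alpha, beta0 = -1,
                          lambdas = seq(0.1, 10, by = 0.1),
                          step = 0.05, n_iter = 2000) {
  Zm <- as.matrix(Z); n <- nrow(Zm)
  p1 <- mean(D); p0 <- 1 - p1
  w  <- ifelse(D == 1, 1 / p1, 1 / p0)                  # the p-hat^{-1} weights in (4)
  fpf_hat <- function(b1) mean((drop(Zm %*% b1) + beta0 > 0)[D == 0])
  fit_b1  <- function(lambda) {                         # minimize L_{n,phi}(beta; lambda) in beta_1
    b1 <- rep(0, ncol(Zm))
    for (it in seq_len(n_iter)) {
      m  <- drop(Zm %*% b1) + beta0                     # margins beta0 + Z_i^T beta1
      # gradient of n^{-1} sum_i { D_i w_i phi(m_i) + lambda (1 - D_i) w_i phi(-m_i) }
      g  <- crossprod(Zm, w * (D * dphi(m) - lambda * (1 - D) * dphi(-m))) / n
      b1 <- b1 - step * drop(g)
    }
    b1
  }
  fits <- lapply(lambdas, fit_b1)                       # inner optimization for each lambda
  fpfs <- vapply(fits, fpf_hat, numeric(1))
  best <- which.min(abs(fpfs - alpha))                  # outer step: match the FPF constraint
  list(beta0 = beta0, beta1 = fits[[best]],
       lambda = lambdas[best], FPF = fpfs[best])
}

In practice one would repeat such a fit for each candidate value of β_0 and keep the solution with the largest constrained TPF, as described above.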

In addition, we note that nonparametric forms of f(Z) can also be considered under the DOBA framework. In Section 1 of the Supporting Information, we provide details on estimating the optimal ICDR of the form $f(Z) = h(Z) + b$, where $b \in \mathbb{R}$, $h(Z) \in \mathcal{H}_K$, and $\mathcal{H}_K$ is a reproducing kernel Hilbert space (RKHS) associated with a kernel function K (Zhao et al., 2012; Zhou et al., 2017).

4. Asymptotic Properties

4.1. Asymptotic Properties of the Plug-In Rule

Theorem 1: Assume that $n_1/n$ is bounded away from 0 and 1 and that, with probability at least 1 − δ for some δ ∈ (0, 1), $\sup_z|\hat\eta_1(z) - \eta_1(z)| < c_1(\delta)\,n_1^{-\zeta}$ and $\sup_z|\hat\eta_0(z) - \eta_0(z)| < c_2(\delta)\,n_1^{-\zeta}$ for some bounded values $c_1(\delta)$ and $c_2(\delta)$ and some positive constant ζ. Then, as n → ∞, it holds that $\hat\lambda \to \lambda^*$.

The proof is provided in Section 2 of the Supporting Information. The parameter ζ depends on the method used to estimate pr(D = 1 | Z). For example, the random forest estimator of Breiman (2004) has estimated variance of order $O\{n_1^{-0.75/(s\log 2 + 0.75)}\}$ under some mild assumptions on the distribution of Z (Biau, 2012), where s is the number of strong features of Z used in the estimation. Then, by Chebyshev’s inequality, we have $\zeta = 0.75/[2\{s\log 2 + 0.75\}]$; for instance, with s = 2 strong features this gives ζ ≈ 0.18. If a parametric regression model such as a logistic model is correctly specified, we have ζ = 1/2.

The adaptive procedure is similar to the discrete super learner (van der Laan et al., 2007). The plug-in-cv estimator will converge as fast as plug-in-lg if the parametric logistic model is the true generative model; otherwise, plug-in-rf is the oracle selector and the resulting plug-in-cv estimator performs asymptotically as well as the plug-in-rf estimator. This justifies the asymptotic variance of the plug-in-cv estimator.

4.2. Asymptotic Properties of the Linear Decision Rule

In Section 4.2.1, we show that the linear decision rule, derived using the logistic-loss or hinge-loss approximation, is Fisher consistent when the underlying optimal rule is linear. We then establish the consistency of $(\hat\beta^\phi, \hat\lambda^\phi)$ and $\widehat{\mathrm{TPF}}(\hat\beta^\phi)$ in Section 4.2.2 under the constraint $\widehat{\mathrm{FPF}}(\hat\beta^\phi) = \mathrm{FPF}(\beta^*) = \alpha$. Moreover, we derive the asymptotic distributions of $\hat\beta_1^\phi$, $\hat\lambda^\phi$, and $\widehat{\mathrm{TPF}}(\hat\beta^\phi)$ in Section 4.2.3, where $\hat\beta_1^\phi$ is the estimated slope in the linear decision rule.

4.2.1. Fisher Consistency of Linear Decision Rule.

We first establish the relationship between $\beta^*$ and $\beta^\phi$.

Theorem 2: Assume that the optimal decision rule for the constrained objective is a linear function of Z. Let $\beta_\lambda^* = \arg\min_\beta L(\beta;\lambda)$ for a fixed value of λ, let λ* be the solution to $\mathrm{FPF}(\beta_\lambda^*) = \alpha$, and let $\beta^* = \beta_\lambda^*|_{\lambda=\lambda^*}$. It holds that $\mathrm{sign}(Z^{T}\beta^\phi) = \mathrm{sign}(Z^{T}\beta^*)$ and $\beta^* = \beta^\phi$, given that $|\beta_0^*|, |\beta_0^\phi| \in \{0, 1\}$.

The proof is given in Section 3 of the Supporting Information. The theorem justifies the validity of using the logistic loss and the hinge loss as surrogates for 0–1 loss in our constrained optimization problem.

4.2.2. Consistency of Linear Decision Rule.

In this section, we show that $(\hat\beta^\phi, \hat\lambda^\phi)$ is consistent for $(\beta^*, \lambda^*)$. We then prove the consistency of $\widehat{\mathrm{TPF}}(\hat\beta^\phi)$ with respect to $\mathrm{TPF}(\beta^*)$.

Theorem 3: Suppose that Z is bounded, (β, λ) belongs to a bounded set, and $|\beta_0| \in \{0, 1\}$. Then, as n → ∞, it holds that

  • $(\hat\beta^\phi, \hat\lambda^\phi) \to (\beta^*, \lambda^*)$ in probability;

  • $\widehat{\mathrm{TPF}}(\hat\beta^\phi) \to \mathrm{TPF}(\beta^*)$ in probability.

The proof is deferred to Section 4 of the Supporting Information. Theorem 3 first shows that the estimated linear rule from the proposed procedure converges in probability to the optimal linear rule when the specific surrogate function is used to approximate the 0–1 loss. It then shows the consistency of $\widehat{\mathrm{TPF}}(\hat\beta^\phi)$.

4.2.3. Asymptotic Distributions of Linear Decision Rule.

In this section, we provide the weak convergence of $(\hat\beta_1^\phi, \hat\lambda^\phi)$, which can be used to form confidence intervals for $\beta_1^\phi$. The weak convergence of the estimated TPF function, $\widehat{\mathrm{TPF}}(\hat\beta^\phi)$, is also presented.

Theorem 4: As n → ∞, under the same conditions as in Theorem 2 and with a known value of $\beta_0$, so that $\hat\beta_0^\phi = \beta_0^*$, the asymptotic distributions of $\hat\beta_1^\phi$ and $\widehat{\mathrm{TPF}}(\hat\beta^\phi)$ are given as

\[ n^{1/2}\begin{pmatrix} \hat\beta_1^\phi - \beta_1^* \\ \hat\lambda^\phi - \lambda^* \end{pmatrix} \sim \mathrm{Normal}(0, V_{\beta_1,\lambda}), \]
\[ n^{1/2}\bigl\{\widehat{\mathrm{TPF}}(\hat\beta^\phi) - \mathrm{TPF}(\beta^*)\bigr\} \sim \mathrm{Normal}(0, V_{\mathrm{TPF}}). \]

Details of the expressions for $V_{\beta_1,\lambda}$ and $V_{\mathrm{TPF}}$, and the proof of Theorem 4, are provided in Section 5 of the Supporting Information.

5. Simulation

We conducted numerical studies to assess the finite-sample performance of our proposed estimators. Two independent biomarkers $(Z_1, Z_2)$ were generated under two scenarios. In the first scenario, they followed a bivariate normal distribution with mean $\mu_1 = 2$ and variance $\sigma_1 = 1$ for D = 1, and mean $\mu_0 = 1$ and variance $\sigma_0 = 1$ for D = 0. Such a configuration led to a logistic regression model, $\mathrm{logit}\{P(D = 1 \mid Z_1, Z_2)\} = \theta_0 + \theta_{11}Z_1 + \theta_{12}Z_2$, with $\theta_0 = -3$ and $\theta_{11} = \theta_{12} = 1$. In addition, a noise variable $Z_3$ not contributing to D was generated independently from the standard normal distribution and included in the analysis to examine the performance of the proposed methods. A linear combinatorial rule in this setting is optimal among all classes of combinatorial rules (Baker, 2000; McIntosh and Pepe, 2002). For the case of ‘rule in’, since TPF is maximized at FPF = α, we derived the optimal decision rule of the form $-1 + \beta_{11}Z_1 + \beta_{12}Z_2 + \beta_{13}Z_3 \ge 0$ by solving for $\beta_{11}$ and $\beta_{12}$ under this constraint, and obtained $\beta_{11} = \beta_{12} = \{2 + \sqrt{2}\,\Phi^{-1}(1-\alpha)\}^{-1}$ and $\beta_{13} = 0$, where Φ(·) is the cumulative distribution function of the standard normal distribution. Based on the derived β parameters, TPF is maximized at $1 - \Phi\{\Phi^{-1}(1-\alpha) - \sqrt{2}\}$, with the constraint that FPF = α. For the case of ‘rule out’, we similarly derived the optimal decision rule of the form $-1 + \beta_{11}Z_1 + \beta_{12}Z_2 + \beta_{13}Z_3 \ge 0$, with $\beta_{11} = \beta_{12} = \{4 + \sqrt{2}\,\Phi^{-1}(\gamma)\}^{-1}$ and $\beta_{13} = 0$; FPF is minimized at $1 - \Phi\{\sqrt{2} + \Phi^{-1}(\gamma)\}$, with the constraint that TPF = 1 − γ. In Scenario II, $(1 - p^*)100\%$ of the subjects had their biomarkers generated as in the first scenario, while the remaining $p^*100\%$ had their markers generated from a normal distribution with $\mu_1 = -m$ or $\mu_0 = m$ for some m > 0. This may correspond to practical settings where outliers are present, or where subjects come from a population with mixed disease subtypes. Under such a configuration, a logistic regression model is no longer induced, and the linear combinatorial rule is no longer optimal among all classes of rules.
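For reference, the R sketch below generates data from Scenario I (with equal numbers of cases and controls, as in the simulations) and evaluates the theoretical ‘rule in’ benchmark quoted above. It is our own reconstruction for illustration, not the authors' simulation code.

# Scenario I: two informative markers and one noise marker, n1 = n0.
gen_scenario1 <- function(n) {
  stopifnot(n %% 2 == 0)
  D  <- rep(c(1, 0), each = n / 2)
  mu <- ifelse(D == 1, 2, 1)                       # component means: 2 for cases, 1 for controls
  data.frame(D  = D,
             Z1 = rnorm(n, mean = mu),
             Z2 = rnorm(n, mean = mu),
             Z3 = rnorm(n))                        # noise marker, independent of D
}

# Theoretical 'rule in' benchmark at FPF = alpha (the OptimalL row of Table 1).
alpha    <- 0.3
beta_opt <- 1 / (2 + sqrt(2) * qnorm(1 - alpha))   # beta11 = beta12, about 0.365
tpf_opt  <- 1 - pnorm(qnorm(1 - alpha) - sqrt(2))  # maximized TPF, about 0.813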

We first conducted numerical studies to evaluate the performance of the proposed estimators and contrasted them with the existing standard approach for identifying biomarker combinatorial rules and cutoff points. We applied the standard method in the following steps: (a) fit the data with a logistic model $\mathrm{pr}(D = 1 \mid Z) = g\{\theta_0 + (Z_1, Z_2, Z_3)^{T}\theta_1\}$; (b) identify the cutoff of $Z^{T}\theta_1$ that maximizes TPF under the constraint on FPF; and (c) for comparison, transform θ into β under the constraint that $|\beta_0| \in \{0, 1\}$. We considered three different methods, plug-in-rf, plug-in-lg, and plug-in-cv, to estimate pr(D = 1 | Z) in the plug-in framework. For plug-in-cv, the dataset was randomly split into three equal folds, with two folds used to derive the rule by either plug-in-lg or plug-in-rf and the remaining fold used to calculate TPF and FPF; the procedure was repeated for all three possible configurations. In the ‘rule in’ case, for example, the plug-in method that yielded the highest averaged TPF, while keeping the average estimated FPF within the margin of error of the constraint α (±α/5), was chosen as the final procedure; a sketch of this selection step is given after this paragraph.
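A sketch of the plug-in-cv selection step for the ‘rule in’ case, reusing the plug_in_rule() and predict_rule() helpers sketched in Section 3.1, might look as follows. It is again an illustration under our own naming conventions; the ±α/5 tolerance mirrors the margin described above.

plug_in_cv <- function(Z, D, alpha, K = 3) {
  folds <- sample(rep(seq_len(K), length.out = nrow(Z)))
  perf  <- sapply(c("lg", "rf"), function(m) {
    res <- sapply(seq_len(K), function(k) {        # derive on K - 1 folds, evaluate on the k-th
      rule <- plug_in_rule(Z[folds != k, , drop = FALSE], D[folds != k], alpha, m)
      r    <- predict_rule(rule, Z[folds == k, , drop = FALSE])
      d    <- D[folds == k]
      c(TPF = mean(r[d == 1]), FPF = mean(r[d == 0]))
    })
    rowMeans(res)                                  # average TPF and FPF over the K folds
  })
  ok <- abs(perf["FPF", ] - alpha) <= alpha / 5    # FPF must stay within the tolerance margin
  if (!any(ok)) ok[] <- TRUE                       # fall back if neither method qualifies
  eligible <- which(ok)
  best <- names(eligible)[which.max(perf["TPF", eligible])]
  plug_in_rule(Z, D, alpha, method = best)         # refit the chosen method on the full data
}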

For the DOBA, we calculated linear rules in which the indicator function was approximated with either a logistic loss (L-logistic) or a hinge loss (L-hinge). We fixed $\hat\beta_0$ at −1, 0, or 1, and then chose the value with the maximum TPF under the FPF constraint for the ‘rule in’ case. In our simulation study, the correct value of $\hat\beta_0$ was always selected when the linear decision rule was the optimal decision rule. The theoretical optimal linear rules were used as benchmarks to gauge the performance of all methods. We generated a training dataset with n = 200 and a testing set with sample size fixed at 1,000. In both sets we let $n_0 = n_1$. We derived the decision rules using the different methods and then applied them to the testing set. The procedure was repeated 1,000 times.

Table 1 shows the simulation results of Scenario I for ‘rule in’ with α = 0.3 and ‘rule out’ with γ = 0.1. Given the optimality of a logistic regression model under this scenario, the standard approach performed well, as expected: it identified a consistent linear combinatorial rule that achieved the optimal TPF in the testing set while constraining FPF close to α in the ‘rule in’ case. Our two linear DOBA methods performed similarly to the standard approach; they achieved slightly higher TPF in the ‘rule in’ setting but at the price of slightly higher FPF. The plug-in-lg had performance similar to the standard approach, as the data are fitted well by a logistic regression. The plug-in-rf reached slightly lower TPF in ‘rule in’, perhaps because the nonparametric method cannot fit these data as well as the logistic model. The performance of the plug-in-cv method was encouraging and close to that of the plug-in-lg, which verified its ability to automatically choose the better-fitting candidate model. These results support our theoretical findings that, when a linear rule is optimal, the proposed parametric estimators are consistent for the optimal rule. We also examined the finite-sample performance of the proposed variance estimators for the parametric decision rules under the correctly specified linear model (Table 2). All estimators had negligible bias, the standard errors tracked the empirical standard deviations well, and the coverage probabilities of the confidence intervals were close to the nominal 95% level. The performance was similar in the ‘rule out’ case.

Table 1.

Result of Scenario I for n = 200. OptimalL — best theoretical result by the linear combinatorial rule; Standard — results of the standard logistic fitting; L-logistic — results from the linear determination rule with logistic loss approximation; L-hinge — results from the linear determination rule with hinge loss approximation; plug-in-rf — results from plug-in personalized rule with random forest fitting; plug-in-lg — results from plug-in personalized rule with logistic fitting; plug-in-cv — selecting between plug-in-rf and plug-in-lg with cross-validation.

Method       β0       se(β0)   β1       se(β1)   β2       se(β2)   β3       se(β3)   TPF      se(TPF)  FPF      se(FPF)
α = 0.3 for ‘rule in’
OptimalL     −1.000   –        0.365    –        0.365    –        0.000    –        0.813    –        0.300    –
Standard     −1.000   0.000    0.361    0.054    0.361    0.054    −0.002   0.063    0.799    0.045    0.293    0.053
plug-in-rf   –        –        –        –        –        –        –        –        0.743    0.062    0.297    0.058
plug-in-lg   –        –        –        –        –        –        –        –        0.806    0.043    0.302    0.052
plug-in-cv   –        –        –        –        –        –        –        –        0.795    0.050    0.304    0.053
L-logistic   −1.000   0.000    0.366    0.085    0.371    0.085    −0.002   0.111    0.803    0.045    0.310    0.052
L-hinge      −1.000   0.000    0.370    0.093    0.365    0.096    −0.002   0.120    0.800    0.047    0.311    0.053
γ = 0.1 for ‘rule out’
OptimalL     −1.000   –        5.330    –        5.330    –        0.000    –        0.900    –        0.395    –
Standard     −1.000   0.000    0.485    0.094    0.485    0.093    −0.002   0.088    0.903    0.035    0.474    0.079
plug-in-rf   –        –        –        –        –        –        –        –        0.896    0.039    0.548    0.101
plug-in-lg   –        –        –        –        –        –        –        –        0.898    0.036    0.461    0.077
plug-in-cv   –        –        –        –        –        –        –        –        0.895    0.037    0.488    0.095
L-logistic   −1.000   0.000    0.471    0.110    0.467    0.109    −0.005   0.128    0.891    0.035    0.457    0.072
L-hinge      −1.000   0.000    0.468    0.109    0.469    0.113    −0.003   0.131    0.891    0.035    0.456    0.070

Table 2.

Result of Scenario I for n = 200. L-logistic — results from the linear determination rule with logistic loss approximation; L-hinge — results from the linear determination rule with hinge loss approximation.

Parameter  optimal  mean  sd  se  coverage
α = 0.3 for ‘rule in’
L-logistic
β1 0.365 0.366 0.102 0.085 0.973
β2 0.365 0.371 0.139 0.085 0.974
β3 0.000 −0.002 0.165 0.111 0.977
TPF 0.813 0.803 0.065 0.045 0.973
L-hinge
β1 0.365 0.370 0.138 0.093 0.959
β2 0.365 0.365 0.187 0.096 0.964
β3 0.000 −0.002 0.194 0.120 0.968
TPF 0.813 0.800 0.077 0.047 0.968
γ = 0.1 for ‘rule out’
L-logistic
β1 0.457 0.471 0.116 0.110 0.954
β2 0.457 0.467 0.145 0.109 0.959
β3 0.000 −0.005 0.275 0.128 0.972
FPF 0.447 0.457 0.910 0.072 0.928
L-hinge
β1 0.457 0.468 0.125 0.109 0.945
β2 0.457 0.469 0.160 0.113 0.956
β3 0.000 −0.003 0.359 0.131 0.959
FPF 0.447 0.456 0.908 0.070 0.934

Table 3 presents the results of Scenario II with a small proportion of outliers (p* = 0.05 and m = 10). Under this setting, the standard approach did not perform well. In contrast, the proposed linear estimators from the DOBA were more robust to the ‘outliers’, or disease heterogeneity, and yielded rules that improved over the standard method in TPF for the ‘rule in’ case. Although these two approaches reached slightly higher FPF, they showed great improvement in TPF compared to the standard approach, suggesting a potential advantage of our proposed parametric approach over the existing regression-based approach. The plug-in-lg did not perform well in this situation because a logistic regression failed to fit the data. The nonparametric plug-in-rf procedure provided the best performance in this case. More encouragingly, the plug-in-cv had the same performance as the plug-in-rf, indicating that this approach can automatically choose the better-fitting method under various settings. Again, similar results were observed for the ‘rule out’ case.

Table 3.

Result of Scenario II for n = 200, p* = 0.05 and m = 10. L-logistic — results from the linear determination rule with logistic loss approximation; L-hinge — results from the linear determination rule with hinge loss approximation; plug-in-rf — results from plug-in personalized rule with random forest fitting; plug-in-lg — results from plug-in personalized rule with logistic fitting; plug-in-cv — selecting between plug-in-rf and plug-in-lg with cross-validation.

Method       β0       se(β0)   β1       se(β1)   β2       se(β2)   β3       se(β3)   TPF      se(TPF)  FPF      se(FPF)
α = 0.3 for ‘rule in’
Standard     −0.996   0.090    −0.064   1.463    −0.160   1.742    −0.078   2.260    0.221    0.107    0.220    0.109
plug-in-rf   –        –        –        –        –        –        –        –        0.763    0.055    0.293    0.054
plug-in-lg   –        –        –        –        –        –        –        –        0.213    0.114    0.214    0.118
plug-in-cv   –        –        –        –        –        –        –        –        0.763    0.055    0.293    0.054
L-logistic   −0.996   0.090    0.336    0.131    0.347    0.121    −0.002   0.167    0.715    0.088    0.312    0.063
L-hinge      −1.000   0.000    0.344    0.235    0.333    0.283    −0.013   0.292    0.704    0.107    0.308    0.063
γ = 0.1 for ‘rule out’
Standard     1.000    0.000    −0.016   0.275    −0.004   0.277    0.006    0.379    0.957    0.045    0.956    0.044
plug-in-rf   –        –        –        –        –        –        –        –        0.900    0.037    0.524    0.099
plug-in-lg   –        –        –        –        –        –        –        –        0.960    0.045    0.959    0.045
plug-in-cv   –        –        –        –        –        –        –        –        0.900    0.037    0.524    0.099
L-logistic   −0.456   0.890    0.325    0.333    0.325    0.340    −0.011   0.232    0.857    0.049    0.614    0.198
L-hinge      −0.970   0.243    0.554    0.179    0.560    0.169    −0.003   0.178    0.881    0.037    0.580    0.077

We additionally considered a scenario with a moderate proportion of outliers (p* = 0.2 and m = 10). Results are summarized in Section 6 of the Supporting Information. Under such a setting, the standard approach did not work as well as the L-logistic and L-hinge approaches, because the linear decision rule is now far from the globally optimal rule. Similarly, the performance of plug-in-lg was not favorable. The plug-in-rf and plug-in-cv outperformed these four methods and demonstrated robustness in dealing with heterogeneous data.

In addition, simulation studies verifying the performance of the kernel-based DOBA are provided in Section 7 of the Supporting Information. When there was only a small proportion of outliers, the kernel-based DOBA performed similarly to the linear approach; when there was a moderate proportion of outliers, it outperformed the linear approach. However, its ability to constrain FPF/TPF at the boundary was generally weaker, which might be due to the computational difficulties associated with the kernel-based method.

6. Real Data Analysis

We illustrate the approaches with a dataset from the PASS. The data consist of 478 patients who had a Gleason score of 6 at study entry. Among them, 94 had their disease reclassified as high-grade cancer after receiving a Gleason score of 7 or above at subsequent biopsies during active surveillance. For each patient, four predictors were used to construct the optimal decision rule: body mass index (BMI), prostate volume, the ratio of new biopsy cores containing cancer to total cores from the previous biopsy, and a new blood-based biomarker score (Lin et al., 2016). We aim to derive a ‘rule out’ decision rule for whether a follow-up biopsy should be performed.

The TPF was controlled at TPF ≥ 1 − γ with γ = 0.10. The data were randomly split into three folds: two folds were used to develop the decision rule, and the remaining fold was used to evaluate the rule in terms of TPF and FPF. The process was repeated 200 times, and the average results are reported in Figure 1a. The three linear approaches performed similarly to the standard approach and the nonparametric plug-in procedure. The linear rule generated by L-hinge would spare 18% (95% CI: 5%–31%) of the patients from an invasive biopsy procedure while missing only 7% (95% CI: 0%–14%) of reclassified patients. These results resemble Scenario I in our simulation studies, suggesting that the data can be approximated well by a logistic regression model. We further perturbed the PASS data with additional outliers, similar to the setup in Scenario II of the simulations. Specifically, we computed a risk score, $Z^{T}\hat\theta$, for each patient, where $\hat\theta$ was estimated from the logistic regression. We added to the data 30 (approximately 5% of the total population) reclassified patients whose risk scores were approximately at or below the lower 5th percentile, and another 30 patients without reclassification whose scores were at or above the upper 5th percentile. We repeated the procedure 200 times. As expected, the proposed nonparametric plug-in methods were the most robust to the additional outliers. In particular, the random forest estimates yielded the highest 1 − FPF values given a similar constraint on TPF, and the plug-in-cv method produced results very close to those of the plug-in-rf. The two proposed linear DOBA estimators also yielded results quite comparable to the nonparametric approaches, suggesting again that these approaches are generally very robust. The standard approach, however, yielded decision rules with specificities far below those of the other approaches (Figure 1b).

Figure 1.


Results from PASS based on 200 repetitions of 3-fold cross-validation with γ = 0.1. Standard — results of the standard logistic fitting; plug-in-rf — results from plug-in personalized rule with random forest fitting; plug-in-lg — results from plug-in personalized rule with logistic fitting; plug-in-cv — selecting between plug-in-rf and plug-in-lg with cross-validation; L-logistic — results from the linear determination rule with logistic loss approximation; L-hinge — results from the linear determination rule with hinge loss approximation. (a) Original PASS data; (b) PASS data perturbed with simulated outliers.

7. Discussion

The development of biomarker-based rules for clinical decision making is not trivial, owing to the complex nature of the associated clinical consequences. When searching for such rules, it is important that maximizing clinical benefit be considered in the context of controlling for potential risk. In this manuscript, we cast the derivation of clinical decision rules in a general framework of maximizing a benefit function specific to the medical diagnostic setting while putting a limit on the extent of harm the rule may allow. To date, rigorous and efficient statistical procedures for combining multiple biomarkers under such constrained objectives have not been formally investigated in the statistical literature. This constitutes a significant change of paradigm, connecting statistical learning more directly with the improvement of clinical outcomes.

We considered various approaches for deriving both parametric and nonparametric decision rules. Our simulation studies elucidate how the different approaches perform in various settings. The standard two-step approach, which uses a regression model to derive a linear decision rule and then identifies the cutoff point to achieve the desired outcome, relies on the assumption that the model fits the data well. The linear rules from the DOBA can perform comparably to the standard approach when the regression model assumption is correct, but can substantially improve the outcome with more complex data when it is not. Such a gain in robustness may be due to the fact that the cutoff and the combinatorial rule are optimized simultaneously and directly on the clinically relevant objective function, without modeling assumptions on the data structure. This is appealing because, in practice, a correctly specified model cannot always be validated, especially with a heterogeneous disease population. As expected, the plug-in algorithm with nonparametric fitting and plug-in-cv tend to be more flexible than the parametric linear rules, and they performed particularly well in settings where linear decision rules did not fit the data well. Both linear and nonparametric rules are useful in practice. When all rules render similar performance, it might be preferable to use the linear rule for ease of implementation and interpretation in clinics. When dealing with a more heterogeneous population, it is worth checking whether other learning algorithms can capture the data more precisely and improve clinical outcomes. We also showed in our real data analysis that the simple existing approach can offer sensible results compared with more sophisticated methods. Our development here equips investigators with a suite of tools to aid clinical decision making.

There are a few directions for future work. The objective function we considered here accounts for benefit and cost through functionals such as the TPF and FPF. The benefit function can be further augmented to incorporate financial costs (e.g., the cost of biomarker measurement and of the procedures) when implementing the decision rules. Incorporating variable selection while developing decision rules to enhance clinical outcomes is also warranted. Additionally, extending the methods to studies with censored failure-time outcomes would be of great clinical interest.

Supplementary Material

supp info
Supp dataS1

Acknowledgements

The work is supported by grants U01CA86368, R01CA236558, R01DK108073, R21HD086754, P30 CA015704, and S10OD020069 awarded by the National Institutes of Health.

Footnotes

Supporting Information

Web Appendices and Tables referenced in Sections 4.1, 4.2, and 5, and R codes for numerical work are available with this paper at the Biometrics website in the Wiley Online Library.

References

  1. Baker SG (2000). Identifying combinations of cancer markers for further study as triggers of early intervention. Biometrics 56, 1082–1087.
  2. Bartlett PL, Jordan MI, and McAuliffe JD (2006). Convexity, classification, and risk bounds. Journal of the American Statistical Association 101, 138–156.
  3. Biau G (2012). Analysis of a random forests model. Journal of Machine Learning Research 13, 1063–1095.
  4. Boyd S and Vandenberghe L (2004). Convex Optimization. Cambridge University Press.
  5. Breiman L (2001). Random forests. Machine Learning 45, 5–32.
  6. Breiman L (2004). Consistency for a simple model of random forests. Citeseer.
  7. Cai T, Cai T, and Guo Z (2019). Individualized treatment selection: An optimal hypothesis testing approach in high-dimensional models. arXiv:1904.12891.
  8. Chen S, Tian L, Cai T, and Yu M (2017). A general statistical framework for subgroup identification and comparative treatment scoring. Biometrics 73, 1199–1209.
  9. Cortes C and Vapnik V (1995). Support-vector networks. Machine Learning 20, 273–297.
  10. Laber EB, Lizotte DJ, and Ferguson B (2014). Set-valued dynamic treatment regimes for competing outcomes. Biometrics 70, 53–61.
  11. Lin D, Brown M, Newcomb L, Sjoberg D, Brooks J, Carroll P, Dash A, Fabrizio M, Gleave M, Morgan T, et al. (2016). Evaluating the four kallikrein panel of the 4Kscore for prediction of high-grade prostate cancer in men in the Canary Prostate Active Surveillance Study (PASS). The Journal of Urology 195, e229.
  12. Linn KA, Laber EB, and Stefanski LA (2015). Estimation of dynamic treatment regimes for complex outcomes: Balancing benefits and risks. In Kosorok MR and Moodie EE, editors, Adaptive Treatment Strategies in Practice: Planning Trials and Analyzing Data for Personalized Medicine, volume 21, chapter 15, pages 249–262. SIAM, Philadelphia, PA. https://epubs.siam.org/doi/10.1137/1.9781611974188.ch15
  13. Luedtke AR and van der Laan MJ (2016). Optimal individualized treatments in resource-limited settings. The International Journal of Biostatistics 12, 283–303.
  14. Ma S and Huang J (2007). Combining multiple markers for classification using ROC. Biometrics 63, 751–757.
  15. McIntosh MW and Pepe MS (2002). Combining several screening tests: optimality of the risk score. Biometrics 58, 657–664.
  16. Newcomb LF, Brooks JD, Carroll PR, Feng Z, Gleave ME, Nelson PS, Thompson IM, and Lin DW (2010). Canary Prostate Active Surveillance Study: design of a multi-institutional active surveillance cohort and biorepository. Urology 75, 407–413.
  17. Nguyen T and Sanner S (2013). Algorithms for direct 0–1 loss optimization in binary classification. In International Conference on Machine Learning, pages 1085–1093.
  18. Parekh DJ, Punnen S, Sjoberg DD, Asroff SW, Bailen JL, Cochran JS, Concepcion R, David RD, Deck KB, Dumbadze I, et al. (2015). A multi-institutional prospective trial in the USA confirms that the 4Kscore accurately identifies men with high-grade prostate cancer. European Urology 68, 464–470.
  19. Pepe MS, Cai T, and Longton G (2006). Combining predictors for classification using the area under the receiver operating characteristic curve. Biometrics 62, 221–229.
  20. Qiu X, Zeng D, and Wang Y (2018). Estimation and evaluation of linear individualized treatment rules to guarantee performance. Biometrics 74, 517–528.
  21. van der Laan MJ, Polley EC, and Hubbard AE (2007). Super learner. Statistical Applications in Genetics and Molecular Biology 6, Article 25.
  22. Wang Y, Fu H, and Zeng D (2018). Learning optimal personalized treatment rules in consideration of benefit and risk: with an application to treating type 2 diabetes patients with insulin therapies. Journal of the American Statistical Association 113, 1–13.
  23. Zhao Y, Zeng D, Rush AJ, and Kosorok MR (2012). Estimating individualized treatment rules using outcome weighted learning. Journal of the American Statistical Association 107, 1106–1118.
  24. Zhou X, Mayer-Hamblett N, Khan U, and Kosorok MR (2017). Residual weighted learning for estimating individualized treatment rules. Journal of the American Statistical Association 112, 169–187.
