Abstract
In this work we propose a novel method for individualized treatment selection when there are multiple correlated treatment responses. For the K-treatment (K ≥ 2) scenario, we compare suitable indexes, based on the outcome variables for each treatment, conditional on patient-specific scores constructed from the collected covariate measurements. Our method covers any number of treatments and outcome variables, and it can be applied to a broad class of models. The proposed method uses a rank aggregation technique that accounts for possible correlations among ranked lists to estimate an ordering of treatments based on treatment performance measures such as the smooth conditional mean. The method has the flexibility to incorporate patient and clinician preferences into the optimal treatment decision on an individual basis. A simulation study demonstrates the performance of the proposed method in finite samples. We also present analyses of HIV clinical trial data to show the applicability of the proposed procedure to real data.
Keywords: Design variables, Personalized Treatments, Single Index Models, Rank Aggregation
1. Introduction
The concept of personalized medicine is fairly old, but the idea advanced dramatically after the introduction of randomized controlled clinical trials that also collected additional patient information. The primary aim of clinical trials is to make population-level decisions, which are not necessarily optimal for an individual patient. Such population-level decisions do not always account for patient heterogeneity. However, the increasing availability of vast amounts of additional patient data from such studies has raised awareness of heterogeneity in both patient characteristics and outcomes, and has led to new evidence-based medicine concepts. Over the last two decades, statistical research in personalized medicine has produced new methodologies and insights, mostly owing to advances in computational power, bioinformatics discoveries, and access to electronic health data.[1–5] The goal of personalized medicine is to use data to improve decision making in health care so as to provide the “best” outcome for a patient based on his/her individual features.
In real-life situations, the success of a treatment may not be fully reflected by a single outcome, as a variety of factors may compel both patients and clinicians to view cure or recovery broadly. For example, in the treatment of Type-2 Diabetes, control of HbA1c, systolic blood pressure, low-density lipoproteins, and cholesterol levels, and prevention of hypoglycemia and weight gain, have been suggested as therapeutic goals to address the net clinical utility of a treatment.[6] In cancer studies, although overall survival is considered the most important outcome, a variety of other factors, such as a reduction in tumor size or the eradication of cancerous cells, are also considered meaningful outcomes.[7] Similarly, when the disease is life-threatening, although time-to-event outcomes (e.g., overall survival) are commonly considered the best outcome, there could be other factors of secondary importance, such as those relating to quality of life and economic impact. It is also common to use a collection of surrogate outcomes during the early development of treatments.[8] Hence, selecting the best treatment while considering multiple outcome measures is a relevant issue for most patient populations.
Few articles have addressed treatment selection in the presence of multiple responses. Lizotte et al.[9] advocated creating a composite outcome via a linear combination of outcomes. In a contrasting approach, Laber et al.[10] and Ertefaie et al.[11] provide continuous/dynamic learning methods for selecting set-valued treatment regimes when there are multiple responses. Rather than a single optimal treatment, a set-valued selection recommends a set of possible treatments, each of which is “no worse” than any other in that set. When there are multiple treatments, the methods proposed in both articles above conduct the selection at each of several stages, comparing two treatments at a time. Another article that addresses multi-response treatment selection is by Lizotte and Laber,[12] who focus on a multi-objective sequential optimization method that again yields a non-dominated treatment set, commonly known as the Pareto optimal collection among the possible treatments. On the other hand, Butler et al.[13] provide a selection method for only two treatments in which a single treatment is recommended using patient survey data in addition to clinical data.
To address the multiple-response issue, Siriwardhana et al.[14] used a rank aggregation method to obtain an ordered list of treatments from multiple ranked lists of treatment performance measures. However, that technique does not fully address relationships among the multiple outcomes. In treatment selection problems with multiple responses, outcome measures are often correlated, especially when a collection of clinical or behavioral outcomes is used. For example, scores relating to cognitive improvement in multiple cognitive domains are typically positively associated.[15,16] Similarly, when multiple surrogate markers are used for a specific clinical outcome, such markers are naturally expected to be correlated. As such, there are many cases in which the natural dependency among responses cannot be ignored, and treatment selection methods for the multiple-response setting must be capable of addressing potential dependencies among responses.
In this paper, we consider the selection of the optimal treatment among K possible treatments for a patient, using his or her baseline characteristics, when multivariate outcomes (responses) are to be considered. First, to handle statistical issues arising from high-dimensional covariates, each patient is assigned a score based on his/her covariate values. Next, the conditional means of the multivariate responses are estimated at a given patient score, using a smooth mean estimation step for each of the K treatment options. Finally, a rank aggregation technique that can handle potential dependencies among ranked lists is applied to find an overall ranking of the K options. The “optimal” treatment for the given score is defined as the best-ranked option in the estimated overall ranked list. The proposed method allows one to use a priori opinions on the importance of each response in determining the best treatment. Empirical studies show that the proposed method performs well in terms of the frequency of selecting the best treatment and an aggregated average gain. The article also demonstrates an application of the proposed technique to a real dataset from an HIV clinical trial.
The remainder of the article is organized as follows. In Section 2, we discuss the proposed methodology. Section 3 includes simulation results followed by a real data illustration in Section 4. The main body of the paper ends with a discussion in Section 5.
2. Treatment Selection
In this section, we describe the proposed procedure for selecting the optimal personalized treatment based on multiple outcome measures. Suppose we observe J continuous response variables for each patient undergoing a treatment selected from K possible treatments and, without loss of generality, suppose that larger values of each component of the J-dimensional response vector indicate better outcomes. Let Yk* = (Y1k*, …, YJk*)′ denote the vector of responses for the kth treatment with an associated r-dimensional covariate vector X. We assume that there exists a natural correlation structure Rk among the responses. In this work we assume that we have data from a randomized clinical trial (RCT) that provides responses and covariate information for a set of patients randomized into K arms. Note that in practice, using data from an RCT, one cannot observe the full J × K matrix of counterfactuals for a single patient; hence, one cannot obtain a sample from the joint distribution of (Y1*, …, YK*, X). Rather, one observes K independent pairs of observations (Yk, Xk) from the marginal distributions of (Yk*, X) for k = 1, …, K, where Yk = (Y1k, …, YJk)′ (see Siriwardhana et al.[14,17] for a detailed discussion). Further, assume that a patient's covariate value X can be used to obtain a lower-dimensional composite patient score U(X), as described in Siriwardhana et al.,[14] to summarize each patient's characteristics.
Here, we use independent pairs of observations (Yk, Xk) from the marginal distributions of (Yk*, X), k = 1, …, K, to select the optimal treatment among the K treatments using vectors of smoothed conditional means for each treatment. We define
$$\mu_{jk}(u_j) = E\left[\,Y_{jk}^{*} \mid U_j(X) = u_j\,\right], \qquad j = 1,\dots,J;\; k = 1,\dots,K, \tag{1}$$
and vectors μk(u) = (μ1k(u1), …, μJk(uJ))′ for U = (U1, …, UJ)′ and u = (u1, …, uJ)′, where the components of these vectors correspond to responses 1, …, J. Although we suppress the dependence of the Uj's on the covariate vector for brevity, in all developments below, quantities related to the Uj's are functions of X.
In our proposed approach, for each j = 1, …, J, we rank the K values μj1(uj), …, μjK(uj) (the jth components of the vectors μk(u), k = 1, …, K) to get the size-K vector vj(u) = (vj1(u), …, vjK(u))′, where vjk(u) is the rank of μjk(uj) among μj1(uj), …, μjK(uj), with the largest value receiving rank 1. We then produce an overall ranking with an aggregation technique, which serves as the basis for finding the optimal treatment.
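As an illustration of this ranking step, the following minimal Python/NumPy sketch computes the rank vectors vj(u) from a J × K array of (estimated) means, giving rank 1 to the largest mean in each row. The function name rank_vectors is hypothetical, and breaking ties by treatment order is an assumption that the text does not specify.

```python
import numpy as np

def rank_vectors(mu):
    """Per-response treatment ranks.

    mu : (J, K) array with mu[j, k] the (estimated) mean of response j
         under treatment k.
    Returns a (J, K) integer array whose row j is v_j = (v_j1, ..., v_jK),
    with rank 1 given to the largest mean in that row.
    """
    mu = np.asarray(mu, dtype=float)
    # Column indices sorted from largest to smallest mean in each row;
    # the stable sort breaks ties by treatment order (an assumption).
    order = np.argsort(-mu, axis=1, kind="stable")
    ranks = np.empty_like(order)
    rows = np.arange(mu.shape[0])[:, None]
    # The treatment in position p of `order` receives rank p + 1.
    ranks[rows, order] = np.arange(1, mu.shape[1] + 1)
    return ranks

# Two responses (J = 2), three treatments (K = 3).
print(rank_vectors([[0.2, 0.9, 0.5],
                    [0.7, 0.1, 0.3]]))
# [[3 1 2]
#  [1 3 2]]
```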
In their previous approach, Siriwardhana et al.[14] used an aggregation method by Pihur et al.[18,19] to combine these rank vectors vj(u), j = 1, …, J, into an overall ranking of treatments for a given patient score u. They defined the optimal treatment as
(2) |
However, the rank aggregation method of Pihur et al.[18,19] does not account for dependencies among the J rank lists, which could lead to unwanted estimation errors when such dependencies exist. In many real-life problems with multiple responses, natural associations among responses cannot be ignored. In finite samples, correlated responses induce non-negligible dependencies among the estimated response means, which in turn translate into correlated rank lists when one ranks those estimated means. In the current work, we propose to use an aggregation technique suited to correlated rank lists within the proposed treatment selection framework. We detail the proposed aggregation step below.
We link the jth component Yjk of the response vector Yk for the kth treatment and covariates Xk via a Single Index Model (SIM). The SIM formulation provides great flexibility and reasonable efficiency in modeling many types of data. This model is expressed as,
$$Y_{jk} = g_{jk}\left(\beta_{jk}' X\right) + \epsilon_{jk}, \tag{3}$$
for j = 1, …, J and k = 1, …, K, where each βjk is an r-vector of parameters, gjk is an unknown smooth link function, and the ϵjk's are error terms with E[ϵjk | X] = 0. We assume that the ϵjk's are independent across k = 1, …, K for a fixed j, whereas they are correlated across j for any given k.
RCT data used in our approach are of the form (Yki, Xki) where Yki = (Y1ki, …, YJki)′ and Yjki indicates the jth component of the response for the ith individual under treatment k with associated covariate values Xki, i = 1, …, nk. The relationship (3) between response and covariates for such a sample can be written as
$$Y_{jki} = g_{jk}\left(\beta_{jk}' X_{ki}\right) + \epsilon_{jki}, \qquad i = 1,\dots,n_k. \tag{4}$$
Following Siriwardhana et al.,[14,17] we define a score vector U(X) for a patient with covariate X as follows. First define,
Next, define the jth component of the combined overall score vector as
(5) |
The overall score is given as U(X) = (U1(X), …, UJ(X))′ where Uj(X) = (Sj(X), δj(X))′ for j = 1, …, J.
Since the model functions defined in (3) contain unknown parameters, the components of these score vectors must be estimated using a standard function estimation method. Many techniques are available in the literature for estimating the link function and the index vector of a SIM, so any one of several reasonable methods can be used to estimate the g's and the β's.[20–22] We adopt the procedure of Hristache et al.[21] in our simulations and data analysis.
In particular, for any given vector X = x, let
(6) |
As suggested in Siriwardhana et al.[14], a suitable estimator for μjk(uj), k = 1, …, K at a given uj = (sj, dj)′ can be obtained using the smooth mean estimator given by,
(7) |
where ω is a kernel function with ω ≥ 0 and ∫ ω(t)dt = 1, hjk, k = 1, …, K, are smoothing parameters, and I(A) is the indicator of the event A. Bandwidth selection for estimating the μjk's is a challenging issue. However, as Siriwardhana et al.[14] suggested, the kernel smoothing methods given in Wand and Jones[23] provide a reasonable solution to this problem.
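Because the display of estimator (7) is not reproduced here, the sketch below assumes a Nadaraya-Watson-type form: a kernel-weighted average of the training responses Yjki over patients whose continuous score component is close to sj, restricted by the indicator to patients whose discrete score component equals dj. The names smooth_mean and normal_kernel are hypothetical, and the standard normal density is only one admissible choice of ω.

```python
import numpy as np

def normal_kernel(t):
    """Standard normal density: omega(t) >= 0 and integrates to 1."""
    return np.exp(-0.5 * np.asarray(t, dtype=float) ** 2) / np.sqrt(2.0 * np.pi)

def smooth_mean(y, s_train, d_train, s0, d0, h, kernel=normal_kernel):
    """Assumed form of (7) for one (j, k) pair.

    y       : responses Y_jki of the n_k training patients in arm k
    s_train : their estimated continuous score components
    d_train : their estimated discrete score components
    (s0, d0): score u_j = (s_j, d_j)' at which the mean is estimated
    h       : bandwidth h_jk
    """
    y = np.asarray(y, dtype=float)
    s_train = np.asarray(s_train, dtype=float)
    d_train = np.asarray(d_train)
    # Kernel weight in the continuous component times the indicator I(d = d0).
    w = kernel((s_train - s0) / h) * (d_train == d0)
    total = w.sum()
    if total == 0.0:
        return float("nan")  # no training patient supports this score value
    return float(np.dot(w, y) / total)

# Patients 1-3 share d = 1 and the same s, so the estimate is their mean (2, up to rounding).
est = smooth_mean([1.0, 2.0, 3.0, 10.0], [0.0, 0.0, 0.0, 0.0], [1, 1, 1, 2], 0.0, 1, 0.5)
```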
For a realization x0 of the covariate X, if one could find the corresponding realizations of the scores, uj0 = (Sj(x0), δj(x0))′, one could estimate μjk(uj0) by μ̂jk(uj0). However, for the reasons mentioned above, one can only find an estimate ûj0 of uj0 using (6). Thus, in practice one may use μ̂jk(ûj0) as the estimate of μjk(uj0) for j = 1, …, J; k = 1, …, K, with û0 = (û10, …, ûJ0)′.
Now, for each j = 1, …, J, we rank the K components of the vector μ̂j(ûj0) = (μ̂j1(ûj0), …, μ̂jK(ûj0))′ to get the size-K vector vj(û0) = (vj1(ûj0), …, vjK(ûj0))′, where vjk(ûj0) is the rank of μ̂jk(ûj0) among μ̂j1(ûj0), …, μ̂jK(ûj0), with the largest value receiving rank 1. Note that these rank vectors vj(û0), j = 1, …, J, are correlated. We then use the following aggregation method to combine them into an overall ranking of treatments.
First, find the distance γj(v, û0) = η(v, vj(û0)), for a given v ∈ VK, a rank list of length K, where VK is all permutations of integers {1, .., K}. Here we propose to use Spearman’s rank distance (Pihur et al.[18]) for η(.), which is defined as
where M1, …, MK is a list of real values, r1, …, rK are the ranks of Mk, k = 1, …, K and v1, …, vK is a permutation of integers 1, …, K and ρ is a positive number.
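For ρ = 1 this distance reduces to Spearman's footrule, the sum of absolute rank differences. The sketch below assumes the generalized form η = Σk |rk − vk|^ρ applied directly to two rank vectors, enumerates VK with itertools.permutations, and assembles the distance vector (γ1(v, ·), …, γJ(v, ·))′ from the J per-response rank vectors. All function names are hypothetical.

```python
import itertools
import numpy as np

def footrule(v, r, rho=1.0):
    """Assumed generalised Spearman distance sum_k |r_k - v_k|**rho.

    With rho = 1 this is Spearman's footrule between rank vectors v and r.
    """
    v = np.asarray(v, dtype=float)
    r = np.asarray(r, dtype=float)
    return float(np.sum(np.abs(r - v) ** rho))

def all_rank_lists(K):
    """V_K: all K! permutations of 1..K, each read as a rank vector."""
    return list(itertools.permutations(range(1, K + 1)))

def gamma_vector(v, rank_vecs, rho=1.0):
    """(eta(v, v_1), ..., eta(v, v_J))' for the J per-response rank vectors v_j."""
    return np.array([footrule(v, r, rho) for r in rank_vecs])

rank_vecs = [[3, 1, 2], [1, 3, 2]]          # v_1, v_2 for J = 2, K = 3
for v in all_rank_lists(3):
    print(v, gamma_vector(v, rank_vecs))
```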
Let Γ(v, û0) = (γ1(v, û0), …, γJ(v, û0))′ for a fixed rank vector v. Now, we define an overall rank distance L(v, û0) by
(8) |
where D(v, û0) is a suitable dispersion matrix for Γ(v, û0), and Λ is a diagonal matrix with diagonal elements (τ1, …, τJ) that signify the practical importance of responses 1, …, J, including the views of patients and clinicians. As such, L(v, û0) can be considered a Mahalanobis-type weighted distance from Γ(v, û0) to the origin 0 ∈ ℝ^J. We propose to minimize L(v, û0) with respect to v ∈ VK for the estimated score û0 of a new patient. Suppose this minimum occurs at the vector v̂(û0) = (v̂1(û0), …, v̂K(û0))′; i.e.,
$$\hat{v}(\hat{u}_0) = \arg\min_{v \in V_K} L(v, \hat{u}_0). \tag{9}$$
We then define the optimal treatment as
$$\hat{k}_{\mathrm{opt}}(\hat{u}_0) = \text{the treatment } k \in \{1,\dots,K\} \text{ with } \hat{v}_k(\hat{u}_0) = 1. \tag{10}$$
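The minimization in (9) and the selection in (10) can be sketched as an exhaustive search over the K! rank lists in VK. Since the display of (8) is not reproduced here, this sketch assumes the weighted Mahalanobis-type form L(v) = (ΛΓ)′ D(v)⁻¹ (ΛΓ) and uses the footrule with ρ = 1. The dispersion matrix is supplied by the caller through the hypothetical argument D_of_v (for example, a localized estimator such as (11)).

```python
import itertools
import numpy as np

def footrule(v, r):
    """Spearman's footrule (rho = 1) between two rank vectors."""
    return float(np.sum(np.abs(np.asarray(r) - np.asarray(v))))

def select_treatment(rank_vecs, tau, D_of_v):
    """Exhaustive search for (9) and (10) under an assumed form of (8).

    rank_vecs : (J, K) array; row j is the rank vector v_j(u0)
    tau       : length-J response weights, the diagonal of Lambda
    D_of_v    : callable returning the J x J dispersion matrix D(v, u0)
    Returns the minimising rank list and the (1-based) treatment ranked 1.
    """
    rank_vecs = np.asarray(rank_vecs)
    lam = np.diag(np.asarray(tau, dtype=float))
    K = rank_vecs.shape[1]
    best_v, best_L = None, np.inf
    for v in itertools.permutations(range(1, K + 1)):
        g = lam @ np.array([footrule(v, r) for r in rank_vecs])  # Lambda * Gamma(v)
        # Assumed L(v) = (Lambda Gamma)' D^{-1} (Lambda Gamma), via a linear solve.
        L = float(g @ np.linalg.solve(D_of_v(v), g))
        if L < best_L:
            best_v, best_L = v, L
    return best_v, best_v.index(1) + 1

def identity_D(v):
    return np.eye(3)   # placeholder dispersion matrix for J = 3

votes = [[1, 2, 3], [1, 2, 3], [2, 1, 3]]          # J = 3 responses, K = 3
print(select_treatment(votes, [1.0, 1.0, 1.0], identity_D))   # ((1, 2, 3), 1)
print(select_treatment(votes, [0.1, 0.1, 1.0], identity_D))   # ((2, 1, 3), 2)
```

The second call shows how shifting weight towards the third response changes the selected treatment. Exhaustive enumeration is cheap for the small K considered here (K! = 6 for K = 3) but grows factorially, so larger K would call for a stochastic search such as the cross-entropy or genetic algorithms used by Pihur et al.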
We approximate D(v, û0) by a suitable weighted dispersion matrix D̂(v, û0), calculated from the vectors Γ(v, ûki) using the estimated training scores ûki = (û1ki, …, ûJki)′, i = 1, …, nk; k = 1, …, K, from each treatment group. The weights Wki, i = 1, …, nk; k = 1, …, K, for this calculation are developed using a localization approach described below. For a given v ∈ VK, we find Γ(v, ûki) = (γ1(v, ûki), …, γJ(v, ûki))′ by calculating the distances γj(v, ûki) = η(v, vj(ûjki)) to obtain
(11) |
where m(.) is the weighted average given by
(12) |
In developing D̂(v, û0), we use a weighting scheme that localizes the Γ(v, ûki) vectors at û0 = (û10, …, ûJ0)′, using the K training samples of sizes nk, to achieve a reasonable approximation of D(v, û0). Our approach is as follows. For the ith training observation in the kth group, with estimated score ûki, a weight is defined for each j = 1, …, J from ûjki and ûj0 using a bandwidth and the kernel ω(.). An overall weight Wki is then defined as the product of these J individual weights. For this purpose, we propose to use the same bandwidths hjk and kernel ω(.) that were used in calculating μ̂jk(ûj0), for the ith training patient in the kth set; i = 1, …, nk, k = 1, …, K. Even though such weights may not be optimal, our empirical results suggest that they are reasonable and make the proposed procedure flexible to implement.
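A minimal sketch of the localizing weights and of a weighted mean and dispersion matrix in the spirit of (12) and (11) follows. It assumes that each per-response weight is the kernel ω evaluated at the scaled difference between the continuous score components of a training patient and the new patient (a discrete score component, if present, would enter through an additional indicator factor), and that the dispersion matrix is normalized by the total weight. The exact displays of (11) and (12) are not reproduced here, and all names are hypothetical.

```python
import numpy as np

def normal_kernel(t):
    return np.exp(-0.5 * np.asarray(t, dtype=float) ** 2) / np.sqrt(2.0 * np.pi)

def localizing_weights(s_train, s0, h, kernel=normal_kernel):
    """Overall weights W_ki = prod_j omega((s_jki - s_j0) / h_jk).

    s_train : (n, J) continuous score components of the stacked training patients
    s0      : length-J score components of the new patient
    h       : length-J bandwidths, or an (n, J) array of group-specific h_jk
    """
    z = (np.asarray(s_train, dtype=float) - np.asarray(s0, dtype=float)) / np.asarray(h, dtype=float)
    return kernel(z).prod(axis=1)

def weighted_dispersion(G, W):
    """Weighted mean m (cf. (12)) and dispersion matrix (cf. (11)) of the rows of G.

    G : (n, J) array whose rows are the vectors Gamma(v, u_ki) for a fixed v
    W : length-n nonnegative weights
    """
    G = np.asarray(G, dtype=float)
    W = np.asarray(W, dtype=float)
    W = W / W.sum()                  # assumed normalisation by the total weight
    m = W @ G                        # weighted average of the Gamma vectors
    C = G - m                        # centred vectors
    return m, (C * W[:, None]).T @ C
```

Passing an (n, J) array of bandwidths lets NumPy broadcasting apply the group-specific hjk row by row after the K training samples have been stacked.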
In summary, the proposed approach can be implemented as follows:
1. Using the historical RCT data (training set), estimate the J × K single index models relating the covariates to the J response variables for each treatment k, k = 1, …, K (i.e., model (4)).
2. Estimate the scores of all patients in the training set and of the new patient. Use the criteria given in (6) to estimate the jth sub-component ûj of the score, corresponding to the jth response, and form the estimated overall score û = (û1, …, ûJ) in each case.
3. For the new patient, with estimated score û0 = (û10, …, ûJ0), estimate the mean outcome of the jth response under the kth treatment via the smooth mean estimator (7); any symmetric kernel can be used in this step. Denote the estimated mean outcomes for the kth treatment by μ̂k(û0) = (μ̂1k(û10), …, μ̂Jk(ûJ0))′, k = 1, …, K.
4. For each j = 1, …, J, rank the K components of μ̂j(ûj0) = (μ̂j1(ûj0), …, μ̂jK(ûj0))′ to get the size-K vector vj(û0) = (vj1(ûj0), …, vjK(ûj0))′.
5. For a given v ∈ VK, find the distances γj(v, û0) = η(v, vj(û0)), j = 1, …, J, and construct Γ(v, û0) = (γ1(v, û0), …, γJ(v, û0))′.
6. Specify the response weights in the diagonal matrix Λ (for example, the J × J identity matrix in the equal-weight case), and calculate L(v, û0) for this Λ, with D(v, û0) estimated via (11).
7. Finally, use criterion (9) to estimate the optimal rank vector and criterion (10) to select the optimal treatment.
3. Empirical Studies
In this section we present a simulation study that investigates the properties of the proposed procedure in finite samples.
We performed a series of simulations under various settings to evaluate the performance of the proposed procedure. Primarily, we focused on the accuracy of the treatment assignment for a new (test) observation using estimated values of the μjk functions from a set of training data. The simulation study was performed for K = 2 and 3 treatments with response dimension J = 2, 3, and 4. To illustrate the personalized medicine concept, we selected our model sets so that each model in a set dominates the other competing models for some combination of covariate values; in particular, none of the models fully dominates the others over the whole covariate space. Hence, subjects with distinct covariate vectors could experience the highest response under different treatments.
In our study, we first simulated K independent multivariate (dimension J) samples of size n (n = 50 or n = 100) per group. The components of the r-dimensional covariate vectors X were generated independently from the U(−1, 1) distribution, with r fixed at 10. Using various link functions and index vectors, we obtained the treatment responses from model (3) for each k. We examined the performance of the proposed methodology under a set of highly nonlinear regression models, given by sine/cosine functions, that follow the SIM structure. Such trigonometric functions pose considerable difficulties for smooth estimation, so we believe these models represent near-worst-case scenarios for estimating the optimal treatment. The model functions used in the study are listed in Table 1. For a given k, k = 1, …, K, the errors were generated from either a J-dimensional multivariate normal distribution or a J-dimensional multivariate double exponential distribution with zero mean and a compound symmetric correlation matrix whose off-diagonal values were chosen from the set {0.1, 0.5, 0.9}. The R package mvtnorm (Genz et al.[24]) was used to generate these random vectors in the multivariate normal case, with the dispersion parameter σN chosen from the set {0.1, 0.3, 0.5}. We used the R package LaplacesDemon (Statisticat[25]) to generate double exponential random variables, with the dispersion parameter σD chosen from {0.1, 0.3, 0.5}.
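The data-generating step can be sketched in Python with NumPy in place of the R packages named above. The sketch covers only the multivariate normal case; the link functions (np.sin, np.cos) and the index vector in the usage lines are placeholders, since the model functions of Table 1 are not reproduced here, and simulate_arm is a hypothetical name.

```python
import numpy as np

def compound_symmetric(J, rho):
    """J x J correlation matrix with ones on the diagonal and rho elsewhere."""
    return (1.0 - rho) * np.eye(J) + rho * np.ones((J, J))

def simulate_arm(n, links, betas, rho, sigma, rng):
    """One treatment arm from model (3) with multivariate normal errors.

    links : J vectorised link functions g_jk (placeholders for Table 1)
    betas : (J, r) array of index vectors beta_jk
    Returns X (n x r, iid U(-1, 1) entries) and Y (n x J).
    """
    betas = np.asarray(betas, dtype=float)
    J, r = betas.shape
    X = rng.uniform(-1.0, 1.0, size=(n, r))
    mean = np.column_stack([g(X @ b) for g, b in zip(links, betas)])
    cov = sigma ** 2 * compound_symmetric(J, rho)
    eps = rng.multivariate_normal(np.zeros(J), cov, size=n)   # correlated across j
    return X, mean + eps

rng = np.random.default_rng(2024)
r = 10
beta = np.ones(r) / np.sqrt(r)                 # hypothetical unit index vector
# Arms are generated by separate calls, so errors are independent across k.
X1, Y1 = simulate_arm(100, [np.sin, np.cos], [beta, beta], rho=0.5, sigma=0.1, rng=rng)
X2, Y2 = simulate_arm(100, [np.cos, np.sin], [beta, -beta], rho=0.5, sigma=0.1, rng=rng)
```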
Table 1:
Sets of smooth mean functions used for generating treatment responses. We chose the common vector C to be a unit vector for all combinations of j and k.
Response (j) | Treatment group k = 1 | k = 2 | k = 3
---|---|---|---
j = 1 | | |
j = 2 | | |
j = 3 | | |
j = 4 | | |
Once the K samples were generated, we estimated the corresponding SIMs, followed by estimation of the scores at each covariate value. The SIMs were estimated by the procedure of Hristache et al.[21] using Epanechnikov kernels. Then a new covariate value X0 was generated in the same manner as above, and for its estimated score û0 we calculated μ̂jk(ûj0) for k = 1, …, K; j = 1, …, J. Similarly, we calculated μ̂jk(ûjki) for the ith patient in the kth training set; i = 1, …, nk; k = 1, …, K; j = 1, …, J. Next we obtained the rank distances γj(v, ûki) = η(v, vj(ûjki)) for a given rank list v ∈ VK, where VK is the set of all permutations of the integers {1, …, K} and η(.) is Spearman's footrule distance with ρ = 1. This produced the vectors of rank distances Γ(v, ûki) = (γ1(v, ûki), …, γJ(v, ûki))′. Next, following (11), a localized dispersion matrix D̂(v, û0) in the neighborhood of û0 = (û10, …, ûJ0) was obtained as an approximation of D(v, û0), based on the K training sets of sizes nk, k = 1, …, K. We then calculated L(v, û0) given in (8) for each chosen Λ matrix and used the procedure given in (9) and (10) to estimate the corresponding optimal treatment. The kernel function in this estimation was the Normal(0, 1) probability density function. All bandwidths hjk, k = 1, …, K; j = 1, …, J, were chosen by the algorithm given in Wand and Jones.[23]
Next, to define the “correct selection” for the covariate value X0, we proceeded as follows. First, we generated K new response vectors Ỹk0 = (Ỹ1k0, …, ỸJk0)′, k = 1, …, K, corresponding to this X0 using model (3), each with mean vector (g1k(β1k′X0), …, gJk(βJk′X0))′, where the errors were generated independently from the same error distribution used to generate the K original samples. Our approach is to define quantities similar to those given in (8), (9), and (10) using these new observation vectors.
To do so, we ranked the rows of the J × K matrix Ỹ0 formed from the response vectors Ỹk0, k = 1, …, K, to get the size-K vectors vj(Ỹ0) = (vj1(Ỹ0), …, vjK(Ỹ0))′, where vjk(Ỹ0) is the rank of Ỹjk0 among Ỹj10, …, ỸjK0 for each j = 1, …, J. We then calculated the rank distances and the corresponding distance vector for a fixed v. Next, we obtained a suitable dispersion matrix for this distance vector as follows.
We generated ñ = 5,000 additional observations Ỹk0l, l = 1, …, ñ, with the same mean vectors (g1k(β1k′X0), …, gJk(βJk′X0))′ for each k = 1, …, K, again with the same error distribution, and calculated the corresponding rank distance vectors. We then formed a scaling matrix as the sample dispersion matrix of these ñ distance vectors, centered at their sample mean. Using this scaling matrix in place of D(v, û0), we computed the overall rank distance for the responses Ỹk0, k = 1, …, K, analogous to (8), for a fixed v and response weight matrix Λ. We then found the rank list minimizing this distance over v ∈ VK, analogous to (9), and called the treatment ranked 1 in this optimal rank list the true optimal treatment corresponding to the response vectors Ỹk0, k = 1, …, K. The treatment assignment for the new patient was considered correct if the treatment selected by (10) coincided with this true optimal treatment.
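The Monte Carlo construction of the reference (“true”) optimal treatment can be sketched as follows for the multivariate normal case. The sketch assumes the weighted form L = (ΛΓ)′ S⁻¹ (ΛΓ) for the omitted displays, with S the sample dispersion matrix of the ñ footrule distance vectors (ρ = 1). The small ridge term added to S is an assumption of this sketch, included only to keep the linear solve stable when the simulated ranks barely vary. The input mu holds the true means gjk(βjk′X0), and reference_optimal is a hypothetical name.

```python
import itertools
import numpy as np

def ranks_desc(y):
    """Rank the last axis of a (..., J, K) array; the largest entry gets rank 1."""
    order = np.argsort(-y, axis=-1, kind="stable")
    return np.argsort(order, axis=-1, kind="stable") + 1

def reference_optimal(mu, sigma, rho, tau, n_tilde=5000, ridge=1e-8, rng=None):
    """Monte Carlo 'true' optimal treatment (1-based) at a test covariate value.

    mu : (J, K) true means g_jk(beta_jk' x0); errors are N_J(0, sigma^2 R) with
         compound-symmetric R, independent across treatments.
    """
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, dtype=float)
    J, K = mu.shape
    cov = sigma ** 2 * ((1.0 - rho) * np.eye(J) + rho * np.ones((J, J)))

    def draw(n):
        # (n, K, J) errors, transposed to n response matrices of shape (J, K)
        eps = rng.multivariate_normal(np.zeros(J), cov, size=(n, K))
        return mu[None, :, :] + eps.transpose(0, 2, 1)

    r0 = ranks_desc(draw(1)[0])             # ranks from the single new draw
    r_extra = ranks_desc(draw(n_tilde))     # (n_tilde, J, K) ranks for the scaling matrix
    lam = np.diag(np.asarray(tau, dtype=float))
    best_v, best_L = None, np.inf
    for v in itertools.permutations(range(1, K + 1)):
        v_arr = np.array(v)
        G = np.abs(r_extra - v_arr).sum(axis=-1)            # (n_tilde, J) footrule distances
        S = np.atleast_2d(np.cov(G, rowvar=False)) + ridge * np.eye(J)
        g0 = lam @ np.abs(r0 - v_arr).sum(axis=-1)          # Lambda * Gamma(v) for the new draw
        L = float(g0 @ np.linalg.solve(S, g0))
        if L < best_L:
            best_v, best_L = v, L
    return best_v.index(1) + 1
```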
We repeated this procedure 1,000 times for each combination of model and error distribution. Frequencies of correct treatment assignments for a representative set of cases are given in Tables 2 and 3 and in Supplementary Tables 1 to 6.
Table 2:
Accuracies of treatment selection in 1000 test cases using the proposed technique for the case of three treatments and four responses (equally weighted).
Error dist. | Sample size per group | Error dist. parameter | Error correlation ρ = 0.1 | ρ = 0.5 | ρ = 0.9
---|---|---|---|---|---
Normal | n = 50 | σN = 0.1 | 815 | 822 | 843
Normal | n = 50 | σN = 0.2 | 730 | 767 | 780
Normal | n = 50 | σN = 0.3 | 669 | 684 | 668
Normal | n = 100 | σN = 0.1 | 867 | 872 | 916
Normal | n = 100 | σN = 0.2 | 816 | 801 | 832
Normal | n = 100 | σN = 0.3 | 718 | 703 | 736
DE | n = 50 | σD = 0.1 | 796 | 827 | 861
DE | n = 50 | σD = 0.2 | 743 | 753 | 748
DE | n = 50 | σD = 0.3 | 692 | 665 | 657
DE | n = 100 | σD = 0.1 | 871 | 873 | 907
DE | n = 100 | σD = 0.2 | 785 | 797 | 814
DE | n = 100 | σD = 0.3 | 727 | 702 | 717
Table 3:
Accuracies of treatment selection in 1000 test cases using the proposed technique for the case of three treatments and four responses, using weights τ1 = 0.4, τ2 = 0.3, τ3 = 0.2, and τ4 = 0.1, for responses 1, 2, 3, and 4, respectively.
Error dist. | Sample size per group | Error dist. parameter | Error correlation ρ = 0.1 | ρ = 0.5 | ρ = 0.9
---|---|---|---|---|---
Normal | n = 50 | σN = 0.1 | 826 | 834 | 839
Normal | n = 50 | σN = 0.2 | 754 | 782 | 800
Normal | n = 50 | σN = 0.3 | 676 | 679 | 672
Normal | n = 100 | σN = 0.1 | 856 | 877 | 904
Normal | n = 100 | σN = 0.2 | 763 | 816 | 848
Normal | n = 100 | σN = 0.3 | 693 | 706 | 738
DE | n = 50 | σD = 0.1 | 800 | 830 | 874
DE | n = 50 | σD = 0.2 | 743 | 759 | 769
DE | n = 50 | σD = 0.3 | 687 | 667 | 657
DE | n = 100 | σD = 0.1 | 844 | 875 | 912
DE | n = 100 | σD = 0.2 | 753 | 810 | 839
DE | n = 100 | σD = 0.3 | 699 | 689 | 728
The simulation results show reasonable selection accuracies in each scenario considered. More importantly, the selection frequency did not deteriorate at large values of the response correlation, indicating the potential of the proposed technique for such cases. As expected, the selection accuracy drops when the error distribution has high dispersion and when sample sizes are smaller. Note that the presented simulation results are based on sine functions, which are bounded in [−1, 1]; hence, an increment of 0.1 in the error dispersion parameter adds relatively large noise to a model. We observed comparable performance under both normal and double exponential errors. In all simulations, the model functions were chosen so that each would dominate all other model functions at some covariate values. We did not investigate cases where two or more models were identical and dominated all others, because in that case those dominating models would have an equal chance of being selected. We also conducted a simulation study comparing the accuracy of the proposed procedure with that of the method of Siriwardhana et al.,[14] which does not account for associations among responses (Supplementary Table 7). The results indicate improved performance of the proposed technique, especially in cases with high response correlation.
In addition to studying the performance of the proposed method via the accuracy of correct selection, we also investigated the impact of using our method in terms of a composite average gain of responses. This investigation sheds light on the impact of possible wrong assignments for cases where the treatment chosen by our method is only marginally superior in terms of its overall rank compared with its nearest competitor (the treatment with the rank 2) as well as its worst competitor (the treatment with the rank K). Suppose , and . For a fixed j, j = 1, …, J, define, and as
and
where , , and . Define λ12(û0) and λ1K(û0) as weighted sums of and , respectively, as
and
where τj is the jth, j = 1, .., J, diagonal element of Λ, that contains priority weights of the responses. Now, letting C be the theoretical maximum for any λ1K(u) within the whole covariate domain, we average λ12(û0)/C and λ1K(û0)/C for the 1000 new test cases and denote them by Ω12 and Ω1K, respectively.
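Under the assumption, not confirmed by the displays omitted above, that the per-response quantities are differences in true mean responses between the treatments ranked 1 and 2 (or 1 and K) in the estimated optimal rank list, Ω12 and Ω1K can be computed as in the sketch below, with C the normalizing constant defined in the text. The function and argument names are hypothetical.

```python
import numpy as np

def omega_scores(mu_true, v_hat, tau, C):
    """Average normalised gains (Omega_12, Omega_1K) over N test cases.

    mu_true : (N, J, K) true mean responses at the N test points
    v_hat   : (N, K) estimated optimal rank lists (rank 1 = selected treatment)
    tau     : length-J response weights (diagonal of Lambda)
    C       : normalising constant (theoretical maximum of lambda_1K)
    """
    mu_true = np.asarray(mu_true, dtype=float)
    v_hat = np.asarray(v_hat)
    tau = np.asarray(tau, dtype=float)
    K = v_hat.shape[1]
    idx = np.arange(v_hat.shape[0])
    # Column index of the treatment holding rank 1, rank 2 and rank K in each list.
    k1, k2, kK = (np.argmax(v_hat == rank, axis=1) for rank in (1, 2, K))
    best = mu_true[idx, :, k1]                      # (N, J) means of the selected treatment
    lam12 = (best - mu_true[idx, :, k2]) @ tau      # assumed weighted per-response gains
    lam1K = (best - mu_true[idx, :, kK]) @ tau
    return lam12.mean() / C, lam1K.mean() / C
```

For a single test case with true means (1.0, 0.5, 0.0) and (2.0, 1.0, 0.0) for two responses, equal weights 0.5, C = 1, and the rank list (1, 2, 3), this returns gains of 0.75 and 1.5.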
Note that the Ω measures quantify the potential gains or losses, in terms of the weighted aggregation of mean outcomes, both for borderline cases (i.e., misclassifications close to the decision boundary) and for cases furthest from the optimal case according to the proposed procedure. We report a few results in Tables 4 and 5. Positive values of Ω12 and Ω1K indicate aggregated relative average gains in expected treatment outcomes from our treatment selection technique. As the tables show, all Ω values are positive, and the Ω1K values (Ω13 here, since K = 3) are higher than the corresponding Ω12 values, indicating that the proposed procedure yields gains with respect to average responses.
Table 4:
Ω12 and Ω13 values calculated using 1000 test cases by the proposed method. Three treatments with four responses, using equal weights.
Error dist. | Sample size per group | Error dist. parameter | Ω12 (ρ = 0) | Ω13 (ρ = 0) | Ω12 (ρ = 0.5) | Ω13 (ρ = 0.5) | Ω12 (ρ = 0.9) | Ω13 (ρ = 0.9)
---|---|---|---|---|---|---|---|---
Normal | n = 50 | σN = 0.1 | 0.211 | 0.591 | 0.202 | 0.582 | 0.191 | 0.575
Normal | n = 50 | σN = 0.2 | 0.188 | 0.556 | 0.210 | 0.570 | 0.194 | 0.555
Normal | n = 50 | σN = 0.3 | 0.181 | 0.527 | 0.184 | 0.525 | 0.171 | 0.527
Normal | n = 100 | σN = 0.1 | 0.217 | 0.586 | 0.217 | 0.591 | 0.208 | 0.587
Normal | n = 100 | σN = 0.2 | 0.215 | 0.584 | 0.211 | 0.582 | 0.209 | 0.580
Normal | n = 100 | σN = 0.3 | 0.205 | 0.580 | 0.218 | 0.599 | 0.210 | 0.566
DE | n = 50 | σD = 0.1 | 0.185 | 0.556 | 0.208 | 0.580 | 0.210 | 0.597
DE | n = 50 | σD = 0.2 | 0.197 | 0.565 | 0.205 | 0.565 | 0.187 | 0.552
DE | n = 50 | σD = 0.3 | 0.184 | 0.541 | 0.172 | 0.510 | 0.170 | 0.517
DE | n = 100 | σD = 0.1 | 0.223 | 0.598 | 0.225 | 0.597 | 0.220 | 0.600
DE | n = 100 | σD = 0.2 | 0.207 | 0.586 | 0.219 | 0.606 | 0.204 | 0.576
DE | n = 100 | σD = 0.3 | 0.197 | 0.574 | 0.208 | 0.586 | 0.221 | 0.588
Table 5:
Ω12 and Ω13 values calculated using 1000 test cases by the proposed method. Three treatments with four responses, using weights τ1 = 0.4, τ2 = 0.3, τ3 = 0.2, and τ4 = 0.1, for responses 1, 2, 3, and 4, respectively.
Error dist. | Sample size per group | Error dist. parameter | Ω12 (ρ = 0) | Ω13 (ρ = 0) | Ω12 (ρ = 0.5) | Ω13 (ρ = 0.5) | Ω12 (ρ = 0.9) | Ω13 (ρ = 0.9)
---|---|---|---|---|---|---|---|---
Normal | n = 50 | σN = 0.1 | 0.181 | 0.592 | 0.175 | 0.579 | 0.162 | 0.572
Normal | n = 50 | σN = 0.2 | 0.162 | 0.559 | 0.178 | 0.569 | 0.171 | 0.558
Normal | n = 50 | σN = 0.3 | 0.149 | 0.521 | 0.161 | 0.536 | 0.142 | 0.520
Normal | n = 100 | σN = 0.1 | 0.177 | 0.569 | 0.175 | 0.573 | 0.171 | 0.571
Normal | n = 100 | σN = 0.2 | 0.176 | 0.567 | 0.170 | 0.563 | 0.172 | 0.565
Normal | n = 100 | σN = 0.3 | 0.162 | 0.553 | 0.173 | 0.575 | 0.171 | 0.554
DE | n = 50 | σD = 0.1 | 0.157 | 0.550 | 0.171 | 0.570 | 0.173 | 0.586
DE | n = 50 | σD = 0.2 | 0.166 | 0.562 | 0.173 | 0.565 | 0.158 | 0.545
DE | n = 50 | σD = 0.3 | 0.155 | 0.533 | 0.158 | 0.518 | 0.153 | 0.515
DE | n = 100 | σD = 0.1 | 0.182 | 0.583 | 0.187 | 0.581 | 0.177 | 0.578
DE | n = 100 | σD = 0.2 | 0.168 | 0.567 | 0.174 | 0.578 | 0.164 | 0.557
DE | n = 100 | σD = 0.3 | 0.156 | 0.545 | 0.169 | 0.564 | 0.183 | 0.572
4. ACTG-175 HIV Clinical Trial
In this section, we provide another illustration of the proposed method using data from the ACTG 175 clinical trial (Hammer et al.[26]). This was a randomized, double-blind, placebo-controlled trial conducted to compare antiviral therapy with a single nucleoside against therapy with two nucleosides in adults infected with human immunodeficiency virus type 1 (HIV-1) whose CD4 T-cell counts were between 200 and 500 per cubic millimeter. The study randomized HIV-1–infected patients to one of four daily regimens: 600 mg of zidovudine (arm-0), 600 mg of zidovudine plus 400 mg of didanosine (arm-1), 600 mg of zidovudine plus 2.25 mg of zalcitabine (arm-2), or 400 mg of didanosine (arm-3). The data set by Juraska et al.[27] contains information on 2,136 HIV-1–infected subjects. Arms 0, 1, 2, and 3 contain 532, 519, 524, and 561 patients, respectively.
Our intention in this data analysis is merely to demonstrate what the optimal treatment for a new subject would be if one used the proposed treatment selection based on individual patient characteristics, when training samples with corresponding covariate values for multiple treatments are available in advance. In this illustration, rather than assuming that a standard treatment (i.e., control) is being compared with experimental treatment(s), as in a typical clinical trial, we take the stance that, given that multiple treatments can be used to treat a patient, the question is which would be optimal for the individual based on his/her characteristics. In this study, subjects were examined periodically and their T-cell counts (i.e., CD4 T helper cells and CD8 cytotoxic T cells), which are critical components of the human immune system, were recorded. The scientific literature on HIV/AIDS often describes CD4 cells as the primary T-cell type that suppresses HIV replication by signaling various other cells to mount an immune response. For an HIV patient, the severity of disease progression is directly reflected in the decline of CD4 counts. CD8 cells are typically known for their cytotoxic response against cancerous cells and cells infected by various other viruses. However, some studies have illustrated the important role of CD8 cells during the early stages of HIV progression (e.g., see Streeck and Nixon[28]).
In a previous study, Siriwardhana et al.[14] used this dataset to demonstrate personalized treatment strategies with respect to dual outcomes, the log-transformed CD4 and CD8 counts of a patient after 20 weeks. However, their analysis did not account for the correlation between the two responses, whereas the current method can handle such dependence. Using the observed data, we estimated the Pearson product-moment correlation between the CD4 and CD8 responses as 0.251. In this analysis, we again used the log-transformed CD4 and CD8 counts after 20 weeks as the responses. As covariates, we used the log-CD4 and log-CD8 counts at baseline, age, weight, and the number of months of antiviral therapy a patient had received before the trial.
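The correlation quoted above is a standard Pearson product-moment coefficient. A minimal sketch of that computation follows, assuming it is computed on the log-transformed week-20 counts (the text does not state the scale explicitly); the helper `log_pearson` and the five count values are hypothetical and are not taken from the ACTG 175 data.

```python
import numpy as np

def log_pearson(cd4, cd8):
    """Pearson product-moment correlation between log-transformed counts (counts must be > 0)."""
    x = np.log(np.asarray(cd4, dtype=float))
    y = np.log(np.asarray(cd8, dtype=float))
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc**2) * np.sum(yc**2)))

# Hypothetical week-20 counts (cells per cubic millimeter), for illustration only.
cd4_wk20 = [350, 420, 290, 510, 380]
cd8_wk20 = [900, 1100, 800, 1300, 950]
r = log_pearson(cd4_wk20, cd8_wk20)
print(round(r, 3))
```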
As a training set, we selected 200 patients at random from each arm and treated their data as the outcomes of an RCT in which those patients were randomly assigned to one of the four treatments. We used the training data to estimate the corresponding SIMs for the two outcomes, log-CD4 and log-CD8, using the covariates above. Treating the remaining 1336 cases as “new” patients, we applied the proposed treatment selection to them. In this illustration, we varied the “importance” weights of the two responses between 0 and 1, with the two weights summing to 1 (see Table 6). This procedure was repeated for 1000 random partitions into training (200 patients per arm) and testing (1336 patients) sets, and the resulting assignments were averaged over the 1000 partitions.
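The resampling scheme described above can be sketched as follows. This is a minimal illustration assuming patients are identified only by their arm labels; `split_by_arm` is a hypothetical helper, and the SIM fitting and treatment selection applied to each split are omitted.

```python
import numpy as np

def split_by_arm(arms, n_train_per_arm, rng):
    """Draw n_train_per_arm random patients from each arm for training; the rest form the test set."""
    arms = np.asarray(arms)
    train = [rng.choice(np.flatnonzero(arms == a), size=n_train_per_arm, replace=False)
             for a in np.unique(arms)]
    train_idx = np.sort(np.concatenate(train))
    test_idx = np.setdiff1d(np.arange(arms.size), train_idx)
    return train_idx, test_idx

# Arm sizes reported for the ACTG 175 data: 532, 519, 524, and 561 patients.
arms = np.repeat([0, 1, 2, 3], [532, 519, 524, 561])
rng = np.random.default_rng(2024)
splits = [split_by_arm(arms, 200, rng) for _ in range(1000)]  # 1000 random partitions
train_idx, test_idx = splits[0]
print(train_idx.size, test_idx.size)  # 800 1336
```

Sampling within each arm (rather than from the pooled data) keeps the training set balanced at 200 patients per treatment, matching the design described in the text.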
We summarize the resulting assignments for the test patients in Table 6. For example, with log-CD4:log-CD8 weights of 0.8:0.2, the proposed treatment selection assigns 3.26%, 64.50%, 15.48%, and 16.76% of test patients to Arms 0, 1, 2, and 3, respectively, whereas the corresponding assignments for equal weights (0.5:0.5) are 5.28%, 61.41%, 14.31%, and 19.00%. The overall pattern indicates that only a few patients would be assigned to zidovudine alone, which served as the control arm (Arm 0) in the study. Our analysis using individual characteristics shows that Treatment Arm 1 is the most frequently selected arm for every weight combination, while the control arm is the least chosen except when all of the weight is placed on CD8. Interestingly, the marginal analysis of the ACTG 175 data (Hammer et al.[26]) concluded that treatment Arms 1, 2, and 3 all slowed the progression of HIV disease relative to Arm 0, although none of Arms 1, 2, and 3 was selected over the others. In contrast, our approach assigns many more patients to Arm 1 than to any other arm.
Table 6:
Treatment assignment summary for the ACTG 175 clinical trial data under the proposed method, using both CD4 and CD8 counts as clinical responses with weights τCD4 and τCD8, respectively.
τCD4 | τCD8 | Arm-0 (%) | Arm-1 (%) | Arm-2 (%) | Arm-3 (%) |
---|---|---|---|---|---|
1 | 0 | 1.90 | 65.30 | 17.79 | 15.01 |
0.8 | 0.2 | 3.26 | 64.50 | 15.48 | 16.76 |
0.6 | 0.4 | 4.53 | 62.64 | 14.61 | 18.22 |
0.5 | 0.5 | 5.28 | 61.41 | 14.31 | 19.00 |
0.4 | 0.6 | 6.20 | 59.19 | 14.03 | 19.83 |
0.2 | 0.8 | 8.94 | 55.52 | 13.54 | 22.00 |
0 | 1 | 19.02 | 42.50 | 12.72 | 25.76 |
Extending our analysis, we conducted the following study to examine the effectiveness of the proposed treatment selection for test patients compared with their original random assignment.
Suppose the CD4 and CD8 responses are indexed by j = 1 and j = 2, respectively. For a given testing set, consider the tth test subject, with covariate value Xt, who was randomly assigned to a particular arm in the original study. Suppose the individual’s estimated score is ût = (û1t, û2t), where û1t = (s1, d1) and û2t = (s2, d2) are the marginal estimated scores based on the CD4 and CD8 responses, respectively. We estimated the gain in the conditional means of log-CD4 and log-CD8 for an individual by comparing the proposed assignment with his/her original assignment. Let v (v = 0, …, 3) be the treatment suggested by the proposed technique for the test patient based on the estimated score ût, and let v0 (v0 = 0, …, 3) be the treatment group to which the patient was randomly assigned in the original trial. For j = 1, 2, we define the conditional gain for the tth test patient as

$$\Delta_{jt} = E(Y_{jv} \mid \hat{u}_t) - Y_{jv_0},$$

where Yjv0 is the value observed under the original assignment. Following Siriwardhana et al.[14], we estimated E(Yjv | ût) using the smooth mean estimator

$$\hat{E}(Y_{jv} \mid \hat{u}_t) = \frac{\sum_{i=1}^{n_v} \omega\{(\hat{u}_{jt} - \hat{u}_{jvi})/h_{jv}\}\, Y_{jvi}}{\sum_{i=1}^{n_v} \omega\{(\hat{u}_{jt} - \hat{u}_{jvi})/h_{jv}\}}, \tag{13}$$

where nv is the number of patients originally assigned to treatment arm v, Yjvi and ûjvi are the jth response and the corresponding estimated score of the ith such patient, ω is a kernel function with ω ≥ 0 and ∫ω(t)dt = 1, and the hjv (j = 1, 2) are the corresponding smoothing parameters. We used Gaussian kernels, and the smoothing parameters were selected by the method suggested in Wand and Jones[23] for kernel smoothing. Next, for each response, we estimated the overall gain by averaging the rescaled individual gains,

$$\bar{\Delta}_j = \frac{1}{N} \sum_{t=1}^{N} \frac{\hat{\Delta}_{jt}}{\hat{\sigma}_j}, \quad j = 1, 2, \tag{14}$$

where N = 1336 is the number of test patients, Δ̂jt is Δjt with E(Yjv | ût) replaced by its estimate from (13), and σ̂j is the corresponding standard deviation used for rescaling. It is reasonable to argue that positive overall gains for both j = 1 and j = 2 indicate that the proposed selection is, overall, more effective than the original random assignment. Finally, by averaging over the 1000 random test sets, we obtained the average gains for CD4 and CD8.
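A minimal sketch of a Gaussian-kernel smooth mean estimator of the form in (13), together with the rescaled average gain in (14), follows. It assumes a scalar score per patient, and it swaps in a simple normal-reference (Silverman-type) rule of thumb for the Wand and Jones bandwidth selectors actually cited; the rescaling standard deviation is supplied by the user. All function names are hypothetical, and the data are simulated, not trial data.

```python
import numpy as np

def nw_gaussian(u_new, u_train, y_train, h):
    """Nadaraya-Watson estimate of E(Y | u) at each point of u_new (Gaussian kernel)."""
    u_new = np.atleast_1d(np.asarray(u_new, dtype=float))
    u_train = np.asarray(u_train, dtype=float)
    z = (u_new[:, None] - u_train[None, :]) / h
    # Unnormalized Gaussian weights: the kernel's constant cancels in the ratio.
    # Points far from every training score can underflow to 0 and give NaN.
    w = np.exp(-0.5 * z**2)
    return (w @ np.asarray(y_train, dtype=float)) / w.sum(axis=1)

def normal_reference_bandwidth(u):
    """Silverman-type rule of thumb, used here as a stand-in for the Wand-Jones selectors."""
    u = np.asarray(u, dtype=float)
    return 1.06 * u.std(ddof=1) * u.size ** (-1 / 5)

def rescaled_mean_gain(pred_selected, y_observed, sigma):
    """Mean over test patients of (estimated mean under selected arm - observed outcome) / sigma."""
    gains = np.asarray(pred_selected, dtype=float) - np.asarray(y_observed, dtype=float)
    return float(np.mean(gains / sigma))

# Illustration on simulated scores for one arm.
rng = np.random.default_rng(0)
u_tr = rng.uniform(-2.0, 2.0, 300)
y_tr = np.sin(u_tr) + rng.normal(0.0, 0.1, 300)
h = normal_reference_bandwidth(u_tr)
fit = nw_gaussian([0.0, 1.0], u_tr, y_tr, h)  # close to sin(0) and sin(1)
gain = rescaled_mean_gain([1.0, 2.0, 3.0], [0.0, 1.0, 2.0], sigma=2.0)  # 0.5
```

Dividing by a standard deviation puts the CD4 and CD8 gains on a common, unit-free scale, so the two columns of an average-gain summary can be compared directly.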
Table 7 reports these average gains. The positive average gains observed for both CD4 and CD8 suggest that, if prior data could be used to estimate the personal score of a new patient, the proposed assignment would be more beneficial for that patient than an assignment based on a clinical trial, which may only identify the best treatment for the general population.
Table 7:
Average rescaled gains in log-CD4 and log-CD8 by the proposed method compared with the original assignment for test patients, using weights τCD4 and τCD8 for CD4 and CD8 counts, respectively. The 10-fold cross-validation technique was used for estimating these quantities.
τCD4 | τCD8 | Gain, log-CD4 | Gain, log-CD8 |
---|---|---|---|
1 | 0 | 0.191 | 0.060 |
0.8 | 0.2 | 0.177 | 0.074 |
0.6 | 0.4 | 0.165 | 0.080 |
0.5 | 0.5 | 0.159 | 0.083 |
0.4 | 0.6 | 0.151 | 0.085 |
0.2 | 0.8 | 0.130 | 0.091 |
0 | 1 | 0.068 | 0.101 |
5. Discussion
In this article, we proposed a novel personalized treatment plan to select the optimal treatment from a set of multiple treatments when the outcome measures are multivariate. This is an extension of the method of Siriwardhana et al.[14], which also uses rank aggregation. Our method assumes that historical RCT data containing patient characteristics and responses to each treatment are available, and it uses this information to select the best treatment option for a new patient. Unlike other existing methods in the literature, the proposed method incorporates potential dependencies among the responses. Our empirical studies show that the new method performs well in selecting the optimal treatment in a multiple-treatment setting. Our analysis of a real clinical trial dataset with multiple treatment options reveals how the recommended treatments may change when multiple outcome measures are used instead of a single measure.
As the working model, we use semiparametric single index models (SIMs) to relate the responses to subject covariates. These models offer great flexibility in modeling complex smooth relationships, allowing us to handle many real-life situations. The proposed method can also be applied with quantile regression single index models to handle error distributions that deviate from the Gaussian structure.
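To illustrate what fitting a SIM involves, the sketch below estimates the index direction β in Y = g(βᵀX) + ε by minimizing a leave-one-out kernel least-squares criterion, in the spirit of Ichimura, Hall and Härdle[20]. It is a simplified illustration, not the estimator used in this work: it assumes only two covariates (so β can be parameterized by one angle and found by grid search), a fixed bandwidth, and simulated data; the function names are hypothetical.

```python
import numpy as np

def loo_nw(index, y, h):
    """Leave-one-out Nadaraya-Watson fit of y on a scalar index (Gaussian kernel)."""
    z = (index[:, None] - index[None, :]) / h
    w = np.exp(-0.5 * z**2)
    np.fill_diagonal(w, 0.0)  # drop each observation from its own fit
    return (w @ y) / w.sum(axis=1)

def fit_sim_2d(X, y, h=0.3, n_grid=180):
    """Grid search for beta = (cos t, sin t) minimizing leave-one-out squared error.

    Restricting t to [0, pi) fixes the sign of beta, which a SIM cannot identify.
    """
    best_t, best_loss = 0.0, np.inf
    for t in np.linspace(0.0, np.pi, n_grid, endpoint=False):
        beta = np.array([np.cos(t), np.sin(t)])
        loss = np.mean((y - loo_nw(X @ beta, y, h)) ** 2)
        if loss < best_loss:
            best_t, best_loss = t, loss
    return np.array([np.cos(best_t), np.sin(best_t)]), best_t

# Simulated example: Y = g(beta'X) + error with g(z) = z + sin(z).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
theta_true = np.pi / 3
beta_true = np.array([np.cos(theta_true), np.sin(theta_true)])
index = X @ beta_true
y = index + np.sin(index) + rng.normal(0.0, 0.2, 300)
beta_hat, theta_hat = fit_sim_2d(X, y)
```

With more covariates, the grid search would be replaced by numerical optimization over the unit sphere, and the bandwidth would typically be chosen jointly with β.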
In our approach, we defined the Wki as geometric means of the wjki. In a few empirical studies using arithmetic means instead, we observed performance comparable to that of the current choice of Wki. In our empirical study and real data analysis, we used τ = 1 and Spearman’s distance function. However, the choice of the τs and of the distance function η(·) can influence the optimal treatment selection. Although specifying the response weights is an important step, we consider it more a subjective/qualitative issue than a quantitative one: the choice of weights can be guided by patient qualities and preferences and by physicians' advice. For responses related to factors that govern cure of the disease or deterioration of the patient's condition due to various aspects of the disease, a team of clinicians may be better suited to determine what weights should be applied. On the other hand, for outcome measures that concern quality of life, behavior, financial impact, etc., a physician in consultation with relevant support/advisory groups may be more appropriate. When sufficient data are available from various studies, a more data-based method of weight selection might be appropriate; however, we have not investigated these aspects in the current work.
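To make the roles of the response weights and of Spearman's distance concrete, the sketch below aggregates two hypothetical per-response orderings of the four arms by exhaustive search, choosing the ordering that minimizes the weighted sum of distances to the input lists. It is a simplification, not the procedure used in this work or the RankAggreg implementation: it uses the footrule form of Spearman's distance, treats the response weights (playing the role of τCD4 and τCD8) as list weights, and omits the item-level weights Wki. The function names and orderings are hypothetical.

```python
from itertools import permutations

def footrule(order_a, order_b):
    """Spearman footrule distance: sum over items of absolute rank differences."""
    pos_b = {item: r for r, item in enumerate(order_b)}
    return sum(abs(r - pos_b[item]) for r, item in enumerate(order_a))

def weighted_aggregate(ranked_lists, response_weights):
    """Exhaustive weighted rank aggregation for a small number of treatments.

    Returns the ordering minimizing sum_j weight_j * footrule(ordering, list_j);
    ties are broken by the order in which permutations are generated.
    """
    best, best_score = None, float("inf")
    for cand in permutations(ranked_lists[0]):
        score = sum(w * footrule(cand, lst)
                    for lst, w in zip(ranked_lists, response_weights))
        if score < best_score:
            best, best_score = list(cand), score
    return best

# Hypothetical best-to-worst arm orderings for one patient (not trial results).
cd4_order = [1, 2, 3, 0]
cd8_order = [3, 1, 0, 2]
print(weighted_aggregate([cd4_order, cd8_order], [0.6, 0.4]))  # [1, 2, 3, 0]
print(weighted_aggregate([cd4_order, cd8_order], [0.4, 0.6]))  # [3, 1, 0, 2]
```

Exhaustive search is practical only for a handful of treatments (24 orderings for four arms); for longer lists, stochastic search such as the Monte Carlo cross-entropy approach of Pihur et al.[18] is used instead.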
There are several possible extensions of this work. The proposed method requires complete data records from RCTs, but when dealing with life-threatening or terminal conditions, priority is typically given to survival-type measures that may be subject to censoring (e.g., overall survival, disease-free survival). We have not explored that avenue in this work. Also, although we focused only on continuous outcomes, binary and ordinal outcomes (e.g., relapse, response level, cure) are also common, so selection methods for such responses would be valuable. Another direction is to investigate the modifications needed to apply the proposed method to data from other sources, such as observational studies.
Supplementary Material
Acknowledgment
The research work by Chathura Siriwardhana was partially supported by grant U54MD007601 from the National Institutes of Health.
Footnotes
Conflict of Interest
The authors have declared no conflict of interest.
Data Availability Statement
Data sharing is not applicable to this article as no new data were created or analyzed in this study.
References
- 1. Murphy SA (2003). Optimal dynamic treatment regimes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65: 331–355. doi: 10.1111/1467-9868.00389
- 2. van’t Veer LJ & Bernards R (2008). Enabling Personalized Cancer Medicine Through Analysis of Gene-Expression Patterns. Nature, 452: 564–570.
- 3. Zhao Y, Zeng D, Rush AJ, & Kosorok MR (2012). Estimating Individualized Treatment Rules Using Outcome Weighted Learning. Journal of the American Statistical Association, pp. 1106–1118.
- 4. Vazquez A (2013). Optimization of Personalized Therapies for Anticancer Treatment. BMC Syst Biol, 7: 31. doi: 10.1186/1752-0509-7-31
- 5. Kosorok MR & Moodie EE (2015). Adaptive Treatment Strategies in Practice: Planning Trials and Analyzing Data for Personalized Medicine. Society for Industrial and Applied Mathematics. doi: 10.1137/1.9781611974188
- 6. Unnikrishnan AG, Bhattacharyya A, Baruah MP, Sinha B, Dharmalingam M, & Rao PV (2013). Importance of achieving the composite endpoints in diabetes. Indian J Endocrinol Metab, 17(5): 835–843. doi: 10.4103/2230-8210.117225
- 7. Johnson P, Greiner W, Al-Dakkak, et al. (2015). Which Metrics Are Appropriate to Describe the Value of New Cancer Therapies? BioMed Research International. doi: 10.1155/2015/865101
- 8. Fleming TR & Powers JH (2012). Biomarkers and surrogate endpoints in clinical trials. Stat Med, 31(25): 2973–2984. doi: 10.1002/sim.5403
- 9. Lizotte DJ, Bowling M, & Murphy S (2012). Linear fitted-Q iteration with multiple reward functions. J Mach Learn Res, 13: 3253–3295.
- 10. Laber EB, Lizotte DJ, & Ferguson B (2014). Set-valued dynamic treatment regimes for competing outcomes. Biometrics, 70: 53–61.
- 11. Ertefaie A, Wu T, Lynch KG, & Nahum-Shani I (2016). Identifying a set that contains the best dynamic treatment regimes. Biostatistics, 17: 135–148.
- 12. Lizotte DJ & Laber EB (2016). Multi-Objective Markov Decision Processes for Data-Driven Decision Support. J Mach Learn Res, 17: 1–28.
- 13. Butler EL, Laber EB, Davis SM, & Kosorok MR (2017). Incorporating Patient Preferences into Estimation of Optimal Individualized Treatment Rules. Biometrics. doi: 10.1111/biom.12743
- 14. Siriwardhana C, Datta S, & Kulasekera KB (2020). Selection of the optimal personalized treatment from multiple treatments with multivariate outcome measures. Journal of Biopharmaceutical Statistics, 30(3): 462–480. doi: 10.1080/10543406.2019.1684304
- 15. Li SC, Lindenberger U, & Sikström S (2001). Aging cognition: from neuromodulation to representation. Trends Cogn Sci, 5(11): 479–486. doi: 10.1016/s1364-6613(00)01769-1
- 16. Cheng Y, Wu W, Feng W, et al. (2012). The effects of multi-domain versus single-domain cognitive training in non-demented older people: a randomized controlled trial. BMC Med, 10: 30. doi: 10.1186/1741-7015-10-30
- 17. Siriwardhana C, Zhao M, Datta S, & Kulasekera K (2019). A probability based method for selecting the optimal personalized treatment from multiple treatments. Statistical Methods in Medical Research, 28(3): 749–760.
- 18. Pihur V, Datta S, & Datta S (2007). Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach. Bioinformatics, 23: 1607–1615.
- 19. Pihur V, Datta S, & Datta S (2009). RankAggreg, an R package for weighted rank aggregation. BMC Bioinformatics, 10: 62.
- 20. Ichimura H, Hall P, & Härdle W (1993). Optimal smoothing in single index models. The Annals of Statistics, 21(1): 157–178.
- 21. Hristache M, Juditsky A, Polzehl J, & Spokoiny V (2001). Structure Adaptive Approach for Dimension Reduction. The Annals of Statistics, pp. 1537–1566.
- 22. Yu Y & Ruppert D (2002). Penalized Spline Estimation for Partially Linear Single-Index Models. Journal of the American Statistical Association, pp. 1042–1054.
- 23. Wand MP & Jones MC (1995). Kernel Smoothing. London: Chapman & Hall.
- 24. Genz A, Bretz F, Miwa T, Mi X, Leisch F, et al. (2017). mvtnorm: Multivariate Normal & t Distributions. R package version 1.0-3. URL: https://cran.r-project.org/web/packages/mvtnorm/ (accessed September 29, 2017).
- 25. Statisticat LLC (2017). LaplacesDemon: Complete Environment for Bayesian Inference. Bayesian-Inference.com. R package version 16.0.1. URL: https://cran.r-project.org/web/packages/LaplacesDemon/ (accessed September 29, 2017).
- 26. Hammer SM, Katzenstein DA, Hughes MD, Gundacker H, Schooley RT, et al. (1996). A Trial Comparing Nucleoside Monotherapy with Combination Therapy in HIV-Infected Adults with CD4 Cell Counts from 200 to 500 per Cubic Millimeter. N Engl J Med, 335: 1081–1090.
- 27. Juraska M, Gilbert PB, Lu X, Zhang M, Davidian M, et al. (2017). speff2trial: Semiparametric efficient estimation for a two-sample treatment effect. R package version 1.0.4 (2012). URL: http://cran.r-project.org/package=speff2trial (accessed September 29, 2017).
- 28. Streeck H & Nixon DF (2010). T Cell Immunity in Acute HIV-1 Infection. J Infect Dis, 202: 302–308.