Optimal Personalized Treatment Selection with Multivariate Outcome Measures in a Multiple Treatment Case

Chathura Siriwardhana; KB Kulasekera

doi:10.1080/03610918.2021.1999473

Author manuscript; available in PMC: 2024 Feb 16.

Published in final edited form as: Commun Stat Simul Comput. 2021 Nov 15;52(12):5773–5787. doi: 10.1080/03610918.2021.1999473

PMCID: PMC10871612 NIHMSID: NIHMS1840299 PMID: 38371330

Abstract

In this work we propose a novel method for individualized treatment selection when there are correlated multiple treatment responses. For the K treatment (K ≥ 2) scenario, we compare quantities that are suitable indexes based on outcome variables for each treatment conditional on patient-specific scores constructed from collected covariate measurements. Our method covers any number of treatments and outcome variables, and it can be applied for a broad set of models. The proposed method uses a rank aggregation technique that takes into account possible correlations among ranked lists to estimate an ordering of treatments based on treatment performance measures such as the smooth conditional mean. The method has the flexibility to incorporate patient and clinician preferences into the optimal treatment decision on an individual case basis. A simulation study demonstrates the performance of the proposed method in finite samples. We also present data analyses using HIV clinical trial data to show the applicability of the proposed procedure for real data.

Keywords: Design variables, Personalized Treatments, Single Index Models, Rank Aggregation

1. Introduction

The concept of personalized medicine is fairly old, but the idea advanced dramatically after the introduction of randomized controlled clinical trials that also collected additional patient information. The primary aim of clinical trials is to make population-level decisions, which are not necessarily optimal at an individual patient level. Such population-level decisions do not always account for patient heterogeneity. But the increasing availability of vast amounts of additional patient data from such studies has increased the awareness of heterogeneity in both patient characteristics and outcomes and led to new evidence-based medicine concepts. Over the last two decades, statistical research in personalized medicine has produced new methodologies and insights, mostly owing to advancement in computational power, bioinformatics discoveries, and access to electronic health data.[1–5] The goal of personalized medicine is to use data to improve decision making in health care to provide the “best” outcome for a patient based on his/her individualized features.

In real-life situations, the success of treatment may not be fully reflected through a single outcome as a variety of factors may compel both patients and clinicians to consider cure/recovery in a rather broad view. For example, in the treatment for Type-2 Diabetes, control in HbA1c, systolic blood pressure, low-density lipoproteins, cholesterol levels, and prevention from hypoglycemia and weight gain have been suggested as therapeutic goals to address the net clinical utility of a treatment.[6] In cancer studies, although the overall survival is considered the most important, a variety of other factors such as a reduction in tumor size or eradication of cancerous cells, are also considered to be meaningful outcomes.[7] Similarly, in situations where the disease is a life-threatening condition, while the time-to-event outcomes (e.g., overall survival) are commonly considered the best outcome, there could be other factors with secondary importance, such as those relating to the quality of life and economic impact. Also, it is common to use a collection of surrogate outcomes during the early development of treatments.[8] Hence, selecting the best treatment considering multiple outcome measures becomes a relevant issue for most patient populations.

Few articles have attempted to address the treatment selection in the face of multiple responses. The creation of a composite outcome via a linear combination of outcomes was advocated by Lizotte et al.[9] In a contrasting approach, Laber et al.[10] and Ertefaie et al.[11] provide continuous/dynamic learning methods for selecting set-valued treatment regimes when there are multiple responses. Rather than a single optimal treatment, the general recommendation in set-valued selections is a set of possible treatments that are “no worse” than any other in that set. When there are multiple treatments, methods proposed in both articles above conduct the selection at each of several stages comparing two treatments at a time. Another article that addresses multi-response treatment selection is by Lizotte and Laber [12] where these authors focus on a multi-objective sequential optimization method that again gives a non-dominated treatment set that is commonly known as a Pareto optimal collection among possible treatments. On the other hand, Butler et al.[13] provide a selection method for only two treatments where a single treatment is recommended using patient survey data in addition to clinical data.

To address the multiple response issue, Siriwardhana et al.[14] used a rank aggregation method to obtain an ordered list of treatments based on multiple ranked lists of treatment performance measures. However, a limitation of that technique is that the method does not fully address relationships among multiple outcomes. In treatment selection problems dealing with multiple responses, outcome measures often tend to be correlated, especially when a collection of clinical or behavioral outcomes is used. For example, scores relating to cognitive improvement in multiple cognitive domains are typically positively associated.[15,16] Similarly, when multiple surrogate markers are used for a specific clinical outcome, such markers are naturally assumed to be correlated. As such, there are many cases where the natural dependency among responses cannot be ignored, demanding that treatment selection methods be capable of addressing potential dependencies among responses in the multiple response setting.

In this paper, we consider the selection of the optimal treatment among K possible treatments for a patient using his or her baseline characteristics when multivariate outcomes (responses) are to be considered. First, to handle statistical issues arising due to high dimensional covariates, each patient is assigned a score based on his/her covariate values. Next, the set of conditional means of the multivariate responses is estimated at a given patient score, using a smooth mean estimation step for the K treatment options. Finally, a rank aggregation concept that is also capable of handling potential dependencies among ranked lists is applied to find an overall ranking for the K options. The “optimal” treatment for the given score is defined as the treatment estimated to be the best-ranked option in the overall ranked list. The proposed method allows one to use a priori opinions on the importance of each response in determining the best treatment procedure. Empirical studies show that the proposed method has very desirable properties in terms of the selection frequency of the best treatment and an aggregated average gain. The article also demonstrates an application of the proposed technique to a real dataset resulting from an HIV clinical trial.

The remainder of the article is organized as follows. In Section 2, we discuss the proposed methodology. Section 3 includes simulation results followed by a real data illustration in Section 4. The main body of the paper ends with a discussion in Section 5.

2. Treatment Selection

In this section, we describe the proposed procedure for selecting the optimal personalized treatment based on multiple outcome measures. Suppose we observe J continuous response variables for each patient undergoing a treatment selected from K possible treatments and, without loss of generality, suppose larger values of each component of the J dimensional response vector are indicative of better outcomes. Let $Y_{k}^{*} = {(Y_{1 k}^{*}, \dots, Y_{J k}^{*})}^{'}$ indicate the vector of responses for the kth treatment with an associated r dimensional covariate vector X. We assume that there exists a natural correlation structure R_k among the $Y_{k}^{*}$ responses. In this work we assume that we have data from a randomized clinical trial (RCT) that provides responses and covariate information for a set of patients randomized into K arms. It should be noted that in practice, using the data resulting from an RCT, one cannot observe the full J × K matrix of counterfactuals $(Y_{1}^{*}, \dots, Y_{K}^{*})$ for a single patient; hence, one cannot obtain a sample from the joint distribution of $(Y_{1}^{*}, \dots, Y_{K}^{*}, X)$ . Rather, one observes K independent pairs of observations (Y_k, X_k) from marginal distributions of $(Y_{k}^{*}, X)$ for k = 1, …, K, where Y_k = (Y_1k, …, Y_Jk)′ (see Siriwardhana et al.[14,17] for a detailed discussion). Further, assume that we can use a patient’s covariate value X to obtain a lower dimensional composite patient score U(X), as described in Siriwardhana et al.[14], to summarize each patient’s characteristics.

Here, we consider pairs of independent observations (Y_k, X_k) from the marginal distribution of $(Y_{k}^{*}, X)$ , k = 1, …, K to select the optimal treatment for K treatments using vectors of smoothed conditional means for each treatment. We define

μ_{j k} (u_{j}) = E [Y_{j k} ∣ U_{j} (X_{k}) = u_{j}]; j = 1, \dots, J; k = 1, \dots, K

(1)

and vectors μ_k(u) = (μ_1k(u₁), …, μ_Jk(u_J))′ for U = (U₁, …, U_J)′ and u = (u₁, …, u_J)′, where the components of these vectors correspond to the responses 1, …, J. Although we suppress the dependence of the U_js on the covariate vector for brevity, in all developments below, quantities related to the U_js are functions of X.

In our proposed approach, we rank the K values for each component of the μ_k(u) vectors (k = 1, …, K) to get size K vectors v_j(u) = (v_j1(u), …, v_jK(u))′, j = 1, …, J, where v_jk(u) is the rank of μ_jk(u_j) among μ_j1(u_j), …, μ_jK(u_j), with the largest μ_jk(u_j) value given the rank 1. We next produce an overall rank by following an aggregation technique as a basis to find the optimal treatment.
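The ranking convention (largest conditional mean receives rank 1) can be illustrated with a small sketch. The matrix of means below is hypothetical and the helper name `rank_descending` is introduced only for illustration; ties are broken by treatment order, which is an assumption the text does not address.

```python
import numpy as np

def rank_descending(values):
    """Return ranks 1..K where the largest value gets rank 1 (ties broken by order)."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(-values, kind="stable")   # indices from largest to smallest
    ranks = np.empty(len(values), dtype=int)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks

# Hypothetical conditional means: J = 2 responses (rows) by K = 3 treatments (columns)
mu = np.array([[0.2, 0.9, 0.5],
               [0.7, 0.1, 0.4]])

# One rank vector v_j per response j
v = np.vstack([rank_descending(row) for row in mu])
print(v)
```

Here the two responses disagree about the best treatment, which is the situation the aggregation step is meant to resolve.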

In their previous approach, Siriwardhana et al.[14] used an aggregation method by Pihur et al.[18,19] to combine these rank vectors v_j(u); j = 1, …, J to get an overall ranking of treatments $v^{*} (u) = (v_{1}^{*}, \dots, v_{K}^{*})$ for a given patient score u. They defined the optimal treatment as

k^{*} (u) = arg min_{1 \leq k \leq K} {v_{k}^{*}} .

(2)

However, the rank aggregation method by Pihur et al.[18,19] does not account for dependencies among the J rank lists, which could potentially lead to unwanted estimation errors when such dependencies exist. In many real life problems dealing with multiple responses, natural associations among responses cannot be ignored. In finite sample cases, non-ignorable dependencies among estimated response means due to correlated responses subsequently translate to correlated rank lists when one uses rankings of those estimated means. In the current work we propose to use an aggregation technique that can be utilized under correlated rank lists for the proposed treatment selection concept. We detail the proposed aggregation step in the sequel.

We link the jth component Y_jk of the response vector Y_k for the kth treatment and covariates X_k via a Single Index Model (SIM). The SIM formulation provides great flexibility and reasonable efficiency in modeling many types of data. This model is expressed as,

Y_{j k} = g_{j k} (β_{j k}^{'} X_{k}) + ϵ_{j k}

(3)

for j = 1, …, J and k = 1, …, K, where each β_jk is an r-vector of parameters, g_jk is an unknown smooth link function and the ϵ_jks are error terms with E[ϵ_jk|X] = 0. We assume independence of the ϵ_jks across k = 1, …, K for a fixed j, while these terms are allowed to be correlated across the js for any given k.

RCT data used in our approach are of the form (Y_ki, X_ki) where Y_ki = (Y_1ki, …, Y_Jki)′ and Y_jki indicates the jth component of the response for the ith individual under treatment k with associated covariate values X_ki, i = 1, …, n_k. The relationship (3) between response and covariates for such a sample can be written as

Y_{j k i} = g_{j k} (β_{j k}^{'} X_{k i}) + ϵ_{j k i}, i = 1, \dots, n_{k}; j = 1, \dots, J; k = 1, \dots, K .

(4)

Following Siriwardhana et al.[14,17], we define a score vector U(X) for a patient with covariate X as follows. First define,

S_{j k} (X) = g_{j k} (β_{j k}^{'} X) - max_{l \neq k} {g_{j l} (β_{j l}^{'} X)} .

Next, define the jth component of the combined overall score vector as

S_{j} (X) = max_{k} {S_{j k} (X)} and δ_{j} (X) = arg max_{k} {S_{j k} (X)} .

(5)

The overall score is given as U(X) = (U₁(X), …, U_J(X))′ where U_j(X) = (S_j(X), δ_j(X))′ for j = 1, …, J.
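To make the score definition concrete, the following sketch computes (S_j, δ_j) for one response from the K link-function values at a given covariate vector. The input values are hypothetical and the function name `score_components` is introduced here only for illustration; in practice the g values would come from fitted single index models.

```python
import numpy as np

def score_components(g_values):
    """g_values: length-K array of g_jk(beta_jk' x) for a single response j.
    Returns (S_j, delta_j), with delta_j reported as a 1-based treatment label."""
    g = np.asarray(g_values, dtype=float)
    K = len(g)
    # S_jk = g_jk minus the largest value among the other treatments l != k
    s = np.array([g[k] - np.max(np.delete(g, k)) for k in range(K)])
    return float(s.max()), int(s.argmax()) + 1

# Hypothetical link-function values for K = 3 treatments
S, delta = score_components([0.3, 0.8, 0.5])
print(S, delta)
```

Read this way, S_j is the margin by which the best treatment for response j exceeds its closest competitor, and δ_j records which treatment that is.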

Since the model functions defined in (3) contain unknown parameters, components of these score vectors should be estimated using a standard function estimation method. In the literature, there are many different estimation techniques available for estimating the link function and the index vector of a SIM, allowing us to use one of several reasonable estimation methods to estimate the gs and the βs.[20–22] We adopt the Hristache et al.[21] procedure in our simulations and data analysis in the sequel.

In particular, for any given vector X = x, let

{\hat{S}}_{j k} (x) = {\hat{g}}_{j k} ({\hat{β}}_{j k}^{'} x) - max_{m \neq k} {{\hat{g}}_{j m} ({\hat{β}}_{j m}^{'} x)},
{\hat{S}}_{j} (x) = max_{k} {{\hat{S}}_{j k} (x)},
{\hat{δ}}_{j} (x) = arg max_{k} {{\hat{S}}_{j k} (x)}, and
{\hat{U}}_{j} (x) = {({\hat{S}}_{j} (x), {\hat{δ}}_{j} (x))}^{'}; j = 1, \dots, J .

(6)

As suggested in Siriwardhana et al.[14], a suitable estimator for μ_jk(u_j), k = 1, …, K at a given u_j = (s_j, d_j)′ can be obtained using the smooth mean estimator given by,

{\hat{μ}}_{j k} (u_{j}) = \frac{\sum_{i = 1}^{n_{k}} Y_{j k i} ω ((s_{j} - {\hat{S}}_{j} (X_{k i})) / h_{j k}) I ({\hat{δ}}_{j} (X_{k i}) = d_{j})}{\sum_{i = 1}^{n_{k}} ω ((s_{j} - {\hat{S}}_{j} (X_{k i})) / h_{j k}) I ({\hat{δ}}_{j} (X_{k i}) = d_{j})},

(7)

where ω is a kernel function with ω ≥ 0 and ∫ ω(t)dt = 1, and h_jk, k = 1, …, K are a set of smoothing parameters. Here I(A) is the indicator of A. Bandwidth selection for estimating the μ_jks is a challenging issue. However, as Siriwardhana et al.[14] suggested, the methods given in Wand and Jones [23] for kernel smoothing provide a reasonable solution for this estimation problem.
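As an illustration of the estimator in (7), the sketch below computes a kernel-weighted mean of the responses in one treatment arm, restricted to training patients whose estimated label matches d_j. The Gaussian kernel, the fixed bandwidth, and all data values are assumptions chosen for the example; the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(t):
    """Standard normal density, used here as the kernel (an assumed choice)."""
    return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)

def smooth_mean(y, s_hat, d_hat, s0, d0, h, kernel=gaussian_kernel):
    """Kernel-weighted mean of y over the training patients in one arm whose
    estimated label d_hat equals d0, localized at score s0 with bandwidth h."""
    y, s_hat, d_hat = (np.asarray(a) for a in (y, s_hat, d_hat))
    w = kernel((s0 - s_hat) / h) * (d_hat == d0)
    total = w.sum()
    if total == 0.0:
        return float("nan")   # no patient in this arm shares the label d0
    return float(np.dot(w, y) / total)

# Hypothetical arm: the fourth patient has a different label and is excluded
y     = [1.0, 2.0, 3.0, 10.0]
s_hat = [0.1, 0.2, 0.3, 0.2]
d_hat = [1, 1, 1, 2]
m = smooth_mean(y, s_hat, d_hat, s0=0.2, d0=1, h=0.1)
print(m)
```

The indicator means the estimate can be undefined when no training patient in an arm shares the new patient's label, so the sketch returns NaN in that case rather than dividing by zero; how the original implementation handles this is not stated in the text.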

For a realization x₀ of the covariate X, if one could find the corresponding realizations of the scores, u_j0 = (S_j(x₀), δ_j(x₀))′, this would allow estimating μ_jk(u_j0) by ${\hat{μ}}_{j k} (u_{j 0})$ . However, because the model functions are unknown, one may only find an estimate ${\hat{u}}_{j 0} = {({\hat{s}}_{j 0}, {\hat{d}}_{j 0})}^{'}$ of u_j0 using (6) above. Thus, in practice one may use ${\hat{μ}}_{j k} ({\hat{u}}_{j 0})$ as the estimate of μ_jk(u_j0) for j = 1, …, J; k = 1, …, K, with û₀ = (û₁₀, …, û_J0)′.

Now we rank the K components of ${\hat{μ}}_{j} ({\hat{u}}_{0}) = {({\hat{μ}}_{j 1} ({\hat{u}}_{j 0}), \dots {\hat{μ}}_{j K} ({\hat{u}}_{j 0}))}^{'}$ vectors, j = 1, …, J, to get size K vectors v_j(û₀) = (v_j1(û_j0), …, v_jK(û_j0))′, where v_jk(û_j0) is the rank of ${\hat{μ}}_{j k} ({\hat{u}}_{j 0})$ among ${\hat{μ}}_{j 1} ({\hat{u}}_{j 0}), \dots, {\hat{μ}}_{j K} ({\hat{u}}_{j 0})$ , k = 1, …, K for each j (here j = 1, .., J) with the largest ${\hat{μ}}_{j k} ({\hat{u}}_{j 0})$ value given the rank 1. Note that these vectors of ranks v_j(û₀), j = 1, …, J are correlated. Then, we use the following aggregation method to combine these rank vectors to get an overall ranking of treatments.

First, find the distance γ_j(v, û₀) = η(v, v_j(û₀)), for a given rank list v ∈ V_K of length K, where V_K is the set of all permutations of the integers {1, …, K}. Here we propose to use Spearman’s rank distance (Pihur et al.[18]) for η(.), which is defined as

\sum_{k = 1}^{K} | v_{k} - v_{r_{k}} | {| M_{v_{k}} - M_{v_{r_{k}}} |}^{ρ},

where M₁, …, M_K is a list of real values, r₁, …, r_K are the ranks of M_k, k = 1, …, K and v₁, …, v_K is a permutation of integers 1, …, K and ρ is a positive number.
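The displayed formula is compact, so the sketch below takes one plausible reading of it, in the spirit of the weighted Spearman footrule of Pihur et al.: for each treatment, the gap between its candidate rank and its observed rank is multiplied by the gap between the values holding those two rank positions, raised to the power ρ. This interpretation, the function name `weighted_footrule`, and the example values are assumptions, not the authors' code.

```python
import numpy as np

def weighted_footrule(v, m, rho=1.0):
    """v: candidate ranks (a permutation of 1..K), one entry per treatment.
    m: estimated means, one per treatment (larger is better).
    Assumed reading: sum_k |v_k - r_k| * |M_(v_k) - M_(r_k)|**rho, where r_k is the
    observed rank of treatment k and M_(i) is the value holding rank i."""
    v = np.asarray(v, dtype=int)
    m = np.asarray(m, dtype=float)
    order = np.argsort(-m, kind="stable")
    r = np.empty(len(m), dtype=int)
    r[order] = np.arange(1, len(m) + 1)   # observed ranks, largest mean gets rank 1
    m_sorted = m[order]                   # m_sorted[i - 1] is the value at rank i
    gaps = np.abs(m_sorted[v - 1] - m_sorted[r - 1]) ** rho
    return float(np.sum(np.abs(v - r) * gaps))

m = [0.2, 0.9, 0.5]                       # hypothetical means; observed ranks are (3, 1, 2)
d_same = weighted_footrule([3, 1, 2], m)  # candidate equals the observed ranking
d_diff = weighted_footrule([1, 2, 3], m)
print(d_same, d_diff)
```

Under this reading, swapping two treatments with nearly equal means costs little, while misplacing a clearly better treatment is penalized heavily.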

Let Γ(v, û₀) = (γ₁(v, û₀), …, γ_J(v, û₀))′ for a fixed rank vector v. Now, we define an overall rank distance L(v, û₀) by

L (v, {\hat{u}}_{0}) = {[Λ^{1 / 2} Γ (v, {\hat{u}}_{0})]}^{'} {[D (v, {\hat{u}}_{0})]}^{- 1} [Λ^{1 / 2} Γ (v, {\hat{u}}_{0})],

(8)

where D(v, û₀) is a suitable dispersion matrix for Γ(v, û₀). Here Λ is a diagonal matrix with diagonal elements (τ₁, …, τ_J), which signify the practical importance of the responses 1, …, J, including the views of patients and clinicians. As such, L(v, û₀) can be considered as a Mahalanobis type weighted distance from Γ(v, û₀) to the origin of $ℝ_{+}^{J}$ . We propose to minimize L(v, û₀), with respect to v ∈ V_K, for an estimated score û₀ corresponding to a new patient. Suppose this minimum occurs at a vector $v^{*} ({\hat{u}}_{0}) = {(v_{1}^{*} ({\hat{u}}_{0}), \dots, v_{K}^{*} ({\hat{u}}_{0}))}^{'}$ ; i.e.,

v^{*} ({\hat{u}}_{0}) = arg min_{v \in V_{K}} {L (v, {\hat{u}}_{0})} .

(9)

We then define the optimal treatment as

k^{*} ({\hat{u}}_{0}) = arg min_{1 \leq k \leq K} {v_{k}^{*} ({\hat{u}}_{0})} .

(10)
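A minimal sketch of the minimization in (8) to (10) follows, assuming K is small enough that all K! rank vectors can be enumerated. The callables `gamma_fn` and `D_fn` are hypothetical stand-ins for the distance vector Γ(v, û₀) and its dispersion matrix; the example uses an unweighted footrule and an identity dispersion matrix purely for illustration.

```python
import itertools
import numpy as np

def overall_distance(gamma, D, tau):
    """L = (Lambda^{1/2} Gamma)' D^{-1} (Lambda^{1/2} Gamma), with Lambda = diag(tau)."""
    z = np.sqrt(np.asarray(tau, dtype=float)) * np.asarray(gamma, dtype=float)
    return float(z @ np.linalg.solve(D, z))

def select_treatment(K, gamma_fn, D_fn, tau):
    """Enumerate all K! rank vectors, minimize L, and return (v_star, k_star)."""
    best_v, best_L = None, np.inf
    for v in itertools.permutations(range(1, K + 1)):
        L = overall_distance(gamma_fn(v), D_fn(v), tau)
        if L < best_L:
            best_v, best_L = v, L
    k_star = int(np.argmin(best_v)) + 1   # 1-based label of the treatment holding rank 1
    return best_v, k_star

# Hypothetical rank lists from J = 2 responses over K = 3 treatments
rank_lists = np.array([[3, 1, 2],
                       [1, 3, 2]])
gamma_fn = lambda v: np.abs(np.asarray(v) - rank_lists).sum(axis=1)  # plain footrule per response
D_fn = lambda v: np.eye(2)                                           # placeholder dispersion matrix
v_star, k_star = select_treatment(3, gamma_fn, D_fn, tau=[0.9, 0.1])
print(v_star, k_star)
```

With most of the weight on the first response, the aggregate ranking follows that response's list and treatment 2 is selected. Exhaustive enumeration grows as K!, so a larger number of treatments would call for a search heuristic instead.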

We approximate D(v, û₀) by a suitable weighted dispersion matrix $\tilde{D} (v, {\hat{u}}_{0})$ , calculated from Γ(v, û_ki)s using ${\hat{μ}}_{j} ({\hat{u}}_{j k i}) s$ , i = 1, …, n_k; k = 1, …, K, corresponding to training samples from each treatment group. The weights W_kis, say, for i = 1, …, n_k; k = 1, …, K, for this calculation are developed using a localization approach to be described below. For a given v ∈ V_K we find Γ(v, û_ki) = (γ₁(v, û_ki), …, γ_J(v, û_ki))′ via calculation of distances γ_j(v, û_ki) = η(v, v_j(û_jki)), to obtain,

\tilde{D} (v, {\hat{u}}_{0}) = \frac{\sum_{k = 1}^{K} \sum_{i = 1}^{n_{k}} W_{k i} [Γ (v, {\hat{u}}_{k i}) - m (Γ (v, \hat{u}))] {[Γ (v, {\hat{u}}_{k i}) - m (Γ (v, \hat{u}))]}^{'}}{\sum_{k = 1}^{K} \sum_{i = 1}^{n_{k}} W_{k i}},

(11)

where m(.) is the weighted average given by

m (Γ (v, \hat{u})) = \frac{\sum_{k = 1}^{K} \sum_{i = 1}^{n_{k}} W_{k i} Γ (v, {\hat{u}}_{k i})}{\sum_{k = 1}^{K} \sum_{i = 1}^{n_{k}} W_{k i}} .

(12)

In developing $\tilde{D} (v, {\hat{u}}_{0})$ , we use a weighting scheme to localize the Γ(v, û_ki) vectors at û₀ = (û₁₀, …, û_J0)′ using K training samples of size n_k each in order to achieve a reasonable approximation for D(v, û₀). Our approach is the following. For the ith training observation in the kth group with a score value ${\hat{u}}_{j k i} = ({\hat{s}}_{j k i}, {\hat{d}}_{j k i})$ , a weight is defined as $w_{j k i} = ω (\frac{{\hat{s}}_{j k i} - {\hat{s}}_{j 0}}{{\tilde{h}}_{j k}})$ using the bandwidth ${\tilde{h}}_{j k}$ and kernel ω(.) for j = 1, …, J and k = 1, …, K. Next, define an overall weight as the product of the J individual weights, $W_{k i} = \prod_{j = 1}^{J} w_{j k i} I ({\hat{d}}_{j 0} = {\hat{d}}_{j k i})$ . For this purpose, we propose to use the same bandwidth h_jk and kernel ω(.) that were used in the calculation of ${\hat{μ}}_{j k} ({\hat{u}}_{j k i}) s$ , corresponding to the ith training patient in the kth set; i = 1, …, n_k, k = 1, …, K. Even though such a selection of weights may not be optimal, empirical results suggest that these weights are reasonable and flexible enough to implement the proposed procedure.
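The localization weights and the weighted dispersion matrix in (11) and (12) can be sketched as follows. The Gaussian kernel, the pooled (n × J) array layout, and all numbers are assumptions made for the example; the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(t):
    return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)

def localized_weights(s_hat, d_hat, s0, d0, h):
    """s_hat, d_hat: (n, J) estimated scores and labels for n pooled training patients.
    s0, d0: length-J score and label of the new patient; h: bandwidths, shape (J,) or (n, J).
    Returns W (length n): product over j of kernel weight times label-match indicator."""
    s_hat = np.asarray(s_hat, dtype=float)
    w = gaussian_kernel((s_hat - np.asarray(s0, dtype=float)) / np.asarray(h, dtype=float))
    match = np.asarray(d_hat) == np.asarray(d0)
    return np.prod(w * match, axis=1)

def weighted_dispersion(gammas, W):
    """gammas: (n, J) array whose rows are the distance vectors Gamma(v, u_hat_ki)."""
    gammas, W = np.asarray(gammas, dtype=float), np.asarray(W, dtype=float)
    m = (W[:, None] * gammas).sum(axis=0) / W.sum()   # weighted mean, as in (12)
    c = gammas - m
    return (W[:, None] * c).T @ c / W.sum()           # weighted dispersion, as in (11)

# Two hypothetical training patients; the second differs in one label and gets weight 0
W = localized_weights([[0.0, 0.0], [0.0, 0.0]], [[1, 2], [1, 3]],
                      s0=[0.0, 0.0], d0=[1, 2], h=[1.0, 1.0])
# With equal weights the estimator reduces to the usual (biased) covariance
D = weighted_dispersion([[0, 0], [2, 0], [0, 2], [2, 2]], np.ones(4))
print(W, D)
```

Because the overall weight is a product of indicators across all J responses, only training patients who match the new patient on every label contribute. With few such patients the matrix can be singular, in which case some regularization would be needed before inverting it in (8); the text does not say how this is handled.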

In summary, the proposed approach can be implemented as follows:

1. Using the historical RCT data (training set), estimate the set of J × K single index models relating the covariates to the J response variables for each treatment k, k = 1, …, K (i.e., model (4)).
2. Estimate scores for all patients in the training data set and for the new patient. Use the criteria given in (6) to estimate û_j, the jth sub-component of the score corresponding to the jth response. Find the estimated overall score û = (û₁, …, û_J) in each case.
3. For the new patient with an estimated score û₀ = (û₁₀, …, û_J0), estimate the mean outcome of the jth response for the kth treatment, ${\hat{μ}}_{j k} ({\hat{u}}_{j 0})$ , via the smooth mean estimator given in equation (7). Any symmetric kernel can be used for this estimation step. Next, specify the estimated mean outcomes for the responses under the kth treatment as ${\hat{μ}}_{1 k} ({\hat{u}}_{1 0}), \dots, {\hat{μ}}_{J k} ({\hat{u}}_{J 0})$ , k = 1, …, K.
4. Rank the K components of the ${\hat{μ}}_{j} ({\hat{u}}_{0}) = {({\hat{μ}}_{j 1} ({\hat{u}}_{j 0}), \dots, {\hat{μ}}_{j K} ({\hat{u}}_{j 0}))}^{'}$ vectors, j = 1, …, J, to get size K vectors v_j(û₀) = (v_j1(û_j0), …, v_jK(û_j0))′.
5. Find the distance γ_j(v, û₀) = η(v, v_j(û₀)) for j = 1, …, J, for a given v ∈ V_K, and construct Γ(v, û₀) = (γ₁(v, û₀), …, γ_J(v, û₀))′.
6. Specify response weights for the diagonal matrix Λ. For example, use a J × J identity matrix in the equal weight case. Next, calculate L(v, û₀) for the given Λ. Here D(v, û₀) is estimated via the estimator given in equation (11).
7. Finally, use the criterion given in (9) to estimate the optimal rank vector and then use the criterion given in (10) to select the optimal treatment.

3. Empirical Studies

In this section we present a simulation study that investigates the properties of the proposed procedure in finite samples.

We performed a series of simulations with the proposed procedure under various settings to evaluate its performance. Primarily, we focused on the accuracy of treatment assignment of a new (test) observation using estimated values of the μ_jk functions from a set of training data. This simulation study was performed for K = 2, 3 treatments with response dimension J = 2, 3, and 4. To illustrate the personalized treatment concept, in our simulations we selected our model sets such that each model in a set dominates the other competing models for some combination of covariate values. In particular, none of the considered models fully dominates the other models over the whole covariate space. Hence, subjects with distinct covariate vectors could experience the highest response from different treatments.

In our study, we first simulated K independent multivariate (dimension J) samples of size n (n = 50 or n = 100) per group. The components of the r dimensional covariate vectors X were generated independently from the U(−1, 1) distribution, where r was fixed at 10. Using various link functions and index vectors, we obtained the treatment responses from model (3) for each k. We examined the performance of the proposed methodology under a set of highly nonlinear regression models given by sine/cosine functions that followed the SIM structure. Such trigonometric functions pose the most difficulties in smooth estimation, and therefore we believe these models approximate worst case scenarios with respect to estimation of the optimal treatment. A set of model functions used for the study is provided in Table 1. For a given k, k = 1, …, K, the errors were generated from either a J dimensional multivariate normal distribution or a J dimensional multivariate double exponential distribution with zero mean and a compound symmetric correlation matrix where the off-diagonal values were chosen from the set {0.1, 0.5, 0.9}. The R package mvtnorm (Genz et al.[24]) was used for the generation of these random vectors in the multivariate normal case, where the dispersion parameter σ_N was chosen from the set {0.1, 0.3, 0.5}. We used the R package LaplacesDemon (Statisticat.[25]) for generating double exponential random variables with dispersion parameter σ_D chosen from {0.1, 0.3, 0.5}.

Table 1:

Sets of smooth mean functions used for generating treatment responses. We choose the common vector C to be a unit vector; $C = {(\frac{1}{\sqrt{10}}, \dots \frac{1}{\sqrt{10}})}^{'}$ , for all combinations of j and k.

		Treatment Group (k)
		k = 1	k = 2	k = 3
Response (j)	j = 1	$\sin \{π (C^{'} X)\}$	$\sin \{\frac{π}{3} + π (C^{'} X)\}$	$\sin \{\frac{- π}{3} + π (C^{'} X)\}$
	j = 2	$\cos \{π (C^{'} X)\}$	$\cos \{\frac{π}{3} + π (C^{'} X)\}$	$\cos \{\frac{- π}{3} + π (C^{'} X)\}$
	j = 3	$\sin \{\frac{π}{2} (C^{'} X)\}$	$\sin \{\frac{π}{3} + \frac{π}{2} (C^{'} X)\}$	$\sin \{\frac{- π}{3} + \frac{π}{2} (C^{'} X)\}$
	j = 4	$\cos \{\frac{π}{2} (C^{'} X)\}$	$\cos \{\frac{π}{3} + \frac{π}{2} (C^{'} X)\}$	$\cos \{\frac{- π}{3} + \frac{π}{2} (C^{'} X)\}$
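The data-generating design for the multivariate normal case can be sketched in Python as below. The original study used R (mvtnorm), so NumPy is a stand-in here. Treating the dispersion parameter σ as a common standard deviation is an assumption, as are the seed and the function names.

```python
import numpy as np

rng = np.random.default_rng(0)                     # arbitrary seed for reproducibility
r, n, J, K = 10, 50, 4, 3
C = np.full(r, 1.0 / np.sqrt(r))                   # common unit index vector from Table 1
shifts = np.array([0.0, np.pi / 3, -np.pi / 3])    # phase shift for treatments k = 1, 2, 3

def mean_functions(X):
    """Return an (n, J, K) array of the Table 1 mean functions at the rows of X."""
    idx = X @ C                                         # single index C'X
    a = shifts[None, :] + np.pi * idx[:, None]          # argument for responses j = 1, 2
    b = shifts[None, :] + 0.5 * np.pi * idx[:, None]    # argument for responses j = 3, 4
    return np.stack([np.sin(a), np.cos(a), np.sin(b), np.cos(b)], axis=1)

def simulate_arm(k, sigma=0.1, corr=0.5):
    """Draw covariates and J correlated normal responses for treatment k (0-based)."""
    X = rng.uniform(-1.0, 1.0, size=(n, r))
    R = np.full((J, J), corr) + (1.0 - corr) * np.eye(J)   # compound symmetric correlation
    eps = rng.multivariate_normal(np.zeros(J), sigma**2 * R, size=n)
    return X, mean_functions(X)[:, :, k] + eps

# One independent training sample per treatment arm
samples = [simulate_arm(k) for k in range(K)]
X1, Y1 = samples[0]
print(X1.shape, Y1.shape)
```

Each arm receives its own covariate draws, mirroring the K independent samples of a randomized trial rather than a single cohort observed under every treatment.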


Once the K samples were generated, we estimated the corresponding SIMs followed by an estimation of scores at each covariate value. SIMs were estimated by the procedure given in Hristache et al.[21] using Epanechnikov kernels. Then, a new covariate value X₀ was generated in the same manner as above, and for its corresponding estimated score û₀, we calculated ${\hat{μ}}_{j k} ({\hat{u}}_{j 0})$ for k = 1, …, K; j = 1, …, J. Similarly, we calculated ${\hat{μ}}_{j k} ({\hat{u}}_{j k i}) s$ for the ith patient in the kth training set; i = 1, …, n_k; k = 1, …, K, for dimension j, j = 1, …, J. Next we obtained rank distances γ_j(v, û_ki) = η(v, v_j(û_jki)) for a given rank list v ∈ V_K, where V_K is the set of all permutations of the integers {1, …, K} and η(.) is the Spearman’s footrule distance function with ρ = 1. This produced a vector of rank distances Γ(v) = (γ₁(v, û_ki), …, γ_J(v, û_ki))′. Next, following the procedure in (11), a localized dispersion matrix $\tilde{D} (v, {\hat{u}}_{0})$ in the neighborhood of û₀ = (û₁₀, …, û_J0) was obtained as an approximation for D(v, û₀), based on the K training sets of size n_k each, k = 1, …, K. This was followed by the calculation of L(v, û₀) given in (8) for the chosen Λ matrices, and we then used the proposed procedure given in (10) to estimate the corresponding ${\hat{k}}^{*} ({\hat{u}}_{0})$ . The kernel function in this estimation was taken to be a Normal (0, 1) probability density function. We chose all bandwidths by the algorithm given by Wand and Jones [23] for each h_jk, k = 1, …, K; j = 1, …, J.

Next, to define the “correct selection” for the above covariate value X₀, we used the following approach. First, we generated K new response vectors, Ỹ_k0 = (Ỹ_1k0, …, Ỹ_Jk0)′, k = 1, …, K, each with mean vector ${(g_{1 k} (β_{1 k}^{'} X_{0}), \dots, g_{J k} (β_{J k}^{'} X_{0}))}^{'}$ for k = 1, …, K, corresponding to this X₀ using model (3), where the errors were generated independently from the same error distribution that was used to generate the K original samples. We then defined quantities similar to those given in (8), (9) and (10) using these new observation vectors.

To do that, we ranked the rows of the J × K matrix Ỹ₀ created from the response vectors Ỹ_k0; k = 1, …, K to get size K vectors v_j(Ỹ₀) = (v_j1(Ỹ₀), …, v_jK(Ỹ₀))′, where v_jk(Ỹ₀) is the rank of Ỹ_jk0 among Ỹ_j10, …, Ỹ_jK0 for each j, j = 1, …, J. Then we proceeded to the calculation of the rank distance $γ_{j {\tilde{Y}}_{0}} (v) = η (v, v_{j} ({\tilde{Y}}_{0}))$ and the corresponding distance vector $Γ_{{\tilde{Y}}_{0}} (v) = {(γ_{1 {\tilde{Y}}_{0}} (v), \dots, γ_{J {\tilde{Y}}_{0}} (v))}^{'}$ for a fixed v. Next we obtained a suitable dispersion matrix, say ${\tilde{D}}_{{\tilde{Y}}_{0}} (v)$ , for $Γ_{{\tilde{Y}}_{0}} (v)$ as follows.

We generated an additional ñ = 5,000 observations Ỹ_k0l, l = 1, …, ñ with mean vectors ${(g_{1 k} (β_{1 k}^{'} X_{0}), \dots, g_{J k} (β_{J k}^{'} X_{0}))}^{'}$ for each k = 1, …, K, again with the same error distribution. Then we calculated the corresponding $Γ_{{\tilde{Y}}_{0 l}} (v), l = 1, \dots, \tilde{n}$ . Then we formulated a scaling matrix ${\tilde{D}}_{{\tilde{Y}}_{0}} (v)$ by

{\tilde{D}}_{{\tilde{Y}}_{0}} (v) = \frac{\sum_{l = 1}^{\tilde{n}} [Γ_{{\tilde{Y}}_{0 l}} (v) - \hat{E} (Γ_{{\tilde{Y}}_{0}} (v))] {[Γ_{{\tilde{Y}}_{0 l}} (v) - \hat{E} (Γ_{{\tilde{Y}}_{0}} (v))]}^{'}}{\tilde{n} - 1}

where $\hat{E} (Γ_{{\tilde{Y}}_{0}} (v))$ is the sample mean of vectors $Γ_{{\tilde{Y}}_{0 l}} (v), l = 1, \dots, \tilde{n}$ . Now, we let $L_{{\tilde{Y}}_{0}} (v)$ below be the overall rank distance for the responses Ỹ_k0; k = 1, …, K for fixed v and the response weights matrix Λ,

L_{{\tilde{Y}}_{0}} (v) = {[Λ^{1 / 2} Γ_{{\tilde{Y}}_{0}} (v)]}^{'} {[{\tilde{D}}_{{\tilde{Y}}_{0}} (v)]}^{- 1} [Λ^{1 / 2} Γ_{{\tilde{Y}}_{0}} (v)] .

We then defined

v_{{\tilde{Y}}_{0}}^{*} = arg min_{v \in V_{K}} {L_{{\tilde{Y}}_{0}} (v)} .

and called the true optimal treatment $\tilde{k}$ corresponding to response vectors Ỹ_k0; k = 1, …, K as the one that ranked 1 in the optimal rank list $v_{{\tilde{Y}}_{0}}^{*}$ . The treatment assignment for the new patient was considered to be correct if ${\hat{k}}^{*} = \tilde{k}$ .

We repeated this procedure 1,000 times for each model and error distribution combination. Frequencies of correct treatment assignments for a representative set of cases are given in Tables 2 and 3 and Supplementary Tables 1 to 6.

Table 2:

Accuracies of treatment selection in 1000 test cases using the proposed technique for the case of three treatments and four responses (equally weighted).

			Error correlation
Error dist.	Sample size per group	Error dist. parameter	ρ = 0.1	ρ = 0.5	ρ = 0.9
Normal	N = 50	σ_N = 0.1	815	822	843
		σ_N = 0.2	730	767	780
		σ_N = 0.3	669	684	668
	N = 100	σ_N = 0.1	867	872	916
		σ_N = 0.2	816	801	832
		σ_N = 0.3	718	703	736
DE	N = 50	σ_D = 0.1	796	827	861
		σ_D = 0.2	743	753	748
		σ_D = 0.3	692	665	657
	N = 100	σ_D = 0.1	871	873	907
		σ_D = 0.2	785	797	814
		σ_D = 0.3	727	702	717


Table 3:

Accuracies of treatment selection in 1000 test cases using the proposed technique for the case of three treatments and four responses, using weights τ₁ = 0.4, τ₂ = 0.3, τ₃ = 0.2, and τ₄ = 0.1, for responses 1, 2, 3, and 4, respectively.

			Error correlation
Error dist.	Sample size per group	Error dist. parameter	ρ = 0.1	ρ = 0.5	ρ = 0.9
Normal	N = 50	σ_N = 0.1	826	834	839
		σ_N = 0.2	754	782	800
		σ_N = 0.3	676	679	672
	N = 100	σ_N = 0.1	856	877	904
		σ_N = 0.2	763	816	848
		σ_N = 0.3	693	706	738
DE	N = 50	σ_D = 0.1	800	830	874
		σ_D = 0.2	743	759	769
		σ_D = 0.3	687	667	657
	N = 100	σ_D = 0.1	844	875	912
		σ_D = 0.2	753	810	839
		σ_D = 0.3	699	689	728


Simulation results demonstrate reasonable selection accuracies in each scenario considered. More importantly, the selection frequency remained consistent at large values of the response correlation, indicating the potential of the proposed technique for such cases. As expected, results reveal that the selection accuracy drops when the error distribution has a high dispersion, as well as with smaller sample sizes. Note that the presented simulation results are based on sine and cosine functions, which are bounded in [−1, 1]. Hence, an increment of 0.1 in the error dispersion parameter adds a relatively large amount of noise to a model. We observed comparable performance under both normal and double exponential errors. In all simulations, model functions were chosen so that each would dominate all other model functions at some covariate values. We have not investigated cases where two or more models were identical and dominated all others because, in that case, the models which dominate all others would have an equal chance of being selected. We also conducted a simulation study to compare the accuracy of the proposed procedure with the method by Siriwardhana et al.[14] that does not account for associations (Supplementary Table 7). Results of this study indicate improved performance by the proposed technique, especially for high response correlation cases.

In addition to studying the performance of the proposed method via the accuracy of correct selection, we also investigated the impact of using our method in terms of a composite average gain of responses. This investigation sheds light on the impact of possible wrong assignments for cases where the treatment chosen by our method is only marginally superior in terms of its overall rank compared with its nearest competitor (the treatment with the rank 2) as well as its worst competitor (the treatment with the rank K). Suppose $v_{k_{1}}^{*} ({\hat{u}}_{0}) = 1$ , $v_{k_{2}}^{*} ({\hat{u}}_{0}) = 2$ and $v_{k_{K}}^{*} ({\hat{u}}_{0}) = K$ . For a fixed j, j = 1, …, J, define, $λ_{12}^{j} ({\hat{u}}_{0})$ and $λ_{1 K}^{j} ({\hat{u}}_{0})$ as

λ_{12}^{j} ({\hat{u}}_{0}) = {\tilde{μ}}_{j k_{1}} ({\hat{u}}_{0}) - {\tilde{μ}}_{j k_{2}} ({\hat{u}}_{0}),

and

λ_{1 K}^{j} ({\hat{u}}_{0}) = {\tilde{μ}}_{j k_{1}} ({\hat{u}}_{0}) - {\tilde{μ}}_{j k_{K}} ({\hat{u}}_{0}) .

where ${\tilde{μ}}_{j k_{1}} ({\hat{u}}_{0}) = g_{j k_{1}} (β_{j k_{1}}^{'} X_{0})$ , ${\tilde{μ}}_{j k_{2}} ({\hat{u}}_{0}) = g_{j k_{2}} (β_{j k_{2}}^{'} X_{0})$ , and ${\tilde{μ}}_{j k_{K}} ({\hat{u}}_{0}) = g_{j k_{K}} (β_{j k_{K}}^{'} X_{0})$ . Define λ₁₂(û₀) and λ_1K(û₀) as weighted sums of the $λ_{12}^{j} ({\hat{u}}_{0})$ values and the $λ_{1 K}^{j} ({\hat{u}}_{0})$ values, respectively, as

λ_{12} ({\hat{u}}_{0}) = \sum_{j = 1}^{J} τ_{j} λ_{12}^{j} ({\hat{u}}_{0}),

and

λ_{1 K} ({\hat{u}}_{0}) = \sum_{j = 1}^{J} τ_{j} λ_{1 K}^{j} ({\hat{u}}_{0}),

where τ_j, j = 1, …, J, is the jth diagonal element of Λ, which contains the priority weights of the responses. Now, letting C be the theoretical maximum of λ_1K(u) over the whole covariate domain, we average λ₁₂(û₀)/C and λ_1K(û₀)/C over the 1000 new test cases and denote the averages by Ω₁₂ and Ω_1K, respectively.
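The gain measures defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `omega_gains`, the array layout (`mu`, `ranks`, `tau`) and the toy numbers are hypothetical and are not taken from the simulation study; `C` is supplied by the user.

```python
import numpy as np

def omega_gains(mu, ranks, tau, C):
    """Return (Omega_12, Omega_1K) for a batch of test cases.

    mu    : array (T, J, K); mu[t, j, k] is the mean of response j under
            treatment k for test case t (hypothetical layout)
    ranks : array (T, K); aggregated rank of each treatment (1 = best)
    tau   : array (J,); priority weights of the responses
    C     : positive scalar; theoretical maximum of lambda_1K
    """
    mu = np.asarray(mu, dtype=float)
    ranks = np.asarray(ranks)
    tau = np.asarray(tau, dtype=float)
    T = mu.shape[0]
    # order[t, r] is the treatment index holding rank r + 1 for case t
    order = np.argsort(ranks, axis=1)
    k1, k2, kK = order[:, 0], order[:, 1], order[:, -1]
    t = np.arange(T)
    mu1, mu2, muK = mu[t, :, k1], mu[t, :, k2], mu[t, :, kK]  # each (T, J)
    lam12 = (mu1 - mu2) @ tau  # weighted gain over the rank-2 treatment
    lam1K = (mu1 - muK) @ tau  # weighted gain over the rank-K treatment
    return lam12.mean() / C, lam1K.mean() / C

# Toy input: 2 test cases, 2 responses, 3 treatments (made-up numbers)
mu = [[[0.9, 0.5, 0.1], [0.8, 0.6, 0.2]],
      [[0.2, 0.7, 0.4], [0.1, 0.9, 0.5]]]
ranks = [[1, 2, 3], [3, 1, 2]]
omega_12, omega_1K = omega_gains(mu, ranks, tau=[0.5, 0.5], C=1.0)
```

With equal weights and C = 1, the toy input gives Ω₁₂ = 0.325 and Ω_1K = 0.675; as in the tables, the gain over the worst-ranked treatment is the larger of the two.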

Note that the Ω measures quantify the potential gains/losses in terms of the weighted aggregation of mean outcomes for borderline cases (i.e., mis-classifications close to the decision boundary) as well as for cases that are furthest from the optimal case according to the proposed procedure. We report a few results in Tables 4 and 5. Positive values of Ω₁₂ and Ω_1K indicate aggregated relative average gains in expected treatment outcomes from our treatment selection technique. As shown in the tables, all Ω values are positive, and the Ω_1K values are higher than the corresponding Ω₁₂ values, indicating that our proposed procedure results in gains with respect to average responses.

Table 4:

Ω₁₂ and Ω₁₃ values calculated using 1000 test cases by the proposed method. Three treatments with four responses, using equal weights.

Error dist.	Sample size per group	Error dist. parameter	Error correlation (ρ)
			ρ = 0		ρ = 0.5		ρ = 0.9
			Ω₁₂	Ω₁₃	Ω₁₂	Ω₁₃	Ω₁₂	Ω₁₃
Normal	N = 50	σ_N = 0.1	0.211	0.591	0.202	0.582	0.191	0.575
		σ_N = 0.2	0.188	0.556	0.210	0.570	0.194	0.555
		σ_N = 0.3	0.181	0.527	0.184	0.525	0.171	0.527
	N = 100	σ_N = 0.1	0.217	0.586	0.217	0.591	0.208	0.587
		σ_N = 0.2	0.215	0.584	0.211	0.582	0.209	0.580
		σ_N = 0.3	0.205	0.580	0.218	0.599	0.210	0.566
DE	N = 50	σ_D = 0.1	0.185	0.556	0.208	0.580	0.210	0.597
		σ_D = 0.2	0.197	0.565	0.205	0.565	0.187	0.552
		σ_D = 0.3	0.184	0.541	0.172	0.510	0.170	0.517
	N = 100	σ_D = 0.1	0.223	0.598	0.225	0.597	0.220	0.600
		σ_D = 0.2	0.207	0.586	0.219	0.606	0.204	0.576
		σ_D = 0.3	0.197	0.574	0.208	0.586	0.221	0.588


Table 5:

Ω₁₂ and Ω₁₃ values calculated using 1000 test cases by the proposed method. Three treatments with four responses, using weights τ₁ = 0.4, τ₂ = 0.3, τ₃ = 0.2, and τ₄ = 0.1, for responses 1, 2, 3, and 4, respectively.

Error dist.	Sample size per group	Error dist. parameter	Error correlation (ρ)
			ρ = 0		ρ = 0.5		ρ = 0.9
			Ω₁₂	Ω₁₃	Ω₁₂	Ω₁₃	Ω₁₂	Ω₁₃
Normal	N = 50	σ_N = 0.1	0.181	0.592	0.175	0.579	0.162	0.572
		σ_N = 0.2	0.162	0.559	0.178	0.569	0.171	0.558
		σ_N = 0.3	0.149	0.521	0.161	0.536	0.142	0.520
	N = 100	σ_N = 0.1	0.177	0.569	0.175	0.573	0.171	0.571
		σ_N = 0.2	0.176	0.567	0.170	0.563	0.172	0.565
		σ_N = 0.3	0.162	0.553	0.173	0.575	0.171	0.554
DE	N = 50	σ_D = 0.1	0.157	0.550	0.171	0.570	0.173	0.586
		σ_D = 0.2	0.166	0.562	0.173	0.565	0.158	0.545
		σ_D = 0.3	0.155	0.533	0.158	0.518	0.153	0.515
	N = 100	σ_D = 0.1	0.182	0.583	0.187	0.581	0.177	0.578
		σ_D = 0.2	0.168	0.567	0.174	0.578	0.164	0.557
		σ_D = 0.3	0.156	0.545	0.169	0.564	0.183	0.572


4. ACTG-175 HIV Clinical Trial

In this section, we provide another illustration of our proposed method using data from the ACTG 175 clinical trial (Hammer et al.[26]). This was a randomized, double-blinded, placebo-controlled clinical trial conducted to compare single-nucleoside and two-nucleoside antiviral regimens in adults infected with human immunodeficiency virus type 1 (HIV-1) whose CD4 T-cell counts were in the range of 200 to 500 per cubic millimeter. The study randomized HIV-1–infected patients to one of four daily regimens: 600 mg of zidovudine (arm-0), 600 mg of zidovudine plus 400 mg of didanosine (arm-1), 600 mg of zidovudine plus 2.25 mg of zalcitabine (arm-2), or 400 mg of didanosine (arm-3). The data set by Juraska et al.[27] contains information on 2,136 HIV-1–infected subjects. Arms 0, 1, 2, and 3 contain 532, 519, 524, and 561 patients, respectively.

Our intention in this data analysis is merely to demonstrate what the optimal treatment would be for a new subject if one were to use the proposed treatment selection based on individual patient characteristics when training samples with corresponding covariate values for multiple treatments are available in advance. In this illustration, rather than assuming there is a standard treatment (i.e., control) that is being compared with experimental treatment(s) as in a typical clinical trial, we take the stance that, given that multiple treatments can be used to treat a patient, we ask which would be optimal for the individual based on his/her characteristics. In this study, subjects were examined periodically, capturing their T-cell counts (i.e., CD4 T helper cells and CD8 cytotoxic T cells), which are critical components of the human immune system. The scientific literature on HIV/AIDS often identifies CD4 cells as the primary T-cell type that suppresses HIV replication, by signaling various other cells for an immune response. For an HIV patient, the severity of disease progression is directly measured by the decline in CD4 counts. The important role of the CD8 cell is typically described in terms of the antibody reaction against cancers and various other types of viruses. However, some studies have illustrated the important role of CD8 cells during the early stages of HIV progression (e.g., see Streeck and Nixon [28]).

In a previous study, Siriwardhana et al.[14] used this dataset to demonstrate personalized treatment strategies with respect to log-transformed dual outcomes given by the CD4 and CD8 counts of a patient after 20 weeks. However, their analysis did not account for correlation between the two responses, whereas the current work has the capability to handle such dependencies. We estimated the Pearson product-moment correlation between the CD4 and CD8 responses as 0.251 using the observed data. In this analysis, we also used log-transformed CD4 and CD8 counts after 20 weeks as our responses. As covariates, we used log-CD4 and log-CD8 counts at baseline, age, weight, and the number of months a patient received pre-antiviral therapy.

As a training set, we selected 200 patients at random from each group and considered their data as outcomes of an RCT in which those patients were randomly assigned to one of the four treatments. We used the training data to estimate the corresponding SIMs for the two outcomes, log-CD4 and log-CD8, together with the above covariates. Considering the remaining 1336 cases as “new” patients, we applied the proposed treatment selection to those cases. In this illustration, we varied the “importance” weights from 0 to 1 for each response. This procedure was repeated for 1000 random training (i.e., size = 200 per group) and testing (i.e., size = 1336) sets to obtain average assignments over those 1000 partitions.
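The partitioning step can be sketched as follows. This is a minimal illustration, not the authors' code: the helper name `split_by_arm` is hypothetical, and the `arm` vector below is a synthetic placeholder built only from the arm sizes quoted above (532, 519, 524, and 561).

```python
import numpy as np

def split_by_arm(arm, n_train_per_arm, rng):
    """Draw n_train_per_arm subjects at random from each arm for training;
    all remaining subjects form the test set. Returns index arrays."""
    arm = np.asarray(arm)
    train_parts = []
    for a in np.unique(arm):
        members = np.flatnonzero(arm == a)  # subjects in arm a
        train_parts.append(rng.choice(members, size=n_train_per_arm, replace=False))
    train_idx = np.sort(np.concatenate(train_parts))
    test_idx = np.setdiff1d(np.arange(arm.size), train_idx)
    return train_idx, test_idx

rng = np.random.default_rng(0)
# Placeholder arm labels reproducing only the reported arm sizes
arm = np.repeat([0, 1, 2, 3], [532, 519, 524, 561])
train_idx, test_idx = split_by_arm(arm, 200, rng)
train_counts = np.bincount(arm[train_idx])  # 200 subjects per arm
```

Calling `split_by_arm` repeatedly with the same generator yields a fresh random partition each time, which is how the 1000 repetitions could be organized.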

We summarize the resulting assignments for test patients in Table 6. For example, when we chose log-CD4:log-CD8 weights of 0.8:0.2, the proposed treatment selection assigned 3.26%, 64.50%, 15.48%, and 16.76% of test patients to Arms 0, 1, 2, and 3, respectively, whereas the corresponding assignments for weights 0.5:0.5 (equal weights) are 5.28%, 61.41%, 14.31%, and 19.00%. The overall pattern of these assignments indicates that only a few patients are proposed to be assigned to zidovudine alone (arm-0), which was used as the control arm in the study. Our analysis using individual characteristics shows that Arm 1 is the choice for most patients, while the control arm is the least chosen for most weight combinations. It is interesting that the marginal analysis of the ACTG-175 data (Hammer et al.[26]) concluded that Arms 1, 2, and 3 all slowed the progression of HIV disease more than Arm 0, although none of Arms 1, 2, and 3 was selected over the others. The proposed assignments in our approach appear to classify many more patients to Arm 1 than to the other arms.

Table 6:

Treatment assignment summary for ACTG-175 Clinical Trial data, by the proposed method selecting both CD4 and CD8 counts as clinical response, using weights τ_CD4 and τ_CD8 for CD4 and CD8 counts, respectively.

Weights		Group assignments (%)
τ _CD4	τ _CD8	Arm-0	Arm-1	Arm-2	Arm-3
1	0	1.90	65.30	17.79	15.01
0.8	0.2	3.26	64.50	15.48	16.76
0.6	0.4	4.53	62.64	14.61	18.22
0.5	0.5	5.28	61.41	14.31	19.00
0.4	0.6	6.20	59.19	14.03	19.83
0.2	0.8	8.94	55.52	13.54	22.00
0	1	19.02	42.50	12.72	25.76


Extending our analysis, we conducted the following study to examine the effectiveness of the proposed treatment selection for test patients compared with their original random assignment.

Suppose the CD4 and CD8 responses are indicated by j = 1 and j = 2, respectively. For a given testing set, consider the tth test subject, with covariate value X_t, who was randomly assigned to a particular arm in the original study. Suppose the individual’s estimated score is û_t = (û_1t, û_2t), where û_1t = (s₁, d₁) and û_2t = (s₂, d₂) are the marginal estimated scores based on the CD4 and CD8 responses, respectively. In our approach, we estimated the gain in the conditional means of the log-CD4 and log-CD8 values for an individual by comparing the proposed assignment with his/her original assignment. Let v, v = 0, …, 3, be the treatment option suggested by the proposed technique for the test patient based on his/her estimated score û_t, and let v₀, v₀ = 0, …, 3, be the treatment group to which the patient was randomly assigned in the original trial. We define the conditional gain Δ_jt of the tth test subject for a given j, j = 1, 2, as

Δ_{j t} = E (Y_{j v} ∣ {\hat{u}}_{t}) - Y_{j v_{0}},

where Y_jv₀ is the value observed under the original assignment. Following Siriwardhana et al. [14], we estimated E(Y_jv|û_t) using a smooth mean estimator given by

E (Y_{j v} ∣ {\hat{u}}_{t}) = \frac{\sum_{l = 1}^{n_{v}} Y_{j v l} \prod_{j' = 1}^{2} ω ((s_{j'} - {\hat{S}}_{j'} (X_{v l})) / h_{j' v}) I ({\hat{δ}}_{j'} (X_{v l}) = d_{j'})}{\sum_{l = 1}^{n_{v}} \prod_{j' = 1}^{2} ω ((s_{j'} - {\hat{S}}_{j'} (X_{v l})) / h_{j' v}) I ({\hat{δ}}_{j'} (X_{v l}) = d_{j'})},

(13)

where n_v is the number of patients originally assigned to treatment arm v, v = 0, …, 3, ω is a kernel function with ω ≥ 0 and ∫ ω(t)dt = 1, and the h_jv, j = 1, 2, are the corresponding smoothing parameters. We carried out the estimation using Gaussian kernels, with the smoothing parameters determined by the method suggested in Wand and Jones [23] for kernel smoothing. Next, we estimated the overall gains ${\hat{Δ}}_{j}$ , j = 1, 2, by averaging the rescaled ${\hat{Δ}}_{j t}$ values,

{\hat{Δ}}_{j} = \frac{1}{N {\tilde{S}}_{{\hat{Δ}}_{j}}} \sum_{t = 1}^{N} {\hat{Δ}}_{j t},

(14)

where N = 1336 is the number of test patients and ${\tilde{S}}_{{\hat{Δ}}_{j}}$ is the standard deviation of the ${\hat{Δ}}_{j t}$ values. It is reasonable to argue that positive values of ${\hat{Δ}}_{1} ({\hat{Δ}}^{C D 4})$ and ${\hat{Δ}}_{2} ({\hat{Δ}}^{C D 8})$ indicate that the treatment selection is effective overall compared with the original random assignment. Finally, by averaging over the 1000 random test sets, we estimated the average values of ${\hat{Δ}}^{C D 4}$ and ${\hat{Δ}}^{C D 8}$ as ${\hat{Δ}}_{c}^{C D 4}$ and ${\hat{Δ}}_{c}^{C D 8}$ , respectively.
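Equations (13) and (14) can be sketched in Python as below. This is a hedged illustration rather than the authors' implementation: the function names and argument layout are hypothetical, s_j and d_j are read here as the continuous and discrete parts of the estimated score (our interpretation), the bandwidths are passed in rather than selected by the Wand and Jones method, and the sample standard deviation (ddof = 1) is an assumed convention for the rescaling.

```python
import numpy as np

def smooth_mean(y, S_hat, delta_hat, s, d, h):
    """Kernel-weighted mean in the spirit of Eq. (13).

    y         : (n,) responses of the patients in arm v
    S_hat     : (n, 2) estimated continuous score components
    delta_hat : (n, 2) estimated discrete score components
    s, d      : length-2 targets (s_1, s_2) and (d_1, d_2) of the new patient
    h         : length-2 bandwidths
    """
    y = np.asarray(y, dtype=float)
    z = (np.asarray(s, dtype=float) - np.asarray(S_hat, dtype=float)) / np.asarray(h, dtype=float)
    kern = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)  # Gaussian kernel
    match = np.asarray(delta_hat) == np.asarray(d)     # indicator I(delta_hat = d)
    w = np.prod(kern * match, axis=1)                  # product over the two scores
    total = w.sum()
    # No patient with matching discrete components: estimate undefined
    return float("nan") if total == 0 else float(w @ y / total)

def rescaled_gain(delta_t):
    """Eq. (14): mean of the individual gains divided by their standard deviation."""
    delta_t = np.asarray(delta_t, dtype=float)
    return float(delta_t.mean() / delta_t.std(ddof=1))

# Toy check: the third patient does not match d, so only the first two contribute
m = smooth_mean(y=[1.0, 2.0, 3.0],
                S_hat=np.zeros((3, 2)),
                delta_hat=[[1, 1], [1, 1], [0, 1]],
                s=[0.0, 0.0], d=[1, 1], h=[1.0, 1.0])
g = rescaled_gain([1.0, 2.0, 3.0])
```

In the toy check, the two matching patients receive equal kernel weight, so the smooth mean is their plain average, 1.5; the rescaled gain of the values 1, 2, 3 is 2.0.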

Table 7 provides these ${\hat{Δ}}_{c}^{C D 4}$ and ${\hat{Δ}}_{c}^{C D 8}$ values. The observed positive values of ${\hat{Δ}}_{c}^{C D 4}$ and ${\hat{Δ}}_{c}^{C D 8}$ seem to suggest that, if one were able to use prior data to estimate the personal score for a new patient, the proposed assignment would be more beneficial for him/her than an assignment based on a clinical trial, which may only find the best treatment for the general population.

Table 7:

${\hat{Δ}}_{c}^{C D 4}$ and ${\hat{Δ}}_{c}^{C D 8}$ by the proposed method compared to the original assignment for test patients, using weights τ_CD4 and τ_CD8 for CD4 and CD8 counts, respectively. The 10-fold cross-validation technique was used for estimating ${\hat{Δ}}_{c}^{C D 4}$ and ${\hat{Δ}}_{c}^{C D 8}$ quantities.

Weights		$\hat{Δ}$
τ _CD4	τ _CD8	${\hat{Δ}}_{c}^{C D 4}$	${\hat{Δ}}_{c}^{C D 8}$
1	0	0.191	0.060
0.8	0.2	0.177	0.074
0.6	0.4	0.165	0.080
0.5	0.5	0.159	0.083
0.4	0.6	0.151	0.085
0.2	0.8	0.130	0.091
0	1	0.068	0.101


5. Discussion

In this article, we proposed a novel personalized treatment plan to select the optimal treatment from a set of multiple treatments when the outcome measures are multivariate. This is an extension of the method by Siriwardhana et al.[14], which uses rank aggregation. Our method assumes that there are historical RCT data that contain patient characteristics and response variables for each treatment, and we use such information to select the best treatment option for a new patient. The proposed method incorporates potential dependencies among responses, as opposed to other existing methods in the literature. Our empirical studies show that the new method performs very satisfactorily in selecting the optimal treatment in a multiple treatment setting. Our analysis of a real clinical trial dataset with multiple treatment options reveals possible changes if one were to use multiple outcome measures as opposed to a single measure.

As the working model, we use the semiparametric Single Index Models to relate responses and subject covariates. This model offers great flexibility with modeling smooth complex relationships, allowing us to handle many real-life situations. The proposed method can also be applied using quantile regression single index models to handle problems with many different error types, deviating from the Gaussian error structure.

In our approach, we defined the $W_{ki}$'s similarly to geometric means of the $w_{jki}$'s. In a few empirical studies performed with arithmetic means instead, we observed performance comparable to that of the current choice of $W_{ki}$'s. In our empirical study and real data analysis, we used τ = 1 and Spearman’s distance function. However, the choice of the τ's and the choice of the distance function η(.) can be influential in the optimal treatment selection. Although the specification of response weights is an important step, we consider this more of a subjective/qualitative issue than a quantitative one, where the choice of optimal weights can be guided by patient qualities and preferences and by the advice of physicians. When dealing with responses related to factors that govern the cure of the disease or the degradation of the patient due to various aspects of the disease, a team of clinicians may be better suited to determine what weights should be applied. On the other hand, for outcome measures that correspond to quality of life, behavior, financial impact, etc., a physician in consultation with relevant support/advisory groups may be more appropriate. For situations where sufficient data are available from various studies, a more data-based method of weight selection might be appropriate. However, we have not investigated those aspects in the current work.

There can be several extensions of this work. The proposed method requires a collection of complete data records from RCT studies, but when dealing with life-threatening or terminal conditions, priority is typically given to survival-type measures that could be subject to censoring (e.g., overall survival, disease-free survival, etc.). We have not explored that avenue in this work. Also, although we focused only on continuous outcomes, binary and ordinal outcomes are also common (e.g., relapse, response level, cure, etc.), and therefore selection methods for such responses would be valuable. In addition, another avenue to explore is the modifications needed in order to use the proposed method with data from other sources, such as observational studies.

Supplementary Material

Supp 1

NIHMS1840299-supplement-Supp_1.pdf (111 KB, PDF)

Acknowledgment

The research work by Chathura Siriwardhana was partially supported by U54MD007601 grant from the National Institutes of Health.

Footnotes

Conflict of Interest

The authors have declared no conflict of interest.

Data Availability Statement

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

References

1. Murphy SA (2003). Optimal dynamic treatment regimes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65: 331–355. doi: 10.1111/1467-9868.00389
2. van’t Veer LJ & Bernards R (2008). Enabling Personalized Cancer Medicine Through Analysis of Gene-Expression Patterns. Nature, 452: 564–570.
3. Zhao Y, Zeng D, Rush AJ, and Kosorok MR (2012). Estimating Individualized Treatment Rules Using Outcome Weighted Learning. Journal of the American Statistical Association, pp. 1106–1118.
4. Vazquez A (2013). Optimization of Personalized Therapies for Anticancer Treatment. BMC Syst. Biol, 7. doi: 10.1186/1752-0509-7-31
5. Kosorok MR and Moodie EE (2015). Adaptive Treatment Strategies in Practice: Planning Trials and Analyzing Data for Personalized Medicine. Society for Industrial and Applied Mathematics. doi: 10.1137/1.9781611974188
6. Unnikrishnan AG, Bhattacharyya A, Baruah MP, Sinha B, Dharmalingam M, Rao PV (2013). Importance of achieving the composite endpoints in diabetes. Indian J Endocrinol Metab, 17(5): 835–843. doi: 10.4103/2230-8210.117225
7. Johnson P, Greiner W, Al-Dakkak, et al. (2015). Which Metrics Are Appropriate to Describe the Value of New Cancer Therapies? BioMed Research International. doi: 10.1155/2015/865101
8. Fleming TR, Powers JH (2012). Biomarkers and surrogate endpoints in clinical trials. Stat Med, 31(25): 2973–2984. doi: 10.1002/sim.5403
9. Lizotte DJ, Bowling M, Murphy S (2012). Linear fitted-Q iteration with multiple reward functions. J Mach Learn Res, 13: 3253–3295.
10. Laber EB, Lizotte DJ, and Ferguson B (2014). Set-valued dynamic treatment regimes for competing outcomes. Biometrics, 70: 53–61.
11. Ertefaie A, Wu T, Lynch KG, Nahum-Shani I (2016). Identifying a set that contains the best dynamic treatment regimes. Biostatistics, 17: 135–148.
12. Lizotte DJ and Laber EB (2016). Multi-Objective Markov Decision Processes for Data-Driven Decision Support. J Mach Learn Res, 17: 1–28.
13. Butler EL, Laber EB, Davis SM, and Kosorok MR (2017). Incorporating Patient Preferences into Estimation of Optimal Individualized Treatment Rules. Biometrics. doi: 10.1111/biom.12743
14. Siriwardhana C, Datta S, Kulasekera KB (2020). Selection of the optimal personalized treatment from multiple treatments with multivariate outcome measures. Journal of Biopharmaceutical Statistics, 30(3): 462–480. doi: 10.1080/10543406.2019.1684304
15. Li SC, Lindenberger U, Sikström S (2001). Aging cognition: from neuromodulation to representation. Trends Cogn Sci, 5(11): 479–486. doi: 10.1016/s1364-6613(00)01769-1
16. Cheng Y, Wu W, Feng W, et al. (2012). The effects of multi-domain versus single-domain cognitive training in non-demented older people: a randomized controlled trial. BMC Med, 10: 30. doi: 10.1186/1741-7015-10-30
17. Siriwardhana C, Zhao M, Datta S, & Kulasekera K (2019). A probability based method for selecting the optimal personalized treatment from multiple treatments. Statistical Methods in Medical Research, 28(3): 749–760.
18. Pihur V, Datta S, & Datta S (2007). Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach. Bioinformatics, 23: 1607–1615.
19. Pihur V, Datta S, & Datta S (2009). RankAggreg, an R package for weighted rank aggregation. BMC Bioinformatics, 10: 62.
20. Ichimura H, Hall P, Hardle W (1993). Optimal smoothing in single index models. The Annals of Statistics, 21(1): 157–178.
21. Hristache M, Juditsky A, Polzehl J, and Spokoiny V (2001). Structure Adaptive Approach for Dimension Reduction. The Annals of Statistics, pp. 1537–1566.
22. Yu Y and Ruppert D (2002). Penalized Spline Estimation for Partially Linear Single-Index Models. Journal of the American Statistical Association, pp. 1042–1054.
23. Wand MP & Jones MC (1995). Kernel Smoothing. Chapman & Hall, London.
24. Genz A, Bretz F, Miwa T, Mi X, Leisch F, et al. (2017). mvtnorm: Multivariate Normal & t Distributions. R package version 1.0-3. September 29 2017. URL: https://cran.r-project.org/web/packages/mvtnorm/
25. Statisticat LLC (2017). LaplacesDemon: Complete Environment for Bayesian Inference. Bayesian-Inference.com; R package version 16.0.1. September 29 2017. URL: https://cran.r-project.org/web/packages/LaplacesDemon/
26. Hammer SM, Katzenstein DA, Hughes MD, Gundacker H, Schooley RT, et al. (1996). A Trial Comparing Nucleoside Monotherapy with Combination Therapy in HIV-Infected Adults with CD4 Cell Counts from 200 to 500 per Cubic Millimeter. N. Engl. J. Med, 335: 1081–1090.
27. Juraska M, Gilbert PB, Lu X, Zhang M, Davidian M, et al. (2017). speff2trial: Semiparametric efficient estimation for a two-sample treatment effect. R package version 1.0.4, 2012. September 29 2017. URL: http://cran.r-project.org/package=speff2trial
28. Streeck H & Nixon DF (2010). T Cell Immunity in Acute HIV-1 Infection. J Infect Dis, 202: 302–308.
