ABSTRACT
Overall survival has been used as the primary endpoint for many randomized trials that aim to examine whether a new treatment is non‐inferior to the standard treatment or placebo control. When a new treatment is indeed non‐inferior in terms of survival, it may be important to assess other outcomes, including health utility. However, analyzing health utility scores in a secondary analysis may have limited power because health utility may not be among the primary objectives of the original study design. To consider both survival and health utility comprehensively, we developed a composite endpoint, HUS (Health Utility‐adjusted Survival), which combines the two. HUS has been shown to increase statistical power and potentially reduce the required sample size compared to the standard overall survival endpoint. Nevertheless, the asymptotic properties of the HUS test statistics have yet to be fully established. In addition, the standard version of HUS cannot be applied, or performs poorly, in certain scenarios, and extensions are needed. In this manuscript, we propose several methodological extensions of HUS and derive the asymptotic distributions of the test statistics. Through comprehensive simulation studies and an application to retrospective data from a translational patient cohort at Princess Margaret Cancer Centre, we demonstrate the efficiency and feasibility of HUS compared to alternative methods.
Keywords: hazard ratio, health utility, overall survival, proportional hazards, randomized controlled trials, time‐to‐event data
1. Introduction
In clinical studies, including superiority and non‐inferiority trials, overall survival (OS) is commonly used as the primary endpoint to compare a new treatment with a standard or control treatment. In some scenarios, non‐inferiority trials are preferred because of specific potential benefits of the new treatment (e.g., lower costs, fewer side effects, better quality of life (QoL), or easier implementation), and it suffices to show that the new treatment is not worse than control with respect to OS. After establishing non‐inferiority, the next step is usually to demonstrate that the new treatment benefits patients in clinical endpoints beyond OS, and one such endpoint of interest to clinicians is health utility [1]. Health utility is a value ascribed to individuals' preference for a specific health state. As a measurement usually ranging from 0 to 1, health utility quantifies the health state of a patient at a given time point, and a higher value usually corresponds to a healthier state. Usually, a health utility of 0 is akin to death and, in some instances, negative utilities can indicate a state worse than death. Health utilities are typically elicited either indirectly, by means of patient‐reported outcomes with instruments such as the EQ‐5D, or directly, with utility elicitation techniques such as time‐trade‐off or standard gamble, whereby an individual is presented with a hypothetical health state and asked to choose between that state and a shorter time in, or a lower probability of, a healthier state, respectively. Various methods for performing statistical analysis using health utility scores at different time points have been proposed in the literature [2, 3, 4].
However, the health utility analysis may be underpowered since the primary objective of the study design is usually based on OS, without taking health utility into consideration, especially if the compared treatments differ substantially in survival but only moderately in utility. In some trials, utility is only evaluated if OS demonstrates a significant improvement. Meanwhile, it may not be desirable to perform statistical testing on OS and health utility separately since it will result in multiplicity and require multiple testing adjustment, which will lead to potential loss of statistical power. Hence, using a composite endpoint that combines survival and health utility to perform a single test may be preferred, since it may increase the statistical power and reduce the required sample size. Furthermore, if utility is analyzed separately, the absence of utility scores following a patient's death constitutes informative missingness. One might assume that utility is zero at and after death; however, this assumption may not always hold and can lead to biased results. Therefore, incorporating both survival and utility into a composite endpoint may naturally offer a more appropriate approach for handling such a missing mechanism.
While methods like Q‐TWiST (Quality‐adjusted Time Without Symptoms of disease or Toxicity) that can combine survival and utility have been proposed and used to analyze clinical trial data, they have their own drawbacks [5, 6, 7, 8, 9, 10, 11, 12]. For instance, though researchers have derived some statistical properties as well as sample size formulas for Q‐TWiST [8], the implementation of this approach is limited to scenarios where each patient's status can be divided into three states (toxicity, time without symptoms and toxicity, and relapse), and the weights for different states are artificially preselected. In many other scenarios, especially with utility scores measured on a continuous scale, clinicians may be more interested in analyzing them in the original scale rather than forcing them into three categories, since some information is lost in the categorization process, which will make the statistical test have lower power.
A more general approach, usually referred to as QALY (quality‐adjusted life years), offers an intuitive way to combine survival and health utility [5, 13, 14, 15, 16, 17], and a similar concept called quality‐adjusted progression‐free survival has also been used in some randomized trials [18, 19, 20]. Nevertheless, the formal statistical frameworks of these methods have not been established, and the potential advantages and feasibility of these methods compared to the traditional survival endpoint have not been fully evaluated through comprehensive simulation studies.
A composite endpoint called HUS (Health Utility‐adjusted Survival) was proposed in recent literature [21], which can combine longitudinal health utility and survival performance to assess treatment effects. The authors also provided a detailed statistical testing framework and procedures for power analysis and sample size calculations. Although they demonstrated through simulations that HUS may increase the statistical power over classic tests based on OS only and thus reduce the required sample size, the asymptotic properties of HUS have yet to be thoroughly explored. Establishing detailed theoretical properties may make HUS a more solid approach that can benefit future clinical trials.
Meanwhile, in some settings, it may be beneficial to modify the test statistics to better capture treatment effects. For example, the utility scores recorded at later time‐points may be more important than those recorded at earlier time‐points, since they better indicate how well a patient has recovered and, ultimately, how much the treatment has improved the patient's QoL. The early utility scores may reflect a patient's level of discomfort during or right after a treatment such as surgery, but they may be less important if clinicians care more about the patient's QoL after the recovery process. In other scenarios, with multiple measurements of utility recorded at each time‐point, it may be useful to give them different weights before combining them into a single utility score, and the weights may be predetermined based on the clinicians' knowledge.
With the above considerations, we propose some important extensions of HUS, including a time‐weighted version that allows assigning different weights to different time points, denoted by twHUS, and a natural way to combine different utility measurements using weights. We also provide a simple process to apply HUS to clinical observational data with covariates, which may greatly extend the range of studies that HUS can be implemented on. Most importantly, we derive the asymptotic theories for HUS and the proposed extensions.
This manuscript is structured as follows. In Section 2, we present the methodology of the HUS endpoint and its extensions and then establish the theoretical properties of HUS. In Section 3, we use comprehensive simulation studies, covering scenarios with a single utility score or multiple QoL scores and with or without covariates, together with a data application, to demonstrate the effectiveness of the HUS endpoint. Finally, in Section 4, we discuss the strengths and drawbacks of HUS and potential future directions.
2. Methods
2.1. Health Utility Survival (HUS)
In this section, we give a brief review of HUS. Suppose the total length of the study is $T$, and $S_g(t)$ and $U_g(t)$ are the survival function (proportion of patients alive at $t$) and the average utility score of those alive at $t$ for treatment group $g$ ($g = 1, 2$). Intuitively, we can define a composite endpoint using $S_g(t)$ and $U_g(t)$. To allow survival and utility to be weighted differently, we propose a highly general class of tests analogous to the two‐sample test proposed in Reference [22], which is defined as
$$\mathrm{HUS}_1(\lambda_1, \lambda_2) = \int_0^T S_1(t)^{\lambda_1}\, U_1(t)^{\lambda_2}\, dt \tag{1}$$
$$\mathrm{HUS}_2(\lambda_1, \lambda_2) = \int_0^T S_2(t)^{\lambda_1}\, U_2(t)^{\lambda_2}\, dt \tag{2}$$
where $\lambda_1$ and $\lambda_2$ are preselected weights that reflect the importance of survival and utility. Larger weights correspond to higher importance, and the standard HUS uses $\lambda_1 = \lambda_2 = 1$. Since the true $S_1(t)$ and $S_2(t)$ are unknown, we need to estimate them from the observed data. The simplest way is to obtain Kaplan–Meier (KM) estimates for the two groups separately [23]. Then we can substitute $S_1(t)$ and $S_2(t)$ with $\hat{S}_1(t)$ and $\hat{S}_2(t)$.
We may also use the Cox proportional hazards model [24], treating the treatment assignment as a covariate, to obtain estimates of the survival functions. This requires the proportional hazards assumption for the two groups, which may have benefits when this assumption is not violated.
The test statistic to examine whether the two treatment groups differ in HUS is defined as
| (3) |
We can use the bootstrap or the permutation algorithm to get the empirical confidence intervals or p‐values. With our derived asymptotic properties of HUS, we may also use an alternative approach to simulate the distribution of and obtain its p‐value. More details are provided in Appendix A in Data S1.
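The bootstrap procedure described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the evaluation grid, the trapezoidal integration, the integrand $S_g(t)^{\lambda_1} U_g(t)^{\lambda_2}$, and the use of the between‐group difference in HUS as the statistic are all assumptions made for the example. Utilities are assumed to lie in [0, 1] and to be fully observed on the grid for subjects still under observation.

```python
import numpy as np

def km_survival(time, event, grid):
    """Kaplan-Meier estimate of S(t) evaluated at each point of `grid`."""
    surv = np.ones(len(grid))
    for s in np.unique(time[event == 1]):          # distinct event times
        at_risk = np.sum(time >= s)
        deaths = np.sum((time == s) & (event == 1))
        surv[grid >= s] *= 1.0 - deaths / at_risk
    return surv

def mean_utility(time, util, grid):
    """Mean utility at each grid point among subjects still under observation."""
    under_obs = time[:, None] >= grid[None, :]
    counts = under_obs.sum(axis=0)
    sums = np.where(under_obs, util, 0.0).sum(axis=0)
    return np.divide(sums, counts, out=np.zeros(len(grid)), where=counts > 0)

def hus(time, event, util, grid, lam1=1.0, lam2=1.0):
    """Trapezoidal integral of S(t)^lam1 * U(t)^lam2 over the grid (assumed form)."""
    y = km_survival(time, event, grid) ** lam1 * mean_utility(time, util, grid) ** lam2
    return float(np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(grid)))

def bootstrap_test(group1, group2, grid, lam1=1.0, lam2=1.0, n_boot=500, seed=0):
    """Each group is a tuple (time, event, util). Returns the observed HUS
    difference and a 95% percentile bootstrap interval for it."""
    rng = np.random.default_rng(seed)

    def diff(a, b):
        return hus(*a, grid, lam1, lam2) - hus(*b, grid, lam1, lam2)

    def resample(g):
        idx = rng.integers(0, len(g[0]), len(g[0]))   # subjects, with replacement
        return tuple(x[idx] for x in g)

    boot = [diff(resample(group1), resample(group2)) for _ in range(n_boot)]
    low, high = np.percentile(boot, [2.5, 97.5])
    return diff(group1, group2), (float(low), float(high))
```

Under these assumptions, one would conclude that the two groups differ in HUS when the bootstrap interval excludes zero. Setting `lam1=0.0` reduces the statistic to a utility‐only comparison.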
2.2. Time‐Weighted HUS (twHUS)
In some clinical studies, the importance of health utility at different time points may vary. For example, the utility scores recorded in the later stage of the study may be more important than those recorded in the earlier stage (e.g., around surgery time), as the later scores may show how well a patient has recovered. Besides, some treatments may show less effect in the beginning, but benefit the patients' QoL significantly more after a period of time. As a result, it may make more sense to give different weights to utility scores at different time points. We propose a twHUS, with
| (4) |
| (5) |
where $w(t)$ is a weight function over time. For example, we may let $w(t)$ increase linearly from 0 at baseline to 1 at the end of surgery and then remain at 1 until the end of the study. We use this setting by default unless otherwise specified.
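The default time weight described above can be written as a one‐line function. In this sketch the name `time_weight` and the 3‐month surgery time (taken from the simulation settings in Section 3.1) are illustrative assumptions. How the weight enters the twHUS integrand is left to Equations (4) and (5).

```python
import numpy as np

def time_weight(t, t_surgery=3.0):
    """Weight rising linearly from 0 at baseline to 1 at t_surgery, then held at 1."""
    t = np.asarray(t, dtype=float)
    return np.clip(t / t_surgery, 0.0, 1.0)
```

Other shapes (for example, a step function that ignores all pre‐surgery scores) could be substituted when clinicians have a different view of which time points matter.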
2.3. Special Case With Multiple QoL Scores
Note that our previous framework focuses on a single utility score, whereas sometimes we may have different QoL scores measuring different aspects of the health and comfort of patients. Suppose we have $K$ different QoL scores, and $q_{gik}(t)$ is the $k$th QoL score for subject $i$ at time $t$ in treatment group $g$. To combine the different scores, we may define a new score $u_{gi}(t)$, with
$$u_{gi}(t) = \sum_{k=1}^{K} w_k\, q_{gik}(t) \tag{6}$$
where $w_1, \ldots, w_K$ are the weights for the different scores, and $\sum_{k=1}^{K} w_k = 1$. In some cases, if we are interested in the worst score at each time point, we may also consider $u_{gi}(t) = \min_{k} q_{gik}(t)$. Once the combined score is defined and calculated, we can apply our previously described procedures.
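The two ways of combining scores described above can be sketched as follows. The function name `combine_scores`, the array layout (scores stacked along the first axis), and the requirement that the weights be non‐negative and sum to one are assumptions made for this illustration.

```python
import numpy as np

def combine_scores(scores, weights=None, use_min=False):
    """Combine K QoL scores, stacked along axis 0, into a single score.

    use_min=True returns the worst (minimum) score at each position;
    otherwise a weighted average is returned (equal weights by default).
    """
    scores = np.asarray(scores, dtype=float)
    if use_min:
        return scores.min(axis=0)
    k = scores.shape[0]
    w = np.full(k, 1.0 / k) if weights is None else np.asarray(weights, dtype=float)
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return np.tensordot(w, scores, axes=1)   # contracts over the K scores
```

The combined array can then be passed to whatever routine computes HUS in place of the single utility score.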
2.4. HUS For Observational Data
The original framework of HUS was intended for data from randomized trials, where covariates are balanced between the two treatment groups. However, we may also be interested in analyzing observational studies in which subjects were not randomized to treatments. In this case, because potential confounders are likely to be unbalanced, directly applying HUS to the full data may be problematic. A simple but effective approach to alleviate confounding is to conduct propensity score matching first and then analyze the matched pairs [25]. While there are more sophisticated propensity score techniques for survival analyses [26, 27, 28], to maintain simplicity and focus, we only demonstrate propensity score matching in our simulations.
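As an illustration of the matching step, the sketch below estimates propensity scores with a logistic regression fitted by Newton's method and then forms 1:1 pairs by greedy nearest‐neighbour matching within a caliper. The specific matching algorithm, the caliper value, and the function names are assumptions for this example; the text above does not specify which matching variant was used.

```python
import numpy as np

def fit_propensity(X, treated, n_iter=25):
    """Logistic regression by Newton's method; returns P(treated = 1 | X)."""
    Z = np.column_stack([np.ones(len(X)), X])        # add an intercept
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        grad = Z.T @ (treated - p)
        hess = (Z * (p * (1.0 - p))[:, None]).T @ Z
        # Tiny ridge term keeps the linear solve stable; it does not move the optimum.
        beta += np.linalg.solve(hess + 1e-8 * np.eye(len(beta)), grad)
    return 1.0 / (1.0 + np.exp(-Z @ beta))

def greedy_match(score, treated, caliper=0.1):
    """Greedy 1:1 nearest-neighbour matching on the score, without replacement."""
    controls = list(np.flatnonzero(treated == 0))
    pairs = []
    for i in np.flatnonzero(treated == 1):
        if not controls:
            break
        j = min(controls, key=lambda c: abs(score[c] - score[i]))
        if abs(score[j] - score[i]) <= caliper:      # discard poor matches
            pairs.append((int(i), int(j)))
            controls.remove(j)
    return pairs
```

The matched indices would then be used to subset the survival and utility data before applying HUS. Unmatched subjects are discarded, which is why matching can reduce power relative to an analysis of the full sample.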
2.5. Theoretical Properties
2.5.1. Asymptotic Distribution of HUS
In this section, we investigate the asymptotic properties of the HUS test statistic. The observed test statistic is , with
| (7) |
| (8) |
For the simplest case without weights, we have $\lambda_1 = \lambda_2 = 1$, and
| (9) |
| (10) |
Note that in the above formulas, we have already replaced the unknown survival functions with KM estimates and . Denote the survival time, censoring time, and observed time of the subject in group by , , and respectively, and their utility score at time by , of which the expectation is . Define , , , , .
The following assumptions are needed to establish the asymptotic distribution of under the null hypothesis.
Assumption A.1
Assumption A.2
are of bounded variation on .
Assumption A.1 is very common in survival analysis [29, 30]. Assumption A.2 is to make sure the weak convergence of , , which is very common in counting process theory.
Lemma 1
Under Assumptions A.1 and A.2 , we have converges to a mean zero Gaussian process . Here, is a mean zero Gaussian process with variance being , is a mean zero Gaussian process with variance being , and is the hazard function for .
Theorem 1
Under Assumptions A.1 and A.2 and the null hypothesis, if , we have
(11) Here, means converges in distribution. More details and the proofs of Lemma 1 and Theorem 1 are provided in Appendix A of Data S1.
We can derive asymptotic properties using similar techniques for HUS with weights, where $\lambda_1$ and $\lambda_2$ are preselected values.
Lemma 2
Under Assumptions A.1 and A.2 , we have converges to a mean zero Gaussian process:
Theorem 2
Under Assumptions A.1 and A.2 and the null hypothesis, if , we have
(12)
The proofs for Lemma 2 and Theorem 2 are similar to the proofs for Lemma 1 and Theorem 1. They are available in Appendix A of Data S1. Under the null hypothesis, the distribution of can be approximated well via a perturbation‐resampling method [31, 32, 33]. Let and be independent random samples from . For HUS with weights , following from the proof of Lemma 2, we can approximate the distribution of by
| (13) |
According to our experience, this approximation approach has similar performance compared to the bootstrap method. As the bootstrap method is more straightforward and more commonly used, we recommend using it by default. The results in this manuscript are based on bootstrap, unless otherwise specified. Some comparison of the two methods can be found in Appendix B of Data S1.
2.5.2. Asymptotic Distribution of Time‐Weighted HUS
For twHUS, the test statistics are calculated using
| (14) |
| (15) |
To compare two treatment arms, we look at . Here is defined as the test statistic, and we want to give the asymptotic distribution under the null hypothesis.
Lemma 3
Under Assumptions A.1 and A.2 , we have converges to a mean zero Gaussian process, .
Theorem 3
Under Assumptions A.1 and A.2 and the null hypothesis, if , we have
(16)
The proofs for Lemma 3 and Theorem 3 are provided in Appendix A of Data S1. For twHUS, by perturbation‐resampling [31, 32, 33], it is also possible to approximate the distribution of by
| (17) |
2.6. Handling Missing Utility Scores
In most clinical studies, it is difficult to collect utility scores at every time point for all subjects. As previous literature has shown [21], in scenarios with missing utility scores, a simple but efficient approach is to use linear functions to fill in the utility scores for each subject. However, this approach may produce very unstable estimates when the number of recorded scores for a subject is too small. In this manuscript, we apply an alternative method in which the missing scores at a time point are imputed by the average score of the treatment group at that time point plus a small random variation. The variation follows a normal distribution with mean zero and variance equal to the sample variance of the non‐missing utility scores at that time point. We apply this procedure only at time points where at least 80% of the subjects have utility scores recorded. Next, we fill in the remaining missing scores using linear functions for each subject separately. Following this procedure, a single imputation is conducted. In our experience, the new imputation method yields higher statistical power than the previous approach. Some examples are provided in Appendix B of Data S1.
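The two‐stage imputation described above can be sketched as follows. The function name, the array layout (one row per subject, one column per scheduled time point, `NaN` for missing), and the use of `np.interp` for the per‐subject linear fill are assumptions of this example. Note that `np.interp` carries the nearest observed value forward or backward outside the observed range rather than extrapolating a line, which is a simplification.

```python
import numpy as np

def impute_group(util, times, min_frac=0.8, seed=0):
    """Single imputation of utility scores for one treatment group.

    util  : (n_subjects, n_times) array, NaN marks a missing score
    times : (n_times,) increasing measurement times
    """
    rng = np.random.default_rng(seed)
    times = np.asarray(times, dtype=float)
    out = np.array(util, dtype=float)                # work on a copy
    # Stage 1: group mean plus normal noise at well-observed time points.
    for j in range(out.shape[1]):
        col = out[:, j]                              # view into `out`
        miss = np.isnan(col)
        n_obs = int((~miss).sum())
        if miss.any() and n_obs >= 2 and n_obs / len(col) >= min_frac:
            mean = col[~miss].mean()
            sd = col[~miss].std(ddof=1)              # sample standard deviation
            col[miss] = mean + rng.normal(0.0, sd, int(miss.sum()))
    # Stage 2: per-subject linear interpolation for whatever is still missing.
    for i in range(out.shape[0]):
        row = out[i]
        obs = ~np.isnan(row)
        if obs.any() and not obs.all():
            row[~obs] = np.interp(times[~obs], times[obs], row[obs])
    return out
```

In practice the function would be applied to each treatment group separately, and scores after a subject's death would need to be handled according to the analysis plan rather than imputed.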
3. Results
3.1. Simulations With Single Utility Score
We conduct simulations under various scenarios, focusing on comparing the performance of different versions of HUS, as the advantages of the standard version of HUS have already been thoroughly studied [21]. By default, we simulate randomized clinical trial data with two treatment arms mimicking the PET‐NECK trial, a randomized phase III non‐inferiority trial that compares a positron emission tomography‐computerized tomography‐guided watch‐and‐wait policy (PET‐CT) with planned neck dissection (planned ND) for head and neck cancer patients [1]. We assume the length of the study to be 36 months ($T = 36$), with each patient receiving surgery at 3 months. For each patient in each treatment group, we generate the true survival time, the observed survival time, and the survival status, and we let $n_1$ and $n_2$ be the sample sizes for group 1 and group 2. Similar to what was done in prior work [21], we simulate the survival data using
where is chosen to control the censoring rate, denoted by . Under this setting, it is easy to see that the hazard ratio of treatment 1 against treatment 2 is . Note that our focus is to compare the performances of different HUS methods with different patterns of the utility difference given that the two treatments do not differ in OS, we let unless otherwise specified.
When simulating the health utility score, we first define the base utility at time for group using functions . For example, can be defined as
which represents the average utility for group starts from at baseline, linearly changes to at 3 months, and then linearly changes to at the end of the study. This setting mimics typical clinical trials where a patient's health utility reaches the lowest at the end of surgery and gradually recovers afterwards. We use to generate , the health utility score of patient from group at time‐point . Since it rarely happens in practice that utility scores are fully collected at all time‐points, we assume that the scores are only collected at , , and unless otherwise specified. Also, we assume there is a chance that the score is missing for a subject when or . We choose by default. Figure 1 shows the base utility functions for different scenarios. Note that in Scenarios B1 and B2, the utility changes are piece‐wise smooth but not piece‐wise linear. In Scenario B2, treatment 1 reaches the lowest point earlier than treatment 2. We conducted 200 iterations to assess statistical power in scenarios where the null hypothesis is false and increase the number of iterations to 500 when estimating type I error rates in scenarios where the null hypothesis is true.
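The piece‐wise linear base utility described above (one value at baseline, a second at 3 months, a third at the end of the study) can be sketched with linear interpolation. The function name and the numerical values in the usage line are hypothetical and are not the settings of any scenario in Figure 1.

```python
import numpy as np

def base_utility(t, u_start, u_surgery, u_end, t_surgery=3.0, t_end=36.0):
    """Piece-wise linear mean utility through (0, u_start),
    (t_surgery, u_surgery) and (t_end, u_end)."""
    return np.interp(t, [0.0, t_surgery, t_end], [u_start, u_surgery, u_end])

# Hypothetical curve: utility dips at the end of surgery and then recovers.
curve = base_utility(np.array([0.0, 3.0, 12.0, 36.0]), 0.8, 0.5, 0.9)
```

Individual scores would then be generated around this curve; the noise model used for that step is not reproduced here.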
FIGURE 1.

Utility plots for different scenarios with a single utility score.
As shown in Table 1, after applying different methods to our simulated data, we find that all methods can control type I errors in Scenario A0. Since the difference between the two treatment groups lies in utility but not in survival, assigning more weight to utility results in higher power in Scenarios A1 and B1–B2. Meanwhile, twHUS has similar performance to the standard HUS except in Scenario B2, which makes sense because, in the other scenarios, the difference between the treatments is relatively consistent throughout the study, whereas in Scenario B2, treatment 1 performs worse in the earlier stage but becomes much better in the later stage. In such a scenario, using the time weights with more weights given to the later time points, it is easier to detect the advantage of treatment 1 over that of treatment 2, and thus twHUS can obtain higher power compared to the standard HUS. For both HUS and twHUS, since the survival data was generated under the proportional hazards assumption, the Cox model is able to get slightly better results, and thus yields slightly higher power compared to the KM method. These simulation results demonstrate the flexibility of HUS and that it is important to choose appropriate weights given their potential impact on the testing results. Note that in our main simulations, the missingness of the utility score is independent of its value. In Appendix B of Data S1, we demonstrate that HUS can work relatively well when there is moderate informative missingness, where lower utility scores are more likely to be missing.
TABLE 1.
Simulation results for Scenarios A0–A1 and B1–B2.
| n1, n2 | KM: λ2 = 1 | KM: λ2 = 0.5 | KM: λ2 = 2 | KM: λ2 = 1 (twHUS) | Cox: λ2 = 1 | Cox: λ2 = 0.5 | Cox: λ2 = 2 | Cox: λ2 = 1 (twHUS) |
|---|---|---|---|---|---|---|---|---|
| Scenario A0 | ||||||||
| 50 | 0.048 | 0.047 | 0.053 | 0.048 | 0.047 | 0.046 | 0.052 | 0.046 |
| 100 | 0.046 | 0.045 | 0.051 | 0.046 | 0.051 | 0.048 | 0.053 | 0.052 |
| 150 | 0.054 | 0.054 | 0.052 | 0.050 | 0.052 | 0.051 | 0.054 | 0.052 |
| Scenario A1 | ||||||||
| 50 | 0.81 | 0.41 | 0.99 | 0.80 | 0.85 | 0.46 | 1 | 0.85 |
| 100 | 0.95 | 0.62 | 1 | 0.95 | 0.98 | 0.70 | 1 | 0.98 |
| 150 | 0.99 | 0.78 | 1 | 0.99 | 1 | 0.86 | 1 | 1 |
| Scenario B1 | ||||||||
| 50 | 0.74 | 0.32 | 0.97 | 0.72 | 0.80 | 0.39 | 0.98 | 0.78 |
| 100 | 0.90 | 0.57 | 1 | 0.90 | 0.92 | 0.60 | 1 | 0.92 |
| 150 | 0.99 | 0.67 | 1 | 0.98 | 0.99 | 0.76 | 1 | 0.98 |
| Scenario B2 | ||||||||
| 50 | 0.24 | 0.11 | 0.71 | 0.58 | 0.30 | 0.12 | 0.75 | 0.62 |
| 100 | 0.44 | 0.17 | 0.90 | 0.82 | 0.49 | 0.18 | 0.92 | 0.85 |
| 150 | 0.54 | 0.21 | 0.99 | 0.92 | 0.60 | 0.22 | 0.98 | 0.95 |
Note: Type I errors for Scenario A0, and power for Scenarios A1 and B1–B2.
3.2. Simulations With Multiple QoL Scores
To examine the performance of our method for combining multiple measures with weights, we simulate multiple QoL scores using procedures similar to those in Section 3.1. Figure 2 shows the base QoL functions for different scores in four different scenarios. As shown in Table 2, in Scenario M0, where there is no difference between the two treatments in either measurement, HUS controls the type I error regardless of the choice of weights. In Scenario M1, where the difference is the same in both scores, the choice of weights has little effect on the power. In Scenario M2, treatment 1 performs better than treatment 2 in terms of score 1 but worse in terms of score 2. As a result, assigning more weight to score 1 leads to higher power. In Scenario M3, with four QoL scores having the same difference, different choices of weights yield very similar results. This shows that the choice of weights may have a significant impact on the testing result when the differences in scores are not the same. We recommend choosing the same weight for all measurements if there is no prior knowledge of which measurements are more important.
FIGURE 2.

Quality of life plots for different scenarios with multiple measures of quality of life.
TABLE 2.
Simulation results for Scenarios M0–M3.
| n 1, n 2 | , | , | , |
|---|---|---|---|
| Scenario M0 | |||
| 50 | 0.050 | 0.053 | 0.052 |
| 100 | 0.050 | 0.051 | 0.049 |
| 150 | 0.052 | 0.045 | 0.051 |
| Scenario M1 | |||
| 50 | 0.35 | 0.33 | 0.32 |
| 100 | 0.59 | 0.54 | 0.54 |
| 150 | 0.69 | 0.67 | 0.69 |
| Scenario M2 | |||
| 50 | 0.06 | 0.25 | 0.02 |
| 100 | 0.12 | 0.47 | 0 |
| 150 | 0.14 | 0.60 | 0.02 |
| n 1, n 2 | , | , |
|---|---|---|
| Scenario M3 | ||
| 50 | 0.16 | 0.14 |
| 100 | 0.32 | 0.28 |
| 150 | 0.34 | 0.33 |
Note: Type I errors for Scenario M0, and power for Scenarios M1–M3.
3.3. Simulation With Covariates
In this section, we consider scenarios with observational data that need to be adjusted for covariates. We simulate 10 000 samples with covariates:
Sex is defined as 0 for female and 1 for male. Each subject is assigned to a treatment group (1 or 2) with a probability that is a function of covariates. The probability of receiving treatment 2 is
We choose and , meaning that older patients and male patients are more likely to receive treatment 2 (control). As a result, treatment 1 has 60% female and 40% male, while treatment 2 has 40% female and 60% male. The average age is 48.8 for treatment 1 and 56.0 for treatment 2.
For each replication, we randomly select subjects from treatment group 1 and subjects from treatment group 2. For the subject in group , denote their age and sex by and respectively. We simulate the survival data using
where is chosen to control the censoring rate, denoted by . Given the distributions used to generate age and sex in the full sample, their mathematical expectations are 57.5 and 0.5, respectively. represents the baseline hazard rate for group when age and sex are centered at 57.5 and 0.5. The hazard ratio of treatment 1 against treatment 2, given the same age and sex, is . The hazard ratio of male versus female given the same age and treatment is (e.g., this hazard ratio is 1.22 if we choose ). The hazard ratio of age 80 versus age 35 given the same sex and treatment is (e.g., this hazard ratio is 6.05 if we choose ).
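The two numerical examples above can be checked with a short calculation. The sketch assumes that sex and age enter the log hazard linearly (a Cox‐type form, which is an assumption here) with coefficients 0.2 and 0.04, the values listed in the Survival column of Table 3 for Scenario C0; the variable names are illustrative.

```python
import math

beta_sex = 0.2    # assumed log hazard ratio, male (1) versus female (0)
beta_age = 0.04   # assumed log hazard ratio per additional year of age

hr_sex = math.exp(beta_sex * (1 - 0))      # male versus female
hr_age = math.exp(beta_age * (80 - 35))    # age 80 versus age 35

print(round(hr_sex, 2), round(hr_age, 2))  # prints: 1.22 6.05
```

The age contrast is large because the per‐year effect compounds over the 45‐year difference: $e^{0.04 \times 45} = e^{1.8} \approx 6.05$.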
For health utility, we use the same baseline functions as before. For the subject in group , we simulate its utility scores as
which means we allow the age and sex's effects on health utility to be different depending on the treatment. For example, if we let and , it will mean that older patients and male patients tend to have lower utility scores. We consider five scenarios: Scenario C0 uses the same base utility function as in Scenario A0, while Scenarios C1–C4 use the same base utility function as in Scenario A1. The effect sizes of the covariates are included in Table 3.
TABLE 3.
Effect sizes of covariates and testing results for Scenarios C0–C4.
| Scenario C0 | ||||
|---|---|---|---|---|
| Effect size | Assign | Survival | Utility (group 1) | Utility (group 2) |
| Age | 0.01 | 0.04 | −0.01 | −0.01 |
| Sex | 0.2 | 0.2 | −0.1 | −0.1 |
| Type I error | ||||||
|---|---|---|---|---|---|---|
| Naïve | Propensity matching | |||||
| n 1, n 2 | λ 2 = 1 | λ 2 = 0.5 | λ 2 = 2 | λ 2 = 1 | λ 2 = 0.5 | λ 2 = 2 |
| 50 | 0.55 | 0.42 | 0.62 | 0.04 | 0.04 | 0.03 |
| 100 | 0.84 | 0.71 | 0.89 | 0.04 | 0.04 | 0.04 |
| Scenario C1 | ||||
|---|---|---|---|---|
| Effect size | Assign | Survival | Utility (group 1) | Utility (group 2) |
| Age | 0.01 | 0.04 | −0.01 | −0.01 |
| Sex | 0.2 | 0.2 | −0.1 | −0.1 |
| Power | ||||||
|---|---|---|---|---|---|---|
| Naïve | Propensity matching | |||||
| n 1, n 2 | λ 2 = 1 | λ 2 = 0.5 | λ 2 = 2 | λ 2 = 1 | λ 2 = 0.5 | λ 2 = 2 |
| 50 | 0.97 | 0.80 | 1 | 0.32 | 0.21 | 0.52 |
| Scenario C2 | ||||
|---|---|---|---|---|
| Effect size | Assign | Survival | Utility (group 1) | Utility (group 2) |
| Age | 0.01 | −0.02 | 0.01 | 0.01 |
| Sex | 0.2 | −0.1 | 0.1 | 0.1 |
| Power | ||||||
|---|---|---|---|---|---|---|
| Naïve | Propensity matching | |||||
| n 1, n 2 | λ 2 = 1 | λ 2 = 0.5 | λ 2 = 2 | λ 2 = 1 | λ 2 = 0.5 | λ 2 = 2 |
| 50 | 0.07 | 0.06 | 0.10 | 0.38 | 0.20 | 0.52 |
| 100 | 0.07 | 0.04 | 0.14 | 0.63 | 0.38 | 0.84 |
| 150 | 0.06 | 0.03 | 0.14 | 0.76 | 0.44 | 0.92 |
| Scenario C3 | ||||
|---|---|---|---|---|
| Effect size | Assign | Survival | Utility (group 1) | Utility (group 2) |
| Age | 0 | 0.04 | −0.01 | −0.01 |
| Sex | 0 | 0.2 | −0.1 | −0.1 |
| Power | ||||||
|---|---|---|---|---|---|---|
| Naïve | Propensity matching | |||||
| n 1, n 2 | λ 2 = 1 | λ 2 = 0.5 | λ 2 = 2 | λ 2 = 1 | λ 2 = 0.5 | λ 2 = 2 |
| 50 | 0.46 | 0.24 | 0.68 | 0.41 | 0.21 | 0.62 |
| 100 | 0.65 | 0.35 | 0.85 | 0.68 | 0.36 | 0.86 |
| 150 | 0.80 | 0.47 | 0.94 | 0.84 | 0.44 | 0.97 |
| Scenario C4 | ||||
|---|---|---|---|---|
| Effect size | Assign | Survival | Utility (group 1) | Utility (group 2) |
| Age | 0 | 0 | 0 | 0 |
| Sex | 0 | 0 | 0 | 0 |
| Power | ||||||
|---|---|---|---|---|---|---|
| Naïve | Propensity matching | |||||
| n 1, n 2 | λ 2 = 1 | λ 2 = 0.5 | λ 2 = 2 | λ 2 = 1 | λ 2 = 0.5 | λ 2 = 2 |
| 50 | 0.59 | 0.28 | 0.84 | 0.56 | 0.28 | 0.76 |
| 100 | 0.79 | 0.49 | 0.97 | 0.73 | 0.44 | 0.94 |
| 150 | 0.92 | 0.56 | 0.99 | 0.90 | 0.56 | 0.98 |
As Table 3 suggests, in Scenario C0, where there is no survival or utility difference between the two treatments, while there are covariates that are affecting the treatment assignment as well as utility scores, the naïve approach that does not adjust for covariates may have highly inflated type I errors. Meanwhile, the propensity matching technique can control type I errors. In Scenario C1, the power of the naïve method may seem much higher, mostly due to its high inflation. In Scenarios C3–C4, the two methods have similar power, though propensity matching may slightly lose power since it uses fewer samples. However, in Scenario C2, where the signs of the covariate effects are modified, the naïve method has very low power, while propensity matching still has decent power. This demonstrates that the confounding issue may also result in loss of power. In conclusion, it is safer to use propensity matching when dealing with observational data, since it can reduce the bias brought by confounders.
3.4. Application to Health Utilities Index 3 (HUI3) Data
To further demonstrate the feasibility of HUS, we apply it to HUI3 data from a translational patient cohort at Princess Margaret Cancer Centre [34]. This retrospective dataset records the patients' utility scores throughout the study as well as many baseline variables. We are interested in comparing survival and health utility between patients who received surgery alone (Sx alone) and patients who received surgery plus postoperative combination chemotherapy and radiotherapy (Sx + POCRT). The survival time was recorded from baseline up to 82.4 months, while the utility data were recorded only from baseline to 24 months. The median follow‐up time is 43.8 months. We consider four covariates when conducting propensity score matching: age, gender, stage (early or late), and HPV status. Figure 3 shows the sample sizes before and after matching as well as the estimated curves for survival, utility, and the product of survival and utility. We apply different tests to the data and compare their results.
FIGURE 3.

Curves (survival, utility, product of survival and utility) using the group average for the two treatment groups from baseline to 24 months.
As shown in Table 4, in this application, the OS‐based log‐rank test yields a non‐significant result, which agrees with the survival plots in Figure 3, where there is no significant difference between the two treatment groups. Note that the p‐values in those plots use the 24‐month survival data only, whereas the p‐value for OS in Table 4 uses all time‐points (up to 82.4 months). Meanwhile, HUS (except with $\lambda_2 = 0.5$) and the test based on utility only obtain significant results, which also agrees with Figure 3, while twHUS falls just short of significance (p = 0.052). A larger $\lambda_2$ leads to a smaller p‐value, which makes sense because the difference appears to be in health utility, so giving utility a larger weight leads to more significant results. These test results suggest that the treatment group that receives surgery alone tends to have better utility than the group that receives surgery and postoperative combination chemotherapy and radiotherapy. We also conducted the analysis with propensity score matching; however, the sample size left after matching is too small to support meaningful conclusions.
TABLE 4. p‐values of different tests.
| OS | HU only | HUS (λ₂ = 1) | HUS (λ₂ = 0.5) | HUS (λ₂ = 1.5) | twHUS (λ₂ = 1) |
|---|---|---|---|---|---|
| 0.513 | < 0.0001* | 0.048* | 0.186 | 0.016* | 0.052 |
Note: OS denotes the log‐rank test for a difference in overall survival. HU only compares health utility without comparing survival (i.e., a special case of HUS obtained by a particular choice of its parameters).
* indicates p < 0.05; the significance level was set at 0.05.
4. Discussion
We have presented extensions of HUS for comparing treatment effects using a composite endpoint that combines survival and health utility with different emphases, and we have established the theoretical properties of HUS. As demonstrated by our comprehensive simulation studies and the HUI3 data application, HUS can be applied not only to randomized trial data but also to observational study data, and different versions of HUS show different advantages in various scenarios. The time‐weighted version, twHUS, may yield better results when we care more about the patients' eventual recovery. Using propensity matching is important for observational data, as it helps reduce the inflation caused by confounders.
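To make the propensity matching step concrete, here is a minimal sketch of greedy 1:1 nearest‐neighbor matching with a caliper. It assumes the propensity scores have already been estimated (for example, from a logistic regression on the baseline covariates); the matching algorithm, the caliper value, and all names are illustrative assumptions rather than the procedure used in this manuscript.

```python
def greedy_match(treated_scores, control_scores, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated_scores, control_scores : estimated propensity scores
    caliper : maximum allowed absolute difference within a matched pair
    Returns a list of (treated_index, control_index) pairs.
    """
    available = list(range(len(control_scores)))
    pairs = []
    for i, score in enumerate(treated_scores):
        if not available:
            break
        # Closest still-unmatched control; ties go to the lowest index.
        j = min(available, key=lambda k: (abs(control_scores[k] - score), k))
        if abs(control_scores[j] - score) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


# Hypothetical propensity scores.
treated = [0.30, 0.62, 0.90]
controls = [0.28, 0.35, 0.60, 0.10]
matched_pairs = greedy_match(treated, controls, caliper=0.05)
```

In this toy example the third treated patient has no control within the caliper and is dropped, which shows how matching can shrink the analyzable sample when the groups overlap poorly.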
Note that when dealing with multiple QoL measures, the weighted‐average method proposed in this manuscript is straightforward, but other techniques may boost power in certain scenarios, especially considering that there are various choices of measures with different emphases [35, 36, 37]. Exploring other options is a potential future direction. For instance, we may consider taking a powered sum and applying a technique similar to the SPU (Sum of Powered U‐score) test [38, 39], or building different models with different assumptions and then taking the model average [40]. To better handle sparsely measured utility scores and dependent censoring, we may also consider jointly modeling the scores using a mixed model [41]. Another point worth mentioning is that the choice of weights is important in practice. The main purpose of this manuscript is to further develop the statistical framework of HUS and demonstrate the potential of its different versions. In practice, we recommend choosing equal weights by default. Based on clinicians' input, more weight may be given to the measures or time points that are considered more important. On the other hand, if pilot data are available, different choices of weights may be applied based on the prior information and compared before finalizing the choice for the new study. In the analysis stage of the new study, we also recommend conducting sensitivity analyses.
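As a small illustration of the weighting advice above, the sketch below combines several utility measures into one score with equal weights by default and then repeats the calculation under alternative weightings, as one would in a sensitivity analysis. The function, the weight sets, and the scores are hypothetical and only show the arithmetic of a weighted average.

```python
def combine_measures(scores, weights=None):
    """Weighted average of several utility measures taken at one visit.

    scores  : one value per QoL/utility measure
    weights : non-negative weights; equal weights are used when omitted
    """
    if weights is None:
        weights = [1.0] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("scores and weights must have the same length")
    if any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")
    total = float(sum(weights))
    return sum(w * s for w, s in zip(weights, scores)) / total


# Sensitivity analysis over hypothetical weightings of two measures.
visit_scores = [0.60, 0.80]
candidate_weights = {
    "equal": [1, 1],
    "favor_first": [3, 1],
    "favor_second": [1, 3],
}
sensitivity = {
    name: combine_measures(visit_scores, w)
    for name, w in candidate_weights.items()
}
```

If the downstream conclusions change materially across such weightings, that is a signal to fix the weights in advance using clinical input or pilot data.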
Meanwhile, we have proposed the Cox version of HUS and shown that it performs similarly to the version using the KM estimates, which is likely because our survival data are generated from proportional hazards models. In the future, we may also explore situations where the proportional hazards assumption is violated and compare the performance of the Cox estimates and the KM estimates. For example, when the proportional hazards assumption does not hold, the Cox model can be interpreted as estimating a time‐averaged hazard ratio, which may still be quite useful [42, 43]. It is also possible to perform HUS analysis with other survival models, such as the flexible parametric model [44], which may offer benefits in some situations.
Regarding observational study data, there are other options besides propensity score matching for handling confounding that we may compare [25, 45, 46]. For example, we may use regression models to estimate and remove the covariate effects, and then apply HUS analysis to the residuals. The major challenge with this approach is that it may require very large sample sizes to obtain good estimates, since the number of parameters in the regression models may be very large when there are many covariates to consider. Also, there may be unmeasured confounders that we cannot directly adjust for. Hence, it will be worthwhile to find more efficient and robust ways to analyze observational data with HUS. In addition, although our proposed approach to missing data works well in our simulation settings, where the missingness is independent of or moderately associated with the utility score, we may need to explore better options, given that the missingness mechanism in real data may be much less ideal. For example, death may be more strongly related to the utility score. It is important to incorporate more advanced techniques to make HUS more robust in more complicated scenarios [47, 48]. Since this manuscript focuses on the development of statistical properties, the imputation approach currently used is intuitive but relatively simple, and it may introduce more bias than some more sophisticated methods. In the future, we may implement more advanced imputation techniques and compare their results [49, 50].
Finally, the current HUS framework is limited to comparing two treatment groups. In certain scenarios, we may have more than two groups of interest, and it may be desirable to have a single test that detects whether there is a difference among multiple groups, instead of comparing two groups at a time, which would result in multiple testing and loss of power. For example, in our HUI3 application, we may be interested in comparing three treatment groups: patients who received radiation therapy only, patients who received surgery only, and patients who received combination treatments. A multi‐group comparison test based on HUS could be very useful in these situations.
5. Software
R code for our simulation studies is available at https://github.com/yangq001/HUS.
Conflicts of Interest
The authors declare no conflicts of interest.
Supporting information
Data S1. Supporting information.
Data S2. Supporting information.
Acknowledgments
The authors would like to acknowledge the contributions of Dr. Hisham Mehanna (Institute of Head and Neck Studies and Education, University of Birmingham) and Dr. Sue Yom (Department of Radiation Oncology, University of California) for clinical insights and discussion.
Funding: This work was supported by Princess Margaret Cancer Foundation, Fundamental Research Funds for the Central Universities (Grant No. CXTD14‐05), and National Natural Science Foundation of China (Grant No. 12171329, 12371264).
Yangqing Deng and Meiling Hao are co‐first authors.
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
References
- 1. Mehanna H., McConkey C. C., Rahman J. K., et al., “PET‐NECK: A Multicentre Randomised Phase III Non‐Inferiority Trial Comparing a Positron Emission Tomography–Computerised Tomography‐Guided Watch‐And‐Wait Policy With Planned Neck Dissection in the Management of Locally Advanced (N2/N3) Nodal Metastases in Patients With Squamous Cell Head and Neck Cancer,” Health Technology Assessment 21, no. 17 (2017): 1–122, 10.3310/hta21170.
- 2. Mathias S. D., Bates M. M., Pasta D. J., Cisternas M. G., Feeny D., and Patrick D. L., “Use of the Health Utilities Index With Stroke Patients and Their Caregivers,” Stroke 28, no. 10 (1997): 1888–1894, 10.1161/01.STR.28.10.1888.
- 3. Horsman J., Furlong W., Feeny D., and Torrance G., “The Health Utilities Index (HUI): Concepts, Measurement Properties and Applications,” Health and Quality of Life Outcomes 1, no. 1 (2003): 54, 10.1186/1477-7525-1-54.
- 4. Jewell E. L., Smrtka M., Broadwater G., et al., “Utility Scores and Treatment Preferences for Clinical Early‐Stage Cervical Cancer,” Value in Health 14, no. 4 (2011): 582–586, 10.1016/j.jval.2010.11.017.
- 5. Glasziou P. P., Simes R. J., and Gelber R. D., “Quality Adjusted Survival Analysis,” Statistics in Medicine 9, no. 11 (1990): 1259–1276, 10.1002/sim.4780091106.
- 6. Gelber R. D., “Quality‐of‐Life‐Adjusted Evaluation of Adjuvant Therapies for Operable Breast Cancer,” Annals of Internal Medicine 114, no. 8 (1991): 621–628, 10.7326/0003-4819-114-8-621.
- 7. Gelber R. D., Goldhirsch A., Cole B. F., Wieand H. S., Schroeder G., and Krook J. E., “A Quality‐Adjusted Time Without Symptoms or Toxicity (Q‐TWiST) Analysis of Adjuvant Radiation Therapy and Chemotherapy for Resectable Rectal Cancer,” JNCI Journal of the National Cancer Institute 88, no. 15 (1996): 1039–1045, 10.1093/jnci/88.15.1039.
- 8. Murray S. and Cole B., “Variance and Sample Size Calculations in Quality‐of‐Life‐Adjusted Survival Analysis (Q‐TWiST),” Biometrics 56, no. 1 (2000): 173–182, 10.1111/j.0006-341X.2000.00173.x.
- 9. Konski A. A., Winter K., Cole B. F., Ang K. K., and Fu K. K., “Quality‐Adjusted Survival Analysis of Radiation Therapy Oncology Group (RTOG) 90‐03: Phase III Randomized Study Comparing Altered Fractionation to Standard Fractionation Radiotherapy for Locally Advanced Head and Neck Squamous Cell Carcinoma,” Head & Neck 31, no. 2 (2009): 207–212, 10.1002/hed.20949.
- 10. Zbrozek A. S., Hudes G., Levy D., et al., “Q‐TWiST Analysis of Patients Receiving Temsirolimus or Interferon Alpha for Treatment of Advanced Renal Cell Carcinoma,” PharmacoEconomics 28, no. 7 (2010): 577–584, 10.2165/11535290-000000000-00000.
- 11. Seymour J. F., Gaitonde P., Emeribe U., Cai L., and Mato A. R., “A Quality‐Adjusted Survival (Q‐TWiST) Analysis to Assess Benefit‐Risk of Acalabrutinib Versus Idelalisib/Bendamustine Plus Rituximab or Ibrutinib Among Relapsed/Refractory (R/R) Chronic Lymphocytic Leukemia (CLL) Patients,” Blood 138, no. S1 (2021): 3722, 10.1182/blood-2021-147112.
- 12. Jerusalem G., Delea T. E., Martin M., et al., “Quality‐Adjusted Survival With Ribociclib Plus Fulvestrant Versus Placebo Plus Fulvestrant in Postmenopausal Women With HR±HER2− Advanced Breast Cancer in the MONALEESA‐3 Trial,” Clinical Breast Cancer 22, no. 4 (2022): 326–335, 10.1016/j.clbc.2021.12.008.
- 13. Glasziou P. P., Cole B. F., Gelber R. D., Hilden J., and Simes R. J., “Quality Adjusted Survival Analysis With Repeated Quality of Life Measures,” Statistics in Medicine 17, no. 11 (1998): 1215–1229.
- 14. Prieto L. and Sacristán J. A., “Problems and Solutions in Calculating Quality‐Adjusted Life Years (QALYs),” Health and Quality of Life Outcomes 1, no. 1 (2003): 80, 10.1186/1477-7525-1-80.
- 15. Whitehead S. J. and Ali S., “Health Outcomes in Economic Evaluation: The QALY and Utilities,” British Medical Bulletin 96, no. 1 (2010): 5–21, 10.1093/bmb/ldq033.
- 16. Touray M. M. L., “Estimation of Quality‐Adjusted Life Years Alongside Clinical Trials: The Impact of ‘Time‐Effects’ on Trial Results,” Journal of Pharmaceutical Health Services Research 9, no. 2 (2018): 109–114, 10.1111/jphs.12218.
- 17. Chung C. H., Hu T. H., Wang J. D., and Hwang J. S., “Estimation of Quality‐Adjusted Life Expectancy of Patients With Oral Cancer: Integration of Lifetime Survival With Repeated Quality‐of‐Life Measurements,” Value in Health Regional Issues 21 (2020): 59–65, 10.1016/j.vhri.2019.07.005.
- 18. Billingham L. J., Abrams K. R., and Jones D. R., “Methods for the Analysis of Quality‐of‐Life and Survival Data in Health Technology Assessment,” Health Technology Assessment (Winchester) 3, no. 10 (1999): 1–152.
- 19. Diaby V., Adunlin G., Ali A. A., and Tawk R., “Using Quality‐Adjusted Progression‐Free Survival as an Outcome Measure to Assess the Benefits of Cancer Drugs in Randomized‐Controlled Trials: Case of the BOLERO‐2 Trial,” Breast Cancer Research and Treatment 146, no. 3 (2014): 669–673, 10.1007/s10549-014-3047-y.
- 20. Oza A. M., Lorusso D., Aghajanian C., et al., “Patient‐Centered Outcomes in ARIEL3, a Phase III, Randomized, Placebo‐Controlled Trial of Rucaparib Maintenance Treatment in Patients With Recurrent Ovarian Carcinoma,” Journal of Clinical Oncology 38, no. 30 (2020): 3494–3505, 10.1200/JCO.19.03107.
- 21. Deng Y., De Almeida J. R., and Xu W., “Health Utility Adjusted Survival: A Composite Endpoint for Clinical Trial Designs,” Statistical Methods in Medical Research (2025), 10.1177/09622802251338409.
- 22. Fleming T. R. and Harrington D. P., “A Class of Hypothesis Tests for One and Two Sample Censored Survival Data,” Communications in Statistics – Theory and Methods 10, no. 8 (1981): 763–794, 10.1080/03610928108828073.
- 23. Kaplan E. L. and Meier P., “Nonparametric Estimation From Incomplete Observations,” Journal of the American Statistical Association 53, no. 282 (1958): 457–481, 10.1080/01621459.1958.10501452.
- 24. Cox D. R., “Regression Models and Life‐Tables,” Journal of the Royal Statistical Society: Series B: Methodological 34, no. 2 (1972): 187–202, 10.1111/j.2517-6161.1972.tb00899.x.
- 25. Austin P. C., “An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies,” Multivariate Behavioral Research 46, no. 3 (2011): 399–424, 10.1080/00273171.2011.568786.
- 26. Cheng C., Li F., Thomas L. E., and Li F., “Addressing Extreme Propensity Scores in Estimating Counterfactual Survival Functions via the Overlap Weights,” American Journal of Epidemiology 191, no. 6 (2022): 1140–1151, 10.1093/aje/kwac043.
- 27. Austin P. C. and Stuart E. A., “The Performance of Inverse Probability of Treatment Weighting and Full Matching on the Propensity Score in the Presence of Model Misspecification When Estimating the Effect of Treatment on Survival Outcomes,” Statistical Methods in Medical Research 26, no. 4 (2017): 1654–1670, 10.1177/0962280215584401.
- 28. Chesnaye N. C., Stel V. S., Tripepi G., et al., “An Introduction to Inverse Probability of Treatment Weighting in Observational Research,” Clinical Kidney Journal 15, no. 1 (2022): 14–20, 10.1093/ckj/sfab158.
- 29. Hao M., Song X., and Sun L., “Reweighting Estimators for the Additive Hazards Model With Missing Covariates,” Canadian Journal of Statistics 42, no. 2 (2014): 285–307, 10.1002/cjs.11210.
- 30. Fleming T. R. and Harrington D. P., Counting Processes and Survival Analysis (John Wiley & Sons, Inc., 2005), 10.1002/9781118150672.
- 31. Lin D. Y., Wei L. J., and Ying Z., “Checking the Cox Model With Cumulative Sums of Martingale‐Based Residuals,” Biometrika 80, no. 3 (1993): 557–572, 10.1093/biomet/80.3.557.
- 32. Parzen M. I., Wei L. J., and Ying Z., “Simultaneous Confidence Intervals for the Difference of Two Survival Functions,” Scandinavian Journal of Statistics 24, no. 3 (1997): 309–314, 10.1111/1467-9469.t01-1-00065.
- 33. Zhao L., Tian L., Uno H., et al., “Utilizing the Integrated Difference of Two Survival Functions to Quantify the Treatment Contrast for Designing, Monitoring, and Analyzing a Comparative Clinical Study,” Clinical Trials 9, no. 5 (2012): 570–577, 10.1177/1740774512455464.
- 34. Ren J., Pang W., Hueniken K., et al., “Longitudinal Health Utility and Symptom‐Toxicity Trajectories in Patients With Head and Neck Cancers,” Cancer 128, no. 3 (2022): 497–508, 10.1002/cncr.33936.
- 35. Fisk J. D., “A Comparison of Health Utility Measures for the Evaluation of Multiple Sclerosis Treatments,” Journal of Neurology, Neurosurgery, and Psychiatry 76, no. 1 (2005): 58–63, 10.1136/jnnp.2003.017897.
- 36. Hawthorne G., Richardson J., and Day N. A., “A Comparison of the Assessment of Quality of Life (AQoL) With Four Other Generic Utility Instruments,” Annals of Medicine 33, no. 5 (2001): 358–370, 10.3109/07853890109002090.
- 37. Pickard A. S., Ray S., Ganguli A., and Cella D., “Comparison of FACT‐ and EQ‐5D‐Based Utility Scores in Cancer,” Value in Health 15, no. 2 (2012): 305–311, 10.1016/j.jval.2011.11.029.
- 38. Kim J., Bai Y., and Pan W., “An Adaptive Association Test for Multiple Phenotypes With GWAS Summary Statistics,” Genetic Epidemiology 39, no. 8 (2015): 651–663, 10.1002/gepi.21931.
- 39. Pan W., Kim J., Zhang Y., Shen X., and Wei P., “A Powerful and Adaptive Association Test for Rare Variants,” Genetics 197, no. 4 (2014): 1081–1095, 10.1534/genetics.114.165035.
- 40. Baselmans B. M. L., Jansen R., Ip H. F., et al., “Multivariate Genome‐Wide Analyses of the Well‐Being Spectrum,” Nature Genetics 51, no. 3 (2019): 445–451, 10.1038/s41588-018-0320-8.
- 41. Li N., Elashoff R. M., and Li G., “Robust Joint Modeling of Longitudinal Measurements and Competing Risks Failure Time Data,” Biometrical Journal 51, no. 1 (2009): 19–30, 10.1002/bimj.200810491.
- 42. Rauch G., Brannath W., Brückner M., and Kieser M., “The Average Hazard Ratio – A Good Effect Measure for Time‐To‐Event Endpoints When the Proportional Hazard Assumption is Violated?,” Methods of Information in Medicine 57, no. 3 (2018): 89–100, 10.3414/ME17-01-0058.
- 43. Kalbfleisch J. D. and Prentice R. L., “Estimation of the Average Hazard Ratio,” Biometrika 68, no. 1 (1981): 105–112, 10.1093/biomet/68.1.105.
- 44. Lambert P. C. and Royston P., “Further Development of Flexible Parametric Models for Survival Analysis,” Stata Journal 9, no. 2 (2009): 265–290, 10.1177/1536867X0900900206.
- 45. Desai R. J. and Franklin J. M., “Alternative Approaches for Confounding Adjustment in Observational Studies Using Weighting Based on the Propensity Score: A Primer for Practitioners,” BMJ (Clinical Research Ed.) 367 (2019): l5657, 10.1136/bmj.l5657.
- 46. Wunsch H., Linde‐Zwirble W. T., and Angus C. D., “Methods to Adjust for Bias and Confounding in Critical Care Health Services Research Involving Observational Data,” Journal of Critical Care 21, no. 1 (2006): 1–7, 10.1016/j.jcrc.2006.01.004.
- 47. Graham J. W., Missing Data (Springer, 2012), 10.1007/978-1-4614-4018-5.
- 48. Naeim A., Keeler E. B., and Mangione C. M., “Options for Handling Missing Data in the Health Utilities Index Mark 3,” Medical Decision Making 25, no. 2 (2005): 186–198, 10.1177/0272989X05275153.
- 49. Jahangiri M., Kazemnejad A., Goldfeld K. S., et al., “A Wide Range of Missing Imputation Approaches in Longitudinal Data: A Simulation Study and Real Data Analysis,” BMC Medical Research Methodology 23, no. 1 (2023): 161, 10.1186/s12874-023-01968-8.
- 50. Cao Y., Allore H., Vander Wyk B., and Gutman R., “Review and Evaluation of Imputation Methods for Multivariate Longitudinal Data With Mixed‐Type Incomplete Variables,” Statistics in Medicine 41, no. 30 (2022): 5844–5876, 10.1002/sim.9592.
