Abstract
BACKGROUND AND AIMS:
Prevention of decompensation is a primary therapeutic target in patients with compensated cirrhosis (CC). However, a major problem is the large sample size and long follow-up required to demonstrate a significant treatment effect because of the relatively low baseline risk. For this reason, it has been recently suggested that ordinal outcomes may be used in this area to gain power and reduce sample size. The aim of this study was to assess the applicability of ordinal outcomes in cirrhosis.
APPROACH AND RESULTS:
An inception cohort of 202 patients with CC (no ascites, gastrointestinal bleeding, encephalopathy, or jaundice) without esophageal varices was included, and 5-year outcome is reported. Etiology was mostly viral and alcoholic, and there were no dropouts. Ordinal outcome was set according to six grades with a previously established prognostic ordinality: grade 1 = no disease progression; grade 2 = development of varices; grade 3 = bleeding alone; grade 4 = nonbleeding single decompensation; grade 5 = more than one decompensating event; and grade 6 = death. At the 60-month time point, patients were distributed in grades 1 through 6 as follows: 129, 43, 2, 7, 5, and 16, respectively. Emulation of a clinical trial performed by dividing patients based on baseline platelet count into two groups (cutoff, 150 × 10⁹/L) demonstrated a statistically significant outcome difference between groups when using ordinal outcomes that was not detectable by binary logistic, chi-square, or time-to-event analyses. Additionally, using ordinal outcomes in a hypothetical study to prevent decompensation resulted in sample-size estimates 3- to 4-fold lower than using a binary composite endpoint.
CONCLUSIONS:
Compared to traditional binary outcomes, the use of ordinal outcomes in trials of cirrhosis decompensation may provide more power and thus may require a smaller sample size.
Phase 3 trials in cirrhosis and portal hypertension (PH) have conventionally looked at a single primary endpoint or a composite endpoint so that the formal statistical analysis compares a binary outcome between study groups. Stopping rules are usually based on occurrence of the primary endpoint, such that patients reaching the endpoint are withdrawn from the experimental treatment and offered alternative therapies.
Although in the past mortality or major outcomes linked to PH were most frequently the primary endpoints in randomized clinical trials (RCTs) performed in patients with cirrhosis, it is now recognized that endpoints are different depending on disease stage of cirrhosis (compensated or decompensated). Decompensation is defined as the development of variceal hemorrhage, ascites, encephalopathy, or jaundice, and survival is significantly lower than in the compensated stage (median survival, 2 years vs. 12 years, respectively).(1) Therefore, the endpoint for RCTs should be based on baseline characteristics of the target population and the trial objectives.(2) In RCTs targeting patients with compensated cirrhosis (CC), who have an expected very long survival, it would be almost impossible to demonstrate an effect on death, and therefore preventing decompensation would be a more appropriate endpoint. Such an endpoint could be defined by occurrence of any new decompensating event. However, stopping the trial at the occurrence of any first event would hamper the assessment of treatment efficacy on specific events. For example, a treatment that reduces portal pressure would be considered a failure if a patient without esophageal varices (EV) at inclusion developed varices during the study. However, stopping the trial at this point would hamper assessing the effect of the treatment on other more significant events, such as ascites. This has already been demonstrated in a recent RCT that showed that nonselective beta-blockers (NSBBs) significantly prevent the development of ascites in CC (but not varices or variceal enlargement),(3) whereas in an older RCT,(4) treatment withdrawal at the time of development of varices hampered the assessment of treatment effect on other events, such as ascites.
Based on these considerations, a recent position paper reporting on a consensus of experts regarding clinical trial design in PH(2) proposed ordinal outcomes(5) to allow more granularity in the assessment of treatment effect on several events ordered hierarchically according to increasing probability of death in both CC and decompensated cirrhosis.
Using ordinal outcomes would allow investigators to assess the effects of different therapies (e.g., active vs. placebo) across a relevant range of outcomes with ethically consistent stopping rules.(6) Because each of the substages of cirrhosis has a different predominant pathophysiological mechanism,(7-10) determining the effect of a therapy at a given stage would also be key in elucidating its most likely mechanism of action. Additionally, analysis based on ordinal outcomes would require a smaller sample size.(5,6,11,12) In this context, traditional outcomes, such as mortality, would only constitute a part of the spectrum.(2)
Analysis of ordinal outcomes has been frequently simplified by reducing it to a dichotomous outcome by (arbitrarily) setting a cutpoint distinguishing good from bad outcomes. However, this practice is not recommended—it results in loss of information and may be misleading because the classification of an outcome as good or bad strongly depends on baseline prognostic conditions. In fact, a bad outcome for a patient with a very favorable baseline prognosis may still be good for a patient with a baseline poor prognosis.(6)
Ordinal outcomes have also been considered as nominal scales and analyzed by comparing proportions for each level of outcome by the chi-square (χ²) test. However, this approach is far from ideal because the ordinality of the data is ignored and it does not consider possible confounders.(5)
In this article, we illustrate the applicability and advantages of ordinal outcome analysis in the setting of clinical trials for the prevention of decompensation in patients with CC without varices using an example drawn from a cohort describing the clinical progression of cirrhosis.(13)
Patients and Methods
ESTABLISHING AN ORDINAL OUTCOME
An ordinal outcome consists of several grades ordered according to increasing severity. An example largely used in neurology is the Rankin Scale for Neurologic Disability.(14-16) In this scale, scores range from 0 to 6, with 0 indicating no symptoms, 1 no clinically significant disability, 2 slight disability, 3 moderate disability, 4 moderately severe disability, 5 severe disability, and 6 death.
For the scope of this study, ordinality of patient outcome was established based on a prospective inception cohort study of 494 consecutive patients with a new diagnosis of cirrhosis.(13) All the patients gave their informed consent to participate in that study at the time of enrollment. Most patients (377, or 76%) had CC defined as the absence of gastrointestinal (GI) bleeding, ascites, encephalopathy, or jaundice, whereas the remaining 117 patients were decompensated. The cohort was enrolled between 1981 and 1984 and prospectively followed up to death or 2014, without any dropouts. A total of 380 patients died during the follow-up (20-year survival estimate, 28.7%; 95% confidence interval [CI], 24.8-32.8), whereas the remaining 114 have been followed for a minimum of 22 years to a maximum of 28 years.
Results from this cohort have identified five prognostic stages of the disease according to clinical characteristics and increasing 5-year death risk (Fig. 1).
FIG. 1.
Schematic representation of 5-year transitioning rates across stages and to death for the whole series of patients. Arrows represent transitions, and the numbers close to each arrow are the relevant 5-year transition rates (%). Modified from an earlier work.(13)
We therefore defined ordinal outcome to be used in this study as follows: grade 1 = no disease progression (patients remained alive without varices and without having developed any decompensating event); grade 2 = development of EV; grade 3 = development of upper GI bleeding; grade 4 = development of a single nonbleeding decompensating event (ascites, encephalopathy, or jaundice); grade 5 = development of any second decompensating event, including bleeding; and grade 6 = death.
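The grading rule above can be written as a small function. This is a minimal sketch, assuming a hypothetical event-history representation (a collection of event names plus a death flag). The names `ordinal_grade` and `NONBLEEDING` and the event labels are illustrative only and are not taken from the study database. Counting distinct event types as separate decompensating events is also an assumption.

```python
# Hypothetical event labels; the study's actual data coding is not described here.
NONBLEEDING = {"ascites", "encephalopathy", "jaundice"}


def ordinal_grade(events, died=False):
    """Return the ordinal outcome grade (1-6) at the end of the observation window.

    events: iterable of event names observed during follow-up,
            e.g. ["varices", "ascites", "bleeding"]
    died:   True if the patient died within the window
    """
    events = set(events)
    if died:
        return 6  # grade 6 = death
    # Assumption: distinct event types count as separate decompensating events.
    decomp = events & (NONBLEEDING | {"bleeding"})
    if len(decomp) >= 2:
        return 5  # grade 5 = any second decompensating event, including bleeding
    if decomp & NONBLEEDING:
        return 4  # grade 4 = single nonbleeding decompensating event
    if "bleeding" in decomp:
        return 3  # grade 3 = upper GI bleeding alone
    if "varices" in events:
        return 2  # grade 2 = development of EV
    return 1      # grade 1 = no disease progression


# A patient who developed varices, then ascites and bleeding, and then died
print(ordinal_grade(["varices", "ascites", "bleeding"], died=True))
```

The grade reflects the worst state reached by the end of the window, so it can differ from the first event that occurred.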
PATIENTS
All 202 patients with CC and without EV at the first diagnosis of cirrhosis from the original cohort of 494 patients(13) constituted the cohort used in this study (details in Supporting Information).
To show the potential advantages of using ordinal outcomes, we emulated an RCT for prevention of disease progression in patients who, at baseline, had CC without EV. We therefore divided our study cohort into two subgroups based on platelet count (PLT) at a cutoff of 150 × 10⁹/L, with the experimental treatment group being those with PLT > 150 × 10⁹/L and the control group being those with PLT ≤ 150 × 10⁹/L (Supporting Table S1). We chose PLT because it is frequently reported as a prognostic indicator in cirrhosis,(1) and the 150 × 10⁹/L threshold has been shown to be associated with clinically significant portal hypertension (CSPH) and increased risk of development of EV and decompensation.(17,18) To make this emulation realistic, 60-month outcomes are reported.
STATISTICAL ANALYSIS
Statistical methods for comparing ordinal outcomes in different groups of patients have been described elsewhere.(5,6,19) Specific software is included in most statistical packages under the heading of “ordinal logistic analysis.”
Briefly, the main statistical models used in this context are extensions of the classic logistic model for ordinal response outcome.(12) They account for the category order of the outcome by grouping categories that are contiguous on the ordinal scale. For example, the (proportional odds) cumulative logistic model applies the logit transformation, logit(p) = log[p/(1 − p)], to the cumulative probabilities of the ordinal outcome. In our example, it would consider the probability of no disease progression (grade 1 vs. any progression grades 2-6), then the probability of no disease progression or development of EV (grades 1 and 2) versus major progressions (grades 3-6), and so on. The model simultaneously uses all these cumulative probabilities and results in a common odds ratio (OR) for disease progression for one group compared to the other (Supporting Table S2, left).
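In symbols, one common way to write the proportional odds model for a six-grade outcome $Y$ and a binary group indicator $x$ is sketched below. The notation ($\alpha_j$, $\beta$, $x$) is introduced here for illustration and is not taken from the cited references. Sign conventions vary across software, and modeling $P(Y \le j)$ instead of $P(Y > j)$ simply flips the sign of the coefficients.

```latex
\operatorname{logit} P(Y > j \mid x)
  = \log \frac{P(Y > j \mid x)}{P(Y \le j \mid x)}
  = \alpha_j + \beta x ,
  \qquad j = 1, \dots, 5,
\qquad\text{so that}\qquad
\mathrm{OR} = e^{\beta}\ \text{is the same at all five cutpoints.}
```

Each cutpoint has its own intercept $\alpha_j$, whereas the group effect $\beta$ is shared across cutpoints. This shared effect is the proportionality assumption.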
Other approaches not requiring proportionality of ORs have been developed.(12) Among these, a widely used approach is based on ORs of progressing toward subsequent grades in the ordinal scale on the condition that the previous grade has been reached and hence the cutpoints in the scale are considered as conditional incremental cutpoints. This “continuation ratio model” is mostly used when proportionality of ORs is not verified (Supporting Table S2, right). The continuation ratio model requires specific computational rearrangement of the original data sets as reported elsewhere.(5)
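The data rearrangement mentioned above can be illustrated with a short sketch. It shows one common formulation (forward continuation ratios) and is not necessarily the exact procedure described in the cited reference. The function name `expand_continuation_ratio` and the record layout are hypothetical.

```python
def expand_continuation_ratio(records, n_grades=6):
    """Expand (patient_id, group, grade) rows for a forward continuation-ratio model.

    Each patient contributes one binary row per cutpoint j that they reached
    (j < n_grades): progressed = 1 if the final grade is above j, else 0.
    A patient stops contributing rows at the grade where they remain.
    """
    rows = []
    for pid, group, grade in records:
        for j in range(1, min(grade, n_grades - 1) + 1):
            rows.append({
                "id": pid,
                "group": group,
                "cutpoint": j,
                "progressed": int(grade > j),
            })
    return rows


# Illustrative records only (not study data): (id, group, final grade)
example = [("a", 0, 1), ("b", 1, 3), ("c", 0, 6)]
for row in expand_continuation_ratio(example):
    print(row)
```

A binary logistic regression fitted to the expanded rows, with one intercept per cutpoint, then estimates the odds of progressing beyond grade j among patients who reached grade j.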
In the present study, we used the proportional odds model after having checked proportionality of odds across response categories by the likelihood ratio test of proportionality of ORs. All analyses were performed with Stata (version 14; StataCorp LP, College Station, TX) unless otherwise specified.
COMPARISON OF ORDINAL OUTCOMES WITH TRADITIONAL BINARY OUTCOMES
In our emulated trial, disease progression would be the development of EV, decompensation, or death. To show the potential advantages of using ordinal outcomes, we compared the outcome between the two study groups at different observation times using (1) the ordinal outcome analysis as described above, (2) its dichotomization at the level of variceal bleeding (VB; no progression or development of varices vs. any decompensation or death), and (3) a composite endpoint defined by time to any decompensating event or death, also corresponding to the dichotomization of the ordinal outcome at the level of VB.
The Wald test, based on the coefficient of PLT/group effect and its standard error, has been computed for each analysis to compare efficiency of the different approaches.(11)
SAMPLE-SIZE CALCULATION
Methods for sample-size calculation using ordinal outcomes are available,(12,19,20) and it has been consistently shown that the required sample size for clinical trials is lower when using an ordinal outcome scale than its dichotomization in a binary outcome.(5,6) In fact, by dichotomizing an ordinal outcome, part of the outcome events will be considered as “nonevent,”(6) thus reducing the total number of events available for the relevant comparison as well as information given by the ordinal outcome.
To illustrate how the required sample size may differ when planning an RCT in patients with CC using ordinal outcomes compared to traditional binary outcomes, we calculated the required sample size hypothesizing the same number of observed events using an ordinal outcome,(20) a dichotomized ordinal outcome,(6) and a composite (time-to-event) endpoint.(21)
Results
DISEASE PROGRESSION AND 60-MONTH ORDINAL OUTCOME
The flow of patients across disease stages throughout the 60-month follow-up period and their ordinal outcome grade at this time point are shown in Fig. 2.
FIG. 2.
Schematic representation of patient flow across disease stages, resulting in the final (ordinal) outcome at 60 months. Light boxes represent the occurring events. Bold boxes represent the (ordinal) outcome 60 months after the diagnosis of CC without EV. Grades refer to the corresponding grade of the ordinal outcome: grade 1 = no progression; grade 2 = development of varices; grade 3 = bleeding; grade 4 = nonbleeding decompensation; grade 5 = any second decompensation; and grade 6 = death. ∥Five more patients developed varices after other events; Ⱶ4 more patients bled after other events; *2 after developing varices; ¶2 after developing varices; ǂ1 developed varices and bled. Abbreviation: decomp, decompensation.
No Progression (Grade 1)
A total of 129 patients were still alive and free of both EV and decompensating events.
Development of EV (Grade 2)
Although 49 patients developed EV in the 60-month observation period, 44 of those patients developed varices as the first progression event, and 5 patients developed varices after other nonbleeding decompensating events. Of the 44 patients with varices as the first event, 1 patient developed ascites and bleeding and eventually died within 60 months. The remaining 43 patients were still alive, with varices and no other decompensating events, at 60 months.
Upper GI Bleeding (Grade 3)
Upper GI bleeding occurred in 7 patients, of which 3 bled as the first clinical event and 4 bled after having developed another decompensating event. Of the latter, 3 patients had EV on emergency endoscopy, and 1 patient had no varices and a source could not be identified. None of the 3 patients who bled as the first clinical event had varices at endoscopy, and the source of bleeding was duodenal ulcer in 1 patient, erosions in another, and undefined in the remaining patient. Because 1 of these 3 patients died at the time of hemorrhage, only 2 patients were in this outcome grade at 60 months.
First Nonbleeding Decompensating Event (Grade 4)
A total of 21 patients developed a first nonbleeding event (11 ascites, 6 encephalopathy, and 3 jaundice), and 1 patient developed ascites after development of EV. Because 8 of the 21 patients subsequently developed a second decompensating event (stage 5) with 6 deaths and 6 dying without having developed a further event, 7 patients are considered to belong to this grade of the ordinal outcome.
Any Second Decompensating Event (Grade 5)
A total of 11 patients developed a second decompensating event, of which 8 were from stage 4 (3 after bleeding), and 3 developed two decompensating events at the same time. Because 6 of the 11 patients died, 5 patients are in this grade of the outcome.
Death (Grade 6)
Overall, there were 16 deaths in the 60-month observation period: 3 in stage 1 (non-liver-related causes), 2 at the first bleeding episode, 6 in stage 4, and 6 in stage 5 (Fig. 2). Causes of death were upper digestive bleeding in 2, liver failure in 4, sepsis in 2, hepatocellular carcinoma in 3, extrahepatic neoplasia in 1, hematological disease in 1, and human immunodeficiency virus in 3.
60-Month Ordinal Outcome
As a consequence of disease progression, 60-month ordinal outcome from grades 1 to 6 for the 202 included patients was 129, 43, 2, 7, 5, and 16, respectively (Fig. 2). Corresponding values of ordinal outcome at 12, 24, 36, 48, and 60 months are reported in Table 1 for comparison.
TABLE 1.
Number of Patients With Each Grade of the Ordinal Outcome at 12, 24, 36, 48, and 60 Months Compared to the First New Event Denoting Disease Progression in an Inception Cohort of 202 Patients With CC Without Varices at Diagnosis(13)
| Outcome Grade* (observation time, months) | 12, Ord | 12, Pro | 24, Ord | 24, Pro | 36, Ord | 36, Pro | 48, Ord | 48, Pro | 60, Ord | 60, Pro |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 = no progression | 194 | 194 | 160 | 160 | 147 | 147 | 142 | 142 | 129 | 129 |
| 2 = varices | 3 | 3 | 25 | 25 | 34 | 35 | 37 | 38 | 43 | 44 |
| 3 = GI bleeding | 0 | 0 | 1 | 2 | 1 | 2 | 1 | 2 | 2 | 3 |
| 4 = first nonbleeding decompensation | 4 | 4 | 10 | 13 | 7 | 16 | 7 | 16 | 7 | 20 |
| 5 = any second decompensation | 0 | 0 | 1 | 1 | 5 | 1 | 5 | 3 | 5 | 3 |
| 6 = death | 1 | 1 | 5 | 1 | 8 | 1 | 10 | 1 | 16 | 3 |
| Total patients | 202 | 202 | 202 | 202 | 202 | 202 | 202 | 202 | 202 | 202 |
| Total events | 8 | 8 | 42 | 42 | 55 | 55 | 60 | 60 | 73 | 73 |
Ord = ordinal outcome at the end of the observation period; Pro = first new event denoting disease progression during the observation period.
60-Month Ordinal Outcome Versus First New Event
Figure 2 clearly shows that patients pass through different clinical stages to reach their final condition at 60 months. There is therefore a substantial difference between 60-month ordinal outcome and the first event denoting disease progression (Table 1).
ORDINAL VERSUS BINARY OUTCOME
There were 101 patients with PLT ≤ 150 × 10⁹/L (which in our emulated RCT corresponded to the control group) and 101 with PLT > 150 × 10⁹/L (treatment group). A graphical representation of the ordinal outcome in the two study groups, at 24 and 60 months, is shown in Fig. 3.
FIG. 3.
Ordinal outcome at 24 and 60 months for patients with PLT ≤ 150 × 10⁹/L and PLT > 150 × 10⁹/L, respectively. The ordinal outcome was significantly worse in patients with lower PLT at either 24 months or 60 months. Grades refer to the corresponding grade of the ordinal outcome: grade 1 = no progression; grade 2 = varices; grade 3 = bleeding; grade 4 = single nonbleeding decompensation; grade 5 = any second decompensation; and grade 6 = death.
To compare ordinal outcome of the two groups, we first performed a basic analysis using the χ² test for the 2 by 6 table (5 degrees of freedom), which compares multinomial categorical outcomes without accounting for ordinality of categories, and found no significant differences at any time point (Table 2). In contrast, the proportional odds cumulative logistic model (ordinal logistic) showed a significant difference between the two groups according to ordinal outcome from 24 months (OR, 0.48; 95% CI, 0.24-0.96) to 60 months.
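The overall chi-square comparison at 60 months can be recomputed from the published counts. This is a minimal sketch using only the Python standard library. The helper names `pearson_chi2` and `chi2_sf_df5` are ours, and the closed-form tail probability applies only to 5 degrees of freedom.

```python
import math

# 60-month counts for grades 1-6, copied from Table 2
low_plt = [56, 24, 1, 4, 3, 13]   # PLT <= 150 (emulated control group)
high_plt = [73, 19, 1, 3, 2, 3]   # PLT > 150 (emulated treatment group)


def pearson_chi2(row_a, row_b):
    """Pearson chi-square statistic for a 2 x k contingency table."""
    n_a, n_b = sum(row_a), sum(row_b)
    n = n_a + n_b
    stat = 0.0
    for a, b in zip(row_a, row_b):
        col = a + b
        for obs, n_row in ((a, n_a), (b, n_b)):
            expected = n_row * col / n
            stat += (obs - expected) ** 2 / expected
    return stat


def chi2_sf_df5(x):
    """Upper-tail probability of a chi-square variable with exactly 5 df."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3))


chi2 = pearson_chi2(low_plt, high_plt)
p_value = chi2_sf_df5(chi2)
print(f"chi-square = {chi2:.2f}, P = {p_value:.2f}")
```

The statistic is unchanged if the six grades are listed in any other order, which is exactly why this test cannot exploit the ordinality of the outcome.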
TABLE 2.
Comparison of Disease Progression Assessed by an Ordinal Outcome at 12, 24, 36, 48, and 60 Months for Patients With CC Without EV and PLT ≤ 150 × 10⁹/L or PLT > 150 × 10⁹/L, Respectively
| Outcome grade (observation time in months; PLT, × 10⁹/L) | 12, ≤150 | 12, >150 | 24, ≤150 | 24, >150 | 36, ≤150 | 36, >150 | 48, ≤150 | 48, >150 | 60, ≤150 | 60, >150 |
|---|---|---|---|---|---|---|---|---|---|---|
| No. of patients | 101 | 101 | 101 | 101 | 101 | 101 | 101 | 101 | 101 | 101 |
| 1 = No prog | 96 | 98 | 74 | 86 | 66 | 81 | 64 | 78 | 56 | 73 |
| 2 = varices | 2 | 1 | 16 | 9 | 20 | 14 | 21 | 16 | 24 | 19 |
| 3 = GI bleeding | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 |
| 4 = 1st decomp | 3 | 1 | 7 | 3 | 5 | 2 | 5 | 2 | 4 | 3 |
| 5 = 2nd decomp | 0 | 0 | 1 | 0 | 4 | 1 | 3 | 2 | 3 | 2 |
| 6 = death | 0 | 1 | 3 | 2 | 6 | 2 | 8 | 2 | 13 | 3 |

| Statistical analysis (both groups combined) | 12 months | 24 months | 36 months | 48 months | 60 months |
|---|---|---|---|---|---|
| Total events | 8 | 42 | 55 | 60 | 73 |
| Total patients | 202 | 202 | 202 | 202 | 202 |
| χ²; P (df = 5)† | 2.35; 0.5 | 6.7; 0.24 | 8.7; 0.12 | 8.14; 0.15 | 9.4; 0.09 |
| OR (ordinal logistic)* | 0.59 (0.14-2.55) | 0.48 (0.24-0.96) | 0.45 (0.24-0.84) | 0.49 (0.26-0.89) | 0.45 (0.25-0.79) |
| Wald test | 0.70 | 2.06 | 2.48 | 2.31 | 2.74 |
| OR (binary logistic)* | 0.65 (0.11-4.03) | 0.51 (0.18-1.45) | 0.36 (0.13-0.98) | 0.39 (0.15-1.01) | 0.37 (0.16-0.86) |
| Wald test | 0.45 | 1.25 | 2.01 | 1.94 | 2.31 |
| Wald test ordinal/Wald test binary logistic regression | 1.56 | 1.65 | 1.23 | 1.19 | 1.19 |
| Cox HR (binary composite outcome)‡ | 0.66 (0.11-3.95) | 0.54 (0.20-1.45) | 0.42 (0.16-1.09) | 0.45 (0.18-1.11) | 0.43 (0.19-0.94) |
| Wald test | 0.46 | 1.23 | 1.79 | 1.74 | 2.12 |
| Wald test ordinal/Wald test Cox model regression | 1.52 | 1.67 | 1.39 | 1.33 | 1.29 |
ORs for progression in patients with higher versus lower PLTs are shown for an ordinal outcome in six grades (by ordinal logistic analysis) and for a dichotomization of the ordinal outcome at grade 2 (≤2 vs. >2, by binary logistic analysis); HRs by the Cox model are also reported with a composite outcome defined by death or any decompensating event, corresponding to grades ≤2 versus >2 of the ordinal outcome.
Bold delimited area includes outcome grades from 3 to 6.
*95% CIs are reported in parentheses.
†Overall chi-square test with 5 df (this test does not account for ordinality of outcome).
‡HR for the composite outcome defined by death or development of bleeding, first nonbleeding decompensation, or any second decompensation by the Cox model.
Abbreviation: df, degrees of freedom.
To show how dichotomization of an ordinal outcome may dilute the difference between two patient groups, we dichotomized the above-shown ordinal outcome at the level of VB such that no disease progression or development of varices were counted as good outcomes, and bleeding, ascites, encephalopathy, jaundice, or death were counted as bad outcomes (Table 2, bold delimited area). Although the OR for ordinal outcome was significant after 2 years, the traditional binary logistic analysis for the dichotomized outcome was only barely significant at 36 months, but not confirmed at 48 months, and was clearly significant only at 60 months (Table 2). Moreover, the Wald test was from 1.19 to 1.65 times higher for the ordinal regression than for the binary logistic regression for the dichotomized outcome, denoting higher efficiency with the ordinal approach.
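The binary logistic result at 60 months can be checked by hand from the dichotomized counts in Table 2. The sketch below relies on the fact that, for a logistic regression with a single binary covariate, the coefficient equals the log OR of the 2 × 2 table and its standard error is the square root of the summed reciprocal cell counts. Variable names are ours.

```python
import math
from statistics import NormalDist

# 60-month counts from Table 2, dichotomized at grade 2 (<=2 vs. >2)
bad_high, good_high = 9, 92   # PLT > 150: grades 3-6 vs. grades 1-2
bad_low, good_low = 21, 80    # PLT <= 150: grades 3-6 vs. grades 1-2

# Log odds ratio (higher vs. lower PLT) and its standard error
log_or = math.log((bad_high / good_high) / (bad_low / good_low))
se = math.sqrt(1 / bad_high + 1 / good_high + 1 / bad_low + 1 / good_low)

# Wald statistic: coefficient divided by its standard error
wald_z = abs(log_or / se)

# 95% confidence interval on the OR scale
z975 = NormalDist().inv_cdf(0.975)
ci_low = math.exp(log_or - z975 * se)
ci_high = math.exp(log_or + z975 * se)

print(f"OR = {math.exp(log_or):.2f} ({ci_low:.2f}-{ci_high:.2f}), Wald = {wald_z:.2f}")
```

The dichotomization treats 43 patients who developed varices as if nothing had happened, which is one way to see why its Wald statistic is smaller than that of the ordinal model.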
To explore whether analysis of time to outcome would be a better approach in this situation, we performed a Cox analysis with the dependent variable defined by the composite outcome of death or development of bleeding, any first nonbleeding decompensation, or any second decompensation, whichever occurred first (excluding varices). The analysis was repeated with observation times of 12, 24, 36, 48, and 60 months, respectively, and the corresponding hazard ratios (HRs) are reported in Table 2. Five years of follow-up were needed to detect the significant difference between the two study groups already evident after 2 years by using the ordinal outcome (Table 2). The Wald test was from 1.29 to 1.67 times higher for the ordinal regression than for the Cox regression, indicating that the ordinal approach is more efficient.
In order to better illustrate the proportional odds assumption and the OR interpretation in the ordinal logistic regression, we focused on comparison of the ordinal outcome of disease progression of the two study groups at the 60-month time point and applied a generalized ordered logistic regression (Table 3). The OR for any progression from stage 1 (grade 1 of the ordinal outcome) in the treatment group versus the control group was 0.48, whereas the OR for dying (grade 6 of the ordinal outcome) versus remaining alive at any clinical stage (grades 1-5) was 0.21. ORs for progression from each of the ordinal outcome grades are shown in sequence in Table 3. The OR from the ordinal logistic proportional odds model estimates a common OR over all these cutpoints and can be interpreted as a summary measure of the effect of the grouping variable: treatment in a clinical trial (PLT in our example). The likelihood ratio test of proportionality of ORs across response categories was not significant (χ² with 4 degrees of freedom = 2.65; P = 0.6173), denoting the absence of a statistically significant difference among these ORs.
TABLE 3.
ORs of Disease Progression by Treatment Group in Our Emulated Trial, Analyzed for Each Increasing Grade in the Ordinal Outcome Progression Scale by Generalized Ordered Logistic Regression (Not Assuming Proportional Odds)
| Treatment Response According to the Ordinal Outcome Grade (Y) | Odds | Clinical Translation of Odds | OR Comparing Patients With PLT > 150 × 10⁹/L vs. PLT ≤ 150 × 10⁹/L (95% CI) |
|---|---|---|---|
| 1 = no progression | — | — | — |
| 2 = varices | P(Y > 1)/P(Y = 1) | Any disease progression/no progression | 0.48 (0.27, 0.86)* |
| 3 = GI bleeding | P(Y > 2)/P(Y ≤ 2) | Decompensation or death/remaining compensated | 0.37 (0.16, 0.86)* |
| 4 = first decompensation | P(Y > 3)/P(Y ≤ 3) | First nonbleeding decompensation or death/remaining compensated or with bleeding alone | 0.35 (0.15, 0.83)* |
| 5 = any second decompensation | P(Y > 4)/P(Y ≤ 4) | Any second decompensation or death/remaining compensated or with any first decompensation | 0.28 (0.10, 0.79)* |
| 6 = death | P(Y > 5)/P(Y ≤ 5) | Death/alive | 0.21 (0.06, 0.75)* |
Chi-square (df = 4) = 2.65; P value = 0.6173. The lack of statistical significance among ORs for each increasing grade of the ordinal outcome confirms that the proportional odds model may be applied in this case.
Abbreviation: P, probability.
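The cutpoint-specific ORs can be illustrated directly from the 60-month counts in Table 2. The sketch below computes the empirical odds of P(Y > j)/P(Y ≤ j) in each group and takes their ratio. It is not the Stata procedure used in the study, although with a single binary grouping variable these empirical ratios agree with the point estimates in Table 3 to two decimals. The function name is ours.

```python
# 60-month counts for grades 1-6, copied from Table 2
control = [56, 24, 1, 4, 3, 13]    # PLT <= 150
treatment = [73, 19, 1, 3, 2, 3]   # PLT > 150


def cumulative_odds_ratios(treated, ctrl):
    """Empirical OR of P(Y > j)/P(Y <= j), treated vs. control, for each cutpoint j."""
    ors = []
    for j in range(1, len(treated)):
        t_above, t_below = sum(treated[j:]), sum(treated[:j])
        c_above, c_below = sum(ctrl[j:]), sum(ctrl[:j])
        ors.append((t_above / t_below) / (c_above / c_below))
    return ors


for j, or_j in enumerate(cumulative_odds_ratios(treatment, control), start=1):
    print(f"Y > {j} vs. Y <= {j}: OR = {or_j:.2f}")
```

The proportional odds model replaces these five separate ratios with a single common OR, which is reasonable here because the test of proportionality was not significant.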
ORDINAL VERSUS COMPOSITE ENDPOINT
In clinical trials in cirrhosis, the binary outcomes used rarely correspond to the dichotomization of a previously defined ordinal outcome. More frequently, it is the presence or absence of a specific clinical event or a composite endpoint. A composite endpoint is defined by occurrence of the first among several relevant events: whichever of these events occurs first, it leads to termination from the trial of patients experiencing the event (if required by predefined stopping rules). As an example, in a trial of a treatment for prevention of progression of PH in cirrhosis, in a cohort like the one described above a meaningful composite endpoint would be the occurrence of EV or any decompensating event or death, whichever occurs first. Cumulative incidence of the composite endpoint as defined above and its components in our two study groups are reported in Fig. 4 by a competing risks analysis; the corresponding number of events is reported in Table 1 in the “Pro” columns. Note that these incidences consider only the first occurring events; therefore, if a patient experienced bleeding or other decompensating events after the occurrence of varices, these would not be accounted for in the analysis represented in the figure, whereas they would be accounted for by an ordinal outcome analysis (Fig. 3). Moreover, the composite endpoint analysis gives the same importance to all the events considered, whereas the ordinal outcome will consider the increasing probability of death of each event.
FIG. 4.
Crude cumulative incidence (accounting for competing risks) of the events (varices, GI bleeding, any first nonbleeding decompensating event, any second decompensating event, or death) defining the composite endpoint in a hypothetical trial for the prevention of disease progression. The two comparison groups are represented by patients with PLT ≤ 150 × 10⁹/L or PLT > 150 × 10⁹/L, respectively, in our cohort. Occurrence of any of the components of the composite endpoint, whichever occurs first, would lead to termination of the study for patients experiencing the event.
It may be noted that patients with PLT ≤ 150 × 10⁹/L had a consistently progressive increase of any relevant event across all the observation times compared to patients with PLT > 150 × 10⁹/L. However, if a hypothetical treatment would reduce the incidence of any event developing after varices, this effect would not be uncovered in a study based on a composite endpoint, including the development of varices, whereas it would be captured in a study based on an ordinal outcome.
SAMPLE-SIZE REQUIREMENT IN A REAL-WORLD CLINICAL TRIAL FOR THE PREVENTION OF DISEASE PROGRESSION IN CC WITHOUT VARICES
In clinical trials, sample-size estimation is based on the total number of expected events and a hypothesis of event reduction under the experimental treatment. However, the number of expected events is different according to the type of outcome chosen.
In our cohort, the total number of first events was 8, 42, 55, 60, and 73 at 12, 24, 36, 48, and 60 months, respectively (Table 1), and the relevant baseline risks at these time points were therefore 0.04, 0.21, 0.27, 0.30, and 0.36 (V+ columns), respectively (Table 4).
TABLE 4.
Twelve- to 60-Month Outcome of 202 Patients With CC Without EV and Sample Sizes Required to Show a 50%, 34%, and 20% Risk Reduction Using an Ordinal Outcome or Two Different Binary (Time-to-Event) Composite Outcomes Including or Not Development of Varices*
| Outcome (V+/−)†, n of patients by follow-up in months | 12, V+ | 12, V− | 24, V+ | 24, V− | 36, V+ | 36, V− | 48, V+ | 48, V− | 60, V+ | 60, V− |
|---|---|---|---|---|---|---|---|---|---|---|
| Total patients | 202 | 202 | 202 | 202 | 202 | 202 | 202 | 202 | 202 | 202 |
| 1 = no progression | 194 | 197 | 160 | 185 | 147 | 181 | 142 | 179 | 129 | 172 |
| 2 = varices | 3 | | 25 | | 35 | | 38 | | 44 | |
| 3 = bleeding | 0 | | 2 | | 2 | | 2 | | 3 | |
| 4 = 1st decomp | 4 | | 13 | | 16 | | 16 | | 20 | |
| 5 = 2nd decomp | 0 | | 1 | | 1 | | 3 | | 3 | |
| 6 = death | 1 | | 1 | | 1 | | 1 | | 3 | |
| Grades 3 to 6 combined (bold delimited area) | | 5 | | 17 | | 20 | | 22 | | 29 |
| Total events | 8 | 5 | 42 | 17 | 55 | 20 | 60 | 22 | 73 | 29 |
| Baseline risk | 0.04 | 0.025 | 0.21 | 0.08 | 0.27 | 0.10 | 0.3 | 0.11 | 0.36 | 0.14 |

| Total sample size required (α = 0.05, power 0.80) | 12 months | 24 months | 36 months | 48 months | 60 months |
|---|---|---|---|---|---|
| 50% baseline risk reduction | | | | | |
| Composite endpoint, V+‡,§ | 2,292 | 385 | 285 | 250 | 198 |
| Composite endpoint, V−‡,§ | 3,705 | 1,115 | 879 | 793 | 565 |
| Ordinal outcome∥ | 2,118 | 354 | 258 | 228 | 176 |
| 34% baseline risk reduction | | | | | |
| Composite endpoint, V+‡,§ | 5,456 | 892 | 652 | 568 | 442 |
| Composite endpoint, V−‡,§ | 8,838 | 2,638 | 2,074 | 1,869 | 1,322 |
| Ordinal outcome∥ | 5,314 | 864 | 620 | 548 | 418 |
| 20% baseline risk reduction | | | | | |
| Composite endpoint, V+‡,§ | 17,031 | 2,726 | 1,976 | 1,713 | 1,317 |
| Composite endpoint, V−‡,§ | 27,627 | 8,199 | 6,432 | 5,789 | 4,075 |
| Ordinal outcome∥ | 16,946 | 2,692 | 1,912 | 1,682 | 1,264 |
Differences in sample-size estimates calculated by log-rank (binary) versus ordinal outcome are more marked when varices (without other decompensating event) are considered a nondecompensating event (bolded) and patients are continued in the trial until an overt decompensating event develops.
The ordinal outcome includes six categories: grade 1 = no progression; grade 2 = development of varices; grade 3 = bleeding; grade 4 = nonbleeding decompensation; grade 5 = any second decompensation; and grade 6 = death. The two composite endpoints consider the development of varices being or not being part of the outcome (bold delimited areas denote when varices are not considered a negative outcome).
† V+/− denotes inclusion or exclusion of development of varices in the composite endpoint.
‡ Sample size was estimated for risk reductions of 50%, 34%, and 20% (RRs, 0.5, 0.66, and 0.80) with the sample-size software PASS v.11, using the log-rank test (Freedman) procedure. Input parameters were the proportions of subjects free of the events of interest at the end of follow-up in the two arms, assuming none lost to follow-up. This corresponds to slightly different values of HR and OR according to baseline risks (details in the Supporting Information Appendix).
§ Sample size needed for a cumulative risk comparison, as for the composite endpoint in the example.
∥ Sample size needed for an ordinal outcome (or ordered categorical variable) according to the test for two ordered categorical variables procedure in PASS v.11. As input parameters, we used the proportion of subjects in the six categories at the end of follow-up, assuming none lost to follow-up, and the natural logarithm of the OR for treatment effect.
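The event totals in Table 4 follow from collapsing the six ordinal grades at one of two cutpoints. The short Python sketch below illustrates that bookkeeping with the grade counts transcribed from the table; the variable and function names are illustrative and not taken from the study.

```python
# Patients per grade (1..6) at each successive time point of Table 4.
# Every row sums to the 202 patients of the cohort.
GRADE_COUNTS = [
    [194, 3, 0, 4, 0, 1],
    [160, 25, 2, 13, 1, 1],
    [147, 35, 2, 16, 1, 1],
    [142, 38, 2, 16, 3, 1],
    [129, 44, 3, 20, 3, 3],
]


def events_at_cutpoint(counts, first_event_grade):
    """Number of patients at or above the grade that counts as an event."""
    return sum(counts[first_event_grade - 1:])


# V+: development of varices (grade 2) already counts as an event.
events_v_plus = [events_at_cutpoint(c, 2) for c in GRADE_COUNTS]
# V-: only bleeding or worse (grade 3 and above) counts as an event.
events_v_minus = [events_at_cutpoint(c, 3) for c in GRADE_COUNTS]

cohort_size = sum(GRADE_COUNTS[0])
risk_v_plus = [e / cohort_size for e in events_v_plus]
risk_v_minus = [e / cohort_size for e in events_v_minus]

print(events_v_plus)   # [8, 42, 55, 60, 73]
print(events_v_minus)  # [5, 17, 20, 22, 29]
```

Both binary endpoints use a single cutpoint of the same counts, whereas the ordinal outcome keeps all six grades.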
In CC, designing an RCT to prevent any disease progression would require a composite binary endpoint considering as “nonevent” all the patients remaining with CC and free of varices at the end of follow-up and as “event” all patients developing varices, bleeding, ascites, encephalopathy, or jaundice. This would require stopping the study at any of these events (according to predefined stopping rules), and, because in our cohort varices were the first event in approximately 60% of patients with disease progression (Table 1), the study would be underpowered to assess any effect on decompensation. Baseline risk at 60 months would be 0.36, and the sample size required to demonstrate a 50% time-related reduction in risk would be 198 (Table 4), whereas using the ordinal outcome under the proportional ORs assumption,(21) it would be somewhat lower at 176. Note that, in this case, using the ordinal outcome would only have the advantage of adding severity of outcome to the efficacy assessment. In fact, if the development of varices was a stopping rule, it would not be possible to assess the treatment effect after variceal development.
On the contrary, designing an RCT to prevent clinical decompensation, a more robust endpoint, would require a binary time-related composite outcome considering as “nonevent” nonprogression or development of EV and as “event” the development of any decompensating event or death. In this case, development of varices would not require stopping the trial and hence would allow for assessment of the treatment effect after the occurrence of varices. Using this binary outcome, the sample size required to demonstrate a 50% reduction in risk at 60 months would be 565—much larger than the previous example and much larger than the sample size of 176 required using the ordinal outcome, and this would be the case across all observation times (Table 4). Therefore, using the ordinal outcome would make this a feasible RCT that would not only assess the efficacy of the intervention to prevent decompensation, but would also test the effect of such intervention on the development of varices.
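As a rough check on the binary estimates quoted above, the following Python sketch implements the textbook Freedman approximation for the log-rank test. It is not the PASS v.11 implementation: the function name, the conversion of end-of-follow-up proportions into an HR under proportional hazards, and the rounding up of the total are assumptions made for illustration. Under these assumptions, a 50% risk reduction with baseline risks of 0.36 and 0.15 gives totals close to the 198 and 565 discussed in the text.

```python
from math import ceil, log
from statistics import NormalDist


def freedman_total_n(baseline_risk, relative_risk, alpha=0.05, power=0.80):
    """Approximate total sample size (1:1 allocation) for a log-rank test.

    baseline_risk: cumulative event risk in controls at end of follow-up.
    relative_risk: treated risk divided by control risk (0.5 = 50% reduction).
    Assumes proportional hazards and no loss to follow-up.
    """
    z = NormalDist().inv_cdf
    surv_control = 1.0 - baseline_risk
    surv_treated = 1.0 - baseline_risk * relative_risk
    # Hazard ratio implied by the two event-free proportions.
    hazard_ratio = log(surv_treated) / log(surv_control)
    # Freedman: number of events needed across both arms.
    events = (z(1 - alpha / 2) + z(power)) ** 2 * (
        (1 + hazard_ratio) / (1 - hazard_ratio)
    ) ** 2
    # Convert events to patients via the average event probability.
    mean_event_prob = 1.0 - (surv_control + surv_treated) / 2.0
    return ceil(events / mean_event_prob)


n_varices_as_event = freedman_total_n(0.36, 0.5)   # varices count as an event
n_decompensation = freedman_total_n(0.15, 0.5)     # varices do not count
print(n_varices_as_event, n_decompensation)
```

The lower the baseline risk, the smaller the average event probability in the denominator, which is why excluding varices from the binary endpoint inflates the required sample size.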
Table 4 reports the sample-size estimates for the two binary composite endpoints considered above, obtained with the log-rank test, and for the ordinal outcome, obtained with the method for ordered categorical variables.(20,21)
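For the ordinal outcome, a minimal Python sketch of the formula for ordered categorical data under proportional odds(20) with 1:1 allocation is given here: N = 12(z₁₋α/₂ + z₁₋β)² / [θ²(1 − Σ p̄ᵢ³)], where θ is the log OR and p̄ᵢ is the mean proportion of patients in grade i across the two arms. This is an illustration, not the PASS v.11 procedure. The function names are hypothetical, and deriving the OR from a halving of the risk of any progression is an assumption made here; the Supporting Information may define it differently. With the 60-month grade counts of Table 4, the result lands close to the 176 patients reported.

```python
from math import ceil, log
from statistics import NormalDist


def shift_by_odds_ratio(control_probs, odds_ratio):
    """Treated-arm grade probabilities under a proportional odds shift."""
    treated_tail = []  # P(grade > k) in the treated arm, k = 1..K-1
    tail = 1.0
    for p in control_probs[:-1]:
        tail -= p  # P(grade > k) in the control arm
        odds = odds_ratio * tail / (1.0 - tail)
        treated_tail.append(odds / (1.0 + odds))
    bounds = [1.0] + treated_tail + [0.0]
    return [bounds[i] - bounds[i + 1] for i in range(len(control_probs))]


def whitehead_total_n(control_probs, odds_ratio, alpha=0.05, power=0.80):
    """Approximate total sample size for an ordinal outcome, 1:1 allocation."""
    z = NormalDist().inv_cdf
    treated_probs = shift_by_odds_ratio(control_probs, odds_ratio)
    mean_probs = [(c + t) / 2 for c, t in zip(control_probs, treated_probs)]
    theta = log(odds_ratio)
    denominator = theta ** 2 * (1 - sum(p ** 3 for p in mean_probs))
    return ceil(12 * (z(1 - alpha / 2) + z(power)) ** 2 / denominator)


counts = [129, 44, 3, 20, 3, 3]  # grades 1..6 at 60 months (Table 4)
control = [c / sum(counts) for c in counts]

# Assumption: the OR is the one implied by halving the risk of any progression.
risk = 1 - control[0]
treated_risk = 0.5 * risk
odds_ratio = (treated_risk / (1 - treated_risk)) / (risk / (1 - risk))

n_ordinal = whitehead_total_n(control, odds_ratio)
print(n_ordinal)
```

The term Σ p̄ᵢ³ shrinks as patients spread over more grades, so an outcome with several populated categories needs fewer patients than a dichotomy of the same data.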
When the treatment effect is expected to differ across outcome grades because of pathophysiological mechanisms or information from previous trials, proportionality of ORs may not be assumed, and the sample-size estimation should be based on a nonproportional approach.(12) As an example, we considered the recently reported PREDESCI (prevent decompensation of cirrhosis in patients with clinically significant portal hypertension) RCT(3) showing that NSBBs significantly reduce the risk of decompensation without reducing the risk of developing EV. The sample-size estimate was based on the hypothesis of a very large risk reduction, from 25% to 10% (risk reduction, 0.6; relative risk [RR], 0.4 in treated patients), leading to an overall sample-size estimate of 210 patients, accounting for a 5% loss to follow-up, for the binary composite endpoint used (death or decompensation [ascites, bleeding, or encephalopathy] vs. no decompensation); development of varices alone was not considered as an endpoint. After a median follow-up of 37 months, risk of death or decompensation was 27% in control and 16% in treated patients (HR, 0.51; 95% CI, 0.26-0.97; RR, 0.60; OR, 0.52), whereas no effect was found on the development of varices.
To simulate a similar scenario, we took as baseline the 27% overall risk in the control group of the PREDESCI trial(3) and computed the required sample size to show a 0.6 risk reduction (from 27% to 10.8%; RR, 0.4). Using a binary composite endpoint such as the one used in the PREDESCI study (any decompensation or death vs. no decompensation) and accounting for a 5% loss to follow-up, the sample size required would be 216 patients. However, using an ordinal outcome, the sample size would be 108 when assuming a proportional OR (i.e., a similar risk reduction also for the development of varices) or 130 when rejecting the proportionality assumption (i.e., hypothesizing no effect on varices; details in Supporting Information). The same findings of a lower sample size requirement with ordinal outcome (compared to binary outcome) are obtained with lower risk-reduction assumptions of 0.4 (from 27% to 16.2%; RR, 0.6), where the sample size using a binary outcome would be 470 compared to ordinal outcome (284 assuming or 319 not assuming a proportional OR), or even with a risk-reduction assumption of 0.2 (from 27% to 21.6%; RR, 0.8), where the sample size using a binary outcome would be 2,052 compared to ordinal outcome (1,365 assuming or 1,260 not assuming proportionality).
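The three scenarios above start from the same 27% control risk, and each assumed risk reduction corresponds to a different RR, OR, and HR. The Python sketch below works through that arithmetic. The function name is hypothetical, and the HR is derived from the end-of-follow-up proportions under a proportional hazards assumption, which is an assumption of this illustration rather than a detail taken from the study.

```python
from math import log


def effect_measures(control_risk, risk_reduction):
    """Treated risk, RR, OR, and HR implied by a relative risk reduction."""
    treated_risk = control_risk * (1 - risk_reduction)
    relative_risk = treated_risk / control_risk
    odds_ratio = (treated_risk / (1 - treated_risk)) / (
        control_risk / (1 - control_risk)
    )
    # Assumes proportional hazards over the whole follow-up.
    hazard_ratio = log(1 - treated_risk) / log(1 - control_risk)
    return treated_risk, relative_risk, odds_ratio, hazard_ratio


scenarios = {rr_reduction: effect_measures(0.27, rr_reduction)
             for rr_reduction in (0.6, 0.4, 0.2)}

for reduction, (risk, rr, oratio, hr) in scenarios.items():
    print(f"reduction {reduction:.1f}: treated risk {risk:.3f}, "
          f"RR {rr:.2f}, OR {oratio:.2f}, HR {hr:.2f}")
```

For any risk reduction, the OR is farther from 1 than the RR, and the gap widens as the baseline risk grows, which is why the log OR supplied to an ordinal sample-size procedure has to be derived for each baseline risk.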
Discussion
Based on an inception cohort study of the clinical course of cirrhosis,(13) we investigated the applicability of an ordinal outcome to describe the incidence and severity of decompensation in patients free of EV. We have also simulated a clinical trial of a hypothetical treatment for the prevention of decompensation to assess how the use of an ordinal outcome may affect the feasibility of RCTs in this area.
Completeness and length of follow-up allowed for detailed description of the sequence of events along the entire follow-up for each patient. We used 60-month outcomes because, given the low expected decompensation rate,(1) this would be a reasonable time to observe a number of events sufficient to assess the feasibility of RCTs of potential treatments for the prevention of decompensation.
The ordinal outcome was based on five described prognostic stages.(13) These stages allow for defining six grades of outcome ordered according to increasing severity of disease from “no events” to “death.” The hierarchical arrangement of outcomes is precisely the basis of ordinal outcome analysis.
A first important result of this study is the granularity achieved in describing the sequence of events occurring in the 60-month follow-up period. Importantly, the first event was the development of varices in approximately 60% of patients, whereas in the remaining 40% a decompensating event occurred before varices, suggesting that a therapy that prevents only varices or only decompensation could be insufficient to prevent disease progression. Although it is well known that development of varices is associated with a significant increase in risk of decompensation, observing this association would require a longer observation time.(13) Interestingly, the only patients who died before decompensation did so for causes unrelated to liver disease, suggesting that the development of any decompensating event, but not the development of varices, is a clinically sound stopping rule in trials enrolling compensated patients.
The low incidence of decompensating events, on the other hand, confirms that an RCT aiming at the prevention of decompensation would require a large sample size and a long follow-up. According to the 60-month 0.15 risk observed in our cohort, 1,322 patients would have to be included in such a trial to show a 34% risk reduction (Table 4) and followed for this period, making the trial unfeasible. Notably, the low rate of decompensating events in this cohort is attributable to the fact that this was an inception cohort, in which all patients entered at the first diagnosis of cirrhosis and did not have varices. The rate of decompensating events would be predictably higher if patients with known cirrhosis and/or varices at baseline were considered.
Another important result of this study is having shown the applicability of ordinal outcomes to describe the clinical course of CC. In fact, the ordinal outcome used allowed for the comparison of two patient groups in our cohort divided by PLT at a cutoff point of 150 × 10⁹/L. Using the ordinal outcome disclosed a significantly increased risk of disease progression in patients with a PLT ≤ 150 × 10⁹/L, which was not apparent when a binary dichotomization of the outcome or a χ² test was used for all the outcome categories. Ordinal outcomes not only provide a new way of describing outcomes in CC, but allow further insight into prognostic indicators. In fact, the statistical analysis for ordinal outcomes is based on an ordinal logistic model, which allows for covariates and adjusted ORs used in prognostic studies.
A major result of this study is that the ordinal outcome we have assessed would be much more suitable for clinical trials for the prevention of decompensation than a binary outcome, including any decompensating event versus no decompensation. In fact, inclusion of the development of varices in the ordinal outcome markedly increases the study power and hence reduces the sample size required and also avoids terminating the study in patients developing varices without having developed other decompensating events. Based on the decompensation risk observed in our cohort, under the proportional OR assumption, for an RCT to show a 50% risk reduction, 565 patients with a binary composite endpoint and a 60-month follow-up period would be required, whereas a sample size of 176 patients would be enough for a study using ordinal outcomes instead. In a more feasible scenario, for a 36-month follow-up trial, the sample-size estimates would be 879 (binary outcome) versus 258 (ordinal outcome).
When proportionality of ORs cannot be assumed, the sample-size estimate should be based on a nonproportional approach.(12) In an example reproducing the scenario of a recently published RCT(3) where the treatment effect was not proportional across the grades of our proposed ordinal outcome, we demonstrated how to estimate the required sample size when proportionality cannot be assumed.
Although this study was based on an inception cohort with early diagnosis and without dropouts, characteristics which minimize the risk of bias, there are some weaknesses to be considered. First, because the cohort was enrolled in 1981-1984, no etiological treatments were available for viral cirrhosis, which was the most prevalent cause of the disease in the included patients. However, it could be speculated that viral elimination would further reduce the development of outcomes in this patient cohort and reinforce the relevance of using ordinal outcomes. Moreover, the etiology of cirrhosis is rapidly changing as an effect of antiviral treatments and the incidence of events may change, but hierarchy of outcome severity would remain. Therefore, applicability of ordinal outcomes would likely hold even in the face of changing etiologies.
Another weakness of this study is the lack of patient stratification according to the presence (or not) of CSPH, the strongest predictor of decompensation.(22) This would require measurements of hepatic venous pressure gradient in all the included patients and would allow different sample-size estimates according to the presence or absence of CSPH. It should be mentioned, however, in this regard, that in a recent RCT of NSBBs for the prevention of decompensation in compensated patients with CSPH,(3) a significant effect was found only in patients with EV at inclusion. In that study, a composite endpoint was used, including bleeding, ascites, encephalopathy, and death. Although the development of varices was not included in the composite endpoint, a trend toward a reduction in the development of high-risk varices was found in treated patients, suggesting that analysis using ordinal outcomes might have revealed a more beneficial treatment effect than that described in the study using a binary outcome.
Clearly, there are pros and cons to be considered when using ordinal outcomes, particularly in the setting of RCTs. We have already discussed the advantage of increasing power, thus requiring smaller sample sizes and the possibility of using more flexible study designs. Major cons are likely the more difficult interpretation of results from ordinal outcome models, the lower confidence that researchers may have with this kind of study design given their unfamiliarity with it, and the insufficient evidence to predict whether or not proportional ORs may be expected. The latter would lead to some uncertainty on whether or not the study should be designed under the proportionality assumption. Nevertheless, we were able to show that sample-size estimates may not differ much whether or not proportionality is assumed. As is standard for a proportional HR model, whether the data fulfill the proportionality assumption will be checked post hoc, and in case it is not verified, the data would then be restructured appropriately.(5)
In conclusion, our study shows that ordinal outcomes are applicable in clinical studies of CC either to describe the course of the disease and assess predictors of outcomes or to design/analyze RCTs of new treatments. We demonstrate that the use of ordinal outcomes may increase the statistical power in comparing different patient groups and allow for smaller sample sizes in clinical trials, which is particularly important in the setting of CC where the expected baseline risk of decompensation is low.
Acknowledgments
Supported by a Yale Liver Center grant (P30 DK34989; to G.G.-T.).
Abbreviations:
- CC: compensated cirrhosis
- CI: confidence interval
- CSPH: clinically significant portal hypertension
- EV: esophageal varices
- GI: gastrointestinal
- HR: hazard ratio
- NSBB: nonselective beta-blocker
- OR: odds ratio
- PH: portal hypertension
- PLT: platelet count
- RCT: randomized clinical trial
- RR: relative risk
- VB: variceal bleeding
Footnotes
Potential conflict of interest: Dr. Abraldes consults for and received lecture fees from Gilead. He consults for Genfit and Pfizer and received lecture fees from Ferring.
Supporting Information
Additional Supporting Information may be found at onlinelibrary.wiley.com/doi/10.1002/hep.31070/suppinfo.
REFERENCES
- 1) D’Amico G, Garcia-Tsao G, Pagliaro L. Natural history and prognostic indicators of survival in cirrhosis: a systematic review of 118 studies. J Hepatol 2006;44:217–231.
- 2) Abraldes JG, Trebicka J, Chalasani N, D’Amico G, Rockey D, Sha V, et al. Prioritization of therapeutic targets and trial design in cirrhotic portal hypertension. Hepatology 2019;69:1287–1299.
- 3) Villanueva C, Albillos A, Genesca J, Garcia-Pagan JC, Calleja JL, Aracil C, et al. B-blockers to prevent decompensation of cirrhosis in patients with clinically significant portal hypertension (PREDESCI): a randomised, double-blind, placebo-controlled, multicentre trial. Lancet 2019;393:1597–1608.
- 4) Groszmann RJ, Garcia-Tsao G, Bosch J, Grace ND, Burroughs AK, Planas R, et al. Beta-blockers to prevent gastroesophageal varices in patients with cirrhosis. N Engl J Med 2005;353:2254–2261.
- 5) Scott SC, Goldberg MS, Mayo NE. Statistical assessment of ordinal outcomes in comparative studies. J Clin Epidemiol 1997;50:45–55.
- 6) Murray GD, Barer D, Choi S, Fernandes H, Gregson B, Lees K, et al. Design and analysis of phase III trials with ordered outcome scales: the concept of the sliding dichotomy. J Neurotrauma 2005;22:511–517.
- 7) Villanueva C, Albillos A, Genescà J, Abraldes J, Calleja JL, Aracil C, et al. Development of hyperdynamic circulation and response to β-blockers in compensated cirrhosis with portal hypertension. Hepatology 2016;63:197–206.
- 8) Turco L, Garcia-Tsao G, Magnani I, Bianchini M, Costetti M, Caporali C, et al. Cardiopulmonary hemodynamics and C-reactive protein as prognostic indicators in compensated and decompensated cirrhosis. J Hepatol 2018;68:949–958.
- 9) Bernardi M, Moreau R, Angeli P, Schnabl B, Arroyo V. Mechanisms of decompensation and organ failure in cirrhosis: from peripheral arterial vasodilatation to systemic inflammation hypothesis. J Hepatol 2015;63:1272–1284.
- 10) D’Amico G, Morabito A, D’Amico M, Pasta L, Malizia G, Rebora P, et al. Clinical states of cirrhosis and competing risks. J Hepatol 2018;68:563–576.
- 11) Roozenbeek B, Lingsma HF, Perel P, Edwards P, Roberts I, Murray GD, et al. The added value of ordinal analysis in clinical trials: an example in traumatic brain injury. Crit Care 2011;15:R127.
- 12) Agresti A. Analysis of Ordinal Categorical Data (Wiley Series in Probability and Statistics). Hoboken, NJ: John Wiley & Sons; 2010.
- 13) D’Amico G, Pasta L, Morabito A, D’Amico M, Caltagirone M, Malizia G, et al. Competing risks and prognostic stages in cirrhosis: a 25-year inception cohort study of 494 patients. Aliment Pharmacol Ther 2014;39:1180–1193.
- 14) Quinn TJ, Dawson J, Walters M. Dr John Rankin; his life, legacy and the 50th anniversary of the Rankin Stroke Scale. Scott Med J 2008;53:44–47.
- 15) Broderick JP, Adeoye O, Jordan EJ. Evolution of the Modified Rankin Scale and its use in future stroke trials. Stroke 2017;48:2007–2012.
- 16) Thomalla G, Simonsen CZ, Boutitie F, Andersen G, Berthezene Y, Cheng B, et al. MRI-guided thrombolysis for stroke with unknown time of onset. N Engl J Med 2018;379:611–622.
- 17) Abraldes JG, Bureau C, Stefanescu H, Augustin S, Ney M, Blasco H, et al. Noninvasive tools and risk of clinically significant portal hypertension and varices in compensated cirrhosis: the “ANTICIPATE” study. Hepatology 2016;64:2173–2184.
- 18) Augustin S, Millán L, González A, Martell M, Gelabert A, Segarra A, et al. Detection of early portal hypertension with routine data and liver stiffness in patients with asymptomatic liver disease: a prospective study. J Hepatol 2014;60:561–569.
- 19) Harrell FE Jr., Margolis PA, Gove S, Mason KE, Mulholland EK, Lehmann D. Development of a clinical prediction model for an ordinal outcome: the World Health Organization Multicentre Study of Clinical Signs and Etiological Agents of Pneumonia, Sepsis and Meningitis in Young Infants. Stat Med 1998;17:909–944.
- 20) Whitehead J. Sample size calculations for ordered categorical data. Stat Med 1993;12:2257–2271.
- 21) Marubini E, Valsecchi MG. Analysing Survival Data From Clinical Trials and Observational Studies. Chichester, UK: John Wiley & Sons; 1997.
- 22) Ripoll C, Groszmann R, Garcia-Tsao G, Grace N, Burroughs A, Planas R, et al. Hepatic venous pressure gradient predicts clinical decompensation in patients with compensated cirrhosis. Gastroenterology 2007;133:481–488.