Abstract
BACKGROUND:
Randomized controlled trials (RCTs) in children with heart disease are challenging and therefore infrequently performed. We sought to improve feasibility of perioperative RCTs for this patient cohort using data from a large, multi-center clinical registry. We evaluated potential enrollment and endpoint frequencies for various inclusion cohorts and developed a novel global rank trial endpoint. We then performed trial simulations to evaluate power gains with the global rank endpoint, and with use of planned covariate adjustment as an analytic strategy.
METHODS:
Data from the Society of Thoracic Surgeons Congenital Heart Surgery Database (STS-CHSD, 2011-2016) were used to support development of a consensus-based global rank endpoint and for trial simulations. For Monte Carlo trial simulations (n=50,000/outcome), we varied the odds of outcomes for treatment vs placebo and evaluated power based on the proportion of trial datasets with a significant outcome (p<0.05).
RESULTS:
The STS-CHSD study cohort included 35,967 infant index cardiopulmonary bypass operations from 103 STS-CHSD centers, including 11,411 (32%) neonatal cases, and 12,243 (34%) high complexity (STAT Mortality Category ≥4) cases. In trial simulations, study power was 21% for a mortality only endpoint, 47% for a morbidity and mortality composite, and 78% for the global rank endpoint. With covariate adjustment, power increased to 94%. Planned covariate adjustment was preferable to restricting to higher risk cohorts despite higher event rates in these cohorts.
CONCLUSION:
Trial simulations can inform trial design. Our findings, including the newly developed global rank endpoint, may be informative for future perioperative trials in children with heart disease.
INTRODUCTION
In children with congenital or acquired heart disease undergoing cardiac surgery, most drugs are used off label with limited data on safety or efficacy due to a paucity of randomized, controlled clinical trials (RCTs) (1). RCTs are expensive and resources are lacking to support RCTs in this patient population (2).
To improve trial feasibility, we have sought ways to reduce trial costs and improve trial efficiency. With support from the National Center for Advancing Translational Sciences (NCATS), the Society of Thoracic Surgeons (STS) and the NCATS-funded Trial Innovation Network, we designed the STeroids to REduce Systemic inflammation after infant heart Surgery (STRESS, NCT03229538) trial. STRESS is a nested RCT being conducted within the existing infrastructure of the STS-CHSD. By leveraging registry infrastructure, we expect to reduce trial costs substantially (3). A separate but related advantage of the “trial within a registry” approach is the availability of existing registry data to facilitate trial planning, which can optimize efficiency, reduce costs and enhance the likelihood of a successful trial. The FDA recently identified a priori trial modeling and simulation as a key strategy to improve trials for children with rare diseases such as congenital and acquired heart disease.(4)
Key challenges in conducting RCTs in children with heart disease, including the STRESS trial, include selection of the optimal trial inclusion cohort, endpoint selection and development of an optimized analytic strategy (5). These three concepts represent critical aspects of trial design and are the focus of our analyses. With respect to inclusion cohorts, children with heart disease represent a rare but highly heterogeneous patient population. Overly broad inclusion cohorts create challenges discerning treatment effect(s), while overly restrictive cohorts introduce feasibility challenges due to the rarity of the various diseases (5). When designing RCTs in these patients, investigators often must decide whether to include a less inclusive subset of higher risk children (e.g. neonates or highest complexity operations) where the prevalence of adverse outcomes (i.e. trial endpoints) is increased, or whether a broader inclusion cohort would be preferable due to the larger available patient population. Similarly, endpoint selection introduces feasibility concerns when endpoints (e.g. operative mortality) occur infrequently, while more common endpoints (e.g. composites) are often less clinically meaningful. The global rank approach (6,7) is a novel method of endpoint development that potentially overcomes some of these issues. A global rank outcome uses a continuum of outcomes ranked from best to worst, thereby allowing trial participants to be compared based on the worst outcome measure experienced. Benefits include improved discriminatory potential when compared to traditional “unranked” composites, the potential to combine binary as well as continuous outcome measures in a single composite and the potential to discern directionality of treatment effect based on the highest ranked outcome achieved (6,7).
With respect to analytic strategies, planned covariate adjustment may be beneficial when excess patient heterogeneity results in imbalance in important covariates even after randomization (8,9).
Although there are potentially substantial trial design benefits associated with selection of the optimal inclusion cohort, use of a global rank endpoint and/or incorporation of planned covariate adjustment into the analytic strategy, all of these approaches need to be evaluated in the cohort of patients eligible for any given clinical trial. Therefore, using STS-CHSD data, we sought: 1) to evaluate various potential trial inclusion cohorts based upon disease prevalence and endpoint frequency; 2) to develop an STS-CHSD global rank endpoint, and then to validate this endpoint with Monte Carlo trial simulations; and 3) to evaluate the impact of planned covariate adjustment on trial power. Here we present our consensus-based global rank outcome measure and the results of our simulations. These analyses were performed to support the STRESS trial design, but may be relevant to other potential trials in children with heart disease focused on post-operative outcomes.
METHODS
Data Source:
The STS-CHSD was used for this study. The database currently collects data from > 96% of all U.S. centers performing congenital heart surgery, including 98% of all operations.(10) Pre-operative, operative, and outcomes data are collected on all patients undergoing pediatric and congenital heart surgery at participating centers. Coding for this database is accomplished by clinicians and ancillary support staff using the International Pediatric and Congenital Cardiac Code and is entered into the contemporary version of the STS-CHS data collection form (version 3.22).(6) The Duke Clinical Research Institute serves as the data warehouse and analysis center for all of the STS National Databases. Evaluation of data quality includes the intrinsic verification of data, along with a formal process of in-person site visits and data audits conducted by a panel of independent data quality personnel and pediatric cardiac surgeons at approximately 10% of participating institutions each year. (6,10,11) This study was approved by the STS-CHS Database Access and Publications Committee and the Duke University institutional review board and was not considered human subjects research by the Duke University Institutional Review Boards in accordance with the Common Rule (45 CFR 46.102(f)).
Investigator consensus:
In considering optimal trial inclusion cohorts and outcome measures to evaluate in trial simulations, an internal investigator steering committee was developed including four pediatric cardiothoracic surgeons (DB, MLJ, JPJ, RDJ), five pediatric cardiologists (JSL, KDH, RT, PJK, HSB), a pediatric cardiac intensivist (CPH), and a biostatistician with expertise in congenital heart surgery outcomes evaluation (SO). Key steering committee decisions were summarized and presented for approval, first to an external steering committee consisting of experts in the field including a pediatric cardiothoracic surgeon, a pediatric cardiologist, a pediatric cardiac intensivist, and a parental advisor/patient advocate, and then to a data safety and monitoring board (DSMB) consisting of a pediatric cardiologist, a neonatal clinical trial specialist, and a biostatistician with expertise in congenital heart surgery.
Inclusion cohort:
More than half of all children’s heart surgeries are performed in neonates (< 30 days) and infants (< 1 year), and these younger patients have a higher risk of adverse outcomes when compared to older children (12). Moreover, in older children it may sometimes be reasonable to extrapolate efficacy from adult RCT data (e.g. related to perioperative drug use); however, in younger patient populations this approach is more likely to be flawed due to unique differences in developmental physiology as well as drug metabolism and response (13,14). For these reasons we chose to focus on neonates and infants undergoing cardiopulmonary bypass (CPB) operations and exclude older children and adolescents. A key objective of trial simulations was to determine whether our perioperative steroid trial, and indeed RCTs in general, would be better served by focusing more specifically on the youngest children (e.g. neonates, < 30 days old) and/or those undergoing the most complex pediatric heart surgeries where the prevalence of adverse outcomes is increased, or whether a broader inclusion cohort would be preferable due to the larger available patient population. Therefore, for the purpose of trial simulations, all infants (< 1 year) undergoing an index cardiovascular operation with CPB were potentially eligible for inclusion in the baseline analytic cohort. Between 2011 and 2015, 46,862 index infant CPB operations from 119 centers were identified in the STS-CHSD. Based on recommendations of the internal steering committee, we excluded from the analytic cohort all infants with an adjusted gestational age < 37 weeks at the time of surgery (n=3,481), infants receiving preoperative mechanical circulatory support (n=237), infants undergoing operations not classifiable using the Society of Thoracic Surgeons-European Association for Cardio-Thoracic Surgery (STAT) Mortality Classification system (n=228) and infants undergoing isolated patent ductus arteriosus (PDA) closure with weight < 2.5 kg (n=4).
To ensure data accuracy, centers with >10% missing data for key demographic and outcomes variables (n=16 centers, 6,341 operations), and infants with missing data for key demographics or outcomes (n=604) were also excluded. The final study cohort included 35,967 infant CPB operations from 103 STS-CHSD centers.
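The cohort derivation described above can be checked with simple arithmetic. The following Python sketch uses only the counts reported in the text and assumes, for illustration, that the exclusions were applied sequentially and without overlap; the variable names are hypothetical.

```python
# Cohort attrition check using only counts reported in the text.
# Assumption: exclusions are mutually exclusive and applied in sequence.
initial_operations = 46_862
initial_centers = 119

exclusions = {
    "adjusted gestational age < 37 weeks at surgery": 3_481,
    "preoperative mechanical circulatory support": 237,
    "operation not classifiable by STAT Mortality Category": 228,
    "isolated PDA closure with weight < 2.5 kg": 4,
    "operations at centers with >10% missing key data": 6_341,
    "missing key demographics or outcomes": 604,
}
excluded_centers = 16

final_operations = initial_operations - sum(exclusions.values())
final_centers = initial_centers - excluded_centers

print(final_operations, final_centers)  # 35967 103
```

Under this assumption the reported exclusions reproduce the final cohort of 35,967 operations from 103 centers.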
Data Collection and Definitions:
Data collection included demographic information, baseline characteristics, preoperative risk factors as defined in the STS-CHSD, operative variables and outcomes data. Surgeon and center characteristics were also collected. Procedures were grouped using the STAT Mortality Scores and Categories. The STAT Mortality Score is an empirically-based index used to stratify pediatric heart operations based on their statistically estimated risk of in-hospital mortality and has been previously validated as the most accurate means of mortality risk stratification in pediatric heart surgery (15). The STAT Mortality Categories are a widely used grouping of procedure types into 5 risk levels based on a categorization of the STAT scores.
Development of a global rank endpoint:
The global rank approach (6,7,16,17) involves ranking components of a composite outcome in hierarchical fashion from worst to best. Participants in the trial then receive an outcome ranking based upon the most severe outcome experienced. The global rank approach can potentially overcome several limitations of traditional composite outcomes (7,17) but is only effective if outcomes included in the rank score are meaningfully representative of the treatment effect. To develop the STS-CHSD global rank outcome, we evaluated all outcomes and complication variables captured in the STS-CHSD. Eligible ranking variables were discussed and assigned ranks based on consensus of the investigative team and steering committee. Factors considered when evaluating potential inclusion included reliability of collection of the specific variable by the STS-CHSD, potential relationship between the component outcome and the post-operative systemic inflammatory response, and clinical implications of the component. Other potential primary and secondary outcome measures used for comparative analyses with the global rank endpoint were identified by the investigators, and approved by the steering committee and STRESS trial DSMB. Outcome measures identified for consideration included: 1) operative mortality (defined by STS-CHSD as in-hospital or after discharge but within 30 days of the operation); 2) post-operative length of stay; 3) prolonged post-operative length of stay (> 90 days); and 4) composite major morbidity and mortality.
Statistical analyses:
Data were summarized overall and stratified by age and STAT Mortality Category using standard descriptive statistics. Monte Carlo simulations were used to explore how the study's statistical power might vary for different study design options including the choice of primary endpoint, the selection of a more versus less restrictive study cohort, and the decision of whether and how to adjust for covariates in the primary endpoint analysis.
Selection of study cohort and primary endpoint:
Candidates for the choice of study cohort were infants age 0-12 months, neonates age 0-30 days, and the subset of infants undergoing high-risk operations as defined by STAT Category ≥4. Candidates for primary endpoint included operative mortality, a traditional binary composite endpoint consisting of operative mortality or a set of STS-defined major complications, and two versions of a global rank composite endpoint. The global rank approach combines multiple individual endpoints into a single number by first assigning a rank order to the various endpoints in the composite based on their relative severity and then assigning each patient a number equaling the rank of the most severe outcome that occurred. In the first global rank endpoint, the possible values were whole numbers between 0 and 97: 97 was assigned to patients who experienced the worst endpoint (operative mortality), categories 92 to 96 were assigned to patients who survived but experienced STS-defined major complications ranked by severity, 91 was assigned to patients who survived free of STS-defined major complications but had a prolonged stay >90 days, and categories 0 to 90 were assigned to all other patients with a value equal to the patient's length of stay in days. The second global rank endpoint was defined similarly but ignored length of stay.
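The assignment rule for the first global rank endpoint can be written as a short function. The sketch below is illustrative only: the function name and argument names are hypothetical, and it assumes that the rank of the most severe major complication (92 to 96, as listed in Table 2) has already been determined for each patient.

```python
from typing import Optional


def global_rank(died: bool, complication_rank: Optional[int], los_days: int) -> int:
    """Return the global rank (0-97) for one patient; higher is worse.

    died: operative mortality.
    complication_rank: rank (92-96) of the most severe major complication
        among survivors, or None if no major complication occurred.
    los_days: post-operative length of stay in days.
    """
    if los_days < 0:
        raise ValueError("length of stay cannot be negative")
    if died:
        return 97
    if complication_rank is not None:
        if not 92 <= complication_rank <= 96:
            raise ValueError("complication rank must be between 92 and 96")
        return complication_rank
    if los_days > 90:
        return 91  # prolonged stay without death or major complication
    return los_days  # ranks 0-90 equal the length of stay in days


print(global_rank(False, 94, 120))   # 94
print(global_rank(False, None, 12))  # 12
```

Because only the most severe outcome determines the rank, a patient with both a major complication and a prolonged stay is ranked by the complication.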
Trial simulations to estimate study power for different inclusion cohorts and endpoints:
To estimate power for different design choices, we generated 40,000 simulated trial datasets for each inclusion cohort scenario (infants, neonates, infants STAT Category ≥4) with each trial having 1200 patients divided into two treatment groups. For each study design, power was estimated by performing the primary endpoint analysis in each trial dataset and counting the proportion of trial datasets in which the 95% confidence interval around the estimated treatment effect excluded the null hypothesis value. The analysis framework was a logistic regression model for the two binary endpoints (operative mortality; death or major complications) and a proportional odds model for the two global rank endpoints. The process for constructing a single simulated trial dataset was as follows. First, baseline covariates of patients in the trial were generated by randomly selecting 1200 patients from the STS exploratory cohort and recording their baseline covariate values. These became the covariates of the patients in the simulated trial. A different random sample of STS patient covariates was selected for each simulated trial. Second, a random number generator was used to assign each trial patient to steroids versus placebo with 50% probability. Third, a global rank endpoint value was simulated for each patient according to a probability distribution that differed according to the patient's simulated baseline covariates and treatment assignment, based on an assumed underlying proportional odds statistical model (i.e. the data generating model).
Covariates in the data generating model were based on the STS congenital mortality model: age, weight, prior cardiothoracic operation, any noncardiac congenital anatomic abnormality, any chromosomal abnormality or syndrome, prematurity, preoperative mechanical circulatory support, shock persistent at the time of the operation, renal dysfunction or renal failure requiring dialysis (or both), mechanical ventilation to treat cardiorespiratory failure, preoperative neurological deficit, any other preoperative risk factor, and STAT Mortality Score. Numerical values of regression coefficients in the data generating model were chosen by first estimating a proportional odds model in the STS exploratory cohort and then treating these estimates as truth in the simulations. The coefficient for the treatment group variable was chosen to produce a treatment effect with magnitude corresponding to an odds ratio of 0.75 (Scenario #1), 0.70 (Scenario #2), or 0.65 (Scenario #3).
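The general logic of estimating power by simulation can be illustrated with a deliberately simplified sketch. It is not the simulation used in this study: it ignores covariates, considers only a binary endpoint, and draws outcomes from a fixed control-arm risk rather than from registry patients. The test is a Wald test on the log odds ratio from the 2x2 table, which corresponds to an unadjusted logistic regression with treatment as the only covariate. The function name `simulate_power`, the default of 600 patients per arm, and the 4.7% control risk in the usage example (taken from the overall infant operative mortality reported in Table 3) are assumptions made for illustration.

```python
import math
import random


def simulate_power(p_control, odds_ratio, n_per_arm=600, n_sims=2000, seed=0):
    """Proportion of simulated trials whose 95% CI for the odds ratio excludes 1."""
    rng = random.Random(seed)
    # Convert the control risk and odds ratio into the treated-arm risk.
    odds_treated = odds_ratio * p_control / (1.0 - p_control)
    p_treated = odds_treated / (1.0 + odds_treated)
    z_crit = 1.959964  # two-sided alpha = 0.05

    significant = 0
    for _trial in range(n_sims):
        # Simulate events in each arm, one patient at a time.
        a = sum(rng.random() < p_treated for _ in range(n_per_arm))
        c = sum(rng.random() < p_control for _ in range(n_per_arm))
        cells = [float(a), float(n_per_arm - a), float(c), float(n_per_arm - c)]
        # Add 0.5 to every cell only if a cell is empty (avoids division by zero).
        if min(cells) == 0:
            cells = [x + 0.5 for x in cells]
        ev_t, non_t, ev_c, non_c = cells
        log_or = math.log((ev_t * non_c) / (non_t * ev_c))
        se = math.sqrt(1 / ev_t + 1 / non_t + 1 / ev_c + 1 / non_c)
        # The 95% CI excludes the null (log OR = 0) when |log OR| > z * SE.
        if abs(log_or) > z_crit * se:
            significant += 1
    return significant / n_sims


# Hypothetical usage: mortality-like endpoint, 1200 patients, odds ratio 0.70.
print(simulate_power(p_control=0.047, odds_ratio=0.70))
```

The result can be compared with the operative mortality entries in Table 4, although exact agreement is not expected because the published simulations used covariate-dependent risks and resampled registry patients.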
Trial simulations to assess power with covariate adjustment:
To assess possible power gains by adjusting for pre-randomization covariates in the trial’s primary analysis, and to determine whether power gains might be impacted by the number of covariates included in the model, the analysis of each simulated trial was performed with the following candidate adjustment strategies: 1) adjust for all 13 covariates included in the STS congenital mortality model, 2) adjust for a reduced set of covariates to include age, weight, prematurity and STAT Mortality Score (chosen based on known prognostic significance and ease of data collection), and 3) do not adjust for covariates.
Sources of Funding:
This work was supported by grants from the National Center for Advancing Translational Sciences (U01TR-001803-01, U24TR-001608-03) and from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (U18FD-006298-02). The authors are solely responsible for the design and conduct of this study, all study analyses, the drafting and editing of the paper and its final contents.
RESULTS
Cohort demographics and outcomes
Our STS-CHSD study cohort included 35,967 infant CPB operations from 103 STS-CHSD centers (Table 1). Neonatal cases accounted for 11,411 (32%) of all cases while STAT Mortality Category ≥ 4 cases accounted for 12,243 (34%).
Table 1.
Cohort Demographics and Case Mix
Characteristic | Infants (n=35,967) | Neonates (n=11,411) | STAT ≥4 (n=12,243) |
---|---|---|---|
Age in days | 107 (12, 176) | 7 (4, 10) | 10 (5, 63) |
Weight in kg | 4.8 (3.5, 6.2) | 3.2 (2.9, 3.6) | 3.4 (3.0, 4.2) |
Race | |||
Caucasian | 23262/35189 (66%) | 7566/11169 (68%) | 7945/11994 (66%) |
Black/African American | 5299/35189 (15%) | 1462/11169 (13%) | 1744/11994 (15%) |
Asian | 1236/35189 (4%) | 340/11169 (3%) | 372/11994 (3%) |
American Indian / Alaskan Native | 273/35189 (0.8%) | 77/11169 (0.7%) | 84/11994 (0.7%) |
Native Hawaiian/Pacific Islander | 159/35189 (0.5%) | 56/11169 (0.5%) | 56/11994 (0.5%) |
Other/multiple races | 4960/35189 (14%) | 1668/11169 (15%) | 1793/11994 (15%) |
Hispanic or Latino ethnicity | 6407/34100 (19%) | 2107/10832 (20%) | 2329/11641 (20%) |
Any preoperative factor | 12695/35884 (35%) | 4838/11383 (43%) | 5245/12210 (43%) |
Any chromosomal anomaly/syndrome | 10599/35904 (30%) | 2188/11391 (19%) | 2955/12216 (24%) |
Any non-cardiac anatomic abnormality | 1426/35870 (4%) | 366/11381 (3%) | 502/12207 (4%) |
Previous cardiovascular operation | 7289/35829 (20%) | 163/11351 (1%) | 1455/12183 (12%) |
Single ventricle diagnosis | 6854/35337 (19%) | 3016/11310 (27%) | 3585/12141 (30%) |
STAT 1 | 8456/35967 (24%) | 203/11411 (2%) | - |
STAT 2 | 8846/35967 (25%) | 669/11411 (6%) | - |
STAT 3 | 6422/35967 (18%) | 2146/11411 (19%) | - |
STAT 4 | 9007/35967 (25%) | 5514/11411 (48%) | 9007/12243 (73.6%) |
STAT 5 | 3236/35967 (9%) | 2879/11411 (25%) | 3236/12243 (26.4%) |
Cardiopulmonary bypass time, min (IQR) | 108 (75, 152) | 134 (93, 177) | 132 (90, 181) |
Development of the STS-CHSD global rank endpoint
Components of the STS-CHSD global rank score were ranked by consensus based on their perceived clinical relevance. Table 2 summarizes the rank order and rationale. Notably, the global rank endpoint ensures progression to the highest ranked endpoint. For example, a patient is only assigned the cardiac arrest endpoint if resuscitated without untoward consequences. A cardiac arrest leading to mechanical circulatory support, renal failure or neurologic deficit is ranked based on these higher level outcomes.
Table 2.
Global Rank endpoints and rationale
Endpoint | Global rank (rank hierarchy) | Rationale |
---|---|---|
Post-operative mortality | 97 | - |
Heart transplant | 96 | - |
Renal failure with permanent dialysis | 95 | All variables represent major post-operative complications associated with long-term (e.g. persistent after hospital discharge) morbidity |
Neurological deficit persistent at discharge | ||
Respiratory failure requiring tracheostomy | ||
Post-op mechanical circulatory support | 94 | All complications require re-intervention and are typically associated with significantly worse outcomes. |
Unplanned cardiac reoperation | ||
Reoperation for bleeding | 93 | Similar to “94”, these complications require re-intervention but outcomes may not be as poor. |
Delayed sternal closure | ||
Unplanned interventional cardiac catheterization | ||
Cardiac arrest | 92 | Major complications where outcomes may still be excellent. |
Multi-system organ failure | ||
Prolonged post-op ventilator support > 7 days | ||
Renal failure with temporary dialysis/hemofiltration | ||
Post-operative length of stay ≥ 90 days | 91 | Rare for this outcome to occur in isolation but nonetheless the impact on families is important |
Post-operative length of stay | 90-1 | In the absence of any other complications, LOS is a meaningful measure of patient morbidity |
Marginal and cumulative prevalence of the various outcomes comprising the global rank composite are summarized in Table 3 for neonates only, for all infants and for the subset of infants undergoing highest complexity cases. Post-operative mortality occurred in 9.1% of all neonatal cases, 4.7% of infant cases and 9.9% of STAT ≥4 cases. Binary endpoints, including the binary “prolonged hospital LOS (> 90 days)”, accounted for 36.2%, 22.1% and 38.9% of all outcomes for neonates, infants and STAT ≥4 cases, respectively, with the remainder of outcomes defaulting to a continuous variable representing the actual duration of hospital LOS.
Table 3.
Cumulative and marginal prevalence table for the Global Rank endpoint
Outcome | Assigned rank | Infants (< 1 yr, N=35,967): marginal prevalence, N (%) | Infants: cumulative prevalence, N (%) | Neonates (< 30 days, N=11,411): marginal prevalence, N (%) | Neonates: cumulative prevalence, N (%) | STAT ≥4 cases (< 1 yr, N=12,243): marginal prevalence, N (%) | STAT ≥4 cases: cumulative prevalence, N (%) |
---|---|---|---|---|---|---|---|
Operative mortality | 97 | 1674/35967 (4.7%) | 1674/35967 (4.7%) | 1042/11411 (9.1%) | 1042/11411 (9.1%) | 1208/12243 (9.9%) | 1208/12243 (9.9%) |
Heart transplant (during hospitalization) | 96 | 313/35967 (0.9%) | 1987/35967 (5.5%) | 40/11411 (0.4%) | 1082/11411 (9.5%) | 293/12243 (2.4%) | 1501/12243 (12.3%) |
Renal failure with permanent dialysis, neurologic deficit persistent at discharge, or respiratory failure requiring tracheostomy | 95 | 446/35967 (1.2%) | 2433/35967 (6.8%) | 173/11411 (1.5%) | 1255/11411 (11.0%) | 221/12243 (1.8%) | 1722/12243 (14.1%) |
Post-operative mechanical circulatory support or unplanned cardiac reoperation (exclusive of reoperation for bleeding) | 94 | 1813/35967 (5.0%) | 4246/35967 (11.8%) | 933/11411 (8.2%) | 2188/11411 (19.2%) | 985/12243 (8.0%) | 2707/12243 (22.1%) |
Reoperation for bleeding, delayed sternal closure or post-op unplanned interventional cardiac catheterization | 93 | 1759/35967 (4.9%) | 6005/35967 (16.7%) | 895/11411 (7.8%) | 3083/11411 (27.0%) | 934/12243 (7.6%) | 3641/12243 (29.7%) |
Post-op cardiac arrest, multi-system organ failure, renal failure with temporary dialysis, or prolonged ventilator support (> 7 days) | 92 | 1815/35967 (5.0%) | 7820/35967 (21.7%) | 987/11411 (8.6%) | 4070/11411 (35.7%) | 1043/12243 (8.5%) | 4684/12243 (38.3%) |
Prolonged post-op length of stay (> 90 days, binary endpoint) | 91 | 141/35967 (0.4%) | 7961/35967 (22.1%) | 63/11411 (0.6%) | 4133/11411 (36.2%) | 82/12243 (0.7%) | 4766/12243 (38.9%) |
Post-op length of stay (assigned rank = length of stay in days) | 90-0 | 28006/35967 (77.9%) | 35967/35967 (100.0%) | 7278/11411 (63.8%) | 11411/11411 (100.0%) | 7477/12243 (61.1%) | 12243/12243 (100.0%) |
Impact of endpoint selection on trial power
Table 4 summarizes study power for various trial inclusion cohorts across a subset of five different outcome measures. Estimates assume 1:1 randomization and an odds ratio for the outcome varying from 0.65 to 0.75 when comparing treatment versus placebo. As expected, power for death and complications is always greater in higher risk cohorts (i.e. neonates and STAT ≥4 operations) where the event rate is higher, and when assuming a greater treatment effect. However, regardless of the inclusion cohort or treatment effect, a trial of 1200 neonates or infants remains vastly underpowered for a mortality endpoint. Similarly, a trial using a binary, unranked composite outcome is generally underpowered unless restricted to the highest risk cohorts and assuming a relatively large treatment effect. Ranking of binary variables in a global rank but without including post-operative length of stay (PLOS) results in small increases in power, varying from 1 to 4 percentage points depending on the cohort and treatment effect. However, with inclusion of PLOS in the global rank score, power increases substantially, with gains ranging from 7 to as much as 30 percentage points depending on the relative treatment effect and the study cohort.
Table 4.
Variability in power estimates depending on outcome measure
ODDS RATIO = 0.65 | Infants | Neonates | STAT ≥ 4 |
---|---|---|---|
Global rank, with PLOS | 91% | 96% | 97% |
Global rank, ignoring PLOS | 64% | 88% | 90% |
Binary composite based on global rank | 62% | 85% | 87% |
Operative mortality | 29% | 50% | 54% |
ODDS RATIO = 0.70 | Infants | Neonates | STAT ≥ 4 |
Global rank, with PLOS | 78% | 87% | 88% |
Global rank, ignoring PLOS | 48% | 73% | 76% |
Binary composite based on global rank | 47% | 70% | 72% |
Operative mortality | 21% | 37% | 40% |
ODDS RATIO = 0.75 | Infants | Neonates | STAT ≥ 4 |
Global rank, with PLOS | 60% | 70% | 71% |
Global rank, ignoring PLOS | 34% | 55% | 58% |
Binary composite based on global rank | 33% | 52% | 54% |
Operative mortality | 15% | 26% | 28% |
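The power gains quoted in the text follow directly from the entries in Table 4. As a quick check, the Python sketch below transcribes the table (values in percent, ordered as infants, neonates, STAT ≥4 for odds ratios 0.65, 0.70 and 0.75) and computes the differences; the variable names are illustrative.

```python
# Power (%) from Table 4, ordered as (infants, neonates, STAT >= 4)
# for odds ratios 0.65, 0.70 and 0.75 in turn.
rank_with_plos = [91, 96, 97, 78, 87, 88, 60, 70, 71]
rank_no_plos = [64, 88, 90, 48, 73, 76, 34, 55, 58]
binary_composite = [62, 85, 87, 47, 70, 72, 33, 52, 54]

# Gain from ranking the binary components (no length of stay).
gain_from_ranking = [r - b for r, b in zip(rank_no_plos, binary_composite)]
# Additional gain from including post-operative length of stay.
gain_from_plos = [w - r for w, r in zip(rank_with_plos, rank_no_plos)]

print(min(gain_from_ranking), max(gain_from_ranking))  # 1 4
print(min(gain_from_plos), max(gain_from_plos))        # 7 30
```

The largest gain from adding PLOS (30 percentage points) occurs in the all-infant cohort at an odds ratio of 0.70, where binary events are least frequent.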
To evaluate the relative merits of more versus less restrictive inclusion cohorts, we reviewed the prevalence of cases outlined in Table 1 and used these numbers to project the likely feasible sample size for a trial restricted to the smaller cohorts. Based on the ratio of neonates to all infants, we repeated the simulations for neonates assuming a sample size of 400 and thereby compared power for a trial of 400 neonates to a trial of 1200 infants. Although event rates are approximately doubled in neonates for all of the higher level events (e.g. mortality or major morbidities, see Table 3), a neonatal trial of 400 subjects would still have lower power for mortality only (14% versus 21%), composite mortality and morbidity (29% versus 47%), and for the global rank endpoint (42% versus 78%) when compared to a more inclusive trial enrolling all infants, assuming a treatment effect odds ratio of 0.70.
Covariate adjustment to increase study power
Using trial simulations, we then evaluated power gains associated with adjustment for covariates included in the STS-CHSD risk model (Figure 1). Risk adjustment consistently improved study power with increases of up to 16%. Covariate adjustment had the greatest net benefit for endpoints with a higher frequency. For example, for an odds ratio of 0.7 there was only a 3% gain in power for the mortality endpoint (least frequent endpoint) versus a 16% power gain for the global rank endpoint including PLOS. We also evaluated the effects of a more parsimonious covariate adjustment strategy. Compared to full covariate adjustment, power when adjusting for only 4 covariates was lower by 2 to 3 percentage points.
Figure 1. Impact of planned covariate adjustment on study power.
* Based on a trial with 1200 infants and treatment effect with magnitude corresponding to an odds ratio of 0.70. Reduced covariate adjustment models include only age, weight, prematurity and STAT Mortality Score as model covariates. Complete adjustment includes all model covariates included in the STS-CHSD congenital risk model.
Discussion
In this analysis we describe the use of registry data, including Monte Carlo trial simulations, to evaluate various approaches to a perioperative trial in infants with congenital heart disease. Notable findings include development of an STS-CHSD global rank endpoint with trial simulations to validate this endpoint as a primary trial outcome, as well as simulations that demonstrate the substantial benefit of planned covariate adjustment as an analytic strategy. Trial simulations were invaluable in trial planning and resulted in key decisions including choice of the optimal inclusion cohort (all infants and all levels of case complexity), primary endpoint (the global ranking endpoint) and analytic strategy (incorporating covariate adjustment).
Power concerns in children with heart disease
The largest RCT ever conducted in children with heart disease was the CLARINET trial which enrolled 906 infants at 134 sites in Europe, Asia, North America, South America, and Africa and took almost four years to complete.(18) Other landmark pediatric cardiovascular trials such as the Pediatric Heart Network Single Ventricle Reconstruction (n=555 subjects enrolled) (19) and the Carvedilol for children and adolescents with heart failure trial (n=161 subjects enrolled) (20) have been significantly smaller. Compared to adult cardiovascular RCTs, these trial enrollment numbers are low. For example, the two pivotal adult perioperative corticosteroid trials, the DECS (21) and SIRS (22) trials, enrolled 4,494 and 7,507 subjects respectively. Such enrollment targets are simply infeasible in children with heart disease due to the rarity of the diseases and lack of funding to support these trials. To overcome these challenges it is necessary to consider unique trial approaches that maximize trial efficiency and therefore reduce costs. Our trial simulations were performed to evaluate several such strategies.
The global rank endpoint
Our global rank endpoint provides a potential solution to the cost and feasibility issues associated with enrolling large cohorts of infants with heart disease in RCTs. There are several advantages to a global ranking endpoint that improve study power and increase a trial’s discriminatory potential (6,7,16,17). First, a global rank score permits individual components of a composite to be ranked proportionate to their perceived clinical importance instead of considering all components equally. Death is generally considered a worse outcome than a successful cardiopulmonary resuscitation yet these events are considered as equals in a traditional composite outcome. Second, the global rank captures directionality of a treatment effect. For example, an unplanned cardiac reoperation such as surgical revision of a thrombosed shunt, may be life-saving but would be considered a poor outcome if included in a traditional composite outcome. With a global rank measure this favorable outcome counts as a “win” if the reoperation prevents the child from progressing to operative mortality. Alternatively, it counts as a “loss” if, for example, the child receives placebo and the trial intervention (in this case perioperative corticosteroids) might have prevented the shunt thrombosis and associated reoperation from occurring in the first place. Finally, the global rank approach permits combinations of binary clinical events (e.g. mortality, complications) and continuous variables (e.g. PLOS). As our simulations demonstrate, this significantly increases study power.
There are important considerations when developing a global rank endpoint. The endpoint should not place too much emphasis on less meaningful or potentially variable clinical outcomes like PLOS (17). PLOS can vary for non-clinical reasons including social considerations and/or variability in center-level approaches. Although some of these concerns can be addressed in the statistical analysis (e.g. incorporating center in the modeling approach), it is important to avoid an over-emphasis on proxy outcome measures such as PLOS or a biomarker response. Once again, registry data are invaluable as they demonstrate the expected proportion of the various components of the composite.
Finally, interpretation of the global rank can be challenging. Traditionally, the sum or mean of the ranks is displayed for each group to summarize the main findings, but these approaches have been criticized as misleading or lacking clinical value. Recently, the probability index has been proposed as a more clinically intuitive estimate of treatment effect. This index represents the probability that a randomly selected patient from the investigational treatment group has a superior response compared with a randomly selected patient from the control group and also permits an assessment of the heterogeneity of treatment effect (i.e. with 95% confidence intervals) (23).
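As an illustration of the probability index, the following sketch compares every treated patient with every control patient. It assumes that lower scores indicate better outcomes and that ties count as one half, a common convention that is not specified in the text. The function name `probability_index` is hypothetical, and the confidence interval mentioned above is not computed here.

```python
from typing import Sequence


def probability_index(treatment: Sequence[float], control: Sequence[float]) -> float:
    """Estimate the probability that a randomly selected treated patient has a
    lower (better) score than a randomly selected control patient.
    Ties contribute one half (an assumption of this sketch)."""
    if len(treatment) == 0 or len(control) == 0:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for t in treatment:
        for c in control:
            if t < c:
                wins += 1.0   # treated patient fared better
            elif t == c:
                wins += 0.5   # tie: split the credit
    return wins / (len(treatment) * len(control))
```

A value of 0.5 corresponds to no difference between groups; values above 0.5 favor the investigational treatment under the stated scoring direction.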
Strategies to improve trial approaches
There are several potential strategies to improve feasibility of clinical trials in pediatric cardiology. The nested registry design is one approach (24) and is the approach that we are using for the STRESS trial. We project cost savings due to the use of existing registry infrastructure (e.g. limited data infrastructure costs), and also due to the relatively simple, pragmatic study design. A drawback of the nested registry design is the need to tailor the trial to fit the existing registry infrastructure and available data. For these reasons, it is helpful to leverage available registry data along with trial simulations to optimize the trial design. Our simulations demonstrated that it would be cost prohibitive to enroll sufficient numbers of infants in a trial to evaluate a primary mortality endpoint, and power estimates were at best borderline when evaluating a traditional composite morbidity and mortality endpoint. Endpoint selection often requires a tradeoff between endpoint event rates and their clinical utility. Traditionally, pediatric cardiovascular trials have used intermediate or surrogate endpoints and/or biomarkers. These outcome measures occur more frequently, making it more feasible to detect a meaningful treatment effect with lower enrollment, but it is important to ensure that these endpoints are well validated and truly reflect meaningful treatment outcomes. Graham and colleagues previously evaluated the effect of various steroid dosing regimens (intraoperative alone versus pre- + intra-operative steroids) on low cardiac output syndrome and post-operative inflammatory markers, only to determine later that these outcomes were poorly associated with more meaningful clinical outcomes (25-27).
The same group recently completed a follow-up study evaluating whether intraoperative steroids, compared with placebo, reduce the prevalence of a composite mortality + morbidity outcome that includes biomarkers such as liver function tests and lactic acid levels (28). These trials have shaped our thinking with regard to the STRESS trial design. This model, in which a traditional trial using composite or biomarker-focused endpoints precedes or runs in parallel with a pragmatically designed RCT focused on “harder” outcome measures, represents a meaningful mechanism for more feasibly completing RCTs in children with heart disease.
Optimal inclusion cohorts
Restricting a trial to neonates or to higher complexity cases can increase event rates and therefore potentially improve study power. Although improved power can reduce costs, more restrictive cohorts introduce feasibility challenges as there are fewer available subjects for enrollment. More restrictive cohorts also reduce generalizability but have the benefit of reducing the impact of confounders, which are generally less prevalent in a more homogeneous (i.e. restricted) study cohort. Based on our trial simulations, we elected to use a more inclusive trial cohort due to higher power for all endpoints when comparing equivalent enrollment targets (i.e. based on the relative frequency of cases). Power was most notably improved for the global rank endpoint, in this instance because power is affected by hospital LOS, which can be measured in all patients. Our steering committee felt that confounding could be addressed with planned covariate adjustment (see next section) and that the benefits of improved generalizability were important. Notably, however, these considerations assume an equivalent treatment effect regardless of age, which may not be the case for steroids and/or for other drug therapies.
Planned covariate adjustment in clinical trials
Our trial simulations build a strong case for planned covariate adjustment for known prognostic factors. Randomization ensures that baseline risk will tend to be balanced across treatment groups, but individual differences in baseline risk can still obscure treatment effects even when they are balanced perfectly. They do this by adding noise to the outcome variable, which dilutes the size of the treatment effect relative to the total variation. Trials in congenital heart disease are especially susceptible to this dilution of the ratio of treatment effect size to noise because there is wide heterogeneity in patient diagnoses and significant variability in outcomes depending on case complexity, pre-operative risk factors and center-level expertise. Prior simulation studies have retrospectively analyzed various RCTs and demonstrated the potential power gains associated with covariate adjustment (9,29-32). Expert and regulatory consensus is that optimal covariate adjustment should be pre-specified and ideally based upon the known prognostic value of various covariates (12,33). Once again, our approach highlights the value of the registry-based design, as covariates can be identified and screened a priori using existing registry variables identical to those that will be used in the final trial.
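The noise-reduction argument can be illustrated with a small Monte Carlo sketch in which power is estimated as the proportion of simulated trials with a significant result. This is not the simulation used in the present study: it assumes a continuous outcome, a single standard-normal prognostic covariate, a linear model and a normal-approximation test at the two-sided 5% level. All parameter values (`n`, `effect`, `risk_weight`) and function names are hypothetical and chosen only for illustration.

```python
import random
import statistics
from math import sqrt

Z_CRIT = 1.96  # two-sided 5% critical value (normal approximation)


def residuals(y, x):
    """Residuals of y after least-squares regression on x (with intercept)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return [(yi - mean_y) - slope * (xi - mean_x) for xi, yi in zip(x, y)]


def one_trial(rng, n, effect, risk_weight):
    """Simulate one 1:1 randomized trial.
    Returns (unadjusted_significant, adjusted_significant)."""
    arm = [i % 2 for i in range(n)]
    rng.shuffle(arm)                                 # random allocation
    risk = [rng.gauss(0.0, 1.0) for _ in range(n)]   # baseline prognostic covariate
    y = [effect * a + risk_weight * r + rng.gauss(0.0, 1.0)
         for a, r in zip(arm, risk)]

    # Unadjusted analysis: difference in arm means.
    treated = [yi for yi, a in zip(y, arm) if a == 1]
    control = [yi for yi, a in zip(y, arm) if a == 0]
    se = sqrt(statistics.variance(treated) / len(treated)
              + statistics.variance(control) / len(control))
    z_unadjusted = (statistics.fmean(treated) - statistics.fmean(control)) / se

    # Adjusted analysis: least squares of y on arm and risk, computed by
    # partialling the covariate out of both the outcome and the arm indicator.
    res_y = residuals(y, risk)
    res_arm = residuals(arm, risk)
    saa = sum(a * a for a in res_arm)
    beta = sum(a * b for a, b in zip(res_arm, res_y)) / saa
    rss = sum((b - beta * a) ** 2 for a, b in zip(res_arm, res_y))
    z_adjusted = beta / sqrt(rss / (n - 3) / saa)
    return abs(z_unadjusted) > Z_CRIT, abs(z_adjusted) > Z_CRIT


def simulate_power(n_trials=2000, n=200, effect=0.3, risk_weight=1.5, seed=1):
    """Estimate power as the share of simulated trials reaching significance."""
    rng = random.Random(seed)
    hits_unadjusted = hits_adjusted = 0
    for _ in range(n_trials):
        sig_u, sig_a = one_trial(rng, n, effect, risk_weight)
        hits_unadjusted += sig_u
        hits_adjusted += sig_a
    return hits_unadjusted / n_trials, hits_adjusted / n_trials
```

Calling `simulate_power()` returns the estimated unadjusted and adjusted power as a pair. Under these assumptions, the adjusted analysis removes the covariate-driven variance from the error term, so the same treatment effect is tested against less noise.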
Limitations
Our trial simulations assume a baseline treatment effect and, although the magnitude of effect can be varied, it is never possible to account for the myriad subsets of conditions that might impact the treatment effect. Therefore, simulations can never be used in place of a true RCT; rather, they are a useful mechanism for informing the appropriate trial design. An advantage of this approach is that the simulation data may be generalizable to other trials assuming a similar magnitude of treatment response. It should also be noted that simulations inherently rely on retrospectively collected data with all of the known concerns regarding data quality and/or missing data.
Conclusions and rationale for final trial decisions
After consideration of the data, the STRESS trial steering committee and advisory board unanimously agreed that the optimal trial cohort should include the full spectrum of case complexity and should focus on all infants rather than a subset of younger or higher risk patients. Rationale for this decision included: 1) limited generalizability of a more restrictive trial cohort; 2) high event rates across the spectrum of ages and case complexity; 3) feasibility concerns with enrolling more restrictive cohorts; and 4) data demonstrating the benefit of including adjustment for age and case complexity in the statistical analysis plan. However, the steering committee also elected to target a minimum number (n=400) of neonatal cases, concerned that enrolling too few neonates might compromise endpoint event rates. The steering committee and advisory board also agreed that the global rank endpoint (including LOS) provided the optimal combination of “hard” clinical outcomes and optimized power. The relative distribution of outcomes across the various components of the rank outcome was considered appropriate to meet the trial objectives. Finally, the steering committee and advisory board agreed that covariate adjustment should be included in the analytic plan. In summary, trial simulations facilitated several major trial design decisions. Moreover, our data and our newly developed global rank endpoint may be informative for future trials in children with heart disease, for example other trials focused on post-operative outcomes. We suggest that when an appropriately representative dataset is available, simulations can improve trial design, enhance efficiency and reduce costs.
HIGHLIGHTS
A global rank endpoint, including mortality, 13 major morbidities and post-operative length of stay, was developed for use in perioperative trials in neonates and infants with heart disease.
Trial simulations demonstrated substantial power gains when using the global rank endpoint compared to mortality or a more traditional mortality + morbidity composite.
Trial simulations demonstrated further power gains with planned covariate adjustment.
Acknowledgments
Sources of Funding
The STRESS Trial is supported by grants from the National Center for Advancing Translational Sciences (U01TR-001803-01, U24TR-001608-03) and from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (U18FD-006298-02).
Footnotes
Disclosures
Drs. Hill, Baldwin, Bichel, Ellis, Jeffrey Jacobs, Marshall Jacobs, Kannankeril, O’Brien, and Li receive support from the National Center for Advancing Translational Sciences for their work in pediatric drug development (U01TR-001803-01). Dr. Hornik receives salary support for research from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (grant No. 1K23HD090239) and from the US government and industry for his work in pediatric and neonatal clinical pharmacology (government contract No. HHSN267200700051C). The content in this manuscript is solely the responsibility of the authors.
References
1. Pasquali SK, Hall M, Slonim AD et al. Off-label use of cardiovascular medications in children hospitalized with congenital and acquired heart disease. Circ Cardiovasc Qual Outcomes 2008;1:74–83.
2. Hill KD, Chiswell K, Califf RM, Pearson G, Li JS. Characteristics of pediatric cardiovascular clinical trials registered on ClinicalTrials.gov. Am Heart J 2014;167:921–9.e2.
3. Lauer MS, D'Agostino RB Sr. The randomized registry trial--the next disruptive technology in clinical research? N Engl J Med 2013;369:1579–81.
4. Jacobs JP, Mayer JE Jr., Mavroudis C et al. The Society of Thoracic Surgeons Congenital Heart Surgery Database: 2017 Update on Outcomes and Quality. Ann Thorac Surg 2017;103:699–709.
5. Torok RD, Li JS, Kannankeril PJ et al. Recommendations to Enhance Pediatric Cardiovascular Drug Development: Report of a Multi-Stakeholder Think Tank. J Am Heart Assoc 2018;7.
6. Felker GM, Anstrom KJ, Rogers JG. A global ranking approach to end points in trials of mechanical circulatory support devices. J Card Fail 2008;14:368–72.
7. Felker GM, Maisel AS. A global rank end point for clinical trials in acute heart failure. Circ Heart Fail 2010;3:643–6.
8. Assmann SF, Pocock SJ, Enos LE, Kasten LE. Subgroup analysis and other (mis)uses of baseline data in clinical trials. Lancet 2000;355:1064–9.
9. Kahan BC, Jairath V, Dore CJ, Morris TP. The risks and rewards of covariate adjustment in randomized trials: an assessment of 12 outcomes from 8 studies. Trials 2014;15:139.
10. Morales DL, Khan MS, Turek JW et al. Report of the 2015 Society of Thoracic Surgeons Congenital Heart Surgery Practice Survey. Ann Thorac Surg 2017;103:622–628.
11. Clarke DR, Breen LS, Jacobs ML et al. Verification of data in congenital cardiac surgery. Cardiol Young 2008;18 Suppl 2:177–87.
12. Jacobs JP, Mayer JE Jr., Mavroudis C et al. The Society of Thoracic Surgeons Congenital Heart Surgery Database: 2016 Update on Outcomes and Quality. Ann Thorac Surg 2016;101:850–62.
13. Hill KD, Kannankeril PJ. Perioperative Corticosteroids in Children Undergoing Congenital Heart Surgery: Five Decades of Clinical Equipoise. World J Pediatr Congenit Heart Surg 2018;9:294–296.
14. Dunne J, Rodriguez WJ, Murphy MD et al. Extrapolation of adult data and other data in pediatric drug-development programs. Pediatrics 2011;128:e1242–9.
15. Jacobs JP, Jacobs ML, Maruszewski B et al. Initial application in the EACTS and STS Congenital Heart Surgery Databases of an empirically derived methodology of complexity adjustment to evaluate surgical case mix and results. Eur J Cardiothorac Surg 2012;42:775–9; discussion 779-80.
16. Califf RM, Harrelson-Woodlief L, Topol EJ. Left ventricular ejection fraction may not be useful as an end point of thrombolytic therapy comparative trials. Circulation 1990;82:1847–53.
17. Subherwal S, Anstrom KJ, Jones WS et al. Use of alternative methodologies for evaluation of composite end points in trials of therapies for critical limb ischemia. Am Heart J 2012;164:277–84.
18. Wessel DL, Berger F, Li JS et al. Clopidogrel in infants with systemic-to-pulmonary-artery shunts. N Engl J Med 2013;368:2377–84.
19. Ohye RG, Sleeper LA, Mahony L et al. Comparison of shunt types in the Norwood procedure for single-ventricle lesions. N Engl J Med 2010;362:1980–92.
20. Shaddy RE, Boucek MM, Hsu DT et al. Carvedilol for children and adolescents with heart failure: a randomized controlled trial. JAMA 2007;298:1171–9.
21. Dieleman JM, Nierich AP, Rosseel PM et al. Intraoperative high-dose dexamethasone for cardiac surgery: a randomized controlled trial. JAMA 2012;308:1761–7.
22. Whitlock RP, Devereaux PJ, Teoh KH et al. Methylprednisolone in patients undergoing cardiopulmonary bypass (SIRS): a randomised, double-blind, placebo-controlled trial. Lancet 2015;386:1243–53.
23. Brown PM, Ezekowitz JA. Composite End Points in Clinical Trials of Heart Failure Therapy: How Do We Measure the Effect Size? Circ Heart Fail 2017;10.
24. Lindberg L, Forsell C, Jogi P, Olsson AK. Effects of dexamethasone on clinical course, C-reactive protein, S100B protein and von Willebrand factor antigen after paediatric cardiac surgery. Br J Anaesth 2003;90:728–32.
25. Graham EM, Atz AM, Butts RJ et al. Standardized preoperative corticosteroid treatment in neonates undergoing cardiac surgery: results from a randomized trial. J Thorac Cardiovasc Surg 2011;142:1523–9.
26. Graham EM, Atz AM, McHugh KE et al. Preoperative steroid treatment does not improve markers of inflammation after cardiac surgery in neonates: results from a randomized trial. J Thorac Cardiovasc Surg 2014;147:902–8.
27. Butts RJ, Scheurer MA, Atz AM et al. Comparison of maximum vasoactive inotropic score and low cardiac output syndrome as markers of early postoperative outcomes after neonatal cardiac surgery. Pediatr Cardiol 2012;33:633–8.
28. Graham EM, Martin RH, Buckley JR et al. Corticosteroid Therapy in Neonates Undergoing Cardiopulmonary Bypass: Randomized Controlled Trial. J Am Coll Cardiol 2019;74:659–668.
29. Hauck WW, Anderson S, Marcus SM. Should we adjust for covariates in nonlinear regression analyses of randomized trials? Control Clin Trials 1998;19:249–56.
30. Hernandez AV, Steyerberg EW, Habbema JD. Covariate adjustment in randomized controlled trials with dichotomous outcomes increases statistical power and reduces sample size requirements. J Clin Epidemiol 2004;57:454–60.
31. Negassa A, Hanley JA. The effect of omitted covariates on confidence interval and study power in binary outcome analysis: a simulation study. Contemp Clin Trials 2007;28:242–8.
32. Turner EL, Perel P, Clayton T et al. Covariate adjustment increased power in randomized controlled trials: an example in traumatic brain injury. J Clin Epidemiol 2012;65:474–81.
33. Rao SV, Hess CN, Barham B et al. A registry-based randomized trial comparing radial and femoral approaches in women undergoing percutaneous coronary intervention: the SAFE-PCI for Women (Study of Access Site for Enhancement of PCI for Women) trial. JACC Cardiovasc Interv 2014;7:857–67.