A data-zone scoring system to assess the generalizability of clinical trial results to individual patients

Luke J Laffin; Stephanie A Besser; Francis J Alenghat

doi:10.1177/2047487318815967

Eur J Prev Cardiol. Author manuscript; available in PMC: 2020 Apr 1.

Published in final edited form as: Eur J Prev Cardiol. 2018 Nov 26;26(6):569–575. doi: 10.1177/2047487318815967


PMCID: PMC6459598 NIHMSID: NIHMS1019791 PMID: 30477321

Abstract

Introduction:

Evaluating the applicability of a clinical trial to a specific patient is difficult. A novel framework, the Trial Score, was created to quantify the generalizability of a trial’s result based on participants’ baseline characteristics and not on the trial’s inclusion and exclusion criteria.

Methods:

For each Systolic Blood Pressure Intervention Trial (SPRINT) participant, the Euclidean distance in six-dimensional space from the theoretical “average” participant was calculated to produce an individual Trial Score that incorporates multiple distinct continuous-variable baseline characteristics. We prospectively defined the “data-rich,” “data-limited,” and “data-free” zones as Trial Scores <90th percentile, the 90th–97.5th percentile, and >97.5th percentile, respectively. Trial Scores were then calculated for National Health and Nutrition Examination Survey participants to map data zones of the general population. Individual participant data from the Action to Control Cardiovascular Risk in Diabetes blood pressure trial (ACCORD-BP) were used to test whether participants further from the average SPRINT participant have outcomes that differ from the overall SPRINT results.

Results:

The National Health and Nutrition Examination Survey cohort and the ACCORD-BP trial include large proportions of participants in SPRINT’s data-free and data-limited zones. Time-to-event rates seen with intensive and standard blood pressure control in SPRINT did not differ significantly from those of ACCORD-BP participants within SPRINT’s data-rich zone (hazard ratio 0.97, p = 0.84 and hazard ratio 0.95, p = 0.70, respectively). However, these rates were significantly different from those of ACCORD-BP participants outside SPRINT’s data-rich zone (hazard ratio 0.64, p < 0.01 and hazard ratio 0.77, p < 0.01, respectively).

Conclusions:

ACCORD-BP participants with SPRINT Trial Scores below the 90th percentile (SPRINT’s data-rich zone) have event rates similar to those of SPRINT participants in both the intensive and standard blood pressure groups. Quantifying the difference between an individual patient and the average clinical trial participant holds promise as a tool to more precisely determine the applicability of a specific trial to individual patients.

Keywords: Clinical trials, blood pressure, hypertension

Introduction

Evaluating the applicability of a clinical trial to a specific patient can be difficult. Relying only on inclusion and exclusion criteria ignores the fact that study participants are heterogeneous both across these criteria and in their baseline characteristics. Thus, a patient may meet all trial criteria yet be poorly represented by the majority of trial participants. Conversely, a patient may be excluded on the basis of a single parameter yet closely resemble the trial population in other respects. Further, most clinicians find it difficult to recall all inclusion and exclusion criteria and may apply results more broadly on the basis of a headline result. The Systolic Blood Pressure Intervention Trial (SPRINT) generated one such headline result: a systolic blood pressure (SBP) of 120 mm Hg is an optimal goal for patients with hypertension and elevated cardiovascular risk.¹ Age parameters, other inclusion criteria, and the exclusion of patients with diabetes cloud SPRINT’s broad applicability. It is therefore critical to understand which patients are well represented by, and thus likely to behave similarly to, the population studied in SPRINT.

Methods

This is a secondary analysis of SPRINT and Action to Control Cardiovascular Risk in Diabetes blood pressure trial (ACCORD-BP) data, which are provided in de-identified form by the National Heart, Lung, and Blood Institute (NHLBI) via the BioLINCC data repository (for more information, see https://biolincc.nhlbi.nih.gov). National Health and Nutrition Examination Survey (NHANES) data were obtained from the National Center for Health Statistics at the Centers for Disease Control and Prevention. The study was deemed exempt by the University of Chicago Institutional Review Board, and the data were obtained after a data use agreement was signed. Statistical analysis, data synthesis, and model creation were performed between February 2017 and April 2018.

A conceptual framework was created to define and succinctly quantify the applicability of a clinical trial’s result to an individual patient based on participants’ baseline characteristics, independent of inclusion and exclusion criteria. The framework is represented in Figure 1(a) and demonstrates the difference between an individual patient and the theoretical average participant enrolled in a trial. This difference was termed the Trial Score. Data zones of Trial Scores are depicted in Figure 1(b). We prospectively defined the “data-rich” zone as Trial Scores less than the 90th percentile, “data-limited” as those between the 90th and the 97.5th percentile, and “data-free” as those greater than the 97.5th percentile. The Trial Score data zone percentile boundaries were chosen so as to encompass the majority of participants enrolled in SPRINT, and to be easily reproducible when data zones are calculated for subsequent clinical trials.
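As a minimal sketch of how data zones could be assigned, the Python below computes the 90th and 97.5th percentile cut points from a trial's own Trial Scores and labels any score by zone. Two details are assumptions not stated in the text: percentiles use linear interpolation between ranks (the same convention as NumPy's default), and a score exactly at the 90th percentile is treated as data-limited (matching the “≥1.60” grouping used for ACCORD-BP in Figure 3), while a score exactly at the 97.5th percentile stays data-limited because data-free is defined as greater than that percentile. The function names are hypothetical.

```python
import math


def percentile(sorted_scores, p):
    """p-th percentile by linear interpolation between ranks (assumed convention)."""
    if not sorted_scores:
        raise ValueError("need at least one score")
    k = (len(sorted_scores) - 1) * p / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return sorted_scores[lo] + (sorted_scores[hi] - sorted_scores[lo]) * (k - lo)


def zone_boundaries(trial_scores):
    """Return the (90th, 97.5th) percentile cut points of a trial's own Trial Scores."""
    ordered = sorted(trial_scores)
    return percentile(ordered, 90.0), percentile(ordered, 97.5)


def data_zone(score, p90, p975):
    """Label a Trial Score: rich below p90, limited from p90 up to p975, free above p975."""
    if score < p90:
        return "data-rich"
    if score <= p975:
        return "data-limited"
    return "data-free"


# Toy distribution of 201 scores (0, 1, ..., 200), not real trial data.
p90, p975 = zone_boundaries(range(201))
print(p90, p975)                    # 180.0 195.0
print(data_zone(1.75, 1.60, 2.13))  # data-limited, using SPRINT's reported cut points
```

Deriving the cut points from the trial's own score distribution is what keeps the zones reproducible for other trials; only the 90th and 97.5th percentile choices carry over.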

Figure 1. — Conceptual framework for a *Trial Score*.

(a) Inclusion criteria of a trial are represented by the rectangular box. The theoretical “average” trial participant is found centrally and has a *Trial Score,* based on multiple baseline characteristics, of zero. Individual patients are represented by dots. If patients meet all criteria for the trial they would be found, like Patient B, inside the box. This would be the case for all trial participants. If not, they would be found outside, like Patient A. The *Trial Score* is the difference between the average trial participant and the patient. This example illustrates that even though Patient A does not meet all trial criteria and Patient B does, *Trial Score* A is less than *Trial Score* B. This suggests that Patient A is more similar to the average trial participant than Patient B. (b) The overlay of color represents data zones defined by percentiles of *Trial Scores.* This example illustrates that even though Patient A does not meet all trial criteria and Patient B does, Patient A is in a data-rich zone for the trial whereas Patient B is not. For the Systolic Blood Pressure Intervention Trial (SPRINT), we prospectively defined the “data-rich” zone as any *Trial Score* less than the 90th percentile. The “data-limited” zone was defined as a *Trial Score* between the 90th and 97.5th percentile. The “data-free” zone was defined as a *Trial Score* greater than the 97.5th percentile.

Calculating SPRINT Trial Scores

For each SPRINT participant (n = 9245), a standard z-score was computed for six distinct, normally distributed, continuous baseline variables: age, SBP, fasting serum glucose (Glu), non-high-density lipoprotein (HDL) cholesterol, serum creatinine (Cr), and body mass index (BMI):

$$Z = \frac{X - \mu}{\sigma}$$

where X is the participant's value of the variable, and μ and σ are that variable's mean and standard deviation among SPRINT participants. Multivariable logistic regression was then performed with these six variables (age, SBP, Glu, non-HDL, Cr, and BMI) as independent variables. The dependent variable was the occurrence of a major adverse cardiovascular event, SPRINT’s primary outcome, which as defined in SPRINT includes myocardial infarction, acute coronary syndrome not resulting in myocardial infarction, stroke, acute decompensated heart failure, or death from cardiovascular causes. The odds ratio (OR) calculated for each independent variable was used to weight the corresponding z-score (OR_age = 1.46; OR_SBP = 1.11; OR_BMI = 1.02; OR_NonHDL = 1.21; OR_Cr = 1.33; OR_Glu = 0.93). For each SPRINT participant, the Euclidean distance from the theoretical “average” participant in six-dimensional space was then calculated from the weighted z-scores to produce a Trial Score. Mathematically, this is computed as:

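$$\text{Trial Score} = \left( \frac{(Z_{\text{age}} \cdot OR_{\text{age}})^{2} + (Z_{\text{SBP}} \cdot OR_{\text{SBP}})^{2} + (Z_{\text{BMI}} \cdot OR_{\text{BMI}})^{2} + (Z_{\text{NonHDL}} \cdot OR_{\text{NonHDL}})^{2} + (Z_{\text{Cr}} \cdot OR_{\text{Cr}})^{2} + (Z_{\text{Glu}} \cdot OR_{\text{Glu}})^{2}}{6} \right)^{1/2}$$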
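The Trial Score formula can be transcribed into a short Python sketch. The odds-ratio weights are those reported above; the SPRINT means and standard deviations needed to form z-scores are not given in this article, so the values in the usage lines are purely hypothetical. Because the sum is divided by 6 inside the square root, the score equals the Euclidean norm of the weighted z-score vector scaled by 1/√6 (a root-mean-square), which changes the scale but not the ordering of participants.

```python
import math

# Odds-ratio weights for each z-score, as reported for SPRINT in the text.
SPRINT_OR_WEIGHTS = {
    "age": 1.46,
    "sbp": 1.11,
    "bmi": 1.02,
    "non_hdl": 1.21,
    "cr": 1.33,
    "glu": 0.93,
}


def z_score(x, mean, sd):
    """Standard z-score: Z = (X - mu) / sigma."""
    return (x - mean) / sd


def trial_score(z, weights=SPRINT_OR_WEIGHTS):
    """Square root of the mean of the squared, OR-weighted z-scores.

    Equals the Euclidean norm of the weighted z-vector divided by sqrt(len(weights)).
    """
    missing = set(weights) - set(z)
    if missing:
        raise KeyError(f"missing z-scores for: {sorted(missing)}")
    squared = [(z[name] * w) ** 2 for name, w in weights.items()]
    return math.sqrt(sum(squared) / len(squared))


# Usage with a HYPOTHETICAL reference mean (68 years) and SD (9 years) for age;
# every other variable is set exactly at its (unstated) SPRINT mean.
z = {name: 0.0 for name in SPRINT_OR_WEIGHTS}
z["age"] = z_score(77, 68, 9)    # one SD above the hypothetical mean age
print(round(trial_score(z), 3))  # 1.46 / sqrt(6), about 0.596
```

A participant exactly at every mean scores 0, the theoretical “average” participant; the same function applies unchanged to NHANES or ACCORD-BP records once their z-scores are computed against SPRINT's means and standard deviations.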

To map this metric onto the population at large, SPRINT Trial Scores were also calculated for each 2007–2014 NHANES participant who was 35 years or older, had a mean SBP greater than 130 mm Hg, and had a glycated hemoglobin (HbA1c) below 7%. These criteria were chosen because an SBP above 130 mm Hg now meets the definition of hypertension in the most recent American College of Cardiology/American Heart Association (ACC/AHA) blood pressure guideline,² and because SPRINT did not enroll patients with diabetes.
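The NHANES selection criteria can be expressed as a simple filter. This is an illustrative sketch only: the record fields are hypothetical and do not correspond to actual NHANES variable names, and restriction to the 2007–2014 survey cycles is assumed to happen when the files are loaded.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SurveyRecord:
    """HYPOTHETICAL record layout; real NHANES files use their own variable names."""
    age_years: float
    sbp_readings_mm_hg: list[float]  # individual systolic readings
    hba1c_percent: float


def include_in_mapping(record: SurveyRecord) -> bool:
    """Age >= 35 y, mean SBP > 130 mm Hg, and HbA1c < 7% (mirrors SPRINT's diabetes exclusion)."""
    if not record.sbp_readings_mm_hg:
        return False  # no readings, so no mean SBP can be computed
    return (
        record.age_years >= 35
        and mean(record.sbp_readings_mm_hg) > 130
        and record.hba1c_percent < 7.0
    )


cohort = [
    SurveyRecord(52, [138, 134, 136], 5.8),  # included
    SurveyRecord(34, [150, 148], 5.4),       # excluded: younger than 35
    SurveyRecord(61, [130, 130], 5.6),       # excluded: mean SBP not above 130
    SurveyRecord(70, [145], 7.2),            # excluded: HbA1c of 7% or higher
]
selected = [r for r in cohort if include_in_mapping(r)]
print(len(selected))  # 1
```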

Assessing validity of the SPRINT Trial Score

We hypothesized that the further a patient is from the average trial participant, the more likely he or she is to have outcomes that differ from the trial’s overall result. More specifically, we expected individuals with Trial Scores in the data-limited and data-free zones to behave differently from the overall SPRINT trial population.

Using individual participant baseline characteristics from the ACCORD-BP trial (a “negative” trial of intensive versus standard blood pressure control in patients with type 2 diabetes),³ we calculated a SPRINT Trial Score for each ACCORD-BP participant. Using Cox proportional-hazards regression, we compared the time to the first major adverse cardiovascular event in SPRINT participants with that in ACCORD-BP participants, separately for those within and outside SPRINT’s data-rich zone. Notably, heart failure events, a secondary endpoint in ACCORD-BP, were included in the ACCORD-BP analysis of major adverse cardiovascular events so that the outcome aligned fully with SPRINT’s primary composite outcome.
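To illustrate the comparison, the sketch below fits a Cox proportional-hazards model with a single 0/1 covariate (trial membership) by Newton-Raphson on the partial likelihood, using Breslow's approximation for tied event times, and returns the hazard ratio with a 95% Wald interval and two-sided p-value. It is a teaching sketch under stated assumptions, not the authors' code: the article does not report its software or tie handling, and established packages such as R's `survival::coxph` default to Efron's tie correction instead. Coding SPRINT participants as 1 and the ACCORD-BP subset as 0 yields a hazard ratio “for SPRINT vs this ACCORD-BP subset”, the direction used in the Results.

```python
import math


def cox_binary_hr(times, events, group, max_iter=50, tol=1e-10):
    """Cox proportional-hazards fit for one 0/1 covariate (Breslow handling of ties).

    times  : follow-up time for each participant
    events : 1 if the first major adverse cardiovascular event occurred, 0 if censored
    group  : 1 for the comparison cohort (e.g. SPRINT), 0 for the reference cohort
    Returns (hazard_ratio, ci_low, ci_high, wald_p) with a 95% Wald interval.
    """
    rows = list(zip(times, events, group))
    event_times = sorted({t for t, e, _ in rows if e})

    def score_and_information(beta):
        score, info = 0.0, 0.0
        for t in event_times:
            at_risk = [g for ti, _, g in rows if ti >= t]       # still followed at t
            failed = [g for ti, e, g in rows if ti == t and e]  # events exactly at t
            s0 = sum(math.exp(beta * g) for g in at_risk)
            s1 = sum(g * math.exp(beta * g) for g in at_risk)
            share = s1 / s0  # risk-weighted share of group 1 in the risk set
            d = len(failed)
            score += sum(failed) - d * share
            info += d * share * (1.0 - share)  # x**2 == x for a 0/1 covariate
        return score, info

    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta)
        if info == 0.0:
            raise ValueError("no information: both groups must share risk sets")
        step = score / info  # Newton-Raphson step on the log partial likelihood
        beta += step
        if abs(step) < tol:
            break

    _, info = score_and_information(beta)  # observed information at the estimate
    se = 1.0 / math.sqrt(info)
    wald_p = math.erfc(abs(beta / se) / math.sqrt(2.0))  # two-sided normal p-value
    return math.exp(beta), math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se), wald_p
```

With toy data in which both groups have identical follow-up, the estimate is a hazard ratio of 1 with p = 1; real analyses would pass each trial arm's participant-level times, event indicators, and cohort labels.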

Results

A theoretical SPRINT participant, who is exactly average for all six baseline characteristics, has by definition a Trial Score = 0, whereas the maximum Trial Score amongst SPRINT participants was 5.45. Across the distribution of Trial Scores for SPRINT participants, 1.60 and 2.13 represent the data zone boundaries of the 90th and 97.5th percentile, respectively. These values act as the borders between “data-rich,” “data-limited,” and “data-free” zones. Figure 2(a) demonstrates the distribution of Trial Scores amongst SPRINT participants.

Figure 2. — *Trial Score* distribution.

(a) Histogram demonstrating the distribution of *Trial Scores* amongst Systolic Blood Pressure Intervention Trial (SPRINT) participants. A *Trial Score* of zero represents the “average” participant. The maximum calculated *Trial Score* was 5.45. Green denotes subjects with a *Trial Score* in the “data-rich” zone (<90th percentile), yellow denotes subjects in the “data-limited” zone (between the 90th and 97.5th percentile), and red denotes subjects with *Trial Scores* in the “data-free” zone (>97.5th percentile). (b) 2007–2014 National Health and Nutrition Examination Survey (NHANES) participants, grouped by sex and race, with percentage distribution across the SPRINT trial’s data-rich, data-limited, and data-free zones.

Analysis of the 5370 NHANES participants’ SPRINT Trial Scores shows that these individuals differ from SPRINT participants: fewer fall in SPRINT’s data-rich zone (approximately 72%, compared with 90% of SPRINT participants by definition), 19% fall in its data-limited zone, and 9% fall in its data-free zone. Mapped onto SPRINT data zones, the NHANES population shows a landscape of applicability that varies by race and sex (Figure 2(b)).

Judged by their calculated SPRINT Trial Scores, the participants of ACCORD-BP differed greatly from the population enrolled in SPRINT: only 28% fell within SPRINT’s data-rich zone. As displayed in Figure 3(a), the time-to-event rate with intensive blood pressure control in SPRINT did not differ significantly from that with intensive control in ACCORD-BP participants within SPRINT’s data-rich zone (hazard ratio 0.97 for SPRINT vs this ACCORD-BP subset; 95% confidence interval (CI) 0.69–1.35; p = 0.84). Similarly, event rates did not differ between the two trials among data-rich zone participants treated to a standard blood pressure goal (Figure 3(b); hazard ratio 0.95; 95% CI 0.72–1.25; p = 0.70).

Figure 3. — Time-to-event curves for Systolic Blood Pressure Intervention Trial (SPRINT) and Action to Control Cardiovascular Risk in Diabetes blood pressure trial (ACCORD-BP).

Shown are the cumulative hazards for SPRINT’s primary outcome. (a) Participants treated with an intensive blood pressure (BP) target in SPRINT and ACCORD-BP. (b) Participants treated with a standard BP target. ACCORD-BP participants are grouped by SPRINT *Trial Score* into the data-rich zone (<1.60) or the data-limited/free zone (≥1.60). ACCORD-BP participants in the data-rich zone behave similarly to SPRINT participants with either intensive or standard BP targets, whereas ACCORD-BP participants in the data-limited/free zone have outcomes significantly different from those of SPRINT participants. CI: confidence interval.

However, the event rates of ACCORD-BP participants outside SPRINT’s data-rich zone (the majority of ACCORD-BP) differed significantly from those in SPRINT in both the intensive (hazard ratio 0.64 for SPRINT vs this ACCORD-BP subset; 95% CI 0.52–0.78; p < 0.01) and standard groups (hazard ratio 0.77; 95% CI 0.64–0.93; p < 0.01).

A visual aid was created to help clinicians see which parameters of an encountered patient are most similar to, or most different from, those of the average SPRINT participant (Figure 4). For example, a 49-year-old man with coronary artery disease and normal renal function falls within the data-rich zone despite being ineligible for SPRINT because of his age, whereas a 75-year-old man with chronic kidney disease meets all SPRINT inclusion and exclusion criteria yet falls in the data-limited zone.

Figure 4. — Visual aid for clinicians.

The display is based on the z-scores for the six variables. The 49-year-old patient (blue) would have been excluded from the Systolic Blood Pressure Intervention Trial (SPRINT) on the basis of age, but is very similar to the trial population and falls in the data-rich zone. The 75-year-old patient (purple) meets SPRINT inclusion criteria but is not well represented by the participants enrolled in SPRINT; this patient falls in the data-limited zone.

Chol: cholesterol; HDL: high-density lipoprotein; SBP: systolic blood pressure.

Discussion

Internal validity is one reason why randomized controlled trials (RCTs) are performed and preferred over observational studies. However, the external validity (and ultimately the generalizability) of RCT results is less clear.⁴ Numerous factors must be considered when deciding whether to apply a trial result to an individual patient. Relying solely on a trial’s inclusion and exclusion criteria ignores the fact that participants are never uniformly distributed across those criteria or across baseline demographics.

Statistical techniques that adjust for observed differences between an experimental subject pool and a target population have been proposed to assist the generalization of RCT results.⁵ However, these methods predominantly aim to account for the difference between sample and population average treatment effects. For example, recent work addresses the generalizability of RCT results in heart failure.⁶ Unfortunately, these generalization methods do not address the variability among individuals within a trial and do not quantify the applicability of a trial’s result to an individual patient. With the movement toward physician quality metrics, more precisely quantifying the applicability of RCT outcomes to individual patients could improve on the rote methods currently in use.

The Trial Score framework differs considerably from previously published generalizability models because it uses individual trial participants’ baseline characteristics to reflect more accurately how participants are distributed around the theoretical “average” participant. Although clinicians may judge a trial’s applicability to their patient by looking at its average baseline characteristics (often presented as Table 1 of a published article), we demonstrate that no SPRINT participant is in fact exactly average (that is, no participant has a Trial Score of zero). The Trial Score quantifies the distance of any individual patient from the average trial participant. This method allowed us to compare trial participants with a population at large (Figure 2(b)), with an alternative trial population (Figure 3), and with specific individual patients encountered in practice (Figure 4).

The SPRINT trial is an ideal study in which to test the Trial Score framework. It is a practice-changing trial for patients with hypertension, and although others have judged the generalizability of SPRINT based solely on inclusion and exclusion criteria,⁷ those generalizability models may mischaracterize the population that could benefit from a more intensive SBP target. Additionally, the 2017 ACC/AHA hypertension guideline uses different, broader parameters than SPRINT to identify patients for antihypertensive treatment,² as do the 2018 European Society of Cardiology/European Society of Hypertension (ESC/ESH) guidelines for the management of arterial hypertension.⁸ However, the parameters and scoring systems used in these guidelines do not provide a truly quantitative framework for identifying outliers who may not resemble or respond like trial participants, and they may not be suitable for the population at large.⁹,¹⁰ ACCORD-BP trial data allowed validation of the SPRINT Trial Score because ACCORD-BP studied a different patient population yet captured the same clinical end points and tested the same intervention (an office SBP goal of 120 mm Hg versus 140 mm Hg).

Limitations

The Trial Score, as currently calculated, does not account for categorical variables that describe the baseline demographics of a trial. This is unlikely to limit generalizability when a characteristic is present in roughly half the cohort, such as sex in SPRINT (a ratio of men to women of approximately 1:1). However, generalizability could be affected if a variable such as current smoking (only 13% of SPRINT participants) influences outcomes independently of its effect on continuous variables (such as raising non-HDL cholesterol).

Although the Trial Score attempts to judge the applicability of a trial’s result to an individual patient irrespective of whether that patient meets inclusion or exclusion criteria, it remains incumbent upon the clinician to understand whether an exclusion criterion reflects significant safety concerns. If it does, using a Trial Score to make clinical decisions is not appropriate. An example is SPRINT’s exclusion of patients with a one-minute standing SBP below 110 mm Hg; pursuing more stringent blood pressure targets in such patients could lead to more adverse events.

Future directions

Further study of how clinicians can implement and apply a scoring system such as the Trial Score in everyday practice is needed. One can envision integration within the electronic medical record. Additional prospective validation of the SPRINT Trial Score with respect to blood pressure targets and patient outcomes is also needed. Similarly, application of the Trial Score framework to other clinical trials outside of studies involving hypertension is underway.

Conclusion

The SPRINT Trial Score identifies a data-rich subset of ACCORD-BP whose event rates, despite diabetes, are indistinguishable from those in SPRINT. Outcomes among ACCORD-BP participants in the data-limited and data-free zones differ significantly from those in SPRINT. Thus, defining data zones and a Trial Score holds promise as a tool to more precisely quantify the applicability of trial outcomes to individual patients. A Trial Score may also help reconcile future trials that enroll different populations and report disparate results (as with ACCORD-BP and SPRINT).

The Trial Score is not meant to be prescriptive, but rather to help the clinician and patient have an informed discussion about the applicability of a clinical trial. Its use would allow a clinician to determine, in a quantifiable manner, how pertinent a trial is to a specific patient, and provide a framework to understand how other trials or populations overlap with the trial in question.

Acknowledgments

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: LJ Laffin was supported by the NHLBI T32 postdoctoral training grant (T32 HL007381). FJ Alenghat is supported by the NHLBI (K08 HL116600).

Declaration of conflicting interests

The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: LJ Laffin – work was in-part supported by NHLBI T32 postdoctoral training grant (T32 HL007381), has received speaking fees from EP Consulting. FJ Alenghat – supported by the NHLBI (K08 HL116600). SA Besser – none.

References

1. SPRINT Research Group, Wright JT Jr, Williamson JD, et al. A randomized trial of intensive versus standard blood-pressure control. N Engl J Med 2015; 373: 2103–2116.
2. Whelton PK, Carey RM, Aronow WS, et al. 2017 ACC/AHA/AAPA/ABC/ACPM/AGS/APhA/ASH/ASPC/NMA/PCNA guideline for the prevention, detection, evaluation, and management of high blood pressure in adults: A report of the American College of Cardiology/American Heart Association Task Force on clinical practice guidelines. J Am Coll Cardiol 2018; 71: e127–e248.
3. ACCORD Study Group, Cushman WC, Evans GW, et al. Effects of intensive blood-pressure control in type 2 diabetes mellitus. N Engl J Med 2010; 362: 1575–1585.
4. Rothwell PM. External validity of randomised controlled trials: “To whom do the results of this trial apply?” Lancet 2005; 365: 82–93.
5. Kern HL, Stuart EA, Hill J, et al. Assessing methods for generalizing experimental impact estimates to target populations. J Res Educ Eff 2016; 9: 103–127.
6. Cahan A, Cahan S and Cimino JJ. Computer-aided assessment of the generalizability of clinical trial results. Int J Med Inform 2017; 99: 60–66.
7. Bress AP, Tanner RM, Hess R, et al. Generalizability of SPRINT results to the U.S. adult population. J Am Coll Cardiol 2016; 67: 463–472.
8. Williams B, Mancia G, Spiering W, et al. 2018 ESC/ESH guidelines for the management of arterial hypertension. Eur Heart J 2018; 39: 3021–3104.
9. Bakris G and Sorrentino M. Redefining hypertension – assessing the new blood-pressure guidelines. N Engl J Med 2018; 378: 497–499.
10. Vinther JL, Jacobsen RK and Jorgensen T. Current European guidelines for management of cardiovascular disease: Is medical treatment in nearly half a population realistic? Eur J Prev Cardiol 2018; 25: 157–163.
