Abstract
The impact of electronic health record (EHR) discontinuity, i.e., receiving care outside of a given EHR system, on EHR-based risk prediction is unknown. We aimed to assess the impact of EHR-continuity on performance of clinical risk scores.
The study cohort consisted of patients aged ≥65 years with ≥1 EHR encounter in two networks in Massachusetts (MA, 2007/1/1–2017/12/31, internal training and validation datasets) and one network in North Carolina (NC, 2007/1/1–2016/12/31, external validation dataset) that were linked with Medicare claims data. Risk scores were calculated using EHR data alone vs. linked EHR-claims data (not subject to misclassification due to EHR-discontinuity): 1) combined comorbidity score (CCS), 2) claims-based frailty index (CFI), 3) CHA2DS2-VASc, and 4) HAS-BLED. We assessed the performance of CCS and CFI predicting death, CHA2DS2-VASc predicting ischemic stroke, and HAS-BLED predicting bleeding by area under the ROC curve (AUC), stratified by quartiles of predicted EHR-continuity (Q1–4).
There were 319,740 patients in the MA systems and 125,380 in the NC system. In the external validation dataset, the AUC for EHR-based CCS predicting one-year risk of death was 0.583 in the Q1 (lowest) EHR-continuity group, which increased to 0.739 in the Q4 (highest) EHR-continuity group. The corresponding improvement in AUC was 0.539 to 0.647 for CFI, 0.556 to 0.637 for CHA2DS2-VASc, and 0.517 to 0.556 for HAS-BLED. The AUC in the Q4 EHR-continuity group based on EHR alone approximated that based on EHR-claims data.
The prediction performance of 4 clinical risk scores was substantially worse in patients with low vs. high EHR-continuity.
Keywords: data leakage, care continuum, patient connectedness, data completeness, risk score
INTRODUCTION
Over the past decade, the use of electronic health records (EHRs) has increased in comparative effectiveness research (CER),1 2 because they contain rich clinical information that is not typically available in insurance claims databases, such as vital sign measurements, laboratory results, lifestyle factors, and clinical documents. However, a key limitation of this data source is that medical care provided outside of the study EHR network (“out-of-network”) may not be observable to the researchers. This issue has been called “EHR observability”3 or “EHR-discontinuity” (i.e., receiving care outside of the reach of the study EHR) and has been shown to cause substantial misclassification of key variables relevant for CER in the majority of non-integrated U.S. EHR systems.4 5 To reduce such information bias, we have previously developed and validated a prediction algorithm to identify patients with high EHR-continuity.6 Prior studies have demonstrated that patients in the top quintile of predicted EHR-continuity had 3.5- to 5.8-fold less misclassification of 40 clinical factors commonly used as drug exposure, confounders, and outcome variables in CER studies compared to those in the lower quintiles of predicted EHR-continuity.4 5 In addition to these common drug exposures, confounders, and outcome variables, low predicted EHR-continuity could also threaten the accuracy of risk score measurement.
A clinical risk score is a single numeric value that is calculated based on the presence (or absence) of medical conditions and predicts the risk of a clinically relevant outcome. Such scores are useful in risk stratification to guide physicians’ decisions on therapeutic strategies and are also often used for confounding adjustment and subgroup effect assessment in CER studies. Four clinical risk scores are among the most commonly used scores in clinical decision support and CER: the combined comorbidity score7 (CCS), the claims-based frailty index (CFI),8–11 the CHA2DS2-VASc12 13 score (which stands for risk factors for stroke, including congestive heart failure, hypertension, age, diabetes mellitus, stroke, vascular disease, and female sex), and the HAS-BLED13–15 score (which stands for risk factors for bleeding, including hypertension, abnormal renal/liver function, stroke, bleeding, elderly, and drugs/alcohol). CCS is a comorbidity score that combines 20 elements from the Charlson index and the Elixhauser system to predict short-term and long-term mortality.7 16 CFI has been validated against clinical measures of frailty and predicts mortality well.8–11 CHA2DS2-VASc and HAS-BLED scores are widely used to stratify risks of stroke and bleeding, respectively, among patients with nonvalvular atrial fibrillation (AF).14 17 CCS and CFI were developed in claims databases and are mostly used for confounding adjustment in CER, whereas CHA2DS2-VASc and HAS-BLED scores are mostly used as clinical decision aids.
There has been no prior study assessing the impact of EHR data-completeness or EHR-discontinuity on the prediction performance of clinical risk scores. We aimed to evaluate the prediction performance of risk scores stratified by the level of predicted EHR-continuity and assess generalizability of the findings in three US multi-center EHR systems, two in Massachusetts (MA) and one in North Carolina (NC).
METHODS
Data source
We linked the longitudinal claims data from fee-for-service Medicare Parts A, B, and D databases with two EHR systems from Massachusetts (MA, 2007/1/1–2017/12/31) and North Carolina (NC, 2007/1/1–2016/12/31), separately.5 18 The MA EHR system was further separated into 2 networks, serving as internal training and validation datasets, respectively. The first network (MA EHR system 1) consists of 1 tertiary hospital, 3 community hospitals, and 19 primary care centers. The second network (MA EHR system 2) includes 1 tertiary hospital, 1 community hospital, and 18 primary care centers. The NC EHR system includes 1 tertiary hospital, 5 community hospitals, and over 200 clinics. In the previous work to develop and validate the predicted EHR-continuity scores, we used the same internal training, internal validation, and external validation datasets.5 6 All three structured EHR databases have patient demographics, medical diagnoses, procedures, medication prescriptions, and rich clinical data such as laboratory test results and lifestyle measures. The Medicare claims data contain longitudinal information on demographics and claims records for inpatient and outpatient medical diagnoses, procedures, and dispensed medications. The study was approved by the Institutional Review Board (IRB) of the Brigham and Women’s Hospital.
Study population
We identified individuals who had at least 365 days of continuous Medicare enrollment and at least 1 EHR encounter overlapping with the Medicare enrollment period. The first eligible date when these two criteria were met was defined as the index date. The study cohort was additionally required to be at least 65 years old on the index date. For the assessment of CHA2DS2-VASc and HAS-BLED score performance, we restricted the cohort to individuals who had ≥1 inpatient or outpatient diagnostic code indicating AF without valvular disease in the 365 days before the index date. Follow-up started one day after the index date and continued until the earliest of 1) disenrollment from Medicare, 2) death, 3) end of data, or 4) 365 days after the index date.
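The follow-up rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `follow_up_window` and its argument names are hypothetical and do not come from the study code.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def follow_up_window(
    index_date: date,
    end_of_data: date,
    disenrollment: Optional[date] = None,
    death: Optional[date] = None,
) -> Tuple[date, date]:
    """Return (start, end) of follow-up for one patient.

    Follow-up starts one day after the index date and ends at the earliest
    of disenrollment, death, end of data, or 365 days after the index date.
    Argument names are hypothetical placeholders.
    """
    start = index_date + timedelta(days=1)
    # These two censoring dates always exist.
    candidates = [index_date + timedelta(days=365), end_of_data]
    # Disenrollment and death only apply if they were observed.
    if disenrollment is not None:
        candidates.append(disenrollment)
    if death is not None:
        candidates.append(death)
    return start, min(candidates)


start, end = follow_up_window(
    index_date=date(2010, 1, 1),
    end_of_data=date(2017, 12, 31),
    death=date(2010, 6, 30),
)
```

In this toy call the patient dies before the 365-day horizon, so follow-up ends on the date of death.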
Calculation of predicted EHR-continuity
The details of the algorithm used to calculate predicted EHR-continuity have been described previously.4 5 19 Briefly, the model predictors of EHR-continuity were measured based on EHR data alone during a 365-day pre-index period. The model predictors are mainly indicators related to primary care follow-up in the study EHR, demographics (age, sex, and race), and healthcare utilization factors (Table S1).
Measurement of risk scores
We measured components of 4 risk scores (CCS, CFI, CHA2DS2-VASc, and HAS-BLED) during the 365-day pre-index period based on EHR data alone and EHR-claims linked data separately. The details of building each risk score can be found in our previous study20 or existing literature.7 16 17 21 22 We calculated CCS and CFI based on the published algorithms (components and coefficients are listed in Supplementary Tables S2 and S3, respectively). We calculated the CHA2DS2-VASc score by assigning 1 point for each of congestive heart failure, hypertension, age ≥75 years (point doubled), diabetes mellitus, prior stroke or thromboembolism (point doubled), vascular disease, age 65–74 years, and female sex (see definitions in Supplementary Table S4). We calculated the modified HAS-BLED score by assigning 1 point for each of hypertension, abnormal renal/liver function, stroke, bleeding history or predisposition, elderly (age >65 years), and concomitant drugs/alcohol (see definitions in Supplementary Table S5). The final CHA2DS2-VASc and HAS-BLED scores were calculated by adding up the points assigned to each risk factor. Both CHA2DS2-VASc and HAS-BLED scores range from 0 to 9, with 10 possible values.
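As a minimal sketch of the CHA2DS2-VASc point assignment described above, the following Python snippet scores one patient from binary risk-factor flags. The `Patient` class and its field names are hypothetical. The code lists that define each condition are in the supplementary tables and are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical container for risk factors from the pre-index window."""
    age: int
    female: bool
    chf: bool = False               # congestive heart failure
    hypertension: bool = False
    diabetes: bool = False
    stroke_or_te: bool = False      # prior stroke or thromboembolism
    vascular_disease: bool = False


def cha2ds2_vasc(p: Patient) -> int:
    """Sum the CHA2DS2-VASc points (possible range 0 to 9)."""
    score = 0
    score += 1 if p.chf else 0
    score += 1 if p.hypertension else 0
    score += 1 if p.diabetes else 0
    score += 2 if p.stroke_or_te else 0      # point doubled
    score += 1 if p.vascular_disease else 0
    score += 1 if p.female else 0
    if p.age >= 75:
        score += 2                           # point doubled
    elif p.age >= 65:
        score += 1
    return score


# Same hypothetical patient seen through two data sources: the EHR alone
# misses the hypertension and diabetes diagnoses recorded out of network.
ehr_only = Patient(age=78, female=True)
ehr_claims = Patient(age=78, female=True, hypertension=True, diabetes=True)
categories_apart = abs(cha2ds2_vasc(ehr_claims) - cha2ds2_vasc(ehr_only))
```

Here the EHR-only score is 3 and the EHR-claims score is 5, so this patient would count as misclassified by ≥2 categories.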
Outcome ascertainment
Because insurance claims data will capture the data even if the care was provided outside of the EHR system, we considered EHR-claims linked data as our “reference-standard” when assessing the effect of EHR-discontinuity on risk score performance. That is, we compared model performance based on EHR data alone vs. that based on linked EHR and claims data. The outcomes were measured during a 365-day post-index period using EHR-claims data. Therefore, the risk scores from EHR alone and EHR-claims data, respectively, will predict the corresponding outcomes from EHR-claims data. Outcomes that we measured are 1) all-cause death (for CCS and CFI) 2) ischemic stroke (for CHA2DS2-VASc score) and 3) major bleeding (for HAS-BLED score). Ischemic stroke and bleeding were defined by International Classification of Disease 9th and 10th Revision (ICD-9/10) codes from primary discharge diagnosis code (Table S6).23
External validation of risk score misclassification stratified by level of EHR-continuity
The mean standardized difference (MSD) was calculated as the weighted mean of the standardized differences across risk score categories, where the weight is the number of patients in each risk score category based on EHR plus claims data. We also compared risk score misclassification in the MA EHR systems20 vs. that in the NC system. Our previous study based on two MA EHR systems found that the proportion of patients with misclassified risk score categories was lower when we restricted the cohort to patients with high EHR-continuity.20 However, since different EHR systems may have different patient compositions, we aimed to conduct an external validation of those findings. Misclassification was defined as the proportion of patients who were classified into a different risk score category based on EHR data alone vs. linked EHR-claims data. Specifically, we investigated misclassification by ≥1 category and by ≥2 categories. We then compared misclassification between the low and high EHR-continuity groups by calculating the relative risk (RR) and its 95% confidence interval (CI); the RR was calculated by dividing the proportion of patients with misclassified risk score categories in the low EHR-continuity group by that in the high EHR-continuity group. Additionally, we evaluated the risk of misclassification across deciles of EHR-continuity. Our prior studies, based on trend analysis by deciles, determined that the top 20% is the threshold to achieve satisfactory variable classification; thus, deciles are used to report score misclassification.5 6 20 The Cochran-Armitage trend test was used to test the trend of misclassification across the deciles of predicted EHR-continuity. A two-sided p-value of 0.05 was used as the cutoff for statistical significance.
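The RR calculation described above can be illustrated with a short Python sketch. The text does not state which interval method was used; the log-transformed (Wald-type) interval below is a common choice and is an assumption here. The event counts are back-calculated from the rounded percentages and group sizes reported for CHA2DS2-VASc in Table 2, so they are approximations.

```python
import math
from typing import Tuple


def relative_risk(
    events_low: int, n_low: int, events_high: int, n_high: int, z: float = 1.96
) -> Tuple[float, float, float]:
    """Relative risk of misclassification (low vs. high EHR-continuity)
    with a log-scale Wald confidence interval (assumed method)."""
    risk_low = events_low / n_low
    risk_high = events_high / n_high
    rr = risk_low / risk_high
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(
        1 / events_low - 1 / n_low + 1 / events_high - 1 / n_high
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Approximate counts: 82.5% of 9,353 and 39.9% of 2,952 AF patients.
rr, lower, upper = relative_risk(7716, 9353, 1178, 2952)
print(f"RR = {rr:.2f} ({lower:.2f}, {upper:.2f})")
```

With these approximate counts the sketch gives an RR of about 2.07 with an interval of about 1.98 to 2.16, in line with the values reported in Table 2.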
Assessment of prediction performance of risk scores
Areas under the receiver operating characteristic curve (AUCs) were estimated from logistic regression models using risk scores measured from EHR data alone and from EHR-claims data, respectively, to predict the corresponding clinical endpoints. The model comparisons were made in the overall cohort and within each quartile of EHR-continuity. We stratified the comparison by quartiles of predicted EHR-continuity based on cut-off values in the internal training dataset (Q1: <25%, Q2: 25–50%, Q3: 50–75%, Q4: ≥75%). We assessed model performance by quartiles, following a prior study, which allowed a sufficient number of outcome events in each subgroup and demonstrated the trend in model performance with increasing EHR-continuity.24 The differences in AUC between EHR and EHR-claims data, and the corresponding p-values, were evaluated using the DeLong test.25
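A minimal sketch of the stratified AUC calculation is shown below, assuming each record is a tuple of (predicted EHR-continuity, risk score, binary outcome). The function names, cut-off values, and toy records are all made up for illustration. The AUC is computed directly from the score with the rank-based (Mann-Whitney) formula; for a single predictor with a positive coefficient this equals the AUC of the fitted logistic model, because the fitted probability is a monotone transform of the score. The DeLong comparison is not implemented here.

```python
from bisect import bisect_right
from collections import defaultdict


def auc(scores, outcomes):
    """Rank-based AUC: share of event/non-event pairs in which the event
    has the higher score, counting ties as one half."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def assign_quartile(continuity, cutoffs):
    """Map a predicted EHR-continuity value to Q1..Q4 using the three
    cut-offs (25th, 50th, 75th percentile) taken from the training data."""
    return bisect_right(cutoffs, continuity) + 1


def auc_by_quartile(records, cutoffs):
    """records: iterable of (continuity, score, outcome) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for continuity, score, outcome in records:
        scores, outcomes = grouped[assign_quartile(continuity, cutoffs)]
        scores.append(score)
        outcomes.append(outcome)
    return {q: auc(s, y) for q, (s, y) in sorted(grouped.items())}


toy_cutoffs = [0.25, 0.50, 0.75]  # made-up training cut-offs
toy_records = [
    (0.10, 2, 1), (0.15, 2, 0), (0.20, 1, 1), (0.05, 1, 0),  # Q1
    (0.80, 5, 1), (0.90, 2, 0), (0.95, 4, 1), (0.85, 1, 0),  # Q4
]
result = auc_by_quartile(toy_records, toy_cutoffs)
```

The pairwise loop is quadratic in the number of patients, which is fine for a sketch; a sort-based rank computation would be the usual choice at cohort scale.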
Similarity of high versus low EHR data-continuity cohorts
We evaluated the representativeness of comorbidity profiles, assessed by CCS,7 26 between high (EHR-continuity in top 20%) vs. low EHR-continuity (lower 80%) using cut-offs informed by prior studies.6 20 CCS was assessed using the reference standard EHR-claims data to reflect comorbidity profiles not influenced by EHR-continuity. Standardized differences were calculated to evaluate the difference in the proportions between high vs. low EHR-continuity groups within each CCS category (range from −2 to 21). Characteristics with a standardized difference <0.1 were considered comparable between high vs. low EHR-continuity groups.27
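For reference, a standardized difference between two proportions can be computed as in the sketch below. The formula is the conventional one for binary variables (difference in proportions divided by the pooled standard deviation); the text does not spell out the exact formula used, so treat it as an assumption. The function name is hypothetical.

```python
import math


def standardized_difference(p1: float, p2: float) -> float:
    """Standardized difference between two proportions (values in [0, 1])."""
    pooled_sd = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)
    if pooled_sd == 0:
        return 0.0  # both proportions are 0 or 1: no variability
    return (p1 - p2) / pooled_sd


# Check against Table 1: hypertension in MA EHR system 1,
# 40.2% from EHR only vs. 79.0% from EHR + claims.
smd_hypertension = standardized_difference(0.402, 0.790)

# Comparability rule from the text: absolute value below 0.1.
comparable = abs(standardized_difference(0.21, 0.22)) < 0.1
```

Applied to the hypertension row of Table 1, this gives about −0.86, matching the reported value. The 21% vs. 22% pair is a made-up example of two groups that would be judged comparable.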
RESULTS
Study population
We identified a total of 133,246 individuals from MA EHR system 1 (internal training data), 186,494 from MA EHR system 2 (internal validation data), and 125,380 from the NC EHR system (external validation data). Among those, 17,458 (13.1%), 24,608 (13.2%), and 12,305 (9.8%) were patients with non-valvular AF in the internal training data, internal validation data, and external validation data, respectively (Table 3). About 57%–61% were female, and the mean age (±standard deviation) was similar across datasets, ranging from 73.6 (±7.2) to 74.2 (±7.7) years (Table 1). There were 82.6%–87.9% White and 2.2%–9.9% African American patients in these EHR data. We observed that the standardized mean difference in the same covariate measured from EHR alone vs. EHR-claims data ranged from 8% to 141.4%.
Table 3.
Number of patients and number of outcomes^a in each quartile of EHR-continuity
| | EHR-continuity Q1 | EHR-continuity Q2 | EHR-continuity Q3 | EHR-continuity Q4 | Total |
|---|---|---|---|---|---|
| Internal training data set (MA EHR system 1) | | | | | |
| Total number of patients | 32,910 | 33,407 | 33,615 | 33,314 | 133,246 |
| Death^b | 2579 (7.8%) | 1466 (4.4%) | 2541 (7.6%) | 1473 (4.4%) | 8059 (6.0%) |
| Number of patients with AF^c,d | 4,606 | 3,944 | 5,133 | 3,775 | 17,458 |
| Ischemic stroke^c | 146 (3.2%) | 102 (2.6%) | 115 (2.2%) | 67 (1.8%) | 430 (2.5%) |
| Bleeding^d | 568 (12.3%) | 338 (8.6%) | 530 (10.3%) | 374 (9.9%) | 1810 (10.4%) |
| Internal validation data set (MA EHR system 2) | | | | | |
| Total population | 33,764 | 55,887 | 49,462 | 47,381 | 186,494 |
| Death | 3446 (10.2%) | 4697 (8.4%) | 2581 (5.2%) | 2190 (4.6%) | 12914 (6.9%) |
| Total patients with AF | 5,412 | 7,271 | 6,026 | 5,899 | 24,608 |
| Ischemic stroke | 182 (3.4%) | 170 (2.3%) | 133 (2.2%) | 111 (1.9%) | 596 (2.4%) |
| Bleeding | 630 (11.6%) | 867 (11.9%) | 655 (10.9%) | 619 (10.5%) | 2771 (11.3%) |
| External validation data set (NC EHR system) | | | | | |
| Total patients | 28,750 | 31,915 | 31,787 | 32,928 | 125,380 |
| Death | 1553 (5.4%) | 1378 (4.3%) | 2235 (7.0%) | 1701 (5.2%) | 6867 (5.5%) |
| Total patients with AF | 2,725 | 2,751 | 3,253 | 3,576 | 12,305 |
| Ischemic stroke | 67 (2.5%) | 84 (3.1%) | 99 (3.0%) | 103 (2.9%) | 353 (2.9%) |
| Bleeding | 288 (10.6%) | 296 (10.8%) | 367 (11.3%) | 432 (12.1%) | 1383 (11.2%) |
AF: atrial fibrillation; EHR: electronic health records.
Q1–Q4: quartiles of EHR-continuity; Q1 is the lowest quartile and Q4 is the highest quartile.
a. Each outcome was measured using EHR + claims data within 365 days following the index date.
b. Performance of combined comorbidity score and claims-based frailty index predicting death outcome was assessed among the total population.
c. Performance of CHA2DS2-VASc score predicting ischemic stroke was assessed among atrial fibrillation patients.
d. Performance of HAS-BLED score predicting bleeding was assessed among atrial fibrillation patients.
Table 1.
Baseline characteristics of study cohort measured from EHR versus EHR + claims
| Characteristic | MA EHR system 1 (internal training set): EHR only | MA EHR system 1: EHR + claims | MA EHR system 1: SMD | MA EHR system 2 (internal validation set): EHR only | MA EHR system 2: EHR + claims | MA EHR system 2: SMD | NC EHR system (external validation set): EHR only | NC EHR system: EHR + claims | NC EHR system: SMD |
|---|---|---|---|---|---|---|---|---|---|
| N | 133,246 | 133,246 | - | 186,494 | 186,494 | - | 125,380 | 125,380 | - |
| Age in years, mean (SD) | 73.6 (7.4) | 73.6 (7.4) | 0.00 | 74.2 (7.7) | 74.2 (7.7) | 0.00 | 73.6 (7.2) | 73.6 (7.2) | 0.00 |
| Race/ethnicity, % | |||||||||
| White | 84.2% | 84.2% | 0.00 | 82.6% | 82.6% | 0.00 | 87.9% | 87.9% | 0.00 |
| African American | 2.2% | 2.2% | 0.00 | 3.4% | 3.4% | 0.00 | 9.9% | 9.9% | 0.00 |
| Others/unknown | 13.5% | 13.5% | 0.00 | 14.0% | 14.0% | 0.00 | 2.2% | 2.2% | 0.00 |
| Female, % | 56.6% | 56.6% | 0.00 | 61.1% | 61.1% | 0.00 | 58.4% | 58.4% | 0.00 |
| Congestive heart failure, % | 3.4% | 9.0% | −0.23 | 1.7% | 9.5% | −0.34 | 2.3% | 7.3% | −0.24 |
| Hypertension, % | 40.2% | 79.0% | −0.86 | 28.3% | 80.0% | −1.21 | 25.3% | 81.0% | 1.35 |
| Diabetes, % | 7.5% | 21.3% | −0.40 | 4.1% | 22.4% | −0.56 | 6.3% | 22.0% | 0.46 |
| Stroke/TIA, % | 3.5% | 10.3% | −0.27 | 1.4% | 9.2% | −0.35 | 2.4% | 8.0% | 0.25 |
| Liver disease, % | 3.0% | 8.8% | −0.25 | 1.2% | 7.9% | −0.33 | 1.4% | 4.7% | 0.39 |
| Renal disease, % | 7.4% | 22.2% | −0.43 | 4.2% | 22.8% | −0.57 | 6.4% | 19.3% | 0.29 |
| Bleeding, % | 3.9% | 14.3% | −0.37 | 2.5% | 14.9% | −0.45 | 3.4% | 10.7% | 0.29 |
| Drug or alcohol abuse, % | 1.4% | 3.9% | −0.16 | 0.7% | 3.5% | −0.20 | 0.7% | 1.5% | 0.08 |
| Atrial fibrillation, % | 7.3% | 16.5% | −0.29 | 3.7% | 16.2% | −0.43 | 4.6% | 11.6% | 0.26 |
| Valvular disease, % | 2.5% | 10.2% | −0.32 | 0.9% | 8.8% | −0.37 | 1.0% | 6.1% | 0.28 |
| CHA2DS2-VASc score, mean (SD) | 2.7 (1.3) | 3.6 (1.6) | −0.66 | 2.5 (1.1) | 3.7 (1.6) | −0.92 | 2.5 (1.2) | 3.5 (1.5) | 0.74 |
| HAS-BLED score, mean (SD) | 1.8 (1.2) | 2.8 (1.3) | −0.85 | 1.5 (1.0) | 2.8 (1.3) | −1.18 | 1.4 (0.9) | 2.6 (1.2) | 1.13 |
| Combined comorbidity score, mean (SD) | 0.8 (1.8) | 2.1 (3.0) | −0.53 | 0.5 (1.3) | 2.1 (3.1) | −0.71 | 0.4 (1.4) | 1.4 (2.6) | 0.48 |
| Frailty index, mean (SD) | 0.1 (0.0) | 0.2 (0.1) | −0.76 | 0.1 (0.0) | 0.2 (0.1) | −1.05 | 0.1 (0.0) | 0.2 (0.1) | 1.41 |
EHR=electronic health records; SMD=standardized mean difference; SD=standard deviation; TIA=transient ischemic attack
Risk score misclassification
We found a similar pattern of risk score misclassification by levels of EHR-continuity in the training, internal validation, and external validation sets. In the external validation set, the weighted standardized mean difference of risk scores measured from EHR data alone versus EHR-claims data decreased from the lowest EHR-continuity decile (decile 1) to the highest EHR-continuity decile (decile 10) (Figure S1), with a Cochran-Armitage trend test p-value of <0.05 for all 4 risk scores. Misclassification by ≥1 category of disease risk scores was higher in the low EHR-continuity group, ranging from 35% to 85% across the 4 scores (Table 2). In contrast, the high EHR-continuity group had less misclassification of risk scores, ranging from 24% to 47%. The RR of misclassification (95% CI) was 2.07 (1.98, 2.16) for CHA2DS2-VASc, 1.83 (1.76, 1.91) for HAS-BLED, 1.31 (1.29, 1.33) for CCS, and 1.44 (1.41, 1.48) for CFI (Table 2). A similar pattern with generally higher RRs was observed for misclassification by ≥2 categories (Table 2). The trend in misclassification by ≥1 or ≥2 categories of risk scores across deciles of EHR-continuity was statistically significant at the 0.05 level, based on the Cochran–Armitage test (Figures S2 and S3).
Table 2.
Relative risk of misclassification by ≥1 or ≥2 categories of four risk scores in low vs. high EHR-continuity groups (in external validation data)
| | Low EHR-continuity^a, N = 98,333 (N for AF patients = 9,353)^c | High EHR-continuity (ref)^b, N = 27,047 (N for AF patients = 2,952)^c | RR (95% CI) |
|---|---|---|---|
| Misclassification by ≥1 category (%) | | | |
| CHA2DS2-VASc | 82.5 | 39.9 | 2.07 (1.98, 2.16) |
| HAS-BLED | 85.2 | 46.6 | 1.83 (1.76, 1.91) |
| Combined comorbidity score | 47.4 | 36.1 | 1.31 (1.29, 1.33) |
| Claims-based frailty index | 34.9 | 24.2 | 1.44 (1.41, 1.48) |
| Misclassification by ≥2 categories (%) | | | |
| CHA2DS2-VASc | 51.5 | 22.2 | 2.32 (2.16, 2.49) |
| HAS-BLED | 54.2 | 20.5 | 2.64 (2.46, 2.84) |
| Combined comorbidity score | 24.8 | 19.0 | 1.31 (1.27, 1.34) |
| Claims-based frailty index | 6.7 | 2.9 | 2.34 (2.18, 2.52) |
AF = atrial fibrillation; CI = confidence interval; EHR = electronic health records; RR = relative risk
a. Predicted EHR-continuity score in the lower 80%
b. Predicted EHR-continuity score in the top 20%
c. Misclassification of combined comorbidity score and claims-based frailty index was assessed among the total study population, while that of CHA2DS2-VASc and HAS-BLED scores was assessed among the atrial fibrillation patients only.
Performance of risk scores in predicting corresponding outcomes stratified by EHR-continuity
The AUCs estimated from EHR data alone and EHR-claims data are shown in Figure 1; the numbers of outcomes in each EHR-continuity quartile and in the total cohort are provided in Table 3. For CHA2DS2-VASc, the prediction of stroke had an AUC of 0.595 from EHR alone vs. 0.664 from EHR-claims data in the internal training dataset. We observed that at each level of EHR-continuity, AUCs were almost always higher when the model predictors were ascertained based on EHR-claims data, for all risk scores in all 3 datasets. In the external validation dataset, the AUC of the CHA2DS2-VASc score predicting stroke was 0.610 using EHR data alone vs. 0.634 using EHR-claims data. Using EHR-based CCS to predict one-year risk of death, the AUC was 0.583 among patients with Q1 (worst) EHR-continuity, which increased to 0.739 in those with Q4 (best) EHR-continuity. The corresponding improvement in AUC was 0.539 to 0.647 for CFI predicting death, 0.556 to 0.637 for CHA2DS2-VASc predicting ischemic stroke, and 0.519 to 0.566 for HAS-BLED predicting bleeding. In contrast, AUC remained relatively stable across Q1–4 of EHR-continuity when the risk scores were measured by linked EHR-claims data. The performance reduction of risk scores based on EHR alone vs. EHR-claims data was much smaller in Q4 than in Q1 of EHR-continuity. We observed similar patterns in the MA systems (Figure 1). The comparisons of AUC between EHR alone versus EHR-claims using the top 20% (high EHR-continuity) vs. lower 80% (low EHR-continuity) cut-off are shown in the Supplemental Materials (Figures S3 to S15).
Figure 1. AUROC and 95% CI of risk score and outcome stratified by EHR-continuity quartiles.

Q1-Q4 represents quartiles of EHR-continuity, Q1 is the lowest quartile and Q4 is the highest quartile.
AUROC= area under the receiver operating characteristic; CCS=combined comorbidity score; CFI=claims-based frailty index; CI=confidence interval; EHR=electronic health records.
Number of total patients and number of events are presented in Table 3.
Similarity in baseline characteristics of high vs low EHR data-continuity cohorts
In the MA training set, the distribution of CCS categories was similar for the high vs. low EHR-continuity cohorts using the EHR-claims data measurements. The standardized mean difference between the proportions within each CCS category ranged from 0 to 0.03, all smaller than the cut-off value of 0.1 (Figure 2). In the internal and external validation cohorts, we obtained consistent results regarding the representativeness of the high EHR-continuity cohort relative to the remaining cohort (Supplemental Materials Figures S16 and S17).
Figure 2. Distribution of combined comorbidity score categories in low versus high EHR continuity in the internal training data set (MA EHR system 1).

a. Predicted EHR-continuity score in the lower 80%
b. Predicted EHR-continuity score in the top 20%
c. standardized mean difference
CCS=combined comorbidity score; EHR=electronic health records.
DISCUSSION
In this study using 3 large EHR systems linked with Medicare claims data, we evaluated the impact of EHR-continuity on the performance of 4 commonly used clinical risk scores (CHA2DS2-VASc, HAS-BLED, CCS, and CFI) when predicting the target clinical endpoints. The prediction performance of these scores based on EHR alone in patients with low EHR-continuity was substantially worse compared to that based on EHR-claims data. In contrast, in those with high EHR-continuity, the prediction performance of risk scores based on EHR alone was similar to that based on EHR-claims data. The comorbidity profiles based on EHR-claims data were similar in patients with high vs. low EHR-continuity. We observed consistent results in the MA training and internal validation cohorts as well as the NC external validation cohort.
To our knowledge, this is the first study investigating the impact of EHR-continuity on prediction model performance. EHRs have increasingly been used for prediction model development.28–30 In the US, with a few exceptions of highly integrated healthcare systems in which EHR and payor data are directly linked, there is no way to track if a patient receives care outside of a given EHR system accessible to the research team. Without additional data linkage to mitigate such information leakage, it has been shown that relying only on the EHR will lead to a substantial amount of misclassification of key clinical factors.4 5 Our analysis showed that such misclassification translates into poor performance of prediction models across a range of clinical scenarios, from prediction of stroke or bleeding risk among patients with AF to using the combined comorbidity score or frailty index for predicting mortality in the general population. It is now common practice to embed clinical decision support tools based on validated prediction models in the EHR to generate alerts for clinicians.31 The European Society of Cardiology32 and the American College of Cardiology/American Heart Association Task Force on Practice Guidelines and the Heart Rhythm Society33 recommend weighing the benefit of stroke prevention estimated by CHA2DS2-VASc against the risk of bleeding estimated by HAS-BLED to guide physicians’ prescribing decisions for oral anticoagulants.32 33 However, the predictions based on EHR data from patients with low EHR-continuity may be biased and the automatically generated alerts could be misleading. We recommend that researchers first use our validated algorithm to determine EHR-continuity before applying any clinical decision support model in an EHR and only generate the predicted values based on data from those with adequate EHR data-completeness.
EHRs alone have increasingly been used for CER.34 These 4 risk scores (CHA2DS2-VASc, HAS-BLED, CCS, and CFI) are widely used in CER for confounding adjustment and assessment of treatment effect heterogeneity.35–40 With the increased popularity of EHR data in CER studies, it is important to understand the performance of these scores in the EHR data setting. Our findings suggest that calculating these risk scores among patients with low EHR-continuity can result in reduced model performance of these scores. When the scores are used for confounding adjustment, such misclassification would result in residual confounding. If the score is used for risk stratification and subgroup classification, the miscalculated scores would lead to biased subgroup analyses. Our results lend support for restricting the analysis to patients with high EHR-continuity if the researchers only have access to EHR data, because in this group the performance of these scores approximates that based on EHR-claims data. Such restriction would inevitably raise two concerns. First, the findings in the subgroup with higher EHR-continuity may not be generalizable to the overall population. However, we found the prevalence of key comorbidities among patients with high vs. low EHR-continuity to be similar, which reduces this concern. The second concern is reduced sample size and statistical precision. Because the model performance of these scores increased across quartiles of EHR-continuity, it is reasonable to use our model to identify the top 2 quartiles of EHR-continuity, for which the model performance of the scores was considered satisfactory in our analyses across 3 EHR systems. Such a recommendation should be viewed as a reasonable starting point rather than a rigid magic number. It can be tailored according to the specific needs of each study. For example, in a very large cohort study, researchers may reasonably use our model to identify patients in the top quartile of EHR-continuity, which may lead to better model performance while still providing sufficient statistical precision.
Our study has some limitations. First, although our study provided some preliminary evidence of robustness and consistency across 3 EHR networks from MA and NC in the US, testing in more EHR systems is still warranted before we can conclude that the findings are applicable to a wider context. Second, our study only included those aged 65 or older, and our findings may not be generalizable to younger populations, as their care-seeking behaviors and general risk profiles can be very different from those of the elderly population. Lastly, our findings are drawn from 4 commonly used risk scores. Testing of the impact of EHR-continuity on other scores in different populations is needed to demonstrate robustness and generalizability.
In conclusion, based on 3 US EHR systems across MA and NC, we found the prediction performance of 4 commonly used clinical risk scores is substantially worse in patients with low EHR-continuity. Restricting the study cohort to those with high EHR-continuity yielded a similar prediction performance of risk scores based on EHR alone vs. EHR-claims data. Calculating these scores based on EHR information in those with low EHR-continuity may lead to misleading alerts, residual confounding, or biased subgroup analyses according to these scores. We did not find a substantial difference in the comorbidity profiles in patients with high vs. low EHR-continuity based on EHR-claims data.
Study Highlights.
• What is the current knowledge on the topic?
The “out-of-network” discontinuity in electronic health records (EHR) is known to cause misclassification of variables measured from EHR data.
• What question did this study address?
How does EHR-discontinuity impact the performance of clinical risk scores based on EHR data?
• What does this study add to our knowledge?
We found that when using the EHR-based combined comorbidity score to predict one-year mortality, the AUC increased from 0.583 in patients in the lowest quartile to 0.739 in the highest quartile of predicted EHR-continuity. The corresponding improvement in AUC was 0.539 to 0.647 for claims-based frailty index predicting death, 0.556 to 0.637 for CHA2DS2-VASc predicting stroke, and 0.517 to 0.556 for HAS-BLED predicting bleeding.
• How might this change clinical pharmacology or translational science?
EHR-based predictions from commonly used clinical risk scores can be misleading in patients with low EHR-continuity. Calculating these scores based on EHR information in those with low EHR-continuity may lead to misleading alerts, residual confounding, or biased subgroup analyses according to these scores.
Source of Funding:
This project was funded by NIH 1R01LM013204 and 1R01LM012594.
Footnotes
Competing Interests Statement: Dr. Weberpals is a former employee of Hoffmann-La Roche and held shares in Hoffmann-La Roche. Dr. Merola David Merola reports being an employee and shareholder of Aetion, Inc. All other authors declared no competing interests for this work.
