Journal of the American Medical Informatics Association (JAMIA). 2020 Mar 5;27(4):601–605. doi: 10.1093/jamia/ocaa014

Investigating the impact of disease and health record duration on the eMERGE algorithm for rheumatoid arthritis

Vanessa L Kronzer 1, Liwei Wang 2, Hongfang Liu 2, John M Davis III 1, Jeffrey A Sparks 3, Cynthia S Crowson 1,4
PMCID: PMC7647254; PMID: 32134444

Abstract

Objective

The study sought to determine the dependence of the Electronic Medical Records and Genomics (eMERGE) rheumatoid arthritis (RA) algorithm on both RA and electronic health record (EHR) duration.

Materials and Methods

Using a population-based cohort from the Mayo Clinic Biobank, we identified 497 patients with at least 1 RA diagnosis code. RA case status was manually determined using validated criteria for RA. RA duration was defined as time from first RA code to the index date of biobank enrollment. To simulate EHR duration, various years of EHR lookback were applied, starting at the index date and going backward. Model performance was determined by sensitivity, specificity, positive predictive value, negative predictive value, and area under the curve (AUC).

Results

The eMERGE algorithm performed well in this cohort, with overall sensitivity 53%, specificity 99%, positive predictive value 97%, negative predictive value 74%, and AUC 76%. Among patients with RA duration <2 years, sensitivity and AUC were only 9% and 54%, respectively, but they increased to 71% and 85% among patients with RA duration ≥10 years. Longer EHR lookback also improved model performance up to a threshold of 10 years, at which sensitivity reached 52% and AUC 75%. However, the optimal EHR lookback varied by RA duration; an EHR lookback of 3 years was best able to identify recently diagnosed RA cases.

Conclusions

eMERGE algorithm performance improves with longer RA duration as well as EHR duration up to 10 years, though shorter EHR lookback can improve identification of recently diagnosed RA cases.

Keywords: rheumatoid arthritis, natural language processing, algorithm, electronic health record, eMERGE

INTRODUCTION

Electronic health record (EHR) data are increasingly being used for disease classification in clinical and genomics studies.1 The Electronic Medical Records and Genomics (eMERGE) group2 has created algorithms that combine machine learning (logistic regression) models with natural language processing techniques to capture disease diagnoses from EHR data.3,4 These techniques have now been applied to more than 50 diseases,5 including rheumatoid arthritis (RA).6,7

The eMERGE RA algorithm has already been successfully applied in many studies of RA.8–12 To create it, the authors generated a training dataset using the 2010 American College of Rheumatology (ACR)/European League Against Rheumatism (EULAR) classification criteria for RA.13 They then created a penalized logistic regression model, with relative weights for features that were significantly associated with RA.6 They chose the decision threshold to fix specificity at 97% when distinguishing RA cases from controls. The model performed very well, with a sensitivity of 65%, a positive predictive value (PPV) of 90%, and an area under the receiver-operating characteristic curve (AUC) of 95%, and it demonstrated portability to external datasets.12 Of note, model performance has not yet been assessed in a population-based cohort.
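To make this design concrete, the sketch below illustrates the general style of classifier: a penalized logistic regression whose decision threshold is chosen so that specificity among controls reaches 97%. The features, coefficients, and data here are hypothetical stand-ins for illustration, not the published eMERGE model.

```python
# A minimal sketch, with hypothetical features and simulated data, of a
# penalized logistic regression classifier whose decision threshold is
# chosen to fix specificity at 97% (the eMERGE design choice).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
# Hypothetical features: RA code count, RF lab count, total encounter count.
X = rng.poisson(lam=[5.0, 2.0, 40.0], size=(n, 3)).astype(float)
# Simulated case status loosely tied to the features (illustration only).
logit = 0.3 * X[:, 0] + 0.5 * X[:, 1] - 0.02 * X[:, 2] - 1.5
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

model = LogisticRegression(penalty="l2", C=1.0).fit(X, y)
scores = model.predict_proba(X)[:, 1]

# Choose the threshold at which 97% of controls score below it, trading
# sensitivity for a high positive predictive value.
control_scores = np.sort(scores[y == 0])
threshold = control_scores[int(0.97 * len(control_scores))]
predicted_ra = scores >= threshold
```

Fixing specificity rather than maximizing overall accuracy is what makes the algorithm precise but conservative: nearly all flagged patients are true cases, at the cost of missing some.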

The eMERGE authors also noted that there are “no temporal restraints” to the algorithm.6 However, the 2 positive weights in the algorithm are for the number of RA diagnosis codes and the number of rheumatoid factor (RF) laboratory values. Thus, patients with RA of shorter duration may have accumulated fewer of these positively weighted features and therefore have a lower likelihood of being captured by the algorithm. Investigators may be particularly interested in identifying incident RA or RA of early duration, but it is unclear how well this algorithm identifies these patients. It is possible that the algorithm preferentially identifies patients with longstanding RA, which may affect the composition and interpretation of studies.

In addition to RA duration, EHR duration may also affect model performance. One of the negative weights in the algorithm is for the total number of encounters with a coded diagnosis in a patient’s record. Thus, the longer the EHR before RA diagnosis, the less likely a patient is to be correctly identified as having RA. This may reduce the algorithm’s ability to identify incident cases of RA, or even prevalent cases, as the length of the EHR expands. Despite the importance of these model features, the effect of RA and EHR duration on the performance of the eMERGE algorithm has not been studied. Doing so is critical to assess the algorithm’s ability to identify incident as well as prevalent RA in the modern era of expanding EHRs.
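A toy computation, using made-up coefficients chosen only to show the direction of these effects (not the published weights), illustrates why both durations matter:

```python
# A toy score with hypothetical coefficients: RA code count and RF lab
# count enter positively, total encounter count negatively.
def ra_score(ra_codes: int, rf_labs: int, total_encounters: int) -> float:
    return 0.3 * ra_codes + 0.5 * rf_labs - 0.02 * total_encounters

# One year after diagnosis: few RA codes and RF labs have accumulated yet.
print(ra_score(ra_codes=2, rf_labs=1, total_encounters=30))    # 0.5
# Ten years after diagnosis: far more RA codes, but the longer record also
# contributes many more total encounters pulling the score back down.
print(ra_score(ra_codes=20, rf_labs=3, total_encounters=300))  # 1.5
```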

To address these gaps, our aims were threefold. First, we aimed to assess the validity of the eMERGE RA algorithm in our population-based cohort. Second, we aimed to determine algorithm accuracy over varying intervals of RA duration. Third, we aimed to determine algorithm accuracy over varying intervals of EHR lookback. We hypothesized that the model would perform well in our population-based cohort, performing best for an EHR lookback of 10 years because that was the duration on which the model was trained, and that sensitivity for newly diagnosed RA cases would be poor but improved by shorter EHR lookback.

MATERIALS AND METHODS

Study design and participants

This population-based cohort study used the subset of Mayo Clinic Biobank14 participants who were also included in the Rochester Epidemiology Project (REP),15 or approximately 14 000 individuals. We defined the index date for patient characteristics as the time of completing the biobank enrollment questionnaire. Biobank enrollment began in April 2009. Only participants with at least 1 diagnosis code for RA (International Classification of Diseases–Ninth Revision 714.x except 714.3) prior to the biobank enrollment questionnaire were evaluated by the algorithm, as 1 diagnosis code was the criterion for inclusion in the validation cohorts.12 This study received institutional review board approval from Mayo Clinic and Olmsted Medical Center, which waived the need for informed consent. It also complies with the Declaration of Helsinki and follows the STROBE (STrengthening the Reporting of OBservational studies in Epidemiology) reporting guidelines for observational studies.16
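A minimal sketch of this inclusion filter, assuming per-patient lists of (ICD-9 code, date) pairs; the record layout is hypothetical:

```python
# A minimal sketch of the inclusion criterion: at least 1 ICD-9 RA code
# (714.x except 714.3) dated before the index date of biobank enrollment.
# The (code, date) record layout is an assumption for illustration.
from datetime import date

def is_ra_code(icd9: str) -> bool:
    return icd9.startswith("714") and not icd9.startswith("714.3")

def eligible(diagnoses: list[tuple[str, date]], index_date: date) -> bool:
    """diagnoses: (ICD-9 code, diagnosis date) pairs for one patient."""
    return any(is_ra_code(code) and dx_date < index_date
               for code, dx_date in diagnoses)

# One qualifying RA code (714.0) before a 2009 index date passes;
# a juvenile RA code (714.30) does not.
print(eligible([("714.0", date(2005, 6, 1))], date(2009, 4, 15)))   # True
print(eligible([("714.30", date(2005, 6, 1))], date(2009, 4, 15)))  # False
```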

RA cases

All 497 patients were manually reviewed for RA status using the 1987 ACR and 2010 ACR/EULAR classification criteria.13,17 Trained nurse abstractors previously screened the population of Olmsted County, the defined geographic area for the REP, for incident RA using either the 1987 or 2010 criteria.18 Any unclear cases were reviewed and categorized by a rheumatologist. V.L.K. performed additional chart review to determine whether prevalent cases who moved into Olmsted County with preexisting RA met criteria. Using this combined approach, 213 (43%) of the 497 possible RA cases met either 1987 ACR or 2010 ACR/EULAR criteria for RA.

Measures

The eMERGE RA algorithm was obtained from the eMERGE website.6 For ease of viewing, the features for this algorithm and their corresponding weights are included in Supplementary Table 1. Data for the eMERGE algorithm, including diagnosis codes for RA, systemic lupus erythematosus, and psoriatic arthritis; RF laboratory codes; and the total number of diagnosis codes, were all collected from the EHR, which dates back to January 1, 1995. The 2 primary exposure variables were RA duration and EHR lookback. We defined RA duration as the number of years from the first RA diagnosis code to the index date of the biobank enrollment questionnaire. As a proxy for EHR duration, we imposed varying intervals of EHR lookback, in years, starting from the index date and going backward to no earlier than January 1, 1995.
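A minimal sketch of this lookback construction, assuming per-patient records keyed by event date (the record layout is hypothetical):

```python
# A minimal sketch of simulating EHR lookback: keep only events dated
# within `lookback_years` before the index date, and no earlier than the
# EHR inception date of January 1, 1995.
from datetime import date, timedelta

EHR_START = date(1995, 1, 1)

def apply_lookback(events, index_date: date, lookback_years: int):
    """events: (event date, payload) pairs for one patient."""
    window_start = max(
        EHR_START,
        index_date - timedelta(days=round(365.25 * lookback_years)),
    )
    return [(d, payload) for d, payload in events
            if window_start <= d <= index_date]
```

The algorithm is then rerun on each truncated record, so a single patient can be a detected case under one lookback and a missed case under another.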

Statistical analysis

Descriptive statistics (eg, means, percentages) were used to summarize the characteristics of the patients after the initial screen who did and did not fulfill criteria for RA. Comparisons between groups were performed using chi-square and rank sum tests. eMERGE model performance was calculated using sensitivity, specificity, PPV, negative predictive value (NPV), and AUC. In addition to using all available data, we also calculated performance for EHR lookbacks of 1 year, 3 years, 5 years, 10 years, and 15 years and RA durations of <2 years, 2 to <5 years, 5 to <10 years, and ≥10 years. All analyses were prespecified in a protocol and were performed using SAS version 9.4 (SAS Institute, Cary, NC).
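For reference, a minimal sketch of these performance measures under the standard 2×2 definitions. Note that for a binary algorithm call at a single fixed threshold, the trapezoidal ROC AUC reduces to (sensitivity + specificity)/2, which appears consistent with the AUC values reported in the Results.

```python
# A minimal sketch of the reported performance measures, assuming the
# standard 2x2 confusion-matrix definitions (tp/fp/fn/tn = true positive,
# false positive, false negative, true negative counts).
def performance(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    sens = tp / (tp + fn)  # proportion of true RA cases the algorithm flags
    spec = tn / (tn + fp)  # proportion of non-cases it correctly clears
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),  # flagged patients who truly have RA
        "npv": tn / (tn + fn),  # cleared patients who truly do not
        # For a binary call, the trapezoidal ROC AUC equals balanced accuracy.
        "auc": (sens + spec) / 2,
    }
```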

RESULTS

Patient characteristics

Of the 497 potential cases, 213 (43%) were confirmed as RA. The characteristics of patients with confirmed RA did not differ from those without, except for RA duration (Table 1). Mean available EHR duration was 15 years in both groups. Characteristics also did not differ between the RA and no-RA groups when stratified by RA duration.

Table 1.

Characteristics at index date of biobank enrollment questionnaire among the 497 biobank participants with at least 1 RA code

Characteristic                            RA (n = 213)   No RA (n = 284)   P value
Age, y                                    65 ± 14        63 ± 16           .11
Female                                    159 (75)       194 (68)          .12
White, non-Hispanic                       209 (98)       280 (99)          .68
Education bachelor’s degree or higher     85 (40)        120 (42)          .50
Never smoker                              119 (56)       150 (53)          .72
Years from first RA code (a)              9 (4-15)       6 (3-11)          <.001
EHR follow-up, y                          15 (15-17)     15 (15-17)        .94

Values are mean ± SD, n (%), or median (interquartile range).

EHR: electronic health record; RA: rheumatoid arthritis.

(a) Measured as time between the first RA diagnosis code and the index date of the biobank enrollment questionnaire.

Overall model performance

Using all available EHR data without any imposed lookback period, the model correctly identified 112 (53%) of the 213 cases and missed 101 (47%). The overall model sensitivity was 53%, specificity was 99%, PPV was 97%, NPV was 74%, and AUC was 76%. Among the 284 patients in the no-RA group, the model incorrectly flagged 4 (1%) and correctly categorized 280 (99%). Among the RA cases, those missed by the model were younger (mean age at survey: 63 vs 68 years; P = .022), had a shorter RA duration (mean: 4.1 vs 6.7 years; P < .001), and were less likely to be seropositive (36% vs 85%; P < .001).
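As a consistency check, plugging the counts from this paragraph into the standard 2×2 definitions reproduces the reported figures to within rounding:

```python
# Worked check using the counts reported above: 112 true positives,
# 101 false negatives, 4 false positives, 280 true negatives.
tp, fn, fp, tn = 112, 101, 4, 280
print(f"sensitivity = {tp / (tp + fn):.2f}")  # 112/213 = 0.53
print(f"specificity = {tn / (tn + fp):.2f}")  # 280/284 = 0.99
print(f"PPV         = {tp / (tp + fp):.2f}")  # 112/116 = 0.97
print(f"NPV         = {tn / (tn + fn):.2f}")  # 280/381 = 0.73
print(f"AUC         = {(tp/(tp+fn) + tn/(tn+fp)) / 2:.2f}")  # 0.76
```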

RA duration

RA disease duration had a substantial influence on model performance (Table 2). Sensitivity was particularly low for RA durations <5 years, as was AUC. PPV remained over 90% regardless of RA duration, though it was lowest for RA durations of 2 to <5 years.

Table 2.

Model performance characteristics by RA duration (a)

Characteristic               <2 y (n = 34)   2 to <5 y (n = 32)   5 to <10 y (n = 48)   ≥10 y (n = 99)
Sensitivity                  9%              28%                  63%                   71%
Specificity                  100%            99%                  97%                   99%
Positive predictive value    100%            90%                  94%                   99%
Negative predictive value    64%             76%                  80%                   74%
AUC                          54%             63%                  80%                   85%

AUC: area under the receiver-operating characteristic curve; RA: rheumatoid arthritis.

(a) Prior to the index date of the biobank enrollment questionnaire.

EHR lookback

Model performance as measured by sensitivity and AUC also varied by EHR lookback (Table 3). Both sensitivity and AUC increased with lookback up to 10 years, beyond which they plateaued. PPV remained over 95% regardless of EHR lookback.

Table 3.

Model performance characteristics by EHR lookback (a)

Characteristic               1 y (n = 213)   3 y (n = 208)   5 y (n = 203)   10 y (n = 193)   15 y (n = 136)
Sensitivity                  13%             38%             45%             52%              52%
Specificity                  100%            99%             99%             99%              99%
Positive predictive value    100%            98%             96%             97%              97%
Negative predictive value    61%             68%             71%             73%              73%
AUC                          56%             68%             72%             75%              75%

AUC: area under the receiver-operating characteristic curve; EHR: electronic health record.

(a) Cutoffs imposed prior to the index date of the biobank enrollment questionnaire.

Combined RA duration and EHR lookback

In analyses of RA duration and EHR lookback together (Table 4), sensitivity was the model parameter most affected by variations in RA and EHR duration. Both sensitivity and AUC improved greatly for RA duration and EHR lookback >5 years and peaked at 10 years or longer. However, the optimal EHR lookback depended on RA duration. For RA disease duration <2 years, an EHR lookback of 3 years was optimal. As disease duration increased to 2 to <5 years, the optimal EHR lookback increased to 5 years. Similarly, for RA duration 5 to <10 years, optimal sensitivity and concordance were achieved with an EHR lookback of 10 years, while for the longest RA durations, the longest EHR lookback performed best.

Table 4.

Model performance characteristics by a combination of RA duration and EHR lookback (a)

EHR lookback (a)    Sensitivity                      Positive predictive value        Area under the curve
                    <2 y   2-<5 y  5-<10 y  ≥10 y    <2 y   2-<5 y  5-<10 y  ≥10 y    <2 y   2-<5 y  5-<10 y  ≥10 y
1 y                 14%    5%      15%      14%      100%   67%     100%     100%     57%    52%     57%      57%
3 y                 24%    25%     44%      42%      100%   89%     100%     98%      62%    62%     72%      71%
5 y                 14%    29%     58%      53%      100%   89%     93%      98%      57%    64%     78%      76%
10 y                0%     29%     63%      68%      N/A    89%     96%      99%      N/A    64%     81%      83%
15 y                0%     18%     58%      71%      N/A    80%     100%     98%      N/A    58%     79%      85%

EHR: electronic health record; N/A: not available; RA: rheumatoid arthritis.

(a) Prior to the index date of the biobank enrollment questionnaire.

DISCUSSION

This study demonstrates the portability of the eMERGE RA algorithm not only to a population-based cohort, but also across varying intervals of RA and EHR duration. We found that RA duration had the strongest impact on model performance, with longer RA duration improving it. We also found that performance improved with increasing EHR lookback up to 10 years, though shorter EHR lookback had better sensitivity for newly diagnosed RA cases. These findings support the validity of the eMERGE RA algorithm for clinical and genomics research, especially for prevalent RA. However, the algorithm demonstrated very low sensitivity for identifying patients with early RA, in contrast to its high specificity and PPV. Depending on the researcher’s goals, the high PPV for early RA could be useful, as those identified are very likely to truly have incident RA. Nevertheless, owing to the low sensitivity, studies using this algorithm may have difficulty identifying incident and early RA, which could have a major impact on study samples and results.

Of the 2 temporal dimensions studied, algorithm performance was most dependent on RA duration. The best performance occurred in RA cases with the longest duration (10 years or more), and presumably performance would continue to improve for even longer durations. Conversely, performance was relatively poor for recently diagnosed cases, with sensitivity <10% for newly diagnosed RA. These findings were not surprising, as the model positively weights the number of RA diagnosis codes and RF laboratory values. Thus, the eMERGE algorithm may not be well suited to identifying newly diagnosed RA cases.

Algorithm performance also increased with EHR lookback, up to 10 years. Beyond 10 years, performance plateaued but, importantly, did not decline as the algorithm’s negative weight on total encounters might predict. The improvement with longer EHR lookback observed in this study is consistent with a prior study, in which the institution with the shortest EHR duration performed slightly worse than the others.12 Furthermore, the peak at 10 years may have occurred because that was the duration on which the model was trained.12 Overall, these findings provide reassurance that ongoing EHR expansion will not negatively impact eMERGE performance.

While overall model performance increased with EHR lookback, we also found that the optimal EHR lookback depended on RA duration. As we hypothesized, a shorter EHR lookback (3 years) allowed for the highest sensitivity for recently diagnosed RA cases. However, sensitivity was still only about 25%. As RA duration increased, the optimal EHR lookback also increased in parallel. These findings can be used to help target RA case identification for a certain RA duration, such as newly diagnosed cases. Further, the model’s dependence on RA and EHR duration suggests that incorporating temporal variables into the algorithm may be beneficial.

Finally, this study found that the eMERGE algorithm is portable to this separate, population-based cohort. The sensitivities of around 50% are similar to published rates of 43%-75% when the model is not trained on local data.12 This study’s PPVs of 96%-100% were slightly higher than previously published reports of 79%-95%,7,12 whereas the AUC of around 75% was lower than previously published reports of 89%-97%.12 Most likely, these differences arose from differing definitions of RA. This study required fulfillment of either the 1987 ACR or 2010 ACR/EULAR criteria, which is more stringent than physician diagnosis, thus increasing PPV. In contrast, prior studies used physician diagnosis for RA classification; physician diagnosis logically correlates well with physician-generated diagnosis codes, thus increasing the AUC. It is also possible our population contained a higher proportion of early RA than prior studies, which, as we demonstrated, worsens model performance, though RA duration was not reported in the prior eMERGE studies. Overall, despite the lower AUC, the high PPV provides reassurance that RA cases identified by the eMERGE algorithm do have true RA.

Strengths of this study include its ability to extend the generalizability of the eMERGE algorithm to a population-based sample, use of validated ACR/EULAR criteria for RA as the gold standard, a sample size of 213 cases (nearly double that of prior studies),12 and, importantly, relatively long EHR duration, which permitted calculations over many different periods of EHR lookback.

Several limitations also deserve mention. First, although all patients for this study came from the population-based REP, the study is not perfectly population-based because only those patients who participated in the biobank were included. Second, use of ACR/EULAR criteria for RA rather than physician diagnosis as done in the original validation study impedes direct comparisons with prior studies. In particular, it likely increased the PPV, NPV, and specificity while lowering sensitivity and AUC in our cohort, as ACR criteria are stricter than physician diagnosis, and some patients with RA may not meet ACR classification criteria. Third, RA duration was calculated from the time of the first RA diagnosis code, rather than from the time of symptom onset, so true RA duration is likely longer than reported. Fourth, small sample sizes limited calculations, especially at the extremes of RA duration and EHR lookback. Indeed, not enough RA cases were diagnosed within 1 year of the index date to permit calculations for very early RA, and even with <2 years as the cutoff, PPV and AUC could not be calculated for the longer EHR lookbacks. Fifth, we presume EHR durations longer than 15 years would not affect model performance because we observed no decline from EHR lookbacks of 10-15 years, but we were not able to objectively test longer intervals. Finally, the population of Olmsted County is predominantly Caucasian, and the proportion of college-educated subjects in this study population is higher than the national average, so the findings may not be generalizable to more diverse or disadvantaged populations.

CONCLUSION

The eMERGE algorithm for RA performed well in our population-based cohort. As hypothesized, model performance increased with longer RA disease duration as well as longer EHR duration, up to 10 years. The algorithm demonstrated low sensitivity for identifying patients with early RA. However, sensitivity for recently diagnosed cases was improved by using a shorter EHR lookback. These findings may clarify the eMERGE algorithm’s utility for clinical and genomics research using RA. Further research is needed to determine the impact of EHR duration and disease duration on the numerous other eMERGE disease phenotypes.

FUNDING

Dr. Kronzer is supported by the Rheumatology Research Foundation Resident Research Preceptorship. Dr. Sparks is supported by the National Institute of Arthritis and Musculoskeletal and Skin Diseases (K23 AR069688, L30 AR066953, R03 AR075886, P30 AR070253, and P30 AR072577) and the Rheumatology Research Foundation K Supplement Award. Dr. Crowson was supported by R01 AR046849. R01 AG034676 provided infrastructure support to make this work possible.

AUTHOR CONTRIBUTIONS

VLK, LW, and CSC made substantial contributions to the design and implementation of the research and to the analysis of the experimental results. VLK drafted the initial manuscript. LW, HL, JMD, JAS, and CSC provided critical revision of the manuscript. All authors gave final approval of the version to be published. All authors agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

CONFLICT OF INTEREST STATEMENT

The authors have no competing interests to declare.

Supplementary Material

ocaa014_Supplementary_Data

REFERENCES

  • 1. Ford E, Carroll JA, Smith HE, et al. Extracting information from the text of electronic medical records to improve case detection: a systematic review. J Am Med Inform Assoc 2016; 23 (5): 1007–15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. Smoller JW, Karlson EW, Green RC, et al. An eMERGE Clinical Center at Partners Personalized Medicine. J Pers Med 2016; 6 (1): E5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Liao KP, Cai T, Savova GK, et al. Development of phenotype algorithms using electronic medical records and incorporating natural language processing. BMJ 2015; 350 (11): h1885. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Kho AN, Pacheco JA, Peissig PL, et al. Electronic medical records for genetic research: results of the eMERGE consortium. Sci Transl Med 2011; 3 (79): 79re1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. PheKB: a knowledgebase for discovering phenotypes from electronic medical records. https://phekb.org/ Accessed August 7, 2019.
  • 6. Partners Phenotyping Group, Partners HealthCare. Rheumatoid arthritis (RA). 2016. https://phekb.org/phenotype/585 Accessed July 29, 2019.
  • 7. Liao KP, Cai T, Gainer V, et al. Electronic medical records for discovery research in rheumatoid arthritis. Arthritis Care Res (Hoboken) 2010; 62 (8): 1120–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Yu Z, Kim SC, Vanni K, et al. Association between inflammation and systolic blood pressure in RA compared to patients without RA. Arthritis Res Ther 2018; 20 (1): 107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9. Hejblum BP, Cui J, Lahey LJ, et al. Association between anti-citrullinated fibrinogen antibodies and coronary artery disease in rheumatoid arthritis. Arthritis Care Res (Hoboken) 2018; 70 (7): 1113–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Lin C, Karlson EW, Canhao H, et al. Automatic prediction of rheumatoid arthritis disease activity from the electronic medical records. PLoS One 2013; 8 (8): e69932. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11. Liao KP, Kurreeman F, Li G, et al. Associations of autoantibodies, autoimmune risk alleles, and clinical diagnoses from the electronic medical records in rheumatoid arthritis cases and non-rheumatoid arthritis controls. Arthritis Rheum 2013; 65 (3): 571–81. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12. Carroll RJ, Thompson WK, Eyler AE, et al. Portability of an algorithm to identify rheumatoid arthritis in electronic health records. J Am Med Inform Assoc 2012; 19 (e1): e162–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13. Aletaha D, Neogi T, Silman AJ, et al. 2010 Rheumatoid arthritis classification criteria: an American College of Rheumatology/European League Against Rheumatism collaborative initiative. Arthritis Rheum 2010; 62 (9): 2569–81. [DOI] [PubMed] [Google Scholar]
  • 14. Olson JE, Ryu E, Johnson KJ, et al. The Mayo Clinic Biobank: a building block for individualized medicine. Mayo Clin Proc 2013; 88 (9): 952–62. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. St Sauver JL, Grossardt BR, Yawn BP, et al. Data resource profile: the Rochester Epidemiology Project (REP) medical records-linkage system. Int J Epidemiol 2012; 41 (6): 1614–24. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. von Elm E, Altman DG, Egger M, et al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement: guidelines for reporting observational studies. Int J Surg 2014; 12 (12): 1495–9. [DOI] [PubMed] [Google Scholar]
  • 17. Arnett FC, Edworthy SM, Bloch DA, et al. The American Rheumatism Association 1987 revised criteria for the classification of rheumatoid arthritis. Arthritis Rheum 1988; 31 (3): 315–24. [DOI] [PubMed] [Google Scholar]
  • 18. Myasoedova E, Crowson CS, Kremers HM, et al. Is the incidence of rheumatoid arthritis rising? Results from Olmsted County, Minnesota, 1955-2007. Arthritis Rheum 2010; 62 (6): 1576–82. [DOI] [PMC free article] [PubMed] [Google Scholar]
