NIHPA Author Manuscript; available in PMC: 2025 Jan 1.
Published in final edited form as: Pharmacoepidemiol Drug Saf. 2023 Aug 31;33(1):e5684. doi: 10.1002/pds.5684

Improving the accuracy of automated gout flare ascertainment using natural language processing of electronic health records and linked Medicare claims data

Kazuki Yoshida 1,2,3, Tianrun Cai 1,2, Lily G Bessette 4, Erin Kim 4, Su Been Lee 4, Luke E Zabotka 4, Alec Sun 4, Julianna M Mastrorilli 4, Theresa A Oduol 4, Jun Liu 4, Daniel H Solomon 1,2,4, Seoyoung C Kim 1,2,4, Rishi J Desai 2,4, Katherine P Liao 1,2,5
PMCID: PMC10873073  NIHMSID: NIHMS1940122  PMID: 37654015

Abstract

Background:

We aimed to determine whether integrating concepts extracted from electronic health record (EHR) notes using natural language processing (NLP) could improve the identification of gout flares.

Methods:

Using Medicare claims linked with EHR data, we selected gout patients who initiated urate-lowering therapy (ULT). Each patient's 12-month baseline period and on-treatment follow-up were segmented into 1-month units. We retrieved EHR notes for months with gout diagnosis codes and processed the notes for NLP concepts. We selected a random sample of 500 patients and reviewed each of their notes for the presence of a physician-documented gout flare. Months containing at least one note mentioning a gout flare were considered months with events. We used 60% of patients to train predictive models with LASSO. We evaluated the models by the area under the curve (AUC) in the validation data and examined positive/negative predictive values (PPV/NPV).

Results:

We extracted and labeled 839 months of follow-up (280 with gout flares). The claims-only model selected 20 variables (AUC=0.69). The NLP concept-only model selected 15 (AUC=0.69). The combined model selected 32 claims variables and 13 NLP concepts (AUC=0.73). The claims-only model had a PPV of 0.64 [0.50,0.77] and an NPV of 0.71 [0.65,0.76], whereas the combined model had a PPV of 0.76 [0.61,0.88] and an NPV of 0.71 [0.65,0.76].

Conclusion:

Adding NLP concept variables to claims variables resulted in a small improvement in the identification of gout flares. Our data-driven claims-only model and our combined claims/NLP-concept model outperformed existing rule-based claims algorithms reliant on medication use, diagnosis, and procedure codes.

PLAIN LANGUAGE SUMMARY:

Gout flares (attacks) are an important outcome in the study of gout treatment, but determining gout flares at scale in large databases is difficult because of their transient and recurrent nature. The diagnostic codes typically used in these databases are not granular enough. We examined whether natural language processing (NLP), a collection of techniques for utilizing free-text descriptions of patient status, can improve the identification of gout flares. We used a dataset that linked Medicare claims with a local electronic health record (EHR). We fully reviewed the EHR notes from 500 patients to determine whether these patients had gout flares in each month of their follow-up. We trained prediction models with diagnostic codes only, NLP concepts only, and both combined to identify the gout flare status for each month. We found that although the model with diagnostic codes only and the model with NLP concepts only performed similarly, the combined model performed better at identifying gout flares.

INTRODUCTION

Gout is a chronic metabolic disease characterized by chronic elevation of serum urate (SU) and acute, painful episodic arthritis (called a gout flare) caused by monosodium urate (MSU) crystal deposition in the joints. (1,2) The goal of gout management is to reduce serum urate with urate-lowering therapy (ULT) to prevent further deposition of MSU crystals. (3) Chronic suppression of SU below the crystallization level (6.8 mg/dl) is thought to result in reduction of existing MSU crystals, (2) thereby resulting in a reduction in gout flares in the long run.

Although such a treat-to-target (TTT) approach to ULT, driving the SU below the crystallization level, is advocated by rheumatology societies, the actual impact of SU suppression on the gout flare outcome in routine practice is not fully understood, leaving room for studies utilizing routinely collected data. However, claims-based study of the impact of ULT on the gout flare outcome is hampered by the transient and recurrent nature of gout flares. The International Classification of Diseases (ICD) 9 and 10 have codes for "gout," but these are used both for the acute presentation of a gout flare and for the chronic diagnosis of gout (typically recorded at chronic maintenance visits to reimburse SU measurements and ULT). Thus, the diagnosis code alone is not sufficient for ascertainment of the gout flare outcome.

We previously developed several rule-based algorithms using claims information on medications and procedures (4) that could be used for gout flares (e.g., non-steroidal anti-inflammatory drugs [NSAIDs] and joint aspiration) with a varying degree of temporal proximity to an ICD-9 code for gout (ICD9 274.xx). Upon the assessment of their accuracy against electronic health record (EHR) review, the positive predictive values (PPVs) for the true gout flares ranged from 50.0% (ICD-9 gout code + NSAIDs/Cox-2 selective inhibitor) to 68.4% (ICD-9 gout code + same-day procedure code for a relevant procedure). The sensitivity and specificity were not assessed in this previous study. (4)

We hypothesized that information from the unstructured EHR data would contribute information beyond claims data and improve the capture of gout flares. Thus, in this study, we combined data extracted using natural language processing (NLP) from the EHR with claims data to identify gout flares in the setting of comparative effectiveness research using a combined EHR-claims data source.

METHODS

Data source and eligibility

We used Medicare claims data (fee-for-service, 2007–2016 Parts A/B/D) linked to EHR data from an academic medical care network in Massachusetts (Mass General Brigham). (5) We identified patients diagnosed with gout based on the gout diagnostic code (ICD9 274.00–03, 81, 82, 39, 9; ICD10 M10) and newly initiated ULT (either allopurinol or febuxostat). The medication initiation requirement was added to replicate a comparative effectiveness study. We required at least 365 days of continuous enrollment for medical and medication coverage allowing for gaps of ≤ 30 days, age ≥ 65 years at the index date, and no use of any ULT during the 365-day washout. We also required at least one EHR encounter during the 365-day baseline period to ensure interactions with the Mass General Brigham system. We computed the mean capture proportion (6) to assess the proportion of encounters that occurred within the Mass General Brigham system among all encounters for these patients.
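The continuous-enrollment criterion above (365 days of coverage, bridging gaps of ≤ 30 days) can be sketched as follows. This is a minimal Python illustration of the eligibility logic, not the study's actual SAS implementation; the function name and input format are assumptions for the example.

```python
from datetime import date, timedelta

def continuously_enrolled(spans, index_date, lookback_days=365, max_gap_days=30):
    """Check coverage over the lookback window ending at index_date,
    treating gaps of <= max_gap_days as continuous.
    `spans` is a list of (start, end) date tuples sorted by start date."""
    window_start = index_date - timedelta(days=lookback_days)
    # Clip spans to the lookback window and drop spans entirely outside it
    clipped = [(max(s, window_start), min(e, index_date))
               for s, e in spans if e >= window_start and s <= index_date]
    if not clipped:
        return False
    # Coverage must begin within max_gap_days of the window start...
    if (clipped[0][0] - window_start).days > max_gap_days:
        return False
    current_end = clipped[0][1]
    for s, e in clipped[1:]:
        if (s - current_end).days > max_gap_days:
            return False  # gap too long: enrollment is not continuous
        current_end = max(current_end, e)
    # ...and extend to within max_gap_days of the index date
    return (index_date - current_end).days <= max_gap_days

# A 20-day gap between coverage spans still counts as continuous enrollment
spans = [(date(2014, 1, 1), date(2014, 8, 1)),
         (date(2014, 8, 21), date(2015, 6, 30))]
print(continuously_enrolled(spans, date(2015, 3, 1)))  # True
```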

Defining the concept and operationalization of gout flares

The gold standard for the gout flare outcome was obtained through medical record review. A gout flare is conceptually defined as "a clinically evident episode of acute inflammation induced by MSU crystals" by the Gout, Hyperuricemia, and Crystal-Associated Disease Network (G-CAN)'s consensus statement regarding labels and definitions of disease states of gout. (7) G-CAN uses the term "gout flare" for both initial and recurrent acute arthritis induced by MSU crystals. No operational definition (i.e., how to determine it in practice) is given in the G-CAN consensus.

For patients with an established diagnosis of gout, the American College of Rheumatology (ACR) has created a provisional definition of gout flares based on patient reports. (8) A flare is defined as a score of 3 or more on a 4-point scale consisting of the presence of any patient-reported warm joint, any patient-reported swollen joint, a patient-reported pain-at-rest score of >3 (0–10 scale), and a patient-reported flare. This provisional definition demonstrated a sensitivity of 85% and a specificity of 95% in an external validation against an expert diagnosis of gout flares. (9)

Since our labeling effort was based on a retrospective review of EHR, we could not take such a criteria-based approach (8,10) due to the lack of consistent reporting of patient history, physical examination, and laboratory and imaging findings in EHR. Thus, we defined a “definite gout flare” (in the note description) as an explicit mention by the treating physician (gout flare, gout attack, gout arthritis, gout synovitis). We defined a “definite no gout flare” (in the note description) as no relevant symptoms suggestive of gout flare (such as pain in joints or bursas) OR, in the presence of a symptom, an explicit mention by the treating physician of an alternative diagnosis. A mention of a suspected diagnosis of gout flare was considered a “possible gout flare.”

EHR review for gout flare labeling

Due to its episodic and recurrent nature, labeling for gout flares was performed at the note level rather than at the patient level. For each note, only an ongoing gout flare at the time of the encounter was considered, to avoid labeling descriptions of past gout flares as positive. As a result, the target of prediction was a gout flare that resulted in a health care encounter during the episode. After randomly sampling 500 patients from the full cohort, we devised several measures to improve the efficiency of EHR review for gout flares. We first determined the most useful time frame for a comparative effectiveness study to be the 12-month lookback period prior to ULT initiation and the ULT on-treatment follow-up. As a clinical encounter for gout (either for a gout flare or for chronic management) is likely to generate a claims ICD9/10 code, we further segmented the longitudinal data into 1-month time segments and identified segments containing at least one ICD9/10 gout code.

Textual descriptions relevant to gout flare determination were manually reviewed by research assistants for each note assisted by the Chart Review Tool Powered by NLP (CHANL) (10). The CHANL tool automatically searched and highlighted all synonyms of the clinically chosen seed terms (gout; flare; gout flare; joint pain; bursitis; synovitis; podagra; gonagra; big toe; monosodium urate crystals; MSU; uric acid; redness; warm; activity limitation; functional limitation; colchicine; colcrys; glucocorticoid; depo-medrol; intraarticular injection; allopurinol; NSAIDs; indomethacin) through the Unified Medical Language System (UMLS).

The extracted text was clinically adjudicated by a physician reviewer according to the aforementioned rules. We then mapped the labels to the longitudinal data at 1-month (30-day) precision. The 1-month time segments were anchored to ULT initiation rather than calendar months. Any presence of notes with definite gout flares in a 1-month time segment was considered a definite gout flare during that 1-month period. Any presence of notes with possible gout flares, without notes with definite gout flares, was considered a possible gout flare during the 1-month period. A 1-month period containing only notes labeled as definite no gout flares was considered a 1-month period without gout flares. Any 1-month periods not containing any notes could not be labeled and were not included in the subsequent analyses.
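The mapping above (30-day segments anchored to ULT initiation, with definite labels overriding possible labels, which override no-flare labels) can be sketched in Python. The function names and the label vocabulary are illustrative assumptions, not the study's actual code.

```python
from datetime import date

def month_index(note_date, ult_start):
    # 1-month (30-day) segments anchored to ULT initiation;
    # negative indices fall in the pre-ULT baseline period
    return (note_date - ult_start).days // 30

def label_months(notes, ult_start):
    """notes: list of (date, label) with label in {"definite", "possible", "none"}.
    Returns {month_index: month_label}; months without notes are absent (unlabeled)."""
    rank = {"none": 0, "possible": 1, "definite": 2}  # definite > possible > none
    months = {}
    for d, label in notes:
        m = month_index(d, ult_start)
        if m not in months or rank[label] > rank[months[m]]:
            months[m] = label
    return months

notes = [(date(2015, 1, 5), "none"),
         (date(2015, 1, 20), "definite"),   # overrides the "none" note in month 0
         (date(2015, 2, 10), "possible")]
print(label_months(notes, date(2015, 1, 1)))  # {0: 'definite', 1: 'possible'}
```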

Variables for predictive modeling

We constructed a set of predictors from both the claims data and the EHR data. From the claims data, we included demographics, comorbidities, relevant medications, health care utilization, and laboratory orders (a total of 93 variables). For some variables, such as medications, we allowed time-varying definitions for each 1-month period. The unstructured EHR data were processed using NLP to extract concepts related to a gout flare. This list of concepts was created using named entity recognition software to process online articles on gout and approximately 100 EHR notes describing gout flares. As a result, we obtained a list of 88 concepts (Supplementary Table 1). Each concept extracted from the narrative notes was aggregated for each 1-month period and was deemed present if any of the notes during the 1-month period contained the concept.

Predictive modeling and evaluation

We then used 60% of patients to train predictive models with the least absolute shrinkage and selection operator (LASSO; SAS 9.4 HPGENSELECT procedure) using claims variables only, EHR medical concepts only, and both combined. The shrinkage penalty was chosen by the Akaike information criterion (AIC). We evaluated the added value of medical concepts by comparing the area under the curve (AUC) of ROCs in the 60% training data as well as in a separate 40% validation dataset (later batches of patients reviewed). As the proposed use involves thresholding the predicted probability into a binary variable for gout flare during each 1-month period, we examined the PPV at predetermined specificity thresholds of 60, 70, 80, 85, 90, 95, and 97%. The previously published rule-based algorithms (4) were also re-evaluated using the same month-level precision data. As we required at least one ICD9/10 gout code in each 1-month period, only the additional part of the algorithm (gout-related medications and gout-related procedures) was implemented.
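The thresholding step, choosing the probability cutoff that attains a target specificity and then reading off sensitivity, PPV, and NPV, can be sketched as follows. This is an illustrative Python sketch of the evaluation only; the models themselves were fit with SAS 9.4 PROC HPGENSELECT, and the function name is an assumption. The sketch assumes at least one true negative and one true positive in the data.

```python
import numpy as np

def metrics_at_specificity(y_true, y_prob, target_spec):
    """Find the lowest cutoff whose specificity still meets target_spec,
    then report sensitivity, specificity, PPV, and NPV at that cutoff."""
    y_true = np.asarray(y_true, dtype=bool)
    y_prob = np.asarray(y_prob, dtype=float)
    best = np.max(y_prob) + 1.0              # fallback: predict no flares (specificity 1)
    for c in np.unique(y_prob)[::-1]:        # scan candidate cutoffs high to low
        pred = y_prob >= c
        tn = np.sum(~pred & ~y_true)
        fp = np.sum(pred & ~y_true)
        if tn / (tn + fp) < target_spec:
            break                            # specificity only decreases from here
        best = c
    pred = y_prob >= best
    tp = int(np.sum(pred & y_true)); fp = int(np.sum(pred & ~y_true))
    tn = int(np.sum(~pred & ~y_true)); fn = int(np.sum(~pred & y_true))
    return {"cutoff": best,
            "sens": tp / (tp + fn),
            "spec": tn / (tn + fp),
            "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
            "npv": tn / (tn + fn) if (tn + fn) else float("nan")}

res = metrics_at_specificity([1, 1, 0, 0, 0], [0.9, 0.4, 0.8, 0.3, 0.2], 0.6)
print(res["cutoff"], res["ppv"])
```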

RESULTS

Patients

We initially identified a total of 4,402 patients who had an ICD diagnosis of gout and initiated either allopurinol or febuxostat. The baseline characteristics of these ULT initiators are described in Table 1. The mean (SD) age was 76.9 (7.4) years, and 40% of patients were female. The comorbidity burden was high, with 94% having hypertension, 8% with previous myocardial infarction, and 44% with hospitalization in the past year. The mean capture proportion was 20.9%.

Table 2.

Variables selected by LASSO.

Selected Variables Claims-Only Model NLP Concept-Only Model Combined Claims/NLP Concepts Model
Age
Chronic Kidney Disease
Hyperlipidemia
Recent Heart Failure
Comorbidity Score
ECG Order
Cardiac Stress Test Order
Utilization Cardiology*
Utilization Medication Number
Utilization Outpatient visits
Utilization Rheumatology*
ACE or ARB
Calcium Channel Blockers
Insulin
Nitrates
NSAIDs
Statin
Glucocorticoids
Time-varying Coronary Artery Disease
Time-varying COPD
Time-varying Diabetes
Time-varying Hyperlipidemia
Time-varying Myocardial Infarction
Time-varying Peripheral Vascular Disease
Time-varying Stroke
Time-varying Comorbidity Score
Time-varying ECG Order
Time-varying Lipid Order
Time-varying colchicine
Time-varying NSAIDs
Time-varying Opioids
Time-varying Oral Glucocorticoids
NLP Concept Alcohol
NLP Concept Allopurinol
NLP Concept Arthralgia
NLP Concept Rheumatoid Arthritis
NLP Concept Colchicine
NLP Concept Diabetes Mellitus
NLP Concept Diarrhea
NLP Concept Gout
NLP Concept Heart Failure
NLP Concept Pain
NLP Concept Stroke
NLP Concept Gout, Primary
NLP Concept Procedure
NLP Concept Urate Measurement
NLP Concept Flare
*

Past utilization of these specialties. Not the specialty of the provider who described the gout flare.

Abbreviations: LASSO: Least Absolute Shrinkage and Selection Operator; NLP: Natural language processing; ECG: Electrocardiogram; ACE: Angiotensin-converting enzyme; ARB: Angiotensin receptor blocker; NSAIDs: Non-steroidal anti-inflammatory drugs; COPD: Chronic obstructive pulmonary disease.

EHR review and yield

We reviewed information from a random sample of 500 patients. We designated the first 300 patients reviewed as the training data and the last 200 patients reviewed as the validation data. As we enriched the notes for gout flares by focusing on the 12-month pre-ULT initiation period and on-treatment follow-up and also the months with gout claims ICD9/10 codes, we extracted 2,074 notes from 179 patients (the remaining 121 patients did not have notes in the chosen periods) in the training data and 1,408 notes from 118 patients (the remaining 82 patients did not have notes in the chosen periods) in the validation data. These notes mapped to 496 months (161 months with gout flares) in the training data and 343 months (119 months with gout flares) in the validation data.

Predictive model training

Model selection via LASSO with each set of variables (claims-only, NLP concepts-only, and both combined) resulted in the selected variables in Table 2 (coefficients in Supplementary Table 2). Among the 20 selected variables in the claims-only model were time-varying medications (NSAIDs, colchicine, oral glucocorticoids, and opioids) that may be used for the treatment of gout flares. Interestingly, cardiovascular-related variables, such as hyperlipidemia and electrocardiogram orders, were also selected. The NLP concept-only model selected 15 CUIs, corresponding to expected NLP concepts, such as "gout" (C0018099), "flare" (C1517205), and "arthralgia" (C0003862). NLP concepts for potential inciting events, such as "stroke" (C0038454), "heart failure" (C0018802), and "alcohol" (C0001962), also appeared among them. The combined claims/NLP-concept model selected 32 claims variables and 13 NLP concepts. All NLP concepts in the NLP concept-only model were selected except for "diabetes mellitus" and "stroke", which may have become duplicative of the corresponding claims variables.

Predictive model evaluation

The performance assessment within the 60% training data resulted in an AUC of 0.761 for the claims-only model, 0.773 for the NLP concept-only model, and 0.853 for the combined claims/NLP-concept model. We then applied the estimated coefficients to the 40% validation data. The corresponding validation ROCs are shown in Figure 1. The claims-only model (AUC 0.693) and the NLP concept-only model (AUC 0.688) again showed similar discrimination performance, although both were attenuated. The combined claims/NLP-concept model showed a small improvement in the AUC of 0.731, although the confidence interval included the AUCs of the first two models.

Figure 1.

Figure 1.

Validation area under the curve (AUC) and 95% confidence interval (CI) for each model.

We then evaluated the models at probability thresholds that gave target specificities of 60, 70, 80, 85, 90, 95, and 97% (Table 3; Supplementary Spreadsheet for the full results). The best PPV and NPV for the claims-only model were achieved at the specificity threshold of 90%, which gave a PPV of 64%, an NPV of 70%, and a sensitivity of 30%. The NLP-only model achieved its best PPV and NPV at the specificity threshold of 97%, which gave a PPV of 80%, an NPV of 69%, and a sensitivity of 20%. The best PPV and NPV for the combined claims/NLP-concept model were achieved at the specificity threshold of 95%, with a PPV of 76%, an NPV of 71%, and a sensitivity of 27%. The sensitivity of the combined claims/NLP-concept model was low (27%), but it was better than the corresponding sensitivity of the claims-only model (11% at the specificity threshold of 95%).

Table 3.

Performance of the models at different specificity thresholds (60, 70, 80, 85, 90, 95, 97%).

Prob. Cutoff TP TN FP FN Sens Spec PPV NPV
Claims-Only Model
0.55 9 212 6 110 0.08 [0.04, 0.14] 0.97 [0.94, 0.99] 0.60 [0.32, 0.84] 0.66 [0.60, 0.71]
0.50 13 208 10 106 0.11 [0.06, 0.18] 0.95 [0.92, 0.98] 0.57 [0.34, 0.77] 0.66 [0.61, 0.71]
0.42 36 198 20 83 0.30 [0.22, 0.39] 0.91 [0.86, 0.94] 0.64 [0.50, 0.77] 0.70 [0.65, 0.76]
0.39 41 186 32 78 0.34 [0.26, 0.44] 0.85 [0.80, 0.90] 0.56 [0.44, 0.68] 0.70 [0.65, 0.76]
0.36 52 175 43 67 0.44 [0.35, 0.53] 0.80 [0.74, 0.85] 0.55 [0.44, 0.65] 0.72 [0.66, 0.78]
0.32 69 155 63 50 0.58 [0.49, 0.67] 0.71 [0.65, 0.77] 0.52 [0.43, 0.61] 0.76 [0.69, 0.81]
0.31 78 132 86 41 0.66 [0.56, 0.74] 0.61 [0.54, 0.67] 0.48 [0.40, 0.55] 0.76 [0.69, 0.82]
NLP-Only Model
0.57 24 212 6 95 0.20 [0.13, 0.29] 0.97 [0.94, 0.99] 0.80 [0.61, 0.92] 0.69 [0.64, 0.74]
0.54 33 208 10 86 0.28 [0.20, 0.37] 0.95 [0.92, 0.98] 0.77 [0.61, 0.88] 0.71 [0.65, 0.76]
0.45 38 197 21 81 0.32 [0.24, 0.41] 0.90 [0.86, 0.94] 0.64 [0.51, 0.76] 0.71 [0.65, 0.76]
0.43 39 186 32 80 0.33 [0.24, 0.42] 0.85 [0.80, 0.90] 0.55 [0.43, 0.67] 0.70 [0.64, 0.75]
0.40 47 176 42 72 0.39 [0.31, 0.49] 0.81 [0.75, 0.86] 0.53 [0.42, 0.63] 0.71 [0.65, 0.77]
0.37 61 153 65 58 0.51 [0.42, 0.61] 0.70 [0.64, 0.76] 0.48 [0.39, 0.57] 0.73 [0.66, 0.78]
0.32 73 132 86 46 0.61 [0.52, 0.70] 0.61 [0.54, 0.67] 0.46 [0.38, 0.54] 0.74 [0.67, 0.80]
Combined Claims/NLP Concept Model
0.68 19 212 6 100 0.16 [0.10, 0.24] 0.97 [0.94, 0.99] 0.76 [0.55, 0.91] 0.68 [0.62, 0.73]
0.59 32 208 10 87 0.27 [0.19, 0.36] 0.95 [0.92, 0.98] 0.76 [0.61, 0.88] 0.71 [0.65, 0.76]
0.50 41 198 20 78 0.34 [0.26, 0.44] 0.91 [0.86, 0.94] 0.67 [0.54, 0.79] 0.72 [0.66, 0.77]
0.45 49 186 32 70 0.41 [0.32, 0.51] 0.85 [0.80, 0.90] 0.60 [0.49, 0.71] 0.73 [0.67, 0.78]
0.42 58 175 43 61 0.49 [0.39, 0.58] 0.80 [0.74, 0.85] 0.57 [0.47, 0.67] 0.74 [0.68, 0.80]
0.35 66 153 65 53 0.55 [0.46, 0.65] 0.70 [0.64, 0.76] 0.50 [0.42, 0.59] 0.74 [0.68, 0.80]
0.27 89 131 87 30 0.75 [0.66, 0.82] 0.60 [0.53, 0.67] 0.51 [0.43, 0.58] 0.81 [0.74, 0.87]
Rule-Based Algorithms
Gout-Related Medications 88 113 111 31 0.74 [0.65, 0.82] 0.50 [0.44, 0.57] 0.44 [0.37, 0.51] 0.78 [0.71, 0.85]
Colchicine 51 155 69 68 0.43 [0.34, 0.52] 0.69 [0.63, 0.75] 0.42 [0.34, 0.52] 0.70 [0.63, 0.75]
NSAID/Cox-2 selective 17 197 27 102 0.14 [0.09, 0.22] 0.88 [0.83, 0.92] 0.39 [0.24, 0.55] 0.66 [0.60, 0.71]
Oral Glucocorticoids 45 191 33 74 0.38 [0.29, 0.47] 0.85 [0.80, 0.90] 0.58 [0.46, 0.69] 0.72 [0.66, 0.77]
Gout-Related Procedures 18 207 17 101 0.15 [0.09, 0.23] 0.92 [0.88, 0.96] 0.51 [0.34, 0.69] 0.67 [0.62, 0.72]
No Gout-Related Medications 31 115 109 88 0.26 [0.18, 0.35] 0.51 [0.45, 0.58] 0.22 [0.16, 0.30] 0.57 [0.50, 0.64]

Abbreviations: Prob. Cutoff: probability cutoff for the predicted probability of gout flares; TP: true positive count; TN: true negative count; FP: false positive count; FN: false negative count; Sens: sensitivity; Spec: specificity; PPV: positive predictive value; NPV: negative predictive value.

Point estimate [95% Confidence Interval] for sensitivity, specificity, PPV, and NPV.

The re-evaluation of the previously proposed rule-based algorithms to detect gout flares gave PPVs ranging from 39% to 58% and NPVs ranging from 67% to 78%. The best trade-off was seen with the additional requirement of oral glucocorticoid use, which gave the highest PPV (58%) with a reasonable NPV (72%). This best PPV with the oral glucocorticoid algorithm was lower than PPVs achieved by the models (64% for the claims-only model, 80% for the NLP-only model, and 76% for the combined claims/NLP-concept model).

DISCUSSION

Studying the comparative effectiveness of gout treatment in large-scale real-world data is hampered by the difficulty of ascertaining gout flares. Studying gout flares is particularly challenging as it requires the ability to detect a change in state over time, e.g., among patients with gout, identify periods when they are experiencing a gout flare vs when they are in the intercritical period. In this study, we observed that including information regarding gout flares from narrative notes can improve the ability to classify gout flares with higher accuracy compared to claims data alone. In contrast to general phenotyping approaches which use cross-sectional data to determine a prevalent phenotype, in this study, each subject’s timeline was segmented by months to determine within each period, their probability of having a gout flare. This framework also enabled application of existing methods for comparative effectiveness studies.

Phenotyping using algorithms is increasingly used to improve the throughput of real-world data studies. For example, there are rheumatology examples incorporating NLP with diagnosis code data in rheumatoid arthritis, (12) ankylosing spondylitis, (13) and history of pseudogout (14). These NLP models increased the classification performance over claims-only (typically rule-based) algorithms. Identifying gout flare among a population of patients with gout is more challenging as it requires identifying the timing of the flare. As well, identifying gout flares requires finding data that can distinguish gout flare from prevalent gout. A gout flare is described as “a clinically evident episode of acute inflammation induced by MSU crystals.” The period in between gout flares is called an intercritical period. The prevalent diagnosis of gout is still present during an intercritical period, but not gout flares. As a result, our chart review for gout flares involved labeling at the note level rather than at the patient level and was much more laborious than for prevalent diagnoses.

We have previously developed claims rule-based algorithms for gout flares. (4) The current study is distinct in several ways. We conducted a more thorough chart review, enabling the assessment of sensitivity, specificity, ROC, PPV, and NPV. Our previous chart review was selective and only allowed for the assessment of PPV for the rule-in algorithms and NPV for the rule-out algorithms. Also, our current approach utilized penalized regression modeling to probabilistically assess gout flares, rather than a rule-based approach reliant on medication use, diagnosis, and procedure codes alone, even for the claims-only model. The claims-only model (PPV 64.3%) performed better than the rule-based algorithms (PPVs 39–58% in the reassessment in the current cohort). The addition of NLP concepts to the model development process allowed a further improvement in the model performance in the validation dataset, with a PPV of 76% and an NPV of 71% at a specificity of around 95%.

Compared to the previous rule-based attempt, we used a data-driven approach to the variable selection process. The NLP concept variables relevant for gout flares were initially extracted from online medical articles on gout management. We then curated these variables based on our pilot EHR note data. Both NLP concept variables and claims variables were subject to further selection by LASSO. Nonetheless, the selection process resulted in variables that have some similarity to the previous rule-based algorithms, which were derived from clinical reasoning. For example, claims variables for both NSAIDs and oral glucocorticoids were included in the final combined claims/NLP-concept model. The use of NLP concepts, however, allowed for capturing more subtle concepts that are not directly available in claims, such as pain, arthralgia, and flare (the generic flare concept). Our data-driven variable selection resulted in a much larger set of variables, which seem to represent concurrent conditions, inciting conditions for gout flares, or resulting conditions. Diabetes and hyperlipidemia are common comorbidities among gout patients and may have been reported together when documenting a gout flare.

Stroke, myocardial infarction, and heart failure were also selected in the final model. These conditions may have been inciting events for gout flares. Unexpectedly, the final model included the diarrhea NLP concept, which may be a consequence of colchicine use; colchicine itself remained as both a claims and an NLP concept variable.

The strengths of our study include the robust pipeline of model development, which included the construction of the relevant NLP concept dictionary, NLP-assisted chart review, data-driven model construction, and gout flare prediction with month-level precision. As a result, the models were richer in the features incorporated than would be possible with clinical reasoning alone, and we were able to achieve a higher PPV than our previous claims-only approach.

Nonetheless, some limitations must be acknowledged. Our investigation was limited to a linked Medicare and local EHR dataset from a single academic medical care network. As a result, the sample size was relatively modest. Also, how the model may perform in a younger cohort with fewer comorbidities or in non-academic practice settings remains to be seen. The need for both claims and EHR data to predict gout flares limits the use of the best-performing combined claims/NLP-concept model to settings where both are available. Our enrichment process, i.e., narrowing down the data points to be included in the model development process, selected follow-up months with ICD codes for gout. This is a reasonable screening step, as gout flares requiring clinical encounters likely generate ICD codes for gout. However, gout flares that were self-managed with stockpiled glucocorticoids and over-the-counter medications could have been excluded from the model development process, making our prediction target "gout flares that resulted in clinical encounters during the episodes." It must be acknowledged that the mean capture proportion was relatively low at 20.9%, meaning many healthcare encounters occurred outside the Mass General Brigham system. This may explain the relatively small gains in performance from adding NLP concepts. Further, a gout flare that started in one 1-month period but resulted in an encounter in the following 1-month period was labeled only for the latter period. In instances where the gout ICD codes and the flare notes fell into two separate 1-month periods, the flare would have been missed. We restricted our eligible cohort to those with non-tophaceous gout ICD codes. Thus, the performance may not generalize to chronic tophaceous gout patients with persistent arthritis rather than episodic flares. Finally, we focused only on NLP features in the notes; metadata from the notes, such as provider specialty, may provide additional predictive value.

In summary, we used NLP techniques to develop and validate data-driven gout flare algorithms for the linked claims-EHR dataset. Both claims-only and NLP concept-only models performed better than our previously developed claims-only rule-based algorithm (4) in their PPV with similar NPV. The combined claims/NLP-concept model had a better accuracy with a sensible data-driven selection of predictor variables. This combined claims/NLP-concept model may improve the feasibility of comparative effectiveness studies of gout treatment strategies in real-world linked claims-EHR data.

Supplementary Material


Table 1.

Baseline characteristics of the study cohort of urate-lowering therapy initiators. Values are percentages or mean (SD) unless noted as median [IQR].

N 4,402
Age, years 76.9 (7.4)
White 89.2%
Female 39.8%
Comorbidities
- Coronary artery disease 29.7%
- Chronic kidney disease 44.3%
- Chronic obstructive pulmonary disease 21.6%
- Diabetes mellitus 44.4%
- Myocardial infarction 8.0%
- Hypertension 93.9%
- Obesity 19.6%
- Stroke 8.84%
Medications used in prior 365 days
- ACEI/ARB 63.3%
- BB 62.3%
- CCB 37.3%
- Diuretics 68.2%
- Non-insulin anti-diabetics 21.2%
- NSAIDs 32.0%
- Anticoagulants 28.1%
- Antiplatelet 11.6%
- Colchicine 43.1%
- Statins 67.7%
- Glucocorticoids 42.4%
Healthcare utilization in prior 365 days
- Any hospitalization 44.3%
- HbA1c ordered 50.6%
- SU ordered 78.4%
- Number of cardiology visits, median [IQR] 1 [0, 2]
- Cumulative number of medications, median [IQR] 13 [9, 18]

Abbreviations: ACEI: angiotensin converting enzyme inhibitor; ARB: angiotensin receptor blocker; BB: beta-blockers; CCB: calcium channel blockers; NSAID: non-steroidal anti-inflammatory drug; HbA1c: hemoglobin A1c; SU: serum urate; IQR: interquartile range.

KEY POINTS:

  • Large scale comparative effectiveness studies of gout treatment are hampered by the insufficiency of claims data for ascertaining gout flares, which are transient and recurrent.

  • We examined the additional value of natural language processing (NLP) concepts extracted from electronic health records (EHR) in identifying gout flares at the month-level precision using data-driven LASSO models.

  • Although the claims-only and NLP concept-only models performed similarly, the combined model provided somewhat improved performance in identifying gout flares.

FUNDING, GRANT/AWARD INFORMATION:

The study was supported by R01 AR073314 (PI: SCK). KY received support from the Brigham and Women’s Hospital Department of Medicine Fellowship Award, NIAMS K23 AR076453, and NIAMS R03 AR081309. TC, DHS, and KPL receive funding from NIAMS P30AR072577.

Kazuki Yoshida has received consulting fees from OM1, Inc. for unrelated work.

Tianrun Cai has nothing to disclose.

Lily G. Bessette has nothing to disclose.

Erin Kim has nothing to disclose.

Su Been Lee has nothing to disclose.

Luke E. Zabotka has nothing to disclose.

Alec Sun has nothing to disclose.

Julianna M. Mastrorilli has nothing to disclose.

Theresa A. Oduol has nothing to disclose.

Jun Liu has nothing to disclose.

Daniel H. Solomon has received research support from Abbvie, Amgen, CorEvitas, Genentech, and Moderna for unrelated work. As well, he receives royalties from UpToDate on unrelated work.

Seoyoung C. Kim has received research grants to the Brigham and Women’s Hospital from Pfizer, Roche, AbbVie, and Bristol-Myers Squibb for unrelated work.

Rishi J. Desai has received research grants to the Brigham and Women’s Hospital from Vertex and Bristol-Myers Squibb for unrelated work.

Footnotes

COMPETING INTERESTS:

Katherine P. Liao has nothing to disclose.

Bibliography

  • 1. Neogi T. Clinical practice. Gout. N Engl J Med 2011;364:443–452.
  • 2. Dalbeth N, Merriman TR, Stamp LK. Gout. Lancet 2016;388:2039–2052.
  • 3. FitzGerald JD, Dalbeth N, Mikuls T, Brignardello-Petersen R, Guyatt G, Abeles AM, et al. 2020 American College of Rheumatology Guideline for the Management of Gout. Arthritis & Rheumatology 2020;72:879–895.
  • 4. MacFarlane LA, Liu C-C, Solomon DH, Kim SC. Validation of claims-based algorithms for gout flares. Pharmacoepidemiol Drug Saf 2016;25:820–826.
  • 5. Murphy SN, Chueh HC. A security architecture for query tools used to access large biomedical databases. Proc AMIA Symp 2002:552–556.
  • 6. Lin KJ, Glynn RJ, Singer DE, Murphy SN, Lii J, Schneeweiss S. Out-of-system Care and Recording of Patient Characteristics Critical for Comparative Effectiveness Research. Epidemiology 2018;29:356–363.
  • 7. Bursill D, Taylor WJ, Terkeltaub R, Abhishek A, So AK, Vargas-Santos AB, et al. Gout, Hyperuricaemia and Crystal-Associated Disease Network (G-CAN) consensus statement regarding labels and definitions of disease states of gout. Ann Rheum Dis 2019;78:1592–1600.
  • 8. Gaffo AL, Schumacher HR, Saag KG, Taylor WJ, Dinnella J, Outman R, et al. Developing a provisional definition of flare in patients with established gout. Arthritis Rheum 2012;64:1508–1517.
  • 9. Gaffo AL, Dalbeth N, Saag KG, Singh JA, Rahn EJ, Mudano AS, et al. Brief Report: Validation of a Definition of Flare in Patients With Established Gout. Arthritis & Rheumatology 2018;70:462–467.
  • 10. Neogi T, Jansen TLTA, Dalbeth N, Fransen J, Schumacher HR, Berendsen D, et al. 2015 Gout Classification Criteria: an American College of Rheumatology/European League Against Rheumatism collaborative initiative. Arthritis Rheumatol 2015;67:2557–2568.
  • 11. Cai T. Chart Review Tool Powered by NLP (CHANL). Available at: https://celehs.github.io/CHANL.html. Accessed April 26, 2022.
  • 12. Liao KP, Cai T, Gainer V, Goryachev S, Zeng-treitler Q, Raychaudhuri S, et al. Electronic medical records for discovery research in rheumatoid arthritis. Arthritis Care Res (Hoboken) 2010;62:1120–1127.
  • 13. Zhao SS, Hong C, Cai T, Xu C, Huang J, Ermann J, et al. Incorporating natural language processing to improve classification of axial spondyloarthritis using electronic health records. Rheumatology 2020;59:1059–1065.
  • 14. Tedeschi SK, Cai T, He Z, Ahuja Y, Hong C, Yates KA, et al. Classifying Pseudogout Using Machine Learning Approaches With Electronic Health Record Data. Arthritis Care & Research 2021;73:442–448.
