Classifying Pseudogout using Machine Learning Approaches with Electronic Health Record Data

Sara K Tedeschi; Tianrun Cai; Zeling He; Yuri Ahuja; Chuan Hong; Katherine A Yates; Kumar Dahal; Chang Xu; Houchen Lyu; Kazuki Yoshida; Daniel H Solomon; Tianxi Cai; Katherine P Liao

doi:10.1002/acr.24132

Author manuscript; available in PMC: 2022 Mar 1.

Published in final edited form as: Arthritis Care Res (Hoboken). 2021 Mar;73(3):442–448. doi: 10.1002/acr.24132


PMCID: PMC7338229 NIHMSID: NIHMS1066373 PMID: 31910317

Abstract

Objective:

Identifying pseudogout in large datasets is difficult due to its episodic nature and lack of billing codes specific to this acute subtype of calcium pyrophosphate (CPP) deposition disease. We evaluated a novel machine learning approach for classifying pseudogout using electronic health record (EHR) data.

Methods:

We created an EHR data mart (1991–2017) of patients with ≥1 relevant billing code or ≥2 natural language processing (NLP) mentions of pseudogout or chondrocalcinosis. We randomly selected 900 patients for gold standard chart review and classified each as: (1) definite pseudogout (synovitis plus synovial fluid CPP crystals); (2) probable pseudogout (synovitis plus chondrocalcinosis); or (3) not pseudogout. We applied a topic modeling approach to identify definite/probable pseudogout. A combined algorithm included topic modeling plus manually reviewed CPP crystal results. We compared algorithm performance and the cohorts identified by: (1) billing codes, (2) presence of CPP crystals, (3) topic modeling, and (4) the combined algorithm.

Results:

Among 900 subjects, 123 (13.7%) had pseudogout by chart review (68 definite, 55 probable). Billing codes had sensitivity 65% and PPV 22% for pseudogout. Presence of CPP crystals had sensitivity 29% and PPV 92%. Without using CPP crystal results, topic modeling had sensitivity 29% and PPV 79%. The combined algorithm yielded sensitivity 42% and PPV 81%. The combined algorithm identified 50% more patients than presence of CPP crystals, which captured only a portion of definite pseudogout cases and missed all probable pseudogout cases.

Conclusion:

For pseudogout, an episodic disease with no specific billing code, combining NLP, machine learning methods, and synovial fluid laboratory results yielded an algorithm with substantially higher PPV than billing codes (81% vs. 22%).

Keywords: algorithm, CPPD, pseudogout, machine learning

INTRODUCTION

Pseudogout, also called acute calcium pyrophosphate (CPP) crystal arthritis, is the acute inflammatory subtype of calcium pyrophosphate deposition disease (CPPD).¹ The incidence of pseudogout has not been well characterized, even among the 8–10 million adults in the United States with CPPD.² Although pseudogout was first recognized in 1962, nearly 60 years later our understanding of its risk factors and long-term outcomes remains limited. A major obstacle to studying pseudogout epidemiology is accurately identifying patients with pseudogout in large datasets, because there are no billing codes specific to this acute subtype of CPPD. We previously reported that a published billing code algorithm for CPPD had a very low positive predictive value (PPV) of 18% for the pseudogout phenotype in an academic medical center electronic health record (EHR) dataset.³

Machine learning approaches incorporating information from narrative EHR notes provide an opportunity to classify patients more accurately with rare or episodic diseases for which billing codes may not exist or may be inaccurate. Rich clinical data documented in narrative notes can be transformed into a structured format using natural language processing (NLP) techniques. Counts of pertinent clinical “concepts” or term mentions (e.g., “pseudogout”) can then be included in machine learning algorithms together with other structured data such as billing codes, laboratory data, and prescriptions.
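As a toy illustration of turning note text into per-patient concept counts, the Python sketch below matches a small hand-written term dictionary and tallies mentions. It is hypothetical: the dictionary, the concept labels, and the `concept_counts` helper are invented for illustration, and unlike dedicated clinical NLP software it performs no negation detection and does not map text to UMLS concept identifiers.

```python
import re
from collections import Counter

# Hypothetical term dictionary: surface phrase -> concept label.
# Real pipelines map text to UMLS concepts; these labels are illustrative only.
TERM_TO_CONCEPT = {
    "pseudogout": "pseudogout",
    "chondrocalcinosis": "chondrocalcinosis",
    "joint swelling": "joint_swelling",
    "colchicine": "colchicine",
}

# One alternation pattern (longest phrases first), matched on word boundaries.
_PATTERN = re.compile(
    r"\b("
    + "|".join(sorted(map(re.escape, TERM_TO_CONCEPT), key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def concept_counts(notes):
    """Count concept mentions across one patient's notes (no negation handling)."""
    counts = Counter()
    for note in notes:
        for match in _PATTERN.finditer(note):
            counts[TERM_TO_CONCEPT[match.group(1).lower()]] += 1
    return counts


notes = [
    "Knee pseudogout flare; started colchicine.",
    "Chondrocalcinosis on x-ray. Joint swelling improved.",
]
print(concept_counts(notes))
```

Each patient's resulting counts can then sit alongside billing codes, laboratory data, and prescriptions as structured features.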

Previous studies have validated several machine learning approaches for phenotyping chronic conditions.⁴ Pseudogout poses unique challenges: it lacks specific billing codes, and because it is episodic, documentation about it in the EHR is sparse. We hypothesized that leveraging the semantic structure of the data (i.e., the interconnectedness of pseudogout-related words in narrative notes) could help optimize an algorithm for identifying pseudogout. Topic modeling is a type of statistical modeling that identifies latent structure, or “topics,” in a dataset.⁵ It has traditionally been used to discover topics in bodies of text, such as “politics” in a set of newspaper articles; here we use it to identify discussion of pseudogout in narrative notes. We integrate topic modeling with our published machine learning approaches for phenotyping to identify a cohort of definite/probable pseudogout cases from EHR data, and we compare this approach with using billing codes or manually reviewed laboratory data.

METHODS

Overview

Our approach (Figure 1) includes (1) applying an initial filter containing billing codes relevant to pseudogout to create a preliminary data mart enriched for pseudogout cases and obtaining gold standard labels by medical record review, (2) curating EHR data and applying NLP techniques to extract features from narrative notes for patients in the data mart, (3) applying a second filter to further enrich the data mart for pseudogout cases, (4) identifying additional gold-standard labels for patients in the final data mart, (5) applying a topic modeling approach to predict the probability of pseudogout, and (6) evaluating algorithm performance and comparing cohorts.

Data source

We developed our algorithm using the Partners HealthCare Research Patient Data Repository (RPDR), 1991–2017. Partners RPDR includes EHR data for 5.5 million patients from two large academic medical centers (Brigham and Women’s Hospital and Massachusetts General Hospital) and their affiliated community hospitals, community health centers, and primary care practices.⁶ Because a higher prevalence of pseudogout in the data mart improves algorithm performance,⁷,⁸ we first formed a preliminary data mart (Figure 1) of patients who had ≥1 relevant billing code (e.g., 712.x for chondrocalcinosis) or who were identified by a simple text search of narrative notes.
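The first filter amounts to a per-patient OR of two conditions. The sketch below shows that logic under stated assumptions: the code prefixes are borrowed from the billing code list in the Table 2 footnote (the exact list used for the initial filter is not given here), the search terms are taken from the abstract, and `passes_initial_filter` and its `min_mentions` parameter are hypothetical names.

```python
import re

# Illustrative code prefixes borrowed from the Table 2 footnote; assumption, not
# the study's exact initial-filter list.
CODE_PREFIXES = ("712.1", "712.2", "712.3", "275.49", "M11.1", "M11.2", "M11.8", "E83.59")
# Assumed search terms (the abstract mentions pseudogout and chondrocalcinosis).
TEXT_TERMS = re.compile(r"\b(pseudogout|chondrocalcinosis)\b", re.IGNORECASE)


def passes_initial_filter(billing_codes, notes, min_mentions=1):
    """True if a patient has >=1 listed billing code or >= min_mentions text hits."""
    if any(code.startswith(CODE_PREFIXES) for code in billing_codes):
        return True
    mentions = sum(len(TEXT_TERMS.findall(note)) for note in notes)
    return mentions >= min_mentions


print(passes_initial_filter(["M11.20"], []))                              # code hit
print(passes_initial_filter(["401.9"], ["History of pseudogout, knee"]))  # text hit
```

Note that `\b` word boundaries keep a plain “gout” note from matching “pseudogout”.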

EHR data extraction

Among 50,062 patients in the preliminary data mart, we obtained all narrative notes (e.g., clinic visit notes, discharge summaries, radiology reports, and pathology reports), selected laboratory data (e.g., synovial fluid crystal analysis, parathyroid hormone, magnesium), and prescriptions for relevant medications (e.g., NSAIDs, oral steroids, colchicine). Whether synovial fluid crystal analysis had been performed by the hospital laboratory was available as structured data (ever/never performed). The results of these analyses, however, were recorded as free text rather than in structured data fields, so among the >10,000 patients with a laboratory synovial fluid crystal analysis, the presence or absence of CPP crystals was determined by manual review of the laboratory data.

Natural language processing (NLP) to extract information from narrative notes

To transform information from narrative notes into structured data, we applied NLP. The National Library of Medicine maintains a database of medical concepts, the Unified Medical Language System (UMLS).⁹ Using an automated method,¹⁰ we first identified 73 UMLS medical concepts relevant to pseudogout from online knowledge sources such as MedLine and Wikipedia (see Supplement 1 for the complete list). We then applied the NILE NLP software¹¹ to 9,756,936 narrative notes for all patients in the preliminary data mart to count each patient’s mentions of these concepts. The 41 concepts that appeared in >5% of the notes containing the concept “pseudogout” were retained for subsequent steps.
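The co-occurrence rule for retaining concepts (present in >5% of notes that mention “pseudogout”) can be sketched as below. This is a minimal illustration that assumes each note has already been reduced to the set of concepts it mentions; `cooccurring_concepts` is a hypothetical helper, and the anchor concept itself is always retained because it appears in every anchor note.

```python
from collections import Counter


def cooccurring_concepts(notes_concepts, anchor="pseudogout", min_fraction=0.05):
    """notes_concepts: list of sets of concept labels, one set per note.

    Returns the concepts present in more than min_fraction of the notes that
    contain the anchor concept.
    """
    anchor_notes = [c for c in notes_concepts if anchor in c]
    if not anchor_notes:
        return set()
    freq = Counter(concept for c in anchor_notes for concept in c)
    n = len(anchor_notes)
    return {concept for concept, k in freq.items() if k / n > min_fraction}


# 20 notes mention pseudogout; colchicine co-occurs in 2 (10%), aspirin in 1 (5%).
notes = [{"pseudogout"} for _ in range(20)]
notes[0] |= {"colchicine", "aspirin"}
notes[1] |= {"colchicine"}
notes.append({"hypertension"})  # never co-occurs with the anchor
print(sorted(cooccurring_concepts(notes)))
```

With a strict “>” comparison, a concept in exactly 5% of anchor notes (aspirin here) is dropped.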

Creation of the final data mart

We randomly selected 600 patients from the preliminary data mart for EHR review to estimate the prevalence of pseudogout. Prevalence was 5.5%, which is suboptimal for algorithm development because low prevalence limits an algorithm’s PPV. To further increase pseudogout prevalence, we used data from this random sample to create a second filter based on billing codes and NLP concepts (Figure 1). The final data mart included the 30,089 patients who passed the second filter; pseudogout prevalence in this data mart was 13.7% (see Results).
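Why low prevalence limits PPV follows from Bayes’ rule: PPV = sensitivity × prevalence / [sensitivity × prevalence + (1 − specificity) × (1 − prevalence)]. The sketch below plugs in the sensitivity and specificity reported for ≥1 billing code in Table 2. It assumes, for illustration only, that those test characteristics stay fixed when prevalence changes, which need not hold in practice.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Sensitivity 0.65 and specificity 0.63 (>=1 billing code, Table 2) at the
# preliminary (5.5%) and final (13.7%) data mart prevalences.
for prev in (0.055, 0.137):
    print(f"prevalence {prev:.1%}: PPV {ppv(0.65, 0.63, prev):.2f}")
```

At 13.7% prevalence this reproduces the reported PPV of about 0.22; at 5.5% the same rule would have a PPV of roughly 0.09.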

Pseudogout gold standard labels

We then randomly selected 900 patients from the final data mart for gold standard chart review by one of two reviewers (SKT, KAY), who labeled each patient as (1) definite pseudogout, (2) probable pseudogout, or (3) not pseudogout (see Table 1 for definitions). Pseudogout definitions were based on Ryan and McCarty’s proposed diagnostic criteria for CPPD and the 2011 EULAR recommendations for CPPD terminology and diagnosis;¹,¹² we required synovitis for both definite and probable pseudogout. Because selection was random, some of the 900 patients overlapped with the initial 600; only patients passing both the first and second filters were eligible for the 900. All cases of definite or probable pseudogout were confirmed by a board-certified rheumatologist (SKT).

Table 1.

Pseudogout definitions for gold standard medical record review

Category	Definition
Definite pseudogout	Synovitis (joint pain, swelling, tenderness, +/− warmth) and synovial fluid crystal analysis positive for calcium pyrophosphate crystals, as documented in laboratory results and/or narrative notes
Probable pseudogout	Synovitis (joint pain, swelling, tenderness, +/− warmth) and either (1) acute onset in the wrist, knee, or ankle with chondrocalcinosis in the affected joint, or (2) a rheumatologist or orthopedist considered pseudogout the most likely diagnosis


Definitions were based on Ryan and McCarty’s proposed diagnostic criteria for CPPD and the 2011 EULAR recommendations for CPPD terminology and diagnosis

Topic modeling approach

To identify definite/probable pseudogout vs. not, we applied a novel approach that employs topic modeling followed by penalized regression.⁴ We herein refer to the topic modeling method followed by regression as the “topic modeling approach.”

For common conditions such as diabetes, including the primary ICD billing code for the condition (e.g. 250.00 for diabetes mellitus without mention of complications) and primary NLP concept alone (e.g. “diabetes”) in an algorithm can achieve relatively high PPVs.¹³ However, for episodic or uncommon conditions that may be discussed at only a handful of visits, such as pseudogout, additional features related to the condition may be useful. Topic modeling provides a method for identifying discussion of pseudogout in the EHR by combining information from a wide variety of features, including symptoms (e.g. joint swelling), lab tests (e.g. synovial fluid crystal analysis), and medications. We employed a novel topic modeling method, sureLDA, as it has recently been shown to work well for phenotyping a host of both acute and chronic diseases from EHR data (Ahuja Y, Zhou D, He Z, et al. sureLDA: A novel multi-disease automated phenotyping method for the electronic health record. In revision for JAMIA). This method predicts a pseudogout propensity score (sureLDA score) for each of the 30,089 patients. See Supplement 2 for further details on our topic modeling approach.
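The sureLDA method itself is described in Supplement 2 and is not reproduced here. To show what the underlying topic model does, the sketch below fits plain latent Dirichlet allocation (LDA) by collapsed Gibbs sampling, treating each patient as a “document” whose “words” are EHR feature tokens (NLP concepts, codes, medications). This is a swapped-in, simplified technique: plain LDA leaves topics unlabeled, whereas sureLDA additionally uses surrogate features to tie a topic to a specific disease. The token vocabulary and hyperparameters are arbitrary assumptions.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Plain LDA fitted by collapsed Gibbs sampling.

    docs: list of documents, each a list of integer token ids in [0, vocab_size).
    Returns theta: per-document topic proportions (each row sums to 1).
    """
    rng = random.Random(seed)
    topics = range(n_topics)
    ndk = [[0] * n_topics for _ in docs]               # tokens in doc d assigned to topic k
    nkw = [[0] * vocab_size for _ in topics]           # times token w assigned to topic k
    nk = [0] * n_topics                                # total tokens assigned to topic k
    z = []                                             # current topic of every token
    for d, doc in enumerate(docs):                     # random initial assignments
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                            # remove the token's current assignment
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # full conditional p(z = t | all other assignments), up to a constant
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
            for d, doc in enumerate(docs)]


vocab = ["pseudogout", "colchicine", "synovial_fluid_analysis", "hypertension", "lisinopril"]
patients = [
    ["pseudogout", "pseudogout", "colchicine", "synovial_fluid_analysis"],
    ["hypertension", "lisinopril", "lisinopril"],
    ["pseudogout", "synovial_fluid_analysis", "hypertension"],
]
docs = [[vocab.index(tok) for tok in p] for p in patients]
theta = lda_gibbs(docs, n_topics=2, vocab_size=len(vocab))
print(theta)
```

In this plain version, deciding which topic (if any) corresponds to pseudogout would require inspecting the fitted topics after the fact; that labeling step is what a surrogate-guided method such as sureLDA is designed to avoid.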

To optimize pseudogout prediction combining the unsupervised sureLDA method and gold standard labels, we subsequently developed a supervised regression model including sureLDA score, counts of the NLP concept “pseudogout”, and whether synovial fluid crystal analysis had been performed by the hospital laboratory (ever/never). We used the coefficients from the model to obtain the predicted probability of definite/probable pseudogout (range 0–1) for all 30,089 patients in the final data mart. We defined the probability threshold for classifying a patient as definite/probable pseudogout by setting the specificity at 98%. We used 10-fold cross validation to correct for overfitting.
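A minimal sketch of this second stage follows: an L2 (ridge)-penalized logistic regression on three features, followed by a cutoff chosen so that specificity among gold-standard non-cases is at least 98%. The penalty type, the log transform of the “pseudogout” mention count, the learning settings, and all function names are assumptions for illustration; the source specifies only penalized regression on these features with a 98% specificity threshold. Cross-validation is omitted for brevity.

```python
import math


def _sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_l2(X, y, lam=0.1, lr=0.1, n_iter=2000):
    """Ridge-penalized logistic regression fitted by batch gradient descent.

    X: list of feature rows; y: list of 0/1 labels. The intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        # mean log-loss gradient plus the L2 penalty term lam * w
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, X):
    return [_sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def threshold_at_specificity(scores, labels, target_spec=0.98):
    """Pick a cutoff from non-case scores so that calling score > cutoff positive
    leaves at most (1 - target_spec) of non-cases above it."""
    neg = sorted(s for s, yl in zip(scores, labels) if yl == 0)
    # non-cases allowed above the cutoff; the epsilon guards float rounding
    max_fp = math.floor((1 - target_spec) * len(neg) + 1e-9)
    return neg[len(neg) - max_fp - 1]


# Hypothetical rows: [sureLDA score, log(1 + "pseudogout" mentions), crystal analysis ever]
X = [[0.90, math.log1p(6), 1], [0.80, math.log1p(3), 1], [0.85, math.log1p(5), 1],
     [0.10, math.log1p(0), 0], [0.20, math.log1p(0), 0], [0.15, math.log1p(1), 0]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic_l2(X, y)
probs = predict_proba(w, b, X)
print([round(pr, 2) for pr in probs])
```

In a real analysis the cutoff would be applied to cross-validated predicted probabilities rather than to in-sample fits.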

The topic modeling approach only included EHR features that were available as structured data or via NLP. Thus, information regarding the presence of synovial fluid CPP crystals—which required manual review of laboratory data—was not included in the topic modeling approach. By contrast, whether synovial fluid crystal analysis had ever been performed by the laboratory (regardless of result) was available as structured data and could be included in this approach.

Performance assessment

For comparison, we computed the accuracy of five alternative phenotyping methods for pseudogout: (1) ≥1 relevant ICD-9/10 billing code; (2) ≥3 relevant ICD-9/10 billing codes; (3) presence of synovial fluid CPP crystals; (4) the topic modeling approach (described above); and (5) a combined algorithm (topic modeling approach and/or presence of synovial fluid CPP crystals). We calculated sensitivity, specificity, PPV, and area under the receiver operating characteristic curve (AUC) for each of the five algorithms based on the 900 gold-standard labels. Ten-fold cross-validation was used to correct performance metrics for overfitting bias for algorithms (4) and (5). The F-score, a metric that jointly reflects sensitivity and PPV, was calculated as the harmonic mean of the two. We applied each of the five algorithms to the final data mart (n=30,089) to identify the resulting cohorts and examined similarities and differences across them.
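The metrics above can be computed from gold-standard labels as in the sketch below, which uses hypothetical helper names. AUC is computed here as the probability that a randomly chosen case outscores a randomly chosen non-case (ties count one half), one standard equivalent definition.

```python
def _ratio(num, den):
    return num / den if den else float("nan")


def binary_metrics(predicted, actual):
    """Sensitivity, specificity, PPV and F-score of 0/1 predictions vs 0/1 labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)
    sens = _ratio(tp, tp + fn)
    spec = _ratio(tn, tn + fp)
    ppv = _ratio(tp, tp + fp)
    f_score = _ratio(2 * sens * ppv, sens + ppv)  # harmonic mean of sensitivity and PPV
    return {"sensitivity": sens, "specificity": spec, "ppv": ppv, "f_score": f_score}


def auc(scores, actual):
    """P(random case scores above random non-case), counting ties as 1/2."""
    pos = [s for s, a in zip(scores, actual) if a == 1]
    neg = [s for s, a in zip(scores, actual) if a == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


predicted = [1, 1, 0, 0, 1, 0, 0, 0]
actual = [1, 0, 1, 0, 1, 0, 0, 0]
print(binary_metrics(predicted, actual), auc(predicted, actual))
```

For a rule with only 0/1 output, this AUC reduces algebraically to (sensitivity + specificity)/2, which is consistent with the AUCs reported for the billing code and CPP crystal rules (e.g., (0.65 + 0.63)/2 ≈ 0.64). An algorithm that outputs a continuous predicted probability can rank patients and so can reach a higher AUC.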

RESULTS

Among the 900 randomly selected subjects from the final data mart of 30,089 patients passing both filters (Figure 1), 123 (13.7%) had pseudogout by chart review (68 definite, 55 probable). The presence of ≥1 billing code for chondrocalcinosis and/or other disorders of calcium metabolism, previously used to identify pseudogout patients, had a sensitivity of 65%, specificity of 63%, PPV of 22%, and F-score of 32% for definite/probable pseudogout (Table 2). Requiring ≥3 billing codes lowered sensitivity to 46%, raised specificity to 79%, and only marginally improved PPV, to 26%. Presence of CPP crystals by manual review of laboratory results had a sensitivity of 29%, PPV of 92%, and F-score of 44%; as expected, its specificity was essentially 100% (1.00 after rounding) because CPP crystals are part of the definition of definite pseudogout, although the PPV below 100% indicates that a few crystal-positive patients did not meet the full case definition. Without using CPP crystal results, the topic modeling approach had a sensitivity of 29%, specificity of 98%, PPV of 79%, and F-score of 42%. The combined algorithm yielded a higher sensitivity of 42% with a similar specificity of 98%, a PPV of 81%, and an F-score of 55%.

Table 2.

Performance of algorithms to identify definite or probable pseudogout in an electronic health record (EHR) dataset

	Performance among gold-standard labels (N=900)					Applied to EHR dataset (N=30,089)
Algorithm	Sensitivity	Specificity	PPV	AUC	F-score	Cases identified
≥1 billing code^a	0.65	0.63	0.22	0.64	0.32	12,035
≥3 billing codes^a	0.46	0.79	0.26	0.63	0.32	7,213
Presence of CPP crystals^b	0.29	1.00	0.92	0.64	0.44	1,630
Topic modeling approach^c	0.29	0.98	0.79	0.86	0.42	1,870
Combined algorithm: topic modeling approach and/or presence of CPP crystals	0.42	0.98	0.81	0.70	0.55	2,490


^a ICD-9 or ICD-10 billing codes for chondrocalcinosis or calcium metabolism disorder: ICD-9 712.1*, 712.2*, 712.3*, 275.49; ICD-10 M11.1*, M11.2*, M11.8*, E83.59. Adapted from Bartels CM, et al. J Clin Rheumatol 2015;21(4):189–92, which included only ICD-9 codes, by also including ICD-10 codes.

^b Presence of synovial fluid CPP crystals was ascertained via manual review of laboratory results recorded as free text in the EHR.

^c The topic modeling approach includes: a score for propensity of pseudogout from a topic modeling method (sureLDA) including all relevant features, counts of the NLP concept “pseudogout”, and whether synovial fluid crystal analysis was performed (regardless of result).

When we applied the five algorithms to the final data mart of 30,089 patients, ≥1 billing code alone yielded the largest cohort (n=12,035). However, its PPV of 22% implies that most patients in this cohort would be misclassified as having pseudogout. Conversely, classifying all subjects with CPP crystals in synovial fluid as pseudogout yielded a high PPV of 92% but missed 71% of chart review-confirmed cases of definite or probable pseudogout.

Table 3 illustrates important differences between the cohort classified by ≥1 billing code and the cohorts classified by presence of CPP crystals or by the combined algorithm. The billing code cohort was slightly younger and had a higher percentage of women and a lower percentage of African American patients. Mentions of “pseudogout” in narrative notes, a history of synovial fluid crystal analysis, and prescriptions for colchicine, NSAIDs, and oral glucocorticoids were much less common in the billing code cohort than in the other two cohorts.

Table 3.

Comparison of cohorts identified by algorithms for definite or probable pseudogout applied to the final data mart of 30,089 patients

Characteristic	≥1 billing code	Presence of CPP crystals	Combined algorithm: topic modeling approach and/or presence of CPP crystals
Number of patients (n)	12,035	1,630	2,490
Age at last medical visit, years	72.8 (15.6)	76.4 (13.0)	76.3 (12.8)
Female	55.6	50.6	50.8
Race
White	84.7	79.5	81.0
African American	4.8	8.8	7.8
Other	10.5	11.7	11.2
≥1 pertinent billing code	100.0	72.6	74.1
≥1 NLP concept “pseudogout”	34.3	86.1	90.8
Synovial fluid crystal analysis performed, regardless of result	18.9	100.0	86.0
Synovial fluid CPP crystals present	9.8	100.0	65.5
Prescription medications in EHR
Colchicine	17.3	35.1	43.4
NSAID	59.1	69.1	72.7
Oral glucocorticoids	44.4	62.9	67.4


Presented as mean (SD) or percentage unless otherwise indicated

The combined algorithm identified about 50% more patients than presence of CPP crystals (2,490 vs. 1,630). Presence of CPP crystals captured only a portion of definite pseudogout cases and missed all cases of probable pseudogout, which by definition did not have synovial fluid CPP crystals in laboratory results. Even though the combined algorithm cohort contained both definite and probable pseudogout patients while the CPP crystal cohort included only definite pseudogout, the two cohorts were remarkably similar. Mean age, sex, race, presence of pertinent billing codes, mentions of “pseudogout” in narrative notes, and prescriptions for NSAIDs and oral glucocorticoids were similar between them. Synovial fluid crystal analysis was very common (86%) in the combined algorithm cohort and was, by definition, universal in the CPP crystal cohort. Colchicine prescriptions were more common in the combined algorithm cohort than in the CPP crystal cohort (43.4% vs. 35.1%).

DISCUSSION

For pseudogout, an episodic disease without a specific billing code, adding information derived from a topic modeling approach to an existing approach for phenotyping using NLP and machine learning yielded an algorithm with substantially higher PPV than billing codes alone (81% vs. 22%). The combined algorithm, incorporating a topic modeling approach and presence of synovial fluid CPP crystals, yielded a large cohort of patients with a high likelihood of definite or probable pseudogout that can be used for epidemiologic studies of this crystalline arthritis. The similarities between the cohorts identified by presence of CPP crystals and by the combined algorithm are reassuring and suggest that the cohort identified by the combined algorithm accurately represents pseudogout.

We compared five algorithms for classifying pseudogout and identified the tradeoffs of each approach. For an investigator who wishes to identify pseudogout with near-100% specificity and is willing to miss the majority of cases, reviewing laboratory results for the presence of synovial fluid CPP crystals may be sufficient. In our EHR, the major downside of this approach was that time-consuming manual review was required, because the laboratory recorded these results as free text with a variety of labels (e.g., “1+ intracellular CPP”, “CPP crystals present”, “2+ calcium pyrrophosphate [sic] crystals”). Additionally, synovial fluid crystal analyses performed by rheumatologists in clinic were documented in narrative notes but not in laboratory results; this explains why the algorithm defined by presence of CPP crystals in laboratory data missed some cases of definite pseudogout. The topic modeling approach, which used data extracted via NLP and other structured EHR data, achieved a high PPV (79%) with modest sensitivity (29%) and did not require manual review of synovial fluid laboratory data.
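The variety of free-text labels quoted above is what made manual review necessary. As a hypothetical aid (not part of the study’s method, which relied on manual review), a simple clause-level pattern match could prioritize crystal reports for a reviewer. The patterns below are assumptions built only from the three quoted examples plus a few common negation words; a real laboratory’s vocabulary would need to be catalogued first, and every flag would still need human confirmation.

```python
import re

# Hypothetical patterns based on the quoted labels; "pyr+ophosphate" also
# matches the misspelling "pyrrophosphate".
CPP_TERM = re.compile(r"\b(cppd?|calcium\s+pyr+ophosphate)\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(no|none|negative|absent|not\s+seen)\b", re.IGNORECASE)


def flag_possible_cpp(result_text):
    """Flag a free-text crystal result if any clause mentions CPP without a
    negation word in that same clause. Crude by design: it can only prioritize
    results for manual review, not replace it."""
    for clause in re.split(r"[;,.\n]", result_text):
        if CPP_TERM.search(clause) and not NEGATION.search(clause):
            return True
    return False


for text in ["1+ intracellular CPP", "No MSU crystals; 1+ CPP crystals",
             "No CPP or MSU crystals seen"]:
    print(text, "->", flag_possible_cpp(text))
```

Checking negation per clause, rather than over the whole report, avoids discarding reports such as “No MSU crystals; 1+ CPP crystals”.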

Because the goal of this project was to construct a large pseudogout cohort for future epidemiologic studies, we will use the combined algorithm: its PPV is high (81%), and its higher sensitivity (42%) yields a larger cohort than either the presence of CPP crystals or the topic modeling approach alone. We manually reviewed 100 randomly selected cases among the 2,490 pseudogout cases identified by the combined algorithm and found that 85 fulfilled the study definition of definite or probable pseudogout. This corresponds to a PPV of 85% in this small random sample, consistent with the PPV in our derivation set of gold-standard labels. Compared with a smaller cohort restricted to subjects who had both a synovial fluid crystal analysis performed at a Partners HealthCare laboratory and a positive result for CPP crystals, the larger cohort identified by the combined algorithm will provide greater power for epidemiologic association studies and better generalizability.
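One way to quantify “consistent with” for a 100-case validation sample is a binomial confidence interval around 85/100. The sketch below uses the Wilson score interval; the choice of interval and the `wilson_interval` helper are ours for illustration, and the source does not report a confidence interval.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


low, high = wilson_interval(85, 100)
print(f"PPV 0.85, 95% CI {low:.2f} to {high:.2f}")
```

The resulting interval, roughly 0.77 to 0.91, contains the derivation-set PPV of 0.81.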

The sensitivity of our combined algorithm (42%) is lower than that of machine learning algorithms for other rheumatic diseases such as rheumatoid arthritis (sensitivity 63%)⁸ and systemic lupus erythematosus (SLE) (sensitivity 47% for definite/probable SLE).⁶ Nonetheless, the combined algorithm provides a much higher PPV than ≥1 billing code alone (81% vs. 22%) while achieving a modest sensitivity of 42%.

Several case-control studies of risk factors for pseudogout defined pseudogout by a single diagnosis code (READ code N02.14) recorded by general practitioners in the UK.¹⁴,¹⁵ To our knowledge, the accuracy of a single READ code for pseudogout has not been validated against medical record review, and it might capture a broader definition of CPPD, pseudogout, or a combination of these and other conditions. A published billing code algorithm for CPPD, developed using Veterans Administration data, had a very low PPV for pseudogout (18%) in our EHR.³ In the present study, requiring more billing codes (≥3) increased specificity but did not substantially increase PPV for pseudogout, which motivates identification approaches that incorporate a broader set of information such as NLP. Our method for identifying pseudogout, which combines data from narrative notes (via NLP and machine learning methods) with synovial fluid laboratory data, provides a blueprint for identifying pseudogout cohorts in other EHR systems. Our approach may also prove useful for phenotyping other episodic or rare rheumatic diseases for which documentation may be sparse and/or specific billing codes do not exist.

Our study has several limitations. We used EHR data exclusively from an academic medical center, which may limit the generalizability of the combined algorithm to non-academic settings. Synovial fluid crystal analysis results were not available as structured data, so time-consuming manual review was required, which may not be feasible in other datasets. Notably, the topic modeling approach achieved a PPV close to 80% even without synovial fluid CPP crystal results, albeit with lower sensitivity and thus a smaller cohort. Validation in an external EHR dataset will be required to determine whether algorithm performance is reproducible. Our machine learning algorithm was designed to classify definite or probable pseudogout, rather than definite pseudogout alone, because definite pseudogout requires synovial fluid crystal analysis, which is under-utilized in clinical practice and may produce false-negative results owing to the difficulty of identifying small, weakly birefringent CPP crystals.¹⁶⁻¹⁸ Classification criteria for pseudogout have not yet been developed; we therefore defined definite and probable pseudogout based on Ryan and McCarty’s proposed diagnostic criteria for CPPD and the 2011 EULAR recommendations for CPPD terminology and diagnosis.

Pseudogout epidemiology research hinges on the ability to accurately identify pseudogout patients in large datasets. We provide a method for classifying definite or probable pseudogout using data from an EHR system covering 5.5 million patients, although further testing in an external dataset is needed before widespread application.

Supplementary Material

Supp info1

NIHMS1066373-supplement-Supp_info1.docx (23.2 KB)

Supp info2

NIHMS1066373-supplement-Supp_info2.docx (147.6 KB)

SIGNIFICANCE AND INNOVATIONS.

Limited methods exist to identify pseudogout patients for large epidemiologic studies, particularly due to its episodic nature—and thus intermittent documentation in the electronic health record (EHR)—and lack of specific billing codes.
To address this need, we tested a novel topic modeling-based method that draws on a wide variety of EHR data to predict pseudogout, rather than using a traditional method of manually creating a small list of potentially predictive features to predict pseudogout.
We developed an approach to identify pseudogout patients using EHR data with a PPV of 81% and identified 2,490 patients with definite or probable pseudogout.
The proposed approach allows for the development of a pseudogout cohort and will enable much needed epidemiologic studies of pseudogout, the acute inflammatory subtype of calcium pyrophosphate crystal deposition disease.

Funding:

This work was supported by the National Institutes of Health (grant numbers K23 AR075070, P30 AR072577, L30 AR070514), the Brigham and Women’s Hospital Faculty Career Development Award, and the Brigham and Women’s Hospital Department of Medicine Hearst Young Investigator Award.

REFERENCES

1. Zhang W, Doherty M, Bardin T, et al. European League Against Rheumatism recommendations for calcium pyrophosphate deposition. Part I: terminology and diagnosis. Ann Rheum Dis 2011;70:563–70.
2. Abhishek A, Neogi T, Choi H, Doherty M, Rosenthal AK, Terkeltaub R. Review: Unmet Needs and the Path Forward in Joint Disease Associated With Calcium Pyrophosphate Crystal Deposition. Arthritis Rheumatol 2018;70:1182–91.
3. Tedeschi SK, Solomon DH, Liao KP. Pseudogout among Patients Fulfilling a Billing Code Algorithm for Calcium Pyrophosphate Deposition Disease. Rheumatol Int 2018;38:1083–8.
4. Zhang Y, Cai T, Yu S, et al. Methods for high-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP). Nat Protoc 2019.
5. Liu L, Tang L, Dong W, Yao S, Zhou W. An overview of topic modeling and its current applications in bioinformatics. Springerplus 2016;5:1608.
6. Jorge A, Castro VM, Barnado A, et al. Identifying lupus patients in electronic health records: Development and validation of machine learning algorithms and application of rule-based algorithms. Semin Arthritis Rheum 2019.
7. Liao KP, Cai T, Savova GK, et al. Development of phenotype algorithms using electronic medical records and incorporating natural language processing. BMJ 2015;350:h1885.
8. Liao KP, Cai T, Gainer V, et al. Electronic medical records for discovery research in rheumatoid arthritis. Arthritis Care Res (Hoboken) 2010;62:1120–7.
9. Unified Medical Language System Terminology Services. 2017. (Accessed May 1, 2017, at https://uts.nlm.nih.gov/home.html.)
10. Yu S, Chakrabortty A, Liao KP, et al. Surrogate-assisted feature extraction for high-throughput phenotyping. J Am Med Inform Assoc 2017;24:e143–e9.
11. Yu S, Cai T, Cai T. NILE: Fast natural language processing for electronic health records. 2013.
12. Ryan L, McCarty D. Calcium pyrophosphate crystal deposition disease; pseudogout; articular chondrocalcinosis. In: McCarty D, ed. Arthritis and Allied Conditions. 10th ed. Philadelphia: Lea & Febiger; 1985:1515–46.
13. Liao KP, Sun J, Cai TA, et al. High-throughput multimodal automated phenotyping (MAP) with application to PheWAS. J Am Med Inform Assoc 2019.
14. Rho YH, Zhu Y, Zhang Y, Reginato AM, Choi HK. Risk factors for pseudogout in the general population. Rheumatology (Oxford) 2012;51:2070–4.
15. Roddy E, Muller S, Paskins Z, Hider SL, Blagojevic-Bucknall M, Mallen CD. Incident acute pseudogout and prior bisphosphonate use: Matched case-control study in the UK-Clinical Practice Research Datalink. Medicine (Baltimore) 2017;96:e6177.
16. Bartels CM, Singh JA, Parperis K, Huber K, Rosenthal AK. Validation of administrative codes for calcium pyrophosphate deposition: a Veterans Administration study. J Clin Rheumatol 2015;21:189–92.
17. Berendsen D, Neogi T, Taylor WJ, Dalbeth N, Jansen TL. Crystal identification of synovial fluid aspiration by polarized light microscopy. An online test suggesting that our traditional rheumatologic competence needs renewed attention and training. Clin Rheumatol 2017;36:641–7.
18. Swan A, Amer H, Dieppe P. The value of synovial fluid assays in the diagnosis of joint disease: a literature survey. Ann Rheum Dis 2002;61:493–8.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supp info1

NIHMS1066373-supplement-Supp_info1.docx^{(23.2KB, docx)}

Supp info2

NIHMS1066373-supplement-Supp_info2.docx^{(147.6KB, docx)}
