Abstract
Objectives
Algorithms have been developed to identify rheumatoid arthritis-interstitial lung disease (RA-ILD) in administrative data with positive predictive values (PPVs) between 70% and 80%. In this cross-sectional study, we hypothesized that including ILD-related terms identified within chest computed tomography (CT) reports through text mining would improve the PPV of these algorithms.
Methods
We identified a derivation cohort of possible RA-ILD cases (n=114) using electronic health record data from a large academic medical center and performed medical record review to validate diagnoses (reference standard). ILD-related terms (e.g., ground glass, honeycomb) were identified in chest CT reports by natural language processing. Administrative algorithms including diagnostic and procedural codes as well as provider specialty were applied to the cohort both with and without the requirement for ILD-related terms from CT reports. We subsequently analyzed similar algorithms in an external validation cohort of 536 participants with RA.
Results
The addition of ILD-related terms to RA-ILD administrative algorithms increased the PPV in both the derivation (improvement ranging from 3.6 to 11.7%) and validation cohorts (improvement 6.0 to 21.1%). This increase was greatest for less stringent algorithms. Administrative algorithms including ILD-related terms from CT reports exceeded a PPV of 90% (maximum 94.6% derivation cohort). Increases in PPV were accompanied by a decline in sensitivity (validation cohort −3.9 to −19.5%).
Conclusions
The addition of ILD-related terms identified by text mining from chest CT reports led to improvements in the PPV of RA-ILD algorithms. With high PPVs, use of these algorithms in large data sets could facilitate epidemiologic and comparative effectiveness research in RA-ILD.
Keywords: rheumatoid arthritis, interstitial lung disease, natural language processing, informatics
INTRODUCTION
Rheumatoid arthritis-associated interstitial lung disease (RA-ILD) is a common extra-articular manifestation of rheumatoid arthritis (RA) that leads to increased mortality (1–4). In addition to or separate from ILD, people with RA may have other RA-associated pulmonary manifestations such as rheumatoid nodules or airway disease (5). There is no single diagnostic test for RA-ILD, and the optimal method to establish the diagnosis is a multidisciplinary approach involving rheumatologists, pulmonologists, radiologists, and pathologists (6). The evaluation of individuals suspected of having RA-ILD typically includes high-resolution computed tomography (CT) of the chest and pulmonary function tests (PFTs), with lung biopsy generally reserved for atypical presentations where a histopathologic diagnosis may be required and infection or other non-ILD conditions need to be excluded. There are limited data available to guide the optimal screening, monitoring, and management of RA-ILD; future studies utilizing large healthcare data sets are needed to inform the management of this condition and to improve researchers' ability to study RA-ILD in real-world datasets.
Multiple factors complicate the identification of RA-ILD in large data sets, including the absence of specific diagnostic tests, frequent occurrence of subclinical disease (7, 8), lack of standardized treatments, and difficulty establishing a diagnosis. Previous real-world studies of RA-ILD have used a variety of methods and data sources to identify cases, including algorithms within claims data (1, 9, 10), review of death certificates (11), and national registries (12). Most case-finding approaches have not been validated. Administrative databases include information regarding patients’ encounters with the healthcare system (e.g., diagnostic codes, billing codes, drug dispensing) and are an appealing source for comparative effectiveness, outcomes, and epidemiologic studies. In contrast to registries designed specifically for research, these databases are generated from routine clinical practice and are prone to misclassification. Previously, we systematically evaluated the performance of administrative ILD algorithms (i.e., algorithms constructed from data elements available in administrative databases) incorporating International Classification of Diseases (ICD) and Current Procedural Terminology (CPT) codes for identifying ILD within the Veterans Affairs Rheumatoid Arthritis (VARA) registry, a prospective RA cohort (13). In this study, algorithms requiring ≥2 ICD-9 or ICD-10 codes for ILD in addition to combinations of a pulmonologist diagnosis and diagnostic testing (e.g., PFTs, chest CT, lung biopsy) had positive predictive values (PPV) of approximately 80% (13). Studies evaluating similar algorithms in other non-veteran data sets (e.g., U.S. Medicare claims data) have confirmed these results and extended them to identifying incident RA-ILD (14, 15).
While a PPV of approximately 80% is commonly viewed as a cutoff for useful application, higher specificity may be desired in comparative effectiveness and outcomes studies to reduce misclassification bias. Beyond what is available in administrative data, a more specific finding for RA-ILD is the presence of distinct imaging features on chest CT, which can include ground glass opacities, honeycombing, traction bronchiectasis, and reticular opacities (5). These findings are not available in administrative databases, nor are they available as discrete data elements in electronic health records (EHR). Rather, they exist as free text within clinical reports in the EHR. Text mining is required to extract these descriptors using informatics approaches such as natural language processing (NLP). We hypothesized that modifying RA-ILD algorithms to include the presence of ILD-related terms derived from CT reports identified by NLP would improve the PPV of previously validated algorithms, albeit with an unknown decline in sensitivity.
MATERIALS AND METHODS
Study design and conduct
We performed cross-sectional studies using independent derivation and validation cohorts to evaluate whether the inclusion of ILD-related terms from chest CT reports identified by automated regular expressions, a form of NLP, could improve the PPV of administrative algorithms to identify RA-ILD. Cohorts and algorithms were derived within the University of Nebraska Medical Center Clinical Research Analytics Environment (CRANE) data warehouse (16) and externally validated within the VARA registry (13). The study was approved by the institutional review boards at the University of Nebraska Medical Center and Omaha Veterans Affairs Medical Center.
Derivation cohort
Cohort identification.
We identified a cohort of adults with possible RA-ILD within the CRANE clinical data warehouse, an environment containing EHR and linked patient-level data for individuals receiving care at the University of Nebraska Medical Center beginning in 2012. We selected patients ≥19 years of age with ≥1 diagnostic (outpatient or inpatient) or problem list code for RA (ICD-9: 714.0, 714.1, 714.2, 718.81; ICD-10: M05, M06.0, M06.8, M06.9; SNOMED CT: 69896004) plus either a RA medication or laboratory test, as well as ≥1 diagnostic or problem list code for ILD (ICD-9: 515, 516.3, 516.8, 516.9; ICD-10: J84.1, J84.89, J84.9; SNOMED CT: 233703007). We excluded individuals with a history of sarcoidosis (SNOMED CT 31541009), systemic sclerosis (SNOMED CT 89155008), myositis (SNOMED CT 26889001), systemic lupus erythematosus (SNOMED CT 55464009), hypersensitivity pneumonitis (SNOMED CT 37471005), radiation pneumonitis (SNOMED CT 89004001), pneumoconiosis (SNOMED CT 40122008), and asbestosis (SNOMED CT 22607003) using all available data.
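The screening logic above can be sketched as a per-patient predicate. This is a minimal illustration, not the actual CRANE query: the field names (`dx_codes`, `snomed_codes`, `ra_med_or_lab`, `age`) are hypothetical, and ICD-10 categories are handled with simple prefix matching.

```python
# Illustrative screen of one patient record against the inclusion and
# exclusion criteria above (field names are hypothetical, not from the
# CRANE data model; ICD-10 categories matched by prefix).
RA_ICD = ("714.0", "714.1", "714.2", "718.81",   # ICD-9
          "M05", "M06.0", "M06.8", "M06.9")      # ICD-10 category prefixes
ILD_ICD = ("515", "516.3", "516.8", "516.9",     # ICD-9
           "J84.1", "J84.89", "J84.9")           # ICD-10
EXCLUSION_SNOMED = {"31541009", "89155008", "26889001", "55464009",
                    "37471005", "89004001", "40122008", "22607003"}

def possible_ra_ild(patient: dict) -> bool:
    """True if the record meets the derivation-cohort screening criteria."""
    dx = patient["dx_codes"]
    has_ra = any(c.startswith(p) for c in dx for p in RA_ICD)
    has_ild = any(c.startswith(p) for c in dx for p in ILD_ICD)
    excluded = bool(set(patient["snomed_codes"]) & EXCLUSION_SNOMED)
    return (patient["age"] >= 19 and has_ra and patient["ra_med_or_lab"]
            and has_ild and not excluded)
```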
RA-ILD classification.
Medical records for the identified patients were then selected for detailed, systematic medical record review by a single investigator (BAL), which served as the reference standard. Medical record review included the review of clinical notes (rheumatology and pulmonology specialty notes when available, with review of additional notes when information from these specialists was not present or additional information was needed), chest CT reports, PFTs, lung biopsy results, and RA autoantibody results. Clinical features included in the 2010 ACR/EULAR RA classification criteria (17) were extracted from the medical record, when available. Patients were considered to have a clinical diagnosis of RA when the diagnosis was made by a rheumatologist or, when rheumatology records were not available, review of clinical documentation was consistent with a small joint inflammatory arthritis without better alternative explanation (e.g., systemic lupus erythematosus). Patients were considered to have a clinical diagnosis of ILD (RA-associated) when the diagnosis was made by a pulmonologist or rheumatologist and was not due to a defined secondary etiology other than RA (e.g., hypersensitivity pneumonitis) or when imaging/biopsy features were supportive of the diagnosis. Imaging features compatible with RA-ILD included reticular opacities, honeycombing, ground glass opacities, and radiologist impressions of interstitial lung disease or pulmonary fibrosis. Cases were classified as RA-ILD when both RA and ILD diagnoses were present, not RA-ILD when either RA or ILD diagnoses were not present, and indeterminate for RA-ILD when there were insufficient records to confirm or disprove an RA and/or ILD diagnosis. If there was uncertainty in RA-ILD classification, records were reviewed and discussed with a second reviewer (BRE) to reach a consensus. Indeterminate RA-ILD cases (n=5) were excluded from further analysis.
Identification of ILD terms from CT reports.
All available chest CT reports for patients in the cohort were extracted from the CRANE data warehouse. We identified ILD-related terms and common variations of these terms (‘honeycomb’, ‘honey comb’, ‘interstitial lung disease’, ‘pulmonary fibrosis’, ‘ground glass opacit’, ‘groundglass’, ‘reticular opacit’, ‘reticulation’, ‘rheumatoid lung’) in chest CT reports using regular expressions, an NLP technique for identifying strings of characters within text (18). Terms were excluded if they were located within the ‘Indication’ section of the report. To account for the potential use of negative phrasing in association with these terms in CT reports (e.g., ‘no evidence of ILD’), we subsequently excluded all terms preceded by negative modifiers (‘no’, ‘negative’, ‘no evidence’, ‘not found’, ‘not consistent with’, ‘inconsistent with’) within 40 characters of the ILD-related term. Terms were identified using search strings of regular expressions utilizing the Python coding language.
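A minimal sketch of this term search and 40-character negation window follows. It is not the study's actual program: the function name is illustrative, and report sectioning (e.g., excluding the ‘Indication’ section) is omitted. Partial stems such as ‘opacit’ match ‘opacity’ and ‘opacities’.

```python
import re

# ILD-related terms and common variants from the Methods.
ILD_TERMS = [
    "honeycomb", "honey comb", "interstitial lung disease",
    "pulmonary fibrosis", "ground glass opacit", "groundglass",
    "reticular opacit", "reticulation", "rheumatoid lung",
]
NEGATORS = [
    "no", "negative", "no evidence", "not found",
    "not consistent with", "inconsistent with",
]
TERM_RE = re.compile("|".join(ILD_TERMS), re.IGNORECASE)
NEG_RE = re.compile(r"\b(?:" + "|".join(NEGATORS) + r")\b", re.IGNORECASE)
WINDOW = 40  # characters preceding a term that are checked for negation

def positive_ild_terms(report_text: str) -> list[str]:
    """Return ILD-related term matches not preceded by a negative modifier."""
    hits = []
    for match in TERM_RE.finditer(report_text):
        preceding = report_text[max(0, match.start() - WINDOW):match.start()]
        if not NEG_RE.search(preceding):
            hits.append(match.group(0).lower())
    return hits
```

For example, `positive_ild_terms("No evidence of pulmonary fibrosis.")` returns an empty list, while a report describing honeycombing without negation yields a match.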
Algorithm derivation.
We adapted administrative RA-ILD algorithms incorporating combinations of diagnostic and procedural codes from previous work (13) and applied them to the derivation cohort (Table 1). Specialty diagnostic codes required the entry of an ICD code for ILD by either a rheumatologist or pulmonologist. Chest CT was identified by CPT codes 71250, 71260, and 71270, and lung biopsy was identified by ICD procedure (33.20, 33.28) or CPT (32607) codes. All data used for algorithm construction were obtained from the CRANE data warehouse. Algorithms were subsequently modified to additionally require the presence of at least one ILD-related term identified by automated regular expressions within chest CT reports from the EHR. Patients were considered to have a positive term if they had at least one ILD-related term identified that was not preceded by a negative modifier. Patients without searchable chest CT reports were considered to be negative for these terms and thus not identified by the algorithms requiring ILD-related terms, reflecting how these algorithms may perform in real-world settings where CT reports may be missing.
Table 1.
Components of RA-ILD algorithms.
| Algorithm | Algorithm search criteria |
|---|---|
| Derivation cohort | |
| Administrative RA-ILD algorithms alone | |
| Algorithm 1 | ≥1 ICD for ILD |
| Algorithm 2 | ≥2 ICD for ILD |
| Algorithm 3 | ≥2 ICD for ILD plus specialty code |
| Algorithm 4 | ≥2 ICD for ILD plus specialty code or chest CT/lung biopsy |
| Algorithms plus ILD-related term searching of chest CT reports | |
| Algorithms 1T – 4T | Algorithms 1-4 plus ILD-related term from chest CT |
| Algorithms 1TN – 4TN | Algorithms 1T-4T with negative terms excluded |
| Validation cohort | |
| Administrative RA-ILD algorithms alone | |
| Algorithm A | ≥1 outpatient or discharge diagnosis for ILD |
| Algorithm B | ≥2 outpatient or discharge diagnoses for ILD >30 days apart |
| Algorithm C | Algorithm B plus ≥1 pulmonologist diagnosis |
| Algorithm D | Algorithm C plus diagnostic testing (chest CT or lung biopsy) |
| Algorithms including ILD-related term searching of chest CT reports¹ | |
| Algorithm T | ≥1 ILD-related term without additional requirements |
| Algorithm AT | Algorithm A plus ≥1 ILD-related term |
| Algorithm BT | Algorithm B plus ≥1 ILD-related term |
| Algorithm CT | Algorithm C plus ≥1 ILD-related term |
| Algorithm DT | Algorithm D plus ≥1 ILD-related term |
¹ Negative terms were excluded
Abbreviations: ICD = International Classification of Diseases; RA-ILD = rheumatoid arthritis-associated interstitial lung disease; CT = computed tomography
External validation
We then externally validated these algorithms using a separate previously constructed sample of RA-ILD and RA-no ILD participants within the prospective, multicenter VARA registry, details of which have been published previously (13). Briefly, stratified subsampling through initial ILD screening was performed to enrich the sample with RA-ILD cases. This included participants with ≥1 inpatient or ≥2 outpatient ICD-9/10 codes for ILD (n=293), as well as a random sample of remaining VARA participants (n=243) in order to determine the sensitivity of the selected algorithms. Medical record review was performed in a standardized fashion, classifying ILD status based on clinical diagnoses, imaging, and lung biopsy findings. All participants in the VARA registry fulfill the 1987 ACR classification criteria for RA (19); race was self-reported. We obtained chest CT reports from radiology data within the VA Corporate Data Warehouse through an optimized text search for ‘chest CT’, ‘CT’, or ‘thorax’, in addition to CPT codes (71250, 71260, 71270, G0297, 71275). Report and impression text was extracted from radiology data matched to the exam dates of these procedures, and ILD-related terms were identified within CT reports following the same process as in the derivation cohort. We then modified four administrative algorithms corresponding to those in the derivation cohort that were previously applied to this data set (13) to include the requirement for ILD-related terms from chest CT reports, and excluded terms preceded by negative modifiers (Table 1). In addition to these algorithms, we evaluated the performance of an algorithm requiring solely the presence of an ILD-related term in CT reports.
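The report-retrieval step above (title keyword search plus CPT codes) can be sketched as a simple exam filter. The schema is illustrative only and is not taken from the VA Corporate Data Warehouse; a production search would need to handle free-text variability more carefully than this.

```python
# Hypothetical filter for candidate chest CT exams from a radiology
# exam table (field names illustrative, not an actual CDW schema).
CHEST_CT_CPT = {"71250", "71260", "71270", "G0297", "71275"}
TITLE_KEYWORDS = ("chest ct", "ct", "thorax")

def is_chest_ct(exam: dict) -> bool:
    """True if the exam matches a chest CT CPT code or title keyword."""
    title = exam.get("exam_title", "").lower()
    return (exam.get("cpt_code") in CHEST_CT_CPT
            or any(k in title for k in TITLE_KEYWORDS))
```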
Statistical analysis
In the derivation cohort, algorithm performance was evaluated by calculation of PPV and sensitivity, referent to the least stringent algorithm (Algorithm 1). For external validation, we calculated the sensitivity, specificity, PPV, and negative predictive values (NPV) of the algorithms, with weighting to account for the sub-sampling process from the overall VARA registry (13). Analyses were completed using StataSE 17 and RStudio 2022.07.2.
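The weighted performance calculation can be sketched as follows. The record layout and field names are illustrative (the actual analyses were run in StataSE and RStudio); each participant contributes to one cell of the 2x2 table with their sampling weight.

```python
def weighted_performance(records, weight_key="weight"):
    """Weighted sensitivity, specificity, PPV, and NPV.

    `records` holds one dict per participant with boolean flags
    `algo_positive` (algorithm result) and `has_ild` (reference
    standard) plus a sampling weight; field names are illustrative.
    """
    tp = fp = fn = tn = 0.0
    for rec in records:
        w = rec.get(weight_key, 1.0)
        if rec["algo_positive"] and rec["has_ild"]:
            tp += w      # true positive
        elif rec["algo_positive"]:
            fp += w      # false positive
        elif rec["has_ild"]:
            fn += w      # false negative
        else:
            tn += w      # true negative
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }
```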
Sensitivity analysis
Since indeterminate RA-ILD cases would not be excluded in real-world data, we performed sensitivity analyses within the derivation cohort in which indeterminate cases were instead classified as not RA-ILD.
Because chest CT reports were reviewed when classifying RA-ILD cases, this introduced the possibility that the performance of algorithms including ILD-related terms may be biased by this method of case definition (i.e., both the algorithms and medical record review considering an ILD-related term to be evidence of RA-ILD). To evaluate this possibility, we performed a sensitivity analysis in the derivation cohort testing algorithms only among subjects with chest CT images available in the EHR that could be independently reviewed by an expert reviewer. Chest CT images were independently reviewed for this study by a pulmonologist with expertise in ILD (DH) blinded to the clinical data, and images were classified as compatible with RA-ILD, indeterminate, or not likely RA-ILD. Indeterminate cases were excluded from analysis (n=11).
Our primary approach classifying those without CT reports as not fulfilling algorithms requiring ILD-related terms reflects real-world data missingness but underestimates the sensitivity of the algorithms themselves. Therefore, we performed a sensitivity analysis evaluating the algorithms only among patients with searchable CT reports (i.e., excluded individuals without a chest CT report) to determine the sensitivity directly related to ILD-related term searching.
RESULTS
Derivation cohort patient characteristics
We identified 114 patients with ≥1 diagnostic or problem list code for both RA and ILD in the CRANE data warehouse. Following medical record review, 77 were classified as RA-ILD, 32 as not RA-ILD, and 5 were indeterminate and excluded from subsequent analysis. Characteristics of the derivation cohort, including those with confirmed RA-ILD, are shown in Table 2. Patients with RA-ILD were slightly more likely to be male relative to the overall cohort. Of the 77 confirmed RA-ILD cases, 61 (79%) had sufficient documentation in the EHR to confirm 2010 ACR/EULAR classification criteria for RA. Nearly all patients with verified RA-ILD had a diagnosis by a pulmonologist and/or rheumatologist confirmed by medical record review (94%), with the remaining patients being diagnosed by other specialties with supportive imaging features (Supplemental Figure 1). Among those not meeting RA classification criteria, this was most commonly due to insufficient documentation to assess the number and pattern of joints involved, and all of these patients had a diagnosis of RA by a board-certified rheumatologist. Searchable CT reports were available within the CRANE data warehouse for 72 (63%) patients, of whom 61 had at least one ILD-related term that was not preceded by a negative modifier.
Table 2.
Derivation cohort patient characteristics.
| | Overall cohort | Validated RA-ILD cases |
|---|---|---|
| Number | 114 | 77 |
| Age (years) | 67.4 (10.5) | 68.7 (11.2) |
| Female sex | 76/114 (66.7%) | 46/77 (59.7%) |
| White race | 93/113 (82.3%) | 61/77 (79.2%) |
| Anti-CCP/RF seropositive¹ | 81/98 (82.6%) | 60/70 (85.7%) |
| Current or former smoker | 54/97 (55.7%) | 38/66 (57.6%) |
| Pulmonology clinic visit | 93/114 (81.6%) | 73/77 (94.8%) |
| ILD on most recent chest CT report | 64/88 (72.7%) | 59/62 (95.1%) |
| Lung biopsy | 20/114 (17.5%) | 13/77 (16.9%) |
| ILD diagnosis by pulmonologist | 73/114 (64.0%) | 67/77 (87.0%) |
| Rheumatology clinic visit | 104/114 (91.2%) | 76/77 (98.7%) |
| 2010 ACR/EULAR RA criteria | 79/114 (69.3%) | 61/77 (79.2%) |
| N not classified as RA-ILD | 37 | - |
| ILD present but no RA | 8/37 (21.6%)² | - |
| RA present but no ILD | 22/37 (59.5%) | - |
| No RA and no ILD | 2/37 (5.4%) | - |
| Insufficient records to determine | 5/37 (13.5%) | - |
Values are mean (SD) or n (%) of non-missing data.
¹ As determined by laboratory reference values when available, or provider documentation when labs were not available.
² Four of these patients were diagnosed with a connective tissue disease other than rheumatoid arthritis.
Abbreviations: RA-ILD = rheumatoid arthritis-associated interstitial lung disease; ACR = American College of Rheumatology; EULAR = European League Against Rheumatism; SD = standard deviation; ICD = International Classification of Diseases; Anti-CCP = anti-cyclic-citrullinated peptide antibody; RF = rheumatoid factor
Algorithm performance
Performance of the administrative algorithms in the derivation cohort is summarized in Table 3. The least-stringent administrative algorithm, Algorithm 1 (≥1 ICD for ILD), identified the highest number of RA-ILD cases (n=77) but also demonstrated the lowest PPV (70.6%). Among the administrative-based algorithms, algorithm 3 (≥2 ICD for ILD plus specialty code) showed the highest PPV at 90.5%, while identifying 74.0% of the RA-ILD cases in the cohort.
Table 3.
Performance of RA-ILD algorithms with and without text mining for ILD terms from CT reports in derivation cohort.
| Algorithm* | PPV (95% CI) | RA-ILD cases identified, n (% sensitivity to algorithm 1) | Change in PPV compared to algorithms 1-4 (%) | Change in PPV compared to algorithms 1T-4T (%) |
|---|---|---|---|---|
| Administrative RA-ILD algorithms alone | ||||
| Algorithm 1 | 70.6 (61.1, 79.0) | 77 (100) | - | - |
| Algorithm 2 | 85.5 (75.6, 92.5) | 65 (84.4) | - | - |
| Algorithm 3 | 90.5 (80.4, 96.4) | 57 (74.0) | - | - |
| Algorithm 4 | 87.7 (77.9, 94.2) | 64 (83.1) | - | - |
| Algorithms plus ILD-related term searching of chest CT reports | ||||
| Algorithm 1T | 82.3 (70.5, 90.8) | 51 (66.2) | 11.7 | - |
| Algorithm 2T | 91.3 (79.2, 97.6) | 42 (54.5) | 5.8 | - |
| Algorithm 3T | 94.6 (81.8, 99.3) | 35 (45.5) | 4.1 | - |
| Algorithm 4T | 91.3 (79.2, 97.6) | 42 (54.5) | 3.6 | - |
| Algorithms plus ILD-related term searching of chest CT reports (negative terms excluded) | ||||
| Algorithm 1TN | 83.6 (71.9, 91.8) | 51 (66.2) | 13.0 | 1.3 |
| Algorithm 2TN | 93.3 (81.7, 98.6) | 42 (54.5) | 7.8 | 2.0 |
| Algorithm 3TN | 94.6 (81.8, 99.3) | 35 (45.5) | 4.1 | 0 |
| Algorithm 4TN | 93.3 (81.7, 98.6) | 42 (54.5) | 5.6 | 2.0 |
* Algorithm search criteria are listed in Table 1
Abbreviations: RA-ILD = rheumatoid arthritis-associated interstitial lung disease; CT = computed tomography; PPV = positive predictive value; CI = confidence interval
The addition of ILD-related terms identified by NLP from chest CT reports improved the PPV of all algorithms by a range of 3.6 to 11.7%. The least stringent algorithm, algorithm 1T (≥1 ICD for ILD plus ≥1 ILD-related term), demonstrated the greatest improvement in PPV (increased by 11.7%). Sensitivity declined for all algorithms with the requirement for ILD-related terms in chest CT reports (decline ranging from 28.5 to 33.8%). Exclusion of chest CT terms preceded by negative modifiers had a modest impact on algorithm performance, improving the PPV by a range of 0 to 2.0% while identifying the same number of RA-ILD cases. Algorithms excluding negative terms had an overall accuracy ranging from 59.6 to 67.0%.
Algorithm performance in sensitivity analysis
In a sensitivity analysis in which the 5 indeterminate RA-ILD cases were not excluded and rather classified as not RA-ILD, the PPV of algorithms 1-4 declined by 0-3.1% with no change in the percentage of cases captured.
Chest CT images were available within the EHR for independent expert review for 67 of 114 (59%) patients. Following the combination of imaging and clinical review, 50 patients were classified as RA-ILD, 6 as not RA-ILD, and 11 as indeterminate and excluded from further analysis. There was high concordance between blinded expert imaging review and medical record review for classification of RA-ILD (Supplemental Figure 2). After exclusion of indeterminate cases, there was moderate agreement in ILD classification between expert imaging review and medical record review (kappa = 0.53). The PPVs of all algorithms, both with and without the inclusion of ILD-related terms, were greater in this analysis (range 89.3 to 100%) than those obtained for the overall derivation cohort (Supplemental Table 1), indicating that review of CT reports when determining the reference standard did not inflate assessments of algorithm performance. When the 11 indeterminate cases were classified as not RA-ILD, there was a decline in the PPV of all algorithms ranging from −7.5 to −14.1% (Supplemental Table 2). However, absolute PPVs remained similar to those observed in the primary analysis where RA-ILD cases were classified by medical record review alone (Table 3).
When limiting the analysis to patients with searchable CT reports, the decline in algorithm sensitivity with the addition of ILD-related terms was attenuated with a decrease in algorithm sensitivity of 5.6% for all algorithms. Persistent but smaller improvements in PPV were observed relative to the primary analysis (range 1.9 to 8.6%).
External validation
We externally validated the NLP algorithms in a previously assembled cohort of 536 U.S. Veterans with RA participating in the VARA registry, of whom 203 had RA-ILD confirmed by medical record review (13). Patient characteristics have been reported previously (13). The performance of algorithms with and without ILD-related terms is shown in Figure 1. We observed similar improvements in PPV with the requirement of ILD-related terms across the algorithms, ranging from 6.0 to 21.1%, and an accompanying decline in sensitivity ranging from 3.9 to 19.5%. Specificity and NPV were >90% for all algorithms tested (Figure 1). The requirement for ILD-related terms led to negligible to modest improvements in specificity (range 0.6 to 6.3%) and declines in NPV (range 0.4 to 1.7%). Algorithms incorporating ILD-related terms had accuracies ranging from 94.1 to 95.4%. Requiring only ≥1 ILD-related term without additional algorithm requirements (Algorithm T) demonstrated a PPV of 63.8% but had the highest sensitivity (75.2%) (Supplemental Table 3).
Figure 1. Performance of RA-ILD algorithms with and without ILD-related terms from chest computed tomography reports in validation cohort.

Positive predictive value (A), sensitivity (B), negative predictive value (C), and specificity (D) of RA-ILD algorithms both without (white) and with (gray) inclusion of ILD-related terms from chest computed tomography reports in validation cohort. Delta denotes the change in values between algorithms with/without ILD terms. Error bars indicate 95% confidence intervals.
Abbreviations: RA-ILD = rheumatoid arthritis-associated interstitial lung disease.
In a sensitivity analysis restricting the sample to those with available chest CT reports, there was a substantial improvement in the sensitivity of algorithm T (ILD-related terms alone; 75.2% to 92.3%) (Table 4). Among algorithms that included administrative requirements, the loss in sensitivity with the addition of ILD-related terms was attenuated to approximately half of that observed in the primary analysis (range −1.9 to −9.6%). Improvements in PPV were similar to those observed in the primary analysis (range 6.7 to 22.0%).
Table 4.
Sensitivity analysis restricted to subjects with searchable CT reports in the validation cohort.
| Algorithm | Algorithm alone: sensitivity (95% CI) | Plus ILD-related terms: sensitivity (95% CI) | Δ sensitivity* | Algorithm alone: PPV (95% CI) | Plus ILD-related terms: PPV (95% CI) | Δ PPV* |
|---|---|---|---|---|---|---|
| Algorithm A | 85.4 (59.3, 95.9) | 75.8 (53.0, 89.7) | −9.6 | 52.9 (42.9, 62.7) | 74.9 (68.9, 80.0) | 22.0 |
| Algorithm B | 71.7 (51.0, 86.1) | 65.2 (46.4, 80.2) | −6.5 | 76.0 (67.0, 83.1) | 84.1 (78.2, 88.7) | 8.1 |
| Algorithm C | 53.8 (39.0, 68.0) | 50.3 (36.3, 64.2) | −3.5 | 80.9 (70.5, 88.3) | 87.6 (81.2, 92.0) | 6.7 |
| Algorithm D | 48.0 (34.4, 61.9) | 46.1 (33.1, 59.6) | −1.9 | 78.7 (73.3, 83.2) | 89.2 (83.1, 93.3) | 10.5 |
| Algorithm T | - | 92.3 (88.7, 94.9) | - | - | 63.8 (58.8, 68.5) | - |
* With the addition of ILD-related term searches
Abbreviations: Δ = change, CT = computed tomography, PPV = positive predictive value
DISCUSSION
In this study, we tested whether modifying existing administrative RA-ILD algorithms (13) with ILD-related terms identified from chest CT reports by text mining with automated regular expressions would improve the PPV for RA-ILD. In both derivation and external validation cohorts, we found that the additional requirement of ILD-related terms meaningfully improved the PPV of RA-ILD algorithms, with some algorithms reaching or approaching a PPV of 90%. As expected, there were accompanying declines in sensitivity with these requirements, largely related to the unavailability of chest CT reports in these real-world datasets. These highly specific RA-ILD algorithms can be used for comparative effectiveness and outcomes research and will improve the specificity and PPV of ILD case finding when chest CT reports are available.
Large administrative and EHR data sources have enabled the conduct of comparative effectiveness and outcomes research in RA. Unlike clinical trials or disease registries, these data sources require algorithms (or computable phenotypes) to define study populations. The creation of accurate algorithms for RA-ILD has been challenging, with the most complex administrative algorithms achieving a maximum PPV of approximately 80% (13–15). Recognizing that several chest CT findings are specific features of RA-ILD and that these reports are available in EHRs, we sought to harness their potential to improve the accuracy of RA-ILD algorithms. Using automated regular expressions, an NLP approach, we extracted ILD-related terms from chest CT reports and required their presence in addition to various administrative data requirements (e.g., diagnostic and procedure codes). Supporting our hypothesis, the addition of chest CT-identified ILD-related terms improved the PPV of RA-ILD algorithms up to or approaching 90%.
While the combination of several administrative data requirements and ILD-related terms led to the highest PPV, another important finding from this study is that the addition of these terms to simpler algorithms could meaningfully improve algorithm performance. The largest improvements in PPV from adding ILD-related terms from chest CT reports were for simpler administrative algorithms, such as those requiring a single diagnostic code for ILD (PPV improvement 13.0% derivation and 21.1% validation cohort). Moreover, the combination of a single ILD diagnostic code plus ILD-related terms produced a PPV that was similar to the most complicated administrative algorithms additionally requiring a specialist diagnosis and/or diagnostic testing (PPV 83.6% derivation and 74.9% validation cohort). This demonstrates that administrative algorithms with fewer requirements can have strong performance when ILD-related terms extracted from chest CT reports by automated regular expressions are incorporated. Given varying availability of provider specialty, procedures, and chest CT results in real-world data sources, our findings provide researchers guidance and flexibility when constructing valid RA-ILD study populations.
As expected, the gains in PPV coincided with an attenuation of the number of RA-ILD cases identified. This was driven in part by the unavailability of searchable chest CT reports for a portion of potential subjects (37% in the derivation and 29% in the validation cohort). Particularly in the derivation cohort, a single-center EHR, many unavailable chest CT reports resulted from chest CTs being performed outside the health system. As record sharing across EHRs continues to increase, we expect that the sensitivity of algorithms incorporating ILD-related terms would improve without negatively impacting the PPV. Indeed, in our sensitivity analyses restricted to patients with searchable CT reports available, algorithm sensitivity improved to as high as 75.8%, without declines in PPV.
The choice of which RA-ILD algorithms to utilize in future studies will be dependent upon multiple factors including the research question being investigated as well as characteristics of the data set. For studies requiring large sample sizes or aiming to avoid underestimation of RA-ILD, for example, use of the less stringent algorithms may be necessary while acknowledging that these typically lead to a lower PPV. One common use case might be to preliminarily identify patients potentially eligible for an RA-ILD clinical trial or for a prospective observational study, where clinical assessment of ILD can be performed as a second step to improve specificity at the time of enrollment. Conversely, many outcomes and comparative effectiveness studies prioritize obtaining the highest PPV, so these studies will benefit from selecting one of the more stringent algorithms with the addition of ILD-related terms. As mentioned previously, the data set may impose inherent limitations on algorithm use if algorithm components are not readily available within the data set (e.g., CT reports must be accessible to utilize algorithms incorporating ILD-related terms).
While the primary focus of this study was on improving the performance of RA-ILD algorithms by capturing ILD-related findings from chest CT reports, other potential applications of these findings exist. Within a healthcare system, use of the automated regular expression program on chest CT reports may facilitate the earlier identification of RA-ILD and even identify sub-clinical RA-ILD (i.e., an EHR-based disease surveillance system) (20). For people with RA in the healthcare system, it could perform ILD surveillance, alert clinicians when a person with RA may have ILD based on their chest CT report, and guide the clinician to obtain additional evaluation including PFTs, specialist consultation, and multidisciplinary discussion. These applications could become increasingly relevant as future research informs the optimal monitoring and management of patients with RA-ILD.
A limitation of this study was the inability to assess the population-level sensitivity of the algorithms in the derivation cohort. To create our derivation cohort, we identified patients in the EHR with at least one ICD or SNOMED code for ILD; therefore, we are unable to comment on the sensitivity of these algorithms for the total RA population. In contrast, the sampling method utilized for assembling our validation cohort allowed such inferences to be made. Second, the identification of ILD-related terms from CT reports relied on radiologist documentation, which may be affected by inter-reader variability. However, the makeup of our validation cohort, with patients from 12 separate sites, helps to bolster the generalizability of these results and limits the bias that may have been present with a more limited number of reading radiologists. While ILD-related terms and their common variations were captured from reports, our program did not aim to account for misspellings or abbreviations. Finally, validation of RA-ILD cases was retrospective, and not all subjects had undergone full evaluation for RA-ILD. There may be misclassification related to the reference standard, though the criteria employed were consistent with prior studies (3).
In summary, by identifying ILD-related terms from chest CT reports (i.e., text mining) using automated regular expressions and incorporating them into RA-ILD administrative algorithms, we significantly improved the PPV for RA-ILD in derivation and external validation cohorts. PPVs exceeding 90% for RA-ILD were achieved with these algorithms, which will minimize misclassification during cohort construction and facilitate high-quality RA-ILD comparative effectiveness and outcomes research. The text mining approach could additionally be harnessed for EHR-based disease surveillance efforts to aid in the early identification of clinical and sub-clinical RA-ILD.
Supplementary Material
SIGNIFICANCE AND INNOVATIONS.
Accurately identifying individuals with rheumatoid arthritis-interstitial lung disease (RA-ILD) in real-world datasets can be challenging, with recently developed administrative algorithms obtaining positive predictive values (PPVs) of approximately 80%.
We evaluated whether extracting ILD-related terms from chest computed tomography (CT) reports in electronic health records through text mining could improve RA-ILD classification in independent, real-world datasets.
The addition of ILD-related terms extracted from chest CT reports improved the PPV of RA-ILD algorithms to approaching or exceeding 90%, with an expected decline in algorithm sensitivity largely related to unavailability of chest CT reports.
Use of these algorithms can help to identify homogeneous populations for RA-ILD comparative effectiveness and outcomes research using real-world data and/or electronic health record-based disease surveillance.
Funding:
BAL is supported by the UNMC Mentored Scholars Program. BRE is supported by a VA CSR&D (IK2 CX002203). JFB is supported by the VA CSR&D (I01 CX001703) and VA RR&D (I01 RX003644). TRM is supported by grants from the VA (BLR&D Merit I01 BX004660), National Institutes of Health (2U54GM115458), U.S. Department of Defense (PR200793), and the Rheumatology Research Foundation. JRC is supported by the National Institute of Arthritis and Musculoskeletal and Skin Diseases (P30AR072583).
*The project described utilizes the UNMC Clinical Research Analytics Environment (CRANE). CRANE is supported by funding from the National Institute of General Medical Sciences, U54 GM115458 and the Patient Centered Outcomes Research Institute, PCORI CDRN-1306-04631. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH or PCORI. The views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the Department of Veterans Affairs or the United States government.
Disclosures:
BRE has consulted with Boehringer-Ingelheim and received research funding from Boehringer-Ingelheim. TRM has consulted with Pfizer, Sanofi, Gilead, and Horizon and received research funding from Bristol Myers Squibb and Horizon. JFB has consulted with Bristol-Myers Squibb, Pfizer, Cumberland Pharma, CorEvitas, and Burns-White, LLC.
REFERENCES
1. Raimundo K, Solomon JJ, Olson AL, Kong AM, Cole AL, Fischer A, et al. Rheumatoid arthritis-interstitial lung disease in the United States: prevalence, incidence, and healthcare costs and mortality. The Journal of Rheumatology. 2019;46(4):360–9.
2. Hyldgaard C, Ellingsen T, Hilberg O, Bendstrup E. Rheumatoid arthritis-associated interstitial lung disease: clinical characteristics and predictors of mortality. Respiration. 2019;98(5):455–60.
3. Bongartz T, Nannini C, Medina-Velasquez YF, Achenbach SJ, Crowson CS, Ryu JH, et al. Incidence and mortality of interstitial lung disease in rheumatoid arthritis: a population-based study. Arthritis and Rheumatism. 2010;62(6):1583–91.
4. Koduri G, Norton S, Young A, Cox N, Davies P, Devlin J, et al. Interstitial lung disease has a poor prognosis in rheumatoid arthritis: results from an inception cohort. Rheumatology (Oxford, England). 2010;49(8):1483–9.
5. Spagnolo P, Lee JS, Sverzellati N, Rossi G, Cottin V. The lung in rheumatoid arthritis: focus on interstitial lung disease. Arthritis & Rheumatology (Hoboken, NJ). 2018;70(10):1544–54.
6. England BR, Hershberger D. Management issues in rheumatoid arthritis-associated interstitial lung disease. Current Opinion in Rheumatology. 2020;32(3):255–63.
7. Samhouri BF, Vassallo R, Achenbach SJ, Kronzer VL, Davis JM 3rd, Myasoedova E, et al. Incidence, risk factors, and mortality of clinical and subclinical rheumatoid arthritis-associated interstitial lung disease: a population-based cohort. Arthritis Care & Research. 2022.
8. Brito Y, Glassberg MK, Ascherman DP. Rheumatoid arthritis-associated interstitial lung disease: current concepts. Current Rheumatology Reports. 2017;19(12):1–8.
9. Curtis JR, Sarsour K, Napalkov P, Costa LA, Schulman KL. Incidence and complications of interstitial lung disease in users of tocilizumab, rituximab, abatacept and anti-tumor necrosis factor ɑ agents, a retrospective cohort study. Arthritis Research & Therapy. 2015;17:319.
10. Suissa S, Hudson M, Ernst P. Leflunomide use and the risk of interstitial lung disease in rheumatoid arthritis. Arthritis and Rheumatism. 2006;54(5):1435–9.
11. Olson AL, Swigris JJ, Sprunger DB, Fischer A, Fernandez-Perez ER, Solomon J, et al. Rheumatoid arthritis-interstitial lung disease-associated mortality. Am J Respir Crit Care Med. 2011;183(3):372–8.
12. Hyldgaard C, Hilberg O, Pedersen AB, Ulrichsen SP, Løkke A, Bendstrup E, et al. A population-based cohort study of rheumatoid arthritis-associated interstitial lung disease: comorbidity and mortality. Annals of the Rheumatic Diseases. 2017;76(10):1700–6.
13. England BR, Roul P, Mahajan TD, Singh N, Yu F, Sayles H, et al. Performance of administrative algorithms to identify interstitial lung disease in rheumatoid arthritis. Arthritis Care & Research. 2020;72(10):1392–403.
14. Cho SK, Doyle TJ, Lee H, Jin Y, Tong AY, Ortiz AJS, et al. Validation of claims-based algorithms to identify interstitial lung disease in patients with rheumatoid arthritis. Seminars in Arthritis and Rheumatism. 2020;50(4):592–7.
15. Meehan M, Shah A, Lobo J, Oates J, Clinton C, Annapureddy N, et al. Validation of an algorithm to identify incident interstitial lung disease in patients with rheumatoid arthritis. Arthritis Research & Therapy. 2022;24(1):2.
16. Waitman LR, Aaronson LS, Nadkarni PM, Connolly DW, Campbell JR. The Greater Plains Collaborative: a PCORnet Clinical Research Data Network. J Am Med Inform Assoc. 2014;21(4):637–41.
17. Aletaha D, Neogi T, Silman AJ, Funovits J, Felson DT, Bingham CO 3rd, et al. 2010 rheumatoid arthritis classification criteria: an American College of Rheumatology/European League Against Rheumatism collaborative initiative. Arthritis and Rheumatism. 2010;62(9):2569–81.
18. Kaur G. Usage of regular expressions in NLP. International Journal of Research in Engineering and Technology (IJRET). 2014;3(01):7.
19. Arnett FC, Edworthy SM, Bloch DA, McShane DJ, Fries JF, Cooper NS, et al. The American Rheumatism Association 1987 revised criteria for the classification of rheumatoid arthritis. Arthritis and Rheumatism. 1988;31(3):315–24.
20. Aliabadi A, Sheikhtaheri A, Ansari H. Electronic health record-based disease surveillance systems: a systematic literature review on challenges and solutions. J Am Med Inform Assoc. 2020;27(12):1977–86.