The role and uptake of coronary computed tomography angiography (CTA) in the evaluation and management of cardiovascular disease continue to expand rapidly throughout the United States and worldwide [1]. As the amount of cardiovascular imaging data grows exponentially, there is an unmet need for efficient techniques that accurately extract and characterize this information for research, quality improvement, and population health efforts. Natural language processing (NLP) is one such technology; it has been developed across a variety of clinical domains to transform clinical reports into structured data suitable for large-scale analytic work [2,3]. Accordingly, we sought to develop and validate a multidimensional NLP system to accurately phenotype coronary CTA reports to serve as the foundation for the Mass General Brigham Coronary CTA Registry, one of the largest coronary CTA registries in the United States.
To achieve this goal, we designed three separate but interrelated NLP modules using the open-source Canary NLP system [4] to accurately phenotype coronary CTA text reports from two large academic medical centers in Boston, MA. The primary “qualitative” module was designed to extract and categorize the qualitative descriptions of coronary stenoses in the free text of coronary CTA reports. The secondary “quantitative” module was built to extract numeric references to the percent stenosis for each vessel. Finally, a third “plaque” module was designed to extract references to the presence of atherosclerotic plaque, in order to identify vessels that are described as containing coronary plaque even though no associated luminal stenosis is reported.
We built the modules using the Canary NLP platform because of its transparent, human-led design process, which allows for optimization and error correction, its transferability to other institutions and datasets, and its excellent accuracy compared to other NLP methodologies. For each module, a custom set of word classes was created, each representing linguistically similar words or phrases that are then used to build varying sentence structures. Phrase structures were then designed by combining different word classes to capture meaningful linguistic information that can be extracted as structured data. The goal of this process was to design linguistically flexible but highly accurate sentence structures to extract the concepts of interest as structured data outputs. The qualitative module contained 22 word classes and 1073 phrase structures, the quantitative module contained 12 word classes and 85 phrase structures, and the plaque module contained 20 word classes and 1021 phrase structures.
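The relationship between word classes and phrase structures can be illustrated with a minimal sketch. This is not Canary's syntax or API, and the word classes, phrase structures, and function names below are hypothetical examples rather than the ones used in the modules; the sketch only shows how a small number of word classes can be combined into several flexible sentence patterns that yield structured output.

```python
import re

# Hypothetical word classes: each groups linguistically similar words or phrases.
WORD_CLASSES = {
    "SEVERITY": ["minimal", "mild", "moderate", "severe"],
    "STENOSIS": ["stenosis", "narrowing", "luminal narrowing"],
    "VESSEL": ["left main", "LAD", "left anterior descending",
               "LCx", "left circumflex", "RCA", "right coronary artery"],
}

def class_pattern(name):
    """Build a named regex group matching any member of a word class."""
    # Longest alternatives first so "luminal narrowing" wins over "narrowing".
    words = sorted(WORD_CLASSES[name], key=len, reverse=True)
    alternatives = "|".join(re.escape(w) for w in words)
    return "(?P<%s>%s)" % (name.lower(), alternatives)

# Hypothetical phrase structures: templates that combine word classes.
PHRASE_STRUCTURES = [
    # e.g. "moderate stenosis in the proximal LAD"
    r"{SEVERITY} {STENOSIS} (?:of|in) the (?:\w+ )?{VESSEL}",
    # e.g. "The RCA demonstrates mild luminal narrowing"
    r"{VESSEL} (?:has|shows|demonstrates) (?:a |an )?{SEVERITY} {STENOSIS}",
]

CLASS_PATTERNS = {name: class_pattern(name) for name in WORD_CLASSES}
COMPILED = [re.compile(s.format(**CLASS_PATTERNS), re.IGNORECASE)
            for s in PHRASE_STRUCTURES]

def extract(sentence):
    """Return one structured finding per phrase-structure match."""
    findings = []
    for rx in COMPILED:
        for m in rx.finditer(sentence):
            findings.append({"vessel": m.group("vessel"),
                             "severity": m.group("severity").lower()})
    return findings

print(extract("There is moderate stenosis in the proximal LAD."))
print(extract("The RCA demonstrates mild luminal narrowing."))
```

Because every phrase structure is an explicit, human-readable template, a missed or incorrect extraction can be traced to a specific word class or template and corrected, which is the kind of transparency described above.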
As part of the validation process, 520 random coronary CTA reports were downloaded from a cohort of 28,175 individuals who underwent coronary CTA from 2003 to 2021. A total of 313 reports were used as a training set and 207 reports were used as a validation set. All 520 reports were manually adjudicated for the presence of plaque and for the severity of reported stenosis in each coronary artery by two cardiovascular imaging specialists with expertise in coronary CTA (D.M.H. and A.S.). The designer of the NLP (A.N.B.) was blinded to the validation set reports. This study was approved by the Institutional Review Board at Mass General Brigham.
To best categorize the degree of luminal stenosis, each vessel was adjudicated in accordance with the Coronary Artery Disease Reporting and Data System 2.0 (CAD-RADS 2.0) [5], regardless of whether the coronary CTA report provided information on the qualitative degree of stenosis, percent stenosis, or a combination thereof. Accordingly, the adjudicators categorized the reported degree of stenosis into the following qualitative categories based on linguistic descriptors used in common clinical practice: (1) minimal, (2) mild, (3) moderate, (4) significant, (5) severe, (6) occluded, and (7) non-diagnostic vessel. When only the percent stenosis was reported for a given vessel, the following cut-points, based on CAD-RADS 2.0 [5], supplemented the primary qualitative stenosis data: (1) 1–24%: minimal, (2) 25–49%: mild, (3) 50–69%: moderate, (4) 70–99%: severe, and (5) 100%: occluded. Finally, when there was no stenosis reported in a given vessel, the presence or absence of plaque was extracted in a binary fashion.
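The percent-stenosis cut-points can be written as a small lookup. The sketch below is illustrative only: the function names are hypothetical, exactly 100% is treated as occluded, and values below 1% are treated as no reported stenosis (in which case plaque is handled separately as a binary finding).

```python
def category_from_percent(pct):
    """Map a reported percent stenosis to a qualitative category.

    Returns None when no stenosis is reported (below 1%).
    """
    if not 0 <= pct <= 100:
        raise ValueError("percent stenosis must be between 0 and 100")
    if pct < 1:
        return None
    if pct < 25:
        return "minimal"
    if pct < 50:
        return "mild"
    if pct < 70:
        return "moderate"
    if pct < 100:
        return "severe"
    return "occluded"  # assumption: exactly 100% is an occlusion

def resolve_category(qualitative, pct):
    """Prefer the qualitative descriptor; fall back to the percent stenosis."""
    if qualitative is not None:
        return qualitative
    if pct is not None:
        return category_from_percent(pct)
    return None

print(resolve_category(None, 60))        # percent only
print(resolve_category("moderate", 80))  # qualitative descriptor takes priority
```

Giving the qualitative descriptor priority reflects the statement that the percent-based cut-points supplement, rather than replace, the primary qualitative stenosis data.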
We calculated the performance of the combined NLP modules on the 207 validation set reports by computing the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) on a per-vessel, per-report level. The accuracy of the combined NLP modules was calculated as (true positives + true negatives)/(true positives + false positives + false negatives + true negatives). The 207 reports in the validation set described the findings for 1298 coronary vessels. The NLP output was conservatively defined as a true positive only when it exactly matched the manual physician adjudication. Accordingly, if the NLP output for a given coronary artery indicated the presence of “moderate stenosis” but the manual adjudication indicated “severe stenosis,” this was conservatively classified as a false negative. Additionally, we separately assessed the performance of the NLP in determining the presence of a non-diagnostic coronary vessel. Finally, as an additional measure of robustness and clinical validity, the CAD-RADS 2.0 classification [5] was computed for each of the 207 reports, and the NLP-derived classification was compared with that derived from manual adjudication.
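The performance measures defined above can be reproduced from the four cells of a 2×2 table. The following is a minimal sketch with hypothetical function names; `confusion_cell` encodes one reading of the exact-match rule, in which any mismatch for a vessel with an adjudicated finding counts as a false negative. The counts in the usage example are the per-vessel stenosis and plaque counts reported in Table 1.

```python
def confusion_cell(nlp_label, adjudicated_label):
    """Assign one vessel to a 2x2 cell under the exact-match rule.

    Labels are category strings, or None when no finding is present.
    """
    if adjudicated_label is None:
        return "TN" if nlp_label is None else "FP"
    # A finding was adjudicated: only an exact match is a true positive.
    return "TP" if nlp_label == adjudicated_label else "FN"

def diagnostic_metrics(tp, fp, fn, tn):
    """Return standard diagnostic performance measures as fractions."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Per-vessel stenosis and plaque counts from Table 1.
metrics = diagnostic_metrics(tp=352, fp=15, fn=38, tn=893)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under this rule a "moderate" NLP output for a vessel adjudicated as "severe" is penalized in the same way as a missed finding, so the reported sensitivity is a conservative estimate.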
Among the 207 validation set reports representing 1298 described vessels, there were 390 unique coronary arteries with luminal stenosis or plaque. The NLP modules demonstrated excellent performance across the prespecified degrees of stenosis and coronary plaque, with a sensitivity of 90.3%, a specificity of 98.4%, and a PPV and NPV of 95.9% each. Additionally, the NLP demonstrated robust performance with regard to the calculated CAD-RADS classification, with an accuracy of 95.7%. See Table 1 for the full performance details. After the modules were validated, they were deployed on 30,279 coronary CTA reports representing 28,175 patients within the Mass General Brigham system from 2003 to 2021. Preliminary analysis has demonstrated that 19,605 (65%) of the studies had at least one vessel with reported plaque or stenosis. Of the 30,279 studies, 35% (10,674) had a calculated CAD-RADS of 0, 13% (3982) CAD-RADS 1, 25% (7584) CAD-RADS 2, 7% (2235) CAD-RADS 3, 12% (3760) CAD-RADS 4A-4B, and 7% (2044) CAD-RADS 5.
Table 1.
Natural language processing performance.
NLP Performance: Per-Vessel, Per-Report, Stenosis and Plaque

| Natural Language Processing | Cardiologist Manual Adjudication: Positive | Cardiologist Manual Adjudication: Negative | Total | Predictive Value |
|---|---|---|---|---|
| Positive | 352 | 15 | 367 | PPV: 95.9% (95% CI 93.4–97.7%) |
| Negative | 38 | 893 | 931 | NPV: 95.9% (95% CI 94.4–97.1%) |
| Total | 390 | 908 | 1298 vessels | |
| Performance | Sensitivity: 90.3% (95% CI 86.9–93.0%) | Specificity: 98.4% (95% CI 97.3–99.1%) | Accuracy: 95.9% | |

NLP Performance: Per-Patient, Overall CAD-RADS

| Natural Language Processing | Cardiologist Manual Adjudication: Positive | Cardiologist Manual Adjudication: Negative | Total | Predictive Value |
|---|---|---|---|---|
| Positive | 128 | 2 | 130 | PPV: 98.5% (95% CI 94.6–99.8%) |
| Negative | 7 | 70 | 77 | NPV: 90.9% (95% CI 82.2–96.3%) |
| Total | 135 | 72 | 207 reports | |
| Performance | Sensitivity: 94.8% (95% CI 89.6–97.9%) | Specificity: 97.2% (95% CI 90.3–99.7%) | Accuracy: 95.7% | |

NLP Performance: Per-Vessel, Non-Diagnostic Segment

| Natural Language Processing | Cardiologist Manual Adjudication: Positive | Cardiologist Manual Adjudication: Negative | Total | Predictive Value |
|---|---|---|---|---|
| Positive | 27 | 8 | 35 | PPV: 77.1% (95% CI 59.9–89.6%) |
| Negative | 24 | 1239 | 1263 | NPV: 98.1% (95% CI 97.2–98.8%) |
| Total | 51 | 1247 | 1298 vessels | |
| Performance | Sensitivity: 52.9% (95% CI 38.5–67.1%) | Specificity: 99.4% (95% CI 98.7–99.7%) | Accuracy: 97.5% | |
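
The report does not state how the 95% confidence intervals in Table 1 were computed. As an illustration only, the sketch below assumes exact (Clopper-Pearson) binomial intervals, a common choice for sensitivity and specificity, and computes them with the Python standard library by bisection on the binomial distribution. The function names are hypothetical, and the method is an assumption rather than the authors' documented procedure.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Terms are built in log space to avoid overflow for large n.
    """
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_coef = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1))
        total += math.exp(log_coef + i * log_p + (n - i) * log_q)
    return min(total, 1.0)

def _solve_increasing(func, target, iters=60):
    """Find p in (0, 1) with func(p) = target, for func increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for k successes in n trials."""
    # Lower bound: P(X >= k | p) = alpha / 2
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1.0 - binom_cdf(k, n, p), 1.0 - alpha / 2)
    return lower, upper

# Per-vessel sensitivity for stenosis and plaque: 352 of 390 vessels.
low, high = clopper_pearson(352, 390)
print(f"sensitivity 95% CI: {low:.3f} to {high:.3f}")
```

Other interval methods (for example, Wilson score intervals) would give slightly different bounds, particularly for the smaller denominators in the non-diagnostic segment analysis.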
As the role of coronary CTA continues to expand in clinical practice, it is paramount to understand the factors associated with adverse cardiovascular prognosis as well as the opportunities for targeted preventive therapies. By harnessing the power and efficiency of NLP, this work will enable the large-scale analysis of coronary CTA findings merged with detailed clinical information for a large contemporary coronary CTA registry. A limitation of this work is that it does not rely on direct image analysis but rather extracts findings from imaging text reports, which may be subject to inter-reader variability in stenosis and plaque reporting. Nevertheless, the NLP was highly accurate in validation and is computationally efficient, allowing it to process thousands of reports in just a few hours. We believe that this NLP and its rigorous validation will serve as a model for how to effectively leverage large-scale preexisting cardiovascular imaging data to generate deep insights regarding the interplay of coronary atherosclerosis, baseline risk factors, and patient outcomes.
Footnotes
Disclosures
There are no relevant disclosures.
Declaration of competing interest
The authors have no relevant disclosures or conflicts of interest related to the development and validation of the natural language processing modules described in this Technical Report.
Contributor Information
Adam N. Berman, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
Arthur Shiyovich, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
David W. Biery, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
Rhanderson N. Cardoso, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
Brittany N. Weber, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
Milena Petranovic, Department of Radiology, Division of Thoracic Imaging and Intervention, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
Stephanie A. Besser, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
Jon Hainer, Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
Jason H. Wasfy, Cardiology Division, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
Alexander Turchin, Division of Endocrinology, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
Marcelo F. Di Carli, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
Ron Blankstein, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
Daniel M. Huck, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
References
1. Gulati M, Levy PD, Mukherjee D, et al. 2021 AHA/ACC/ASE/CHEST/SAEM/SCCT/SCMR guideline for the evaluation and diagnosis of chest pain: executive summary: a report of the American College of Cardiology/American Heart Association Joint Committee on Clinical Practice Guidelines. Circulation. 2021;144(22):e368–e454. doi:10.1161/CIR.0000000000001030.
2. Berman AN, Ginder C, Sporn ZA, et al. Natural language processing for the ascertainment and phenotyping of left ventricular hypertrophy and hypertrophic cardiomyopathy on echocardiogram reports. Am J Cardiol. 2023;206:247–253. doi:10.1016/j.amjcard.2023.08.109.
3. Berman AN, Biery DW, Ginder C, et al. Natural language processing for the assessment of cardiovascular disease comorbidities: the cardio-Canary comorbidity project. Clin Cardiol. 2021;44(9):1296–1304. doi:10.1002/clc.23687.
4. Malmasi S, Sandor NL, Hosomura N, Goldberg M, Skentzos S, Turchin A. Canary: an NLP platform for clinicians and researchers. Appl Clin Inf. 2017;8(2):447–453. doi:10.4338/ACI-2017-01-IE-0018.
5. Cury RC, Leipsic J, Abbara S, et al. CAD-RADS 2.0 - 2022 Coronary Artery Disease-Reporting and Data System: an expert consensus document of the Society of Cardiovascular Computed Tomography (SCCT), the American College of Cardiology (ACC), the American College of Radiology (ACR), and the North America Society of Cardiovascular Imaging (NASCI). JACC Cardiovasc Imaging. 2022;15(11):1974–2001. doi:10.1016/j.jcmg.2022.07.002.
