Abstract
Background
Bicuspid aortic valve (BAV) is the most common congenital heart defect but often evades timely diagnosis due to variable clinical presentations. Prior to October 2024, no specific diagnosis code existed for BAV, limiting retrospective identification.
Objectives
The purpose of this study was to develop and validate a natural language processing (NLP) system for automated extraction of heart valve morphology from echocardiographic reports, with focus on BAV detection.
Methods
We developed a rule-based NLP system using MedSpaCy to analyze echocardiographic reports from the Veterans Affairs Corporate Data Warehouse. The system was trained on 555 manually annotated reports and validated on 170 held-out reports. System performance was evaluated on valve leaflet structure identification.
Results
The NLP system achieved excellent performance for BAV detection with a precision of 0.925, a sensitivity of 0.939, and an F1-score of 0.932. When applied to 14,453,591 echocardiographic documents from 3,478,658 patients, the system identified 83,461 patients (2.40%) with affirmed BAV. Among patients identified by the International Classification of Diseases-10 code Q23.81, NLP showed 86.1% concordance, with manual review confirming NLP accuracy in discordant cases.
Conclusions
This NLP approach enables large-scale retrospective identification of BAV patients from clinical text, creating the largest BAV cohort to date and facilitating future cardiovascular research and clinical decision-making.
Key words: bicuspid aortic valve, congenital heart disease, echocardiography, electronic health records, natural language processing
Advances in echocardiography have substantially improved screening capabilities for valvular heart disease over the past decade. However, the unstructured free-text format of echocardiographic reports creates significant barriers to systematic data analysis and large-scale research applications.1,2 Lack of standardization in report formatting and time-intensive manual extraction limit our ability to analyze echocardiographic data on a larger scale.
Bicuspid aortic valve (BAV) is the most common congenital cardiac malformation, affecting 0.5% to 2% of the general population.3 This condition most commonly results from developmental fusion of 2 aortic valve cusps (fused-type BAV), creating a central raphe and eliminating a functional commissure.4 Less commonly, BAV presents as 2 cusps of roughly equal size, each associated with one of only 2 aortic sinuses (2-sinus BAV type); no raphe is present in this type.4 Finally, patients with the newly recognized partial-fusion BAV have 3 aortic sinuses and 3 cusps; however, one of the 3 commissures is partially fused (<50%).4 The resulting structural abnormalities from these BAV phenotypes disrupt normal aortic valve hemodynamics and predispose patients to serious cardiovascular complications including aortic stenosis, aortic regurgitation, and thoracic aortic aneurysm formation.5 Given the wide spectrum of clinical presentations—with many patients remaining asymptomatic for years—BAV frequently escapes early detection,6,7 emphasizing the critical importance of systematic screening and identification strategies.
The primary diagnostic modality for BAV is transthoracic echocardiography, with findings typically documented in clinical reports.5,6 Until October 2024, the International Classification of Diseases (ICD) lacked a specific code for BAV, creating substantial challenges for retrospective patient identification and epidemiological studies. Natural language processing (NLP) has emerged as a powerful tool for extracting structured information from clinical text, with several studies successfully applying NLP to echocardiographic reports for valve assessment and quantitative measurements.8, 9, 10, 11, 12, 13, 14 Recent investigations have expanded beyond single-valve analysis to comprehensive evaluation of all 4 cardiac valves.1,15 These advances demonstrate the potential for large-scale automated analysis of cardiac valve data, creating new opportunities for studying complex conditions like BAV.
This study presents a novel NLP framework specifically designed to extract heart valve morphological information from echocardiographic reports, with particular emphasis on accurate BAV identification. Using a comprehensive data set from the U.S. Department of Veterans Affairs (VA) health system, we developed and rigorously validated an automated approach for systematic extraction of BAV-related clinical information. We examined system performance characteristics, implementation challenges, and the broader implications for improving BAV diagnosis and cardiovascular research.
Methods
Study setting and data source
The VA operates the largest integrated health care system in the United States, encompassing 170 medical centers and 1,380 outpatient clinics across the United States, its territories, and the Philippines. The Corporate Data Warehouse (CDW) maintains comprehensive electronic health record (EHR) data for over 25 million patients dating back to 1994. This retrospective study was conducted using VA CDW clinical data with Institutional Review Board approval and waiver of informed consent and Health Insurance Portability and Accountability Act (HIPAA) authorization from the Philadelphia and Salt Lake City VA Medical Centers.
Study population and data selection
The study cohort included all patients who underwent at least one echocardiography procedure within the VA health care system. Procedures were identified using a comprehensive list of Current Procedural Terminology (CPT) codes (Supplemental Table 1). Associated echocardiographic reports were extracted from CDW. Given the primary research focus on BAV identification, a separate keyword search was implemented to capture documents containing variants of “bicuspid” and “bileaflet” terminology (complete keyword list provided in Appendix), resulting in an additional 1.4 million documents. The final cohort comprised 14,453,591 echocardiographic documents, corresponding to 3,478,658 unique patients.
For model development and validation, 850 reports were selected for manual annotation. Given the rarity of BAV, random samples did not result in any relevant cases; therefore, 20% of documents were sampled randomly and 80% were selected using variants of the terms “bicuspid” and “bileaflet” (the full set of terms is provided in the Supplemental Appendix). All reports were randomly split into 3 groups: 555 (65%) for training, 125 (15%) for validation, and 170 (20%) for final testing. Patient-level separation was maintained between training and testing cohorts to ensure unbiased performance evaluation. The training set was used for NLP development, an iterative process of creating rules and logic and measuring performance against the annotated data. Once the system was performing well on the training set, the validation set was used to check for new errors prior to measuring final performance on the test set.
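The patient-level separation described above can be illustrated with a minimal sketch. The function name, the `(report_id, patient_id)` input format, and the greedy fill strategy are assumptions made for illustration; the paper does not describe how the split was implemented.

```python
import random
from collections import defaultdict

def split_by_patient(reports, fractions=(0.65, 0.15, 0.20), seed=0):
    """Split reports into train/validation/test without splitting any patient.

    `reports` is a list of (report_id, patient_id) pairs (hypothetical format).
    Whole patients are assigned to one split, so the resulting split sizes
    only approximate the requested fractions.
    """
    by_patient = defaultdict(list)
    for report_id, patient_id in reports:
        by_patient[patient_id].append(report_id)

    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    names = ["train", "validation", "test"]
    targets = [fraction * len(reports) for fraction in fractions]
    splits = {name: [] for name in names}

    idx = 0
    for patient in patients:
        # Advance to the next split once the current one reaches its target size.
        while idx < len(names) - 1 and len(splits[names[idx]]) >= targets[idx]:
            idx += 1
        splits[names[idx]].extend(by_patient[patient])
    return splits
```

Because every report from a given patient lands in the same split, a report in the test set can never share a patient with a report used during rule development.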
As additional validation, we identified VA patients who received the newly implemented BAV-specific ICD-10 code (Q23.81) following echocardiographic procedures and compared these cases with NLP-identified BAV patients.
Manual annotation protocol
Clinical reports were annotated using the eHOST annotation platform16 by 2 registered nurses with extensive chart review experience. The annotation guideline specified identification of 4 cardiac valves (aortic, mitral, tricuspid, pulmonary), associated leaflet structures when present, and creation of relationships between valve and structure entities. Given the limited terminology used for cardiac valve and leaflet structure description, the eHOST preannotation tool was utilized to highlight known keywords prior to manual review, with annotators instructed to identify any missed entities.
Context statements including uncertainty, negation, historical mentions, or nonpatient experiencers were captured and linked to corresponding valve entities. Prosthetic heart valves were annotated as distinct entities. For bicuspid leaflet structures, annotators specified whether the valve was functionally or congenitally bicuspid (default: unspecified). The final annotation schema included 14 entities, 2 relationship types (valve-to-leaflet and valve-to-context), and 1 attribute classification. A total of 1,641 annotations were created, with 315 utilized in the held-out test data set.
NLP system architecture
The NLP system was designed as a rule-based framework for identifying cardiac valves and associated leaflet morphology in echocardiographic reports. Development utilized MedSpaCy,17 a Python clinical NLP library built on the spaCy framework. Figure 1 illustrates the system architecture and processing logic.
Figure 1.
Overview of the NLP System Logic With an Example of the Input and Output Structure
NLP = natural language processing.
The system initiates document processing by identifying mentions of the 4 cardiac valves: aortic, tricuspid, pulmonary, and mitral. While valve acronyms were accepted, complete valve phrases were required for entity recognition (isolated terms like “aortic” were excluded). Each identified heart valve served as an anchor point for subsequent concept attachment, with valve identification representing the minimum system output.
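As a rough illustration of this anchoring step, the sketch below uses plain regular expressions rather than the MedSpaCy rules of the published system. The acronym set (AV, MV, TV, PV) and the output field names are assumptions; the paper states only that acronyms were accepted and that isolated terms such as “aortic” were not.

```python
import re

# Assumed acronym-to-valve mapping; the actual acronyms used are not listed in the paper.
ACRONYMS = {"AV": "aortic", "MV": "mitral", "TV": "tricuspid", "PV": "pulmonary"}

# Full phrases ("aortic valve") match case-insensitively; acronyms must be uppercase.
VALVE_PATTERN = re.compile(
    r"\b(?:(?i:(?P<name>aortic|mitral|tricuspid|pulmonary)\s+valve)"
    r"|(?P<acronym>AV|MV|TV|PV))\b"
)

def find_valves(text):
    """Return one record per valve mention, with character offsets into `text`."""
    mentions = []
    for match in VALVE_PATTERN.finditer(text):
        if match.group("name"):
            valve = match.group("name").lower()
        else:
            valve = ACRONYMS[match.group("acronym")]
        mentions.append(
            {"valve": valve, "raw": match.group(0),
             "start": match.start(), "end": match.end()}
        )
    return mentions
```

For example, in the text "AORTIC VALVE: bicuspid. The aortic root is dilated. MV appears normal." this sketch returns an aortic and a mitral mention, while the isolated "aortic" in "aortic root" is ignored.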
Following valve identification, the system searched the containing sentence for leaflet structure terms: bicuspid, tricuspid, and normal. Priority assignment favored more specific terms (bicuspid, tricuspid) based on proximity to the valve mention. Normal structure classification was assigned only when specific terms were absent. To distinguish functional BAV from congenital variants, the system searched for terms indicating functional bicuspid morphology. Prosthetic valve identification was implemented to differentiate mechanical from native BAV, as many prosthetic aortic valves exhibit bicuspid structure.
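The priority logic for leaflet structure can be sketched as follows. This is a simplified stand-in, not the published rules: the character-distance measure of proximity and the decision to skip terms that fall inside the valve mention itself are assumptions, and the functional and prosthetic checks are omitted.

```python
import re

SPECIFIC_TERMS = re.compile(r"\b(?:bicuspid|tricuspid)\b", re.IGNORECASE)
NORMAL_TERM = re.compile(r"\bnormal\b", re.IGNORECASE)

def assign_structure(sentence, valve_start, valve_end):
    """Return the leaflet structure term for the valve mention at
    sentence[valve_start:valve_end], or None if no term is found."""

    def nearest(pattern):
        # Ignore terms inside the valve mention, e.g. "tricuspid" in "tricuspid valve".
        candidates = [
            m for m in pattern.finditer(sentence)
            if m.end() <= valve_start or m.start() >= valve_end
        ]
        if not candidates:
            return None

        def distance(m):
            if m.end() <= valve_start:
                return valve_start - m.end()
            return m.start() - valve_end

        return min(candidates, key=distance)

    # Specific terms take priority; "normal" is used only when none is present.
    match = nearest(SPECIFIC_TERMS) or nearest(NORMAL_TERM)
    return match.group(0).lower() if match else None
```

In "The aortic valve is bicuspid with normal excursion.", the sketch returns "bicuspid" even though "normal" also appears, which mirrors the stated preference for specific terms over the general one.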
The ConText algorithm18 was then applied at the sentence level to identify uncertainty, negation, historical, and nonpatient experiencer modifiers related to valve leaflet structure. To enhance BAV detection sensitivity, a custom component analyzed aortic valve instances lacking sentence-level leaflet structure assignment by creating a 100-token window extension and flagging bicuspid terminology within this expanded context.
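The window extension can be pictured with a short sketch. Whitespace tokenization and a window that extends in both directions from the valve mention are assumptions; the paper specifies only the 100-token size.

```python
def bicuspid_in_window(tokens, valve_index, window=100):
    """Return True if a bicuspid term occurs within `window` tokens of the valve.

    `tokens` is the tokenized report and `valve_index` the position of the
    aortic valve mention. A symmetric window is assumed here.
    """
    lower = max(0, valve_index - window)
    upper = min(len(tokens), valve_index + window + 1)
    return any("bicuspid" in token.lower() for token in tokens[lower:upper])
```

A fallback of this kind trades some precision for sensitivity, since a bicuspid term 50 tokens away may describe a different finding, which is presumably why it is applied only when no sentence-level structure was assigned.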
System output included the document identification, valve type and raw text, leaflet structure type and raw text, context flags, functional and prosthetic indicators, the sentence containing the heart valve, and the heart valve start and end indices. The NLP system has been made publicly available at https://github.com/VINCI-AppliedNLP/bicuspid-aortic-valve.
Statistical analysis
The NLP system was validated by processing the 170 annotated documents set aside for testing and comparing the system output with the labels from manual chart review. We used the Python library sklearn19 to count the true positive, false positive, false negative, and true negative cases and to calculate the performance metrics. A case was considered a true positive if the manually annotated heart valve and leaflet structure both matched the valve and structure output by the NLP. False negatives were cases where the annotator labeled the valve leaflet structure, but the structure was not output by NLP. False positives were instances where either the NLP system output a leaflet structure, but the annotator did not identify one, or the NLP-identified leaflet structure did not match the manually labeled structure type. A case was considered a true negative when there was no manually labeled structure for the valve and the NLP did not output a structure.
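These four definitions translate directly into a small decision function. The sketch below assumes each valve instance carries a manually labeled structure and an NLP structure, with `None` meaning that no structure was recorded; the function name and representation are illustrative only.

```python
from collections import Counter

def classify_instance(gold_structure, nlp_structure):
    """Label one valve instance as TP, FP, FN, or TN per the definitions above."""
    if gold_structure is None and nlp_structure is None:
        return "TN"  # no structure annotated and none output
    if gold_structure is None:
        return "FP"  # NLP output a structure the annotator did not identify
    if nlp_structure is None:
        return "FN"  # annotated structure missed by NLP
    # A structure-type mismatch is counted as a false positive.
    return "TP" if gold_structure == nlp_structure else "FP"

def tally(pairs):
    """Count outcomes over an iterable of (gold_structure, nlp_structure) pairs."""
    return Counter(classify_instance(gold, nlp) for gold, nlp in pairs)
```

Note that under these definitions a mismatch such as an annotated bicuspid valve output as tricuspid lowers precision rather than sensitivity.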
Precision, sensitivity, F1 score, specificity, negative predictive value, positive likelihood ratio, and negative likelihood ratio were calculated for each valve type. The equations for each metric are given in Supplemental Table 2. CIs were calculated through bootstrapping, which was performed using the “sample” function from the pandas20 Python library.
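For readers without access to the supplement, the sketch below computes these metrics using their standard definitions, which are assumed to match Supplemental Table 2, and estimates a percentile bootstrap CI. It uses Python's standard `random` module in place of the pandas `sample` function used in the study; the number of resamples and the percentile method are also assumptions.

```python
import random
from collections import Counter

def ratio(numerator, denominator):
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")

def metrics(labels):
    """Compute metrics from a list of "TP"/"FP"/"FN"/"TN" instance labels."""
    counts = Counter(labels)
    tp, fp, fn, tn = (counts[key] for key in ("TP", "FP", "FN", "TN"))
    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    return {
        "precision": ratio(tp, tp + fp),
        "sensitivity": sensitivity,
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "specificity": specificity,
        "npv": ratio(tn, tn + fn),
        "lr_positive": ratio(sensitivity, 1 - specificity),
        "lr_negative": ratio(1 - sensitivity, specificity),
    }

def bootstrap_ci(labels, metric, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample instances with replacement."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_resamples):
        resample = rng.choices(labels, k=len(labels))
        value = metrics(resample)[metric]
        if value == value:  # skip NaN from degenerate resamples
            values.append(value)
    values.sort()
    lower = values[int((alpha / 2) * len(values))]
    upper = values[int((1 - alpha / 2) * len(values)) - 1]
    return lower, upper
```

With a toy input of 8 TP, 2 FP, 2 FN, and 8 TN labels, `metrics` gives precision, sensitivity, and F1 of 0.8 each, and `bootstrap_ci(labels, "f1")` returns an interval around that value.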
Results
Study cohort description
The final cohort comprised 14,453,591 echocardiographic documents, corresponding to 3,478,658 unique patients. The majority (87%) were associated with one of 4 CPT codes: 93306, 93307, 93320, and 93325. The number of documents associated with each CPT code is presented in the Appendix.
Documents were not filtered by date, so the dates associated with the included documents ranged from the year 1841, which is the automatic substitution for unknown dates used in the VA, to 2697. When reviewing the distribution between 2000 and 2024, the number of generated reports steadily increased from 174,940 in 2000 to 674,842 in 2009. From 2009 to 2024, an average of 682,102 reports were generated each year, with a dip to 572,516 reports in 2020.
Considering the distribution across VA sites, the average number of reports per station was 111,186 and all 130 stations produced at least 1 report. The station with the most reports produced 872,175 overall, with the second highest-producing station generating 417,357 reports. The majority of stations (70) generated <100,000 reports each.
NLP system performance
The system was evaluated on the 338 instances from the held-out test set where both heart valve and leaflet structure annotations were present. Supplemental Table 3 presents the F1-score, negative predictive value, positive likelihood ratio, and negative likelihood ratio at the instance level, stratified by valve type. Given the primary focus on BAV identification, the performance of BAV detection is presented separately, but it is a subset of the overall aortic valve group.
Of all 850 manually annotated reports, 376 (44.24%) had at least one instance of BAV. On the test set, the system achieved a sensitivity of 0.939 (95% CI: 0.868-1.00) and a precision of 0.925 (95% CI: 0.852-0.985), corresponding to an F1-score of 0.932 (95% CI: 0.882-0.977), for BAV detection specifically. Overall system performance across all valve types demonstrated a sensitivity of 0.935 (95% CI: 0.921-0.967), a precision of 0.905 (95% CI: 0.847-0.928), and an F1-score of 0.920 (95% CI: 0.885-0.946). The total support count of 338 represents nonoverlapping instances, with the 66 BAV cases constituting a subset of the broader aortic valve category.
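As a consistency check, the F1-score is the harmonic mean of precision and sensitivity, so the reported point estimates can be reproduced from one another (using the rounded values given above):

```latex
F_1 = \frac{2 \cdot \text{precision} \cdot \text{sensitivity}}{\text{precision} + \text{sensitivity}}
\qquad
F_1^{\text{BAV}} = \frac{2 \times 0.925 \times 0.939}{0.925 + 0.939} \approx 0.932
\qquad
F_1^{\text{overall}} = \frac{2 \times 0.905 \times 0.935}{0.905 + 0.935} \approx 0.920
```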
Patient-level BAV classification
Application of the NLP system to the complete data set of 14,453,591 documents identified 653,208 documents (4.52%) and 83,461 patients (2.40%) with affirmed BAV. Patient-level BAV classification required at least one instance of aortic valve with bicuspid structure that was neither prosthetic nor functionally bicuspid. Reports in which BAV was explicitly negated or that included nonpatient experiencer modifiers were excluded from downstream analysis. Given the congenital nature of BAV, instances with historical modifiers were considered affirmed cases. Patients with exclusively uncertain BAV mentions were classified as possible BAV. For patients who underwent aortic valve replacement procedures, affirmed BAV classification required evidence of congenital BAV in preprocedural reports. If no leaflet was extracted, the valve was not assumed to be normal. Rather, the report was excluded from downstream analysis.
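The patient-level rules can be summarized in a short sketch. The record fields and category names below are hypothetical, and the preprocedural requirement for patients with aortic valve replacement is omitted because it depends on procedure dates that are not modeled here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ValveInstance:
    """One NLP-extracted valve mention (hypothetical field names)."""
    valve: str
    structure: Optional[str] = None
    negated: bool = False
    uncertain: bool = False
    historical: bool = False
    nonpatient: bool = False
    functional: bool = False
    prosthetic: bool = False

def classify_patient(instances):
    """Return "affirmed", "possible", or "no evidence" for one patient."""
    candidates = [
        inst for inst in instances
        if inst.valve == "aortic"
        and inst.structure == "bicuspid"
        # Prosthetic, functional, negated, and nonpatient mentions do not count.
        and not (inst.prosthetic or inst.functional or inst.negated or inst.nonpatient)
    ]
    if not candidates:
        # A missing leaflet structure is not treated as a normal valve.
        return "no evidence"
    if all(inst.uncertain for inst in candidates):
        return "possible"
    # Historical mentions count as affirmed because BAV is congenital.
    return "affirmed"
```

A single qualifying mention that is not uncertain is enough for an affirmed classification, which matches the "at least one instance" rule described above.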
Comparison with ICD-10 BAV classification
Analysis of patients assigned the BAV-specific ICD-10 code (Q23.81) identified 1,573 individuals who received this diagnosis following echocardiographic procedures. Among these, 1,355 patients (86.1%) were concordantly identified as having BAV by the NLP system.
To investigate the 218 discordant cases (ICD-10 positive, NLP negative), 50 patients were randomly selected for detailed manual review of corresponding echocardiographic reports and NLP output. The majority of these reports explicitly documented tricuspid aortic valve morphology or structurally normal valves, with the NLP system correctly extracting this information. Less frequently, reports indicated poor aortic valve visualization or absent morphological description, resulting in appropriate null NLP output. Importantly, no discordant cases were attributed to NLP system errors, suggesting potential issues with ICD-10 coding accuracy or documentation of BAV diagnosis in reports not captured by our selection criteria.
Discussion
We successfully developed and validated an NLP system capable of accurately extracting heart valve leaflet morphology from echocardiographic reports, with particular strength in BAV identification. Implementation across the VA health care system identified 83,461 patients with affirmed BAV, representing the largest BAV cohort assembled to date, exceeding previous studies by more than 10-fold.21, 22, 23, 24
Previous BAV research has relied on CPT codes for aortic valve replacement procedures and ICD codes related to congenital cardiac malformations for patient identification.21 However, these approaches lack specificity for BAV and may substantially underestimate disease prevalence. Our NLP approach enables researchers to create focused BAV cohorts that extend well beyond the limitations of structured EHR data elements. The high-performance metrics achieved on the independent test set (F1-score of 0.932 for BAV) demonstrate the system’s reliability for accurate case identification.
The recent introduction of the BAV-specific ICD-10 code (Q23.81) occurred near the completion of our study, resulting in limited overlap between NLP-identified and ICD-coded patients within our data set. Manual review of discordant cases revealed that ICD-10 positive but NLP-negative patients had clear documentation of tricuspid or normal aortic valve morphology, suggesting either documentation of BAV in external reports not captured by our analysis or potential diagnostic coding errors. Future work may include additional manual chart review to fully investigate the cause of these disagreements. While this ICD-10 code provides a new avenue for BAV identification, it only became available in October 2024. Additionally, previous studies have revealed inconsistencies in ICD coding, suggesting ICD codes alone are insufficient for diagnosis identification in observational studies.25 Our NLP system enables comprehensive retrospective identification of BAV patients using historical EHR data predating this ICD coding implementation.
Error analysis and system limitations
Analysis of NLP output errors revealed that the majority involved tricuspid and normal leaflet structure classification. The dual meaning of “tricuspid” as both a valve name and structure type created classification challenges when both references appeared in the same sentence (eg, “TRICUSPID VALVE: tricuspid is normal in morphology”). More sophisticated disambiguation logic would be required to address these instances.
Similarly, “normal” references to concepts other than leaflet structure (eg, “aortic valve excursion is normal”) were incorrectly classified as normal leaflet morphology. This issue particularly affected pulmonary valve classification, which had the lowest annotation support (54 instances) and frequent references to normal functional parameters. The system's prioritization of specific terms (tricuspid, bicuspid) before general terms (normal) improved overall performance but requires refinement for comprehensive valve assessment.
Additional error sources included incomplete valve phrase identification, unrecognized prosthetic valves, and spelling variants not captured during training. These issues were relatively infrequent but highlight areas for future system enhancement.
Study limitations
Several limitations merit consideration. The NLP approach only identifies BAV in patients who underwent echocardiographic procedures within the VA system, potentially missing cases diagnosed through external health care providers. The system was specifically developed and validated using echocardiographic reports, and performance may decrease when applied to other clinical note types.
To minimize overall errors, the system assigns each structure term to only one valve entity, though some reports use single structure terms to describe multiple valves (eg, “aortic valve, mitral valve, tricuspid valve, pulmonary valve are normal”). More sophisticated logic would be needed to handle these instances, though their rarity in our data set did not justify implementation complexity.
The literature describes unicuspid and quadricuspid valve morphologies as extremely rare variants. No instances were identified during development, so these structure types are not included in the current system. The prosthetic valve detection rules were optimized for aortic valves given the BAV focus, and additional development would be needed for comprehensive prosthetic valve identification across all cardiac valves.
Clinical and research implications
This work demonstrates the potential for NLP technology to unlock valuable clinical information embedded within unstructured medical text. The ability to systematically identify large BAV cohorts from historical echocardiographic data creates new opportunities for cardiovascular research, including studies of disease progression, treatment outcomes, and genetic associations. The automated approach also supports clinical decision-making by facilitating identification of patients who may benefit from specialized cardiovascular care or surveillance protocols.
Conclusions
We present a validated rule-based NLP system that accurately extracts heart valve leaflet morphology from clinical echocardiographic reports. The system demonstrated excellent performance for BAV identification, enabling creation of the largest BAV patient cohort reported to date. This automated approach addresses the historical challenge of BAV identification in structured EHR data and facilitates large-scale retrospective cardiovascular research (Central Illustration).
Central Illustration.
Using NLP to Identify Bicuspid Aortic Valve in Echocardiography Reports
BAV is a rare congenital heart defect, which is recorded in free-text echocardiography reports. An NLP system was designed to extract heart valve leaflet structure information for all 4 heart valves, with the initial use case being the identification of BAV. The system achieved an F1 score of 0.932 on BAV identification, and retrospective application of the NLP system at the VA identified 83,461 BAV patients, enabling large-scale retrospective studies of this phenotype. BAV = bicuspid aortic valve; NER = named entity recognition; other abbreviation as in Figure 1.
The successful implementation of NLP for cardiac valve assessment illustrates the broader potential for automated clinical text processing to enhance both research capabilities and clinical care. As health care systems increasingly recognize the value of unstructured clinical data, sophisticated NLP approaches will play an essential role in translating documented clinical observations into actionable insights for patient care and scientific discovery.
In this paper, we present a custom NLP rule-based system that accurately extracts heart valve leaflet structure information from clinical notes. The initial use case for these data was the identification of BAV, which has not been available in structured data elements of the EHR until quite recently. The NLP system identified 83,461 patients with BAV from free-text echocardiography reports, making this the largest cohort of patients with BAV to our knowledge and enabling large-scale retrospective studies of this phenotype.
Perspectives.
COMPETENCY IN SYSTEMS-BASED PRACTICE: NLP tools for automated extraction of heart valve morphology from clinical reports may enhance clinical decision-making for patients with BAV and other structural valve abnormalities by enabling systematic identification and tracking of these conditions across large health care systems.
TRANSLATIONAL OUTLOOK: The development of large, well-characterized BAV patient cohorts through automated text processing creates new opportunities for cardiovascular research, including studies of disease progression, treatment outcomes, genetic associations, and population-level epidemiology that were previously limited by manual data extraction constraints.
Funding support and author disclosures
This work was supported using resources and facilities of the Department of Veterans Affairs (VA) Informatics and Computing Infrastructure (VINCI), including NLP resources, which is funded under the research priority to Put VA Data to Work for Veterans (VA ORD 24-D4V-02). Drs Bowles, Lynch, DiNatale, Pridgen, and Alba have received grants from Alnylam Pharmaceuticals, Inc, AstraZeneca Pharmaceuticals LP, Biodesix, Inc, Janssen Pharmaceuticals, Inc, Novartis International AG, and Parexel International Corporation through the University of Utah or Western Institute for Veteran Research, outside the submitted work. Dr Levin has received grants from the Doris Duke Foundation (2023-2024); research funding to the institution from MyOme; and consulting fees from BridgeBio, outside the submitted work. Dr Damrauer has received grants from the National Heart, Lung, and Blood Institute; in-kind support from Novo Nordisk; and consulting fees from Tourmaline Bio, outside the current work. This publication does not represent the views of the Department of Veterans Affairs or the United States Government. All other authors have reported that they have no relationships relevant to the contents of this paper to disclose.
Footnotes
The authors attest they are in compliance with human studies committees and animal welfare regulations of the authors’ institutions and Food and Drug Administration guidelines, including patient consent where appropriate. For more information, visit the Author Center.
Appendix
For supplemental tables, please see the online version of this paper.
References
- 1. Xie F., Lee M.S., Allahwerdy S., Getahun D., Wessler B., Chen W. Identifying the severity of heart valve stenosis and regurgitation among a diverse population within an integrated health care system: natural language processing approach. JMIR Cardio. 2024;8 doi: 10.2196/60503.
- 2. Gonzalez-Hernandez G., Sarker A., O’Connor K., Savova G. Capturing the patient’s perspective: a review of advances in natural language processing of health-related text. Yearb Med Inform. 2017;26(01):214–227. doi: 10.15265/IY-2017-029.
- 3. Siu S.C., Silversides C.K. Bicuspid aortic valve disease. J Am Coll Cardiol. 2010;55(25):2789–2800. doi: 10.1016/j.jacc.2009.12.068.
- 4. Michelena H.I., Della Corte A., Evangelista A., et al. International consensus statement on nomenclature and classification of the congenital bicuspid aortic valve and its aortopathy, for clinical, surgical, interventional and research purposes. Ann Thorac Surg. 2021;112(3):e203–e235. doi: 10.1016/j.athoracsur.2020.08.119.
- 5. Tessler I., Albuisson J., Goudot G., et al. Bicuspid aortic valve: genetic and clinical insights. AORTA. 2021;09(04):139–146. doi: 10.1055/s-0041-1730294.
- 6. Liu T., Xie M., Lv Q., et al. Bicuspid aortic valve: an update in morphology, genetics, biomarker, complications, imaging diagnosis and treatment. Front Physiol. 2019;9:1921. doi: 10.3389/fphys.2018.01921.
- 7. Michelena H.I., Desjardins V.A., Avierinos J.F., et al. Natural history of asymptomatic patients with normally functioning or minimally dysfunctional bicuspid aortic valve in the community. Circulation. 2008;117(21):2776–2784. doi: 10.1161/CIRCULATIONAHA.107.740878.
- 8. Solomon M.D., Tabada G., Allen A., Sung S.H., Go A.S. Large-scale identification of aortic stenosis and its severity using natural language processing on electronic health records. Cardiovasc Digit Health J. 2021;2(3):156–163. doi: 10.1016/j.cvdhj.2021.03.003.
- 9. Fontenla-Seco Y., Lama M., González-Salvado V., Peña-Gil C., Bugarín-Diz A. A framework for the automatic description of healthcare processes in natural language: application in an aortic stenosis integrated care process. J Biomed Inform. 2022;128 doi: 10.1016/j.jbi.2022.104033.
- 10. Nath C., Albaghdadi M.S., Jonnalagadda S.R. A natural language processing tool for large-scale data extraction from echocardiography reports. PLoS One. 2016;11(4) doi: 10.1371/journal.pone.0153749.
- 11. Dong T., Sunderland N., Nightingale A., et al. Development and evaluation of a natural language processing system for curating a trans-thoracic echocardiogram (TTE) database. Bioengineering (Basel) 2023;10(11):1307. doi: 10.3390/bioengineering10111307.
- 12. Vaid A., Argulian E., Lerakis S., et al. Multi-center retrospective cohort study applying deep learning to electrocardiograms to identify left heart valvular dysfunction. Commun Med. 2023;3(1):24. doi: 10.1038/s43856-023-00240-w.
- 13. Strange G., Stewart S., Watts A., Playford D. Enhanced detection of severe aortic stenosis via artificial intelligence: a clinical cohort study. Open Heart. 2023;10(2) doi: 10.1136/openhrt-2023-002265.
- 14. Ueda D., Yamamoto A., Ehara S., et al. Artificial intelligence-based detection of aortic stenosis from chest radiographs. Eur Heart J - Digit Health. 2022;3(1):20–28. doi: 10.1093/ehjdh/ztab102.
- 15. Patterson O.V., Freiberg M.S., Skanderson M., Samah J.F., Brandt C.A., DuVall S.L. Unlocking echocardiogram measurements for heart disease research through natural language processing. BMC Cardiovasc Disord. 2017;17(1):151. doi: 10.1186/s12872-017-0580-8.
- 16. Leng C. eHOST annotation tool. 2011. https://github.com/chrisleng/ehost
- 17. Eyre H., Chapman A.B., Peterson K.S., et al. Launching into clinical space with medspaCy: a new clinical text processing toolkit in Python. AMIA Annu Symp Proc. 2022;2021:438–447.
- 18. Harkema H., Dowling J.N., Thornblade T., Chapman W.W. ConText: an algorithm for determining negation, experiencer, and temporal status from clinical reports. J Biomed Inform. 2009;42(5):839–851. doi: 10.1016/j.jbi.2009.05.002.
- 19. Pedregosa F., Varoquaux G., Gramfort A., et al. Scikit-learn: machine learning in python. J Mach Learn Res. 2011;12:2825–2830.
- 20. The Pandas Development Team. pandas-dev/pandas: Pandas. Zenodo. Preprint posted online February 2020.
- 21. Glotzbach J.P., Hanson H.A., Tonna J.E., et al. Familial associations of prevalence and cause-specific mortality for thoracic aortic disease and bicuspid aortic valve in a large-population database. Circulation. 2023;148(8):637–647. doi: 10.1161/CIRCULATIONAHA.122.060439.
- 22. Lim M.S., Strange G., Playford D., Stewart S., Celermajer D.S. Characteristics of bicuspid aortic valve disease and stenosis: the national echo database of Australia. J Am Heart Assoc. 2021;10(17) doi: 10.1161/JAHA.121.020785.
- 23. Yoon S.H., Kim W.K., Dhoble A., et al. Bicuspid aortic valve morphology and outcomes after transcatheter aortic valve replacement. J Am Coll Cardiol. 2020;76(9):1018–1030. doi: 10.1016/j.jacc.2020.07.005.
- 24. Song S., Seo J., Cho I., Hong G.R., Ha J.W., Shim C.Y. Progression and outcomes of non-dysfunctional bicuspid aortic valve: longitudinal data from a large Korean bicuspid aortic valve registry. Front Cardiovasc Med. 2021;7 doi: 10.3389/fcvm.2020.603323.
- 25. Nelson S., Yin Y., Trujillo Rivera E.A., et al. Are ICD codes reliable for observational studies? Assessing coding consistency for data quality. Digit Health. 2024;10 doi: 10.1177/20552076241297056.