Table 4.
Portfolio of open-source NLP tools and datasets applied in cardiology contexts
Name and Origin | Description | Accessibility |
---|---|---|
NLP tools | ||
Clinical Text Analysis and Knowledge Extraction System (cTAKES); Mayo Clinic | A modular pipeline of components that uses both rule-based and machine learning methods to support information extraction; based on UIMA (Unstructured Information Management Architecture) standards. | Open-source at http://www.ohnlp.org |
EchoExtractor; Veterans Affairs | An application that extracts concept-value pairs for metrics measured during an echocardiogram study. | Open-source at https://github.com/department-of-veterans-affairs/EchoExtractor |
Leo; Veterans Affairs Informatics and Computing Infrastructure (VINCI) | A set of services and libraries that uses UIMA standards to enable the rapid creation and deployment of NLP tools and the incorporation of previously developed tools. | Open-source at https://department-of-veterans-affairs.github.io/Leo/userguide.html |
MedTagger; Mayo Clinic | A set of tools for dictionary-based indexing, pattern-based information extraction, and machine learning-based named entity recognition; based on UIMA standards. | Open-source at https://github.com/OHNLP/MedTagger |
pyConText; University of Utah | A Python implementation of ConText, a simple rule-based algorithm for identifying contextual features of clinical conditions mentioned in text (e.g., negation, temporality, and experiencer) and the relationships between conditions and those features. | Open-source at https://pypi.org/project/pyConTextNLP/0.6.0.5/ |
SemEHR; King’s College London, UK | A general-purpose search and analytics tool that processes heterogeneous data sources, covers a range of biomedical concepts, and captures context to support information extraction in study-specific or case-specific settings. | Open-source at https://github.com/CogStack/CogStack-SemEHR |
Datasets | ||
The Medical Information Mart for Intensive Care III (MIMIC-III); Massachusetts Institute of Technology | Deidentified, freely available critical care database comprising over 60,000 intensive care unit admissions. | https://mimic.mit.edu/ |
Electronic Medical Records and Genomics (eMERGE) network; National Human Genome Research Institute (NHGRI) | Combines DNA biorepositories with EHR data from multiple clinical sites across the United States, and has been used extensively to develop phenotyping algorithms. | https://emerge-network.org/ |
Integrating Biology and the Bedside (i2b2); Partners HealthCare | A dataset of deidentified patient discharge summaries made available for research purposes. | https://www.i2b2.org/ |
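To make concrete what a ConText-style algorithm such as the one implemented in pyConText does, the following is a minimal, simplified sketch in Python. It is not the pyConTextNLP API. The trigger lexicon, the category names (`negated`, `historical`, `other_experiencer`), the termination terms, and the helper names `find_spans` and `annotate` are hypothetical and much smaller than the lexicons real tools ship with. The sketch assumes that a trigger's scope is limited to its own sentence, extends forward or backward from the trigger, and is cut off by a termination term such as "but".

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy lexicon: token tuple -> (category, scope direction).
TRIGGERS = {
    ("no", "evidence", "of"): ("negated", "forward"),
    ("negative", "for"): ("negated", "forward"),
    ("denies",): ("negated", "forward"),
    ("no",): ("negated", "forward"),
    ("was", "ruled", "out"): ("negated", "backward"),
    ("history", "of"): ("historical", "forward"),
    ("family", "history", "of"): ("other_experiencer", "forward"),
}
# Hypothetical termination terms that close a trigger's scope.
TERMINATORS = {"but", "however", "although"}


@dataclass
class Finding:
    target: str
    modifiers: set = field(default_factory=set)


def find_spans(tokens, phrases, used=None):
    """Return non-overlapping (start, end, phrase) matches, longest phrases first.

    `used` holds token indices already claimed; it is updated in place.
    """
    used = set() if used is None else used
    spans = []
    for phrase in sorted(phrases, key=len, reverse=True):
        n = len(phrase)
        for i in range(len(tokens) - n + 1):
            window = range(i, i + n)
            if tuple(tokens[i:i + n]) == phrase and used.isdisjoint(window):
                spans.append((i, i + n, phrase))
                used.update(window)
    return spans


def annotate(text, targets):
    """Attach context categories to each target concept found in `text`."""
    target_phrases = [tuple(t.lower().split()) for t in targets]
    findings = []
    # Naive sentence split: trigger scope never crosses these boundaries.
    for sentence in re.split(r"[.;!?]", text):
        tokens = re.findall(r"[a-z0-9]+", sentence.lower())
        used = set()
        # Match targets first so triggers cannot claim target tokens.
        target_spans = find_spans(tokens, target_phrases, used)
        trigger_spans = find_spans(tokens, TRIGGERS.keys(), used)
        stops = [i for i, tok in enumerate(tokens) if tok in TERMINATORS]
        for t_start, t_end, t_phrase in sorted(target_spans):
            finding = Finding(" ".join(t_phrase))
            for g_start, g_end, g_phrase in trigger_spans:
                category, direction = TRIGGERS[g_phrase]
                if direction == "forward" and g_end <= t_start:
                    lo, hi = g_end, t_start
                elif direction == "backward" and g_start >= t_end:
                    lo, hi = t_end, g_start
                else:
                    continue  # target lies outside this trigger's direction
                # A termination term between trigger and target blocks scope.
                if not any(lo <= k < hi for k in stops):
                    finding.modifiers.add(category)
            findings.append(finding)
    return findings
```

For example, `annotate("Chest pain but no dyspnea.", ["chest pain", "dyspnea"])` assigns no modifiers to `chest pain` and `negated` to `dyspnea`, because the forward trigger "no" only reaches targets to its right. Matching the longest trigger first is a design choice that keeps "family history of" from also being read as the shorter "history of".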