Abstract
Objective:
One important concept in informatics is data that meets the principles of Findability, Accessibility, Interoperability and Reusability (FAIR). Standards such as terminologies support findability and assist with important tasks such as interoperability, Natural Language Processing (NLP) (accessibility), and decision support (reusability). One terminology, Solor, integrates SNOMED CT, LOINC, and RxNorm. We describe Solor, HL7 Analysis Normal Form (ANF), and their use with the High Definition Natural Language Processing (HD-NLP) program.
Methods:
We used HD-NLP to process 694 clinical narratives previously modeled into Solor and ANF by human experts. We compared HD-NLP output to the expert gold standard for a 20% sample. Each clinical statement was judged “correct” if the HD-NLP output matched both the ANF structure and the Solor concepts, or “incorrect” if any ANF structure or Solor concept was missing or incorrect. Judgements were summed to give totals for “correct” and “incorrect”.
Results:
Of the 140 reviewed statements, 113 (80.7%) were correct, 26 (18.6%) were incorrect, and 1 produced an error. Inter-rater agreement was 97.5%, with a Cohen’s kappa of 0.948.
Conclusion:
The HD-NLP software provides usable, complex, standards-based representations of important clinical statements designed to drive CDS.
Keywords: Natural Language Processing, Interoperability, Clinical Decision Support, Controlled Terminology
1. Introduction
Technical (syntactic) interoperability addresses how computers exchange data. This is accomplished with messaging protocols and data formats. Semantic interoperability builds upon syntactic interoperability and addresses how computers interpret the meaning of data. Semantic interoperability allows EHRs to determine the meaning of data unambiguously and consistently for data presentation and decision support.
Terminologies, such as the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), are used as data encoding standards in informatics and contribute to semantic interoperability. These terminologies can also provide a foundation for other tasks such as knowledge management, data integration, and decision support1. Three of the most commonly used terminology standards are SNOMED CT, Logical Observation Identifiers Names and Codes (LOINC), and RxNorm, each with particular strengths. Increasingly, there is interest in combining these partially overlapping standards to enhance clinical expressivity. Solor is an integrated terminology system created in collaboration with the U.S. Department of Veterans Affairs (VA)2 that combines SNOMED CT (representing diseases, findings, and procedures), LOINC (representing laboratory test results), and RxNorm (representing medications)2,3.
1.1. Solor
The Solor terminology layer builds primarily upon SNOMED CT, RxNorm, and LOINC by integrating their content and semantics, and by normalizing the means to identify and version components, search them lexically, define them logically, retrieve them semantically, and extend them collaboratively. Combining SNOMED CT, LOINC, and RxNorm into a single consistent, computable suite for encoding clinical knowledge and data has clear potential advantages: clinical data can flow among clinical documentation, decision support applications, and order entry at the point of care. This single consistent method of encoding clinical data can also support research, quality measurement, and other secondary uses.
Solor has two fundamental building blocks: concepts and semantics4. A concept is an idea, typically a medically related idea, such as heart attack4. Each concept also carries names, such as a fully specified name and synonyms. A semantic is data that provides contextual meaning to a concept4.
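The split between concepts and semantics can be pictured as a small data structure. The sketch below is a hypothetical simplification, not Solor's actual component model (which also identifies and versions every component): a concept holds a fully specified name and synonyms, and a semantic is a separate record that attaches contextual data to a concept by reference. All class names, field names, identifiers, and the "membership" semantic type are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """A unit of meaning such as 'heart attack' (simplified, hypothetical)."""
    concept_id: str                     # placeholder ID, not a real Solor identifier
    fully_specified_name: str           # unambiguous preferred name
    synonyms: tuple[str, ...] = ()      # alternative surface forms


@dataclass(frozen=True)
class Semantic:
    """Data that gives contextual meaning to one referenced concept."""
    semantic_type: str                  # assumed label, e.g. "membership"
    referenced_concept_id: str          # the concept this semantic describes
    value: str


@dataclass
class TerminologyStore:
    concepts: dict[str, Concept] = field(default_factory=dict)
    semantics: list[Semantic] = field(default_factory=list)

    def add(self, concept: Concept) -> None:
        self.concepts[concept.concept_id] = concept

    def annotate(self, semantic: Semantic) -> None:
        # A semantic is only meaningful if the concept it refers to exists.
        if semantic.referenced_concept_id not in self.concepts:
            raise KeyError(semantic.referenced_concept_id)
        self.semantics.append(semantic)

    def semantics_for(self, concept_id: str) -> list[Semantic]:
        return [s for s in self.semantics if s.referenced_concept_id == concept_id]

    def find(self, text: str) -> list[Concept]:
        """Look up concepts by fully specified name or any synonym (case-insensitive)."""
        needle = text.lower()
        return [c for c in self.concepts.values()
                if needle == c.fully_specified_name.lower()
                or needle in (s.lower() for s in c.synonyms)]
```

Keeping semantics outside the concept record means new kinds of contextual data can be attached without changing the concept structure itself.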
Like SNOMED CT, Solor is built on a logic model3. Most of its concepts are shared with SNOMED CT, and these concepts are arranged into hierarchies using “is a” relationships4. Every concept has at least one “is a” relationship except the top-level concepts, which are the most general. Because of these “is a” relationships, one can traverse the hierarchies from general concepts to more specific ones.
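Traversing “is a” hierarchies from general to specific concepts can be sketched as a breadth-first walk over inverted parent links. The hierarchy fragment below is hypothetical and uses readable labels in place of real concept identifiers; it is not taken from SNOMED CT or Solor. The `subsumes` helper shows why the traversal matters in practice: a query or rule written against a general concept can also match data coded with any of its descendants.

```python
from collections import deque

# Hypothetical fragment of an "is a" hierarchy: child -> set of parents.
# Labels are illustrative only and do not reproduce SNOMED CT or Solor content.
IS_A: dict[str, set[str]] = {
    "Clinical finding": set(),                       # top-level concept: no "is a" parent
    "Disorder of heart": {"Clinical finding"},
    "Ischemic heart disease": {"Disorder of heart"},
    "Myocardial infarction": {"Ischemic heart disease"},
    "Acute myocardial infarction": {"Myocardial infarction"},
}


def children_index(is_a: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert child -> parents into parent -> children for top-down traversal."""
    index: dict[str, set[str]] = {c: set() for c in is_a}
    for child, parents in is_a.items():
        for parent in parents:
            index.setdefault(parent, set()).add(child)
    return index


def descendants(concept: str, is_a: dict[str, set[str]]) -> set[str]:
    """Breadth-first walk from a general concept to all more specific ones."""
    index = children_index(is_a)
    seen: set[str] = set()
    queue = deque([concept])
    while queue:
        for child in index.get(queue.popleft(), set()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def subsumes(general: str, specific: str, is_a: dict[str, set[str]]) -> bool:
    """True if `specific` is `general` itself or lies below it via "is a" links."""
    return general == specific or specific in descendants(general, is_a)
```

Because a concept may have more than one parent, the `seen` set prevents a descendant reachable by several paths from being visited twice.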
One goal of using Solor is to improve interoperability, which is critical for clinical decision support. Because health care for an individual is often delivered by more than one provider, data from multiple providers must be integrated to view the complete health record. To achieve interoperability, clinical systems must understand both the structure (syntax) and the meaning (semantics) of the clinical information being exchanged3. Without both, information may be viewable by humans but cannot be integrated for clinical decision support3. One way of providing interoperability is through standards such as Solor, which supplies structure and meaning for the patient data exchanged between health providers.
1.2. ANF
Analysis Normal Form (ANF) is a highly regularized, small information model designed to be independent of the content of the clinical statement. For example, a single ANF “performance statement” model can describe any action that has previously been performed and, if applicable, the results of that action. Broad classes of actions are represented identically, including observations of the presence or absence of a clinical phenomenon, undergoing a procedure, and the administration of a medication.
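To make the “one model for many actions” idea concrete, the sketch below encodes an observation, a procedure, and a medication administration with the same class. It is only loosely inspired by the ANF performance statement: the class name and fields (`subject`, `topic`, `result_value`, `result_unit`) are simplified assumptions for illustration, not the attribute names defined in the HL7 ANF ballot5, and the topic strings are placeholders rather than Solor identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PerformanceStatement:
    """Simplified, hypothetical stand-in for an ANF performance statement.

    The same structure records any action already performed; only the
    terminology payload (the topic and result) changes between statements.
    """
    subject: str                           # whom the statement is about
    topic: str                             # concept naming the action or phenomenon
    result_value: Optional[object] = None  # e.g. presence/absence, a number, or None
    result_unit: Optional[str] = None      # unit for numeric results, if any


statements = [
    # Observation of the presence or absence of a clinical phenomenon
    PerformanceStatement("patient", "Cognitive impairment", result_value="present"),
    # Undergoing a procedure (no result recorded)
    PerformanceStatement("patient", "Colonoscopy"),
    # Administration of a medication, with a quantity as the result
    PerformanceStatement("patient", "Metformin 500 MG Oral Tablet",
                         result_value=1, result_unit="tablet"),
]


def topics_with_result(stmts: list[PerformanceStatement]) -> list[str]:
    """One generic query works for every statement because the model is uniform."""
    return [s.topic for s in stmts if s.result_value is not None]
```

An analyst needs to learn only this one shape; the alternative would be a separate class per statement type, each with its own fields to query.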
The goal of ANF is to provide a simple, consistent, and highly reusable information model for clinical statements. This makes the data, and how it is stored, easier for analysts to understand than an approach that requires knowledge of hundreds or thousands of statement-specific specializations. It also helps ensure that the data can be expressed in an operable and scalable way. The more normalized the data, the simpler it is to analyze and the lower the likelihood of analysis errors. ANF represents clinical data for data analysts’ purposes, not in the way we may choose to display the data to a clinician4. ANF was approved through an HL7 informative ballot in 20195.
1.3. HD-NLP
High Definition Natural Language Processing (HD-NLP) is a pipeline developed at the University at Buffalo (UB) that evolved from earlier HTP-NLP work at UB. The system builds a full semantic parse in memory and then uses an encoder to link text to whichever set of ontologies a user chooses for representing the knowledge in the free text being codified. Each entity is tagged as an affirmed, negative, or uncertain assertion and is further tagged with a date-time stamp. We then automatically generate compositional expressions, where applicable in the source text, using the semantic relations available in the ontologies being encoded. Finally, we add metadata from the record using the Analysis Normal Form standard and link it to the information computed from the input string.
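Tagging an entity as affirmed, negative, or uncertain can be illustrated with a much simpler trigger-word approach in the style of NegEx. This is explicitly a swapped-in technique: HD-NLP works from a full semantic parse, which this sketch does not attempt. The trigger lists and the fixed five-token look-back window are hypothetical choices made for illustration.

```python
import re

# Hypothetical trigger phrases; a real system would use far richer resources.
NEGATION_TRIGGERS = ("no", "denies", "without", "negative for", "absence of")
UNCERTAINTY_TRIGGERS = ("possible", "probable", "suspected", "rule out", "may have")
WINDOW = 5  # tokens before the entity that a trigger may govern (assumption)


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9\-]+", text.lower())


def contains_phrase(tokens: list[str], phrase: str) -> bool:
    """True if the (possibly multi-word) phrase occurs as a token sequence."""
    words = phrase.split()
    n = len(words)
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))


def tag_assertion(sentence: str, entity: str) -> str:
    """Return 'negative', 'uncertain', or 'affirmed' for the first entity mention."""
    tokens = tokenize(sentence)
    target = tokenize(entity)
    n = len(target)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == target:
            # Only look at the few tokens immediately preceding the mention.
            context = tokens[max(0, start - WINDOW):start]
            if any(contains_phrase(context, t) for t in NEGATION_TRIGGERS):
                return "negative"
            if any(contains_phrase(context, t) for t in UNCERTAINTY_TRIGGERS):
                return "uncertain"
            return "affirmed"
    raise ValueError(f"entity {entity!r} not found in sentence")
```

Negation is checked before uncertainty, so a phrase such as “no possible pneumonia” would be tagged negative; a parse-based system can resolve such scope questions more reliably than a token window.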
HD-NLP uses several sources of synonymy, kept in separate synonym sets (synsets) that can be interrogated to understand why certain results were obtained. Input queries arrive in an input queue, are processed, and are sent to an output queue, from which the user picks up each completed job. The system is available as a web service. The service can provide Solor and ANF output, but it can also limit its search to the source ontologies (SNOMED CT, RxNorm, and LOINC). We used HD-NLP to rapidly assign terminology concepts to text in patient records or KNART (knowledge artifact) input text6–8. Some syntactic processing was required to match text with ontological terms; this linguistic representation is specified in language models. Of primary concern here was an English language model that identifies sentences, phrases, words, and parts of speech. Terms from Solor and its source ontologies were then assigned to spans of text.
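Assigning terminology concepts to spans of text using synonym sets can be sketched as greedy, left-to-right longest-match dictionary lookup. The synsets and concept labels below are hypothetical and tiny, and HD-NLP's actual language models and synonym sources are not reproduced here. Each match records which synonym fired, mirroring the idea that synsets can be interrogated to explain a result.

```python
import re
from dataclasses import dataclass

# Hypothetical synsets: concept label -> surface forms that should map to it.
SYNSETS: dict[str, set[str]] = {
    "Regular periods": {"regular periods", "regular menstrual cycle", "regular menses"},
    "Cognitive impairment": {"cognitive impairment", "impaired cognition"},
    "Alcohol abuse": {"alcohol abuse", "etoh abuse"},
}


@dataclass(frozen=True)
class Match:
    start: int             # token offset where the span begins
    end: int               # token offset just past the span
    concept: str           # concept assigned to the span
    matched_synonym: str   # which synonym fired, kept for explanation


def build_lexicon(synsets: dict[str, set[str]]) -> dict[tuple[str, ...], tuple[str, str]]:
    """Map each tokenized synonym to (concept, original synonym)."""
    lexicon = {}
    for concept, forms in synsets.items():
        for form in forms:
            lexicon[tuple(form.split())] = (concept, form)
    return lexicon


def assign_concepts(text: str, synsets: dict[str, set[str]]) -> list[Match]:
    """Greedy left-to-right longest match of synonyms against the token stream."""
    tokens = re.findall(r"[a-z0-9\-]+", text.lower())
    lexicon = build_lexicon(synsets)
    longest = max((len(k) for k in lexicon), default=0)
    matches, i = [], 0
    while i < len(tokens):
        for n in range(min(longest, len(tokens) - i), 0, -1):  # longest span first
            hit = lexicon.get(tuple(tokens[i:i + n]))
            if hit:
                matches.append(Match(i, i + n, hit[0], hit[1]))
                i += n
                break
        else:
            i += 1  # no synonym starts at this token; move on
    return matches
```

Preferring the longest span keeps a multi-word term such as “regular menstrual cycle” from being split into shorter, less specific matches; a missing synonym, by contrast, simply produces no match, which is one way modifiers can be lost.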
2. Methods
We obtained 694 narrative clinical statements designed for clinical decision support from the VA KNART project, which creates clinical content using the HL7 Knowledge Artifact specification9. Each of these clinical statements had previously been assigned Solor concepts in ANF structures by experienced human modelers.
We used HD-NLP to algorithmically assign terminology concepts to the KNART input text. The HD-NLP output consisted of the input narrative clinical statements and the corresponding ANF/Solor models.
Authors MR and PE reviewed 140 (20%) randomly selected narrative clinical statements with their corresponding HD-NLP outputs and compared them to the human-modeled “gold standard”. Each narrative clinical statement was judged “correct” if the HD-NLP output matched both the human-modeled Solor concepts and the human-modeled ANF structure, and “incorrect” if either the Solor concepts or the ANF structure were missing or incorrect. These judgements were then summed to give totals for “correct” and “incorrect”. Forty of the 140 statements were double-reviewed, conflicts were resolved by consensus, and inter-rater reliability was assessed with Cohen’s kappa.
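The judging rule and the agreement statistics can be stated precisely in code. The sketch below assumes each output can be reduced to a set of concept identifiers plus a comparable structure label, which simplifies the manual review; the function names and example data are hypothetical. Cohen's kappa uses the standard formula kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's label frequencies. For example, 97.5% agreement on 40 double-reviewed statements corresponds to 39 matching judgements.

```python
from collections import Counter
from typing import Optional


def judge(output_concepts: Optional[set[str]], output_structure: Optional[str],
          gold_concepts: set[str], gold_structure: str) -> str:
    """Correct only if concepts AND structure both match the gold standard."""
    if output_concepts is None or output_structure is None:
        return "error"      # no output was produced for the record
    if output_concepts == gold_concepts and output_structure == gold_structure:
        return "correct"
    return "incorrect"      # any missing or wrong concept, or a structural mismatch


def percent_agreement(rater_a: list[str], rater_b: list[str]) -> float:
    """Fraction of items on which the two raters gave the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and paired")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters labeling the same items."""
    p_o = percent_agreement(rater_a, rater_b)
    n = len(rater_a)
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: sum over labels of the product of marginal proportions.
    p_e = sum((freq_a[k] / n) * (freq_b[k] / n) for k in set(freq_a) | set(freq_b))
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)
```

Kappa is reported alongside raw agreement because, when one label dominates, two raters can agree often by chance alone; kappa discounts that chance agreement.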
3. Results
Of the 140 reviewed HD-NLP outputs, we found 113 (80.7%) correct outputs, 26 (18.6%) incorrect outputs, and 1 error. The error occurred because HD-NLP produced no output for that record; the cause is unknown. Incorrect judgements were triggered mainly by missing Solor concepts in the HD-NLP output. In some cases, modifier concepts were missed, such as “alcohol” in “alcohol abuse,” “former” in “former illicit substance use,” and “cup-to-disc” in “cup-to-disc ratio.” In a couple of cases the output was completely incorrect; for example, the input read “polyp cytology shows high-grade dysplasia,” while the output read “polyp aplasia.” Difficulties were also seen with laterality; for instance, the concept “left” in the input was represented as “left to right” in the HD-NLP output. Despite these difficulties, the HD-NLP output was correct in most cases. Examples include (1) input “regular menstrual cycle” with output “regular periods,” (2) input “patient gender is female” with output “patient sex female girl,” and (3) input “cognitive impairment” with output “cognitive impairment.” Inter-rater agreement was 97.5%, with a Cohen’s kappa of 0.948.
4. Discussion
We believe further improvements are possible and needed, particularly for (1) missing or incorrect synonymy and (2) incorrect laterality mappings.
Solor, a formal integration of SNOMED CT, LOINC and RxNorm, represents a significant advance towards semantic interoperability and health information exchange. In addition, it will improve the findability of important clinical statements. The Analysis Normal Form brings standardization to the small information models needed to complete a clinical statement, while enhancing consistency and reducing complexity.
Natural language processing with HD-NLP can provide access to data by mapping clinical utterances in notes and reports to clinical statements, which are reused for clinical decision support. By modeling the KNARTs with Solor and ANF using the HD-NLP system, we can provide a representation that matches the HD-NLP-derived data from clinical notes and reports, which can then be used to trigger clinical decision support rules. In addition, we expect that HD-NLP can reduce the coding burden on clinicians during data entry while providing well-coded structured data for CDS. This partnership between standards and technology can help make clinical decision support practical where it would otherwise require duplicate, structured data entry. The more seamless our CDS implementations are, the more easily they can be implemented and shared, advancing the important FAIR principles in informatics.
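Why a shared representation lets note-derived data trigger rules written against expert-modeled KNART statements can be sketched as plain equality matching. Everything here is hypothetical: the `Statement` and `Rule` classes, the concept labels, and the rule are illustrative and do not reproduce KNART content or the HD-NLP output format. Provenance is excluded from equality, so a statement extracted from a note matches the same statement modeled by hand.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Statement:
    topic: str                                      # concept assigned by a modeler or by NLP
    assertion: str                                  # "affirmed", "negative", or "uncertain"
    source: str = field(default="", compare=False)  # provenance; ignored in equality and hash


@dataclass(frozen=True)
class Rule:
    name: str
    criteria: frozenset[Statement]                  # statements that must all be present

    def fires(self, patient_statements: list[Statement]) -> bool:
        # Because note-derived and KNART-modeled statements share one representation,
        # set containment is enough; `source` does not affect the comparison.
        return self.criteria <= set(patient_statements)


# Hypothetical rule whose criterion was modeled from a KNART statement.
rule = Rule("Example alcohol-use follow-up",
            frozenset({Statement("Alcohol abuse", "affirmed", "KNART model")}))
```

The same sketch shows why a missed modifier matters: if “alcohol” were dropped from “alcohol abuse,” the note-derived statement would no longer equal the rule criterion and the rule would not fire.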
5. Conclusions
Solor integrates SNOMED CT, LOINC, and RxNorm not merely by combining these terminologies but by using an underlying logic model, which improves semantic interoperability. This improves the findability and reusability of data for clinical decision support. ANF further improves interoperability by providing a simple and consistent structure for delivering the terminological payload as statement models about patients. The HD-NLP software provides access to important clinical statements that are required to drive CDS. By using this pipeline in a FAIR manner, we can improve the safety and efficacy of the healthcare we provide for our patients.
6. Acknowledgments
This work has been supported in part by grants from NIH NLM T15LM012595, NIAAA R21AA026954, NIAAA R33AA026954, and NCATS UL1TR001412. This study was funded in part by the U.S. Department of Veterans Affairs.
References
- 1. Bodenreider O. Biomedical ontologies in action: role in knowledge management, data integration and decision support. Yearb Med Inform. Published online 2008:67–79.
- 2. Resnick MP, Brown SH, Campbell KE, Montella D, LeHouillier F, Elkin PL. Turning Data into Information: Evaluation of SOLOR. Poster presented at: AMIA Annual Symposium; November 2020. Accessed March 9, 2021. https://knowledge.amia.org/72332-amia-1.4602255/t005-1.4604904/t005-1.4604905/3416966-1.4605215/3416966-1.4605216?qr=1
- 3. Staes C, Campbell K. From Retrospective Mapping to Prospective Standardization: A Comparison of Integration Strategies to Achieve Semantic Data Interoperability. Department of Veterans Affairs, Veterans Health Administration (VHA) Office of Informatics and Analytics (OIA) Knowledge Based Systems (KBS); 2017:26. Accessed March 8, 2021. http://solor.io/wp-content/uploads/2017/12/White-paper_Achieving-semantic-data-interoperability.pdf
- 4. Sujansky W. ISAAC’s KOMET and Solor - A Treatise on Symbolic Data Systems; 2019:194. http://solor.io/wp-content/uploads/2019/02/symbolic-information-analytics-20190226.pdf
- 5. Singnureanu I. HL7 Informative Ballot. HL7 CIMI Logical Model: Analysis Normal Form (ANF), Release 1; 2019:130. http://solor.io/wp-content/uploads/2019/08/ANF_Ballot_20190819.pdf
- 6. Elkin PL, Froehling DA, Wahner-Roedler DL, Brown SH, Bailey KR. Comparison of natural language processing biosurveillance methods for identifying influenza from encounter notes. Ann Intern Med. 2012;156(1 Pt 1):11–18. doi:10.7326/0003-4819-156-1-201201030-00003
- 7. Murff HJ, FitzHenry F, Matheny ME, et al. Automated identification of postoperative complications within an electronic medical record using natural language processing. JAMA. 2011;306(8):848–855. doi:10.1001/jama.2011.1204
- 8. Schlegel DR, Crowner C, Lehoullier F, Elkin PL. HTP-NLP: A New NLP System for High Throughput Phenotyping. Stud Health Technol Inform. 2017;235:276–280.
- 9. HL7 Standards Product Brief - HL7 Standard: Clinical Decision Support Knowledge Artifact Specification, Release 1.3. HL7 International. Accessed March 9, 2021. http://www.hl7.org/implement/standards/product_brief.cfm?product_id=337