Author manuscript; available in PMC: 2022 May 10.
Published in final edited form as: Stud Health Technol Inform. 2021 Nov 18;287:89–93. doi: 10.3233/SHTI210822

Automated Modeling of Clinical Narrative with High Definition Natural Language Processing using Solor and Analysis Normal Form

Melissa P RESNICK a,1, Frank LeHOUILLIER a, Steven H BROWN b, Keith E CAMPBELL b, Diane MONTELLA b, Peter L ELKIN a,b,c,d
PMCID: PMC9088023  NIHMSID: NIHMS1796116  PMID: 34795088

Abstract

Objective:

One important concept in informatics is data which meets the principles of Findability, Accessibility, Interoperability and Reusability (FAIR). Standards, such as terminologies (findability), assist with important tasks like interoperability, Natural Language Processing (NLP) (accessibility) and decision support (reusability). One terminology, Solor, integrates SNOMED CT, LOINC and RxNorm. We describe Solor, HL7 Analysis Normal Form (ANF), and their use with the high definition natural language processing (HD-NLP) program.

Methods:

We used HD-NLP to process 694 clinical narratives previously modeled by human experts into Solor and ANF. We compared HD-NLP output to the expert gold standard for 20% of the sample. Each clinical statement was judged “correct” if the HD-NLP output matched the ANF structure and Solor concepts, or “incorrect” if any ANF structure or Solor concept was missing or incorrect. Judgements were summed to give totals for “correct” and “incorrect”.

Results:

Of the 140 reviewed outputs, 113 (80.7%) were correct, 26 (18.6%) were incorrect, and 1 was an error. Inter-rater reliability was 97.5% with a Cohen’s kappa of 0.948.

Conclusion:

The HD-NLP software provides useable complex standards-based representations for important clinical statements designed to drive CDS.

Keywords: Natural Language Processing, Interoperability, Clinical Decision Support, Controlled Terminology

1. Introduction

Technical (syntactic) interoperability addresses how computers exchange data. This is accomplished with messaging protocols and data formats. Semantic interoperability builds upon syntactic interoperability and addresses how computers interpret the meaning of data. Semantic interoperability allows EHRs to determine the meaning of the data unambiguously and consistently for data presentation and decision support.

Terminologies, such as the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), are used as data encoding standards in informatics and contribute to semantic interoperability. These terminologies can also provide a foundation for other tasks such as knowledge management, data integration, and decision support [1]. Three of the most commonly used terminology standards are SNOMED CT, LOINC and RxNorm, each with particular strengths. Increasingly, there is interest in combining these partially overlapping standards to enhance clinical expressivity. Solor is an integrated terminology system created in collaboration with the U.S. Veterans Affairs (VA) [2] that combines SNOMED CT (representing diseases, findings, and procedures), Logical Observation Identifiers, Names, and Codes (LOINC, representing laboratory test results), and RxNorm (representing medications) [2,3].

1.1. Solor

The Solor terminology layer builds primarily upon SNOMED CT, RxNorm, and LOINC by integrating their content and semantics, and normalizing the means to identify and version components, lexically search, logically define, semantically retrieve, and collaboratively extend. The potential advantages of a computable approach enabled by combining SNOMED CT, LOINC, and RxNorm into a single consistent suite for encoding clinical knowledge and data are clear; clinical data can flow among clinical documentation, decision support applications, and order entry at the point of care. This single consistent method of encoding clinical data can also support research, quality measurement, and other secondary uses.

Solor has two fundamental building blocks: concepts and semantics [4]. A concept is defined as an idea, typically a medically related idea such as heart attack [4]. Each concept also includes a synonym or a fully specified name. A semantic is data that provides contextual meaning to the concept [4].

Like SNOMED CT, Solor is built on a logic model [3]. Most terms are shared by Solor and SNOMED CT, and these concepts are arranged into hierarchies using “is a” relationships [4]. Each concept has at least one “is a” relationship, except for the top-level concepts, which are the most general. Because of these “is a” relationships, one can traverse the hierarchies from general concepts to more specific ones.
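The traversal described above can be illustrated with a minimal sketch. The toy hierarchy, its labels, and the function names `ancestors` and `descendants` are assumptions made for illustration; they are not real Solor or SNOMED CT identifiers, and this is not how Solor itself is implemented.

```python
from collections import deque

# Hypothetical toy hierarchy: each concept maps to its "is a" parents.
# Labels are illustrative only, not real Solor or SNOMED CT content.
IS_A = {
    "heart attack": ["heart disease"],
    "heart disease": ["cardiovascular disorder"],
    "cardiovascular disorder": ["clinical finding"],
    "clinical finding": [],  # top-level concept: no "is a" parent
}


def ancestors(concept, is_a=IS_A):
    """Follow "is a" links upward and return ancestors, nearest first."""
    seen, order = set(), []
    pending = deque(is_a.get(concept, []))
    while pending:
        parent = pending.popleft()
        if parent in seen:
            continue
        seen.add(parent)
        order.append(parent)
        pending.extend(is_a.get(parent, []))
    return order


def descendants(concept, is_a=IS_A):
    """Return every concept that has `concept` among its ancestors."""
    return sorted(c for c in is_a if concept in ancestors(c, is_a))


print(ancestors("heart attack"))
print(descendants("clinical finding"))
```

Walking upward gives the increasingly general concepts that subsume a term, while walking downward gives the more specific concepts that a decision support rule written against a general concept would also cover.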

One goal of using Solor is to improve interoperability. Interoperability of EHR data is critical for clinical decision support. Because health care for an individual is often delivered by more than one health provider, data from multiple providers must be integrated to view the complete health record. To achieve interoperability, clinical systems must understand both the structure (syntax) and meaning (semantics) of the clinical information being exchanged [3]. Without these two features, information may be viewable by humans but cannot be integrated for clinical decision support [3]. One way to provide interoperability is through the use of standards, such as Solor. Solor supports interoperability by providing structure and meaning to the patient data being exchanged between health providers.

1.2. ANF

Analysis Normal Form (ANF) is a highly regularized small information model that is designed to be independent of the content of the clinical statement. For example, a single ANF “performance statement” model can be used to describe any action that has previously been performed and, if applicable, the results of that action. Broad classes of actions are represented identically, including observations of the presence or absence of a clinical phenomenon, undergoing a procedure, or the administration of a medication.

The goal of ANF is to provide a simple, consistent and highly re-usable information model for clinical statements. This makes it easier for analysts to understand the data and how it is stored than requiring knowledge of hundreds or thousands of statement-specific specializations. It also helps to ensure that the data can be expressed in an operable and scalable way. The more the data is normalized, the simpler it becomes to analyze and the lower the likelihood of analysis errors. ANF represents clinical data for the data analyst’s purposes, not necessarily in the way we might choose to display the data for a clinician [4]. ANF was approved as an HL7 informational ballot in 2019 [5].
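To make the idea of one regular model for many kinds of action concrete, the following is a minimal sketch. The class name `PerformanceStatement` and its fields (`action_kind`, `topic`, `result`) are hypothetical and chosen for illustration; the HL7 ANF specification defines its own attributes, which are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PerformanceStatement:
    # Hypothetical field names; NOT the attribute set of the HL7 ANF specification.
    action_kind: str              # e.g. observation, procedure, medication administration
    topic: str                    # what was observed or done
    result: Optional[str] = None  # outcome of the action, if applicable


def describe(stmt: PerformanceStatement) -> str:
    """Render any statement the same way, regardless of its action kind."""
    base = f"{stmt.action_kind}: {stmt.topic}"
    return base if stmt.result is None else f"{base} -> {stmt.result}"


# Three different classes of action share one shape (illustrative content only).
statements = [
    PerformanceStatement("observation", "cognitive impairment", "present"),
    PerformanceStatement("procedure", "polyp cytology", "high-grade dysplasia"),
    PerformanceStatement("medication administration", "example medication"),
]

for s in statements:
    print(describe(s))
```

Because every statement has the same shape, an analyst can query or summarize all of them with one function instead of one per statement type, which is the benefit the section attributes to normalization.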

1.3. HD-NLP

High Definition Natural Language Processing (HD-NLP) is a pipeline developed at the University at Buffalo (UB) that evolved from UB’s earlier HTP-NLP work. The system builds a full semantic parse in memory and then uses an encoder to link text to any set of ontologies that a user chooses to represent the knowledge in the free text being codified. Each entity is tagged as an affirmed, negative, or uncertain assertion, and each is further tagged with a date-time stamp. Where applicable in the source text, we then automatically generate compositional expressions using the semantic relations available in the ontologies being encoded. We add the metadata from the record using the Analysis Normal Form standard and link it to the information stored from computing over the input string.
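The per-entity tagging described above might be represented as in the sketch below. The names `Assertion`, `TaggedEntity`, and `affirmed_only`, the example entities, and the timestamp are assumptions for illustration; the actual HD-NLP data structures are not described in this paper.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Assertion(Enum):
    AFFIRMED = "affirmed"
    NEGATIVE = "negative"
    UNCERTAIN = "uncertain"


@dataclass(frozen=True)
class TaggedEntity:
    # Hypothetical structure; not the HD-NLP internal representation.
    text: str             # span of source text
    concept: str          # label of the linked ontology concept
    assertion: Assertion  # affirmed, negative, or uncertain
    timestamp: datetime   # date-time stamp attached to the entity


def affirmed_only(entities):
    """Keep only entities asserted as present."""
    return [e for e in entities if e.assertion is Assertion.AFFIRMED]


stamp = datetime(2021, 1, 1, 9, 0)  # illustrative value
entities = [
    TaggedEntity("cognitive impairment", "cognitive impairment", Assertion.AFFIRMED, stamp),
    TaggedEntity("alcohol abuse", "alcohol abuse", Assertion.NEGATIVE, stamp),
    TaggedEntity("high-grade dysplasia", "high-grade dysplasia", Assertion.UNCERTAIN, stamp),
]

print([e.text for e in affirmed_only(entities)])
```

Keeping the assertion status on each entity lets a downstream rule distinguish a finding that is present from one that is explicitly denied or only suspected.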

HD-NLP uses several sources of synonymy, kept in separate synonym sets (synsets) that can be interrogated to understand why certain results were obtained. The system is architected so that input queries arrive on an input queue, are processed, and are then sent to an output queue, where each job can be picked up by the user. This is available as a web service. The service can provide Solor and ANF output but can also limit its search to the source ontologies (SNOMED CT, RxNorm, and LOINC). We used HD-NLP to rapidly assign terminology concepts to text in patient records or KNART (knowledge artifact) input text [6–8]. A level of syntactic processing was required to match text with ontological terms. The linguistic representation is specified in language models. Of primary concern here was an English language model to identify sentences, phrases, words, and parts of speech. Terms from Solor and its source ontologies were then assigned to spans of text.
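The queue-based flow and the traceable synsets described above can be sketched as follows. This is a toy model under stated assumptions: the synset names, their contents, and the functions `encode`, `worker`, and `run_jobs` are hypothetical, and exact string lookup stands in for the syntactic and semantic processing that HD-NLP actually performs.

```python
import queue
import threading

# Hypothetical synsets, kept separate by source so a match can be traced back.
SYNSETS = {
    "source_terminology": {"cognitive impairment": "cognitive impairment"},
    "local_synonyms": {"regular menstrual cycle": "regular periods"},
}


def encode(text):
    """Return (concept label, synset name) for the first synset matching the text."""
    key = text.lower().strip()
    for name, synset in SYNSETS.items():
        if key in synset:
            return synset[key], name
    return None, None  # nothing matched in any synset


def worker(inbox, outbox):
    """Take jobs from the input queue and place results on the output queue."""
    while True:
        job = inbox.get()
        if job is None:  # sentinel: no more jobs
            break
        job_id, text = job
        concept, source = encode(text)
        outbox.put({"job_id": job_id, "text": text,
                    "concept": concept, "matched_synset": source})


def run_jobs(texts):
    inbox, outbox = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(inbox, outbox))
    thread.start()
    for job_id, text in enumerate(texts):
        inbox.put((job_id, text))
    inbox.put(None)
    thread.join()  # all results are on the output queue after this
    results = []
    while not outbox.empty():
        results.append(outbox.get())
    return results


for result in run_jobs(["Regular menstrual cycle", "cup-to-disc ratio"]):
    print(result)
```

Recording which synset produced a match is what makes a result explainable: a surprising mapping can be traced to the synonym source that introduced it, and an unmatched input shows up as a job with no concept.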

2. Methods

We obtained 694 narrative clinical statements designed for clinical decision support from the VA KNART project, which creates clinical content using the HL7 Knowledge Artifact specification [9]. Each of these clinical statements had previously been assigned Solor concepts in ANF structures by experienced human modelers.

We used HD-NLP to algorithmically assign terminology concepts to text in KNART input text. HD-NLP program output consisted of the input narrative clinical statements and the corresponding ANF/Solor models.

Authors MR and PE reviewed 140 (20%) randomly selected narrative clinical statements with their corresponding HD-NLP outputs and compared them to the human modeled “gold standard”. Each narrative clinical statement was judged as “correct” if the HD-NLP output matched both the human modeled Solor concepts and the human modeled ANF structure. The output was judged as “incorrect” if either the Solor concepts or the ANF structure were missing or incorrect. These judgments were then summed to give totals for “correct” and “incorrect”. Forty of the 140 statements were double reviewed, and conflicts were resolved by consensus. Cohen’s kappa was calculated as the inter-rater reliability statistic.
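The all-or-nothing judgment rule can be sketched as below. In the study the judgment was made by human reviewers; this sketch approximates it with exact matching, and the dictionary layout, the function names `judge` and `tally`, and the example records are assumptions for illustration. The separate "error" label reflects a record for which no output was produced.

```python
from collections import Counter


def judge(output, gold):
    """Label one statement: both concepts and structure must match the gold standard."""
    if output is None:
        return "error"  # no output was produced for this record
    concepts_ok = set(output["concepts"]) == set(gold["concepts"])
    structure_ok = output["structure"] == gold["structure"]
    return "correct" if concepts_ok and structure_ok else "incorrect"


def tally(pairs):
    """Sum the judgments over (output, gold) pairs."""
    return Counter(judge(output, gold) for output, gold in pairs)


# Illustrative records only; "performance" is a hypothetical structure label.
gold_a = {"concepts": ["alcohol", "abuse"], "structure": "performance"}
gold_b = {"concepts": ["cognitive impairment"], "structure": "performance"}
pairs = [
    ({"concepts": ["abuse"], "structure": "performance"}, gold_a),  # modifier missed
    ({"concepts": ["cognitive impairment"], "structure": "performance"}, gold_b),
    (None, gold_b),                                                  # no output
]

print(tally(pairs))
```

A single missing modifier concept is enough to mark a statement incorrect under this rule, which makes the resulting accuracy a strict measure.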

3. Results

Of the 140 HD-NLP outputs reviewed for both Solor concepts and ANF structures, 113 (80.7%) were correct, 26 (18.6%) were incorrect, and 1 was an error. The error occurred because no output was produced for that single record; the cause is unknown. Incorrect judgments resulted mainly from Solor concepts missing from the HD-NLP output. In some cases modifier concepts, such as “alcohol” in “alcohol abuse,” “former” in “former illicit substance use,” and “cup-to-disc” in “cup-to-disc ratio” were missed. In a couple of cases the output was completely incorrect. For example, the input read “polyp cytology shows high-grade dysplasia,” while the output read “polyp aplasia.” In addition, difficulties were seen with laterality. For instance, the concept “left” in the input was represented as “left to right” in the HD-NLP output. Despite these difficulties, the HD-NLP output was correct in most cases. Examples include: (1) input “regular menstrual cycle” with output “regular periods,” (2) input “patient gender is female” with output “patient sex female girl,” and (3) input “cognitive impairment” with output “cognitive impairment.” Inter-rater reliability was 97.5% with a Cohen’s kappa of 0.948.
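For readers who wish to check the agreement statistic, Cohen’s kappa is (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected by chance. The sketch below computes it for two raters. The paper does not report the underlying 2×2 table, so the rating lists are illustrative counts for the 40 double-reviewed statements, chosen only because they are consistent with the reported 97.5% agreement and kappa of 0.948; they are not the study’s data.

```python
from collections import Counter


def cohen_kappa(rater1, rater2):
    """Cohen's kappa for two equal-length lists of categorical labels."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum((c1[label] / n) * (c2[label] / n) for label in set(c1) | set(c2))
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Illustrative ratings (NOT the study's table): 39 of 40 agree.
rater1 = ["correct"] * 23 + ["incorrect"] * 16 + ["correct"]
rater2 = ["correct"] * 23 + ["incorrect"] * 16 + ["incorrect"]

agreement = sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)
print(f"agreement = {agreement:.1%}, kappa = {cohen_kappa(rater1, rater2):.3f}")

# The reported percentages follow directly from the counts out of 140.
print(round(113 / 140 * 100, 1), round(26 / 140 * 100, 1))
```

Kappa is lower than raw agreement because it discounts the agreement two raters would reach by chance given how often each uses each label.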

4. Discussion

We believe further improvements are both possible and needed, particularly in (1) missing or incorrect synonymy and (2) incorrect laterality mappings.

Solor, a formal integration of SNOMED CT, LOINC and RxNorm, represents a significant advance towards semantic interoperability and health information exchange. In addition, it will improve the findability of important clinical statements. The Analysis Normal Form brings standardization to the small information models needed to complete a clinical statement, while enhancing consistency and reducing complexity.

Natural language processing with HD-NLP can provide access to data by mapping clinical utterances in notes and reports to clinical statements, which are reused for clinical decision support. By modeling the KNARTs with Solor and ANF using the HD-NLP system, we can provide a representation that matches the HD-NLP derived data from clinical notes and reports, which can then be used to trigger clinical decision support rules. In addition, we expect that HD-NLP can reduce coding burdens on clinicians during data entry, providing well-coded structured data for CDS. This partnership between standards and technology can help us deliver practical clinical decision support that might otherwise require duplicate, structured data entry. The more seamless our CDS implementations are, the more easily they will be implemented and shared, fulfilling the important FAIR principles in informatics.

5. Conclusions

Solor integrates SNOMED CT, LOINC, and RxNorm not merely by combining these terminologies but by using an underlying logic model, which improves semantic interoperability. This provides improved findability and reusability of data for clinical decision support. ANF further improves interoperability by providing a simple and consistent structure to deliver terminological payload as statement models about patients. The HD-NLP software provides access to important clinical statements, which are required to drive CDS. By using this pipeline in a FAIR manner, we can improve the safety and efficacy of the healthcare that we provide for our patients.

6. Acknowledgments

This work has been supported in part by grants from NIH NLM T15LM012595, NIAAA R21AA026954, NIAAA R33AA026954, and NCATS UL1TR001412. This study was funded in part by the U.S. Department of Veterans Affairs.

References
