Author manuscript; available in PMC: 2023 Dec 20.
Published in final edited form as: Proceedings (IEEE Int Conf Bioinformatics Biomed). 2020 Feb 6;2019:1177–1178. doi: 10.1109/bibm47256.2019.8983220

Ontology of Consumer Health Vocabulary: providing a formal and interoperable semantic resource for linking lay language and medical terminology

Muhammad Amith 1, Kirk Roberts 2, Cui Tao 3, Licong Cui 4, Hua Xu 5
PMCID: PMC10732699  NIHMSID: NIHMS1931475  PMID: 38125584

Abstract

The Consumer Health Vocabulary has been an important contribution to the health informatics field since its introduction in 2006. Many studies have used the vocabulary to bridge the gap between consumers and health experts. Because the Consumer Health Vocabulary dataset is distributed as flat files, we developed a SKOS-based ontology of the dataset. As an ontology, the dataset can be semantically linked to other resources to provide consumer-level meaning. Building on this artifact, we plan to further expand the terminology.

Keywords: Taxonomy, Terminology, Biomedical Informatics, Semantic Web

I. Introduction

Consumer health informatics is defined as “the field devoted to informatics from multiple consumer or patient views” [1]. One important area of research in this domain is the communication gap that arises when expert-level terminology is communicated to patients who may not understand it. One solution is to link synonymous consumer terms (or “lay” terms) with medical terminology across various modalities and tasks, such as health care applications, medical records, and information retrieval [2].

In 2006, Zeng and Tse developed the Consumer Health Vocabulary (CHV), which maps UMLS terms to corresponding consumer language [2]. The vocabulary was previously hosted on the University of Utah website, http://consumerhealthvocab.chpc.utah.edu/CHVwiki/, and was distributed in a flat file format. Since its inception, it has been an important resource for many studies. However, the site is no longer available.

Ontologies are expressive terminological artifacts that link domain concepts to data using languages such as the Web Ontology Language (OWL2) [3] and the Resource Description Framework (RDF) [4]. The Simple Knowledge Organization System (SKOS) is an RDF-based standard for representing controlled vocabularies and other concept schemes [5]. In SKOS, a Concept represents an idea or unit of thought. Each Concept can be annotated with a preferred label and alternative labels, and it can be linked to other Concepts by properties that express their semantic relationships.
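To make the SKOS vocabulary concrete, the following minimal sketch writes one concept with a preferred label, an alternative label, and an associative link as Turtle text, using only the Python standard library. The namespace `http://example.org/ochv#`, the helper names, and the example labels are hypothetical illustrations, not part of the CHV or OCHV.

```python
# Minimal sketch: describe one SKOS concept (preferred label, alternative
# labels, associative links) as Turtle text using only the standard library.
# Assumption: BASE is a hypothetical namespace, not the real OCHV IRI.

SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/ochv#"  # hypothetical namespace for illustration


def turtle_literal(text: str, lang: str = "en") -> str:
    """Quote text as a language-tagged Turtle literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def skos_concept(local_name: str, pref_label: str, alt_labels=(), related=()) -> str:
    """Return a Turtle document describing a single skos:Concept."""
    lines = [
        f"<{BASE}{local_name}> a skos:Concept ;",
        f"    skos:prefLabel {turtle_literal(pref_label)}",
    ]
    for alt in alt_labels:
        lines[-1] += " ;"
        lines.append(f"    skos:altLabel {turtle_literal(alt)}")
    for other in related:
        lines[-1] += " ;"
        lines.append(f"    skos:related <{BASE}{other}>")
    lines[-1] += " ."  # terminate the statement
    return f"@prefix skos: <{SKOS}> .\n\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(skos_concept("myocardial_infarction", "myocardial infarction",
                       alt_labels=["heart attack"], related=["chest_pain"]))
```

Emitting plain Turtle keeps the sketch dependency-free; an RDF or OWL library would normally handle escaping and serialization in a production pipeline.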

To sustain this work and further enhance the interoperability of the CHV, we propose to align the CHV data with SKOS and transform it into an ontology artifact for reuse and distribution, called the Ontology of Consumer Health Vocabulary (OCHV).

II. Method

To create the Ontology of Consumer Health Vocabulary, we used the most recent release of the Consumer Health Vocabulary dataset, dated 2011. This version contains 158,508 UMLS terms, each flagged as either a formal medical term or consumer language and each associated with a UMLS Concept Unique Identifier (CUI).

We developed Java software to extract the data from the CHV and organize it according to the SKOS scheme. The software uses the OWL API (v5.1) [6], Super CSV (v2.4) [7], and Google Guava (v28) [8] libraries to perform this task. We used Super CSV to read and extract the data from the flat files. With the OWL API, we organized the terms and linked them according to the SKOS scheme, and we also added subclasses of the core SKOS Concept class.
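The extraction step can be sketched as follows, with Python's `csv` module standing in for Super CSV. This is not the authors' Java code. The column layout (`cui`, `term`, `source`), the tab delimiter, and the sample rows are hypothetical assumptions, since the paper does not describe the flat file's exact format; grouping rows by CUI mirrors how each UMLS code is later paired with its lay terms.

```python
# Sketch: read a tab-delimited, CHV-style flat file and group terms by UMLS CUI.
# Hypothetical layout (not the documented CHV format): a header row with the
# columns cui, term and source, where source is "UMLS" or "CHV".
import csv
import io
from collections import defaultdict


def group_terms_by_cui(lines):
    """Map each CUI to {"UMLS": [...], "CHV": [...]} lists of terms.

    A dict of lists stands in for the multimap one might use in Java.
    """
    groups = defaultdict(lambda: {"UMLS": [], "CHV": []})
    for row in csv.DictReader(lines, delimiter="\t"):
        source = row["source"].strip().upper()
        if source not in ("UMLS", "CHV"):
            continue  # skip rows whose flag is not recognized
        groups[row["cui"].strip()][source].append(row["term"].strip())
    return dict(groups)


# Placeholder CUI and terms, for illustration only.
SAMPLE_TSV = (
    "cui\tterm\tsource\n"
    "C0000001\tmyocardial infarction\tUMLS\n"
    "C0000001\theart attack\tCHV\n"
    "C0000001\tcoronary\tCHV\n"
)

if __name__ == "__main__":
    for cui, terms in group_terms_by_cui(io.StringIO(SAMPLE_TSV)).items():
        print(cui, terms)
```

For the sample rows, the placeholder CUI maps to one UMLS term and two lay terms, in file order.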

Figure 1 shows the abstract model used to organize the OCHV. As mentioned earlier, we created two subclasses of Concept: CHV Concept and UMLS Concept. These subclasses indicate whether a CHV term is a formal UMLS term and/or a lay consumer term, and each concept is assigned a preferred label. We also annotated CHV Concepts that have supplemental terms with alternative labels. In addition, several CHV Concepts and UMLS Concepts are linked using the SKOS relatedMatch (“has related match”) property to indicate their semantic relatedness.
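A minimal sketch of the class-level model in Figure 1, written as Turtle text: the two subclasses of skos:Concept, a UMLS Concept and a CHV Concept per CUI, preferred and alternative labels, and a skos:relatedMatch link. Several details are hypothetical assumptions rather than the released OCHV: the `ochv:` namespace, the local names `CHVConcept`/`UMLSConcept`, the `<CUI>_umls`/`<CUI>_chv` naming pattern, and the rule that the first lay term becomes the preferred label. The sketch also types terms as instances, whereas the released OCHV reports 115,645 classes, so its actual axioms may differ.

```python
# Sketch of the OCHV class-level model (Fig. 1) as Turtle text.
# Hypothetical: the ochv: namespace, the CHVConcept/UMLSConcept local names,
# the <CUI>_umls / <CUI>_chv IRI pattern, and "first lay term is preferred".
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix ochv: <http://example.org/ochv#> .\n\n"
)

# The two subclasses of the core SKOS Concept class.
SCHEMA = (
    "ochv:CHVConcept rdfs:subClassOf skos:Concept .\n"
    "ochv:UMLSConcept rdfs:subClassOf skos:Concept .\n"
)


def lit(text):
    # Language-tagged literal; escape backslashes and double quotes.
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"@en'


def concepts_for_cui(cui, umls_term, lay_terms):
    """Emit a UMLS concept and, if lay terms exist, a CHV concept linked to it."""
    umls = f"ochv:{cui}_umls"
    out = [f"{umls} a ochv:UMLSConcept ;\n    skos:prefLabel {lit(umls_term)} ."]
    if lay_terms:
        chv = f"ochv:{cui}_chv"
        stmts = [f"{chv} a ochv:CHVConcept", f"    skos:prefLabel {lit(lay_terms[0])}"]
        stmts += [f"    skos:altLabel {lit(t)}" for t in lay_terms[1:]]
        # skos:relatedMatch is symmetric in SKOS, so one direction is enough.
        stmts.append(f"    skos:relatedMatch {umls}")
        out.append(" ;\n".join(stmts) + " .")
    return "\n".join(out)


def build_turtle(entries):
    """entries: iterable of (cui, umls_term, lay_terms) tuples."""
    body = "\n\n".join(concepts_for_cui(*e) for e in entries)
    return PREFIXES + SCHEMA + "\n" + body + "\n"


if __name__ == "__main__":
    print(build_turtle([("C0000001", "myocardial infarction",
                         ["heart attack", "coronary"])]))
```

A CUI with no lay terms yields only the UMLS Concept, which reflects the idea that not every UMLS term has a consumer counterpart.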

Fig. 1. Class-level structure of the Ontology of Consumer Health Vocabulary (OCHV).

After these processing steps, the ontology is serialized in the OWL2 file format. The Ontology of Consumer Health Vocabulary contains 115,645 classes and 18 properties. This draft version of the Ontology of Consumer Health Vocabulary [9] is available at the National Center for Biomedical Ontology BioPortal.

Because each UMLS code is paired with its associated lay terms, the OCHV could be used to extend other ontologies and terminologies with consumer language, leading toward more consumer-friendly resources and tools. We intend to continue developing the OCHV, including expanding the terminology with new lay terms that may have emerged since 2011.
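As an illustration of extending another terminology with consumer language, the sketch below joins on shared UMLS CUIs and attaches lay synonyms to matching entries. It is a hypothetical example, not part of the OCHV release: the dict-based inputs, the `lay_synonyms` key, and the identifiers `EX:0001`/`C0000001` are placeholders. A real pipeline would read lay terms from OCHV and CUI cross-references from the target ontology.

```python
# Sketch: enrich another terminology with consumer (lay) synonyms by joining on
# shared UMLS CUIs. Both inputs are hypothetical in-memory dicts; a real pipeline
# would read the lay terms from OCHV and the CUI cross-references from the
# target ontology.
from typing import Dict, List


def add_lay_synonyms(target: Dict[str, dict],
                     lay_terms_by_cui: Dict[str, List[str]]) -> Dict[str, dict]:
    """Return a copy of `target` in which every entry whose CUI has lay terms
    gains a 'lay_synonyms' list; other entries are copied unchanged."""
    enriched = {}
    for term_id, entry in target.items():
        new_entry = dict(entry)  # shallow copy so the input is not mutated
        lay = lay_terms_by_cui.get(entry.get("cui"), [])
        if lay:
            # Keep first-seen order, ignore case-insensitive duplicates and
            # any lay term identical to the expert label.
            seen = {entry.get("label", "").lower()}
            synonyms = []
            for term in lay:
                if term.lower() not in seen:
                    seen.add(term.lower())
                    synonyms.append(term)
            new_entry["lay_synonyms"] = synonyms
        enriched[term_id] = new_entry
    return enriched


# Hypothetical identifiers and terms, for illustration only.
EXAMPLE_TARGET = {
    "EX:0001": {"label": "Myocardial infarction", "cui": "C0000001"},
    "EX:0002": {"label": "Hypertension"},  # no CUI cross-reference
}
EXAMPLE_LAY = {"C0000001": ["heart attack", "Heart attack", "coronary"]}

if __name__ == "__main__":
    for term_id, entry in add_lay_synonyms(EXAMPLE_TARGET, EXAMPLE_LAY).items():
        print(term_id, entry)
```

Joining on the CUI rather than on label strings avoids fragile text matching; entries without a CUI cross-reference are simply left as they were.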

III. Conclusion

We present a representation of the Consumer Health Vocabulary as an ontology, the Ontology of Consumer Health Vocabulary (OCHV). This ontology encodes data from the original CHV dataset that was previously available at http://consumerhealthvocab.chpc.utah.edu. Within OCHV, each UMLS term is linked to its consumer language equivalent. Because consumer language continues to evolve, our next goal is to mine consumer corpora for additional consumer terms to populate OCHV. OCHV is currently hosted on the BioPortal website, https://bioportal.bioontology.org/ontologies/OCHV.

Acknowledgment

Research was supported by the National Library of Medicine of the National Institutes of Health under Award Numbers R01LM011829 and R00LM012104, and the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under Award Number R01AI130460.

Contributor Information

Muhammad Amith, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States.

Kirk Roberts, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States.

Cui Tao, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States.

Licong Cui, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States.

Hua Xu, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States.

References
