AMIA Annual Symposium Proceedings. 2026 Feb 14;2025:433–442.

Identifying Missing IS-A Relations in SNOMED CT with Fine-Tuned Pre-trained Language Models and Non-lattice Subgraphs

Xubing Hao 1, Rashmie Abeysinghe 1, Jay Shi 2, Guo-Qiang Zhang 1, Licong Cui 3,*
PMCID: PMC12919620  PMID: 41726504

Abstract

Ensuring the completeness of IS-A relations in SNOMED CT is crucial for maintaining its accuracy in clinical applications. In this study, we propose a hybrid approach leveraging non-lattice subgraphs and pre-trained language models (PLMs) to identify missing IS-A relations in SNOMED CT. We fine-tuned four BERT-based models (BERT, DistilBERT, DeBERTa, and BioClinicalBERT) and four generative large language models (LLMs) (BioMistral, Llama3, Gemma2, and Phi-4). Missing IS-A relations were identified through consensus prediction by all eight models. DeBERTa achieved the best performance for IS-A relation prediction (precision: 0.96, recall: 0.97, F1-score: 0.965). Our approach identified 678 potential missing IS-A relations in SNOMED CT (March 2023 US Edition); of 100 randomly selected cases manually reviewed by a domain expert, 93 were confirmed as valid (93% precision). These results demonstrate the effectiveness of fine-tuned PLMs in detecting missing IS-A relations within non-lattice subgraphs, offering a promising avenue for improving SNOMED CT’s quality.

1. Introduction

SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms) is a comprehensive, multilingual clinical healthcare terminology that is used worldwide1. Developed and maintained by SNOMED International, SNOMED CT provides a standardized vocabulary for representing clinical concepts, enabling semantic interoperability and effective data exchange across different healthcare systems. It facilitates accurate documentation, retrieval, and analysis of clinical data in electronic health records (EHRs), and can be implemented in various clinical applications including clinical decision support systems, interoperability frameworks, and reporting tools2.

Despite its extensive coverage and widespread adoption, the SNOMED CT hierarchy is imperfect and contains quality issues such as missing or erroneous IS-A relations. Such hierarchical inconsistencies can critically impact downstream applications that rely on accurate concept classification and retrieval, such as patient cohort identification, clinical data analysis, and decision support systems. For example, missing IS-A relations reduce the recall of cohort queries, and inaccurate IS-A relations lower their precision3. Although curators continuously seek to improve the accuracy and comprehensiveness of biomedical ontologies including SNOMED CT, errors and inconsistencies remain inevitable due to the rapid expansion and evolution of biomedical knowledge4. As a result, it is essential to develop effective methods for identifying and resolving these defects to enhance the reliability and usability of SNOMED CT.

In recent years, pre-trained language models (PLMs) have driven major advances in natural language processing (NLP), demonstrating their ability to capture complex language patterns across various domains of knowledge5. Usage of PLMs has increased dramatically, with models such as BERT (Bidirectional Encoder Representations from Transformers), GPT (Generative Pre-trained Transformer), and LLaMA (Large Language Model Meta Artificial Intelligence) receiving significant attention for NLP tasks such as text classification, summarization, and question answering6–8. Notably, PLMs have been shown to be effective in several areas of biomedical terminology research, including quality assurance9, ontology matching10, and ontology learning11.

Driven by the capabilities of PLMs, this study investigates their potential to identify missing IS-A relations within non-lattice subgraphs of SNOMED CT, which are targeted graph fragments that may contain quality issues. We fine-tune eight state-of-the-art PLMs: four BERT variants (BERT, DistilBERT, DeBERTa, and BioClinicalBERT) and four generative large language models (LLMs) (BioMistral, Llama3, Gemma2, and Phi-4). Potential missing IS-A relations are identified through a consensus prediction strategy that requires agreement among all eight models. A newer release of SNOMED CT is leveraged to evaluate the suggestions. In addition, a randomly selected subset of these suggestions is manually evaluated by a domain expert.

2. Background

2.1. SNOMED CT

SNOMED CT comprises more than 360,000 active concepts, with each concept representing a unique clinical meaning1. Each concept in SNOMED CT is assigned a unique concept ID and a fully specified name (FSN) that unambiguously conveys its meaning, such as “Cardiac flow imaging (procedure)”, where a semantic tag is placed in parentheses at the end of the FSN12. Concepts in SNOMED CT are hierarchically organized in a directed acyclic graph (DAG) of IS-A relations, where a concept may have one or more parent concepts. SNOMED CT contains 19 top-level sub-hierarchies, including “Clinical finding”, “Procedure”, “Body Structure”, “Observable Entity”, “Organism”, and “Pharmaceutical/Biological Product”13.

2.2. Non-lattice subgraphs

A lattice is defined as a specific type of DAG in which any two nodes have a unique maximal common descendant and a unique minimal common ancestor14–17. Here, a maximal common descendant of two nodes is a node that is a descendant of both such that no other common descendant is an ancestor of it; likewise, a minimal common ancestor of two nodes is a node that is an ancestor of both such that no other common ancestor is a descendant of it. Although the lattice structure is considered a desirable property for a well-formed terminology that supports multiple inheritance, terminologies such as SNOMED CT do not always adhere to this principle. A non-lattice pair is a pair of concepts in a terminology that share more than one maximal common descendant or more than one minimal common ancestor. Given a non-lattice pair a = (a1, a2) and its maximal common descendants mcd(a), a non-lattice subgraph can be obtained by computing, in reverse, the minimal common ancestors of the maximal common descendants, i.e., mca(mcd(a)), and aggregating all concepts and IS-A relations between mca(mcd(a)) and mcd(a); mca(mcd(a)) and mcd(a) are referred to as the upper bounds and lower bounds of the non-lattice subgraph, respectively. The size of a non-lattice subgraph is defined as the number of concepts it contains15. Our previous studies have shown the promise of analyzing lexical patterns of non-lattice subgraphs in a terminology to automatically identify quality defects, such as missing IS-A relations and missing intermediary concepts15, 18, 19.
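The definitions above can be made concrete with a small brute-force sketch. It is not ANT-LCA (the algorithm used in this study); it simply enumerates all concept pairs of a hypothetical toy hierarchy, computes mcd and mca as defined above, and collects the concepts lying between the bounds. Two assumptions are ours: a concept counts as its own ancestor/descendant (so that directly related pairs are not misreported), and only the mcd side of the non-lattice condition is checked. The node names U1, U2, X, Y, L1, L2 are placeholders, not SNOMED CT concepts.

```python
from itertools import combinations

# Hypothetical toy IS-A hierarchy (child -> direct parents). The letters are
# placeholders: U1/U2 will act as upper bounds, L1/L2 as lower bounds, and
# X/Y as intermediate concepts.
PARENTS = {
    "U1": set(), "U2": set(),
    "X": {"U1"}, "Y": {"U2"},
    "L1": {"U1", "U2"}, "L2": {"X", "Y"},
}


def reflexive_ancestors(parents):
    """Map every concept to itself plus all of its (transitive) ancestors."""
    memo = {}

    def visit(node):
        if node not in memo:
            up = {node}
            for parent in parents.get(node, ()):
                up |= visit(parent)
            memo[node] = up
        return memo[node]

    for node in parents:
        visit(node)
    return memo


def mcd(a, b, anc):
    """Maximal common descendants: no other common descendant is an ancestor."""
    common = {n for n, up in anc.items() if a in up and b in up}
    return {n for n in common if not (anc[n] - {n}) & common}


def mca(nodes, anc):
    """Minimal common ancestors: no other common ancestor is a descendant."""
    common = set.intersection(*(anc[n] for n in nodes))
    return {c for c in common if not any(c in anc[d] for d in common if d != c)}


def non_lattice_subgraphs(parents):
    """Brute-force search over all concept pairs (illustrative, not ANT-LCA)."""
    anc = reflexive_ancestors(parents)
    found = []
    for a, b in combinations(sorted(anc), 2):
        lower = mcd(a, b, anc)
        if len(lower) < 2:  # this sketch only checks the mcd side
            continue
        upper = mca(lower, anc)
        # Concepts between the bounds: ancestor-or-self of some lower bound
        # and descendant-or-self of some upper bound.
        nodes = {x for low in lower for x in anc[low] if anc[x] & upper}
        found.append(((a, b), upper, lower, nodes))
    return found


for pair, upper, lower, nodes in non_lattice_subgraphs(PARENTS):
    print(pair, sorted(upper), sorted(lower), "size:", len(nodes))
# prints: ('U1', 'U2') ['U1', 'U2'] ['L1', 'L2'] size: 6
```

The toy hierarchy loosely mirrors the shape of Figure 1: two upper bounds share two maximal common descendants, one of which reaches them through intermediate concepts. The quadratic pair enumeration is only practical for toy graphs; a terminology of SNOMED CT's size calls for a dedicated algorithm such as ANT-LCA.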

Figure 1(A) presents a non-lattice subgraph of size 7 from the March 2023 SNOMED CT United States (US) Edition. The concepts “Imaging guidance procedure” and “Removal of foreign body” form a non-lattice pair because they share two maximal common descendants, “Imaging guided removal of foreign body” and “Removal of intravascular foreign body using fluoroscopic guidance with contrast”. In this non-lattice subgraph, the IS-A relation from “Removal of intravascular foreign body using fluoroscopic guidance with contrast” to “Imaging guided removal of foreign body” is missing (see Figure 1(B)).

Figure 1:

(A): A non-lattice subgraph of size 7 in SNOMED CT (March 2023 US Edition). (B): Missing IS-A relation (denoted in red) identified in the non-lattice subgraph.

2.3. Related work on missing IS-A relation identification

Automated approaches for identifying missing IS-A relations in biomedical terminologies include both rule-based and machine learning-based approaches4. Among rule-based approaches, Quesada et al. suggested missing relations in SNOMED CT by analyzing lexical regularities, i.e., common word patterns shared among class labels20. Mougin et al. used reasoning methods together with the compositional structure of concept names to identify missing IS-A relations in the Gene Ontology (GO)21. Bodenreider et al. compared the IS-A relations inferred by the ELK reasoner with the original SNOMED CT hierarchy to suggest missing IS-A relations22. In addition to non-lattice-based methods15, 18, 19, we developed an approach that compares linked and unlinked concept pairs based on shared lexical features in the Human Phenotype Ontology (HPO) to uncover missing IS-A relations23.

Among learning-based approaches, Liu et al. introduced a deep learning approach that employs a convolutional neural network (CNN) to uncover missing IS-A relations in the NCI Thesaurus (NCIt)24. More recently, they also investigated the performance of LLMs in predicting IS-A relations in SNOMED CT with diverse prompt templates25. Mežnar et al. comprehensively evaluated graph-based machine learning approaches such as Graph Auto-Encoders, node2vec, and TransE to recommend missing IS-A relations26. In a previous study, we trained a Graph Convolutional Network (GCN)-based binary classifier to predict missing IS-A relations among hierarchically unrelated concept pairs that exhibit a containment lexical pattern in SNOMED CT27.

3. Methods

In this work, we introduce a hybrid method leveraging non-lattice subgraphs and PLMs to identify missing IS-A relations in SNOMED CT (March 2023 US Edition). First, we extract non-lattice subgraphs from SNOMED CT and construct a dataset of positive and negative instances of IS-A concept pairs. We then formulate the IS-A relation prediction task in two ways: as a text classification task leveraging four BERT variants and as a text generation task leveraging four LLMs. We design distinct inputs and prompts for the two tasks and fine-tune all eight PLMs. The fine-tuned models are then applied to identify missing IS-A relations using a consensus prediction strategy (see Figure 2).

Figure 2:

An overview of the tasks for missing IS-A identification.

3.1. Data preparation

We compute all non-lattice subgraphs in SNOMED CT using ANT-LCA, an efficient, large-scale algorithm for non-lattice detection in biomedical ontologies16. Since larger non-lattice subgraphs may contain smaller ones28, in this study we focus on non-lattice subgraphs of size less than or equal to 10. We then create a dataset of positive and negative IS-A concept pairs and randomly split it into training, validation, and testing sets with a ratio of 8:1:1. The generation of positive and negative instances is detailed as follows.
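The size filter and the 8:1:1 random split described above can be sketched as follows. This is a minimal illustration under our own assumptions: subgraphs are represented as sets of concepts, the split is a plain (non-stratified) shuffle with a fixed seed, and any rounding remainder goes to the test set. The paper does not specify these details, and the function names are hypothetical.

```python
import random

MAX_SUBGRAPH_SIZE = 10  # size = number of concepts in a non-lattice subgraph


def small_subgraphs(subgraphs, max_size=MAX_SUBGRAPH_SIZE):
    """Keep subgraphs (given as sets of concepts) with at most max_size concepts."""
    return [g for g in subgraphs if len(g) <= max_size]


def split_8_1_1(instances, seed=0):
    """Shuffle labeled instances and split them 8:1:1 into train/validation/test."""
    items = list(instances)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # any rounding remainder goes to the test set
    return train, val, test


# Hypothetical labeled instances: (concept-pair placeholder, label).
data = [(f"pair{i}", i % 2) for i in range(100)]
train, val, test = split_8_1_1(data)
print(len(train), len(val), len(test))  # 80 10 10
```

Integer arithmetic is used for the split sizes to avoid floating-point rounding surprises; a stratified split that preserves the positive/negative ratio in each set would be a reasonable alternative.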

3.1.1. Positive instance preparation

Positive instances in our dataset consist of concept pairs connected by a direct IS-A relation, i.e., if concept b is a parent of concept a, then (a, b) forms a positive instance. All such concept pairs in SNOMED CT are aggregated to form the positive instance set, so positive instances can originate either from within or from outside non-lattice subgraphs. For instance, the concepts “Radiologic guidance procedure” and “Imaging guidance procedure” from the non-lattice subgraph in Figure 1(A) form a positive instance because they are a child-parent pair. The concepts “Leiomyosarcoma of cardioesophageal junction” and “Leiomyosarcoma of esophagus”, although outside of any non-lattice subgraph, also form a positive instance because they are a child-parent pair.
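As a minimal sketch of the rule above, every direct (child, parent) edge becomes one positive instance. The example mapping below contains only the two IS-A edges named in the text; the real concepts may have further parents in SNOMED CT that are omitted here, and the function name is hypothetical.

```python
# Only the two IS-A edges mentioned in the text; other parents are omitted.
EXAMPLE_PARENTS = {
    "Radiologic guidance procedure": {"Imaging guidance procedure"},
    "Leiomyosarcoma of cardioesophageal junction": {"Leiomyosarcoma of esophagus"},
}


def positive_instances(parents):
    """Label every direct (child, parent) pair as a positive IS-A instance (1)."""
    return [((child, parent), 1)
            for child, direct_parents in parents.items()
            for parent in sorted(direct_parents)]


for (child, parent), label in positive_instances(EXAMPLE_PARENTS):
    print(f"{child} IS-A {parent} -> label {label}")
```

Note that positives are taken from the whole hierarchy, not only from non-lattice subgraphs, so the function takes the full child-to-parents mapping as input.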

3.1.2. Negative instance preparation

Negative instances are derived from unrelated concept pairs found in the lower and upper bounds of non-lattice subgraphs. Specifically, for any two concepts m and n that both belong to the lower bounds or both belong to the upper bounds of a non-lattice subgraph, two negative instances (m, n) and (n, m) are created. For instance, the non-lattice subgraph in Figure 1(A) generates four negative instances: (1) (“Imaging guidance procedure”, “Removal of foreign body”); (2) (“Removal of foreign body”, “Imaging guidance procedure”); (3) (“Imaging guided removal of foreign body”, “Removal of intravascular foreign body using fluoroscopic guidance with contrast”); and (4) (“Removal of intravascular foreign body using fluoroscopic guidance with contrast”, “Imaging guided removal of foreign body”).
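The negative-instance rule can be sketched as taking both orderings of every pair inside the upper bounds and inside the lower bounds. We read the rule as pairing concepts within the same bound set, which reproduces the four instances of the Figure 1(A) example; the function name is hypothetical, and deduplication across overlapping subgraphs (for example, by collecting pairs in a set) is left out.

```python
from itertools import permutations

# Bounds of the Figure 1(A) non-lattice subgraph, as named in the text.
UPPER = {"Imaging guidance procedure", "Removal of foreign body"}
LOWER = {
    "Imaging guided removal of foreign body",
    "Removal of intravascular foreign body using fluoroscopic guidance with contrast",
}


def negative_instances(upper, lower):
    """Both orderings of every pair inside the upper bounds and inside the
    lower bounds, labeled 0 (no IS-A relation)."""
    negatives = []
    for bound in (upper, lower):
        for m, n in permutations(sorted(bound), 2):
            negatives.append(((m, n), 0))
    return negatives


for (m, n), label in negative_instances(UPPER, LOWER):
    print(f"({m!r}, {n!r}) -> label {label}")
```

Pairs across the two bound sets are not generated, since every lower bound is a descendant of every upper bound and such pairs are therefore not unrelated.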

3.2. Experiment setup for IS-A relation prediction

In this work, we approach the IS-A relation prediction task as either a text classification task or text generation task depending on the type of model used.

Text classification is a fundamental NLP task that involves categorizing text into predefined classes or labels. Specifically, we frame IS-A relation prediction as a binary classification problem in which a model categorizes a given concept pair as either forming an IS-A relation or not. The output of this approach is the probability of the concept pair belonging to each class. We fine-tune four BERT-variant PLMs: BERT (bert-base-uncased), DistilBERT (distilbert-base-uncased), DeBERTa (deberta-v3-base), and BioClinicalBERT. BERT uses the Transformer architecture to pre-train deep bidirectional representations, conditioning on both left and right contexts in all layers6. DistilBERT, pre-trained using knowledge distillation, is a smaller, faster, and more efficient variant of BERT29. DeBERTa is an improved variant of BERT that introduces a disentangled attention mechanism and an enhanced mask decoder to better encode word content and position. BioClinicalBERT is a domain-specific adaptation of BERT pre-trained on clinical notes from MIMIC-III, a database containing electronic health records from ICU patients30.

For generative LLMs, we take a different strategy by framing the problem as a text generation task. The models are given instruction-based prompts containing relevant information about a concept pair and are expected to generate responses indicating the presence or absence of an IS-A relation. We instruction-tune four LLMs: BioMistral (BioMistral-7B), Llama 3 (Meta-Llama-3-8B-Instruct), Gemma2 (gemma-2-9b-it), and Phi-4 (14.7B). BioMistral is an open-source LLM designed for the biomedical domain, built on Mistral and further pre-trained on PubMed Central31. Meta’s Llama 3 is a series of LLMs designed to support multilinguality, coding, reasoning, and tool usage, with its largest model featuring 405 billion parameters32. Gemma 2, which incorporates Transformer modifications such as local-global attention and grouped-query attention, is a new generation of lightweight, open-source models ranging from 2 billion to 27 billion parameters33. Phi-4 is a 14-billion-parameter state-of-the-art model pre-trained on high-quality synthetic and organic data, with additional post-training innovations34. Because the models’ responses may not be a straightforward binary answer, we use the Levenshtein distance (LD) to measure the discrepancy between each response and the words “Yes” and “No” to determine the final label35.
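The label-mapping step can be sketched as follows: compute the Levenshtein (edit) distance from the normalized response to each reference answer and pick the closer one. Three details are our own assumptions, not stated in the paper: responses are stripped and lowercased, the single letters "y"/"n" are accepted alongside "yes"/"no" (the instruction-tuning targets are "Y"/"N"), and ties fall back to "No" so that an ambiguous response never flags a missing IS-A relation.

```python
def levenshtein(s, t):
    """Edit distance (insertions, deletions, substitutions), row-by-row DP."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # delete cs
                            curr[j - 1] + 1,            # insert ct
                            prev[j - 1] + (cs != ct)))  # substitute or match
        prev = curr
    return prev[-1]


# Assumption: "y"/"n" are accepted because the tuning targets are "Y"/"N".
YES_FORMS = ("yes", "y")
NO_FORMS = ("no", "n")


def response_to_label(response):
    """Map a free-text response to 1 (IS-A) or 0 (not IS-A)."""
    text = response.strip().lower()
    d_yes = min(levenshtein(text, form) for form in YES_FORMS)
    d_no = min(levenshtein(text, form) for form in NO_FORMS)
    return 1 if d_yes < d_no else 0  # ties count as "No" (assumption)


for reply in ["Yes", "Y", "No.", "N", "yes, it is"]:
    print(repr(reply), response_to_label(reply))
```

Without the single-letter forms, a bare "Y" would be equidistant (distance 2) from "yes" and "no", which is why they are included in this sketch.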

3.3. Input and prompt design

Table 1 shows the input and prompt designed for the text classification and text generation tasks, respectively. In the text classification task, the input consists of the fully specified names of the two concepts, delimited by a “ | ” symbol.

Table 1:

Input and prompt design for the two tasks.

Text classification (Config.: Input)
Design: {concept 1} | {concept 2}
Example: Radiologic guidance procedure | Imaging guidance procedure

Text generation (Config.: Prompt)
Design:
### Instruction:
Classify whether concept 1 is a subtype of concept 2 or not.
### Input:
Concept 1: {concept 1}
Concept 2: {concept 2}
### Response:
Example:
### Instruction:
Classify whether concept 1 is a subtype of concept 2 or not.
### Input:
Concept 1: Radiologic guidance procedure
Concept 2: Imaging guidance procedure
### Response:

In the text generation task, the prompt consists of three segments: a task instruction, an input, and a response. The task instruction defines the task to be performed by the models: “Classify whether concept 1 is a subtype of concept 2 or not.” As in the text classification task, the input contains the fully specified names of the two concepts. During instruction tuning, the corresponding response (“Y” or “N”) is included as part of the prompt; when the instruction-tuned model is applied to predict missing IS-A relations, the response is left empty.
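Both input formats of Table 1 can be produced by two small helpers. This is a sketch: the exact whitespace and newlines between prompt segments, and the encoding of labels as 1/0, are assumptions, and the helper names are hypothetical. The response ("Y" or "N") is appended only when building instruction-tuning examples and is left empty at prediction time.

```python
PROMPT_TEMPLATE = (
    "### Instruction:\n"
    "Classify whether concept 1 is a subtype of concept 2 or not.\n"
    "### Input:\n"
    "Concept 1: {c1}\n"
    "Concept 2: {c2}\n"
    "### Response:\n"
)


def classification_input(c1, c2):
    """Text-classification input: the two concept names joined by ' | '."""
    return f"{c1} | {c2}"


def generation_prompt(c1, c2, label=None):
    """Instruction prompt; append 'Y'/'N' only for instruction tuning."""
    prompt = PROMPT_TEMPLATE.format(c1=c1, c2=c2)
    if label is not None:
        prompt += "Y" if label == 1 else "N"
    return prompt


print(classification_input("Radiologic guidance procedure", "Imaging guidance procedure"))
print(generation_prompt("Radiologic guidance procedure", "Imaging guidance procedure"))
```

Keeping the template in one constant ensures that tuning-time and prediction-time prompts differ only in the presence of the response.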

3.4. Missing IS-A identification

As missing IS-A relations are identified among concept pairs without an existing IS-A relation, we use the negative instances in our testing set as the candidate pool for detecting potentially missing IS-A relations. All eight fine-tuned models are applied to these cases, and a negative instance is flagged as a potentially missing IS-A relation through a consensus prediction strategy. Specifically, an instance is only considered to indicate a missing IS-A relation if all models unanimously predict it as such.
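The unanimous consensus rule can be sketched as follows, assuming each model's output is available as a mapping from concept pair to predicted label. A pair absent from any model's output is not flagged, and an empty set of models flags nothing. The model names and predictions below are hypothetical, and only three models are shown instead of eight.

```python
def consensus_missing_isa(candidates, predictions):
    """Return candidate pairs that every model labels as IS-A (1).

    predictions maps a model name to {pair: predicted label}; a pair missing
    from any model's output is not flagged.
    """
    if not predictions:
        return []
    return [pair for pair in candidates
            if all(preds.get(pair) == 1 for preds in predictions.values())]


# Hypothetical outputs of three models (the study uses eight).
candidates = [("concept A", "concept B"), ("concept C", "concept D")]
predictions = {
    "model_1": {("concept A", "concept B"): 1, ("concept C", "concept D"): 1},
    "model_2": {("concept A", "concept B"): 1, ("concept C", "concept D"): 0},
    "model_3": {("concept A", "concept B"): 1, ("concept C", "concept D"): 1},
}
print(consensus_missing_isa(candidates, predictions))
# [('concept A', 'concept B')]
```

The explicit empty-input guard matters because Python's `all()` returns True for an empty iterable, which would otherwise flag every candidate.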

3.5. Evaluation

We evaluate the models in multiple ways. First, performance on IS-A relation prediction is assessed through the precision, recall, and F1-score of the predictions on the test set. To assess the effectiveness of our hybrid method for identifying potential missing IS-A relations in SNOMED CT, we conduct two types of evaluation: (1) automated evaluation using a newer release of SNOMED CT; and (2) manual evaluation by a domain expert. For the automated evaluation, we check whether the potential missing IS-A relations identified by the method have been added in a newer release of SNOMED CT. For the manual evaluation, we randomly select a subset of the suggested potential missing IS-A relations and ask a domain expert (author JS) with experience in clinical terminology assessment to review their validity.
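The two automated measurements can be sketched as below. We assume the metrics are computed for the positive (IS-A) class; the paper does not state whether per-class or averaged scores are reported. For the release check, `newer_isa_pairs` is a hypothetical set of (descendant, ancestor) pairs extracted from the newer release, which could be either direct edges or the transitive closure; the single pair used here is taken from Table 3.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive (IS-A) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def confirmed_in_newer_release(suggestions, newer_isa_pairs):
    """Suggested (descendant, ancestor) pairs present in the newer release."""
    return [pair for pair in suggestions if pair in newer_isa_pairs]


print(precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
newer = {("Cryotherapy to celiac plexus (procedure)",
          "Operation on sympathetic nerve (procedure)")}
print(confirmed_in_newer_release(list(newer), newer))
```

The automated check can only confirm suggestions; a suggestion absent from the newer release is not necessarily invalid, which is why the manual review is also needed.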

4. Results

4.1. Fine-tuning experiments

Our experiments were based on the March 2023 US Edition of SNOMED CT, which contained 367,700 concepts. A total of 247,591 non-lattice subgraphs were extracted from this release, of which 45,769 had a size of less than or equal to 10. Following the instance preparation strategy described in Section 3.1, our dataset comprised a total of 594,291 positive instances and 325,132 negative instances. This dataset was then randomly split into 735,540 instances for training (475,433 positives and 260,106 negatives), 91,943 for validation (59,249 positives and 32,513 negatives), and 91,943 for testing (59,249 positives and 32,513 negatives).

We used the Hugging Face Transformers library to fine-tune and evaluate the models in this study36. Given the computational demands of training, validating, and testing PLMs on datasets with a very large number of instances, we adopted a distributed training and testing strategy using the PyTorch framework37. The experiments were conducted on a server running CentOS Linux (release 7.9.2009) with CUDA version 11.6. The models were trained on eight NVIDIA A100 GPUs, each with 80 GB of memory. Training used a learning rate of 5e-5, a batch size of 8, five training epochs, and the AdamW optimizer.

4.2. Model performance

Table 2 shows the precision, recall, and F1-scores of the investigated models. These results indicate that PLMs are, in general, effective at IS-A relation prediction in SNOMED CT. Specifically, DeBERTa achieved the best performance, with a precision of 0.96, a recall of 0.97, and an F1-score of 0.965. Notably, all four BERT variants outperformed the generative LLMs in recall and F1-score.
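As a quick arithmetic check using the rounded values reported for DeBERTa, the F1-score is the harmonic mean of precision P and recall R:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.96 \times 0.97}{0.96 + 0.97}
    = \frac{1.8624}{1.93}
    \approx 0.965
```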

Table 2:

Model performance in different task settings.

| Model | Task | Config. | Precision | Recall | F1-score |
|---|---|---|---|---|---|
| bert-base-uncased | Text classification | Input | 0.957 | 0.963 | 0.960 |
| distilbert-base-uncased | Text classification | Input | 0.944 | 0.963 | 0.953 |
| BioClinicalBERT | Text classification | Input | 0.959 | 0.966 | 0.962 |
| deberta-v3-base | Text classification | Input | 0.960 | 0.970 | 0.965 |
| BioMistral-7B | Text generation | Prompt | 0.950 | 0.934 | 0.942 |
| Meta-Llama-3-8B-Instruct | Text generation | Prompt | 0.952 | 0.928 | 0.940 |
| gemma-2-9b-it | Text generation | Prompt | 0.954 | 0.937 | 0.946 |
| phi-4 (14.7B) | Text generation | Prompt | 0.953 | 0.938 | 0.946 |

4.3. Missing IS-A relation identification

After applying the fine-tuned models to the testing set and conducting the consensus prediction strategy on negative instances, we identified 678 potential missing IS-A relations in total in the March 2023 US Edition of SNOMED CT.

We evaluated these potential missing IS-A relations automatically using the newer March 2025 US Edition of SNOMED CT. The results revealed that 52 of the 678 potential missing IS-A relations had already been incorporated into this newer release. Table 3 shows ten examples of such automatically verified instances. For instance, the IS-A relation between the concept “Cryotherapy to celiac plexus (procedure)” and the concept “Operation on sympathetic nerve (procedure)” was missing in the March 2023 release but had been added in the March 2025 release of SNOMED CT.

Table 3:

Ten examples of missing IS-A relations that had been added to the newer release.

| Descendant concept | Ancestor concept |
|---|---|
| Herpes zoster auricularis (disorder) | Infection of peripheral nerve (disorder) |
| Cryotherapy to celiac plexus (procedure) | Operation on sympathetic nerve (procedure) |
| Aicardi Goutieres syndrome (disorder) | Hereditary degenerative disease of central nervous system (disorder) |
| Finding related to ability to walk up stairs (finding) | Finding related to ability to mobilize (finding) |
| Common variable agammaglobulinemia (disorder) | Disorder of immune structure (disorder) |
| Control of hemorrhage of duodenum (procedure) | Procedure on duodenum (procedure) |
| Delusional disorder caused by phencyclidine (disorder) | Hallucinogen delusional disorder (disorder) |
| Cholera screening (procedure) | Screening for intestinal infectious disease (procedure) |
| Reconstruction of annulus of cardiac valve (procedure) | Surgical procedure on soft tissue (procedure) |
| Female cystocele and uterine prolapse (disorder) | Anterior vaginal wall prolapse (disorder) |

For the manual evaluation, the domain expert was provided with 100 instances randomly selected from the 678 potential missing IS-A relations. The expert confirmed that 93 of the 100 were valid, resulting in a precision of 93%. Table 4 shows ten examples of valid missing IS-A relations confirmed by the domain expert. For example, the domain expert confirmed the validity of the IS-A relation between the concept “Sphingomyelin/cholesterol lipidosis (disorder)” and the concept “Inherited metabolic disorder of nervous system (disorder)”. The identified missing IS-A relations and the domain expert’s evaluation results are available at https://github.com/XubingHao/AMIA2025Symposium.

Table 4:

Ten examples of valid missing IS-A relations confirmed by the domain expert.

| Descendant concept | Ancestor concept |
|---|---|
| Sphingomyelin/cholesterol lipidosis (disorder) | Inherited metabolic disorder of nervous system (disorder) |
| Genus Rhodococcus (organism) | Subclass Nocardioform Actinomycetes (organism) |
| Myositis ossificans associated with burns (disorder) | Lesion of skeletal muscle (disorder) |
| Endoscopic cauterization of polyp of colon (procedure) | Cauterization of lesion - large intestine (procedure) |
| Cardiac electrophysiology (procedure) | Procedure on heart (procedure) |
| Housing unsatisfactory (finding) | Finding of characteristics of home environment (finding) |
| Manubriosternal joint structure (body structure) | Synarthrosis structure (body structure) |
| Alveolectomy, including curettage of osteitis (procedure) | Partial excision of facial bone (procedure) |
| Puerperal sepsis (disorder) | Complication of the puerperium (disorder) |
| Medication monitoring (regime/therapy) | Monitoring of patient (regime/therapy) |

5. Discussion

In this study, we investigated the effectiveness of fine-tuning state-of-the-art PLMs for IS-A relation identification in non-lattice subgraphs of SNOMED CT. We used all existing direct IS-A relations as positive instances and generated negative instances from the unrelated concept pairs in the lower and upper bounds of non-lattice subgraphs. Our experiments showed that all the PLMs achieved high performance for IS-A relation prediction, with the best model, DeBERTa, achieving an F1-score of 0.965 on the test set.

We believe two factors contribute to the effectiveness of our approach in missing IS-A relation identification. First and foremost, non-lattice subgraphs tend to indicate areas with a higher concentration of errors in a biomedical terminology38. Second, our consensus prediction strategy retains only instances that all eight models predict as IS-A relations, which substantially favors the precision of the missing IS-A suggestions.

5.1. False positives

Although the domain expert’s review showed our approach to be highly effective in identifying missing IS-A relations, it still fails in certain instances. Table 5 lists the seven instances in which the domain expert found the suggested missing IS-A relation to be invalid. For example, the suggested missing IS-A relation “Spastic diplegia (disorder)” IS-A “Diplegia of lower limbs (disorder)” was judged invalid because, according to the expert, diplegia refers to paralysis of both sides of the body rather than specifically of the lower limbs.

Table 5:

Seven instances of invalid missing IS-A relations pointed out by the domain expert.

| Descendant concept | Ancestor concept | Domain expert’s comment |
|---|---|---|
| Percutaneous drainage of abscess of perineum using computed tomography guidance (procedure) | Perineum X-ray (procedure) | The drainage procedure is not a child of an imaging procedure. They are distinct entities. |
| Unifocalization operation using the azygos system (procedure) | Repair of vein (procedure) | This is slightly complicated because the unifocalization procedure is used to repair from aorta (artery) to the lungs - which is not intended to repair any veins. |
| Spastic diplegia (disorder) | Diplegia of lower limbs (disorder) | Diplegia simply refers to paralysis of both sides of the body. It does not refer to lower limbs. So the parent concept is more specific than the child. |
| Hypogonadism with anosmia (disorder) | Disorder of smell (disorder) | Hypogonadism with anosmia is primarily a disorder of sex hormones (gonadotropin) and not a smell disorder. The smell disorder is secondary to the primary hormonal problem. |
| Chloride measurement (procedure) | Chlorine measurement (procedure) | Chlorine is Cl2 (diatomic gas), while chloride is the negatively charged anion of chlorine. Technically they are distinct. |
| Fistulization of cisterna chyli (procedure) | Repair of cisterna chyli (procedure) | This one is a bit tricky. The child concept is classified as a “procedure”, while in reality this is a surgical complication resulting from another surgical procedure. So in this case, the thesaurus has some false information. “Fistulization” is a complication; it is not a repair of the cisterna. |
| Aspiration of abdomen using cone beam computed tomography guidance (procedure) | Cone beam computed tomography of abdomen (procedure) | Borderline. Depends on how one interprets the aspiration procedure. Technically the aspiration procedure is independent of the computed tomography. |

5.2. Comparison with related work

Previous automated approaches for identifying missing IS-A relations in SNOMED CT include rule-based and learning-based approaches. In terms of the precision of the identified missing IS-A relations, our approach (93%) substantially outperformed previous approaches17, 27, 39–41. In addition, previous studies are limited to uncovering missing IS-A relations between concept pairs that exhibit a lexical or logical pattern. For example, in one study, the candidate concept pairs used to identify missing IS-A relations shared common lexical words40. This work overcomes such limitations because it can identify missing IS-A relations even when no such lexical or logical pattern exists between the concepts. For example, this study identified the missing IS-A relation between the concept “Herpes zoster auricularis (disorder)” and the concept “Infection of peripheral nerve (disorder)”, which none of our previous approaches would have detected.

5.3. Limitations and future work

In this study, the pool of candidates to which the fine-tuned models were applied to find potential missing IS-A relations was limited to the negative instances in the test set. To systematically uncover missing IS-A relations across all unrelated concept pairs in the lower and upper bounds of non-lattice subgraphs, we plan to adopt a cross-validation-inspired approach introduced in our previous work27. This approach uses different data splits to train the models and then applies them to identify missing IS-A relations, repeated across multiple runs so that missing IS-A relations can be identified from the entire set of candidates. Additionally, we currently focus our analysis on non-lattice subgraphs of size less than or equal to 10. In future work, we aim to extend our method to all non-lattice subgraphs in SNOMED CT, which we believe would enable a more comprehensive detection of missing IS-A relations.
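One way to picture the cross-validation-inspired idea is a k-fold partition in which every instance is held out exactly once across the runs. This is a hypothetical sketch, not the procedure of our previous work27: the fold count, shuffling, and the omission of a separate validation set are all assumptions. In each run, models would be trained on `train`, and the negative instances in `held_out` would serve as that run's candidates for missing IS-A relations.

```python
import random


def k_fold_runs(instances, k=10, seed=0):
    """Yield (train, held_out) splits in which every instance is held out
    exactly once across the k runs."""
    items = list(instances)
    random.Random(seed).shuffle(items)
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


# Hypothetical labeled instances: (concept-pair placeholder, label).
data = [(f"pair{i}", i % 2) for i in range(25)]
held_out_all = []
for train, held_out in k_fold_runs(data, k=5):
    held_out_all.extend(held_out)
print(sorted(held_out_all) == sorted(data))  # True
```

Because the held-out folds together cover the whole dataset, the union of per-run candidate pools would span all unrelated concept pairs, at the cost of training the models k times.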

We used a consensus prediction strategy in this work to identify missing IS-A relations. While this strategy improves precision, it involves a trade-off: valid missing IS-A relations that are not predicted consistently by all models may be filtered out. In future work, we plan to explore more flexible aggregation strategies that consider model confidence scores or use weighted voting mechanisms.
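The contrast between the unanimous rule and a weighted vote can be sketched as follows. The weighted vote is a hypothetical future-work variant, not part of the current method: per-model confidence scores, the weights (for example, derived from validation F1-scores), and the 0.5 threshold are all assumptions. In the example, one model dissents, so the unanimous rule rejects the pair while the weighted vote accepts it.

```python
def unanimous(labels):
    """The study's consensus rule: every model must predict IS-A (label 1)."""
    return bool(labels) and all(label == 1 for label in labels.values())


def weighted_vote(scores, weights, threshold=0.5):
    """Hypothetical alternative: weighted mean of per-model IS-A confidence."""
    total = sum(weights[m] for m in scores)
    return sum(weights[m] * s for m, s in scores.items()) / total >= threshold


# Hypothetical outputs for one candidate pair from two models.
labels = {"model_a": 1, "model_b": 0}
scores = {"model_a": 0.9, "model_b": 0.4}
weights = {"model_a": 2.0, "model_b": 1.0}
print(unanimous(labels), weighted_vote(scores, weights))  # False True
```

Raising the threshold moves the weighted vote back toward the precision-oriented behavior of the unanimous rule, so the threshold offers a tunable precision/recall trade-off.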

We fine-tuned models with up to 14 billion parameters in this work. An extension would involve deploying much larger models, such as Llama-3.3-70B with 70 billion parameters. The increased capacity of such models could potentially enhance understanding and prediction accuracy due to their deeper and more complex architectures. With such larger models, we also intend to compare zero-shot and few-shot learning strategies against the fine-tuning approach used in this work.

As mentioned earlier, although such errors were infrequent, our approach still made some invalid missing IS-A suggestions. The exact reasoning behind the models making such suggestions remains unknown. Future research could explore enhancing generative PLMs to not only predict IS-A relations but also provide justifications for their decisions. We will also explore recently released models capable of complex reasoning, such as OpenAI’s o-series models and DeepSeek-R1 42, 43.

Another limitation of this work is that the manual evaluation was conducted by a single domain expert which may introduce subjectivity in the assessment of missing IS-A relations. While the expert’s knowledge and experience provide valuable insights, the absence of multiple evaluators limits the ability to cross-validate judgments. Future work could involve multiple domain experts (including terminologists at SNOMED International) to enhance interrater reliability and ensure a more comprehensive and objective evaluation. Additionally, we plan to share our findings with the SNOMED CT curators to help improve the quality of the IS-A hierarchy in SNOMED CT.

6. Conclusion

In this study, we investigated the effectiveness of fine-tuning pre-trained language models (PLMs) to identify missing IS-A relations in non-lattice subgraphs of SNOMED CT. We fine-tuned eight state-of-the-art PLMs: BERT, DistilBERT, DeBERTa, BioClinicalBERT, BioMistral, Llama3, Gemma2, and Phi-4. Potential missing IS-A relations were identified through consensus prediction by all eight models. Experiments showed that all the models performed well in the IS-A relation prediction task, with the best model, DeBERTa, achieving a precision of 0.96, a recall of 0.97, and an F1-score of 0.965. Our approach identified 678 potential missing IS-A relations, of which 52 had been added as IS-A relations in a newer release of SNOMED CT. Evaluation by a domain expert on a random sample of 100 potential missing IS-A relations revealed that 93 were valid (93% precision). This indicates that fine-tuned PLMs have the potential to identify hierarchical quality issues in non-lattice subgraphs, supporting SNOMED CT’s quality assurance efforts.

Acknowledgment

This work was supported by the National Science Foundation (NSF) through grant 2047001 and National Institutes of Health (NIH) through grants R01NS116287 and R01AG084236. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NSF or NIH.

References

  • 1. What is SNOMED CT? (Online; accessed March 2025). https://www.snomed.org/what-is-snomed-ct.
  • 2. SNOMED CT Implementation. (Online; accessed March 2025). https://confluence.ihtsdotools.org/display/docstart/8.+snomed+ct+implementation.
  • 3. Hao X, Li X, Huang Y, Shi J, Abeysinghe R, Tao C, et al. Quantitatively assessing the impact of the quality of SNOMED CT subtype hierarchy on cohort queries. Journal of the American Medical Informatics Association. 2025;32(1):89–96. doi: 10.1093/jamia/ocae272.
  • 4. Amith M, He Z, Bian J, Lossio-Ventura JA, Tao C. Assessing the practice of biomedical ontology evaluation: Gaps and opportunities. Journal of Biomedical Informatics. 2018;80:1–13. doi: 10.1016/j.jbi.2018.02.010.
  • 5. Norouzi SS, Mahdavinejad MS, Hitzler P. Conversational ontology alignment with ChatGPT. arXiv preprint arXiv:2308.09217. 2023.
  • 6. Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018.
  • 7. Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I, et al. Language models are unsupervised multitask learners. OpenAI Blog. 2019;1(8):9.
  • 8. Touvron H, Martin L, Stone K, Albert P, Almahairi A, Babaei Y, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288. 2023.
  • 9. Hao X, Abeysinghe R, Roberts K, Cui L. Logical definition-based identification of potential missing concepts in SNOMED CT. BMC Medical Informatics and Decision Making. 2023;23(1):87. doi: 10.1186/s12911-023-02183-7.
  • 10. Hertling S, Paulheim H. OLaLa: Ontology matching with large language models. Proceedings of the 12th Knowledge Capture Conference 2023. 2023:131–9.
  • 11. Babaei Giglou H, D’Souza J, Auer S. LLMs4OL: Large language models for ontology learning. International Semantic Web Conference. Springer; 2023. p. 408–27.
  • 12. SNOMED CT Fully Specified Name. (Online; accessed March 2025). https://confluence.ihtsdotools.org/display/DOCEG/Fully+Specified+Name.
  • 13. SNOMED CT Structure of Domain Coverage. (Online; accessed March 2025). https://confluence.ihtsdotools.org/display/DOCEG/Structure+of+Domain+Coverage.
  • 14. Zhang GQ, Bodenreider O. Large-scale, exhaustive lattice-based structural auditing of SNOMED CT. AMIA Annual Symposium Proceedings. 2010;2010:922.
  • 15. Cui L, Zhu W, Tao S, Case JT, Bodenreider O, Zhang GQ. Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT. Journal of the American Medical Informatics Association. 2017;24(4):788–98. doi: 10.1093/jamia/ocw175.
  • 16. Zhang GQ, Xing G, Cui L. An efficient, large-scale, non-lattice-detection algorithm for exhaustive structural auditing of biomedical ontologies. Journal of Biomedical Informatics. 2018;80:106–19. doi: 10.1016/j.jbi.2018.03.004.
  • 17. Cui L, Bodenreider O, Shi J, Zhang GQ. Auditing SNOMED CT hierarchical relations based on lexical features of concepts in non-lattice subgraphs. Journal of Biomedical Informatics. 2018;78:177–84. doi: 10.1016/j.jbi.2017.12.010.
  • 18. Zheng F, Abeysinghe R, Cui L. A hybrid method to detect missing hierarchical relations in NCI Thesaurus. 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE; 2019. p. 1948–53.
  • 19. Abeysinghe R, Brooks MA, Cui L. Leveraging non-lattice subgraphs to audit hierarchical relations in NCI Thesaurus. AMIA Annual Symposium Proceedings. 2020;2019:982.
  • 20. Quesada-Martínez M, Fernández-Breis JT, Karlsson D. Suggesting missing relations in biomedical ontologies based on lexical regularities. Exploring Complexity in Health: An Interdisciplinary Systems Approach. IOS Press; 2016. p. 384–8.
  • 21. Mougin F. Identifying redundant and missing relations in the Gene Ontology. Digital Healthcare Empowering Europeans. IOS Press; 2015. p. 195–9.
  • 22. Bodenreider O. Identifying missing hierarchical relations in SNOMED CT from logical definitions based on the lexical features of concept names. CEUR Workshop Proceedings. 2016;1747:IT601.
  • 23. Mohtashamian M, Hu R, Abeysinghe R, Hao X, Xu H, Cui L. Automated Identification of Missing IS-A Relations in the Human Phenotype Ontology. AMIA Annual Symposium Proceedings. 2023;2022:785.
  • 24. Liu H, Zheng L, Perl Y, Geller J, Elhanan G. Can a convolutional neural network support auditing of NCI Thesaurus neoplasm concepts? ICBO. 2018.
  • 25. Liu H, Zhou S, Chen Z, Perl Y, Wang J. Using Generative Large Language Models for Hierarchical Relationship Prediction in Medical Ontologies. 2024 IEEE 12th International Conference on Healthcare Informatics (ICHI). IEEE; 2024. p. 248–56.
  • 26. Mežnar S, Bevec M, Lavrač N, Škrlj B. Ontology completion with graph-based machine learning: a comprehensive evaluation. Machine Learning and Knowledge Extraction. 2022;4(4):1107–23.
  • 27. Abeysinghe R, Zheng F, Bernstam EV, Shi J, Bodenreider O, Cui L. A deep learning approach to identify missing is-a relations in SNOMED CT. Journal of the American Medical Informatics Association. 2023;30(3):475–84. doi: 10.1093/jamia/ocac248.
  • 28. Hao X, Abeysinghe R, Zheng F, Cui L. Leveraging non-lattice subgraphs for suggestion of new concepts for SNOMED CT. 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE; 2021. p. 1805–12.
  • 29. Sanh V, Debut L, Chaumond J, Wolf T. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108. 2019.
  • 30. Alsentzer E, Murphy JR, Boag W, Weng WH, Jin D, Naumann T, et al. Publicly available clinical BERT embeddings. arXiv preprint arXiv:1904.03323. 2019.
  • 31. Labrak Y, Bazoge A, Morin E, Gourraud PA, Rouvier M, Dufour R. BioMistral: A collection of open-source pretrained large language models for medical domains. arXiv preprint arXiv:2402.10373. 2024.
  • 32. Grattafiori A, Dubey A, Jauhri A, Pandey A, Kadian A, Al-Dahle A, et al. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783. 2024.
  • 33. Gemma Team, Riviere M, Pathak S, Sessa PG, Hardin C, Bhupatiraju S, et al. Gemma 2: Improving open language models at a practical size. arXiv preprint arXiv:2408.00118. 2024.
  • 34. Abdin M, Aneja J, Behl H, Bubeck S, Eldan R, Gunasekar S, et al. Phi-4 technical report. arXiv preprint arXiv:2412.08905. 2024.
  • 35. Levenshtein VI. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady. 1966;10:707–10.
  • 36. HuggingFace. Transformers. (Online; accessed March 2025). https://huggingface.co/docs/transformers/v4.46.3/en/index.
  • 37. PyTorch. (Online; accessed March 2025). https://pytorch.org/.
  • 38. Abeysinghe R, Zheng F, Cui L. A Comparison of Exhaustive and Non-lattice-based Methods for Auditing Hierarchical Relations in Gene Ontology. AMIA Annual Symposium Proceedings. 2022;2021:177.
  • 39. Zheng F, Shi J, Cui L. A lexical-based approach for exhaustive detection of missing hierarchical IS-A relations in SNOMED CT. AMIA Annual Symposium Proceedings. 2021;2020:1392.
  • 40. Hao X, Abeysinghe R, Shi J, Cui L. A substring replacement approach for identifying missing IS-A relations in SNOMED CT. 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE; 2022. p. 2611–8.
  • 41. Abeysinghe R, Zheng F, Shi J, Lhatoo SD, Cui L. Leveraging logical definitions and lexical features to detect missing IS-A relations in biomedical terminologies. Journal of Biomedical Semantics. 2024;15(1):6. doi: 10.1186/s13326-024-00309-y.
  • 42. OpenAI. Pioneering research on the path to AGI. (Online; accessed March 2025). https://openai.com/research/.
  • 43. DeepSeek-AI. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. 2025. (Online; accessed March 2025). https://arxiv.org/abs/2501.12948.

Articles from AMIA Annual Symposium Proceedings are provided here courtesy of American Medical Informatics Association