A Machine-Assisted Framework for Ontology Development and Standardization: Case Study in Digital Health Technologies

Fang Chen; Taylor B Harrison; Sunyang Fu; Ling He; Zhiyi Yue; Shuyu Lu; Liwei Wang; Xiaoyang Ruan; Hongfang Liu

. 2026 Feb 14;2025:238–247.

A Machine-Assisted Framework for Ontology Development and Standardization: Case Study in Digital Health Technologies

Fang Chen ¹, Taylor B Harrison ^1,², Sunyang Fu ¹, Ling He ⁴, Zhiyi Yue ⁵, Shuyu Lu ¹, Liwei Wang ¹, Xiaoyang Ruan ¹, Hongfang Liu ¹

PMCID: PMC12919423 PMID: 41726495

Abstract

Digital health technologies (DHTs) are reshaping healthcare by enabling personalized care, improving patient outcomes, and accelerating clinical research. However, the surge in DHT-related literature creates new challenges in effectively organizing, retrieving, and applying the resulting knowledge. Ontologies, structured frameworks for categorizing and connecting concepts, are central to meeting these challenges. Traditional ontology development in digital health often depends on manual processes, limiting efficiency, scalability, and cross-disciplinary adaptability. Building on previous work categorizing DHTs, we propose a new framework combining DHT lexicon extraction, ontology enrichment, and human-in-the-loop validation. In this study, we illustrate how the concept of a “adaptive ontology,” powered by large language models, can classify and enhance DHT ontologies systematically, yet semi-automatically. Thus, providing a practical path to managing the evolving landscape of digital health.

Introduction

The World Health Organization (WHO) defines digital health technologies (DHTs) as the use of digital tools and platforms, such as electronic health, mobile health, telehealth, and health data, to enhance healthcare delivery, improve health outcomes, and promote well-being ¹. On a global scale, DHTs are reshaping healthcare by enabling personalized care, improving patient outcomes, and accelerating clinical research ^2,3. In the last decade, the United States National Institutes of Health alone has funded over $7.6 billion in DHT research ¹⁸. Moreover, since 2018, the digital health product market has produced more than 116 million wearable devices which is projected to double within the next five years ⁴. While the majority of wearables are designed for everyday and non-clinical use, an increasing number of researchers and clinicians are beginning to adopt them for real-time outpatient data collection ⁴. Consequently, the variety and scale of DHT-related data are growing exponentially.

The rapid and diverse innovation of DHTs introduces substantial organization challenges in medicine, giving rise to a need for systematic extraction, curation, and standardizing of DHT concepts that enable consistent scientific discovery (e.g., literature synthesis) and policy development. To solve these challenges, the WHO Global Strategy on Digital Health (2020-2025) highlights the importance of building a unified framework among Member States, backed by robust and interoperable information technology infrastructures. Such approach enables seamless and secure data exchange among health care providers, public health authorities, and research institutions, reinforcing collaborative care and evidence-based decision-making ¹⁷.

However, studies have found that current reporting of DHT literature is heterogeneous, with overlapping and inconsistent definitions ⁵. These inconsistencies impede global collaboration and complicate the exchange of digital health knowledge, underscoring the need for more cohesive, universally adopted terminologies. Furthermore, the number of digital health publications indexed in PubMed increased by over 400% between 2010 (approximately 5,000 articles) and 2020 (over 25,000 articles), reflecting an exponential growth within the digital health domain. While efforts like the 2023 taxonomy proposed by Hey et al. offers hierarchical classification systems, such manually developed ontologies are inherently limited by their dependence on expert input, slow revision cycles, and limited scalability ¹¹. These limitations hinder the agility of a DHT ontology to adapt effectively in the fast-paced digital health landscape.

Recent advances in large language models (LLMs) have demonstrated powerful natural language understanding and generation capabilities that may benefit dynamic generation and maintenance of ontology development. Related studies indicate that models such as ChatGPT-4 and LLaMA3 can automatically identify, classify, and intelligently extend conceptual structures, thereby efficiently constructing and optimizing knowledge frameworks ^6-8. Liu et al. proposed OntoTune, a framework for aligning LLMs with medical ontologies using in-context learning, showing that LLMs can be guided by hierarchical conceptual structures without extensive retraining ¹⁴. Fathallah et al. extended the NeOn-GPT pipeline to facilitate ontology learning in the life sciences, addressing critical limitations like shallow hierarchy generation and token constraints through prompt engineering and ontology reuse ¹⁵. These methods emphasize the potential of LLMs to support scalable and rapid ontology development.

Despite these promising features, limited studies have explored LLM-assisted approaches to support domain-specific ontology production in the context of DHTs. In addition, most existing approaches focus on schema-level ontology development or instance generation, with limited emphasis on full-cycle, human-in-the-loop frameworks tailored to specific domains. To address these limitations, our study introduces a comprehensive, machine-assisted framework that tightly integrates LLMs with terminology extraction, ontology extension, and human-in-the-loop validation. Here, we use DHTs as a case demonstration for the demonstration of our “adaptive ontology” to support the continuous expansion and organization of DHT terms in an ontological structure.

Methods

We propose a machine-assisted framework that integrates four key components to accelerate the classification of emerging digital health technology (DHT) vocabulary and support the dynamic expansion of existing ontologies (Figure 1). First, the terminology extraction and expansion processes leverage an LLM to systematically generate or augment comprehensive lists of DHT-related terms. Second, we incorporate DHT-related terms into a dictionary look-up pipeline, MedTagger ^{9, 16}, in order to automatically extract and tag relevant terms from a large corpus of DHT-related literature. Third, we apply zero-shot learning for ontology extension, in which newly identified terms are either mapped to existing ontology levels or used to propose new classes or subclasses, using the seed ontology’s “is-a” relationship as structural guidance. Finally, we employ a human-in-the-loop validation process that enables a team science approach for domain experts to iteratively review and refine both the terminology and ontology structure; thereby, maintaining accuracy and contextual relevance. In the following sections, we describe each component of our framework using DHTs as a case demonstration.

Figure 1. — Study design of the machine-assisted classification and dynamic expansion of DHT vocabulary.

DHT Study Abstract Collection

The search strategy and data extraction methods used in this study were adapted from the previously published ACCEL framework ¹³. In this approach, standard PRISMA-based screening is combined with LLM-assisted data extraction to streamline the identification and collection of relevant studies. The search strategy was conducted using Ovid MEDLINE from 1946 to February 8, 2024, to retrieve DHT-enabled randomized controlled trials (RCTs). To ensure comprehensive search term coverage, the U.S. Food and Drug Administration’s (FDA) DHT categories were adopted: SaMD, Advanced Analytics, AI, Cloud, Cybersecurity, Interoperability, MDDS, MMA, and Wireless. Detailed descriptions of the inclusion and exclusion criteria, the LLM-assisted screening process, and the human annotation procedures are available in the original ACCEL framework study ^{12, 13}. After the literature search was completed, the Digital Object Identifier (DOI) of each eligible study was stored. A custom Python script was then developed to interface with the PubMed application programming interface (API), facilitating automated abstracts retrieval via unique DOI. The Python script and other resources can be found on GitHub: https://github.com/OHNLP/ACCEL.

DHT Term Lists Preparation

List 1: Seed Ontology term list ¹¹. This term list was carefully selected based on a previously established hierarchical classification covering key DHT device categories, which comprises 500 DHT device names, serves as the reference standard. This is the foundational reference, or “seed ontology.” The seed ontology’s hierarchical structure includes a root level, followed by a class level, further subdivided into subclasses, and culminating in specific product names, with each of them linked to additional metadata (Figure 2). We focused on high-level classification and ontology expansion, only terms corresponding to the class, subclass, and product levels were included in the seed ontology term list. All other descriptive data and metadata were excluded to maintain clarity and simplicity for the classification and expansion task.
List 2: LLM-expanded DHT terms list based on the seed ontology. A zero-shot learning approach extends the DHTs-related terminology beyond the initial seed ontology term list, leveraging ChatGPT-4o. This model was chosen for advanced language modeling capabilities, comprehensive knowledge base, and consistent performance in generative tasks. The prompt (Box 1) was designed solely to expand upon existing DHT terms without preserving the seed ontology’s hierarchical structure, thereby introducing novel concepts that may not have been captured previously.
List 3: LLM-extracted DHT terms list from collected DHT-enabled RCT abstracts. ChatGPT-4o is used to extract all DHTs-related terms from each abstract. Any redundant terms identified from the seed ontology term list were removed to ensure uniqueness and reduce noise.
List 4: Human annotated LLM-expanded terms list from retreived abstracts. Subsequent annotation and validation were performed on the LLM-extracted term list. A research team of more than three members developed an annotation guideline using MedTator (https://medtator.ohnlp.org), an open-source text annotation tool that runs entirely in a local environment, to finalize the DHTs-related vocabulary ¹⁹. Terminologies duplicating those found in the seed ontology term list were, likewise, eliminated to preserve uniqueness.

Figure 2. — Hierarchical Structure of the Seed Ontology. Specific products and associated metadata within each subclass category are omitted for clarity.

Automated DHT Term Mapping via MedTagger

This task assesses the coverage of each term list in real-world classification and identifies potential gaps in the seed ontology. We integrate the four term lists into MedTagger, a resource-driven, open-source information extraction framework based on the Unstructured Information Management Architecture (UIMA) ¹⁰. MedTagger was then applied to 5,442 DHT-related abstracts.

Box 1. Zero-Shot Learning Prompt for LLMAssisted DHT Term Expansion.

You are an expert in digital health technologies (DHTs). Below is a list of existing DHT terms from our seed ontology. Please propose additional relevant terms that could be included in an expanded ontology. Avoid duplicating any terms from the provided list, and do not include definitions or references. Focus exclusively on introducing new DHT-related terms.

[*seed ontology term list*]

LLM-Assisted Classification and Ontology Expansion

The next task leverages an LLM to classify newly identified DHT terms and expand the seed ontology based on terms not previously covered. Using the DHT term dictionary generated in prior steps, we can compared abstract-level annotations to the existing ontology. Abstracts that do not match any terms in the current ontology are flagged as candidates containing novel or emerging DHT concepts. After removing duplicates, these potentially new terms are submitted in a zero-shot classification prompt (Figure 2 and Box 2) using ChatGPT-4o. The prompt to performs two primary functions:

Classify new DHT terms into appropriate positions within the existing ontology, when feasible.
Propose new ontology structures, either as entirely new classes or subclasses, for terms that cannot be integrated into the current hierarchy.

To balance the need for ontological stability and the flexibility to capture evolving DHT concepts, the prompt instructs the model toward conservative recommendations at the class level, where terms are typically more broad. Conversely, this approach allows for more exploratory LLM proposals at the subclass level, which benefits from greater specificity. In cases where the LLM determines that a term does not align with any existing class or subclass and should not be included, the model is instructed to flag the term in a separate list and provide a justification for its exclusion. All LLM-generated classifications and structural recommendations undergo manual review by domain experts to assess validity, coherence, and relevance to the overall ontology.

Box 2. Zero-Shot Learning Prompt for LLM-Assisted DHT Term Classification and Ontology Expansion.

You are an expert in Digital Health Technology (DHT) ontology and taxonomy. We have an existing DHT Ontology with this structure:

*Figure 2*

Instructions for Expanding the Ontology:

Comprehensive Direct Classification
- Classify each term from the list below into an existing Subclass if a clear match exists.
Aggressive Subclass Creation
- Whenever 3 or more terms share a unique theme or functional characteristic not covered by an existing Subclass, create a new Subclass under the most appropriate existing Class. Label all new Subclasses with “(new).”
- Avoid dumping large sets of items into “Other,” “Other → Other,” or any large catch-all. Instead, if you see 3+ items that form a recognizable grouping, create a new Subclass for them.
Be Cautious with New Classes
- Only create a brand-new top-level Class if there is a significant cluster of terms that truly cannot be integrated under any existing Class (including newly formed Subclasses).
- If a new Class is absolutely needed, label it “(new)” and briefly justify it (e.g., “Brand / Manufacturer (new)” if you have dozens of brand-only terms).
Irrelevant Terms
- If a term is not actually a digital health technology (no digital/electronic component or brand relevance), list it as “Irrelevant” with a short reason, instead of placing it in the CSV.
Output Format
1. CSV with 3 columns: “Class,” “Subclass,” “Product.”
  - Each row corresponds to one (Class, Subclass) pair.
  - In the “Product” column, comma-separate all terms that share that exact (Class, Subclass).
  - Mark all newly created Classes or Subclasses with “(new)” in the CSV.
2. After the CSV, provide a separate “Irrelevant Terms List” with each irrelevant term and a brief reason.

Now, classify the following input terms according to the above instructions: [*new DHT term list*]

Remember:

−
Always try to place items under existing Classes or newly formed Subclasses before creating a new Class.
−
Create a new Subclass if 3+ items share a theme that does not fit an existing Subclass.
−
Only introduce a new Class (labeled “(new)”) if absolutely necessary.
−
Write the final answer as a CSV (Class, Subclass, Product) plus an Irrelevant Terms list with reasons.

Human-In-The-Loop Validation

To ensure rigorous validation, terminology expansions underwent systematic review by at least two independent human evaluators with expertise in biomedical informatics and digital health technologies. Reviewers assessed each term based on its relevance, overlap with existing ontology structures, and integrability into the current ontology framework. Disagreements were resolved through consensus meetings guided by structured annotation guidelines, enhancing both the reliability and practical utility of the ontology. The assessment follows a binary approach across four dimensions: 1) whether each generated term is relevant to the DHT domain, 2) whether newly created classes or subclasses overlap with existing ones in the seed ontology, 3) whether new subclasses can be integrated into the current seed ontology, and 4) whether terms designated as “irrelevant” (along with their explanations) are appropriately excluded.

Results

DHT Study Abstract Collection

The ACCEL framework search strategy discovered 169,424 articles. After applying inclusion and exclusion criteria such as deduplication, keyword removal, and LLM-assisted screening, 5,442 articles remained eligible for abstraction extraction (Figure 3).

DHT Term Lists Preparation

A total of 78 unique terms were included in the seed ontology term list (List 1). An LLM-based expansion of this seed ontology generated 20 additional unique terms (List 2). From the collected abstracts, 892 unique DHT-related terms were extracted by the LLM compiled into List 3. Subsequently, a human annotation process refined this list to 118 unique terms (List 4). For consistency, normalization was applied at the beginning to remove special characters and convert each term to lowercase.

Automated DHT Term Mapping via MedTagger

All four normalized term lists from Task 1 were used to match relevant terms against 5,442 abstracts using MedTagger. The result outputted 5,442 Extensible Markup Language (XML) files, containing both the recognized DHT terms and their originating term lists. In total, 4,583 (84.3%) contained at least one mapped DHT term, while 859 abstracts (15.7%) remained unmapped (Figure 4).

Figure 4. — *Distribution of Mapped vs. Unmapped Abstracts by Rule Type (Term List)*. Each pie chart depicts the proportion of abstracts that were successfully matched (“Mapped”) or remained unmatched (“Unmapped”) for each of the four term lists. These term lists—derived from the seed ontology (List 1), LLM-expanded DHTs (List 2), LLM-extracted DHTs (List 3), and human-annotated DHTs (List 4)—were used as rule inputs in MedTagger to identify relevant DHT concepts within 5,442 abstracts.

The seed ontology terms successfully matched 2,156 abstracts, representing 47.0% of the total. Although these terms capture the foundational scope of DHT concepts, the result indicates that more than half of the abstracts did not align with the existing seed ontology, suggesting possible gaps or emerging vocabularies not yet covered. The LLM-expanded terms (List 2) covered 407 abstracts (8.9%). Despite a smaller standalone coverage, introduced additional domain expansions beyond the seed list with a minimum prompt.

The LLM-extracted terms (List 3) achieved the highest coverage, mapping 4,172 abstracts (91.0%). These findings underline the capacity of an LLM to identify a broad range of DHT-related concepts, including novel or less conventional terms from real-world abstracts. Finally, the human-annotated terms mapped to 2,663 abstracts (58.1%), reflecting moderate improvement over the seed ontology alone. This increase in coverage highlights the value of expert review and consensus-based curation, which can enhance specificity and adapt the ontology to real-world data more effectively.

Figure 5 details the frequency distributions of terms across these four lists, highlighting that a few highly recurrent terms (e.g., “scale,” “online,” “internet,” “virtual reality”) dominate each vocabulary, while many other terms appear only a handful of times. For the seed ontology terms (List 1), “scale” had the highest occurrence count (930 instances), followed by “other” (708) and “phone” (247). Despite the limited total of 78 unique terms, one or two frequently used concepts accounted for a substantial portion of the matches. The LLM-expanded terms (List 2) comprised only 20 unique terms. “Virtual reality” again led in frequency (399 occurrences), while “self-monitor” (12) and “self-test” (8) reflected the model’s focus on generating terms that extend beyond standard definitions. The LLM-extracted terms (List 3) displayed the greatest overall coverage, featuring 892 unique terms. Of these, “internet” was the most common (907 occurrences), followed by “computer” (395) and “virtual reality” (368). Finally, the human-annotated terms (List 4) comprised 118 unique terms, with “online” most frequently observed (845 occurrences). “Virtual reality” (374) and “mHealth” (344) also appeared prominently.

graphic file with name AMIASYMPROC-2025-7991-f5.jpg

LLM-Assisted Classification and Ontology Expansion

Following the automated DHT term mapping in Task 2, newly recognized terms were subjected to a zero-shot learning prompt (Box 2) for classification or ontology expansion. Notably, ChatGPT-4o did not introduce a new class and majority of the new subclasses were classified under the “non-wearable” class. The LLM proposed 16 new subclasses, including two duplicates (“Other” and “Clothing”) already present in the seed ontology. Table 1 illustrates ChatGPT-4o derived assignment for each term to either an existing or newly created class or subclass.

Table 1.

LLM Classified or Expanded Ontology Output.

Class	Subclass	Products
Lifestyle	Phone	cell phone, cellphone, mobile phone, smartphone, smart phone, cellular phone
Lifestyle	Tablet	tablet computer, tab, ipad
Lifestyle	Game Console	nintendo, playstation vr, nintendo wii, xbox kinect, xbox 360
Lifestyle	Home Assistant	digital assistant, amazon, google, apple, samsung, meta, microsoft
Wearable	Watch	fitbit, fitness tracker, smartwatch, healthwatch, activity tracker
Wearable	Sensor Patch	dexcom, propeller health, continuous glucose monitoring
Wearable	Other (new)	whoop, misfit, jawbone, bodymedia, polar
Non-wearable	Handheld	kardiamobile, alivecor, omron
Non-wearable	Remote Monitor	remote patient monitoring, home bp telemonitoring, remote patient management
Non-wearable	Scale	digital drainage system
Packaging	Syringe	insulin pump
Packaging	Dispenser	smart pill
Other	Subcutaneous	ingestible, smart pill
Other	Ingestible	electronic cigarette
Wearable	Sensor Box	accelerometer, accelerometry, pedometer
Wearable	Clothing (new)	hybrid assistive limb, lokomat, robotic rehabilitation, robotic therapy
Wearable	Glasses	virtual reality, augmented reality, mixed reality, vr headset, oculus
Wearable	Clothing Clip	clip
Wearable	Contact Lens	contact lens
Wearable	Transmitter	wireless health transmitter
Wearable	Wearable Activity Tracker (new)	wearable device, wearable technology, wearable activity tracker
Non-wearable	Receiver	receiver
Non-wearable	Speaker	speaker
Non-wearable	Arm Cuff	arm cuff
Non-wearable	Blood Test Kit	blood test kit
Non-wearable	Remote Monitoring (new)	remote monitoring, telemonitoring, home telemonitoring, remote asp
Non-wearable	Telehealth Equipment (new)	telehealth, telemedicine, teleconsultation, telecoaching, telecare, telerehabilitation, home telehealth
Non-wearable	Decision Support (new)	clinical decision support, cds tool, decision support tool, decision support system, electronic decision support, decision aid, decision support intervention
Non-wearable	Virtual Reality Therapy (new)	vr therapy, immersive virtual reality, immersive vr, virtual simulation, interactive virtual reality, virtual patient, virtual anatomy
Non-wearable	AI-driven Healthcare (new)	ai algorithm, artificial intelligence, machine learning, ml algorithm, predictive model, convolutional neural network, neural network, deep learning, reinforcement learning
Non-wearable	Digital Health Platform (new)	digital health platform, digital intervention, digital therapeutic, digital workflow, online platform, web platform, online program, online module, web portal
Non-wearable	EHR/EMR Systems (new)	electronic health, electronic medical record, ehr, emr, patient portal, electronic prescription, personal health record, medical informatics
Non-wearable	Virtual Communication (new)	video call, videoconference, teleconference, zoom, skype, telegram, messenger, whatsapp, signal, wechat, facebook, linkedin, twitter, instagram, youtube, tiktok, snapchat
Non-wearable	Digital Cognitive Behavioral Therapy (new)	digital cognitive behavioral therapy, online cbt, icbt, online treatment, deprexis, online mindfulness training, moodgym, online mbi
Non-wearable	Gamified Health Interventions (new)	exergame, serious game, gamification, wii fit, video game, nintendo wii
Non-wearable	Online Health Education (new)	online course, online education, online learning, online survey, educational intervention, online questionnaire, instructional video, educational website, educational video
Non-wearable	Surgical and Assistive Robotics (new)	robotic rehabilitation, robotic therapy, robotic arm, robotic gait training
Non-wearable	Medical Sensors (new)	sensor, depth sensor, haptic feedback, biosensor, electronic monitoring

Open in a new tab

Pilot Human-In-The-Loop Validation

Two annotators performed the validation on 16 newly generated subclass terms independently and blindly, then reconciled any disagreements by reviewing the annotation guideline in a consensus meeting. The final consensus serves as the gold standard. Because all subclass terms were proposed by ChatGPT-4o, “Relevance” was initially marked as 1, “Overlap” as 0, and “Integrability” as 1. Table 2 illustrates the model’s performance against the gold standard labels for each binary dimension (“Relevance,” “Overlap,” and “Integrability”).

Table 2.

Model Performance Compared to Consensus for New Subclass Terms.

Dimension	Precision	Sensitivity	Specificity	F1 Score	Correct
Relevance	0.38	1.00	0.00	0.55	0.38
Overlap	NA	NA	NA	NA	0.88
Integrability	0.38	1.00	0.00	0.55	0.38

Open in a new tab

Of 16 potential subclasses, the consensus judged 6 as “relevant” to digital health. Only 2 subclasses truly overlapped with existing ontology structures, but the model indicated the term as (“Other” and “Clothing” under “Wearable). Annotator agreement indicates that 6 subclasses are suitable for integration in the DHT ontology.

Discussion

Motivated by the need for a high-throughput and standardized method to manage the expanding body of DHT-related terminology, we introduced a machine-assisted framework that combines LLMs, domain-specific term extraction, zero-shot ontology expansion, and expert validation to support the dynamic development of a “adaptive ontology” for digital health. By accounting key steps, such as term extraction and initial classification, and then incorporating human oversight to refine and verify the result, our method delivers both scalability and contextual accuracy.

In a comparison of coverage across seed ontology terms, human-annotated terms, and LLM-generated terms applied to 5,442 digital trial study abstracts, we observed notable differences in performance. List 1 (the seed ontology terms) successfully matched 2,156 (47.0%) abstracts, confirming a significant gap due to the emergence of new terminologies and evolving language within the field. List 2 (LLM-generated terms based on the seed ontology) showed the lowest coverage (8.9%). This smaller proportion may stem from the prompt’s focus on suggesting new terms rather than aligning with existing terminologies also demonstrating the model’s potential to introduce new concepts20. List 3 (LLM-extracted terms) achieved the highest coverage by far, mapping to 4,172 abstracts, or 91.0% of the dataset. This significant increase illustrates the power of LLMs to uncover a broader and up-to-date range of DHT-related terms, including those that may not be easily identified through manual curation alone. Finally, List 4 (human-annotated terms demonstrated moderate improvement, covering 58.1% of the abstracts While it did not achieve the same breadth of coverage as the LLM-extracted terms, the employment of expert-reviewed and standardized terminology contributed to better alignment with real-world language and ontology refinement. Overall, these results illustrate that while Lists 1 and 2 elucidate existing ontology gaps and the LLM’s expansion potential, Lists 3 and 4 offer deeper coverage insights that highlight the LLM’s extraction ability and the value of expert curation.

The integration of LLM-assisted methods with expert validation addressed substantial limitations inherent in manual ontology development, such as scalability and responsiveness to rapidly evolving terminology. By refining the ontology using automated extraction and human curation, we notably increased term coverage. This refined ontology directly improves digital health research and practice by enhancing systematic reviews and meta-analyses through more comprehensive literature retrieval and improved data synthesis. It also supports interoperability and standardized communication among clinicians, researchers, and policymakers through universally recognized terminology, and facilitates efficient identification and adoption of emerging digital health tools, thereby accelerating clinical implementation and regulatory processes.

Nevertheless, human oversight remains indispensable for consolidating and finalizing curated terms, ensuring they align with established structure and domain concepts21,22. For instance, although ChatGPT-4o labeled all new subclasses under “non-wearables,” the annotators found that many could be more appropriately categorized if a broader class (e.g., “Software”) were introduced. Consequently, only six subclasses were ultimately deemed relevant by consensus, illustrating how an iterative human-in-the-loop approach refines and validates LLM outputs to more accurately reflect real-world contexts.

Our study has several limitations. First, the scope of this study was constrained to rely solely on ChatGPT-4o for LLM-based tasks. Other LLMs, such as LLaMA2 or Gemini, may offered different levels of coverage or semantic depth^23-26. Second, the dataset was limited to abstracts from randomized controlled trials. If full-text article had been included, the result might have yielded a more comprehensive range of DHT concepts and contexts. Third, the study focused primarily on expanding and refining the ontology at broader (class and subclass) levels, with minimal exploration of product-level distinction. Fourth, the approach was tested against a single dataset, which may not capture all the nuances of DHTs in other contexts. Finally, because the LLM-extracted (List 3) and human-annotated (List 4) terms had access to the entire dataset, they were not evaluated on a separate or blind set, limiting our ability to access generalizability.

In future work, we plan to explore cross-validation with additional LLMs to assess variations in term coverage, semantic accuracy, and consistency across models. Integrating active learning techniques may further streamline the automated steps. Moreover, training specialized neural architectures like CNNs, RNNs, or fine-tuned Transformers on DHT-specific corpora could yield a more insightful understanding of DHT concepts and enhance both entity recognition and ontology expansion. Additionally, incorporating more granular product-level information, along with robust metadata (e.g., manufacturer, clinical indication), could strengthen the ontology’s utility for tasks such as systematic reviews, regulatory tracking, or clinical decision support.

Conclusion

In addition to a more in-depth human validation, future work will focus on scaling this approach across broader health technology domains, such as durable medical equipment, laboratory concepts, and HER data, while integrating multilingual resources, refining evaluation metrics for ontology quality and usability, and conducting more in-depth human validation to enhance normalization and standardization efforts. Overall, the case study findings underscore the strengths of combining machine-assisted methods with expert oversight. This integrated approach shows the potential to fill critical gaps in DHT terminology, hence guiding the ontology toward a more comprehensive and adaptive resource. By continually refining term lists through systematic, machine-assisted, and expert-validated processes, researchers can adapt the ontology to the rapidly evolving landscape of digital health.

Data Sharing

The data used in this study were generated from publicly available published outcomes of randomized controlled trials that can be retrieved through various databases including PubMed. Document processing API and Python scripts will be available on GitHub (https://github.com/OHNLP/ACCEL). All large language model output data, data analysis, and data visualization code will be made available upon reasonable request.

Funding

This work was supported by the National Institutes of Health grant R01LM11934.

Conflicts of Interest

The authors do not declare any conflicts of interest.

Author Contributions

F.C., T.H, S.F., and H.L. conceptualized the study. F.C. wrote the study. S.F. and H.L. provided primary manuscript review and edits. F.C. and H.L. conducted the pilot human validation. All authors reviewed and edited the manuscript and have access to the study data.

Figures & Tables

Reference

1.World Health Organization. Digital health. 22 May 2025. https://www.who.int/europe/health-topics/digital-health Accessed.
2.Mittermaier M, Venkatesh KP, Kvedar JC. Digital health technology in clinical trials. NPJ Digital Medicine. 2023 May 18;6(1):88. doi: 10.1038/s41746-023-00841-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Masanneck L, Gieseler P, Gordon WJ, Meuth SG, Stern AD. Evidence from ClinicalTrials. gov on the growth of Digital Health Technologies in neurology trials. npj Digital Medicine. 2023 Feb 10;6(1):23.,3. doi: 10.1038/s41746-023-00767-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Marra C, Chen JL, Coravos A, Stern AD. Quantifying the use of connected digital products in clinical research. npj Digital Medicine. 2020;3(1) [Google Scholar]
5.Harrison TM, Moon S, Wang L, Fu S, Liu H. Digital Solutions Observed in Clinical Trials: A Formative Feasibility Scoping Review. InAMIA Annual Symposium Proceedings. 2024 Jan 11;2023:p. 987. [Google Scholar]
6.Dagdelen J, Dunn A, Lee S, Walker N, Rosen AS, Ceder G, et al. Structured information extraction from scientific text with large language models. Nature Communications. 2024;15(1):1418. [Google Scholar]
7.Sun Z, Zhang R, Doi SA, Furuya-Kanamori L, Yu T, Lin L, Xu C. How good are large language models for automated data extraction from randomized trials? medRxiv. 2024:2024.02. 20.24303083.
8.Polak MP, Morgan D. Extracting accurate materials data from research papers with conversational language models and prompt engineering. Nature Communications. 2024;15(1):1569. [Google Scholar]
9.Liu H, Bielinski SJ, Sohn S, Murphy S, Wagholikar KB, Jonnalagadda SR, Ravikumar KE, Wu ST, Kullo IJ, Chute CG. An information extraction framework for cohort identification using electronic health records. AMIA Summits on Translational Science Proceedings. 2013;2013:149. [Google Scholar]
10.Wen A, Fu S, Moon S, El Wazir M, Rosenbaum A, Kaggal VC, Liu S, Sohn S, Liu H, Fan J. Desiderata for delivering NLP to accelerate healthcare AI advancement and a Mayo Clinic NLP-as-a-service implementation. npj Digital Medicine. 2019 Dec 17;2(1):1–7. doi: 10.1038/s41746-018-0076-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
11.Hey S.P., Dellapina M., Lindquist K., et al. Digital Health Technologies in Clinical Trials: An Ontology-Driven Analysis to Inform Digital Sustainability Policies. Ther Innov Regul Sci. 2023;57:1269–1278. doi: 10.1007/s43441-023-00560-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
12.Administation USFD. Digital Health Terms: Digital Health Center of Excellence; 2022. 2025. Available from: https://www.fda.gov/medical-devices/digital-health-center-excellence/digital-health-terms Accessed 22 Mar 2025.
13.Harrison T, Hu D, Jia H, Tan SH, Chen F, Zhang Z, Lu Q, Wang J, Huang M, Prokop LJ, Hoy M, St. Sauver J, Fu S, Liu H. An Informatics Framework for Accelerating Digital Health Technology Enabled Randomized Controlled Trial Candidate Guideline Item Development. 22 Mar 2025. Available at SSRN: https://ssrn.com/abstract=5137601 or http://dx.doi.org/10.2139/ssrn.5137601. Accessed.
14.Liu Z, Gan C, Wang J, Zhang Y, Bo Z, Sun M, Chen H, Zhang W. OntoTune: Ontology-Driven Self-training for Aligning Large Language Models. arXiv preprint arXiv:2502.05478. 2025 Feb 8 [Google Scholar]
15.Fathallah N, Staab S, Algergawy A. LLMs4Life: Large Language Models for Ontology Learning in Life Sciences. arXiv preprint arXiv:2412.02035. 2024 Dec 2 [Google Scholar]
16.Liu H, Bielinski SJ, Sohn S, Murphy S, Wagholikar KB, Jonnalagadda SR, Ravikumar KE, Wu ST, Kullo IJ, Chute CG. An information extraction framework for cohort identification using electronic health records. AMIA Summits on Translational Science Proceedings. 2013 Mar 18;2013:149. [Google Scholar]
17.World Health Organization. Geneva: World Health Organization; 2021. Global strategy on digital health 2020-2025. Licence: CC BY-NC-SA 3.0 IGO. [Google Scholar]
18.Cure P, Radman T, Doyle JM, Atienza AA, Fessel JP, Hartshorn CM. Digital Health Technology Research Funded by the National Institutes of Health. JAMA Netw Open. 2025;8(1):e2452976. doi: 10.1001/jamanetworkopen.2024.52976. doi:10.1001/jamanetworkopen.2024.52976. [DOI] [PubMed] [Google Scholar]
19.He H, Fu S, Wang L, Liu S, Wen A, Liu H. MedTator: a serverless annotation tool for corpus development. Bioinformatics. 2022 Mar 15;38(6):1776–8. doi: 10.1093/bioinformatics/btab880. [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Hoang-Xuan N, Vu M, Thai MT. LLM-assisted Concept Discovery: Automatically Identifying and Explaining Neuron Functions. arXiv preprint arXiv:2406.08572. 2024 Jun 12 [Google Scholar]
21.Amirizaniani M, Yao J, Lavergne A, Okada ES, Chadha A, Roosta T, Shah C. LLMAuditor: A Framework for Auditing Large Language Models Using Human-in-the-Loop. arXiv preprint arXiv:2402.09346. 2024 Feb 14 [Google Scholar]
22.Drori I, Te’eni D. Human-in-the-loop AI reviewing: feasibility, opportunities, and risks. Journal of the Association for Information Systems. 2024;25(1):98–109. [Google Scholar]
23.Tooth JM, Tuptuk N, Watson JD. A Systematic Survey of the Gemini Principles for Digital Twin Ontologies. arXiv preprint arXiv:2404.10754. 2024 Apr 16 [Google Scholar]
24.Huettemann S, Mueller R, Larsen K, Dinter B. A framework for ontology-based knowledge synthesis from research articles
25.Lippolis AS, Saeedizade MJ, Keskisärkkä R, Zuppiroli S, Ceriani M, Gangemi A, Blomqvist E, Nuzzolese AG. Ontology Generation using Large Language Models. arXiv preprint arXiv:2503.05388. 2025 Mar 7 [Google Scholar]
26.Saeedizade MJ, Blomqvist E. InEuropean Semantic Web Conference. Cham: Springer Nature Switzerland; 2024 May 19. Navigating ontology development with large language models; pp. pp. 143–161. [Google Scholar]

[r1-7991] 1.World Health Organization. Digital health. 22 May 2025. https://www.who.int/europe/health-topics/digital-health Accessed.

[r2-7991] 2.Mittermaier M, Venkatesh KP, Kvedar JC. Digital health technology in clinical trials. NPJ Digital Medicine. 2023 May 18;6(1):88. doi: 10.1038/s41746-023-00841-8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r3-7991] 3.Masanneck L, Gieseler P, Gordon WJ, Meuth SG, Stern AD. Evidence from ClinicalTrials. gov on the growth of Digital Health Technologies in neurology trials. npj Digital Medicine. 2023 Feb 10;6(1):23.,3. doi: 10.1038/s41746-023-00767-1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r4-7991] 4.Marra C, Chen JL, Coravos A, Stern AD. Quantifying the use of connected digital products in clinical research. npj Digital Medicine. 2020;3(1) [Google Scholar]

[r5-7991] 5.Harrison TM, Moon S, Wang L, Fu S, Liu H. Digital Solutions Observed in Clinical Trials: A Formative Feasibility Scoping Review. InAMIA Annual Symposium Proceedings. 2024 Jan 11;2023:p. 987. [Google Scholar]

[r6-7991] 6.Dagdelen J, Dunn A, Lee S, Walker N, Rosen AS, Ceder G, et al. Structured information extraction from scientific text with large language models. Nature Communications. 2024;15(1):1418. [Google Scholar]

[r7-7991] 7.Sun Z, Zhang R, Doi SA, Furuya-Kanamori L, Yu T, Lin L, Xu C. How good are large language models for automated data extraction from randomized trials? medRxiv. 2024:2024.02. 20.24303083.

[r8-7991] 8.Polak MP, Morgan D. Extracting accurate materials data from research papers with conversational language models and prompt engineering. Nature Communications. 2024;15(1):1569. [Google Scholar]

[r9-7991] 9.Liu H, Bielinski SJ, Sohn S, Murphy S, Wagholikar KB, Jonnalagadda SR, Ravikumar KE, Wu ST, Kullo IJ, Chute CG. An information extraction framework for cohort identification using electronic health records. AMIA Summits on Translational Science Proceedings. 2013;2013:149. [Google Scholar]

[r10-7991] 10.Wen A, Fu S, Moon S, El Wazir M, Rosenbaum A, Kaggal VC, Liu S, Sohn S, Liu H, Fan J. Desiderata for delivering NLP to accelerate healthcare AI advancement and a Mayo Clinic NLP-as-a-service implementation. npj Digital Medicine. 2019 Dec 17;2(1):1–7. doi: 10.1038/s41746-018-0076-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r11-7991] 11.Hey S.P., Dellapina M., Lindquist K., et al. Digital Health Technologies in Clinical Trials: An Ontology-Driven Analysis to Inform Digital Sustainability Policies. Ther Innov Regul Sci. 2023;57:1269–1278. doi: 10.1007/s43441-023-00560-y. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r12-7991] 12.Administation USFD. Digital Health Terms: Digital Health Center of Excellence; 2022. 2025. Available from: https://www.fda.gov/medical-devices/digital-health-center-excellence/digital-health-terms Accessed 22 Mar 2025.

[r13-7991] 13.Harrison T, Hu D, Jia H, Tan SH, Chen F, Zhang Z, Lu Q, Wang J, Huang M, Prokop LJ, Hoy M, St. Sauver J, Fu S, Liu H. An Informatics Framework for Accelerating Digital Health Technology Enabled Randomized Controlled Trial Candidate Guideline Item Development. 22 Mar 2025. Available at SSRN: https://ssrn.com/abstract=5137601 or http://dx.doi.org/10.2139/ssrn.5137601. Accessed.

[r14-7991] 14.Liu Z, Gan C, Wang J, Zhang Y, Bo Z, Sun M, Chen H, Zhang W. OntoTune: Ontology-Driven Self-training for Aligning Large Language Models. arXiv preprint arXiv:2502.05478. 2025 Feb 8 [Google Scholar]

[r15-7991] 15.Fathallah N, Staab S, Algergawy A. LLMs4Life: Large Language Models for Ontology Learning in Life Sciences. arXiv preprint arXiv:2412.02035. 2024 Dec 2 [Google Scholar]

[r16-7991] 16.Liu H, Bielinski SJ, Sohn S, Murphy S, Wagholikar KB, Jonnalagadda SR, Ravikumar KE, Wu ST, Kullo IJ, Chute CG. An information extraction framework for cohort identification using electronic health records. AMIA Summits on Translational Science Proceedings. 2013 Mar 18;2013:149. [Google Scholar]

[r17-7991] 17.World Health Organization. Geneva: World Health Organization; 2021. Global strategy on digital health 2020-2025. Licence: CC BY-NC-SA 3.0 IGO. [Google Scholar]

[r18-7991] 18.Cure P, Radman T, Doyle JM, Atienza AA, Fessel JP, Hartshorn CM. Digital Health Technology Research Funded by the National Institutes of Health. JAMA Netw Open. 2025;8(1):e2452976. doi: 10.1001/jamanetworkopen.2024.52976. doi:10.1001/jamanetworkopen.2024.52976. [DOI] [PubMed] [Google Scholar]

[r19-7991] 19.He H, Fu S, Wang L, Liu S, Wen A, Liu H. MedTator: a serverless annotation tool for corpus development. Bioinformatics. 2022 Mar 15;38(6):1776–8. doi: 10.1093/bioinformatics/btab880. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r20-7991] 20.Hoang-Xuan N, Vu M, Thai MT. LLM-assisted Concept Discovery: Automatically Identifying and Explaining Neuron Functions. arXiv preprint arXiv:2406.08572. 2024 Jun 12 [Google Scholar]

[r21-7991] 21.Amirizaniani M, Yao J, Lavergne A, Okada ES, Chadha A, Roosta T, Shah C. LLMAuditor: A Framework for Auditing Large Language Models Using Human-in-the-Loop. arXiv preprint arXiv:2402.09346. 2024 Feb 14 [Google Scholar]

[r22-7991] 22.Drori I, Te’eni D. Human-in-the-loop AI reviewing: feasibility, opportunities, and risks. Journal of the Association for Information Systems. 2024;25(1):98–109. [Google Scholar]

[r23-7991] 23.Tooth JM, Tuptuk N, Watson JD. A Systematic Survey of the Gemini Principles for Digital Twin Ontologies. arXiv preprint arXiv:2404.10754. 2024 Apr 16 [Google Scholar]

[r24-7991] 24.Huettemann S, Mueller R, Larsen K, Dinter B. A framework for ontology-based knowledge synthesis from research articles

[r25-7991] 25.Lippolis AS, Saeedizade MJ, Keskisärkkä R, Zuppiroli S, Ceriani M, Gangemi A, Blomqvist E, Nuzzolese AG. Ontology Generation using Large Language Models. arXiv preprint arXiv:2503.05388. 2025 Mar 7 [Google Scholar]

[r26-7991] 26.Saeedizade MJ, Blomqvist E. InEuropean Semantic Web Conference. Cham: Springer Nature Switzerland; 2024 May 19. Navigating ontology development with large language models; pp. pp. 143–161. [Google Scholar]

PERMALINK

A Machine-Assisted Framework for Ontology Development and Standardization: Case Study in Digital Health Technologies

Fang Chen, MS

Taylor B Harrison, MBA

Sunyang Fu, PhD

Ling He, MS

Zhiyi Yue, MA

Shuyu Lu

Liwei Wang, MD, PhD

Xiaoyang Ruan, PhD

Hongfang Liu, PhD

Abstract

Introduction

Methods

Figure 1.

DHT Study Abstract Collection

DHT Term Lists Preparation

Figure 2.

Automated DHT Term Mapping via MedTagger

Box 1. Zero-Shot Learning Prompt for LLMAssisted DHT Term Expansion.

LLM-Assisted Classification and Ontology Expansion

Box 2. Zero-Shot Learning Prompt for LLM-Assisted DHT Term Classification and Ontology Expansion.

Human-In-The-Loop Validation

Results

DHT Study Abstract Collection

Figure 3.

DHT Term Lists Preparation

Automated DHT Term Mapping via MedTagger

Figure 4.

LLM-Assisted Classification and Ontology Expansion

Table 1.

Pilot Human-In-The-Loop Validation

Table 2.

Discussion

Conclusion

Data Sharing

Funding

Conflicts of Interest

Author Contributions

Figures & Tables

Reference

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases