Journal of the American Medical Informatics Association (JAMIA). 2024 Aug 29;31(11):2622–2631. doi:10.1093/jamia/ocae233

Relation extraction using large language models: a case study on acupuncture point locations

Yiming Li 1, Xueqing Peng 2, Jianfu Li 3, Xu Zuo 4, Suyuan Peng 5, Donghong Pei 6, Cui Tao 7, Hua Xu 8, Na Hong 9
PMCID: PMC11491641  PMID: 39208311

Abstract

Objective

In acupuncture therapy, the accurate location of acupoints is essential for its effectiveness. The advanced language understanding capabilities of large language models (LLMs) like Generative Pre-trained Transformers (GPTs) and Llama present a significant opportunity for extracting relations related to acupoint locations from textual knowledge sources. This study aims to explore the performance of LLMs in extracting acupoint-related location relations and assess the impact of fine-tuning on GPT’s performance.

Materials and Methods

We utilized the World Health Organization Standard Acupuncture Point Locations in the Western Pacific Region (WHO Standard) as our corpus, which consists of descriptions of 361 acupoints. Five types of relations ("direction_of", "distance_of", "part_of", "near_acupoint", and "located_near"; n = 3174) involving acupoints were annotated. Four models were compared: pre-trained GPT-3.5, fine-tuned GPT-3.5, pre-trained GPT-4, and pre-trained Llama 3. Performance metrics included micro-average exact match precision, recall, and F1 scores.

Results

Our results demonstrate that fine-tuned GPT-3.5 consistently outperformed other models in F1 scores across all relation types. Overall, it achieved the highest micro-average F1 score of 0.92.

Discussion

The superior performance of the fine-tuned GPT-3.5 model, as shown by its F1 scores, underscores the importance of domain-specific fine-tuning in enhancing relation extraction capabilities for acupuncture-related tasks. These findings offer valuable insights into leveraging LLMs for developing clinical decision support tools and creating educational modules in acupuncture.

Conclusion

This study underscores the effectiveness of LLMs like GPT and Llama in extracting relations related to acupoint locations, with implications for accurately modeling acupuncture knowledge and promoting standard implementation in acupuncture training and practice. The findings also contribute to advancing informatics applications in traditional and complementary medicine, showcasing the potential of LLMs in natural language processing.

Keywords: acupuncture points, large language model, GPT, relation extraction, prompt tuning

Introduction

Acupuncture, originating from ancient Chinese medicine, has a history dating back thousands of years.1,2 It was introduced to the United States in the 1970s, gaining gradual acceptance within the medical community for its therapeutic effects.3,4 According to the World Health Organization 2019 report, acupuncture is the most widely used traditional and complementary medicine, practiced in 113 out of 120 countries.5

Acupuncture involves the insertion of thin needles into specific points on the body known as acupoints, which are believed to correspond to channels that conduct Qi, or vital energy.6,7 Clinical studies have shown acupuncture to be effective in treating a variety of conditions, including chronic pain, migraines, and osteoarthritis.8–13 It is also used to alleviate symptoms associated with chemotherapy, such as nausea and vomiting.14 The World Health Organization recognizes acupuncture as a viable treatment option for a range of disorders, including respiratory, digestive, and neurological conditions.15–17

The rationale behind acupuncture lies in its emphasis on the precise location of acupoints.18,19 These points are located along meridians, or energy pathways, where Qi flows.20 In practice, the accurate location of acupoints is essential to ensure the proper flow of Qi, and acupuncturists need to understand the anatomical structures of the body to locate acupuncture points accurately and ensure the safety and effectiveness of acupuncture procedures.21 Moreover, acupuncture localization adheres to the concept of "body cun" (同身寸), in which acupuncture points are determined based on certain lengths on the patient's body surface; acupuncture therapy does not rely on fixed physical dimensions but adjusts according to the patient's body characteristics. While some acupoints are easily identifiable, others can be challenging to locate, especially for beginners.22 Acupuncturists undergo extensive training to master the palpation techniques and anatomical knowledge necessary for accurate point location. Several studies have suggested that accuracy and precision vary when acupuncture points are located using different methods.23 Given the variance in treatment outcomes among these point location techniques, it is essential to leverage informatics technologies to structure and computerize the most critical point location knowledge for assisting acupuncture training and practice.

While there are numerous studies on relation extraction (RE) from biomedical literature, none have yet delved into RE from textual acupuncture knowledge. He et al used the CHEMPROT dataset from BioCreative VI and the drug-drug interaction (DDI) dataset to develop a specialized prompt tuning model for biomedical RE, showing its effectiveness in few-shot learning.24 El-Allaly et al proposed ADERel, an attentive joint model with a transformer-based weighted graph convolutional network for extracting adverse drug event (ADE) relations.25 ADERel formulates the ADE RE task as N-level sequence labeling, leveraging contextual and structural information.25 The performance of traditional machine learning and deep learning models for RE can be limited by several factors. One major challenge is accurately capturing complex relationships between entities, especially in context-dependent or inter-sentence scenarios, leading to reduced performance in tasks requiring high precision and recall.26 Additionally, model performance can suffer from inadequate or noisy training data, affecting generalization.26,27 Moreover, the reliance on pre-trained language models such as Bidirectional Encoder Representations from Transformers (BERT), or on architectures such as Long Short-Term Memory (LSTM) networks, can also pose challenges, as these models may not be well-suited for all RE tasks and may require extensive fine-tuning and adaptation.28

Large language models (LLMs) have revolutionized natural language processing (NLP) by leveraging deep learning techniques to understand and generate human-like text.29,30 Among these models, Llama, which includes several iterations such as Llama 2 and the latest Llama 3, has made substantial contributions to NLP by offering robust performance across various tasks. The Generative Pre-trained Transformer (GPT) stands out as another prominent example. Developed by OpenAI, GPT has gained widespread attention for its ability to perform a variety of language tasks, including text completion, translation, and summarization.31,32 What sets GPT apart is its architecture, which is based on transformer neural networks.33 This architecture allows GPT to effectively capture long-range dependencies in text, enabling it to generate coherent and contextually relevant responses.34,35 Li et al evaluated multiple pre-trained and fine-tuned LLMs on their ability to extract adverse events (AEs) from notes in the Vaccine Adverse Event Reporting System (VAERS), demonstrating LLMs' capability in named entity recognition (NER) tasks.36 However, few studies have explored the use of LLMs for extracting relations between entities, especially in a specific medical domain. To investigate the ability of LLMs to model acupuncture knowledge, we launched this study as an in-depth extension of our prior work on acupuncture knowledge extraction, which focused on extracting acupuncture point location entities from the World Health Organization Standard Acupuncture Point Locations in the Western Pacific Region (WHO Standard).37

In this study, our objective is to leverage the power of LLMs for RE in the context of acupuncture points and human anatomy. We also aim to compare the performance between pre-trained and fine-tuned LLMs. This comparative analysis will provide insights into the effectiveness of using GPT and Llama for RE tasks in the domain of acupuncture, and highlight the potential of LLMs to advance modeling of acupuncture knowledge.

Methods

Figure 1 shows the workflow of our study, which aimed to extract relations between acupuncture points using various models. We utilized data from the WHO Standard as the data source. We manually annotated and studied 5 types of relations. Four models were studied: pre-trained GPT-3.5-turbo, fine-tuned GPT-3.5-turbo, pre-trained GPT-4, and pre-trained Llama 3.

Figure 1.

Overview of the framework. The workflow begins with the WHO Standard acupuncture point locations as the data source. Manual annotation identifies entities (Acupoint, Anatomy, Direction, Distance, and Subpart) and the relations connecting them (direction_of, distance_of, part_of, near_acupoint, and located_near). Prompts and annotated data are supplied to the pre-trained and fine-tuned models, and the predicted relations are evaluated against the gold standard with precision, recall, and F1 score.

Data source

We utilized the WHO Standard as the data source, a key resource released in 2008 to standardize acupuncture point locations.37,38 It elaborates on the locations of 361 acupoints across 14 meridian systems and specifies the methodology for locating acupoints on the human body.37,38 This standard aims to improve scientific communication and evidence-based practice in acupuncture, enhancing the accessibility and value of Eastern medicine in healthcare.37,38

Annotation

In this study, we used the annotated entities from the research carried out by Li et al.37 There are 5 types of entities associated with this task: Acupoint, Anatomy, Direction, Distance, and Subpart. To precisely locate acupoints, we investigated 5 relation types: direction_of indicates the Direction of a relative Acupoint or Anatomy to the Acupoint of interest; distance_of denotes the Distance to that Acupoint/Anatomy; part_of represents the Subpart of an Anatomy; near_acupoint refers to adjacency to a relative Acupoint; and located_near signifies proximity of an Anatomy to the Acupoint of interest (Table 1). In total, 3174 relation pairs were annotated. All annotations were performed using CLAMP (version 1.6.6).39

Table 1.

Types of relations annotated for SP8 (Diji) in WHO Standard.

Relation type  Definition  Example
direction_of  The Direction of one relative Acupoint or Anatomy to the Acupoint of interest  [graphical annotation example in the original]
distance_of  The Distance to the Acupoint/Anatomy  [graphical annotation example in the original]
part_of  The Subpart of the Anatomy  [graphical annotation example in the original]
near_acupoint  Adjacency to the relative Acupoint  [graphical annotation example in the original]
located_near  Proximity of the Anatomy to the Acupoint of interest  [graphical annotation example in the original]

In our annotation examples, acupoints were described using detailed spatial relationships, such as SP8 (Diji): “On the tibial aspect of the leg, posterior to the medial border of the tibia, 3 B-cun inferior to SP9.” This narrative includes various key relationships, as illustrated in Table 1. For instance, the terms “inferior to” and “SP9” describe the acupoint’s position below SP9, highlighting the “direction_of” relationship. The distance between “3 B-cun” and “SP9” is expressed as a “distance_of” relation, indicating that SP8 is located 3 B-cun inferior to SP9. Furthermore, relationships such as “tibial aspect of” and “leg”, and “medial border of” and “tibia,” provide additional context by specifying which part of the body the acupoint is located on and its proximity to specific anatomical landmarks, representing “part_of” relationships. Additionally, the “near_acupoint” relation is demonstrated by the proximity of “SP8” and “SP9”, indicating that these acupoints are close to each other. Lastly, the “located_near” relation is exemplified by “SP8” being near both the “leg” and the “tibia”, highlighting its close proximity to these anatomical structures.
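As an illustration of this scheme, the sketch below represents part of the SP8 annotations in a brat-style standoff structure similar to what annotation tools such as CLAMP export. The entity identifiers, the offset helper, and the relation selection are illustrative, not the study's actual data structures.

```python
from dataclasses import dataclass

@dataclass
class Entity:
    id: str       # eg, "T1"
    type: str     # Acupoint, Anatomy, Direction, Distance, or Subpart
    start: int    # character offsets into the description
    end: int
    text: str

@dataclass
class Relation:
    id: str       # eg, "R1"
    type: str     # one of the 5 relation types
    arg1: str     # source entity id
    arg2: str     # target entity id

text = ("On the tibial aspect of the leg, posterior to the medial "
        "border of the tibia, 3 B-cun inferior to SP9.")

def span(mention: str) -> tuple[int, int]:
    """Locate a mention's character offsets in the description."""
    start = text.find(mention)
    return start, start + len(mention)

entities = [
    Entity("T1", "Subpart",   *span("tibial aspect of"), "tibial aspect of"),
    Entity("T2", "Anatomy",   *span("leg"),              "leg"),
    Entity("T3", "Direction", *span("inferior to"),      "inferior to"),
    Entity("T4", "Distance",  *span("3 B-cun"),          "3 B-cun"),
    Entity("T5", "Acupoint",  *span("SP9"),              "SP9"),
]

relations = [
    Relation("R1", "part_of",      "T1", "T2"),  # tibial aspect of -> leg
    Relation("R2", "direction_of", "T3", "T5"),  # inferior to -> SP9
    Relation("R3", "distance_of",  "T4", "T5"),  # 3 B-cun -> SP9
]
```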

Two domain experts were invited to perform the annotation task. To evaluate the inter-annotator agreement, we employed Cohen’s kappa statistic for each relation type. The kappa scores provided a quantitative measure of consistency between the annotators’ evaluations. In cases where conflicts arose, the annotators engaged in discussions and negotiations to resolve discrepancies, ultimately reaching a consensus. This process ensured the construction of a reliable and consistent gold standard for our study.

Model and experiment setup

In this study, we explored the effectiveness of pre-trained GPT-3.5, fine-tuned GPT-3.5, pre-trained GPT-4, and Llama 3 models in extracting acupoint-based relations. GPTs, including GPT-3.5 and GPT-4, are Transformer-based models pre-trained on vast text corpora.40 Successive versions have advanced in architecture and parameter size, generating coherent, contextually relevant text for diverse NLP tasks such as NER, question answering, translation, and summarization.40–42 Llama 3, developed by Meta, enhances natural language understanding and generation, improving upon Llama 2 in architecture, efficiency, and performance. It is released in variants ranging from 8 to 70 billion parameters and processes sequences of up to 8192 tokens, handling complex tasks effectively.43,44

In the experiment setup, the dataset consisting of 361 acupoints was randomly divided into training and test sets in an 8:2 ratio. This division ensured that 80% (N = 288) of the acupoints were allocated to the training set for model training, while the remaining 20% (N = 73) were reserved for evaluating the model’s performance on unseen data in the test set.
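A minimal sketch of this split, assuming a simple random shuffle with a fixed seed (the paper does not report a seed, and the acupoint identifiers are placeholders):

```python
import random

# Stand-ins for the 361 acupoint descriptions in the WHO Standard
acupoints = [f"acupoint_{i:03d}" for i in range(361)]

random.seed(42)          # seed is an assumption, used here for reproducibility
random.shuffle(acupoints)

n_train = int(0.8 * len(acupoints))   # 288
train_set = acupoints[:n_train]       # 288 acupoints for model training
test_set = acupoints[n_train:]        # 73 held-out acupoints for evaluation

print(len(train_set), len(test_set))  # 288 73
```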

We further analyzed the dataset to examine the distribution of each relation type within the training and test sets. This analysis provided insights into the balance of relation types and ensured that the model was trained and evaluated on a diverse set of relations, contributing to its robustness and generalization ability. The counts of each relation type in the training and test sets are summarized in Table 2 below.

Table 2.

Relation statistics in the training and test set.

Relation type Training Test Total
direction_of 694 160 854
distance_of 238 66 304
part_of 467 107 574
near_acupoint 154 49 203
located_near 992 247 1239

GPT

In Table S1, we present the key parameters used for the GPT models in our study. For all GPT models, including pre-trained GPT-3.5, fine-tuned GPT-3.5, and pre-trained GPT-4, we set the temperature to 0.3 and the maximum number of tokens to 4096. For fine-tuned GPT-3.5, we used 2 epochs, a batch size of 1, and a learning rate multiplier of 1 during the fine-tuning process. These parameters were chosen to balance the model's performance and computational efficiency, ensuring effective training and inference for RE tasks.
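As a rough sketch of how these settings map onto the OpenAI Python SDK (the SDK version and file handling used in the study are not reported; the training file ID, fine-tuned model name, and prompt variable are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Fine-tuning job with the reported hyperparameters
# (epochs = 2, batch size = 1, learning rate multiplier = 1).
job = client.fine_tuning.jobs.create(
    model="gpt-3.5-turbo",
    training_file="file-XXXX",  # placeholder ID of the uploaded training set
    hyperparameters={
        "n_epochs": 2,
        "batch_size": 1,
        "learning_rate_multiplier": 1,
    },
)

# Inference with the decoding settings used for all GPT models
# (temperature = 0.3, max tokens = 4096).
prompt = "..."  # a relation extraction prompt (see the Prompt section below)
response = client.chat.completions.create(
    model="ft:gpt-3.5-turbo:...",  # placeholder name of the finished fine-tune
    temperature=0.3,
    max_tokens=4096,
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```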

Llama

In this study, we utilized the Llama 3 model, specifically the Llama 3 8B Instruct variant, to perform our experiments. This configuration includes 8 billion parameters, making it a robust choice for our NLP tasks. We set the maximum sequence length to 8192 tokens, allowing the model to handle extensive and detailed input effectively. The maximum batch size was set to 6, which optimizes computational efficiency and ensures that the model processes data in manageable chunks without compromising performance. This setup facilitated our exploration of spatial RE within traditional Chinese medicine (TCM) and acupuncture, leveraging the advanced capabilities of Llama 3 to achieve accurate and contextually relevant results.
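A minimal inference sketch with the Hugging Face transformers library, assuming the publicly released Meta-Llama-3-8B-Instruct checkpoint (the paper does not name its checkpoint source, and the generation budget is an assumption):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "..."  # a relation extraction prompt (see the Prompt section below)
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Greedy decoding keeps the extraction output deterministic; the
# 512-token budget for the generated relations is an assumption.
output = model.generate(input_ids, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:],
                       skip_special_tokens=True))
```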

Prompt

In this study, we prompted the pre-trained GPT and Llama models for each relation type using specific prompts tailored to extract the corresponding relations. For instance, the prompt for the "direction_of" relation was formatted as follows:

"Textual information is below.\n——————————-\n{txt}\n——————————-\nEntity information is below(Format is:'T(id)\t(entity_type) (start_offset) (end_offset)'\t'(entity_text)'){entities}\nQuery:Now we have to conduct the relation extraction task. We want to extract 'direction_of' first. 'direction_of' refers to the direction of one relative 'Acupoint'or 'Anatomy' to the 'Acupoint'. The starting entity type of 'direction_of' is 'Direction' entity and the end entity of 'direction_of' is the relative 'Acupoint' or 'Anatomy' entity. Entities({entities}) have been annotated. And each entity is formatted as "'Entity sequence number''\t''Entity type'' ''starting character location'' ''end character location''\t''Entity'". To facilitate output extraction, please output the relation directly and format the relation as "'Relation sequence number(Starting from R1, R2,…)''\t''direction_of'' ''Arg1:Entity sequence number'' ''Arg2:Entity sequence number''\n'"from the following text. If no relations, please do not output anything. "

In Figure 2, we provide a training example for the "direction_of" RE task, using the acupoint BL12 as an illustrative case. This approach allowed us to fine-tune the GPT model for each relation type, enhancing its ability to extract specific relations accurately and efficiently from textual knowledge.

Figure 2.

One example from the training set for the "direction_of" relation of BL12 (Fengmen). The input prompt contains the instructions, the acupoint description, and the annotated entities; the expected output ("Annotations") lists the directional relations between those entities in the required format.

Fine-tuning and inference for the pre-trained GPT models were conducted on a server equipped with 8 Nvidia A100 GPUs, each with a memory capacity of 80 GB. In contrast, Llama was evaluated on a server with 5 Nvidia V100 GPUs, each providing a memory capacity of 32 GB.

Evaluation

In evaluating the performance of our models, we used our annotations as the gold standard. The micro-average approach pools true positives, false positives, and false negatives across all relation types when comparing predictions against the gold standard, and then computes precision, recall, and F1 score. The formulas are shown below:

$$\text{Precision} = \frac{\text{True positive}}{\text{True positive} + \text{False positive}},$$
$$\text{Recall} = \frac{\text{True positive}}{\text{True positive} + \text{False negative}},$$
$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}.$$
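Concretely, pooling exact-match relation triples across all relation types, a micro-averaged evaluation can be sketched as follows (the gold and predicted sets shown are toy examples):

```python
def micro_prf(gold: set, pred: set) -> tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over exact-match triples."""
    tp = len(gold & pred)   # predictions that appear in the gold standard
    fp = len(pred - gold)   # predictions absent from the gold standard
    fn = len(gold - pred)   # gold relations the model missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Toy example: each relation is a (type, arg1, arg2) triple.
gold = {("direction_of", "T1", "T2"), ("part_of", "T3", "T4")}
pred = {("direction_of", "T1", "T2"), ("located_near", "T5", "T6")}
print(micro_prf(gold, pred))  # (0.5, 0.5, 0.5)
```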

Results

Table 3 shows high inter-rater agreement for all relation types, with kappa statistics ranging from 0.87 to 1.00 and an overall value of 0.91.

Table 3.

Inter-rater agreement.

Relation type  Kappa statistic
direction_of  0.91
distance_of  1.00
part_of  0.87
near_acupoint  0.94
located_near  0.92
Overall  0.91

The performance of different models is displayed in Figure 3. Figure 3A shows the precision results for different models across relation types. The fine-tuned GPT-3.5 model achieved the highest precision of 0.96 for the "direction_of" relation, demonstrating its accuracy in identifying relative acupoint directions. It also performed well for the "located_near" relation, with a precision of 0.95. In contrast, the Llama 3 model exhibited lower precision across all relation types. Overall, the fine-tuned GPT-3.5 model had the highest micro-average precision of 0.91.

Figure 3.

Performance on each relation type by different large language models: pre-trained GPT-3.5, fine-tuned GPT-3.5, pre-trained GPT-4, and Llama 3. Scores were averaged over 10 runs. (A) Precision. (B) Recall. (C) F1. Each panel compares the models on direction_of, distance_of, part_of, near_acupoint, and located_near, as well as the micro-average across all relations.

Figure 3B highlights the recall results, showing that the fine-tuned GPT-3.5 model achieved the highest recall for most relation types, particularly excelling in the "direction_of" and "located_near" relations with recalls of 0.99 and 0.97, respectively. The pre-trained GPT-4 model had lower recall scores, suggesting difficulty in capturing acupoint-related details. The Llama 3 model's recall varied, higher for some relations such as "direction_of" and "part_of" but lower for others. The fine-tuned GPT-3.5 model demonstrated the highest micro-average recall of 0.94.

Figure 3C presents the F1 scores, with the fine-tuned GPT-3.5 model achieving the highest scores across most relation types, especially "direction_of" and "located_near" with F1 scores of 0.97 and 0.96, respectively. The pre-trained GPT-4 model had lower F1 scores, while the Llama 3 model showed varied performance. Overall, the fine-tuned GPT-3.5 model had the highest micro-average F1 score of 0.92, highlighting its effectiveness in RE for acupuncture.

Discussion

This study provides a comprehensive analysis of RE tasks related to acupoint locations. We compared the performance of LLMs on these tasks and examined the performance difference between the pre-trained LLMs and the fine-tuned GPT-3.5 model.

Our results indicate that the fine-tuned GPT-3.5 model outperformed the pre-trained GPT and Llama models across all evaluated relation types. Specifically, the fine-tuned GPT-3.5 model achieved the highest F1 scores, demonstrating its effectiveness in accurately extracting relations related to acupoints. Furthermore, we observed that the pre-trained GPT models, while strong in general language tasks, did not perform as well as the fine-tuned GPT model in RE for acupuncture point locations. This suggests that fine-tuning the GPT model on domain-specific data significantly improves its ability to extract complex body location relations related to acupoints.

Additionally, fine-tuning the GPT model on domain-specific data enables it to learn the nuances and complexities of RE in acupuncture, leading to improved performance compared to models that are not fine-tuned. Fine-tuning allows the model to adapt its weights and parameters to the specific characteristics of the acupuncture RE task, enhancing its ability to extract accurate and contextually relevant relations.

Our study contributes to the field of healthcare informatics, as well as traditional and complementary medicine, in several significant ways. Firstly, our information extraction approach facilitates the automatic discovery of acupuncture knowledge, enriching the existing body of knowledge and potentially uncovering novel insights into acupoint relationships and their therapeutic effects. Accurate acupoint locations are vital for achieving desired therapeutic effects, and our work contributes to a more accurate and reliable foundation for acupuncture practice.23,45 Secondly, we provide a comparative analysis of the performance differences between LLMs, as well as between pre-trained LLMs and fine-tuned LLMs in the domain of information extraction. This comparative study not only sheds light on the effectiveness of using LLMs like GPT and Llama 3 for RE tasks but also provides insights into the benefits of fine-tuning models for specific domains, such as acupuncture. While it is well-known that fine-tuning generally improves performance, our study quantifies these improvements in the specific context of acupuncture-related information extraction, offering a more comprehensive understanding of how different LLMs perform on domain-specific tasks. This can inform future research and practical applications by highlighting the specific gains achievable through fine-tuning LLMs in specialized fields. Thirdly, our study significantly advances the field of spatial RE by leveraging and fine-tuning LLMs, an approach that has not been explored in the context of TCM before. Unlike conventional anatomical RE, which often relies on established Western medical concepts and structured anatomical hierarchies, TCM involves unique acupoints and meridians with distinct, complex relationships that do not directly align with Western anatomical frameworks, necessitating tailored NLP techniques for accurate understanding.37,46,47 Our approach, using LLMs fine-tuned for TCM, offers a novel contribution by adapting state-of-the-art NLP techniques to this specialized domain. The application of LLMs to this domain not only demonstrates their versatility but also sets a new benchmark for NLP applications in alternative medicine.

From the clinical application perspective, our study offers important insights into the use of LLMs in clinical settings, particularly for developing acupuncture clinical decision support (CDS) tools. Integrating extracted acupoint location relations into these tools can help acupuncturists with precise acupoint selection and real-time recommendations tailored to patient-specific factors, enhancing treatment precision and effectiveness while providing personalized, context-aware guidance to improve clinical outcomes. Furthermore, the findings from our study can be utilized to develop educational modules and training programs for acupuncturists. These programs can include interactive tools and simulations that demonstrate the relationships between acupoints and anatomical structures. By incorporating these insights into training curricula, precise acupoint locations can improve the accuracy and effectiveness of acupuncture practice. The use of advanced technologies such as virtual reality and augmented reality in these educational tools can provide acupuncturists with immersive learning experiences, enabling them to master complex concepts and techniques more effectively.

One of the key strengths of our study is the utilization of the WHO Standard as our corpus. This widely accepted resource provides a formal and standardized conceptual framework for acupuncture point locations, ensuring the accuracy and reliability of our results. Moreover, our study highlights the potential of LLMs and their fine-tuning for RE in traditional and complementary medicine domains, showcasing their adaptability and versatility.

Limitations of this study include the use of a specific dataset whose limited size and format may not fully represent the diversity of textual knowledge related to acupuncture, and the challenges of manual annotation, which is prone to error and bias. Additionally, fine-tuning of GPT models does not support traditional cross-validation techniques. Instead, we addressed this limitation by conducting multiple runs and averaging the scores, which provides a more reliable estimate of model performance while acknowledging the inherent constraints of fine-tuning GPT models. Future work will focus on domain adaptation and the integration of acupuncture-related knowledge to improve model performance on textual knowledge sources.

Error analysis

We conducted an error analysis of the fine-tuned GPT-3.5 model’s performance in relation identification, as shown in Table 4. The analysis revealed varying error rates across different relation types, with notable findings in the “near_acupoint” relation. This relation type exhibited a high false positive rate of 32.5% and a relatively high false negative rate of 26.53%. These results suggest that the model struggles with accurately identifying acupoints that are close to each other, possibly due to the complexity of spatial relationships and the nuanced context required for such distinctions.

Table 4.

Error analysis for fine-tuned GPT-3.5.

Relation type  False positive (out of machine-annotated relations)  False negative (out of human-annotated relations)  Incorrect relation type (out of machine-annotated relations)
direction_of  11/168, 6.55%  1/160, 0.63%  0/168, 0%
distance_of  13/75, 17.33%  1/66, 1.52%  0/75, 0%
part_of  16/119, 13.45%  1/107, 0.93%  0/119, 0%
near_acupoint  13/40, 32.50%  13/49, 26.53%  0/40, 0%
located_near  13/252, 5.16%  4/247, 1.62%  1/252, 0.40%

Additionally, all classes showed high false positive rates compared with the other 2 error types, false negatives and incorrect relation types. Despite these challenges, the model demonstrated a low error rate in assigning incorrect relation types, indicating a strong understanding of relation semantics in the context of acupoint descriptions. These findings highlight the model's strengths and weaknesses in relation identification, pointing to areas for improvement, particularly in spatial relation understanding.

Based on the detailed error analysis of the fine-tuned GPT-3.5 model in RE for acupoint descriptions, several key observations and patterns were uncovered. The model occasionally fails to identify adjacent acupoints, as seen in the description of ST38 (Figure 4), where it misses the "near_acupoint" relationship between ST38 and ST35 or ST41. This indicates difficulty in understanding spatial relationships within complex descriptions.

Figure 4.

Error analysis for ST38 (Tiaokou). (A) Gold standard annotation for ST38, illustrating the correct relations between entities. (B) Relation predictions of the fine-tuned GPT-3.5 model, highlighting where the predictions align with or diverge from the gold standard.

The model also struggles with identifying relationships that span sentences, as seen in the description of ST31 ("ST31 is located at the deepest point in the depression inferior to the apex of this triangle. Note 2: ST31 is located at the intersection of the line connecting the lateral end of the base of the patella with the anterior superior iliac spine, and the horizontal line of the inferior border of the pubic symphysis."), where it incorrectly identifies a "direction_of" relationship between "inferior to" and "anterior superior iliac spine." This error may stem from the model's inability to effectively track and integrate information across multiple sentences: it may lack the contextual understanding needed to recognize that "inferior to" does not relate to "the anterior superior iliac spine." Additionally, in the description of LR10 ("LR10: On the medial aspect of the thigh, 3 B-cun distal to ST30, over the artery."), the model incorrectly identifies a "part_of" relationship between "medial aspect of" and "thigh." This suggests that the model may struggle with accurately identifying anatomical relationships between acupoints and body parts, possibly due to a shallow understanding of anatomical structures and their spatial relationships.

While less frequent than false positives, false negatives also occur, as seen in the description of BL38 (“BL38: On the posterior aspect of the knee, just medial to the biceps femoris tendon, 1 B-cun proximal to the popliteal crease. Note: With the knee in slight flexion, BL38 is located medial to the biceps femoris tendon, 1 B-cun proximal to BL39.”) where the model fails to detect the “located_near” relationship between “BL38” or “BL39” and “knee.” This indicates a limitation in the model’s ability to capture subtle spatial relationships. The errors may be due to a lack of comprehensive knowledge about acupoints, preventing it from integrating acupoint information with anatomical knowledge effectively. Additionally, the model may not have been trained on a sufficiently large corpus of acupoint-related data, limiting its ability to learn complex relationships. Relations between anatomical structures and acupoints are inherently complex and may be challenging for the model to detect accurately.

These errors highlight limitations in the model's ability to contextualize information over multiple sentences and to interpret complex anatomical relationships accurately. To address these issues, it may be beneficial to expand the model's knowledge base on acupoints, enlarge its training data with a larger and more diverse corpus, and enhance its ability to comprehend context across sentences. Strengthening the model's contextual understanding and its knowledge of anatomical structures and their relationships could mitigate these errors and improve its accuracy and reliability in RE for acupoint descriptions.

Conclusion

This study underscores the effectiveness of LLMs like GPT in extracting relations related to acupoint locations, highlighting their value in the acupuncture domain. By utilizing LLMs, we have demonstrated the potential for accurate modeling of acupoint location knowledge, promoting precise localization of acupoints in acupuncture practice. The findings also contribute to advancing informatics applications in traditional and complementary medicine, showcasing the potential of LLMs in NLP.


Contributor Information

Yiming Li, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, United States.

Xueqing Peng, Department of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT 06510, United States.

Jianfu Li, Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL 32224, United States.

Xu Zuo, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, United States.

Suyuan Peng, Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing 100010, China.

Donghong Pei, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States.

Cui Tao, Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL 32224, United States.

Hua Xu, Department of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT 06510, United States.

Na Hong, Department of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT 06510, United States.

Author contributions

Yiming Li and Xueqing Peng designed the study. Yiming Li developed the pipeline. Yiming Li and Jianfu Li built the model, and Yiming Li performed visualization. Cui Tao and Xu Zuo offered technical resource support. Donghong Pei contributed to data collection. Na Hong and Suyuan Peng participated in data annotation. Yiming Li drafted the manuscript. Hua Xu and Na Hong supervised the study, and Na Hong critically revised the manuscript.

Supplementary material

Supplementary material is available at Journal of the American Medical Informatics Association online.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Conflicts of interest

The authors have no competing interests to declare.

Ethics approval and consent to participate

Not applicable.

Code availability

The code underlying this article will be shared on reasonable request to the corresponding author.

Data availability

The data underlying this article will be shared on reasonable request to the corresponding author.

References

1. Mallory MJ, Do A, Bublitz SE, et al. Puncturing the myths of acupuncture. J Integr Med. 2016;14(5):311-314. doi:10.1016/S2095-4964(16)60269-8
2. Li AR, Andrews L, Hilts A, et al. Efficacy of acupuncture and moxibustion in alopecia: a narrative review. Front Med (Lausanne). 2022;9:868079. doi:10.3389/fmed.2022.868079
3. Nasir LS. Acupuncture. Prim Care. 2002;29(2):393-405. doi:10.1016/s0095-4543(01)00007-0
4. Stone JAM. The status of acupuncture and oriental medicine in the United States. Chin J Integr Med. 2014;20(4):243-249. doi:10.1007/s11655-014-1776-0
5. World Health Organization. WHO Global Report on Traditional and Complementary Medicine 2019. World Health Organization; 2019.
6. Wang F. Comments on the definition of "acupuncture science". Zhongguo Zhen Jiu. 2017;37(12):1333-1336. doi:10.13703/j.0255-2930.2017.12.021
7. Zhang W-B, Jia D-X, Li H-Y, et al. Understanding Qi running in the meridians as interstitial fluid flowing via interstitial space of low hydraulic resistance. Chin J Integr Med. 2018;24(4):304-307. doi:10.1007/s11655-017-2791-3
8. Kelly RB, Willis J. Acupuncture for pain. Am Fam Physician. 2019;100(2):89-96.
9. Witt C, Brinkhaus B, Jena S, et al. Acupuncture in patients with osteoarthritis of the knee: a randomised trial. Lancet. 2005;366(9480):136-143. doi:10.1016/S0140-6736(05)66871-7
10. Luo Y, Yang M, Liu T, et al. Effect of hand-ear acupuncture on chronic low-back pain: a randomized controlled trial. J Tradit Chin Med. 2019;39(4):587-598.
11. Yang M, Baser RE, Liou KT, et al. Effect of acupuncture versus usual care on sleep quality in cancer survivors with chronic pain: secondary analysis of a randomized clinical trial. Cancer. 2023;129(13):2084-2094. doi:10.1002/cncr.34766
12. Zhang L, Yuan H, Zhang L, et al. Effect of acupuncture therapies combined with usual medical care on knee osteoarthritis. J Tradit Chin Med. 2019;39(1):103-110.
13. Tastan K, Ozer Disci O, Set T. A comparison of the efficacy of acupuncture and hypnotherapy in patients with migraine. Int J Clin Exp Hypn. 2018;66(4):371-385. doi:10.1080/00207144.2018.1494444
14. Morehead A, Salmon G. Efficacy of acupuncture/acupressure in the prevention and treatment of nausea and vomiting across multiple patient populations: implications for practice. Nurs Clin North Am. 2020;55(4):571-580. doi:10.1016/j.cnur.2020.07.001
15. Schwartz C. Chronic respiratory conditions and acupuncture therapy. Probl Vet Med. 1992;4(1):136-143.
16. Xiao L-Y, Wang X-R, Yang Y, et al. Applications of acupuncture therapy in modulating plasticity of central nervous system. Neuromodulation. 2018;21(8):762-776. doi:10.1111/ner.12724
17. Diehl DL. Acupuncture for gastrointestinal and hepatobiliary disorders. J Altern Complement Med. 1999;5(1):27-45. doi:10.1089/acm.1999.5.27
18. Wang M, Liu W, Ge J, et al. The immunomodulatory mechanisms for acupuncture practice. Front Immunol. 2023;14:1147718. doi:10.3389/fimmu.2023.1147718
19. Ma Q. Somatotopic organization of autonomic reflexes by acupuncture. Curr Opin Neurobiol. 2022;76:102602. doi:10.1016/j.conb.2022.102602
20. Longhurst JC. Defining meridians: a modern basis of understanding. J Acupunct Meridian Stud. 2010;3(2):67-74. doi:10.1016/S2005-2901(10)60014-3
21. Xie D, Chen R. The two-step location method of acupoint in Internal Canon of Medicine and its clinical application. Zhongguo Zhen Jiu. 2014;34(10):979-982.
22. Casey GP. Locating specific acupoints large intestine 4 (LI4) and large intestine 6 (LI6) in cadavers using anthropometric and cun measurement systems. J Acupunct Meridian Stud. 2020;13(6):174-179. doi:10.1016/j.jams.2020.11.003
23. Godson DR, Wardle JL. Accuracy and precision in acupuncture point location: a critical systematic review. J Acupunct Meridian Stud. 2019;12(2):52-66. doi:10.1016/j.jams.2018.10.009
24. He J, Li F, Li J, et al. Prompt tuning in biomedical relation extraction. J Healthc Inform Res. 2024;8(2):206-224. doi:10.1007/s41666-024-00162-9
25. El-Allaly E-D, Sarrouti M, En-Nahnahi N, et al. An attentive joint model with transformer-based weighted graph convolutional network for extracting adverse drug event relation. J Biomed Inform. 2022;125:103968. doi:10.1016/j.jbi.2021.103968
26. Li Y, Tao W, Li Z, et al. Artificial intelligence-powered pharmacovigilance: a review of machine and deep learning in clinical text-based adverse drug event detection for benchmark datasets. J Biomed Inform. 2024;152:104621. doi:10.1016/j.jbi.2024.104621
27. Han X, Gao T, Lin Y, et al. More data, more relations, more context and more openness: a review and outlook for relation extraction. In: Wong K-F, Knight K, Wu H, eds. Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing. Association for Computational Linguistics; 2020:745-758.
28. Mayfield E, Black AW. Should you fine-tune BERT for automated essay scoring? In: Burstein J, Kochmar E, Leacock C, et al., eds. Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications. Association for Computational Linguistics; 2020:151-162.
29. Orrù G, Piarulli A, Conversano C, et al. Human-like problem-solving abilities in large language models using ChatGPT. Front Artif Intell. 2023;6:1199350. doi:10.3389/frai.2023.1199350
30. Li J, Li Y, Pan Y, et al. Mapping vaccine names in clinical trials to vaccine ontology using cascaded fine-tuned domain-specific language models. J Biomed Semantics. 2024;15(1):14. doi:10.1186/s13326-024-00318-x
31. Pokale S, Taware K, Fernandes G, et al. Text summarization: GPT perspective. In: 2023 3rd Asian Conference on Innovation in Technology (ASIANCON). 2023:1-7. doi:10.1109/ASIANCON58793.2023.10270778
32. Li Y, Zhao J, Li M, et al. RefAI: a GPT-powered retrieval-augmented generative tool for biomedical literature recommendation and summarization. J Am Med Inform Assoc. 2024:ocae129. doi:10.1093/jamia/ocae129
33. Gillioz A, Casas J, Mugellini E, et al. Overview of the transformer-based models for NLP tasks. In: 2020 15th Conference on Computer Science and Information Systems (FedCSIS). 2020:179-183. doi:10.15439/2020F20
34. Thakkar K, Jagdishbhai N. Exploring the capabilities and limitations of GPT and ChatGPT in natural language processing. JMRA. 2023;10(1):18-20. doi:10.18231/j.jmra.2023.004
35. Hu Y, Ameer I, Zuo X, et al. Zero-shot clinical entity recognition using ChatGPT. arXiv, May 15, 2023. Accessed March 8, 2024. doi:10.48550/arXiv.2303.16416
36. Li Y, Li J, He J, et al. AE-GPT: using large language models to extract adverse events from surveillance reports-a use case with influenza vaccine adverse events. PLoS One. 2024;19(3):e0300919. doi:10.1371/journal.pone.0300919
37. Li Y, Peng X, Li J, et al. Development of a natural language processing tool to extract acupuncture point location terms. In: 2023 IEEE 11th International Conference on Healthcare Informatics (ICHI). 2023:344-351. doi:10.1109/ICHI57859.2023.00053
38. Lim S. WHO Standard acupuncture point locations. Evid Based Complement Alternat Med. 2010;7(2):167-168. doi:10.1093/ecam/nep006
39. Soysal E, Wang J, Jiang M, et al. CLAMP–a toolkit for efficiently building customized clinical natural language processing pipelines. J Am Med Inform Assoc. 2018;25(3):331-336.
40. Kalyan KS. A survey of GPT-3 family large language models including ChatGPT and GPT-4. Nat Lang Process J. 2024;6:100048. doi:10.1016/j.nlp.2023.100048
41. Davier MV. Training Optimus Prime, M.D.: generating medical certification items by fine-tuning OpenAI's gpt2 transformer model. arXiv, 2019. doi:10.48550/arXiv.1908.08594
42. Wang Q, Rose R, Orita N, et al. Automated generation of multiple-choice cloze questions for assessing English vocabulary using GPT-turbo 3.5. arXiv, March 4, 2024. Accessed March 20, 2024. https://arxiv.org/abs/2403.02078
43. Masalkhi M, Ong J, Waisberg E, et al. A side-by-side evaluation of Llama 2 by Meta with ChatGPT and its application in ophthalmology. Eye (Lond). 2024;38(10):1789-1792. doi:10.1038/s41433-024-02972-y
44. Meta Llama 3. Meta Llama. Accessed July 21, 2024. https://llama.meta.com/llama3/
45. Aird M. Variability in the Precision of Acupoint Location Methods. 2005. Accessed July 20, 2024. https://opus.lib.uts.edu.au/handle/10453/20108
46. Allen WE. Terminologia anatomica: international anatomical terminology and terminologia histologica: international terms for human cytology and histology. J Anat. 2009;215(2):221. doi:10.1111/j.1469-7580.2009.1093_1.x
47. Mungall CJ, Torniai C, Gkoutos GV, et al. Uberon, an integrative multi-species anatomy ontology. Genome Biol. 2012;13(1):R5. doi:10.1186/gb-2012-13-1-r5
