BMC Medical Education
. 2025 Aug 22;25:1185. doi: 10.1186/s12909-025-07567-z

The role of generative AI tools in case-based learning and teaching evaluation of medical biochemistry

Liang Li 1,#, Weiwei Zhang 2,#, Kun Zhang 3, Yuhan Yang 3, Lan Wang 3, Luo Zuo 4, Yiran Sun 5,, Quekun Peng 6,
PMCID: PMC12372301  PMID: 40847414

Abstract

Background

Medical biochemistry, a fundamental course in medical education, has a complex and expanding knowledge base. Traditional teaching methods often fail to meet students’ needs for in-depth understanding and personalized learning. Students can become overwhelmed by the vast array of biochemical concepts, reactions, and molecular structures.

Objective

This study aims to explore the potential of generative AI tools as teaching assistants in medical biochemistry, particularly in CBL (Case-Based Learning) settings where their application is currently limited.

Methods

We conducted a comparative study involving a control group (N = 40) and an experimental group (N = 39) to assess the impact of AI tools on CBL learning. We analyzed students’ performance and compared evaluations of their work by both teachers and AI tools. Additionally, a questionnaire was used to gauge the effects of AI tools on case study learning.

Results

The experimental group using AI tools performed significantly better than the control group, completing case assignments faster (2.6 h vs. 5.5 h, P < 0.05) and achieving higher exam scores (77.3 ± 4.3 vs. 66.5 ± 5.4, P < 0.05). AI-based grading of students' assignments closely matched teachers' evaluations (P > 0.05), demonstrating its reliability for assessment. Students rated AI highly for basic knowledge acquisition (Q4M = 9.18) but noted limitations in complex clinical reasoning (Q11M = 4.20) and innovative thinking (Q12M = 3.90). Key concerns were that AI reduced teacher-student interaction (Q1M = 7.17) and that standardized AI outputs led to homogenized learning (Q6M = 6.56). Despite these drawbacks, students' acceptance of AI increased significantly after the trial (5.5 to 7.6, P < 0.05).

Conclusion

Generative AI tools significantly enhanced learning efficiency and performance in CBL teaching of medical biochemistry, shortening task completion time and improving examination scores. Although limitations remain in cultivating innovative thinking and in teacher-student interaction, students' acceptance of AI increased. AI should therefore serve as a supplement to traditional teaching, balancing learning efficiency with the development of students' creative thinking.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12909-025-07567-z.

Keywords: Generative AI tool, Case-based teaching, Teaching evaluation, Medical biochemistry

Introduction

Medical education in China is undergoing rapid reform and innovation to meet societal needs. As part of this trend, many medical schools have implemented the Excellence in Physician Education Program, which requires them to integrate the basic medical curriculum [1]. However, the basic medical curriculum in China takes 2.5-3 years, shorter than in Europe and the United States. At this basic stage, knowledge acquisition and testing are the core tasks of medical education, because mastery of knowledge and passing diverse progress tests are central to a future doctor's competence [2, 3]. How to master solid knowledge in a shorter time is therefore an important starting point for the reform and innovation of medical education in China. In addition, the rapid, exponential growth of medical knowledge challenges learners, who must study harder to acquire the knowledge needed for success. Moreover, the way physicians handle knowledge resources (e.g., literature searches) greatly influences their success in their specialty, including patient safety, diagnosis, and quality assurance [4]. Medical education should therefore teach students these knowledge processing techniques at an early stage so that they can integrate them into academic and clinical practice.

The use of artificial intelligence in medical education is growing at an unprecedented pace and depth, contributing significantly to the transformation of traditional medical education models [5]. Generative artificial intelligence (GAI) is a new mode of production that automatically generates content using artificial intelligence technology [6]. In recent years, GAI has been widely used in various teaching scenarios; its rapid iteration and expanding functionality will inevitably reshape the education system and bring new opportunities and challenges to medical education [7]. How to combine GAI with medical education so that it benefits teaching has become a hot research topic in the field.

GAI has many advantages in medical education [8]. First, users without any professional computer knowledge or related skills need only log in to the relevant website on a computer or cell phone and enter a query, and GAI will give contextualized, reasonable answers, enabling deep interaction between users and computers. Second, GAI can answer in various forms, such as text, pictures, slides, audio, video, program code, spreadsheets, and assignment analysis and correction, and it produces these answers in a very short time, greatly reducing the time users previously had to spend completing such work.

Clinical biochemistry, an important discipline in medicine, has an urgent need for knowledge updating. As medical research deepens and technology develops rapidly, new disease markers and diagnostic indicators continue to emerge [9]. Clinical biochemistry courses must incorporate these latest research results in a timely manner to ensure that students master the most cutting-edge diagnostic methods. For example, rapid advances in genetic testing technology have deepened our understanding of the genetic factors of disease, and the curriculum needs updated content so that students understand how to apply genetic testing for early screening and diagnosis of disease. At the same time, drug development and therapeutic innovations are changing clinical practice [10]. Courses need to reflect the mechanisms of action of new drugs and their metabolism in the body so that students can offer patients more precise drug treatment options [11]. For example, the widespread use of immunotherapeutic drugs in tumor therapy requires a curriculum that details the biochemical basis of their action. In addition, with the application of big data and artificial intelligence in medicine, the clinical biochemistry curriculum should also cover related knowledge and develop students' ability to use these techniques for data analysis and disease prediction [12].

At present, ChatGPT is widely used in medical education, yet little research addresses its application to medical course teaching. He et al. applied ChatGPT, an advanced natural language model, to the design of an experimental teaching program in biochemistry and molecular biology; the results showed that GAI can assist teachers with material updates and assignment evaluation, improving the efficiency of lesson preparation [13]. Surapaneni et al. used 10 clinical cases to evaluate ChatGPT's performance in medical biochemistry. In the first test ChatGPT answered 4 questions correctly; in the second attempt it answered 6 correctly and 4 incorrectly. Surprisingly, Case 3 yielded different answers across several attempts, which the authors attributed to the complexity of the case [14]. In addition, Ghosh et al. investigated ChatGPT's ability to solve higher-order problems in medical biochemistry, randomly selecting 200 reasoning questions requiring higher-order thinking from a question pool. ChatGPT answered all of the questions, but its median score of 4.0 was significantly lower than the hypothesized maximum of 5 [15]. Although the scores were not as high as expected, this remains a worthwhile attempt to use ChatGPT in teaching clinical biochemistry.

This study comparatively analyzed and evaluated the learning outcomes of students in two clinical medicine classes after one class used a generative AI tool for case learning. First, we compared the scores that AI tools and teachers assigned to students' homework. Second, we surveyed students' satisfaction with the AI tools and the attitudes of the experimental group before and after using them, covering both positive and negative aspects. Through these methods, we attempt to explore the benefits and limitations of the GAI tool in clinical biochemistry CBL and to provide new improvement strategies for subsequent curriculum reform.

Methods

Participant recruitment and study background

In this study, a total of 89 students from two clinical medicine classes in the class of 2023 were randomly selected: Class 1 had 45 students and Class 2 had 44. The two classes were formed at enrollment on the basis of college entrance examination scores, gender, and ethnicity, ensuring that each class had similar numbers of students with high, medium, and low scores. The male-to-female ratio was close to 1:1 and the ethnic distribution was similar in both classes.

To prevent students' personal preferences toward AI from biasing the results, we used a simple screening question, "Please evaluate the use of AI tools in the medical biochemistry course." Students responded on a 0-10 scale, with higher ratings indicating greater agreement with the use of AI tools. Students rating lower than 2 or higher than 8 were excluded to minimize the interference of extreme attitudes. Screening through the questionnaire yielded 79 participants (Fig. 1). The control group (Class 1) had 40 students, 20 male and 20 female, aged 18-22 years with a mean age of 19 ± 0.9. The experimental group (Class 2) had 39 students, 20 male and 19 female, with a mean age of 18 ± 0.7. There were no significant differences in other characteristics, such as age, gender, or English level, between the two classes. The study was approved and supported by the Education and Teaching Research Committee of Chengdu Medical College, and there were no teaching ethics issues.
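The screening rule above (excluding ratings below 2 or above 8 on the 0-10 scale) can be sketched as a simple filter. This is illustrative only; the student IDs and ratings are invented, not the study's data.

```python
# Hypothetical attitude ratings on the 0-10 scale (invented for illustration).
ratings = {"S01": 5, "S02": 1, "S03": 9, "S04": 7, "S05": 8, "S06": 2}

# Keep only students whose rating is between 2 and 8 inclusive,
# i.e. exclude ratings lower than 2 or higher than 8.
included = {sid: r for sid, r in ratings.items() if 2 <= r <= 8}
excluded = sorted(set(ratings) - set(included))

print(sorted(included))  # retained participants
print(excluded)          # extreme-attitude students removed
```

Note that the boundary values 2 and 8 are retained, since only ratings strictly below 2 or strictly above 8 were excluded.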

Fig. 1.

Fig. 1

Teaching design of biochemistry and molecular biology course based on GAI tools

Selection of GAI teaching tools

Due to registration restrictions on ChatGPT in China, this study adopted Kimi Chat 2.0, a generative AI tool with a relatively high usage rate in China at present.

CBL learning

The clinical case was collected from the faculty of the First People's Hospital affiliated to our university and organized and edited by the biochemistry teaching team. The case follows the physical examination, consultation, treatment, and recovery of a liver cancer patient (Fig S1). The questions discussed cover two segments: biochemistry and cell biology. The biochemistry questions mainly cover five chapters: glucose metabolism, lipid metabolism, protein metabolism, blood biochemistry, and liver biochemistry. The cell biology section mainly covers liver cancer indicators, oncogenes, tumor characteristics and mechanisms, and treatment options for liver cancer. Twelve scientific questions (Table 1) were set for the case. Both the control and experimental groups were divided into five groups of eight students each. The instructor distributed the case and questions online, and the students in both groups discussed the case in terms of basic concepts, extended knowledge, and tumor treatment strategy, and prepared a report-back assignment. The offline discussion session consisted of question answering, group presentation, and inter-group error correction. The control group used traditional methods to complete the assignments, while the experimental group used AI tools (Fig. 1). Apart from not using the AI tool, the control group followed exactly the same process as the experimental group.

Table 1.

Main problems in the liver cancer case

C1 Glucose metabolism and blood biochemistry
Q1 Why is the patient's blood glucose index high?
Q2 What abnormality of glucose metabolism does the patient show, and what is its mechanism?
Q3 What does the patient's routine blood test show? How is it related to liver disease?
Q4 What is the clinical significance of the patient's hemoglobin test?

C2 Lipid metabolism
Q1 In the chief complaint, why did the patient have clinical manifestations of aversion to oily food?
Q2 The patient reported jaundice in the past illness. What is the mechanism of jaundice, and how is jaundice detected?
Q3 What is the significance of detecting jaundice for hepatocellular carcinoma?
Q4 What are the indicators of blood lipid detection? What is the value of this test for the diagnosis and clinical treatment of this patient?

C3 Protein metabolism
Q1 What are the indicators of the liver function test?
Q2 What is the physiological significance of the liver function test?
Q3 What is the significance of the urea, creatinine, uric acid, and cystatin C tests in the physical examination? What do these indicators mean for liver cancer patients?

C4 Characteristics and classification of hepatocellular carcinoma
Q1 What are the clinical features of hepatocellular carcinoma?
Q2 What are the morphological characteristics of hepatocellular cancer cells?
Q3 What are the pathologic classifications of liver cancer?
Q4 What clinical treatments can you propose for Mr. Huang in this case?

Teaching evaluation

The evaluation in this study comprised homework evaluation and final exam evaluation. Homework was evaluated both by AI and by the teacher's comprehensive assessment. For the questions in Table 1, each group completed four assignments, answering the questions in the four categories C1, C2, C3, and C4 and compiling the answers into a corresponding PPT. The control group prepared its answers through traditional methods, such as academic websites, textbooks, and literature, while the experimental group was required to use AI to prepare its answers and complete the PPT task. The PPT assignments of both groups were then scored by the teacher and by the AI tools according to the assignment rubric (Table S1), and the evaluation scores of the two groups were analyzed. The final exam evaluation mainly compared the two groups' scores on the final exam clinical case study questions and their total test-paper scores.

Questionnaire survey and data analysis

We designed a questionnaire with two dimensions, positive and negative impacts (Table S2). The positive feedback questionnaire includes 12 questions (Q1-Q12) in four levels: (1) the difficulty of using AI tools (Q1-Q3); (2) the impact of AI tools on the utilization of teaching resources (Q4-Q6); (3) the impact of AI tools on the mastery of knowledge (Q7-Q9); and (4) the effect of AI tools on clinical case analysis ability (Q10-Q12). Each question is scored 0-10, with higher scores indicating stronger agreement: a score below 5 indicates low agreement with the viewpoint, a score of 5-8 moderate agreement, and a score above 8 high agreement.
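The agreement bands above can be expressed as a small helper function. This is a sketch for illustration; the band labels ("low", "moderate", "high") are our shorthand for the levels of agreement described in the text.

```python
def agreement_level(score):
    """Map a 0-10 questionnaire score to the study's agreement bands:
    below 5 -> low agreement, 5-8 -> moderate, above 8 -> high."""
    if not 0 <= score <= 10:
        raise ValueError("score must be on the 0-10 scale")
    if score < 5:
        return "low"
    if score <= 8:
        return "moderate"
    return "high"

print(agreement_level(9.18))  # e.g. Q4M = 9.18 -> high agreement
print(agreement_level(4.20))  # e.g. Q11M = 4.20 -> low agreement
```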

The negative feedback questionnaire includes six questions in three levels: (1) the impact of AI tools on teacher-student interaction (Q1-Q2); (2) the impact of AI tools on active learning (Q3-Q4); and (3) the impact of AI tools on competitive learning (Q5-Q6). The questionnaires were distributed and collected online through the Super Star app. In addition, we surveyed online the time students spent completing case assignments and the number of times they sought help from teachers.

Meanwhile, we used SPSS 22.0 to conduct reliability and validity analyses of the questionnaire data. Reliability was assessed with Cronbach's α coefficient to evaluate the internal consistency of the scale; α ≥ 0.7 was considered acceptable, and relevant items were calculated and optimized by dimension. Validity analysis included: (1) content validity: three experts evaluated the representativeness of the items and the content validity index was calculated (CVI ≥ 0.8 meets the standard); (2) structural validity: common factors were extracted by exploratory factor analysis (EFA) (KMO ≥ 0.6, factor loading ≥ 0.5), and model fit was tested by confirmatory factor analysis (CFA) (CFI > 0.9, RMSEA < 0.08); and (3) criterion validity: the correlation between questionnaire scores and external criteria (such as authoritative scales) was calculated (r ≥ 0.6 considered significant). After data cleaning, the analysis was completed in SPSS 22.0.
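The internal-consistency check above can be sketched in a few lines using the standard Cronbach's α formula, α = k/(k−1) · (1 − Σ item variances / total-score variance). The responses below are invented for illustration, not the study's data, and the computation stands in for what SPSS does internally.

```python
from statistics import pvariance

def cronbach_alpha(rows):
    """Cronbach's alpha for a rows-by-items matrix of scores."""
    k = len(rows[0])                      # number of items
    items = list(zip(*rows))              # column-wise item scores
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - item_var / total_var)

# Hypothetical responses: rows = students, columns = items on the 0-10 scale.
responses = [
    [8, 7, 9],
    [6, 6, 7],
    [9, 8, 9],
    [5, 6, 6],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))  # prints 0.95; alpha >= 0.7 would be acceptable
```

With strongly correlated item scores like these, α comes out high; uncorrelated items would drive it toward zero.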

Continuous variables were expressed as mean ± standard deviation (x̄ ± s) and compared between groups with two-sample t-tests. Categorical data were expressed as percentages and tested with Pearson's chi-square test. P < 0.05 was considered statistically significant.
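These two comparisons can be sketched with SciPy in place of SPSS. The simulated score distributions and the 2x2 sex-distribution table below are illustrative stand-ins (the score means and group sizes mirror the figures reported later, but the raw values are generated, not the study's data).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(66.5, 5.4, 40)       # simulated control-group exam scores
experimental = rng.normal(77.3, 4.3, 39)  # simulated experimental-group scores

# Two-sample t-test for continuous variables (x-bar +/- s comparison)
t_stat, p_val = stats.ttest_ind(experimental, control)
print(f"t = {t_stat:.2f}, P = {p_val:.4g}")

# Pearson's chi-square test for categorical data (e.g., sex distribution)
table = np.array([[20, 20],   # control: male, female
                  [20, 19]])  # experimental: male, female
chi2, p_cat, dof, _ = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, df = {dof}, P = {p_cat:.3f}")
```

With a roughly 11-point mean difference and standard deviations near 5, the t-test comes out far below 0.05, while the nearly identical sex distributions give a non-significant chi-square result.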

Results

Consistency between the AI tool and the teachers’ comprehensive evaluation on the assignment evaluation results

Analysis of the homework scores of the two groups (Fig. 2A-D) showed that, in the control group, there was no significant difference between the teacher's comprehensive evaluation and the AI tool's evaluation (P > 0.05); the same held for the experimental group. However, the task completion scores of the experimental group, which used the GAI tool, were significantly higher than those of the control group (P < 0.05), indicating that assignment quality was significantly higher with the AI tool than without it. Moreover, under the same grading rules, there was no statistically significant difference between AI and manual comprehensive evaluation of the assignments, suggesting that AI tools have the potential to substitute for the teacher in assignment evaluation.

Fig. 2.

Fig. 2

A comparison of teacher and GAI tools in grading student assignments. S = Subjective evaluation, AI = AI evaluation. * P < 0.05; ** P < 0.01; *** P < 0.001; ns, no significance

The AI tool provides more detailed answers to case studies

We compared the answers given by the control and experimental groups to the two main clinical questions in the case (Table 2) and found that the experimental group gave more comprehensive and detailed answers. For question 1 (What are the pathological and molecular mechanisms of hepatocellular carcinoma?), the AI tool gave not only conventional answers, such as viral infection, cirrhosis, genetic factors, signaling pathway abnormalities, and the inflammatory microenvironment, but also more cutting-edge ones, such as tumor microenvironment alterations, angiogenesis, epigenetic alterations, metabolic abnormalities, and genomic instability. For question 2 (What are the clinical and biological therapeutic options for the treatment of hepatocellular carcinoma?), the AI tool likewise gave more specialized answers than the control group: in addition to the control group's answers of surgical resection, liver transplantation, radiotherapy and chemotherapy, tumor vaccines, and immunosuppressants, the AI group offered newer treatments such as immune combination therapy, neoadjuvant therapy, stem cell therapy, multidisciplinary integrated therapy, and postoperative follow-up. The AI tool can thus not only provide more comprehensive answers but also give students integrated basic and clinical knowledge by analyzing and synthesizing information from multiple data sources.

Table 2.

Comparison between traditional retrieval methods and AI tools to answer case questions

Question 1: Pathogenesis and molecular mechanism of hepatocellular carcinoma

Traditional retrieval means: 1) hepatitis virus infection; 2) cirrhosis; 3) gene mutation; 4) chemical carcinogens; 5) signaling pathway abnormalities; 6) immunomodulatory imbalance; 7) inflammatory microenvironment.

AI tool retrieval means: 1) chronic hepatitis and cirrhosis; 2) hepatocyte injury and regeneration; 3) genetic and epigenetic changes; 4) tumor microenvironment; 5) metabolic abnormalities; 6) inflammation and immune escape; 7) abnormal activation of signaling pathways; 8) angiogenesis; 9) genomic instability; 10) environmental and lifestyle factors.

Question 2: Methods of clinical and biological treatment of hepatocellular carcinoma

Traditional retrieval means: 1) surgical resection and liver transplantation; 2) local ablative therapy; 3) transcatheter arterial chemoembolization (TACE); 4) radiotherapy; 5) systemic chemotherapy; 6) immune checkpoint inhibitor therapy; 7) tumor vaccines; 8) cellular immunotherapy.

AI tool retrieval means: 1) targeted therapies; 2) immunotherapy; 3) combination therapy; 4) localized therapy; 5) neoadjuvant and adjuvant therapies; 6) stem cell therapy; 7) multidisciplinary integrated treatment; 8) post-operative follow-up.

AI tool reduces case study time and improves student test scores

AI software can quickly filter content relevant to students' needs out of a large pool of learning resources, helping students acquire the knowledge they need faster and reducing the time spent sifting through massive amounts of information. When students encounter problems, AI can quickly give accurate answers and detailed explanations, saving the time they would otherwise spend searching for information and working it out on their own [16]. We surveyed the time students spent completing the case discussion assignments and found that the control group took significantly longer (t = 5.5 h) than the experimental group using the AI tool (t = 2.6 h) (Fig. 3A). We also counted how many times each group sought the instructor's help while completing the assignments and found that the experimental group asked far fewer questions (N = 10) than the control group (N = 35) after adopting the AI tool (Fig. 3B). These results show that AI tools speed up assignment completion and free up learning time. We also analyzed the two groups' final examination results (Fig. 3C): the experimental group's case study question scores (8.7 ± 2.1) and final exam scores (77.3 ± 4.3) were significantly higher than those of the control group (6.4 ± 3.5 and 66.5 ± 5.4, respectively) (P < 0.05). This indicates that AI tools can significantly enhance students' understanding of clinical cases and improve their learning performance.

Fig. 3.

Fig. 3

The influence of Kimi Chat 2.0 on students’ learning behavior. * P < 0.05; ** P < 0.01; *** P < 0.001; ns, no significance

AI tool has advantages in knowledge expansion of clinical cases

In this study, we surveyed the experimental group's attitudes toward the role of AI tools in case studies in the biochemistry course, covering both positive and negative feedback (Fig. 4). The positive feedback questionnaire contained 12 questions; all 39 questionnaires were collected online, and the mean score (mean ± SD) for each question was analyzed. The overall Cronbach's α of the scale was 0.88, and the α coefficients of the dimensions ranged from 0.74 to 0.85, indicating good reliability. Exploratory factor analysis extracted two common factors (cumulative explained variance 65.3%), and confirmatory factor analysis showed good model fit (CFI = 0.93, RMSEA = 0.07), supporting the preset structure.

Fig. 4.

Fig. 4

The positive impact of Kimi Chat 2.0 on students’ learning outcomes

The questionnaire data showed that in the first dimension (Q1-Q3), the means were Q1 = 9.50, Q2 = 7.14, and Q3 = 7.49, indicating that most students found AI tools easy to acquire and use. They also felt that AI tools greatly saved learning time, reduced the time spent reviewing and translating literature, and made it easy to frame search queries and carry out human-machine dialogue. In the second dimension (Q4-Q6), students highly agreed that AI tools complemented the textbook (Q4M = 9.18) and tended to agree that AI tools provided diverse learning materials (Q5M = 6.47), but they did not agree that AI tools replaced existing e-learning resources (Q6M = 4.12). In the third dimension (Q7-Q9), the vast majority of students (Q7M = 8.48) affirmed that AI tools provide more comprehensive explanations of basic knowledge, and most agreed that AI tools could expand biochemistry knowledge beyond the textbook (Q8M = 6.18). However, agreement with the AI tools' explanation of cutting-edge knowledge points was low (Q9M = 4.50), suggesting that AI tools may fall short in tracking new knowledge. In the fourth dimension (Q10-Q12), most students affirmed that AI tools analyze cases comprehensively (Q10M = 5.8), while more students believed that AI tools played a limited role in deep understanding of clinical biochemistry within clinical cases (Q11M = 4.20), and most did not think the AI tool could generate similar clinical cases (Q12M = 3.90). This indicates that the AI tool has some limitations in handling complex concepts and cutting-edge knowledge.

Limitations of the AI tool in medical biochemistry case studies

Although AI tools have great potential in classroom teaching, we must also be fully aware of their limitations and apply them carefully to ensure sensible, safe, and effective use. We therefore set six questions to survey students' attitudes toward the limitations of AI in biochemistry classroom teaching (Fig. 5). Most students believed that using AI tools reduced interaction between students and teachers inside and outside the classroom (Q1M = 7.17), as evidenced by the decreased frequency of students' questions to the teacher. Students were neutral on whether AI tools reduced classroom attention (Q2M = 5.24). In addition, most students disagreed that using AI tools reduces active thinking (Q3M = 4.31) and were neutral on whether it reduces the training of creative thinking (Q4M = 5.5). We also examined the effect of AI tools on learning pressure: most students agreed that using AI tools increased the competitive pressure of learning (Q5M = 8.10) and led to homogenization of learning (Q6M = 6.56). This can be explained by the homogeneous answers provided by AI tools, which flatten the variability of scores among students, and it is corroborated by the experimental group's homework score data in Fig. 3.

Fig. 5.

Fig. 5

The negative effects of Kimi Chat 2.0 on students’ learning

Evaluation of students’ attitudes towards the use of AI before and after the experiment

After the experiment, we asked the students in the experimental group whether they would agree to use AI again. Their agreement score rose from 5.5 before the experiment to 7.6 after (Fig. 6), a significant difference (P < 0.05). This indicates that the experimental group genuinely benefited from the use of AI, changing their views on its use in teaching.

Fig. 6.

Fig. 6

The comparison of students' attitude scores toward using Kimi Chat 2.0 before and after the experiment. * P < 0.05

Discussion

Due to ChatGPT's limited accessibility in China, this study investigated the role of Kimi Chat 2.0, a domestic AI tool, in biochemistry clinical case studies and evaluated its impact on medical education. The results show that AI tools can provide more cutting-edge and comprehensive answers to case studies. For example, the AI tool presented students with novel biologic treatments for hepatocellular carcinoma, such as immune combination therapy, targeted therapies, and stem cell therapy. These detailed answers greatly expand the scope of students' knowledge, build connections between biochemistry and clinical diseases, and lay a foundation for cultivating systematic clinical thinking (Table 2). In addition, students in the experimental group saved substantial study time and reduced their dependence on the instructor with the help of AI tools, as clearly reflected in their homework completion times and question frequency. This shows that AI tools, being easy to operate and practical, can provide students with personalized learning materials and answers regardless of time and space constraints, reducing reliance on traditional resource retrieval. With the time saved, students can engage in broader and deeper learning. Moreover, as teaching assistants, AI tools can answer students' basic questions anytime and anywhere, further reducing dependence on teachers.

Because the data and evidence obtained by AI technology are relatively comprehensive, AI reduces human bias in teaching evaluation, making the evaluation and teaching processes of medical education more efficient, accurate, and fair [17]. In this study, we set uniform standards for the four CBL assignments (PPTs) of the two groups and applied dual evaluation by AI tools and teachers. The results show no significant difference between the manual composite scores and the AI scores for any of the four assignments, and the AI accurately provided detailed textual evaluations. This suggests that AI-based evaluation in CBL can be objective and fair, and that AI tools could be used for homework correction and evaluation, greatly reducing teachers' grading burden. In addition, the experimental group's learning performance with the AI tool was significantly better than the control group's, as reflected in their scores on the case study questions and their overall performance, indicating that the AI tool can help students produce more detailed test answers.

AI tools demonstrated multiple positive roles in CBL, as reflected in the favorable questionnaire responses. In particular, the scores for the questions in the first and second dimensions were significantly higher than those in the third and fourth dimensions. Regarding ease of use, students generally found Kimi Chat 2.0 convenient to acquire and operate; its low technical threshold saved time and allowed them to quickly adapt to and integrate AI tools into the learning process. Regarding supplementation of textbook knowledge, AI tools significantly enriched the teaching materials and enhanced the flexibility of learning through diverse learning resources, although students also recognized that, abundant as these resources are, they cannot completely replace other electronic resources. Regarding comprehensiveness of knowledge acquisition, AI tools played a significant role in retrieving basic concepts, analyzing complex concepts, and exploring innovative concepts, helping students build a more systematic knowledge framework. Finally, regarding clinical depth and innovative thinking, AI still showed clear deficiencies: the scores for these three questions were the lowest across the four dimensions. This is consistent with the nature of generative AI, whose “innovation” is the result of optimization over existing data rather than active exploration of unknown areas, and it matches previous research findings [18].

For students, generative AI technology provides a wealth of learning resources and aids, but over-reliance may lead them to neglect interpersonal and emotional communication between teachers and students, and may weaken their desire for active thinking and for deep learning and understanding of medical knowledge [19]. For example, the data in Fig. 6 (Q1) show that students perceived less classroom interaction with faculty and peers after using AI tools, and our classroom observations confirmed that some students focused on the AI tools and spent less time listening to lectures. Students also tended to feel that AI tools reduced creative thinking: whenever they encountered difficult problems, they turned to the tools instead of actively recalling knowledge and seeking answers themselves. Over time this may foster dependence on the tools and harm students’ logical and creative thinking. Furthermore, although most students believed that AI tools presented basic and extended biochemistry knowledge comprehensively, the questionnaire scores show that they found AI explanations of complex concepts incomplete; in particular, many students felt that AI tools could not reason by analogy across similar cases. This indicates that AI tools still fall short in solving complex clinical problems. The questionnaire results also show that students tended to feel that learning through AI created pressure: because AI has limitations in supporting innovative learning, students easily obtain similar answers from it, making differences in learning effect hard to demonstrate. This is another major challenge currently faced by AI tools. Therefore, how to prevent AI tools from generating homogeneous answers and truly achieve knowledge innovation must be carefully considered in future applications of AI tools in teaching.

Finally, we found that after the experiment ended, the acceptance scores of the students in the experimental group increased significantly, indicating that their attitudes towards the use of artificial intelligence changed markedly between the pre- and post-experiment surveys. Although AI tools still have some flaws, students held a subjectively positive attitude towards them, and most students are willing to use AI tools in their future studies.

Conclusion

This study explores the positive and negative impacts of artificial intelligence tools in CBL for a biochemistry course. The results show that AI tools have many positive effects. First, they save students time looking up and integrating materials and reduce the frequency of consulting teachers. Second, they can help teachers grade homework objectively, reducing the grading burden. Furthermore, AI tools can collect and integrate the latest medical achievements, guidelines, and news in real time, ensuring the timeliness of course content and broadening students’ knowledge horizons. However, their classroom application also has drawbacks: they lack knowledge innovation and creativity, making it difficult for students to develop logical thinking and to understand complex scientific issues. Therefore, the positive impact of AI tools on classroom teaching is significant and worth promoting, but they should be used prudently. The sample size of this study is limited, and the pre-evaluation of the auxiliary work was not comprehensive enough. Subsequent research should involve more students and teachers and include a comprehensive pre-evaluation of more advanced AI tools to fully assess the value of AI tools in medical education.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1 (2.4MB, pptx)
Supplementary Material 2 (16.9KB, docx)
Supplementary Material 3 (18.2KB, docx)

Acknowledgements

We would like to thank Ni, S from Jincheng College of Chengdu and Peng, T from Sichuan University for their help in revising the manuscript and the other researchers for their support in this study.

Abbreviations

CBL

Case-Based Learning

GAI

Generative artificial intelligence

PPT

PowerPoint

CVI

Content validity index

EFA

Exploratory factor analysis

CFA

Confirmatory factor analysis

Author contributions

Zhang, W conceptualized and designed the study, collected the data, and analyzed the data. Li, L revised the manuscript. Zuo, L provided the case from the affiliated hospital. Wang, L, Zhang, K, and Yang, Y designed and collected the questionnaires. Sun, Y and Peng, Q prepared the main manuscript text. All authors met the criteria for authorship, had a role in preparing the manuscript, and approved the final manuscript.

Funding

This work was supported by National First-class Undergraduate Course Construction Project (No. 2023232793); 2024 Teaching Reform Research Foundation of Chengdu Medical College (Key Project) (No. JG2024065); Higher Education Talent Training Quality and Teaching Reform Project of Sichuan Province in 2024 (No. JG2024-1010).

Data availability

The data sets generated during and/or analyzed during this study are available from the corresponding author on reasonable request.

Declarations

Ethics approval and consent to participate

Students participated voluntarily in the study following an official invitation, and data confidentiality was protected. The Teaching Research Ethics Committee of the Faculty of Chengdu Medical College approved the study in accordance with the Declaration of Helsinki (Ethical Approval Letter JG2022005). Students were informed about the study and signed consent forms.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Liang Li and Weiwei Zhang contributed equally to this work.

Contributor Information

Yiran Sun, Email: 18202863885@163.com.

Quekun Peng, Email: pengquekun@cmc.edu.cn, Email: 289344542@qq.com.

References

  • 1.Lihong Liu HB, Wang X, Zou Y. A study of innovative curriculum application to clinical medicine in the excellent Doctors educational training program. High Med Educ China. 2017;2017(11):73–4. [Google Scholar]
  • 2.Malau-Aduli BS, et al. Perceived clinical relevance and retention of basic sciences across the medical education continuum. Adv Physiol Educ. 2019;43(3):293–9. [DOI] [PubMed] [Google Scholar]
  • 3.Cleland JA. The qualitative orientation in medical education research. Korean J Med Educ. 2017;29(2):61–71. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Lim I, Saffari SE, Neo S. A cross-sectional study of knowledge and practices in the management of patients with Parkinson’s disease amongst public practice-based general practitioners and geriatricians. BMC Health Serv Res. 2022;22(1):91. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Jowsey T, et al. Medical education empowered by generative artificial intelligence large Language models. Trends Mol Med. 2023;29(12):971–3. [DOI] [PubMed] [Google Scholar]
  • 6.Temsah A, et al. DeepSeek in healthcare: revealing opportunities and steering challenges of a new Open-Source artificial intelligence frontier. Cureus. 2025;17(2):e79221. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Boscardin CK, et al. ChatGPT and generative artificial intelligence for medical education: potential impact and opportunity. Acad Med. 2024;99(1):22–7. [DOI] [PubMed] [Google Scholar]
  • 8.Moritz S, et al. Generative AI (gAI) in medical education: Chat-GPT and Co. GMS J Med Educ. 2023;40(4):Doc54. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Chang Z. Grasp the trends of the discipline and raise biochemistry education to a higher level. 2021;41(7):1357–61.
  • 10.Berillo D, et al. Peptide-based drug delivery systems. Med (Kaunas). 2021;57(11). [DOI] [PMC free article] [PubMed]
  • 11.Dube DH. Design of a drug discovery course for non-science majors. Biochem Mol Biol Educ. 2018;46(4):327–35. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Gupta R, et al. Artificial intelligence to deep learning: machine intelligence approach for drug discovery. Mol Divers. 2021;25(3):1315–60. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Zhang X, Cao YGS, Tao J, He X. Application of artificial intelligence generated content in experimental teaching of biochemistry and molecular biology. Basic Medical Education. 2024;26(5):406–11.
  • 14.Surapaneni KM. Assessing the performance of ChatGPT in medical biochemistry using clinical case vignettes: observational study. JMIR Med Educ. 2023;9:e47191. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Ghosh A, Bir A. Evaluating ChatGPT’s ability to solve higher-order questions on the competency-based medical education curriculum in medical biochemistry. Cureus. 2023;15(4):e37023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Alkhalaf M, et al. Applying generative AI with retrieval augmented generation to summarize and extract key clinical information from electronic health records. J Biomed Inf. 2024;156:104662. [DOI] [PubMed] [Google Scholar]
  • 17.Lee H. The rise of ChatGPT: exploring its potential in medical education. Anat Sci Educ. 2024;17(5):926–31. [DOI] [PubMed] [Google Scholar]
  • 18.Francis NJ, Jones S, Smith DP. Generative AI in higher education: balancing innovation and integrity. Br J Biomed Sci. 2024;81:14048. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Pagliari M, Chambon V, Berberian B. What is new with artificial intelligence? Human-agent interactions through the lens of social agency. Front Psychol. 2022;13:954444. [DOI] [PMC free article] [PubMed] [Google Scholar]
