BMC Medical Education
2025 Oct 2;25:1291. doi: 10.1186/s12909-025-07852-x

Application of DeepSeek-assisted problem-based learning in hematology residency training

Jinxiao Hou 1,2,3,4,5,#, Furun An 1,2,3,4,5,#, Hui Qin 1,2,3,4,5, Lulu Zhang 1,2,3,4,5, Juan Wang 1,2,3,4,5, Cui Zhang 1,2,3,4,5, Dachuan Fan 6,
PMCID: PMC12492539  PMID: 41039507

Abstract

Objectives

This study aimed to evaluate the efficacy of integrating the open-source large language model (LLM) DeepSeek into a problem-based learning (PBL) curriculum for hematology residency training.

Methods

This non-randomized controlled trial included two groups of 30 second-year hematology residents each. One group received traditional PBL instruction, while the other’s PBL was assisted by DeepSeek. Both groups participated in in-person PBL sessions across two identical hematology courses. The DeepSeek-assisted PBL group utilized DeepSeek V3 and R1 models, along with an AI-facilitated web search and integrated output after automatic information filtering and analysis during their in-person PBL sessions. Learning outcomes were assessed via a post-course survey evaluating effectiveness, credibility, reliability, and engagement. Students also completed five standardized examinations covering analysis and diagnostic decision-making, procedural skills, communication, interdisciplinary integration, and emergency management/ethical considerations.

Results

The study demonstrated significant advantages of DeepSeek-assisted PBL over traditional PBL across multiple competency domains, including case analysis effectiveness, feedback quality, course structure, and clinical reasoning. Participants also reported stronger curriculum alignment with current guidelines and enhanced capacity for generating clinical insights during discussions. Academically, the DeepSeek-assisted PBL group outperformed in four out of five competency domains (Exams I, III, IV, V), achieving higher total examination scores. However, no significant difference emerged in clinical skills (Exam II), nor did DeepSeek enhance interactive elements based on survey results. Notably, the DeepSeek-assisted PBL group also expressed greater concerns about the potential inaccuracies in artificial intelligence-generated medical advice.

Conclusion

Integrating DeepSeek into the PBL curriculum may improve clinical competence, diagnostic reasoning, and learner engagement in hematology residency training. These findings suggest that open-source LLMs like DeepSeek may offer scalable and cost-effective support tools to augment traditional medical education. Further study is needed to explore artificial intelligence tools for enhancing interactive elements and procedural skills.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12909-025-07852-x.

Keywords: DeepSeek, Problem-based learning, Hematology residency, Large language model, Medical education

Introduction

Problem-based learning (PBL) has become a highly valued teaching strategy in residency training, promoting students’ active engagement by analyzing and solving complex clinical cases. This approach cultivates advanced critical thinking, clinical reasoning, and self-directed learning, which are core competencies for medical specialization. Increasingly incorporated into residency programs, PBL serves as a vital link between theoretical knowledge and its practical application in high-pressure clinical environments, preparing residents to manage the complexities of patient care effectively. Previous studies have demonstrated that PBL enhances long-term knowledge retention, sharpens diagnostic skills, improves communication proficiency, and significantly increases resident satisfaction compared to traditional didactic instruction, while also fostering collaborative learning and professional adaptability [1, 2].

However, implementing PBL in residency training presents significant challenges, particularly in ensuring well-prepared facilitators and adequate institutional support. Studies have shown that a lack of proper training for both students and instructors can hinder the effectiveness of PBL sessions [3]. The variability in facilitator expertise and inconsistencies in structuring learning experiences for residents with diverse educational backgrounds may compromise the standardization and effectiveness of PBL [3]. Systematic reviews have indicated that while PBL does not significantly impact knowledge acquisition compared to traditional methods, its influence on clinical competencies and diagnostic accuracy remains inconclusive [4]. Furthermore, methodological inconsistencies and heterogeneous study designs hinder the ability to draw definitive conclusions about its pedagogical efficacy [4]. These deficiencies are particularly pronounced in subspecialties such as hematology, where residency education must address a broad spectrum of both commonplace and rare clinical scenarios [5]. As medical education continues to evolve, integrating innovative methodologies into PBL frameworks presents a compelling opportunity to amplify its strengths while mitigating inherent limitations.

The rapid development of artificial intelligence (AI), particularly large language models (LLMs), has created opportunities to enhance educational frameworks, including PBL, in residency training [6]. Incorporating AI-driven teaching methods provides valuable benefits that address the challenges of residency education, such as personalized learning experiences tailored to individual progress, real-time feedback that strengthens clinical decision-making, and the ability to simulate rare or complex medical cases, further refining specialty-specific knowledge [7, 8]. These capabilities hold particular significance for residents, who must rapidly assimilate a vast amount of medical knowledge while honing their capacity for nuanced clinical reasoning [9]. Moreover, AI-driven educational frameworks alleviate reliance on human facilitators by providing scalable and universally accessible instructional support, a critical advantage for enhancing pedagogical quality in residency programs constrained by limited faculty resources [10]. Within this domain, DeepSeek, an innovative open-source LLM pioneered by a Chinese AI research entity, stands out due to its advanced reasoning capabilities, cost-efficient deployment, and a compact yet potent architecture [11]. Unlike proprietary AI systems, its open-source framework enables residency programs and educators worldwide to access, customize, and implement the model without external dependencies, thereby mitigating infrastructure costs [11]. Additionally, its minimal computational requirements further facilitate seamless deployment on modest hardware configurations, making it a viable solution for institutions in underserved regions [11]. In the context of PBL-driven residency education, DeepSeek harbors the potential to revolutionize instructional methodologies [12, 13]. 
By autonomously generating discipline-specific clinical case scenarios, furnishing real-time diagnostic feedback, and functioning as an intelligent tutor, it provides an interactive and personalized learning experience [12]. For instance, hematology residents could leverage DeepSeek to simulate rare pathological conditions, refining their diagnostic reasoning and patient interaction skills in a controlled yet dynamic environment [12]. This AI-enhanced approach reduces PBL’s traditional reliance on facilitators while offering an adaptable, scalable, and cost-effective alternative. However, despite these advantages, key concerns persist, including data privacy, model reliability, and the necessity for empirical validation within residency education settings [14]. To date, the use of DeepSeek-assisted PBL has yet to be fully explored, highlighting the need for further investigation and comprehensive evaluation.

This study aimed to systematically assess the efficacy of a DeepSeek-assisted PBL curriculum in hematology residency training, with a specific focus on its impact on teaching effectiveness, resident learning experiences, and clinical competency. The importance of this study lies in its potential to redefine the role of LLMs in medical education, offering an innovative and scalable model that leverages DeepSeek’s open-source and computationally efficient framework for widespread global adoption.

Methods

Participants and recruitment

This quasi-experimental, non-randomized controlled trial was conducted at the Second Affiliated Hospital of Anhui Medical University from February to March 2025. Sixty second-year hematology residents, all participants in a mandatory PBL curriculum, voluntarily enrolled after signing informed consent. Representing diverse regions of China, these residents had completed standardized undergraduate medical education and national licensing exams. They were divided into two groups: a traditional PBL group (n = 30) and a DeepSeek-assisted PBL group (n = 30), with allocation based on residency schedules to ensure operational feasibility and minimize clinical disruptions. To control for facilitator variability, the same experienced facilitators guided both the traditional and DeepSeek-assisted PBL groups. Each group of 30 residents participated in three separate teaching sessions for each course, with approximately 10 residents per session. Facilitators, each with at least five years of PBL experience, attended a training workshop to standardize teaching approaches across groups.

Study design and setting

The study, conducted from February to March 2025, evaluated the effectiveness of DeepSeek-assisted PBL versus traditional PBL in hematology residency training. Each group completed two courses:

  • Course 1: February 1 – February 28, 2025

  • Course 2: March 1 – March 31, 2025

Each course was anchored by a distinct hematology-related clinical case (Course 1: a case of acute myeloid leukemia; Course 2: a case of severe hemolytic anemia) (supplemental appendix 1). Each course comprised three sessions, each lasting four academic hours (with each academic hour equaling 40 min), totaling 12 academic hours per course. To facilitate small-group learning, each cohort of 30 residents was divided into three classes of 10. The traditional PBL group received solely conventional facilitator-led instruction, whereas the DeepSeek-assisted group integrated the DeepSeek open-source LLM into the PBL framework for both courses. The research design and flowchart are shown in Fig. 1. To ensure equivalence, both groups engaged with identical case content and learning objectives aligned with hematology residency training standards. Sessions were executed in a hybrid format, combining in-person discussions at the Second Affiliated Hospital of Anhui Medical University with virtual access to DeepSeek for the experimental cohort.

Fig. 1. Study design and flow chart

For the traditional PBL group, the instructional sequence for both courses entailed:

  1. Case Presentation: Residents received printed case materials (e.g., patient history, bone marrow smears, laboratory results) for independent review prior to each session.

  2. Group Discussion: Facilitators orchestrated discussions to elicit learning issues, encourage the formulation of diagnostic hypotheses (e.g., leukemia subtype), and promote collaborative development of management plans.

  3. Self-Directed Learning: Post-session, residents engaged in independent research utilizing standard hematology resources such as textbooks and established online databases (e.g., UpToDate, ASH Guidelines), and were able to conduct web searches as needed. However, unlike the DeepSeek-assisted group, they did not have access to an AI-facilitated web search and integrated output after automatic information filtering and analysis during the in-person PBL sessions.

  4. Wrap-Up: Sessions concluded with facilitators summarizing key points, addressing queries, and reinforcing learning objectives through their expert oversight.

For the DeepSeek-assisted PBL group, the traditional PBL structure was enhanced with the integration of DeepSeek V3, its R1 version, and the Web Searching module. It is important to note that the core clinical case content remained identical to that used by the traditional PBL group. The process was augmented by DeepSeek, exemplified by a scenario involving severe anemia in Course 2. Prior to implementation, all residents in the DeepSeek-assisted group received training on accessing the DeepSeek App online platform (https://www.deepseek.com) and its Web Searching module. This training covered basic interaction principles for posing questions effectively to DeepSeek V3 and R1 for tailored explanations, simulations, or specific information retrieval, emphasizing the Web Searching module’s real-time functionality for accessing the latest research, guidelines, and expert opinions. In addition, there was explicit instruction on critically appraising DeepSeek-generated information, cross-referencing it with established knowledge, and understanding potential limitations or inaccuracies, fostering independent judgment.

  1. Case Introduction: Residents accessed a DeepSeek-generated scenario replete with interactive prompts (e.g., “Interpret the patient’s reticulocyte count and bone marrow smear findings”) to initiate diagnostic reasoning. Residents were not provided with pre-structured or curated prompts for direct DeepSeek input during these sessions; instead, DeepSeek itself generated these guiding prompts within the case.

  2. Real-Time Support: DeepSeek provided instantaneous feedback on diagnostic propositions (e.g., “Consider autoimmune hemolytic anemia based on Coombs test results”) and suggested supplementary inquiries during discussions.

  3. Personalized Guidance and Expanded Learning: Residents could query DeepSeek V3 and its R1 version for tailored explanations (e.g., underlying mechanisms of hemolysis) or simulations of rare hematologic conditions. Meanwhile, the Web Searching module—DeepSeek’s online search functionality—allowed real-time retrieval of the latest research, clinical guidelines, and expert opinions, ensuring access to the most up-to-date evidence and best practices.

  4. Group Integration: Insights derived from DeepSeek, its R1 version, and web-based sources were shared and debated within group discussions. Facilitators followed standardized protocols for guiding and regulating these discussions. These protocols ensured that facilitators actively guided critical evaluation of insights derived from DeepSeek and web sources, validated the clinical accuracy and relevance of DeepSeek’s outputs in real time (correcting any misleading suggestions), and continually reinforced the necessity for residents to integrate DeepSeek outputs with their own independent clinical judgment throughout the session.

  5. Summary and Synthesis: Each session concluded with a collaborative wrap-up, incorporating facilitator-led discussion and DeepSeek-driven analysis, emphasizing clinical applications (e.g., therapeutic strategies for hemolytic anemia) and reinforcing AI-generated insights.

DeepSeek V3 and its R1 version were accessed via the DeepSeek App online platform (https://www.deepseek.com), eliminating the need for local deployment and ensuring seamless availability. Meanwhile, the Web Searching module provided residents with real-time access to the latest medical literature, bridging the gap between traditional learning resources and rapidly evolving clinical knowledge. The dual-course structure enabled an evaluation of the cumulative impact of DeepSeek integration over time, with the traditional PBL group serving as a consistent control.

Assessment of learning outcomes

An anonymous questionnaire was administered at the end of the courses to gather feedback on the PBL experience. Adapted from validated PBL evaluation tools, it included 15 items to assess teaching effectiveness, credibility and reliability, and student engagement on a five-point Likert scale (1 = strongly disagree; 5 = strongly agree) (supplemental appendix 2). This questionnaire captured residents’ perceptions of the PBL process, including DeepSeek’s role in the experimental group.

A comprehensive examination (100 points) was conducted after the completion of both courses and consisted of five parts, each worth 20 points, designed for hematology residency training:

  1. Exam Ⅰ: case analysis and diagnostic decision-making (20 points)

    This section aimed to evaluate clinical reasoning, diagnosis, and treatment planning for hematologic disorders, ensuring residents could effectively analyze complex cases and devise appropriate management strategies. The content included analysis of cases such as leukemia or anemia, including:

    1. Primary diagnosis and differential diagnosis (e.g., acute leukemia, leukemia-like reaction, myelodysplastic syndrome, nutritional anemia, aplastic anemia, etc.) (5 points)
    2. Diagnostic rationale (e.g., bone marrow smears, flow cytometry, genetic testing) (5 points)
    3. Treatment planning (e.g., chemotherapy, targeted therapy, transplantation) (5 points)
    4. Prognosis and follow-up (e.g., minimal residual disease monitoring, post-transplant care) (5 points)
  2. Exam Ⅱ: clinical skills and procedures (20 points)

    Designed to assess mastery of core hematology procedures, this component tested residents’ practical skills critical to patient care in hematology settings. The content focused on knowledge of techniques such as (supplemental appendix 3):

    1. Lumbar puncture (5 points)
    2. Bone marrow aspiration (5 points)
    3. Peripheral blood stem cell collection (5 points)
    4. Therapeutic plasma exchange (5 points)
  3. Exam Ⅲ: patient communication and team collaboration (20 points)

    This part tested skills in patient interaction and interdisciplinary teamwork, emphasizing the ability to convey complex medical information and collaborate effectively with healthcare teams. The content involved scenarios including:

    1. Explaining diagnosis and prognosis to leukemia patients or families (10 points)
    2. Collaborating with multidisciplinary teams (e.g., infectious disease, ICU, transplant units) (10 points)
  4. Exam Ⅳ: interdisciplinary integration and research literacy (20 points)

    The objective here was to gauge residents’ ability to integrate related disciplines and engage with research, fostering a holistic approach to hematology practice and evidence-based decision-making. The content comprised tasks such as:

    1. Comprehensive case analysis with imaging (e.g., CT or ultrasound examination), molecular testing (e.g., next-generation sequencing), and flow cytometry (10 points)
    2. Critiquing a recent hematology study (e.g., CAR-T therapy, BCL-2 inhibitors) for design and clinical applicability (10 points)
  5. Exam Ⅴ: emergency management and ethical considerations (20 points)

    This section assessed residents’ handling of acute cases and ethical challenges, preparing them to manage critical situations and navigate complex moral dilemmas in hematology. The content featured scenarios including:

    1. Management of emergencies (e.g., DIC, tumor lysis syndrome, transplant rejection) (10 points)
    2. Ethical case analysis (e.g., palliative care for relapsed/refractory leukemia, donor selection for transplantation) (10 points)
      The examination questions were sourced from a standardized hematology residency question bank, ensuring relevance to training objectives. Scoring was based on standardized rubrics, combining examiner evaluations with objective metrics (e.g., accuracy, completeness).

Statistical analysis

Data were analyzed using IBM SPSS Statistics version 26.0 (IBM Corp., Armonk, NY, USA). Quantitative data, including questionnaire scores and examination results, were presented as means ± standard deviations (X̄ ± SD). Independent t-tests were used to compare outcomes between the traditional and DeepSeek-assisted PBL groups after the courses. Categorical data were analyzed using chi-square tests. Statistical significance was defined as p < 0.05.
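As a cross-check of this pipeline, the reported baseline comparisons can be reproduced directly from the summary statistics in Table 1 using SciPy rather than SPSS (a sketch only; it assumes a pooled-variance t-test and an uncorrected chi-square, which are consistent with the reported t and χ² values):

```python
# Reproduce the Table 1 between-group comparisons from the reported summary
# statistics (raw data are not available; values are taken from the article).
from scipy import stats

# Independent (pooled-variance) t-test on age
t, p = stats.ttest_ind_from_stats(
    mean1=25.47, std1=1.33, nobs1=30,   # DeepSeek-assisted PBL group
    mean2=24.90, std2=1.63, nobs2=30,   # traditional PBL group
)
print(f"age: t = {t:.2f}, p = {p:.3f}")  # close to the reported t = 1.48, p = 0.145

# Chi-square test on gender (male/female counts per group), without Yates
# continuity correction, matching the reported chi-square = 0.073
chi2, p_chi, dof, expected = stats.chi2_contingency(
    [[11, 19],   # DeepSeek-assisted group: male, female
     [10, 20]],  # traditional group: male, female
    correction=False,
)
print(f"gender: chi2 = {chi2:.3f}, p = {p_chi:.3f}")
```

The same `ttest_ind_from_stats` call applied to the means and SDs in Tables 2 and 3 recovers the reported t-values to within rounding.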

Results

Demographic characteristics of the participants

Sixty second-year hematology residents at the Second Affiliated Hospital of Anhui Medical University were enrolled between February and March 2025, all of whom were participants in a compulsory PBL curriculum. The cohort was evenly stratified into two groups: a traditional PBL group (n = 30) and a DeepSeek-assisted PBL group (n = 30), as shown in Table 1. The overall mean age was 25.18 ± 1.50 years (range: 23–28), with the traditional PBL group averaging 24.90 ± 1.63 years and the DeepSeek-assisted group 25.47 ± 1.33 years (t = 1.48, p = 0.145). The gender distribution was 21 males (35%) and 39 females (65%), with the DeepSeek-assisted PBL group comprising 11 males and 19 females, and the traditional PBL group comprising 10 males and 20 females (χ² = 0.073, p = 0.787). Regarding residential origin, all participants in both groups came from Anhui Province. There were no significant differences in age or gender between the two groups (p > 0.05).

Table 1.

Participant characteristics

Characteristic DeepSeek-assisted PBL group (n = 30) Traditional PBL group (n = 30) χ²/t p
Age (years, Mean ± SD) 25.47 ± 1.33 24.90 ± 1.63 1.48 0.145
Gender, n 0.073 0.787
 Male 11 10
 Female 19 20
Residential Origin, n (%)
Anhui Province 30 (100%) 30 (100%) - -

Evaluation of questionnaire responses

In Table 2, we present a comparison of questionnaire responses between the DeepSeek-assisted PBL group and the traditional PBL group, with 30 students per group. Students in the DeepSeek-assisted PBL group perceived the course as significantly more effective in supporting case analysis (4.40 ± 0.56 vs. 3.97 ± 0.67, p = 0.009) and valued the feedback for substantially enhancing their grasp of medical concepts (4.33 ± 0.55 vs. 3.87 ± 0.63, p = 0.003). They also found the course structure more conducive to efficient case analysis (4.47 ± 0.51 vs. 3.93 ± 0.64, p = 0.001) and the supplementary materials more effective in bolstering their clinical reasoning abilities (4.17 ± 0.75 vs. 3.60 ± 0.68, p = 0.003). Furthermore, they reported that the instructional approach better aligned with their learning needs (4.30 ± 0.65 vs. 3.83 ± 0.65, p = 0.007). The DeepSeek-assisted PBL group expressed greater confidence in the accuracy and reliability of the medical information provided (4.23 ± 0.57 vs. 3.50 ± 0.51, p = 0.000) and perceived the course content as more aligned with the latest medical research and guidelines (4.33 ± 0.61 vs. 3.73 ± 0.58, p = 0.000). They also demonstrated higher trust in the feedback they received (4.47 ± 0.51 vs. 4.03 ± 0.62, p = 0.004) and rated the explanations and responses as clearer and more comprehensible (4.53 ± 0.51 vs. 3.73 ± 0.64, p = 0.000). However, these students exhibited significantly greater concern about the possibility of the course providing inaccurate medical advice (4.43 ± 0.57 vs. 4.07 ± 0.69, p = 0.029). In terms of engagement and future preferences, the DeepSeek-assisted PBL course was considered more engaging (4.03 ± 0.89 vs. 3.53 ± 0.57, p = 0.013), and students expressed a stronger desire to continue using this approach in future PBL courses (4.47 ± 0.51 vs. 3.63 ± 0.62, p = 0.000). Additionally, they felt the course enabled them to offer deeper insights during group discussions (4.50 ± 0.50 vs. 4.10 ± 0.66, p = 0.011).

Table 2.

Questionnaire responses comparing DeepSeek-assisted PBL and traditional PBL (n = 30 per group, averaged across two courses)

Questions DeepSeek-assisted PBL group Traditional PBL group t-value p-value
The PBL course helps me effectively analyze the cases provided. 4.40 ± 0.56 3.97 ± 0.67 2.72 0.009
The feedback I receive during the course improves my understanding of medical concepts. 4.33 ± 0.55 3.87 ± 0.63 3.07 0.003
The course structure allows me to efficiently analyze cases. 4.47 ± 0.51 3.93 ± 0.64 3.58 0.001
Supplementary materials or information provided in the course enhance my clinical reasoning skills. 4.17 ± 0.75 3.60 ± 0.68 3.08 0.003
The instructional approach in the course meets my learning needs. 4.30 ± 0.65 3.83 ± 0.65 2.78 0.007
The medical information provided during the course is accurate and reliable. 4.23 ± 0.57 3.50 ± 0.51 5.27 0.000
The course content reflects the latest medical research and guidelines. 4.33 ± 0.61 3.73 ± 0.58 3.91 0.000
I trust the feedback provided in the course for learning purposes. 4.47 ± 0.51 4.03 ± 0.62 2.98 0.004
The explanations and responses I receive during the course are clear and easy to understand. 4.53 ± 0.51 3.73 ± 0.64 5.37 0.000
I am concerned that the course might provide inaccurate medical advice. 4.43 ± 0.57 4.07 ± 0.69 2.24 0.029
The PBL course makes learning more engaging. 4.03 ± 0.89 3.53 ± 0.57 2.59 0.013
Participating in the course increases my confidence in learning. 4.17 ± 0.65 4.10 ± 0.76 0.37 0.716
I would like to continue using this instructional approach in future PBL courses. 4.47 ± 0.51 3.63 ± 0.62 5.73 0.000
The interactive elements of the course encourage me to participate more actively in case analysis. 4.20 ± 0.61 3.97 ± 0.55 1.55 0.127
The course helps me contribute deeper insights during group discussions. 4.50 ± 0.50 4.10 ± 0.66 2.63 0.011

However, there were no significant differences between the two groups in their confidence in learning as a result of participating in the course (4.17 ± 0.65 vs. 4.10 ± 0.76, p = 0.716) or in the extent to which the interactive elements encouraged active participation in case analysis (4.20 ± 0.61 vs. 3.97 ± 0.55, p = 0.127).

Evaluation of academic performance

The DeepSeek-assisted PBL group achieved a significantly higher total score in the comprehensive examination than the traditional PBL group (86.03 ± 2.55 vs. 81.73 ± 3.38, p = 0.000) (Table 3). This group also obtained significantly higher scores in Exam Ⅰ (17.93 ± 0.74 vs. 16.80 ± 1.63, p = 0.001), Exam Ⅲ (17.87 ± 1.07 vs. 17.13 ± 1.11, p = 0.012), Exam Ⅳ (15.97 ± 0.89 vs. 15.00 ± 2.20, p = 0.031), and Exam Ⅴ (17.53 ± 0.90 vs. 16.60 ± 1.38, p = 0.003). However, there was no statistically significant difference in Exam Ⅱ scores between the two groups (16.73 ± 1.17 vs. 16.20 ± 1.13, p = 0.078).

Table 3.

Examination scores comparing DeepSeek-assisted PBL and traditional PBL (n = 30 per group, averaged across two courses)

Scoring Component DeepSeek-assisted PBL group Traditional PBL group t-value p-value
Exam Ⅰ 17.93 ± 0.74 16.80 ± 1.63 3.47 0.001
Exam Ⅱ 16.73 ± 1.17 16.20 ± 1.13 1.80 0.078
Exam Ⅲ 17.87 ± 1.07 17.13 ± 1.11 2.61 0.012
Exam Ⅳ 15.97 ± 0.89 15.00 ± 2.20 2.23 0.031
Exam Ⅴ 17.53 ± 0.90 16.60 ± 1.38 3.10 0.003
Total Score 86.03 ± 2.55 81.73 ± 3.38 5.58 0.000

Discussion

The landscape of medical education is undergoing a significant transformation, driven by the rapid advancement of technology. AI has emerged as a potentially transformative tool, particularly in residency training [15]. In this study, we reported the first integration of DeepSeek into a PBL curriculum for second-year hematology residents. This innovative approach aimed to enhance clinical competence by providing tailored case simulations, real-time diagnostic feedback, and interactive tutoring. Preliminary findings indicated that DeepSeek-assisted PBL not only improved teaching effectiveness and student engagement but also significantly enhanced residents’ analytical and decision-making skills. Moreover, this study highlighted the potential of open-source LLMs like DeepSeek to offer transparent, cost-effective, and scalable solutions that could reduce disparities in medical education on a global scale [14].

Enhancement of clinical competence through DeepSeek

The integration of DeepSeek into the PBL curriculum significantly improved residents’ self-perceived clinical abilities and their performance in competency-related domains, as evidenced by both subjective evaluations and objective data. Questionnaire surveys revealed that the experimental group, which received DeepSeek-assisted PBL, outperformed the traditional PBL group in 13 out of 15 assessment items, reflecting improvements in teaching effectiveness, credibility, and overall engagement. In particular, the DeepSeek-assisted PBL group reported a significant enhancement in case analysis efficiency, with an average score of 4.47 ± 0.51 compared to 3.93 ± 0.64 in the traditional PBL group. This finding implies that DeepSeek’s ability to process complex clinical information enabled residents to conduct case analyses more systematically and effectively. Additionally, the DeepSeek-assisted PBL group demonstrated a stronger ability to understand medical concepts, with an average score of 4.33 ± 0.55 compared to 3.87 ± 0.63 in the traditional PBL group, likely benefiting from the provision of real-time, personalized feedback and targeted guidance.

Objective examination scores further confirmed these positive trends. The total score of the DeepSeek-assisted PBL group was significantly higher than that of the traditional PBL group, with improvements observed across multiple domains of clinical competence:

  1. Exam I (Case Analysis and Diagnostic Decision-Making): The DeepSeek-assisted PBL group achieved a significantly higher score than the traditional group. This improvement corresponds with the subjective reports of enhanced case analysis efficiency, demonstrating DeepSeek’s effectiveness in reinforcing cognitive aspects of clinical reasoning. This exam emphasized fundamental hematology skills such as differentiating leukemia subtypes and identifying the causes of anemia. DeepSeek’s capacity to interpret data, propose diagnoses, and analyze complex cases likely provided a considerable advantage to the residents in the DeepSeek-assisted PBL group.

  2. Exam III (Patient Communication and Team Collaboration): The DeepSeek-assisted PBL group also showed a significant improvement in this area. Although it may seem unexpected for an LLM tool to directly impact interpersonal skills, DeepSeek’s structured feedback appears to have enhanced residents’ capacity to articulate their diagnostic reasoning, a critical component of effective patient communication and interprofessional collaboration.

  3. Exam IV (Interdisciplinary Integration and Research Literacy): A statistically significant difference was observed in this exam, with the DeepSeek-assisted PBL group performing better. This result suggests that DeepSeek’s access to current medical literature and guidelines enhanced residents’ ability to make evidence-based decisions and integrate interdisciplinary knowledge in patient care. As LLMs are trained on extensive textual data, including medical research, DeepSeek likely provided relevant research findings and established guidelines that improved residents’ capacity to synthesize information from diverse fields. However, it is important to acknowledge that despite the statistical significance, this modest improvement (0.97 points) may not directly translate into substantial clinical impact without further context or qualitative assessment, suggesting an area for future research.

  4. Exam V (Emergency Management and Ethical Considerations): The DeepSeek-assisted group exhibited superior performance. This improvement likely resulted from DeepSeek’s ability to simulate various emergency scenarios, such as tumor lysis syndrome, and to provide pertinent information on ethical considerations in patient management. By simulating diverse clinical situations and offering insights into best practices, DeepSeek allowed residents to practice their responses in a risk-free environment. While the simulation capabilities of DeepSeek may have contributed to these outcomes, further studies are needed to isolate the specific elements responsible for performance gains.

These results are consistent with the principles of competency-based medical education, which emphasize the mastery of specific skills rather than rote memorization [16]. By fostering analytical, communicative, and integrative abilities, DeepSeek-assisted PBL contributes significantly to the development of well-rounded hematology residents.

Analysis of non-significant results

Two outcomes in this study did not reach statistical significance, providing insights into the limitations of DeepSeek in its current form and suggesting avenues for future improvement. The questionnaire item assessing whether the course’s interactive elements promoted more active case analysis showed no significant difference between the DeepSeek-assisted PBL group and the traditional PBL group. While the human interaction and facilitator-led group discussions inherent in traditional PBL are highly effective in fostering participation, this result may indicate that the interactive components of DeepSeek, as currently implemented, left limited scope for AI to further enhance engagement [3]. Alternatively, while DeepSeek’s interactive prompts may benefit individual learning, they might not be sufficiently designed to alter the dynamics of group interactions in a way that boosts active engagement beyond the levels already present in traditional PBL. This suggests a need for more sophisticated AI features, such as real-time adaptive questioning tailored to group discussions or the incorporation of gamified elements, to more effectively amplify participation. The limited impact on interactive elements indicates that LLM integration needs to be thoughtfully designed to complement and enhance existing pedagogical methods rather than simply replace them. Fostering group interaction requires different strategies than supporting individual learning, and LLMs need to be more dynamically integrated into the group setting to truly enhance participation.

Similarly, Exam II, which assessed clinical skills and procedures, revealed no significant difference between the DeepSeek group and the traditional PBL group. This component evaluated hands-on skills such as bone marrow aspiration and transfusion management, suggesting that DeepSeek’s text-based support has limited influence on procedural competence [17]. Although the slightly higher mean score in the DeepSeek-assisted group might reflect improved theoretical understanding, translating this knowledge into practical skills likely requires integration with simulation-based training using virtual reality (VR), augmented reality (AR), or more focused bedside teaching [18, 19]. This finding highlights that the primary strengths of LLMs in medical education lie in the cognitive domain, while procedural skills acquisition necessitates alternative pedagogical approaches. Integrating DeepSeek with VR/AR technologies that provide virtual practice with haptic feedback could potentially bridge this gap, as such platforms offer a safe environment for practicing procedures without risk to patients [19]. The integration of AI with VR/AR could further personalize training by tailoring scenarios and feedback based on individual performance [20].

Credibility and acceptance of DeepSeek

The perceived credibility of DeepSeek was a crucial factor in its overall effectiveness. Residents in the DeepSeek-assisted PBL group rated the accuracy and reliability of its medical information significantly higher than did residents in the traditional group. They also reported greater trust in the feedback provided by DeepSeek, indicating confidence in its alignment with current hematology standards, such as the American Society of Hematology guidelines. Despite this high level of trust, the DeepSeek-assisted group expressed greater concern about potential inaccuracies. This dual perception of trust coupled with caution reflects broader trends in the adoption of LLMs in medical education and underscores the need for human facilitators to validate AI-generated outputs and ensure clinical accuracy [21].

Our results revealed an intriguing paradox: the DeepSeek-assisted group, while confident in the information’s accuracy and alignment with guidelines, also expressed greater concern about inaccuracies. This is not a contradiction but rather reflects a sophisticated understanding of AI. The heightened concern stems from awareness of AI limitations such as “hallucinations,” which fosters critical scrutiny. Concurrently, DeepSeek’s comprehensive and verifiable information, combined with a rigorous validation process involving facilitator oversight, resident critical appraisal, and real-time correction and discussion, led to increased confidence in confirmed content. These findings therefore indicate that residents are developing essential competencies in critically evaluating AI-augmented information, preparing them for an AI-integrated clinical environment with both informed caution and confident utilization.

Advantages of open-source and practical implications

While this study specifically examined DeepSeek’s integration within a PBL curriculum, the observed benefits in enhancing clinical reasoning, access to information, and interactive learning suggest that similar AI-driven adjuncts could generalize effectively to other instructional methodologies, including traditional didactics or case-based learning (CBL), thereby offering broader applicability to diverse medical education settings. The open-source nature of DeepSeek presents distinct advantages over proprietary LLMs, particularly in terms of scalability, affordability, and adaptability for diverse educational settings [22]. Proprietary systems often demand significant financial and technical resources, which restricts their implementation in institutions with limited budgets. DeepSeek, on the other hand, can be deployed on local servers with relatively modest hardware requirements, making it well-suited for resource-limited settings. Its ability to simulate rare hematologic conditions, such as myeloid sarcoma or hemolytic crises, addresses critical gaps in clinical exposure [17]. By providing simulated encounters with uncommon but clinically significant scenarios, DeepSeek enriches the educational experience for residents who might not otherwise encounter these conditions in a conventional clinical setting. This enhanced exposure potentially extends the benefits of advanced medical education beyond tertiary centers, making high-quality training more available to rural and underserved areas, and thereby amplifying its overall educational impact.

Ethical considerations in LLM integration

The integration of LLMs into medical education presents critical ethical challenges, particularly regarding data privacy and algorithmic bias [23, 24]. In this study, the local deployment of DeepSeek minimized data privacy risks. However, broader implementation of LLMs in medical training necessitates robust safeguards to prevent data breaches and mitigate skewed outputs arising from unrepresentative datasets. Ensuring data privacy and addressing algorithmic bias are essential ethical imperatives for responsibly incorporating LLMs in medical education. Regular audits and updates of these models are crucial to maintain accuracy and rectify potential biases [25]. Moreover, it is crucial to foster LLM literacy among residents so that they can critically appraise LLM-generated information and avoid over-reliance on these technologies [26]. This includes a strong emphasis on maintaining independent clinical judgment and preventing the erosion of critical thinking skills when interacting with AI-supported learning environments. Such education should cover how LLMs work, their potential biases, and the importance of human oversight to ensure patient safety. As LLMs are tools that must be used correctly, developing these skills is vital for their safe and effective application in healthcare. The Association of American Medical Colleges has established key principles for the responsible and ethical use of LLMs in medical education, emphasizing a human-centered approach, ethical and transparent practices, equitable access, comprehensive education and training, interdisciplinary collaboration in curriculum development, data privacy protection, and continuous monitoring and evaluation (https://www.aamc.org).

Limitations of the study

Despite the promising outcomes, several limitations must be acknowledged. Firstly, the quasi-experimental, non-randomized design of this study, with resident allocation based on existing schedules, introduced potential selection bias. This, alongside its limited sample size and single-institution scope, significantly restricts the generalizability of the findings, underscoring the need for larger, multi-center trials in diverse contexts. Secondly, the relatively short duration of the study captured only short-term gains; the long-term retention of skills and their translation into clinical practice remain unassessed, indicating the need for longitudinal studies. Thirdly, while aligning with competency-based education, the study does not explicitly ground its methodology within a specific theoretical educational framework (e.g., constructivism), which could strengthen future research design and interpretation. Finally, threats to internal validity include the possibility of a Hawthorne effect (performance influenced by the novelty of the AI tool), potential facilitator bias despite training (e.g., varying comfort with AI), and the lack of blinding in assessment scoring.

Conclusion

In conclusion, DeepSeek-assisted PBL may enhance hematology residency training: it was associated with improvements in residents’ self-perceived clinical abilities and in their performance across related competency domains, particularly diagnostic reasoning, interdisciplinary integration, and emergency management. Although its impact on procedural skills and interactive participation appears less pronounced, DeepSeek’s open-source framework provides a scalable, cost-effective solution for promoting equitable access to advanced medical education. The enhancements observed in clinical reasoning, information access, and interactive learning suggest that DeepSeek’s applicability extends beyond PBL to other instructional methodologies such as traditional didactics or CBL.

Supplementary Information

Supplementary Material 1. (367.6KB, docx)
Supplementary Material 2. (17.3KB, docx)
Supplementary Material 3. (20.3KB, docx)

Acknowledgements

The authors thank the hematology residents who participated in the study.

Abbreviations

LLM

Large language model

PBL

Problem-based learning

AI

Artificial intelligence

CBL

Case-based learning

VR

Virtual reality

AR

Augmented reality

Authors’ contributions

JX-H and DC-F designed the study and drafted the manuscript. FR-A designed the course evaluation questionnaire. H-Q, LL-Z, J-W and C-Z collected, analyzed and interpreted the data. DC-F performed the statistical analysis and supervised the study. All authors critically reviewed and revised the manuscript. All authors read and approved the final manuscript.

Funding

This work was funded by the Natural Science Foundation of Anhui Provincial Education Department (KJ2021A0315) and Outstanding Youth Project of Anhui Province Education Department (2024AH030027).

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Declarations

Ethics approval and consent to participate

This study was conducted in accordance with the Declaration of Helsinki and approved by the local Ethics Committee of the Second Affiliated Hospital of Anhui Medical University. Each participant voluntarily took part in this study. Informed consent was obtained from all participants.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Jinxiao Hou and Furun An contributed equally to this work.

References

  • 1.Trullàs JC, Blay C, Sarri E, et al. Effectiveness of problem-based learning methodology in undergraduate medical education: a scoping review. BMC Med Educ. 2022;22(1):104. 10.1186/s12909-022-03154-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Koh GC, Khoo HE, Wong ML, et al. The effects of problem-based learning during medical school on physician competency: a systematic review. CMAJ. 2008;178(1):34–41. 10.1503/cmaj.070565. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Al-Drees AA, Khalil MS, Irshad M, et al. Students’ perception towards the problem based learning tutorial session in a system-based hybrid curriculum. Saudi Med J. 2015;36(3):341–8. 10.15537/smj.2015.3.10216. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Hartling L, Spooner C, Tjosvold L, et al. Problem-based learning in pre-clinical medical education: 22 years of outcome research. Med Teach. 2010;32(1):28–35. 10.3109/01421590903200789. [DOI] [PubMed] [Google Scholar]
  • 5.Laureano M, Mithoowani S, Tseng EK, et al. Improving medical education in hematology and transfusion medicine in Canada: standards and limitations. Adv Med Educ Pract. 2021;12:1153–63. 10.2147/AMEP.S247159. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Hallquist E, Gupta I, Montalbano M, et al. Applications of artificial intelligence in medical education: A systematic review. Cureus. 2025;17(3):e79878. 10.7759/cureus.79878. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Masters K. Artificial intelligence in medical education. Med Teach. 2019;41(9):976–80. 10.1080/0142159X.2019.1595557. [DOI] [PubMed] [Google Scholar]
  • 8.Ganjavi C, Eppler M, O’Brien D, et al. ChatGPT and large language model (LLM) awareness and use: a prospective cross-sectional survey of U.S. medical students. PLOS Digit Health. 2024;3(9):e0000596. 10.1371/journal.pdig.0000596. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Luan H, Geczy P, Lai H, et al. Challenges and future directions of big data and artificial intelligence in education. Front Psychol. 2020;11:580820. 10.3389/fpsyg.2020.580820. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Sapci AH, Sapci HA. Artificial intelligence education and tools for medical and health informatics students: systematic review. JMIR Med Educ. 2020;6(1):e19285. 10.2196/19285. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Guo D, Yang D, Zhang H, et al. DeepSeek-R1: incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948. 2025.
  • 12.Liang W, Chen P, Zou X, et al. DeepSeek: the Watson to doctors-from assistance to collaboration. J Thorac Dis. 2025;17(2):1103–5. 10.21037/jtd-2025b-03. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Wu D, Xiang Y, Wu X, et al. Artificial intelligence-tutoring problem-based learning in ophthalmology clerkship. Ann Transl Med. 2020;8(11):700. 10.21037/atm.2019.12.15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Temsah A, Alhasan K, Altamimi I, et al. DeepSeek in healthcare: revealing opportunities and steering challenges of a new Open-Source artificial intelligence frontier. Cureus. 2025;17(2):e79221. 10.7759/cureus.79221. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Rasouli S, Alkurdi D, Jia B. The role of artificial intelligence in modern medical education and practice: A systematic literature review. MedRxiv. 2024:2024.07.25.24311022. 10.1101/2024.07.25.24311022
  • 16.Frank JR, Snell LS, Cate OT, et al. Competency-based medical education: theory to practice. Med Teach. 2010;32(8):638–45. 10.3109/0142159X.2010.501190. [DOI] [PubMed] [Google Scholar]
  • 17.Euliano TY, Mahla ME. Problem-based learning in residency education: a novel implementation using a simulator. J Clin Monit Comput. 1999;15(3–4):227–32. 10.1023/a:1009980500385. [DOI] [PubMed] [Google Scholar]
  • 18.Mergen M, Junga A, Risse B, et al. Immersive training of clinical decision making with AI driven virtual patients - a new VR platform called medical tr.ai.ning. GMS J Med Educ. 2023;40(2):Doc18. 10.3205/zma001600. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Tan Y, Xu W, Li S, et al. Augmented and virtual reality (AR/VR) for education and training in the AEC industry: A systematic review of research and applications. Buildings. 2022;12(10):1529. 10.3390/buildings12101529. [Google Scholar]
  • 20.Wang D, Huang X. Transforming education through artificial intelligence and immersive technologies: enhancing learning experiences. Interact Learn Environ. 2025;1–20. 10.1080/10494820.2025.2465451.
  • 21.Holzinger A, Langs G, Denk H, et al. Causability and explainability of artificial intelligence in medicine. Wiley Interdiscip Rev Data Min Knowl Discov. 2019;9(4):e1312. 10.1002/widm.1312. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Chan KS, Zary N. Applications and challenges of implementing artificial intelligence in medical education: integrative review. JMIR Med Educ. 2019;5(1):e13930. 10.2196/13930. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Shaw J, Ali J, Atuire CA, et al. Research ethics and artificial intelligence for global health: perspectives from the global forum on bioethics in research. BMC Med Ethics. 2024;25(1):46. 10.1186/s12910-024-01044-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Zhui L, Fenghe L, Xuehu W, et al. Ethical considerations and fundamental principles of large language models in medical education: viewpoint. J Med Internet Res. 2024;26:e60083. 10.2196/60083. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Hui Z, Zewu Z, Jiao H, et al. Application of ChatGPT-assisted problem-based learning teaching method in clinical medical education. BMC Med Educ. 2025;25(1):50. 10.1186/s12909-024-06321-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Wartman SA, Combs CD. Medical education must move from the information age to the age of artificial intelligence. Acad Med. 2018;93(8):1107–9. 10.1097/ACM.0000000000002044. [DOI] [PubMed] [Google Scholar]

