Journal of Dental Education. 2025 Mar 23;89(Suppl 3):1781–1782. doi: 10.1002/jdd.13882

Student Performance on the American Board of Orthodontics Written Examination Following Flipped Classroom and Generative AI Approach

Hera Kim‐Berman 1, Jordyn Tarlie 2, Jacob Herremans 3
PMCID: PMC12728792  PMID: 40123065

1. Problem

The flipped classroom approach enhances student learning by increasing engagement, reducing lecture time, and fostering collaboration [1]. Generative Artificial Intelligence (GenAI) tools can further enrich this method by efficiently generating educational content and offering customized feedback. Prior studies report that GenAI tools improved students' academic performance and assisted medical students in learning [2, 3]. However, large language models (LLMs) can produce inaccurate or outdated content, lack domain‐specific expertise, and offer few source references, which complicates verification of the information [4]. The University of Michigan recently introduced “Maizey”, a Retrieval‐augmented Generation (RAG) framework within an LLM. RAG grounds LLM outputs in external, verifiable facts, enhancing response accuracy and reliability [5]. Maizey allows faculty to create a custom GenAI GPT tool, such as an “Artificial Intelligence Teaching Assistant (AITA)”, that uses specific datasets within a learning management system (LMS), and to share it with students. This preliminary study evaluated second‐year orthodontic residents’ performance on the national American Board of Orthodontics (ABO) Written Examination following the implementation of the flipped classroom methodology paired with AITA.
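To make the retrieval‐augmented idea concrete, the following is a minimal sketch of the general RAG pattern: retrieve the most relevant passages from a fixed set of course materials, then build a prompt that asks the model to answer only from those passages and cite them by name. It is not Maizey's implementation. The document names, the snippet texts, the keyword‐overlap scoring, and the functions `tokenize`, `retrieve`, and `build_prompt` are all illustrative assumptions, and the call to an actual LLM is left out.

```python
import re

# Hypothetical course snippets; in practice these would come from the LMS course site.
DOCUMENTS = {
    "lecture_03_summary": "Class II malocclusion is often treated with functional appliances during growth.",
    "reading_list_note": "Cephalometric analysis uses landmarks such as sella and nasion.",
    "slides_week_05": "Retention after treatment commonly relies on fixed or removable retainers.",
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, documents, top_k=2):
    """Rank documents by word overlap with the question (a stand-in for a real retriever)."""
    question_words = tokenize(question)
    scored = []
    for source, text in documents.items():
        overlap = len(question_words & tokenize(text))
        if overlap:
            scored.append((overlap, source, text))
    # Highest overlap first; ties broken alphabetically by source name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(source, text) for _, source, text in scored[:top_k]]


def build_prompt(question, passages):
    """Assemble a prompt that restricts the answer to the retrieved, named sources."""
    context = "\n".join(f"[{source}] {text}" for source, text in passages)
    return (
        "Answer using only the sources below and cite them by name.\n"
        f"Sources:\n{context}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    question = "Which landmarks are used in cephalometric analysis?"
    print(build_prompt(question, retrieve(question, DOCUMENTS)))
```

Because every passage in the prompt carries its source label, a reader can trace each part of the answer back to a specific course document, which is the verification property the paragraph above attributes to RAG.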

2. Solution

Using the Maizey tool, AITA was integrated with the LMS course site (CANVAS), which had been used with previous cohorts and contained up‐to‐date content, including the latest ABO reading list, textbook and lecture summaries, journal articles, and slide presentations. As in prior years, the students received approximately 850 multiple‐choice questions (MCQs) based on the ABO examination content for 3 of the 4 modules: Basic and Applied Biomedical Sciences, Clinical Sciences A, and Clinical Sciences B [6]. Module 4 material, Clinical Case Analysis, was not included in this course. Residents divided the MCQs equally among themselves and answered them using AITA prior to class. The residents cited references and supplemented AITA's responses with additional resources when they deemed it necessary. Classroom activities consisted of 2‐hour weekly sessions over 10 weeks and focused on reviewing answers, discussing findings, and conducting further research. Performance on the ABO Written Examination for residents who took the course in the 3 previous years using the same course content but with faculty‐expert lectures (Group A, n = 21) was compared with that of residents who used the flipped classroom and AITA method (Group B, n = 7) using a t‐test (significance threshold p<0.05).

3. Results

There was a significant improvement in examination performance on modules 1, 2, and 3 and in the total score for Group B when compared to Group A (p≤0.001) (Table 1). Group B performed in the highest quintiles (1 = top 20% and 2 = above average) when compared with Group A and the national average scores. There was no statistically significant difference in performance between the groups for module 4 (p = 0.159), which was not included in this course.

TABLE 1.

Performance on the American Board of Orthodontics (ABO) Written Examination with the faculty expert lecture‐based method for 3 consecutive cohorts of second‐year orthodontic residents (Group A, n = 21) and after implementing the flipped classroom method with the Generative AI tool, Artificial Intelligence Teaching Assistant (AITA), for one cohort of second‐year orthodontic residents (Group B, n = 7). Each cohort consisted of 7 residents. The ABO Written Examination results are ranked by quintiles, with 1 representing the highest and 5 the lowest. The national average for all parts of the ABO Written Examination is in quintile 3 (middle 20%).

|   | Module 1: Basic and Applied Biomedical Sciences, Mean (SD) | Module 2: Clinical Sciences “A”, Mean (SD) | Module 3: Clinical Sciences “B”, Mean (SD) | Module 4: Clinical Case Analysis, Mean (SD) | Total Score, Mean (SD) |
| --- | --- | --- | --- | --- | --- |
| Group A (n = 21) | 3.67 (1.065) | 3.90 (1.091) | 3.86 (1.014) | 2.24 (1.411) | 3.42 (0.735) |
| Group B (n = 7) | 1.71 (0.756) | 1.29 (0.488) | 1.57 (1.134) | 3.29 (1.496) | 1.96 (0.636) |
| Significance (p<0.05)* | <0.001* | <0.001* | 0.001* | 0.159 | <0.001* |

1 = Top 20%, 2 = Above average, 3 = Middle 20%, 4 = Below average, 5 = Bottom 20%
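The group comparison can be illustrated from the summary statistics in Table 1 alone. The sketch below runs a pooled‐variance (Student's) two‐sample t‐test on the reported means, standard deviations, and group sizes. The abstract does not state which t‐test variant or software was used, and the published p‐values were presumably computed from individual scores, so the output of this sketch need not match the table exactly. The function names and the Simpson's‐rule integration of the t density are illustrative choices, not the authors' method.

```python
import math

# Summary statistics transcribed from Table 1: (mean, SD) in quintile ranks.
GROUP_A_N, GROUP_B_N = 21, 7
TABLE_1 = {
    "Module 1": ((3.67, 1.065), (1.71, 0.756)),
    "Module 2": ((3.90, 1.091), (1.29, 0.488)),
    "Module 3": ((3.86, 1.014), (1.57, 1.134)),
    "Module 4": ((2.24, 1.411), (3.29, 1.496)),
    "Total":    ((3.42, 0.735), (1.96, 0.636)),
}


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)


def pooled_t_test(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Student's t-test (equal variances assumed) from summary statistics."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean_a - mean_b) / std_err
    return t, df, two_sided_p(t, df)


if __name__ == "__main__":
    for name, ((mean_a, sd_a), (mean_b, sd_b)) in TABLE_1.items():
        t, df, p = pooled_t_test(mean_a, sd_a, GROUP_A_N, mean_b, sd_b, GROUP_B_N)
        print(f"{name}: t({df}) = {t:.2f}, p = {p:.4f}")
```

A positive t here means Group A has the numerically higher (worse) mean quintile rank, since quintile 1 is the top 20%. With groups this small and unequal in size, a Welch (unequal‐variance) test or a rank‐based test would be reasonable alternatives for ordinal quintile data.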

A barrier to implementing the flipped classroom approach is that faculty find sourcing materials challenging, and students may resent the increased out‐of‐class workload [3]. However, using the LMS to source materials minimized the time needed to set up the flipped classroom. The major benefit of this method was that AITA provided references and sources for its responses, allowing the students to verify the information. Residents reported high satisfaction with AITA, emphasizing its efficient study support, which increased their understanding and helped identify knowledge gaps. Although a larger sample size is needed, and outside‐of‐classroom study activities and other potential confounding variables need further investigation, the preliminary results suggest that incorporating the flipped classroom with GenAI tools like AITA can significantly improve pedagogy and learning experiences.

Kim‐Berman H., Tarlie J., and Herremans J., “Student Performance on the American Board of Orthodontics Written Examination Following Flipped Classroom and Generative AI Approach.” Journal of Dental Education 89, no. S3 (2025): 1781–1782. 10.1002/jdd.13882

References

  • 1. Rotellar C. and Cain J., “Research, Perspectives, and Recommendations on Implementing the Flipped Classroom,” American Journal of Pharmaceutical Education 80, no. 2 (2016): 34, 10.5688/ajpe80234.
  • 2. Essel H. B., Vlachopoulos D., Tachie‑Menson A., Johnson E. E., and Baah P. K., “The Impact of a Virtual Teaching Assistant (Chatbot) on Students' Learning in Ghanaian Higher Education,” Int J Educ Technol High Educ 19, no. 1 (2022): 1–19.
  • 3. Kung T. H., Cheatham M., Medenilla A., et al., “Performance of ChatGPT on USMLE: Potential for AI‐Assisted Medical Education Using Large Language Models,” PLoS Digit Health 2, no. 2 (2023): e0000198, 10.1371/journal.pdig.0000198.
  • 4. Giannakopoulos K., Kavadella A., Salim A. A., Stamatopoulos V., and Kaklamanos E. G., “Evaluation of the Performance of Generative AI Large Language Models ChatGPT, Google Bard, and Microsoft Bing Chat in Supporting Evidence‐Based Dentistry: Comparative Mixed Methods Study,” Journal of Medical Internet Research 25 (2023): e51580, 10.2196/51580.
  • 5. Chen J., Lin H., Han X., and Sun L., “Benchmarking Large Language Models in Retrieval‐Augmented Generation,” in Proceedings of the Thirty‐Eighth AAAI Conference on Artificial Intelligence (AAAI‐24), 2024, 17754–17762.
  • 6. The American Board of Orthodontics, “Written Examination Specifications,” accessed July 17, 2024, https://www.americanboardortho.com/orthodontists/become‐certified/written‐exam/examination‐specifications/.

Articles from Journal of Dental Education are provided here courtesy of Wiley
