ABSTRACT
Artificial intelligence (AI) in the form of ChatGPT has rapidly attracted attention from physicians and medical educators. While it holds great promise for more routine medical tasks, may broaden one’s differential diagnosis, and may be able to assist in the evaluation of images, such as radiographs and electrocardiograms, the technology is largely based on advanced algorithms akin to pattern recognition. One of the key questions raised in concert with these advances is: What does the growth of artificial intelligence mean for medical education, particularly the development of critical thinking and clinical reasoning?
In this commentary, we will explore the elements of cognitive theory that underlie the ways in which physicians are taught to reason through a diagnostic case and compare hypothetico-deductive reasoning, often employing illness scripts, with inductive reasoning, which is based on a deeper understanding of mechanisms of health and disease. Issues of cognitive bias and their impact on diagnostic error will be examined. The constructs of routine and adaptive expertise will also be delineated.
The application of artificial intelligence to diagnostic problem solving, along with concerns about racial and gender bias, will then be examined. Using several case examples, we will demonstrate the limitations of this technology and its potential pitfalls and outline the direction medical education may need to take in the years to come.
INTRODUCTION
AI based on large language models (LLMs) made a great splash with the advent of ChatGPT in the past two years. While the technology will continue to evolve, concern has been raised by scientists (1) and by those fearful of the increasing power and influence of the technology companies pursuing the many possible uses of machine learning (2). University faculty have cautioned about students using AI to write term papers and to get assistance in other ways that may undermine academic integrity.
In medicine, there is hope that ChatGPT will transform diagnostic accuracy, expanding differential diagnoses and minimizing medical error (3). But will it be as transformative as many expect? Preliminary studies show that ChatGPT can answer many questions from board review books, designed to help students prepare for Step 1 of the U.S. Medical Licensing Examination (USMLE), with about 65% accuracy (4), a debatable endorsement, particularly on multiple-choice questions that often rest on factual recall. Nevertheless, the progression of machine learning is raising questions about the nature of medical education now and in the years to come.
An important problem confronting medical education is: What are the real cognitive challenges facing doctors today and in the future? Since the development and refinement of the internet, finding facts has become increasingly easy. But how do we deal with a patient with a confusing medical history, physical exam, and laboratory studies or an individual with a well-established chronic disease who now has new findings? Is it the progression of the underlying disease or an emerging different problem? To what degree is AI useful in these situations? To help answer these questions, we will use principles of cognitive science to explore the elements of critical thinking and diagnostic reasoning necessary for medicine and characterize the type of teaching required to support and develop analytical thinking skills. Furthermore, we will distinguish “knowledge” from thinking and review the construct of adaptive expertise and its reliance on inductive reasoning from core principles, a process with which AI appears to struggle considerably. Finally, we will examine some very significant concerns about a form of cognitive bias related to gender and race that infects AI in its present iterations, and consider possible uses of AI in medical education in the years ahead.
Cognitive Science and Critical Thinking
The work of Kahneman and Tversky introduced the dual process framework for thinking (5). System 1 thinking is rapid, intuitive, and largely based on pattern recognition. System 2 thinking is characterized by effortful analysis. The brain is built to recognize patterns; it is how we get through the day, recognize people around us, and make many simple decisions (Do I turn right or left here?). The parts of the brain responsible for system 1 thinking are largely based in the more primitive regions of the brain (e.g., the insular cortex), regions that are also responsible for emotions. Some aspects of medicine (e.g., interpreting radiographs or diagnosing dermatological lesions) are more amenable to pattern recognition than others. When radiologists were asked to quickly (within seconds) interpret chest radiographs while placed in an fMRI scanner, the regions of the brain activated during the task were not those associated with higher order executive function; rather, they were the deeper, more primitive regions of the brain (6). These same regions were also activated when the radiologists were asked to name silhouettes of small animals superimposed on the radiographs (6).
In medical school, many educators use illness scripts to teach clinical reasoning. Illness scripts are an organized mental summary of the features or characteristics of a disease (7,8). For example, the combination of dyspnea on exertion with crackles on lung exam, a history of orthopnea, and peripheral edema equals a diagnosis of congestive heart failure (CHF). This is a way to reduce the complexity of analyzing medical problems to a simpler task of pattern recognition. The student need not understand how and why these features relate to a diagnosis of CHF; it is sufficient to memorize the features. It has been proposed that illness scripts are the fastest way to teach students how to construct a differential diagnosis (7). That may well be true, but is it the best way to teach students who may be practicing medicine for 40 years, during which time many of the “facts” learned in medical school will be shown to be in error? Would these students not be better served with more time devoted to mechanisms of disease and how to think critically about how the features in a particular case can be explained? Is our goal to prepare them primarily for the first day of their first clinical clerkship or to be laying the foundation for a lifetime of medicine with an evolving knowledge about and understanding of health and disease? Can they distinguish a patient with CHF from a patient with morbid obesity who has similar symptoms and signs on physical exam?
Illness scripts are an important part of hypothetico-deductive reasoning (8). Based largely on pattern recognition, the hypothetico-deductive process goes from a small number of findings to a differential diagnosis (the hypotheses) (9). Once one has several possible diagnoses in mind, one deduces what one knows about each diagnosis (i.e., the features of the disease). The process of arriving at a diagnosis then devolves to finding the best match between the characteristics of your patient and the features of each of the diseases in your list of hypotheses. If some of the features of your patient do not match with one of the hypotheses, we tend to ignore them (this is called a confirmation bias) and rationalize discarding the information with statements such as, “Patients are not always like the textbook.” Of course, if the actual diagnosis is not in your original list, you are unlikely to discover the correct source of the problem.
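The matching step in hypothetico-deductive reasoning can be sketched as a simple scoring procedure. The illness-script features and patient findings below are illustrative toys, not clinical guidance; note what the procedure quietly does with a finding that matches no script (here, the patient's BMI): nothing. That silent discarding of non-matching data is one route by which confirmation bias enters.

```python
# Toy sketch of hypothetico-deductive matching: score each hypothesized
# disease by how many of its illness-script features the patient exhibits.
# Feature lists are illustrative, not clinical guidance.
illness_scripts = {
    "CHF": {"dyspnea on exertion", "crackles", "orthopnea", "peripheral edema"},
    "COPD": {"dyspnea on exertion", "wheezing", "smoking history"},
}

patient_findings = {
    "dyspnea on exertion", "crackles", "orthopnea",
    "peripheral edema", "BMI 39",  # the BMI never enters any score
}

def best_match(findings, scripts):
    # Rank hypotheses by the fraction of script features present in the patient;
    # findings absent from every script are simply ignored.
    scores = {dx: len(findings & feats) / len(feats)
              for dx, feats in scripts.items()}
    return max(scores, key=scores.get), scores

diagnosis, scores = best_match(patient_findings, illness_scripts)  # -> "CHF"
```

If the correct explanation (here, perhaps complications of obesity) was never among the hypotheses, no amount of matching will surface it, which is exactly the failure mode described above.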
In contrast to hypothetico-deductive reasoning, we can teach students to do inductive reasoning (Figure 1). One collects significant amounts of data from the history, physical exam, and preliminary laboratory evaluations before constructing an “answer.” The doctor approaches the data as clues to understanding the biological mechanisms of disease, creates intermediate hypotheses (e.g., the low blood pressure is due to a low systemic vascular resistance, i.e., a mechanism, not a disease state), and then tests those hypotheses before arriving at a diagnosis. By focusing on mechanisms of disease, this process may provide a broader differential diagnosis and be less susceptible to cognitive biases.
Fig. 1.
Inductive versus Deductive Reasoning. (Panel A) The classical hypothetico-deductive approach to clinical reasoning. With a small number of facts (F1), one creates a differential diagnosis (H1–H4) and then deduces what one knows about each of these diagnoses (F2–F10). The patient is then matched with these facts to arrive at the most likely diagnosis. (Panel B) The inductive approach to clinical reasoning. One collects many facts (F1–F8) and creates intermediate mechanistic hypotheses (e.g., the low blood pressure is due to low systemic vascular resistance) (H1–H3); one then tests those hypotheses to arrive at a final diagnosis.
Modified from Reference 9.
A cognitive bias is a predisposition to respond in a particular way. Dozens of cognitive biases have been described in the medical literature (10). Three of the most common are availability bias (overestimating the probability of a diagnosis being present, based on how easy it is to recall it), confirmation bias (attentiveness to things that confirm your thinking and ignoring or discounting elements of your case that are counter to your thinking), and anchoring bias (an unwillingness to consider alternative hypotheses when you believe you have the answer to the problem). A recent study conducted in the Veterans Affairs medical system showed that merely placing “history of heart failure” in the triage sheet for patients coming to the emergency department reduced the chances that the patient would be evaluated for a pulmonary embolism (PE) and led to an increased error rate in diagnosing PE (11). By its nature, an individual using illness scripts and hypothetico-deductive reasoning is likely to be more susceptible to cognitive biases than one who is reasoning via inductive processes. Teaching medical students about the cognitive processes underlying thinking has been shown to improve clinical reasoning (12).
Expertise
Expertise is commonly defined as knowledge plus experience (13). An expert physician is often conceptualized as one with vast experience and an ability to remember many of the cases seen over years of practice. Confronted with a new challenging case, the expert may reflect on the similarity of this case to another one they have seen and quickly come up with the diagnosis. This type of expertise has been characterized as “routine” expertise (13). It is fast and often accurate but susceptible to bias; if the expert sees a case they have never encountered before, they force it to fit into something that seems similar. Routine expertise stands in contrast to “adaptive” expertise (Table 1). The adaptive expert views a new case they have never encountered as a point of departure for thinking mechanistically about how the constellation of findings might come about; they work to “create” a new solution to the problem (13).
TABLE 1.
Routine vs. Adaptive Expertise
See Reference 13.
Artificial Intelligence and Clinical and Educational Applications
The broad definition of artificial intelligence is a computer system able to perform tasks that normally require human intelligence. ChatGPT, the AI program now most commonly considered for use in medicine, is a generative pretrained transformer (hence, GPT). It is a large language model, meaning it has been trained to recognize and generate written language. Trained on vast amounts of text drawn largely from the internet, it responds to a prompt by generating the sequences of words most likely to follow, based on the patterns in its training data. In many ways, AI is pattern recognition taken to an entirely new level (2). If there is “thinking” going on, it is system 1 thinking.
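The idea of generating text from observed word patterns can be illustrated with a toy bigram model, vastly simpler than an actual GPT (which uses a neural network trained on billions of documents), but it makes the point: each next word is drawn from the words seen to follow the current one in the training text. The model completes patterns; it does not reason.

```python
import random

# Toy bigram model (NOT the actual GPT architecture): "train" on a tiny
# corpus, then generate by repeatedly picking a word observed to follow
# the current word. Pattern completion, not reasoning.
corpus = (
    "shortness of breath on exertion suggests heart failure "
    "shortness of breath at rest suggests severe disease"
).split()

# Build a table: word -> list of words observed to follow it
follows = {}
for current, nxt in zip(corpus, corpus[1:]):
    follows.setdefault(current, []).append(nxt)

def generate(start, n_words, seed=0):
    """Emit up to n_words continuations of `start` from the bigram table."""
    random.seed(seed)
    out = [start]
    for _ in range(n_words):
        options = follows.get(out[-1])
        if not options:
            break  # no observed continuation: the "model" is stuck
        out.append(random.choice(options))
    return " ".join(out)

print(generate("shortness", 2))  # -> "shortness of breath"
```

Given a prompt whose pattern appears nowhere in its training data, this toy model simply stops; a large language model instead produces the nearest pattern it does know, which is one way to think about the snorkel example below.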
Among the first demonstrations of AI was the use of computers to play chess; IBM’s Deep Blue defeated the reigning world chess champion in 1997. Chess has been defined as a “kind” environment (14). The rules are well established, and one gets rapid feedback on the correctness of a particular move. The computer has been “taught” algorithms for moves based on training on the patterns of chess; this is classic machine learning. It recognizes the patterns, even better than the master chess player, and makes its moves accordingly.
In contrast, medicine is a “wicked” environment (14). There are many exceptions to the rules, and the consequences of a particular intervention (i.e., feedback on the correctness of action) can be delayed by hours, days, or weeks. Multiple diseases or pathophysiological derangements may be present simultaneously; how do they interact? Is it possible for one intervention to improve one aspect of the problem but make another aspect worse? It is much less clear that AI can navigate these types of problems.
Application of AI to Medical Problems
A healthy person goes swimming with a snorkel for the first time. He would like to swim below the surface of the water and decides to use a three-foot-long snorkel tube. As he is swimming, he notices that he is becoming very short of breath, more than he experiences normally with swimming. He comes to the surface, removes the mouthpiece, and feels fine. What is the problem? How can we explain this new onset of symptoms?
When this question was posed to ChatGPT 3.5, the response suggested that there was a defect in the snorkel or that the individual was out of shape (Figure 2). This question, which is posed to first-year medical students in a respiratory physiology/pathophysiology course, requires the student to think inductively and to consider the resistance and dead space of the tube; the former creates a need to generate greater swings in intrathoracic pressure (more muscle work), and the latter leads to the conclusion that minute ventilation must increase because of the larger proportion of dead space ventilation in each breath. ChatGPT knows about dead space and resistance, but it cannot analyze a question that requires the application of basic physiological principles. This question may not appear anywhere on the internet; if there is no pattern for AI to find or recognize, it cannot “reason” its way to the answer.
Fig. 2. Question: Why is a person swimming with a three-foot-long snorkel short of breath?
ChatGPT 3.5:
If a person swimming with a three-foot-long snorkel is experiencing shortness of breath, it could be due to several reasons:
Improper Breathing Technique: Using a snorkel requires proper breathing technique. If the individual is not breathing efficiently or is holding their breath underwater for extended periods, they may experience shortness of breath.
Inadequate Air Supply: Despite using a snorkel, there could still be limitations on airflow, especially if the snorkel’s design or the swimmer’s lung capacity doesn’t provide enough oxygen intake.
Physical Exertion: Swimming itself can be physically demanding, and if the swimmer is exerting a lot of energy, they may experience shortness of breath regardless of using a snorkel.
Equipment Issues: If the snorkel is not functioning correctly, such as water leaking in or improper fit causing discomfort, it could lead to breathing difficulties.
Health Conditions: Underlying health conditions such as asthma or respiratory issues can exacerbate breathing difficulties during physical activities like swimming, even with the aid of a snorkel.
Anxiety or Panic: Some individuals may experience anxiety or panic when swimming, leading to shallow breathing or feeling short of breath.
It’s essential for the individual to assess their technique, equipment, physical condition, and environment to determine the cause of their shortness of breath and take appropriate measures to address it. If shortness of breath persists or worsens, it’s advisable to seek medical attention.
Correct answer: The long snorkel has a smaller radius than the airway, increasing the resistance of the system. In addition, the snorkel acts like an extension of the trachea, thereby increasing the dead space of the system. Thus, for any given size breath, less air reaches the alveoli; this reduces alveolar ventilation, leading to acute hypercapnia unless total ventilation increases to compensate for the added dead space.
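The dead-space portion of this answer can be made quantitative. The sketch below uses standard textbook approximations (tidal volume 500 mL, anatomic dead space 150 mL, 12 breaths per minute) plus an assumed 1 cm internal tube radius; none of these values comes from the case itself.

```python
import math

# Illustrative textbook values (assumptions, not data from the case)
TIDAL_VOLUME_ML = 500.0         # typical resting tidal volume
ANATOMIC_DEAD_SPACE_ML = 150.0  # typical anatomic dead space
BREATHS_PER_MIN = 12

# Snorkel: 3 ft long; a 1 cm internal radius is an assumption
snorkel_length_cm = 3 * 30.48
snorkel_radius_cm = 1.0
# Volume of the tube = pi * r^2 * L, added to the dead space (~287 mL)
snorkel_dead_space_ml = math.pi * snorkel_radius_cm**2 * snorkel_length_cm

def alveolar_ventilation_ml_min(tidal_ml, dead_space_ml, rate):
    """Alveolar ventilation = rate x (tidal volume - dead space)."""
    return rate * (tidal_ml - dead_space_ml)

baseline = alveolar_ventilation_ml_min(
    TIDAL_VOLUME_ML, ANATOMIC_DEAD_SPACE_ML, BREATHS_PER_MIN)   # 4200 mL/min
with_snorkel = alveolar_ventilation_ml_min(
    TIDAL_VOLUME_ML, ANATOMIC_DEAD_SPACE_ML + snorkel_dead_space_ml,
    BREATHS_PER_MIN)                                            # ~750 mL/min
```

At an unchanged tidal volume and respiratory rate, alveolar ventilation falls by more than 80%; the swimmer must raise total ventilation substantially just to keep alveolar ventilation constant, and must do so against the tube’s added resistance, which by Poiseuille’s law grows linearly with tube length and with the inverse fourth power of its radius.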
Similarly, the clinical case described above in which a middle-aged, obese patient with risk factors for coronary artery disease experiences dyspnea with exertion requires one to think more broadly about the physiology of the symptom dyspnea and the physical exam finding of crackles or rales. When the case is provided to ChatGPT, the classic pattern for CHF is recognized (Figure 3). The alternative explanation, based on an understanding of how the mechanisms of the signs and symptoms of the case may result from obesity, is not discussed. A diagnostic error results. To be sure, there are times when AI may broaden one’s differential diagnosis (3), but one must remain vigilant about its limits and emphasize analytical, inductive reasoning in the education of students and trainees.
Fig. 3. A 45-year-old man with a history of hypertension, diabetes mellitus, and a body mass index (BMI) of 39 has progressive shortness of breath on exertion and complains of more difficulty breathing lying flat. There are rales on chest exam and mild leg edema. What is the cause of the dyspnea?
ChatGPT:
Given the symptoms described, the most likely cause of the dyspnea (shortness of breath) in this 45-year-old man with a history of hypertension, diabetes mellitus, and obesity (BMI of 39) is congestive heart failure (CHF).
Here’s how each symptom corresponds to CHF:
Progressive Shortness of Breath on Exertion: Shortness of breath on exertion is a common symptom of CHF. As the heart’s ability to pump blood effectively diminishes, the body may not receive enough oxygenated blood during physical activity, leading to dyspnea.
More Difficulty Breathing Lying Flat (Orthopnea): Orthopnea, or difficulty breathing while lying flat, is a classic symptom of CHF. When lying flat, fluid accumulates in the lungs more readily due to decreased venous return, exacerbating dyspnea.
Rales on Chest Exam: Rales, also known as crackles, are abnormal lung sounds that indicate fluid accumulation in the alveoli or small airways. In CHF, fluid backs up into the lungs due to the heart’s inability to pump effectively, leading to pulmonary congestion and the development of crackles on auscultation.
Mild Leg Edema: Peripheral edema, particularly in the legs, is another common sign of CHF. It occurs due to fluid retention and increased venous pressure, often resulting from impaired cardiac function.
Given the patient’s medical history of hypertension, diabetes mellitus, and obesity, all of which are risk factors for cardiovascular disease, CHF becomes a highly probable diagnosis. It’s essential for the patient to seek medical evaluation promptly to confirm the diagnosis and initiate appropriate management, which may include medication, lifestyle modifications, and dietary changes.
Correct answer: The patient has a large abdomen which impairs movement of the diaphragm in the supine position, leading to orthopnea. The large abdomen and heavy chest wall increase pleural pressure at the bases of the lungs, which leads to atelectasis and rales/crackles on exam. The mild leg edema is likely due to venous stasis, common in obese patients. The patient also has risk factors for coronary disease and CHF, which must be considered, but complications of obesity in this relatively young man may be more likely.
Assessing the probability of one disease or another is one of the key challenges facing the clinician. Determining pre- and post-test probabilities is essential for effective diagnostic evaluation. Preliminary evidence suggests that AI performs better after a negative test result than after a positive one (15). The reasons for this discrepancy are not yet known.
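For readers who want the arithmetic, pre- and post-test probabilities are related through Bayes’ theorem in odds form: post-test odds equal pretest odds multiplied by the likelihood ratio of the test result. The pretest probability and likelihood ratios below are illustrative values, not figures drawn from reference 15.

```python
# Bayes' theorem in odds form: posttest odds = pretest odds x likelihood ratio.
# Illustrative numbers only, not from any cited study.

def post_test_probability(pretest_p, likelihood_ratio):
    """Convert a pretest probability to a posttest probability via an LR."""
    pretest_odds = pretest_p / (1 - pretest_p)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)

# A hypothetical test with LR+ = 10 and LR- = 0.1, pretest probability 30%:
after_positive = post_test_probability(0.30, 10)   # ~0.81
after_negative = post_test_probability(0.30, 0.1)  # ~0.04
```

The calculation is mechanical; the hard clinical work, and the step at which both humans and AI can err, lies in estimating the pretest probability and choosing likelihood ratios appropriate to the patient at hand.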
Risk of Bias and Hallucinations
Because its training data are drawn from the internet, AI is subject to the internet’s limitations. Many poor medical studies make it to publication, and even well-designed investigations, particularly those performed more than 10 years ago, may be subject to gender, ethnic, and racial bias (16,17). The challenge is that this is not transparent to the user of AI. Even if one asks AI for references to substantiate its recommendations or findings, one has to be careful about “hallucinations” (18), a term used to describe situations in which AI creates false or nonexistent data.
Other Potential Uses of AI in Medical Education
AI or machine learning, given its ability to detect patterns that may not be recognizable to the human eye, may prove particularly useful for the interpretation of radiographic and electrocardiographic images (19). Whether this can be used educationally to enhance the abilities of physicians in training or, alternatively, will discourage students and residents from developing their own abilities in these areas is uncertain.
Similarly, there is excitement about the ability of AI to produce documentation of a doctor-patient clinical interaction (20). This may prove to be a huge time saver for practicing clinicians, but will it also limit the ability of the student to grow and develop their analytical skills if AI does the work of sorting through the data for them? Can AI be used to help critique a student’s “assessment and plan”? These are areas of ongoing work.
For medical educators, AI holds the promise of reducing their workload, particularly in the creation of cases for study by students. Similarly, work is underway to determine if AI can be trained on selective materials provided for a curriculum in order to serve as a “grader” for open-ended essay questions, which provide insights into a student’s thinking in ways that multiple-choice questions cannot. Issues of human bias in grading would also potentially be mitigated by “training” the AI on a highly curated dataset of information.
Another challenge for busy clinicians is answering questions from patients; the goal is to provide information that is understandable and delivered in an empathic manner. In a study comparing physician and chatbot responses to 195 exchanges between doctors and patients, the AI responses were significantly more likely to be rated “good” or “very good” (21). The AI responses were also judged to be significantly more empathic than the physician responses (21). As with the creation of documentation, can AI be used to educate young physicians, or will it merely replace this aspect of what we historically consider an important competency for a doctor?
CONCLUSION
The master clinician has always been viewed as the great diagnostician—the individual who could look at a complex case with multiple symptoms and physical findings and weave them via mechanisms of human biology to construct a picture that explains all that is demonstrated by the patient. That is what characterizes clinical reasoning. Sherlock Holmes, the embodiment of these abilities in a detective, was based upon a great medical professor who taught Arthur Conan Doyle, the author of these famous stories. Can AI replace Holmes? Clearly, AI will improve in the future. For now, it has limitations when asked to do the complex tasks we associate with challenges in medicine, and it remains subject to potentially significant biases residing within the vast reaches of the internet.
Noam Chomsky, one of history’s great linguists, has noted that the human mind “seeks not to infer brute correlations among data points but to create explanations” (22). As we ask, “Is AI really intelligent?” we may reflect on his view of intelligence: AI’s “deepest flaw is the absence of the most critical capacity of any intelligence: to say not only what is the case, what was the case, and what will be the case—that’s description and prediction—but also what is not the case and what could not be the case. Those are the ingredients of explanation, the mark of true intelligence” (22). AI will clearly affect medicine in the years to come and will change the ways in which doctors work. It will also make the ability to reason, to think, to analyze problems, and to know how best to apply principles of human biology at the bedside more important.
DISCUSSION
Mallal, Nashville: That was wonderful. You mentioned Thinking, Fast and Slow by Daniel Kahneman. He considered the fast lower brain versus the slow upper brain. In contrast, Ian McGilchrist emphasizes how the right and left hemispheres see the world differently. He would argue that our right hemisphere sees the big picture and asks the why questions, while the left focuses on the tasks. Does the art of medicine, asking the right question, come from our right hemisphere? If so, could the problem with today’s AI be that it lacks a right hemisphere?
Schwartzstein, Boston: Is that where what comes from?
Mallal, Nashville: Is that where the art of medicine comes from?
Schwartzstein, Boston: Oh, the art of medicine? The right hemisphere?
Mallal, Nashville: Am I asking the right question?
Schwartzstein, Boston: Yes, well I’m probably not enough of a neuroscientist to know that answer; I haven’t seen any data or discussions of the neural location for the cognitive processes that might equate with the art of medicine, but it’s an interesting thought.
Maddox, St. Louis: Thanks for the talk. I’ve noted that the term artificial intelligence is starting to shift to augmented intelligence to recognize that it can serve, rather than replace, clinicians. I also agree with you that AI’s power comes from pattern recognition and its ability to consistently process those patterns regardless of time of day. I wonder if you are also seeing a role for AI to augment humans’ abilities, and, if so, what is the best use of that?
Schwartzstein, Boston: I was being provocative in the presentation summary by minimizing the contributions that AI may make in the future; we have people at our hospital who are writing a lot about where AI is headed right now, and we have been applying AI to our M&M conferences to see how it does; certainly it can pull out things that we may not think about. The malpractice literature, however, shows that most diagnostic errors leading to malpractice cases involve not unusual, esoteric diseases but common things we just didn’t think about in the moment. What I worry about is that people will become more and more reliant on AI. If they don’t learn the other strategies, such as using underlying pathophysiology to solve the problems that constitute so much of clinical medicine, we will be in trouble. If they don’t learn these skills in medical school and residency, they’re not going to learn them later, so their ability to determine when AI is actually wrong may be limited.
So, yes, there’s going to be a role for AI, but we must figure out the right way to use it and still teach these things that are important to know if we’re going to be able to do the hard stuff that AI can’t do.
Maddox, St. Louis: Thank you.
Footnotes
Potential Conflicts of Interest: None Disclosed.
REFERENCES
- 1. Matheny M, Israni ST, Ahmed M, Whicher D, editors. Artificial Intelligence in Health Care: The Hope, the Hype, the Promise, the Peril. Washington, DC: National Academy of Medicine; 2022.
- 2. Auerbach DB. Meganets: How Digital Forces Beyond Our Control Commandeer Our Daily Lives and Inner Realities. New York: Hachette Book Group; 2023.
- 3. Kanjee Z, Crowe B, Rodman A. Accuracy of a generative artificial intelligence model in a complex diagnostic challenge. JAMA. 2023;330:78–80. doi: 10.1001/jama.2023.8288.
- 4. Kung TH, Cheatham M, Medenilla A, et al. Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLOS Digit Health. 2023;2:e0000198. doi: 10.1371/journal.pdig.0000198.
- 5. Kahneman D. Thinking, Fast and Slow. New York, NY: Farrar, Straus and Giroux; 2011.
- 6. Melo M, Gusso GDF, Levites M, et al. How doctors diagnose diseases and prescribe treatments: an fMRI study of diagnostic salience. Sci Rep. 2017;7:1304. doi: 10.1038/s41598-017-01482-0.
- 7. Bowen JL. Educational strategies to promote clinical diagnostic reasoning. N Engl J Med. 2006;355:2217–25. doi: 10.1056/NEJMra054782.
- 8. Charlin B, Tardif J, Boshuizen HPA. Scripts and medical diagnostic knowledge: theory and applications for clinical reasoning instruction and research. Acad Med. 2000;75:182–90. doi: 10.1097/00001888-200002000-00020.
- 9. Pottier P, Hardouin JB, Hodges BD, et al. Exploring how students think: a new method combining think-aloud and concept mapping protocols. Med Educ. 2010;44:926–35. doi: 10.1111/j.1365-2923.2010.03748.x.
- 10. Croskerry P. The importance of cognitive errors in diagnosis and strategies to minimize them. Acad Med. 2003;78:775–80. doi: 10.1097/00001888-200308000-00003.
- 11. Ly DP, Shekelle PG, Song Z. Evidence for anchoring bias during physician decision-making. JAMA Intern Med. 2023;183:818–23. doi: 10.1001/jamainternmed.2023.2366.
- 12. Bonifacino E, Follansbee WP, Farkas AH, et al. Implementation of a clinical reasoning curriculum for clerkship-level medical students: a pseudo-randomized and controlled study. Diagnosis (Berl). 2019;6:165–72. doi: 10.1515/dx-2018-0063.
- 13. Mylopoulos M, Regehr G. Cognitive metaphors of expertise and knowledge: prospects and limitations for medical education. Med Educ. 2007;41:1159–65. doi: 10.1111/j.1365-2923.2007.02912.x.
- 14. Epstein D. Range: Why Generalists Triumph in a Specialized World. New York, NY: Riverhead Books; 2019.
- 15. Rodman A, Buckley TA, Manrai AK, Morgan DJ. Artificial intelligence vs clinician performance in estimating probabilities of diagnoses before and after testing. JAMA Netw Open. 2023;6(12):e2347075. doi: 10.1001/jamanetworkopen.2023.47075.
- 16. Masters K. Ethical use of artificial intelligence in health professions education: AMEE Guide No. 158. Med Teach. 2023;45:574–84. doi: 10.1080/0142159X.2023.2186203.
- 17. Omiye JA, Gui H, Rezaei SJ, et al. Large language models in medicine: the potentials and pitfalls. Ann Intern Med. 2024;177:210–20. doi: 10.7326/M23-2772.
- 18. Alam F, Lim ME, Zulkipli IN. Integrating AI in medical education: embracing ethical usage and critical understanding. Front Med (Lausanne). 2023;10:1279707. doi: 10.3389/fmed.2023.1279707.
- 19. Haug CJ, Drazen JM. Artificial intelligence and machine learning in clinical medicine, 2023. N Engl J Med. 2023;388:1201–7. doi: 10.1056/NEJMra2302038.
- 20. Ravi A, Neinstein A, Murray SG. Large language models and medical education: preparing for a rapid transformation in how trainees will learn to be doctors. ATS Scholar. 2023;4:282–92. doi: 10.34197/ats-scholar.2023-0036PS.
- 21. Ayers JW, Poliak A, Dredze M, et al. Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern Med. 2023;183:589–96. doi: 10.1001/jamainternmed.2023.1838.
- 22. Chomsky N, Roberts I, Watumull J. The false promise of ChatGPT. New York Times. March 8, 2023. Available at: https://www.nytimes.com/2023/03/08/opinion/noam-chomsky-chatgpt-ai.html

