BMJ Medicine

Editorial

2025 May 11;4(1):e001614. doi: 10.1136/bmjmed-2025-001614

Understanding research on artificial intelligence in healthcare

Chris Paton
PMCID: PMC12090526  PMID: 40395650

Clinicians are increasingly using artificial intelligence (AI) to help manage their work. As AI tools become more tightly integrated into clinical decision making and patient care, it is important that clinicians are able to appraise the evidence for their effectiveness and ensure that patients consent to their use.

With many hoping that new AI systems will rescue healthcare from its many challenges, frontline clinicians would do well to take a step back and consider whether AI can genuinely help their own clinical practice.

In their BMJ Medicine paper, Dijkstra and colleagues provide an overview of how to read medical research papers that discuss the use of AI by clinicians (doi:10.1136/bmjmed-2025-001394).1 This topic is timely because the hype cycle around AI is peaking and clinicians are inundated with news stories about how they could use systems such as ChatGPT to diagnose patients' conditions,2 summarise their consultations,3 and generally make their lives easier. In this paper, which accompanies the "How to Read a Paper" articles in The BMJ,4 the authors draw attention to the need to carefully appraise reports about the performance of AI systems, as we have learnt to do with other types of technology used in healthcare. The paper highlights some of the additional challenges involved in appraising papers that discuss the use of AI in healthcare, such as the need to carefully examine how the researchers handled the training and evaluation datasets. Data contamination is particularly problematic with large language models (LLMs) such as ChatGPT, where data used to evaluate the model are often present in the training dataset.5

Since the public release of ChatGPT, a new type of AI has been much discussed in the media but currently does not exist: artificial general intelligence (AGI). This refers to an AI system so advanced that it has the same or greater ability than the human brain and can solve so-called general problems, from driving a car to predicting the stock market. The ChatGPT chatbot was created by OpenAI as an experiment aimed at bringing AGI to fruition, although the underlying LLM technology now seems unlikely to achieve AGI.6 When it comes to LLMs and clinical decision making, if an LLM can answer United States Medical Licensing Examination (USMLE) questions accurately, it is not necessarily a sign that the model is thinking like a doctor or has any ability to do clinical reasoning. Instead, the researchers might simply have used questions and answers (such as online USMLE question banks) to test the model that were already encoded (memorised) during training. When the model is used in the real world on new cases not present in the training data, it can then fail and might "hallucinate" incorrect information.
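To make the contamination problem concrete, the sketch below shows, in Python, one simple way a reader or researcher might probe for overlap between an evaluation question and a model's training text by counting shared word n-grams. The text, threshold, and function names are illustrative assumptions rather than a method described by Dijkstra and colleagues or used by any particular benchmark.

```python
# Minimal sketch of a data contamination check: flag evaluation questions whose
# text overlaps heavily with a training corpus. The 8-word n-gram size and the
# 85% threshold are arbitrary illustrative choices, not a published protocol.

def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """Return the set of word n-grams in a lowercased text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contamination_score(question: str, corpus_grams: set[tuple[str, ...]], n: int = 8) -> float:
    """Fraction of the question's n-grams already present in the training corpus."""
    q_grams = ngrams(question, n)
    if not q_grams:
        return 0.0
    return len(q_grams & corpus_grams) / len(q_grams)

if __name__ == "__main__":
    # Hypothetical training text and evaluation item, for illustration only.
    training_text = "A 54 year old man presents with crushing chest pain radiating to the left arm"
    eval_question = "A 54 year old man presents with crushing chest pain radiating to the left arm"

    corpus_grams = ngrams(training_text)
    score = contamination_score(eval_question, corpus_grams)
    if score > 0.85:  # arbitrary illustrative threshold
        print(f"Possible contamination: {score:.0%} of evaluation n-grams appear in the training data")
    else:
        print(f"Low overlap: {score:.0%}")
```

A high overlap does not prove memorisation, but it is the kind of simple check that readers can look for when a paper reports striking examination-style performance for an LLM.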

Putting aside the hype around AGI, it is important to acknowledge the widespread use and usefulness of what Dijkstra and colleagues describe as artificial narrow intelligence. These systems are now being used to assist with medical diagnosis, support image guided procedures, and make the processes and practices of healthcare safer and more efficient.7 The US Food and Drug Administration has approved more than 1000 such AI algorithms, and medical device manufacturers now routinely include AI in their products and associated software applications. Regulators have converged on using the medical device rules to assess and regulate high risk software and algorithms: so-called Software as a Medical Device (SaMD) or AI as a Medical Device (AIaMD). This approach has helped to distinguish between relatively low risk systems used mainly for administrative purposes that do not need to be classified as medical devices and higher risk systems that require closer oversight.

Clinicians have shown themselves to be early adopters of AI systems, particularly those that reduce administrative burdens, such as ambient AI scribes.8 By learning how to appraise research that includes AI systems in healthcare, clinicians can be more confident about which types of system are genuinely safe and effective and which carry greater risk. Research from the Health Foundation has shown that many patients are concerned about the use of AI in clinical care.9 They are worried about how their private medical data might be used and whether they should believe the claims of controversial AI companies that are often portrayed negatively in the news and on social media. As clinicians adopt well tested and useful AI systems, they should keep patients' concerns about AI in mind and remember that patients want to maintain a human connection as they navigate the difficulties that come with illness.

Footnotes

Provenance and peer review: Commissioned; not externally peer reviewed.

References

1. Dijkstra P, Greenhalgh T, Mekki YM, et al. How to read a paper involving artificial intelligence. BMJ Medicine 2025;4:e001394. doi: 10.1136/bmjmed-2025-001394
2. Goh E, Gallo RJ, Strong E, et al. GPT-4 assistance for improvement of physician performance on patient care tasks: a randomized controlled trial. Nat Med 2025;31:1233–8. doi: 10.1038/s41591-024-03456-y
3. Van Veen D, Van Uden C, Blankemeier L, et al. Adapted large language models can outperform medical experts in clinical text summarization. Nat Med 2024;30:1134–42. doi: 10.1038/s41591-024-02855-5
4. The BMJ. How to read a paper. n.d. Available: https://www.bmj.com/about-bmj/resources-readers/publications/how-read-paper
5. Rydzewski NR, Dinakaran D, Zhao SG, et al. Comparative evaluation of LLMs in clinical oncology. NEJM AI 2024;1. doi: 10.1056/aioa2300151
6. Sutskever I. Sequence to sequence learning with neural networks: what a decade. NeurIPS; 2024.
7. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med 2019;25:44–56. doi: 10.1038/s41591-018-0300-7
8. Paton C. Welcome to BMJ Digital Health & AI. BMJ Digital Health & AI 2025;1. doi: 10.1136/bmjdhai-2024-000004
9. The Health Foundation. AI in health care: what do the public and NHS staff think? 2024.
