
This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2026 Feb 13:2026.01.26.26344845. Originally published 2026 Jan 27. [Version 2] doi: 10.64898/2026.01.26.26344845

Understanding Clinical Reasoning Variability in Medical Large Language Models: A Mechanistic Interpretability Study

Mirage Modi, Jordan E Krull, Donte Johnson, Xiaoying Wang, Timothy D Gauntner, Mingjia Li, Hao Cheng, Anjun Ma, Ping Zhang, Daniel G Stover, Zihai Li, Qin Ma
PMCID: PMC12870575  PMID: 41646812

Abstract

Medical large language models (LLMs) that achieve high benchmark accuracy still exhibit unexplained variability on clinical tasks, producing errors that clinicians cannot safeguard against. We evaluated clinical reasoning stability in GPT-5, MedGemma-27B-Text-IT, and OpenBioLLM-Llama3-70B using 355 systematic perturbations of physician-validated oncology cases, and we trained sparse autoencoders on 1 billion tokens from 50,000 MIMIC-IV clinical notes to decompose the models' internal representations. The models exhibited dramatic reasoning instability: staging accuracy shifted by over 50% based solely on prompt format, and models generated definitive staging in scenarios with clinically insufficient information. Sparse autoencoder analysis revealed hierarchical encoding in MedGemma, where high-magnitude features encode lexical identity and low-magnitude features encode contextual meaning, whereas OpenBioLLM distributes information uniformly. We demonstrate that these internal encoding structures differentially affect retrieval interventions, suggesting that interventions effective for one architecture may harm another. We recommend that healthcare institutions implement architecture-specific safety validation, because benchmark equivalence does not imply functional equivalence, a finding with implications for AI safety beyond healthcare.
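
For readers unfamiliar with the technique, the following is a minimal sketch of a generic sparse autoencoder applied to model activations. It is not the authors' implementation: the abstract does not specify the architecture, dictionary size, sparsity penalty, or training procedure, so the ReLU encoder, the L1 penalty, the function names (`init_sae`, `sae_forward`, `sae_loss`), and all dimensions below are illustrative assumptions. Random vectors stand in for real activations.

```python
import numpy as np

def init_sae(d_model, d_hidden, seed=0):
    """Randomly initialize a one-hidden-layer sparse autoencoder (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    return {
        "W_enc": rng.normal(0.0, 0.1, size=(d_model, d_hidden)),
        "b_enc": np.zeros(d_hidden),
        "W_dec": rng.normal(0.0, 0.1, size=(d_hidden, d_model)),
        "b_dec": np.zeros(d_model),
    }

def sae_forward(params, x):
    """Encode activations into non-negative features, then reconstruct them."""
    pre = (x - params["b_dec"]) @ params["W_enc"] + params["b_enc"]
    features = np.maximum(0.0, pre)  # ReLU keeps features non-negative
    recon = features @ params["W_dec"] + params["b_dec"]
    return features, recon

def sae_loss(params, x, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that encourages sparse features."""
    features, recon = sae_forward(params, x)
    mse = np.mean(np.sum((x - recon) ** 2, axis=-1))
    l1 = np.mean(np.sum(np.abs(features), axis=-1))
    return mse + l1_coeff * l1

# Stand-in data: 8 activation vectors of width 16 (assumed, not from the paper).
rng = np.random.default_rng(1)
activations = rng.normal(size=(8, 16))

params = init_sae(d_model=16, d_hidden=64)
features, recon = sae_forward(params, activations)
loss = sae_loss(params, activations)

# Per-feature mean activation magnitude, the kind of quantity one could
# inspect when comparing high-magnitude and low-magnitude features.
feature_magnitude = features.mean(axis=0)
```

In a setup like this, the learned feature dictionary is typically wider than the activation vector, and the sparsity penalty pushes each input to be explained by a small number of active features. Training (gradient updates on `sae_loss`) is omitted here.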

Full Text Availability

The license terms selected by the author(s) for this preprint version do not permit archiving in PMC. The full text is available from the preprint server.


Articles from medRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
