Abstract
Medical large language models (LLMs) that achieve high benchmark accuracy exhibit unexplained variability on clinical tasks, producing errors that clinicians cannot safeguard against. We evaluated clinical reasoning stability in GPT-5, MedGemma-27B-Text-IT, and OpenBioLLM-Llama3-70B using 355 systematic perturbations of physician-validated oncology cases, and trained sparse autoencoders on 1 billion tokens from 50,000 MIMIC-IV clinical notes to decompose the models' internal representations. We find that the models exhibit dramatic reasoning instability: staging accuracy shifts by over 50% based solely on prompt format, and the models generate definitive staging in clinically insufficient scenarios. Sparse autoencoder analysis reveals hierarchical encoding in MedGemma, where high-magnitude features encode lexical identity and low-magnitude features encode contextual meaning, whereas OpenBioLLM distributes information uniformly across features. We demonstrate that these internal encoding structures differentially affect retrieval interventions, suggesting that an intervention effective for one architecture may harm another. We recommend that healthcare institutions implement architecture-specific safety validation, since benchmark equivalence does not imply functional equivalence; these findings carry implications for AI safety beyond healthcare.
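For readers unfamiliar with the technique named in the abstract, the sketch below shows the general form of a sparse autoencoder used to decompose a model's hidden activations into sparse features: a single linear encoder with a ReLU nonlinearity, a linear decoder, and a reconstruction objective with an L1 sparsity penalty. The dimensions, penalty weight, and source of the activations are illustrative assumptions, not the authors' configuration.

```python
# Minimal sparse-autoencoder sketch (assumed setup, not the paper's exact recipe).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x: torch.Tensor):
        features = torch.relu(self.encoder(x))   # sparse feature activations
        reconstruction = self.decoder(features)  # reconstructed hidden state
        return features, reconstruction

def train_step(sae, optimizer, activations, l1_coeff=1e-3):
    """One optimization step: reconstruction error plus an L1 sparsity penalty."""
    features, reconstruction = sae(activations)
    loss = ((reconstruction - activations) ** 2).mean() + l1_coeff * features.abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage with random stand-in activations; in practice these would be
# hidden-state activations collected while the model processes clinical notes.
sae = SparseAutoencoder(d_model=4096, d_features=32768)
optimizer = torch.optim.Adam(sae.parameters(), lr=1e-4)
batch = torch.randn(64, 4096)
print(train_step(sae, optimizer, batch))
```

After training, individual feature directions can be inspected for interpretable structure, which is the kind of analysis the abstract uses to contrast MedGemma's hierarchical encoding with OpenBioLLM's uniform distribution of information.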
