. 2025 Aug 7;13:e76636. doi: 10.2196/76636

Table 2.

Summary of the implementation and evaluation of generative large language model–driven interventions in stroke care.

| Study | Task objectives | Input data or sources | Dialogue patterns | Reported time stamp | Gold-standard providers or benchmarks | Evaluation perspectives | Evaluation metrics |
|---|---|---|---|---|---|---|---|
| Clinical decision-making support (n=10) | | | | | | | |
| Pedro et al [38] | Predict the mRS^a score at 3 mo after mechanical thrombectomy | Patient H&P^b, neuroimaging, and mechanical thrombectomy procedure notes | Single turn | Yes | Stroke unit clinicians | AGS^c for true exact and dichotomized mRS scores; bias; comparison with MT-DRAGON | Cohen κ; mean difference and 95% limits of agreement; ND^d |
| Chen et al [39] | Make clinical decisions for mechanical thrombectomy | Patient H&P and neuroimaging notes | Single turn | No | Neurology specialists | AGS for mechanical thrombectomy decision; analysis of different error types | Counts and rate |
| Strotzer et al [40] | Interpret MRI^e and CT^f images and generate free-text reports in stroke cases | MRI and CT images | Single turn | Yes | Radiologists and nonradiologists in training | AGS for free-text report items; interrun consistency; AGS for binary pathological findings; impact on nonradiologists | Agreement rate; interrun consistency rate and the Randolph free-marginal κ; accuracy, sensitivity, and specificity; rate (distribution across categories) |
| Kuzan et al [41] | Interpret DWI^g and ADC^h maps in acute stroke cases | DWI and ADC maps | Multiturn | No | Radiologists | AGS for stroke and normal or all-image interpretation | Rate; TP^i, TN^j, FP^k, FN^l, sensitivity, specificity, PPV^m, NPV^n, and accuracy |
| Fei et al [42] | Evaluate cognitive performance in stroke cases | Patient responses to selected RBMT-II^o, MMSE^p, and MoCA^q items | Multiturn | No | Rehabilitation physicians | Intermodel and model-physician agreement | Intraclass correlation coefficient and P value |
| Lee et al [43] | Locate lesions based on patient H&P | Patient H&P notes | Single turn | Yes | Location descriptions from the original published case reports | AGS for trial- and case-based lesion localization; analysis of different error types | Specificity, sensitivity, precision, and F1-score; ND |
| Haim et al [44] | Calculate the NIHSS^r score and predict the use of tissue plasminogen activator | EMR^s periods | Single turn | No | Emergency department physicians | Intermodel and model-physician agreement; predictive validity | Cohen κ and P value; AUC-ROC^t |
| Chen et al [45] | Calculate GCS^u, H&H^v, and ICH^w scores | Patient neuroexamination notes without scores | Single turn | No | Scores in the original neuroexamination notes | AGS for scoring; repeatability; effect of varied case complexity and prompting design | Average error rate and average error magnitude |
| Blacker et al [46] | Use SNACC^x HQRs^y to answer questions on perioperative stroke and endovascular treatment anesthesia | Patient H&P notes | Multiturn | Yes | Anesthesiologists | HQR identification; correct reference citation; potentially harmful information | ND |
| Zhang et al [37] | Generate rehabilitation prescriptions and ICF^z codes in a stroke case | Patient H&P notes | Multiturn | No | Physical medicine and rehabilitation physicians | Content exhaustiveness and clinical applicability; inference logic | ND |
| Administrative assistance (n=9) | | | | | | | |
| Sivarajkumar et al [47] | Extract and categorize physical rehabilitation exercise information from stroke cases | EHR^aa sections with physical therapy information | Single turn | No | Physical therapy experts | AGS for extracted items | Accuracy, precision, recall, and F1-score |
| Guo et al [48] | Extract triples by fine-tuning and integrating a relation classification module | Stroke-related medical text from SEMRC^ab, CVDEMRC^ac, and CMeIE^ad | NA^ae | No | Relevant items from the datasets and performance of the Cas-CLN^af benchmark models | AGS for total and overlapping triple extraction; performance improvements over baseline models | F1-score; rate |
| Lehnen et al [49] | Extract key information for mechanical thrombectomy | Mechanical thrombectomy records | Single turn | No | Interventional neuroradiologists | AGS for extracted items; analysis of different error types; intermodel comparison of extraction performance | Correct rate and Cohen κ; count and rate; correct rate and P value |
| Fiedler et al [50] | Extract information in IPSS^ag format and infer disease severity | Outpatient notes | Multiturn | No | Clinical investigators | AGS for extracted items | Rate |
| Wang et al [51] | Extract and infer key information for mechanical thrombectomy surgery | Mechanical thrombectomy records | Single turn, plus multiturn to obtain correctly formatted responses | No | Interventional and junior neuroradiologists | AGS for extracted and inferred items; agreement with junior neuroradiologists; processing efficiency | Accuracy, sensitivity, specificity, AUC^ah, and mean squared error; P value; average case processing time |
| Goh et al [52] | Extract stroke audit data | Discharge summaries | Single turn | No | Relevant items from the original discharge summaries | AGS for extracted items; model-clinician comparison in AGS; inference error analysis | Counts and rate; ND |
| Baro et al [53] | Predict stroke hospitalization by fine-tuning and integrating classification layers | Chronological health insurance data with aggregated medical events | | No | Relevant items from the original health insurance data | AGS across time windows using the general fine-tuned models; AGS comparison between general and stroke-specific fine-tuned models | F1-score, sensitivity, specificity, and AUC |
| Meddeb et al [54] | Extract key mechanical thrombectomy information items | Mechanical thrombectomy records | Single turn | No | Radiologists and clinical medical students | AGS for extracted items; efficiency improvement with EITL^ai | Precision, recall, and F1-score; average case time savings |
| Kim et al [55] | Perform data wrangling on a large dataset of patients with stroke | Metadata from the CRCS-K^aj dataset and neurologist queries | Multiturn | No | Neurologists | Reliability and efficiency of the EITL workflow; clinical knowledge alignment | ND |
| Direct patient interaction (n=5) | | | | | | | |
| Argymbay et al [56] | Provide personalized stroke risk insights and answer medical queries based on patient data | Stroke risk values, medical literature, and patient queries | Multiturn | No | Clinicians | Stroke risk factor review, provision of personalized health recommendations, and anxiety alleviation | ND |
| Neo et al [57] | Answer rehabilitation questions from patients with stroke and their caregivers | 280 unique questions | Single turn | Yes | Clinicians | Content correctness, safety, relevance, and readability; interrater agreement; free-text comments on responses | 3-point Likert scale; Fleiss κ and Cohen κ; ND |
| Wu et al [58] | Provide nonmedical professionals with stroke-related health information | 2 questions about stroke prevention from the ASA^ak website | Single turn | No | Answers available on the ASA website | Readability compared with Google Assistant; content relevance | Word counts, GFS^al, SMOG^am index, DCS^an, FKRT^ao, and P value; keyword matching counts |
| Chen et al [59] | Interpret commands and generate Python code for hand exoskeleton control | Recognized user voice commands | Single turn | No | Rehabilitation physicians | Task executability and efficiency across models; response process in free scenarios | Success rate across trials and time; ND |
| Rifai et al [60] | Interpret commands and generate target coordinates for upper-limb robot control | Recognized user voice commands | Single turn | No | Predefined targets | Executability of paths to targets compared with joystick control; intuitive handling; success and stable control | ND; user experience questionnaire; success rate across trials and ND |
| Automated literature review (n=1) | | | | | | | |
| Anghelescu et al [36] | Assist in obtaining evidence on Actovegin’s efficacy for ischemic stroke | 6 queries on medicine, review conduct, literature exploration, and evidence synthesis | Multiturn | No | Review contributors | General and in-depth answer correctness; citation applicability; PRISMA^ap-based evidence synthesis results | ND |

^a mRS: modified Rankin Scale.

^b H&P: history and neurological physical examination.

^c AGS: agreement with the gold standard.

^d ND: narrative description.

^e MRI: magnetic resonance imaging.

^f CT: computed tomography.

^g DWI: diffusion-weighted imaging.

^h ADC: apparent diffusion coefficient.

^i TP: true positive.

^j TN: true negative.

^k FP: false positive.

^l FN: false negative.

^m PPV: positive predictive value.

^n NPV: negative predictive value.

^o RBMT-II: Rivermead Behavioral Memory Test–II.

^p MMSE: Mini-Mental State Examination.

^q MoCA: Montreal Cognitive Assessment.

^r NIHSS: National Institutes of Health Stroke Scale.

^s EMR: electronic medical record.

^t AUC-ROC: area under the receiver operating characteristic curve.

^u GCS: Glasgow Coma Scale.

^v H&H: Hunt and Hess scale.

^w ICH: intracerebral hemorrhage.

^x SNACC: Society for Neuroscience in Anesthesiology and Critical Care.

^y HQR: high-quality recommendation.

^z ICF: International Classification of Functioning, Disability, and Health.

^aa EHR: electronic health record.

^ab SEMRC: stroke EMR entity and entity-related corpus.

^ac CVDEMRC: cardiovascular EMR entity and entity relationship–labeling corpus.

^ad CMeIE: Chinese Medical Information Extraction dataset.

^ae NA: not applicable.

^af Cas-CLN: cascade binary pointer tagging network with conditional layer normalization.

^ag IPSS: International Pediatric Stroke Study.

^ah AUC: area under the curve.

^ai EITL: expert in the loop.

^aj CRCS-K: Clinical Research Collaboration for Stroke in Korea.

^ak ASA: American Stroke Association.

^al GFS: Gunning fog score.

^am SMOG: Simple Measure of Gobbledygook.

^an DCS: Dale-Chall score.

^ao FKRT: Flesch-Kincaid readability test.

^ap PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses.