2025 Aug 7;13:e76636. doi: 10.2196/76636

Table 2.

Summary of the implementation and evaluation of generative large language model–driven interventions in stroke care.

Study		Task objectives	Input data or sources	Dialogue patterns	Reported time stamp	Gold-standard providers or benchmarks	Evaluation perspectives	Evaluation metrics
Clinical decision-making support (n=10)
	Pedro et al [38]	Predict the mRS^a score at 3 mo after mechanical thrombectomy	Patient H&P^b, neuroimaging, and mechanical thrombectomy procedure notes	Single turn	Yes	Stroke unit clinicians	AGS^c for true exact and dichotomized mRS scores; bias; comparison with MT-DRAGON	Cohen κ; mean difference and 95% limits of agreement; ND^d
	Chen et al [39]	Make clinical decisions for mechanical thrombectomy	Patient H&P and neuroimaging notes	Single turn	No	Neurology specialists	AGS for mechanical thrombectomy decision; different error analysis	Counts and rate
	Strotzer et al [40]	Interpret MRI^e and CT^f images and generate free-text reports in stroke cases	MRI and CT images	Single turn	Yes	Radiologists and nonradiologist in training	AGS for free-report items; interrun consistency; AGS for binary pathological findings; impact on nonradiologist	Agreement rate; interrun consistency rate and the Randolph free-marginal κ; accuracy, sensitivity, and specificity; rate (distribution across categories)
	Kuzan et al [41]	Interpret DWI^g and ADC^h maps in acute stroke cases	DWI and ADC maps	Multiturn	No	Radiologists	AGS for stroke and normal or all-image interpretation	Rate; TP^i, TN^j, FP^k, FN^l, sensitivity, specificity, PPV^m, NPV^n, and accuracy
	Fei et al [42]	Evaluate cognitive performance in stroke cases	Patient responses to selected RBMT-II^o, MMSE^p, and MoCA^q items	Multiturn	No	Rehabilitation physicians	Intermodel and model-physician agreement	Intraclass correlation coefficient and P value
	Lee et al [43]	Locate lesions based on patient H&P	Patient H&P notes	Single turn	Yes	Location description from original published case report	AGS for trial- and case-based lesion localization; different error analysis	Specificity, sensitivity, precision, and F₁-score; ND
	Haim et al [44]	Calculate the NIHSS^r score and predict the use of tissue plasminogen activator	EMR^s periods	Single turn	No	Emergency department physicians	Intermodel and model-physician agreement; predictive validity	Cohen κ and P value; AUC-ROC^t
	Chen et al [45]	Calculate GCS^u, H&H^v, and ICH^w scores	Patient neuroexamination notes without scores	Single turn	No	Scores in original neuroexamination notes	AGS for scoring; repeatability; effect of varied case complexity and prompting design	Average error rate and average error magnitude
	Blacker et al [46]	Use of SNACC^x HQRs^y to answer questions on perioperative stroke and endovascular treatment anesthesia	Patient H&P notes	Multiturn	Yes	Anesthesiologists	HQR identification; correct reference citation; potentially harmful information	ND
	Zhang et al [37]	Generate rehabilitation prescriptions and ICF^z codes in a stroke case	Patient H&P notes	Multiturn	No	Physical medicine and rehabilitation physicians	Content exhaustiveness and clinical applicability; inference logic	ND
Administrative assistance (n=9)
	Sivarajkumar et al [47]	Extract and categorize physical rehabilitation exercise information from stroke cases	EHR^aa sections with physical therapy information	Single turn	No	Physical therapy experts	AGS for extracted items	Accuracy, precision, recall, and F₁-score
	Guo et al [48]	Extract triples by fine-tuning and integrating a relation classification module	Stroke-related medical text from SEMRC^ab, CVDEMRC^ac, and CMeIE^ad	—^ae	No	Relevant items from datasets and performance of the Cas-CLN^af benchmark models	AGS for total and overlapping triple extraction; performance improvements over baseline models	F₁-score; rate
	Lehnen et al [49]	Extract key information for mechanical thrombectomy	Mechanical thrombectomy records	Single turn	No	Interventional neuroradiologists	AGS for extracted items; different error analysis; intermodel extraction performance comparison	Correct rate and Cohen κ; count and rate; correct rate and P value
	Fiedler et al [50]	Extract IPSS^ag format information and infer disease severity	Outpatient notes	Multiturn	No	Clinical investigators	AGS for extracted items	Rate
	Wang et al [51]	Extract and infer key information for mechanical thrombectomy surgery	Mechanical thrombectomy records	Single turn and multiturn for correct format response	No	Interventional and junior neuroradiologists	AGS for extracted and inferred items; agreement with junior neuroradiologists; processing efficiency	Accuracy, sensitivity, specificity, AUC^ah, and mean squared error; P value; average case processing time
	Goh et al [52]	Extract stroke audit data	Discharge summaries	Single turn	No	Relevant items from original discharge summaries	AGS for extracted items; model-clinician comparison in AGS; inference error analysis	Counts and rate; ND
	Baro et al [53]	Predict stroke hospitalization by fine-tuning and integrating classification layers	Chronological health insurance data with aggregated medical events	—	No	Relevant items from original health insurance data	AGS across time windows using the general fine-tuned models; AGS comparison between general and stroke-specific fine-tuned models	F₁-score, sensitivity, specificity, and AUC
	Meddeb et al [54]	Extract key information for mechanical thrombectomy items	Mechanical thrombectomy records	Single turn	No	Radiologists and clinical medical students	AGS for extracted items; efficiency improvement with EITL^ai	Precision, recall, and F₁-score; average case time savings
	Kim et al [55]	Perform data wrangling on a large dataset of patients with stroke	Metadata from the CRCS-K^aj dataset and neurologist queries	Multiturn	No	Neurologists	Reliability and efficiency of EITL workflow and clinical knowledge alignment	ND
Direct patient interaction (n=5)
	Argymbay et al [56]	Provide personalized stroke risk insights and answer medical queries based on patient data	Stroke risk values, medical literature, and patient queries	Multiturn	No	Clinicians	Stroke risk factor review, personalized health recommendation provision, and anxiety alleviation	ND
	Neo et al [57]	Answer rehabilitation questions for patients with stroke and their caregivers	280 unique questions	Single turn	Yes	Clinicians	Content correctness, safety, relevance, and readability; interrater agreement; free comments for responses	3-point Likert scale; Fleiss κ and Cohen κ; ND
	Wu et al [58]	Provide nonmedical professionals with stroke-related health information	2 questions about stroke prevention from the ASA^ak website	Single turn	No	Answers available on the ASA website	Readability compared with the Google Assistant; content relevance	Word counts, GFS^al, SMOG^am index, DCS^an, FKRT^ao, and P value; keyword matching counts
	Chen et al [59]	Interpret commands and generate Python code for hand exoskeleton control	Recognized user voice commands	Single turn	No	Rehabilitation physicians	Executability and efficiency of tasks among models; response process in free scenarios	Success rate across trials and time; ND
	Rifai et al [60]	Interpret commands and generate target coordinates for upper-limb robot control	Recognized user voice commands	Single turn	No	Predefined targets	Executability of path to targets compared with joystick control; intuitive handling; success and stable control	ND; user experience questionnaire; success rate across trials and ND
Automated literature review (n=1)
	Anghelescu et al [36]	Assist in obtaining evidence on Actovegin’s efficacy for ischemic stroke	6 queries on medicine, review conduction, literature exploration, and evidence synthesis	Multiturn	No	Review contributors	General and in-depth answer correctness; citation applicability; PRISMA^ap-based evidence synthesis results	ND

^amRS: modified Rankin Scale.

^bH&P: history and neurological physical examination.

^cAGS: agreement with the gold standard.

^dND: narrative description.

^eMRI: magnetic resonance imaging.

^fCT: computed tomography.

^gDWI: diffusion-weighted imaging.

^hADC: apparent diffusion coefficient.

^iTP: true positive.

^jTN: true negative.

^kFP: false positive.

^lFN: false negative.

^mPPV: positive predictive value.

^nNPV: negative predictive value.

^oRBMT-II: Rivermead Behavioral Memory Test–II.

^pMMSE: Mini-Mental State Examination.

^qMoCA: Montreal Cognitive Assessment.

^rNIHSS: National Institutes of Health Stroke Scale.

^sEMR: electronic medical record.

^tAUC-ROC: area under the receiver operating characteristic curve.

^uGCS: Glasgow Coma Scale.

^vH&H: Hunt and Hess scale.

^wICH: intracranial hemorrhage.

^xSNACC: Society for Neuroscience in Anesthesiology and Critical Care.

^yHQR: high-quality recommendation.

^zICF: International Classification of Functioning, Disability, and Health.

^aaEHR: electronic health record.

^abSEMRC: stroke EMR entity and entity-related corpus.

^acCVDEMRC: cardiovascular EMR entity and entity relationship–labeling corpus.

^adCMeIE: Chinese Medical Information Extraction dataset.

^aeNot applicable.

^afCas-CLN: cascade binary pointer tagging network with conditional layer normalization.

^agIPSS: International Pediatric Stroke Study.

^ahAUC: area under the curve.

^aiEITL: expert in the loop.

^ajCRCS-K: Clinical Research Collaboration for Stroke in Korea.

^akASA: American Stroke Association.

^alGFS: Gunning fog score.

^amSMOG: Simple Measure of Gobbledygook.

^anDCS: Dale-Chall score.

^aoFKRT: Flesch-Kincaid readability test.

^apPRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses.
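
Several of the evaluation metrics listed in Table 2 (sensitivity, specificity, PPV, NPV, accuracy, F1-score, and Cohen κ for a binary rating) can be derived from the same four counts: TP, TN, FP, and FN. The following is a minimal sketch of those standard formulas, not the procedure of any included study. The class name, function name, and example counts are hypothetical and chosen only for illustration, and the sketch assumes every denominator is nonzero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for model-vs-gold-standard counts."""
    tp: int  # model positive, gold standard positive
    tn: int  # model negative, gold standard negative
    fp: int  # model positive, gold standard negative
    fn: int  # model negative, gold standard positive


def agreement_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Standard binary agreement metrics; assumes nonzero denominators."""
    total = c.tp + c.tn + c.fp + c.fn
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    ppv = c.tp / (c.tp + c.fp)  # also called precision
    npv = c.tn / (c.tn + c.fn)
    accuracy = (c.tp + c.tn) / total
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)

    # Cohen kappa: observed agreement corrected for the agreement
    # expected by chance from the two raters' marginal rates.
    p_both_positive = ((c.tp + c.fp) / total) * ((c.tp + c.fn) / total)
    p_both_negative = ((c.tn + c.fn) / total) * ((c.tn + c.fp) / total)
    p_expected = p_both_positive + p_both_negative
    kappa = (accuracy - p_expected) / (1 - p_expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
        "cohen_kappa": kappa,
    }


# Illustrative counts only; these are not data from any study in Table 2.
example = agreement_metrics(ConfusionCounts(tp=40, tn=45, fp=5, fn=10))
```

With these illustrative counts, sensitivity is 0.80, specificity is 0.90, accuracy is 0.85, and Cohen κ is 0.70. The κ value is lower than the accuracy because half of the observed agreement would be expected by chance alone.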