Table 2.
Summary of the implementation and evaluation of generative large language model–driven interventions in stroke care.
| Study | Task objectives | Input data or sources | Dialogue patterns | Reported time stamp | Gold-standard providers or benchmarks | Evaluation perspectives | Evaluation metrics |
| Clinical decision-making support (n=10) | | | | | | | |
| Pedro et al [38] | Predict the mRS<sup>a</sup> score at 3 mo after mechanical thrombectomy | Patient H&P<sup>b</sup>, neuroimaging, and mechanical thrombectomy procedure notes | Single turn | Yes | Stroke unit clinicians | AGS<sup>c</sup> for true exact and dichotomized mRS scores; bias; comparison with MT-DRAGON | Cohen κ; mean difference and 95% limits of agreement; ND<sup>d</sup> |
| Chen et al [39] | Make clinical decisions for mechanical thrombectomy | Patient H&P and neuroimaging notes | Single turn | No | Neurology specialists | AGS for mechanical thrombectomy decision; different error analysis | Counts and rate |
| Strotzer et al [40] | Interpret MRI<sup>e</sup> and CT<sup>f</sup> images and generate free-text reports in stroke cases | MRI and CT images | Single turn | Yes | Radiologists and nonradiologist in training | AGS for free-report items; interrun consistency; AGS for binary pathological findings; impact on nonradiologist | Agreement rate; interrun consistency rate and the Randolph free-marginal κ; accuracy, sensitivity, and specificity; rate (distribution across categories) |
| Kuzan et al [41] | Interpret DWI<sup>g</sup> and ADC<sup>h</sup> maps in acute stroke cases | DWI and ADC maps | Multiturn | No | Radiologists | AGS for stroke and normal or all-image interpretation | Rate; TP<sup>i</sup>, TN<sup>j</sup>, FP<sup>k</sup>, FN<sup>l</sup>, sensitivity, specificity, PPV<sup>m</sup>, NPV<sup>n</sup>, and accuracy |
| Fei et al [42] | Evaluate cognitive performance in stroke cases | Patient responses to selected RBMT-II<sup>o</sup>, MMSE<sup>p</sup>, and MoCA<sup>q</sup> items | Multiturn | No | Rehabilitation physicians | Intermodel and model-physician agreement | Intraclass correlation coefficient and P value |
| Lee et al [43] | Locate lesions based on patient H&P | Patient H&P notes | Single turn | Yes | Location description from original published case report | AGS for trial- and case-based lesion localization; different error analysis | Specificity, sensitivity, precision, and F1-score; ND |
| Haim et al [44] | Calculate the NIHSS<sup>r</sup> score and predict the use of tissue plasminogen activator | EMR<sup>s</sup> periods | Single turn | No | Emergency department physicians | Intermodel and model-physician agreement; predictive validity | Cohen κ and P value; AUC-ROC<sup>t</sup> |
| Chen et al [45] | Calculate GCS<sup>u</sup>, H&H<sup>v</sup>, and ICH<sup>w</sup> scores | Patient neuroexamination notes without scores | Single turn | No | Scores in original neuroexamination notes | AGS for scoring; repeatability; effect of varied case complexity and prompting design | Average error rate and average error magnitude |
| Blacker et al [46] | Use of SNACC<sup>x</sup> HQRs<sup>y</sup> to answer questions on perioperative stroke and endovascular treatment anesthesia | Patient H&P notes | Multiturn | Yes | Anesthesiologists | HQR identification; correct reference citation; potentially harmful information | ND |
| Zhang et al [37] | Generate rehabilitation prescriptions and ICF<sup>z</sup> codes in a stroke case | Patient H&P notes | Multiturn | No | Physical medicine and rehabilitation physicians | Content exhaustiveness and clinical applicability; inference logic | ND |
| Administrative assistance (n=9) | | | | | | | |
| Sivarajkumar et al [47] | Extract and categorize physical rehabilitation exercise information from stroke cases | EHR<sup>aa</sup> sections with physical therapy information | Single turn | No | Physical therapy experts | AGS for extracted items | Accuracy, precision, recall, and F1-score |
| Guo et al [48] | Extract triples by fine-tuning and integrating a relation classification module | Stroke-related medical text from SEMRC<sup>ab</sup>, CVDEMRC<sup>ac</sup>, and CMeIE<sup>ad</sup> | —<sup>ae</sup> | No | Relevant items from datasets and performance of the Cas-CLN<sup>af</sup> benchmark models | AGS for total and overlapping triple extraction; performance improvements over baseline models | F1-score; rate |
| Lehnen et al [49] | Extract key information for mechanical thrombectomy | Mechanical thrombectomy records | Single turn | No | Interventional neuroradiologists | AGS for extracted items; different error analysis; intermodel extraction performance comparison | Correct rate and Cohen κ; count and rate; correct rate and P value |
| Fiedler et al [50] | Extract IPSS<sup>ag</sup> format information and infer disease severity | Outpatient notes | Multiturn | No | Clinical investigators | AGS for extracted items | Rate |
| Wang et al [51] | Extract and infer key information for mechanical thrombectomy surgery | Mechanical thrombectomy records | Single turn and multiturn for correct format response | No | Interventional and junior neuroradiologists | AGS for extracted and inferred items; agreement with junior neuroradiologists; processing efficiency | Accuracy, sensitivity, specificity, AUC<sup>ah</sup>, and mean squared error; P value; average case processing time |
| Goh et al [52] | Extract stroke audit data | Discharge summaries | Single turn | No | Relevant items from original discharge summaries | AGS for extracted items; model-clinician comparison in AGS; inference error analysis | Counts and rate; ND |
| Baro et al [53] | Predict stroke hospitalization by fine-tuning and integrating classification layers | Chronological health insurance data with aggregated medical events | — | No | Relevant items from original health insurance data | AGS across time windows using the general fine-tuned models; AGS comparison between general and stroke-specific fine-tuned models | F1-score, sensitivity, specificity, and AUC |
| Meddeb et al [54] | Extract key information for mechanical thrombectomy items | Mechanical thrombectomy records | Single turn | No | Radiologists and clinical medical students | AGS for extracted items; efficiency improvement with EITL<sup>ai</sup> | Precision, recall, and F1-score; average case time savings |
| Kim et al [55] | Perform data wrangling on a large dataset of patients with stroke | Metadata from the CRCS-K<sup>aj</sup> dataset and neurologist queries | Multiturn | No | Neurologists | Reliability and efficiency of EITL workflow and clinical knowledge alignment | ND |
| Direct patient interaction (n=5) | | | | | | | |
| Argymbay et al [56] | Provide personalized stroke risk insights and answer medical queries based on patient data | Stroke risk values, medical literature, and patient queries | Multiturn | No | Clinicians | Stroke risk factor review, personalized health recommendation provision, and anxiety alleviation | ND |
| Neo et al [57] | Answer rehabilitation questions for patients with stroke and their caregivers | 280 unique questions | Single turn | Yes | Clinicians | Content correctness, safety, relevance, and readability; interrater agreement; free comments for responses | 3-point Likert scale; Fleiss κ and Cohen κ; ND |
| Wu et al [58] | Provide nonmedical professionals with stroke-related health information | 2 questions about stroke prevention from the ASA<sup>ak</sup> website | Single turn | No | Answers available on the ASA website | Readability compared with the Google Assistant; content relevance | Word counts, GFS<sup>al</sup>, SMOG<sup>am</sup> index, DCS<sup>an</sup>, FKRT<sup>ao</sup>, and P value; keyword matching counts |
| Chen et al [59] | Interpret commands and generate Python code for hand exoskeleton control | Recognized user voice commands | Single turn | No | Rehabilitation physicians | Executability and efficiency of tasks among models; response process in free scenarios | Success rate across trials and time; ND |
| Rifai et al [60] | Interpret commands and generate target coordinates for upper-limb robot control | Recognized user voice commands | Single turn | No | Predefined targets | Executability of path to targets compared with joystick control; intuitive handling; success and stable control | ND; user experience questionnaire; success rate across trials and ND |
| Automated literature review (n=1) | | | | | | | |
| Anghelescu et al [36] | Assist in obtaining evidence on Actovegin’s efficacy for ischemic stroke | 6 queries on medicine, review conduction, literature exploration, and evidence synthesis | Multiturn | No | Review contributors | General and in-depth answer correctness; citation applicability; PRISMA<sup>ap</sup>-based evidence synthesis results | ND |
<sup>a</sup>mRS: modified Rankin Scale.
<sup>b</sup>H&P: history and neurological physical examination.
<sup>c</sup>AGS: agreement with the gold standard.
<sup>d</sup>ND: narrative description.
<sup>e</sup>MRI: magnetic resonance imaging.
<sup>f</sup>CT: computed tomography.
<sup>g</sup>DWI: diffusion-weighted imaging.
<sup>h</sup>ADC: apparent diffusion coefficient.
<sup>i</sup>TP: true positive.
<sup>j</sup>TN: true negative.
<sup>k</sup>FP: false positive.
<sup>l</sup>FN: false negative.
<sup>m</sup>PPV: positive predictive value.
<sup>n</sup>NPV: negative predictive value.
<sup>o</sup>RBMT-II: Rivermead Behavioral Memory Test–II.
<sup>p</sup>MMSE: Mini-Mental State Examination.
<sup>q</sup>MoCA: Montreal Cognitive Assessment.
<sup>r</sup>NIHSS: National Institutes of Health Stroke Scale.
<sup>s</sup>EMR: electronic medical record.
<sup>t</sup>AUC-ROC: area under the receiver operating characteristic curve.
<sup>u</sup>GCS: Glasgow Coma Scale.
<sup>v</sup>H&H: Hunt and Hess scale.
<sup>w</sup>ICH: intracranial hemorrhage.
<sup>x</sup>SNACC: Society for Neuroscience in Anesthesiology and Critical Care.
<sup>y</sup>HQR: high-quality recommendation.
<sup>z</sup>ICF: International Classification of Functioning, Disability, and Health.
<sup>aa</sup>EHR: electronic health record.
<sup>ab</sup>SEMRC: stroke EMR entity and entity-related corpus.
<sup>ac</sup>CVDEMRC: cardiovascular EMR entity and entity relationship–labeling corpus.
<sup>ad</sup>CMeIE: Chinese Medical Information Extraction dataset.
<sup>ae</sup>Not applicable.
<sup>af</sup>Cas-CLN: cascade binary pointer tagging network with conditional layer normalization.
<sup>ag</sup>IPSS: International Pediatric Stroke Study.
<sup>ah</sup>AUC: area under the curve.
<sup>ai</sup>EITL: expert in the loop.
<sup>aj</sup>CRCS-K: Clinical Research Collaboration for Stroke in Korea.
<sup>ak</sup>ASA: American Stroke Association.
<sup>al</sup>GFS: Gunning fog score.
<sup>am</sup>SMOG: Simple Measure of Gobbledygook.
<sup>an</sup>DCS: Dale-Chall score.
<sup>ao</sup>FKRT: Flesch-Kincaid readability test.
<sup>ap</sup>PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses.
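
Several of the agreement metrics listed in Table 2 (sensitivity, specificity, PPV, NPV, accuracy, F1-score, and Cohen κ) can be derived from the same 2×2 table of model output versus gold standard. The following is a minimal sketch of those standard definitions only; it is not the analysis code of any included study, and the names `ConfusionCounts` and `binary_metrics`, as well as the example counts in the note after the block, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for a 2x2 table (model output vs gold standard)."""
    tp: int  # true positives
    tn: int  # true negatives
    fp: int  # false positives
    fn: int  # false negatives


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def binary_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute common agreement metrics from 2x2 counts."""
    total = c.tp + c.tn + c.fp + c.fn
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # also called recall
    specificity = _ratio(c.tn, c.tn + c.fp)
    ppv = _ratio(c.tp, c.tp + c.fp)          # also called precision
    npv = _ratio(c.tn, c.tn + c.fn)
    accuracy = _ratio(c.tp + c.tn, total)    # observed agreement
    f1 = _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn)

    # Cohen kappa: observed agreement corrected for the agreement
    # expected by chance from the marginal totals.
    p_pos = _ratio(c.tp + c.fp, total) * _ratio(c.tp + c.fn, total)
    p_neg = _ratio(c.tn + c.fn, total) * _ratio(c.tn + c.fp, total)
    p_expected = p_pos + p_neg
    kappa = _ratio(accuracy - p_expected, 1 - p_expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
        "cohen_kappa": kappa,
    }
```

For illustrative counts of 40 TP, 45 TN, 5 FP, and 10 FN, this sketch gives a sensitivity of 0.80, a specificity of 0.90, an accuracy of 0.85, and a Cohen κ of 0.70. An undefined ratio (zero denominator) is returned as NaN rather than raising an error, which is a design choice of this sketch.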