Table 1. Characteristics of the included studies (n=20).
| Author, year, and country | Study design | Chronic diseases | LLM<sup>a</sup> types | Outcome assessment | Key study findings |
| --- | --- | --- | --- | --- | --- |
| Al-Anezi (2024) [47], Saudi Arabia | Quasi-experimental study | Cancer, diabetes, and kidney failure | ChatGPT 3.5, engaged by participants for ≥15 min daily for 2 weeks | Semistructured interviews | ChatGPT 3.5 improved disease awareness, health behaviors, and access to support while reducing reliance on specialists, yet it faced issues with disease diagnosis, empathy, data privacy, and managing complex conditions. |
| Alanezi et al (2024) [48], Saudi Arabia | Quasi-experimental study | Chronic mental health conditions | ChatGPT 3.5, engaged by participants for ≥15 min daily for 2 weeks | Semistructured interviews | ChatGPT 3.5 enhanced mental health literacy and self-care and delivered crisis interventions; however, it faced challenges with data privacy, accuracy, and catering to cultural and linguistic diversity. |
| Alanezi (2024) [21], Saudi Arabia | Quasi-experimental study | Cancer | ChatGPT 3.5, engaged by participants for 2 weeks | Focus group interviews | ChatGPT 3.5 improved cancer knowledge, self-management, emotional support, and access to social resources; however, it faced privacy, reliability, and personalization challenges. |
| Aliyeva et al (2024) [35], United States | Simulation study | Severe hearing loss | ChatGPT 4.0, posed with five postoperative management questions | Survey | ChatGPT 4.0 had 100% accuracy, rapid response times, 98% clarity, and 92% relevance in its recommendations. |
| Choo et al (2024) [46], South Korea | Simulation study | Colorectal cancer | ChatGPT, used to generate treatment recommendations | Survey | ChatGPT's recommendations for oncological management showed 86.7% concordance with the multidisciplinary team. |
| Dergaa et al (2024) [49], Qatar | Simulation study | Mental health | ChatGPT, engaged as a digital psychiatric provider | Qualitative assessment | ChatGPT offered quick, empathetic, and guideline-concordant responses but struggled to seek clarification and to customize plans for complex scenarios. |
| Dergaa et al (2024) [50], Qatar | Simulation study | Hypertension, osteoarthritis, stress, diabetes, and asthma | ChatGPT 4.0, interacted with five hypothetical patient profiles to prescribe a 30-day fitness program | Qualitative assessment | While ChatGPT 4.0 can generate safety-conscious exercise programs, it lacks variability and cannot perform initial assessments or adjust regimens in real time. |
| Franco D’Souza et al (2023) [51], India | Simulation study | Psychiatric disorders | ChatGPT 3.5, interacted with 100 clinical case vignettes | Survey | ChatGPT 3.5 performed well in generating management strategies for psychiatric conditions, followed by diagnoses. |
| Kianian et al (2024) [36], United States | Simulation study | Glaucoma | ChatGPT, used to generate patient handouts | Survey | ChatGPT generated readable health information at a ninth-grade reading level and scored the quality of health resources with high precision (r=0.725; P<.001). |
| Lim et al (2024) [45], Singapore | Simulation study | Colorectal cancer | Retrieval-augmented generation (RAG)-enhanced ChatGPT 4.0, instructed to provide colonoscopy screening recommendations (an illustrative RAG sketch follows the table notes) | Survey | The enhanced model recommended colorectal screening intervals more accurately (79% vs 50.5%; P<.01) and produced fewer hallucinations than the standard model. |
| Mondal et al (2023) [52], India | Simulation study | Lifestyle-related chronic diseases | ChatGPT 3.5, presented with 20 cases of chronic disease management | Survey | ChatGPT 3.5 generated readable text with a mean FKRE<sup>b</sup> score of 27.8, and its accuracy (mean 1.83, SD 0.37) and applicability (mean 1.90, SD 0.21) scores were significantly higher than the hypothesized median of 1.5. |
| Papastratis et al (2024) [53], Greece | Simulation study | Noncommunicable diseases | ChatGPT 3.5 and ChatGPT 4.0, interacted with 15 profiles to generate weekly meal plans | Survey | ChatGPT 3.5 and 4.0 showed lower nutrient accuracy (81.5% and 81.6%, respectively) than a knowledge-based recommender (91%), although ChatGPT 4.0 improved to 86% when given personalized energy targets as input. |
| Pradhan et al (2024) [37], United States | Simulation study | Liver cirrhosis | ChatGPT 4.0, DocsGPT, Google Bard, and Bing Chat, used to generate a one-page patient education sheet | Survey | LLM-generated materials exhibited higher FKRE scores, 76%-99% accuracy rates, and actionability comparable to human-derived materials. |
| Puerto Nino et al (2024) [43], Canada | Simulation study | Benign prostate enlargement | ChatGPT 4.0+, fed 88 queries on benign prostate enlargement | Survey | ChatGPT 4.0+ achieved precision scores ranging from 0.50 to 1 and a median general quality score of 4. |
| Seth et al (2023) [41], Australia | Simulation study | Carpal tunnel syndrome | ChatGPT (no version number), queried with six inquiries to generate management strategies | Survey | ChatGPT accurately diagnosed carpal tunnel syndrome and recommended treatment options but faced challenges with erroneous references and insufficient information depth. |
| Singer et al (2024) [38], United States | Simulation study | Ophthalmology issues | Aeyeconsult, powered by ChatGPT 4.0, interacted with 260 eye care questions | Survey | Aeyeconsult outperformed ChatGPT 4.0 in accuracy (83.4% vs 69.2%) and demonstrated greater consistency in responses across repeated attempts on OphthoQuestions. |
| Spallek et al (2023) [42], Australia | Simulation study | Mental health and substance use disorders | ChatGPT 4.0 pro, interacted with queries on mental health and substance use | Survey | ChatGPT 4.0 produced accurate content at higher reading levels but lacked the depth and breadth of human expert materials, with 23% featuring stigmatizing phrases. |
| Willms and Liu (2024) [44], Canada | Autoethnographic case study | Chronic disease prevention by increasing physical activity | ChatGPT 3.0, used to generate adaptive physical activity interventions | Qualitative assessment | ChatGPT 3.0 had acceptable accuracy and relevance in responding to prompts but sometimes provided false academic references. |
| Yang et al (2024) [39], United States | Case study | Diet management for preventing chronic illnesses | ChatDiet, based on ChatGPT 3.5 Turbo, used to provide food recommendations | Causal graphs and qualitative assessment | ChatDiet effectively personalized food recommendations (85%-95% effectiveness) and demonstrated interactivity but exhibited occasional hallucinations. |
| Yeo et al (2023) [40], United States | Simulation study | Liver cirrhosis and hepatocellular carcinoma | ChatGPT (Dec 15 version), queried with 164 questions on liver disease management | Survey with qualitative assessment | ChatGPT had accuracy rates of 79.1% and 74% and provided emotional support; however, it might fail to identify eligibility for hepatocellular carcinoma screening and liver transplantation. |
<sup>a</sup>LLM: large language model.
<sup>b</sup>FKRE: Flesch-Kincaid reading ease score.
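
For context on the readability scores reported above (eg, the mean FKRE of 27.8 in Mondal et al [52] and the FKRE scores in Pradhan et al [37]), the sketch below computes the standard Flesch-Kincaid reading ease formula. This is general background on the metric, not code from any included study; the word, sentence, and syllable counts are assumed to come from an external text-analysis step.

```python
def flesch_kincaid_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard FKRE formula; higher scores indicate easier text.
    Scores of 60-70 correspond roughly to an 8th-9th grade reading level,
    while scores below ~30 (eg, 27.8 in Mondal et al [52]) correspond to
    college-graduate-level text.
    """
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

# Example: a 120-word passage with 5 sentences and 210 syllables
print(flesch_kincaid_reading_ease(120, 5, 210))  # ~34.4, ie, difficult text
```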
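
The RAG-enhanced pipeline of Lim et al (2024) [45] is reported only at a high level in the table. The following minimal sketch illustrates the general pattern such a pipeline relies on: retrieve relevant guideline passages, then constrain the model's answer to them. The guideline snippets, the word-overlap retriever, and the `call_llm` stub are all illustrative assumptions, not details from the study.

```python
# Minimal retrieval-augmented generation (RAG) pattern: retrieve guideline
# passages relevant to a case, then ask the model to answer using only them.
# Hypothetical sketch; not the actual pipeline from Lim et al (2024) [45].

GUIDELINE_PASSAGES = [  # stand-ins for a real screening-guideline corpus
    "Average-risk adults with a normal colonoscopy: repeat in 10 years.",
    "1-2 small (<10 mm) tubular adenomas: repeat colonoscopy in 7-10 years.",
    "Adenoma with high-grade dysplasia: repeat colonoscopy in 3 years.",
]

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy retriever ranking passages by word overlap with the query.
    Production systems typically use embedding similarity over a vector index.
    """
    terms = set(query.lower().split())
    return sorted(corpus, key=lambda p: -len(terms & set(p.lower().split())))[:k]

def call_llm(prompt: str) -> str:
    """Placeholder for a chat-completion API call (assumed, not specified)."""
    raise NotImplementedError("wire up an LLM client here")

def recommend_interval(case: str) -> str:
    """Assemble a grounded prompt and query the model."""
    context = "\n".join(retrieve(case, GUIDELINE_PASSAGES))
    prompt = (
        "Using only the guideline excerpts below, recommend a colonoscopy "
        f"screening interval.\n\nGuidelines:\n{context}\n\nCase: {case}"
    )
    return call_llm(prompt)
```

The key design point is that the prompt confines the model to retrieved guideline text, grounding answers in the corpus rather than in the model's parametric memory alone, which is consistent with the fewer hallucinations reported for the enhanced model.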