J Med Internet Res. 2025 Apr 16;27:e70535. doi: 10.2196/70535

Table 1. Characteristics of the included studies (n=20).

| Author, year, and country | Study design | Chronic diseases | LLM^a types | Outcome assessment | Key study findings |
| --- | --- | --- | --- | --- | --- |
| Al-Anezi (2024) [47], Saudi Arabia | Quasi-experimental study | Cancer, diabetes, and kidney failure | ChatGPT 3.5, engaged by participants for ≥15 min daily for 2 weeks | Semistructured interviews | ChatGPT 3.5 improved disease awareness, health behaviors, and access to support while reducing reliance on specialists but faced issues with disease diagnosis, empathy, data privacy, and managing complex conditions. |
| Alanezi et al (2024) [48], Saudi Arabia | Quasi-experimental study | Chronic mental health conditions | ChatGPT 3.5, engaged by participants for ≥15 min daily for 2 weeks | Semistructured interviews | ChatGPT 3.5 enhanced mental health literacy and self-care and delivered crisis interventions but faced challenges with data privacy, accuracy, and catering to cultural and linguistic diversity. |
| Alanezi (2024) [21], Saudi Arabia | Quasi-experimental study | Cancer | ChatGPT 3.5, engaged by participants for 2 weeks | Focus group interviews | ChatGPT 3.5 improved cancer knowledge, self-management, emotional aid, and access to social resources but faced privacy, reliability, and personalization challenges. |
| Aliyeva et al (2024) [35], United States | Simulation study | Severe hearing loss | ChatGPT 4.0, posed five postoperative management questions | Survey | ChatGPT 4.0 achieved 100% accuracy, rapid response times, 98% clarity, and 92% relevance in its recommendations. |
| Choo et al (2024) [46], South Korea | Simulation study | Colorectal cancer | ChatGPT, used to generate treatment recommendations | Survey | ChatGPT's oncological management recommendations aligned with the multidisciplinary team's in 86.7% of cases. |
| Dergaa et al (2024) [49], Qatar | Simulation study | Mental health | ChatGPT, engaged as a digital psychiatric provider | Qualitative assessment | ChatGPT offered quick, empathetic, and guideline-concordant responses but struggled to seek clarification and to customize plans for complex scenarios. |
| Dergaa et al (2024) [50], Qatar | Simulation study | Hypertension, osteoarthritis, stress, diabetes, and asthma | ChatGPT 4.0, presented with five hypothetical patient profiles to prescribe a 30-day fitness program | Qualitative assessment | ChatGPT 4.0 generated safety-conscious exercise programs but lacked variability and could not perform initial assessments or adjust regimens in real time. |
| Franco D’Souza et al (2023) [51], India | Simulation study | Psychiatric disorders | ChatGPT 3.5, presented with 100 clinical case vignettes | Survey | ChatGPT 3.5 performed best at generating management strategies, followed by diagnoses, for psychiatric conditions. |
| Kianian et al (2024) [36], United States | Simulation study | Glaucoma | ChatGPT, used to generate patient handouts | Survey | ChatGPT generated readable health information at a ninth-grade reading level and scored the quality of health resources with a strong correlation (r=0.725; P<.001). |
| Lim et al (2024) [45], Singapore | Simulation study | Colorectal cancer | Retrieval-augmented generation (RAG)-enhanced ChatGPT 4.0, instructed to provide colonoscopy screening recommendations | Survey | The RAG-enhanced model recommended colorectal screening intervals more accurately than the standard model (79% vs 50.5%; P<.01) and produced fewer hallucinations (see the sketch following this table). |
| Mondal et al (2023) [52], India | Simulation study | Lifestyle-related chronic diseases | ChatGPT 3.5, presented with 20 chronic disease management cases | Survey | ChatGPT 3.5 generated readable text with a mean FKRE^b score of 27.8, and its accuracy (mean 1.83, SD 0.37) and applicability (mean 1.9, SD 0.21) were significantly higher than the hypothesized median score of 1.5. |
| Papastratis et al (2024) [53], Greece | Simulation study | Noncommunicable diseases | ChatGPT 3.5 and ChatGPT 4.0, presented with 15 profiles to generate weekly meal plans | Survey | ChatGPT 3.5 and 4.0 showed lower nutrient accuracy (81.5% and 81.6%, respectively) than a knowledge-based recommender (91%); ChatGPT 4.0 improved to 86% when given personalized energy targets as input. |
| Pradhan et al (2024) [37], United States | Simulation study | Liver cirrhosis | ChatGPT 4.0, DocsGPT, Google Bard, and Bing Chat, used to generate a one-page patient education sheet | Survey | LLM-generated materials exhibited higher FKRE scores, 76%-99% accuracy rates, and actionability comparable to human-derived materials. |
| Puerto Nino et al (2024) [43], Canada | Simulation study | Benign prostate enlargement | ChatGPT 4.0+, fed 88 queries on benign prostate enlargement | Survey | ChatGPT 4.0+ had precision scores ranging from 0.50 to 1 and a median general quality score of 4. |
| Seth et al (2023) [41], Australia | Simulation study | Carpal tunnel syndrome | ChatGPT (no version number), used to generate management strategies in response to six inquiries | Survey | ChatGPT accurately diagnosed carpal tunnel syndrome and recommended treatment options but produced erroneous references and insufficient depth of information. |
| Singer et al (2024) [38], United States | Simulation study | Ophthalmology issues | Aeyeconsult (powered by ChatGPT 4.0), presented with 260 eye care questions | Survey | Aeyeconsult outperformed ChatGPT 4.0 in accuracy (83.4% vs 69.2%) and gave more consistent responses across repeated attempts on OphthoQuestions. |
| Spallek et al (2023) [42], Australia | Simulation study | Mental health and substance use disorders | ChatGPT 4.0 pro, presented with queries on mental health and substance use | Survey | ChatGPT 4.0 produced text with higher reading levels and accuracy but lacked the depth and breadth of human experts, with 23% featuring stigmatizing phrases. |
| Willms and Liu (2024) [44], Canada | Autoethnographic case study | Chronic disease prevention through increased physical activity | ChatGPT 3.0, used to generate adaptive physical activity interventions | Qualitative assessment | ChatGPT 3.0 responded to prompts with acceptable accuracy and relevance but sometimes provided false academic references. |
| Yang et al (2024) [39], United States | Case study | Diet management for preventing chronic illnesses | ChatDiet, based on ChatGPT 3.5 Turbo, used to provide food recommendations | Causal graphs and qualitative assessment | ChatDiet personalized food recommendations effectively (85%-95% effectiveness) and demonstrated interactivity but produced occasional hallucinations. |
| Yeo et al (2023) [40], United States | Simulation study | Liver cirrhosis and hepatocellular carcinoma | ChatGPT (Dec 15 version), queried with 164 questions on liver disease management | Survey with qualitative assessment | ChatGPT had accuracy rates of 79.1% (cirrhosis) and 74% (hepatocellular carcinoma) and provided emotional support but might fail to identify eligibility for hepatocellular carcinoma screening and liver transplantation. |

^a LLM: large language model.

^b FKRE: Flesch-Kincaid reading ease score.
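
For context on the FKRE scores cited by Mondal et al [52] and Pradhan et al [37], the standard Flesch reading ease formula on which the metric is based is:

$$\mathrm{FKRE} = 206.835 - 1.015\left(\frac{\text{total words}}{\text{total sentences}}\right) - 84.6\left(\frac{\text{total syllables}}{\text{total words}}\right)$$

Scores fall roughly on a 0-100 scale, with higher values indicating easier text; the mean score of 27.8 reported by Mondal et al [52] lies in the 0-30 band conventionally rated "very difficult" (college-graduate reading level).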
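
Lim et al [45] improved accuracy by augmenting ChatGPT 4.0 with retrieval-augmented generation (RAG), in which passages retrieved from a trusted corpus are prepended to each query so the model answers from grounded context rather than from parametric memory alone. The sketch below is a minimal, dependency-free Python illustration of this pattern, not the study's actual pipeline: the guideline snippets, the bag-of-words retriever, and the prompt template are all illustrative assumptions.

```python
# Minimal sketch of retrieval-augmented generation (RAG).
# The snippets below are illustrative placeholders; a real system would
# retrieve from actual clinical guidelines and send the prompt to an LLM API.

import math
from collections import Counter

# Hypothetical guideline snippets standing in for a real document store.
GUIDELINES = [
    "Average-risk adults: repeat screening colonoscopy in 10 years if normal.",
    "1-2 small tubular adenomas: repeat colonoscopy in 7-10 years.",
    "3-4 tubular adenomas under 10 mm: repeat colonoscopy in 3-5 years.",
    "Adenoma with high-grade dysplasia: repeat colonoscopy in 3 years.",
]

def tokenize(text: str) -> list[str]:
    return [t.strip(".,:").lower() for t in text.split()]

def score(query: str, doc: str) -> float:
    """Cosine similarity over bag-of-words term counts."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    overlap = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * \
           math.sqrt(sum(v * v for v in d.values()))
    return overlap / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k guideline snippets most similar to the query."""
    return sorted(GUIDELINES, key=lambda doc: score(query, doc), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Ground the model by prepending retrieved guideline text to the question."""
    context = "\n".join(f"- {snippet}" for snippet in retrieve(query))
    return (
        "Answer using ONLY the guideline excerpts below.\n"
        f"Guidelines:\n{context}\n\nQuestion: {query}"
    )

if __name__ == "__main__":
    # In a real pipeline, this prompt would be sent to the LLM; here we print it.
    print(build_prompt("Patient had 3 small tubular adenomas; when to rescreen?"))
```

Production RAG systems typically replace the bag-of-words scorer with embedding-based similarity search over a vector index; either way, it is the grounding step that is credited with reducing hallucinated recommendations such as incorrect screening intervals.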