Abstract
Background
Generative artificial intelligence technologies have disrupted information ecosystems, posing new threats to public health by enabling rapid, scalable manufacture of convincing but false health stories. This systematic review synthesizes evidence on how generative AI reconfigures health misinformation creation, dissemination, and moderation.
Methods
In line with PRISMA 2020, 15 empirical studies published between January 2023 and August 2025 were included. Databases consulted were MEDLINE (via PubMed), Embase, Scopus, Web of Science Core Collection, ACM Digital Library, IEEE Xplore, PsycINFO, Communication & Mass Media Complete, arXiv, and medRxiv/SSRN. Studies were contrasted on the basis of production capacity, propagation dynamics, and efficacy of mitigation at technical, sociotechnical, and governance layers.
Results
The synthesis indicates that generative AI substantially increases the volume, speed, and perceived credibility of health disinformation production, while altering its propagation dynamics. Users often struggle to distinguish AI‑generated from human‑authored health misinformation, and their sharing intentions are not tightly coupled with perceived accuracy. Existing detection systems show limited performance against AI‑generated content, and while labeling interventions can reduce perceived accuracy, their effects are context‑dependent.
Conclusion
Generative AI transforms the health misinformation landscape by lowering barriers to creation and exploiting platform and behavioral dynamics. Current mitigation strategies—spanning technical, sociotechnical, and governance layers—are promising but remain nascent and unevenly evaluated. Future work must prioritize multimodal, multilingual, and health‑specific verification, as well as real‑world testing of interventions, to build equitable and resilient health information ecosystems.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12889-025-26148-9.
Keywords: Generative AI, Health misinformation, Deepfakes, Mitigation strategies, Systematic review
Introduction
The wide and rapid dissemination of generative Artificial Intelligence (AI) technologies capable of producing fluent text, realistic images, and other synthetic media has revolutionized online information environments [1, 2]. These generative models—such as large language models (e.g., GPT-4, Gemini), text-to-image diffusion models (e.g., DALL·E, Stable Diffusion), and audio/video synthesis systems—leverage vast datasets to create novel, often highly convincing content that mimics human-generated material [3, 4]. In healthcare environments—where effectiveness relies on accuracy and immediacy—these attributes open up new channels for creating and disseminating false and damaging information [5–7]. Evidence from recent studies indicates that generative AI–driven misinformation (AI-generated misinformation, hereafter abbreviated as “AI-misinfo”) has concentrated mainly on vaccination, infectious diseases (especially COVID-19), and mental health topics, with limited but emerging instances involving chronic disease management and reproductive health. In the included corpus, over 60% of studies focused on vaccination- or pandemic-related misinformation, approximately 25% addressed chronic or behavioral health topics, and the remainder (≈ 15%) covered other health domains such as reproductive health or non-communicable diseases, underscoring the predominant research focus on high-salience topics [8]. Controlled user studies also found that short-form health misinformation created by generative models is more convincing than its human-authored counterparts and difficult for users to distinguish from genuine content, pointing to an underlying “false fluency”—where AI-generated text appears coherent and authoritative despite being inaccurate—as a threat to public health communication [9].
At the same time, audits have demonstrated that safety measures in widely used systems can be bypassed or undermined. Cross-sectional evaluations of major general-purpose platforms via public interfaces showed that even simple jailbreaks could elicit false health narratives on polarizing topics [4, 6, 10]. Gradual improvements over time did not eliminate these vulnerabilities, and developer transparency and reporting channels remained inconsistent [9, 11]. Beyond front-end interfaces, programmatic access via Application Programming Interfaces (APIs) [12]—which allow software applications to interact with AI models—presents another risk surface: a study of multiple model APIs showed that system-level instructions could turn leading models into prolific health disinformation agents, producing false responses in most trials and exposing governance loopholes around custom assistants in public marketplaces [13]. Taken together, these results indicate that both layers of interaction—end-user chat interfaces and developer APIs—must be considered in health safety evaluations.
Propagation dynamics are also evolving. Large-scale observational studies of social networks reported steadily increasing levels of synthetic media over time, with spikes temporally coincident with new model releases, and continued emergence of high-risk content types such as deepfakes of celebrities [14]. A large-scale, community-annotated analysis suggested that misleading items identified as AI-generated originate from smaller accounts but are more likely to go viral, and display stylistic and affective patterns that set them apart from traditional misinformation [15]. Although these platform-scale metrics are not health-specific, they illustrate mechanisms—amplification from smaller accounts, stylistic shifts toward entertainment, and release-driven spikes—that plausibly translate to health contexts and need to be measured directly there [16, 17].
User perception and behavior studies add nuance to this risk picture. In an experiment comparing model-generated and human-written COVID-19 misinformation, participants rated the model-generated items as less accurate but reported the same sharing intentions, dissociating perceived accuracy from sharing intent and calling into question sole reliance on accuracy-nudge strategies [18]. These findings are reinforced by studies of AI-misinfo showing that such misinformation adopts rhetorical features—enriched detail, hedged framing, and a personal authorial voice—that engage superficial credibility heuristics while degrading the performance of detectors trained on human-created misinformation [19]. Together, these results suggest that mitigation must address not only the persuasive structure of content but also the behavioral factors that prompt sharing, rather than factuality judgments alone.
Existing mitigation measures span technical, sociotechnical, and governance levels, but each is constrained in healthcare settings. On the technical side, benchmark and dataset work in healthcare shows that sophisticated models cannot reliably discriminate true from false claims or detect generative origin in text–image articles, reflecting the complexity of multimodal detection and the need for provenance-aware and claim-aware methods [20]. On the sociotechnical side, warning labels show promise but must be carefully designed: experimental health studies find that warnings about manipulative language alter perceptions and sharing, with effects varying by label type and truth status, while generic labels indicating synthetic origin influence credibility and sharing in ways that require fine-tuning to avoid suppressing accurate content [21, 22]. Governance-oriented assessments reveal gaps in developer transparency, reporting practices, and oversight of custom assistant marketplaces, while public health stakeholders have proposed multi-dimensional risk taxonomies connecting model attributes to harms across people, care processes, information environments, and accountability [8, 9, 13].
Major gaps remain despite rapid progress. Much of the evidence base concerns English-language content and a few high-salience topics such as vaccination and COVID-19, with limited testing of audio and video health deepfakes, lower-resource languages, or real-world, health-specific spread at platform scale [18, 19, 21]. Most detection and verification strategies address surface features or generic benchmarks, whereas health communication requires claim-level grounding in clinical guidelines, trial data, and temporally valid sources—areas where systems and datasets are only now reaching maturity [19–21]. Finally, although some evaluations show that general-purpose systems can produce informative responses to common myths under benign prompting, the same systems’ vulnerability to adversarial instruction and marketplace misconfiguration indicates the need for multi-layered defenses and ongoing auditing in health environments [8, 9, 14].
In this context, systematic synthesis is needed to integrate findings on production, propagation, and mitigation, to distinguish where evidence is strong from where it is tentative, and to map implications for public health practice, platform stewardship, and technical research. For this review, we define production as the creation of health misinformation using generative AI tools; propagation as its dissemination, spread, and user engagement across digital platforms; and mitigation as the technical, sociotechnical, and governance strategies employed to limit its creation or impact. The main objective was to integrate empirical evidence and methodological strategies across these three stages in order to inform research and practice in public health communication and online safety.
The review addressed three linked questions spanning the life cycle of health misinformation in the context of generative models. First, in what ways do generative text, image, audio, or video systems enable or alter the production of health misinformation, including the volume, velocity, targeting, and credibility cues of the content? Second, in what ways does generatively produced health misinformation propagate across platforms and audiences, and how do users perceive and interact with it compared with human-generated misinformation? Third, what mitigation strategies—technical, sociotechnical, and governance—have been proposed or piloted, and with what effectiveness and limitations in health-specific applications?
Methods
The review was conducted and reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses 2020 (PRISMA 2020) [23]. This is a systematic review with a narrative synthesis of the evidence; a meta-analysis was not performed due to substantial heterogeneity in study designs, outcomes, and measurements across the included literature. Although a protocol was developed, it was not registered on a public platform (e.g., PROSPERO or OSF). All protocol deviations were documented in the final manuscript. The study selection process is summarized in a PRISMA flow diagram (Fig. 1).
Fig. 1.
PRISMA 2020 flow diagram of study identification, screening, eligibility, and inclusion
Protocol deviations and rationale
To enhance transparency around the internal protocol, we provide a summary of all deviations that occurred during screening, data extraction, or synthesis. These deviations did not alter the review’s core research questions or eligibility criteria but reflect adjustments made in response to the characteristics and heterogeneity of the evidence base. A detailed comparison of planned versus implemented procedures, alongside the reasons for each deviation and their potential impact, is provided in Table 1.
Table 1.
Summary of protocol deviations
| Protocol element (planned) | Implemented procedure | Reason for deviation | Potential impact on results |
|---|---|---|---|
| Conduct meta-analysis wherever ≥ 3 studies report comparable outcomes | Narrative synthesis for all domains | Substantial variation in design, outcomes, and measurement scales prevented statistically meaningful pooling | None; avoids misleading pooled estimates |
| Standardize all quantitative outcomes to a common metric (e.g., SMD, OR) | Standardization applied only where metric compatibility allowed | Studies used incompatible scales, missing variance measures, or multimodal metrics | Minimal; narrative comparisons remain robust |
| Include all generative modalities (text, image, audio, video) if eligible studies identified | Only text and text–image studies included | No eligible audio/video health misinformation studies existed | No bias; highlights evidence gap |
| Include only peer-reviewed articles | Included preprints with full methods sections and sensitivity-checked results | Rapidly evolving field; many eligible studies released as preprints | Low risk; sensitivity analyses demonstrated stable conclusions |
| Conduct subgroup analyses and meta-regression | Not conducted | Insufficient study count per subgroup; high between-study heterogeneity | Avoids unstable/underpowered models |
| Extract complete effect sizes from all studies | Extracted only where calculable from reported data | Missing denominators, incomplete reporting in several included studies | Minor; no influence on study inclusion or conclusions |
Abbreviations: SMD standardized mean difference, OR odds ratio
Inclusion and exclusion criteria
Studies were included if they provided empirical evidence on the role of generative AI in the creation, dissemination, or mitigation of health-related misinformation. Eligible publications encompassed a range of study designs, including randomized experiments, quasi-experimental studies, cross-sectional audits, observational platform analyses, and methodological papers describing relevant datasets or benchmarks. The health topics of interest included, but were not limited to, vaccination, infectious diseases, chronic conditions, and mental health. Studies needed to involve generative AI modalities—such as text, image, audio, video, or multimodal outputs—and could examine interactions at either public user interfaces or programmatic API levels, under various prompting conditions (e.g., benign, adversarial, or system-level instructions). Only studies published in English between January 2023 and August 2025 were considered. There were no restrictions based on geographic setting or participant demographics, provided the study’s focus aligned with the review’s objectives.
Studies were excluded if they did not primarily address health-related misinformation or if they did not investigate the role of generative AI systems in the creation, dissemination, or mitigation of such misinformation. Non-empirical contributions, including editorials, commentaries, and purely theoretical frameworks without original data, were also excluded. Research focusing solely on human-generated misinformation—without comparison to or analysis of AI-generated content—was considered ineligible. Due to constraints on translation resources, only studies published in English were included. Finally, duplicate publications and preprints that were later superseded by peer-reviewed versions were excluded to avoid redundancy in the evidence base.
Time period and rationale for search comprehensiveness
We searched for records published between January 1, 2023 and August 14, 2025. The search was executed in August 2025. This period was selected because public and programmatic access to high-fluency generative systems became widespread in late 2022, after which health-oriented research on generative production, platform propagation, and countermeasures accelerated. Limiting the period to 2023 and later increased the likelihood that included research captured contemporary capabilities, user interfaces, and platform policies relevant to today’s information environments. Articles available online first within this timeframe were counted irrespective of their final issue date. No earlier date restrictions were applied during backward citation chasing.
Conceptual definitions
Misinformation refers to false, misleading, or inaccurate information that is shared without verified intent to deceive. In contrast, disinformation denotes false or misleading information that is created, presented, or disseminated with deliberate intent to deceive, manipulate, or cause harm. In keeping with these definitions, we use “misinformation” as the broad, default term across the review, and reserve “disinformation” for contexts in which deliberate deception is evident—such as adversarial prompting, intentional guardrail circumvention, or systematic generation of false health narratives. We further use the term AI-generated misinformation (AI-misinfo) to describe misinformation produced by generative artificial intelligence systems, regardless of modality (text, image, audio, or video).
Searched databases and search strategy
We performed a systematic search in MEDLINE via PubMed, Embase, Scopus, Web of Science Core Collection, ACM Digital Library, IEEE Xplore, PsycINFO, and Communication & Mass Media Complete. To capture recent works and methods articles that were not yet indexed, we also searched arXiv, medRxiv, and SSRN. Searches were current through August 14, 2025. Database-specific strategies combined controlled vocabulary and free-text terms for generative model families and modalities with terms for health misinformation and mitigation constructs, using Boolean and proximity operators as appropriate to each index. One example PubMed strategy was: (“large language model” OR “generative model” OR “diffusion model” OR “text-to-image” OR “deepfake” OR “synthetic media” OR “chatbot” OR the names of specific models and model families) AND (“health misinformation” OR disinformation OR infodemic OR “health communication” OR “public health messaging”) AND (detection OR verification OR “content label*” OR warning OR guardrail* OR jailbreak OR “system instruction*” OR “marketplace” OR policy OR governance). These strategies were iteratively refined via pilot searches to maximize recall and precision and translated into other databases using their respective syntax and thesauri. No assistive or automated tools were used in executing the searches; full, line-by-line strategies by source are provided in Supplementary File 1 for reproducibility. Reference lists of included studies and relevant reviews were hand-searched, and citation alerts were monitored through the end of the search window to identify additional qualifying records. Supplementary File 2 includes the completed PRISMA 2020 checklist. Only studies published in English were included, as reliable translation resources for non-English full texts were unavailable during the review period. Non-English abstracts were screened where available, but full texts in other languages were excluded. We included peer-reviewed journal articles, conference proceedings, and preprints (e.g., arXiv, medRxiv, SSRN) that presented empirical data or methodological contributions relevant to generative AI and health misinformation. Opinion pieces, editorials, commentaries, and purely theoretical or policy essays without empirical evidence were excluded. Preprints were retained only when they contained complete methodological and results sections and were subjected to sensitivity analyses to evaluate their influence on findings.
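For readers wishing to reproduce the date-limited retrieval step programmatically, the example PubMed strategy can be run against NCBI's E-utilities. The following is a minimal sketch, assuming the Biopython Entrez client and an abbreviated form of the query above; it is illustrative only and not the exact script used for this review.

```python
from Bio import Entrez  # Biopython's E-utilities client (assumed available)

Entrez.email = "reviewer@example.org"  # NCBI requires a contact address (placeholder)

# Abbreviated form of the example PubMed strategy reported above
query = (
    '("large language model" OR "generative model" OR deepfake OR "synthetic media" OR chatbot) '
    'AND ("health misinformation" OR disinformation OR infodemic) '
    'AND (detection OR verification OR label* OR guardrail* OR jailbreak OR governance)'
)

# Restrict to the review window (publication dates January 1, 2023 to August 14, 2025)
handle = Entrez.esearch(
    db="pubmed",
    term=query,
    mindate="2023/01/01",
    maxdate="2025/08/14",
    datetype="pdat",
    retmax=5000,
)
record = Entrez.read(handle)
handle.close()

print(f"Records retrieved: {record['Count']}")
print(record["IdList"][:10])  # first ten PubMed IDs for inspection
```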
Eligibility criteria and study selection
We included empirical studies and methodological papers that formally investigated how generative text, image, audio, or video systems change the production or spread of health misinformation and/or evaluated mitigation interventions in health contexts. Eligible designs were randomized and nonrandomized experiments, cross-sectional audits of systems, observational analyses using platform data, and dataset or benchmark papers if associated with health misinformation tasks. We excluded studies unrelated to health misinformation or generative AI mechanisms. There were no population or setting restrictions beyond applicability to health communication or public health. Before formal screening, the reviewer team conducted a calibration exercise on a sample of 50 records to ensure consistent application of the inclusion criteria. Titles and abstracts were then screened in duplicate by two independent reviewers. The inter-rater agreement for the title/abstract screening phase was substantial (Cohen’s kappa, κ = 0.82). Duplicate records were removed using the automated deduplication feature in Covidence, followed by a manual check. Disagreements at both the title/abstract and full-text stages were resolved through discussion or, if necessary, adjudication by a third reviewer.
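As an illustration of how the reported inter-rater agreement is computed, the minimal sketch below derives Cohen's kappa from two reviewers' include/exclude decisions. The decisions shown are hypothetical and stand in for the review's actual screening records.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical include (1) / exclude (0) decisions for ten screened records
reviewer_a = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
reviewer_b = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]

# Cohen's kappa corrects raw agreement for agreement expected by chance:
# kappa = (p_observed - p_expected) / (1 - p_expected)
kappa = cohen_kappa_score(reviewer_a, reviewer_b)
print(f"Cohen's kappa = {kappa:.2f}")
```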
Quality assessment
The choice of critical appraisal tool was determined by the primary study design and research focus of each included publication. The mapping logic was as follows: randomized experiments assessing interventions (e.g., labels, warnings) were evaluated using RoB 2; nonrandomized comparative studies or quasi-experiments (e.g., pre-post API audits) were assessed with ROBINS-I; studies reporting diagnostic accuracy of detection/verification systems used QUADAS-2, supplemented by PROBAST items for those developing or validating predictive models; cross-sectional audits and observational platform studies used the relevant Joanna Briggs Institute (JBI) checklist; and qualitative or framework-oriented studies were appraised with the CASP Qualitative Checklist. Two reviewers independently applied the assigned tool to each study. Inter-rater agreement for the domain-level risk-of-bias judgments was substantial (Cohen’s kappa, κ = 0.79). Disagreements were resolved through discussion or, when necessary, arbitration by a third reviewer.
Data extraction and synthesis
Data extraction was conducted in duplicate using a piloted, standardized form developed in a shared data management spreadsheet. Extracted fields included bibliographic details, study aims and research questions, design and setting, population and sample characteristics (for user studies), health topic areas, generative modality and access layer (public interface versus programmatic API), prompting or instruction conditions, platform or data source traits, outcome definitions and measurement instruments, quantitative effect estimates, detection or verification metrics and evaluation protocols, mitigation intervention characteristics, governance or transparency indicators, and funding or competing interests. Where possible, we attempted to harmonize outcomes reported with incompatible scales by transforming raw values into standardized mean differences or odds ratios; when essential data were missing, we attempted to contact study authors within reasonable timeframes.
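For reference, the harmonization targets follow their standard definitions (summarized here, not specific to this review): for two groups with means $\bar{x}_1, \bar{x}_2$, sample sizes $n_1, n_2$, standard deviations $s_1, s_2$, and a 2 × 2 table of event counts $a, b, c, d$,

$$
\mathrm{SMD}=\frac{\bar{x}_1-\bar{x}_2}{s_p},\qquad
s_p=\sqrt{\frac{(n_1-1)s_1^{2}+(n_2-1)s_2^{2}}{n_1+n_2-2}},\qquad
\mathrm{OR}=\frac{a/b}{c/d}=\frac{ad}{bc}.
$$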
Given the substantial methodological diversity across included studies—spanning randomized experiments, cross-sectional audits, platform-level observational analyses, and multimodal detection evaluations—it quickly became evident that quantitative pooling would not be appropriate. Studies varied widely in outcome definitions, measurement instruments, effect metrics, populations, and generative modalities, and preliminary heterogeneity checks for candidate outcomes showed large inconsistency (e.g., I² values commonly exceeding 80% and τ² indicating substantial between-study variance). Several outcome domains also contained only one or two eligible studies or employed metrics that were not mutually convertible (e.g., AUROC vs. F1 vs. accuracy; distinct scales for credibility and sharing intentions; noncomparable platform-level virality measures). Because these factors prevented the identification of any outcome domain with ≥ 3 methodologically comparable studies suitable for pooled estimation, a meta-analysis would have produced misleading or uninterpretable summary effects. Accordingly, and in line with our prespecified protocol, we adopted a narrative synthesis as the primary analytic approach. Findings were organized along the production–propagation–mitigation continuum and mapped across technical, sociotechnical, and governance layers to ensure coherence while respecting the heterogeneity of the evidence base.
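To make the heterogeneity checks described above concrete, the following minimal sketch computes Cochran's Q, I², and the DerSimonian-Laird τ² from a handful of study-level effect sizes. The values are illustrative placeholders, not data extracted from the included studies.

```python
import numpy as np

# Hypothetical study-level effect sizes (e.g., SMDs) and their within-study variances
y = np.array([0.42, 0.15, 0.68, 0.05])   # effect estimates
v = np.array([0.04, 0.06, 0.05, 0.03])   # variances

w = 1.0 / v                               # inverse-variance weights
y_bar = np.sum(w * y) / np.sum(w)         # pooled estimate under a common-effect model

# Cochran's Q, I^2, and DerSimonian-Laird tau^2
Q = np.sum(w * (y - y_bar) ** 2)
df = len(y) - 1
I2 = max(0.0, (Q - df) / Q) * 100         # % of variability attributable to heterogeneity
C = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (Q - df) / C)             # between-study variance

print(f"Q = {Q:.2f}, I^2 = {I2:.1f}%, tau^2 = {tau2:.3f}")
```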
Results
Selection process
The systematic search across ten databases, including MEDLINE, Embase, Scopus, Web of Science, ACM Digital Library, IEEE Xplore, PsycINFO, Communication & Mass Media Complete, arXiv, and medRxiv/SSRN, initially retrieved 2,347 records. After removing 1,028 duplicates through automated tools and manual inspection, 1,319 unique articles remained for title and abstract screening.
During the title and abstract screening phase, 1,125 records were excluded for not meeting the predefined inclusion criteria. The majority of these exclusions were due to studies focusing on non-health-related misinformation, lacking a generative AI component, or employing ineligible study designs such as theoretical frameworks or opinion pieces without empirical data.
Following this screening, 194 articles advanced to full-text review. Of these, 179 were excluded based on the eligibility criteria outlined in the Methods section. The primary reasons for exclusion during full-text review included: studies that did not investigate generative AI’s role in health misinformation (n = 52), those lacking empirical evidence such as commentaries or policy analyses (n = 63), research focused solely on human-generated misinformation without AI comparison (n = 44), and studies with insufficient data on production, propagation, or mitigation outcomes (n = 20). Together, these categories account for all 179 exclusions.
Ultimately, 15 studies satisfied all inclusion criteria and were selected for qualitative synthesis. These studies addressed the review’s three core themes: generative AI’s role in health misinformation production (n = 6), propagation dynamics (n = 4), and mitigation strategies (n = 5).
For a comprehensive visualization of the selection process, including the number of records excluded at each stage and the specific reasons for exclusion, refer to Fig. 1.
Findings of data extraction and synthesis
We extracted data in duplicate for the 15 included studies, reaching full agreement after consensus, and recorded designs, samples (for user studies), interface layers (public vs. API), prompting/instruction conditions, outcomes and measurements, effect estimates, and mitigation/governance features. Given the heterogeneity in designs and measurements (Table 2), we conducted a systematic narrative synthesis along the production–propagation–mitigation continuum mapped onto technical, sociotechnical, and governance layers, reconciled the direction of outcomes where appropriate, and prioritized common discrimination metrics (e.g., Area Under the Receiver Operating Characteristic curve (AUROC)) for detection work. A formal meta-analysis was not conducted because no outcome domain contained three or more studies with sufficiently homogeneous designs, measures, and populations to justify statistical pooling. For example, studies on user sharing behavior (n = 2) employed different stimuli and outcome scales; detection studies (n = 3) varied in modality (text vs. multimodal) and dataset construction; and label/warning experiments (n = 2) used distinct interventions and instruments. Preliminary tests of heterogeneity for comparable subsets yielded high variability (I² > 80%, τ² > 0.25), confirming excessive between-study inconsistency. Therefore, findings were synthesized narratively to preserve validity and comparability, with sensitivity analyses confirming that exclusion of high-risk or preprint studies did not alter conclusions [8, 9, 13–15, 18–22, 24–28]. Supplementary File 3 reports the full data extraction for each of the 15 studies.
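For context, AUROC, the discrimination metric prioritized for detection studies, summarizes how well a detector's scores rank misinformation above genuine content (0.5 indicates chance, 1.0 perfect separation). A minimal sketch with hypothetical labels and scores, assuming scikit-learn, is shown below.

```python
from sklearn.metrics import roc_auc_score

# Hypothetical detector output: 1 = AI-generated misinformation, 0 = genuine content
y_true  = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.91, 0.35, 0.62, 0.78, 0.40, 0.12, 0.30, 0.48]  # detector confidence scores

# AUROC = probability that a randomly chosen positive item is scored
# higher than a randomly chosen negative item
print(f"AUROC = {roc_auc_score(y_true, y_score):.2f}")
```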
Table 2.
Synthesis domains and key findings across included studies*
| Synthesis domain | Key extracted fields (examples) | Key Finding (Example) | Outcome harmonization/metrics | Pooling decision | Notes (bias/sensitivity) | Ref. |
|---|---|---|---|---|---|---|
| Production capability (guardrails, API/system prompts, throughput/targeting) | Interface layer; prompt/jailbreak/system instructions; volume and latency measures; targeting cues; reporting/transparency procedures | System-instruction attacks on APIs yielded 88% disinformation rate; high-volume text & image generation demonstrated [9, 13]. | Direction standardized to “greater disinfo capability = higher risk”; counts, rates, pre/post changes | Not pooled | Demonstrations/audits heterogeneous by model/version and tasks; suitable for narrative mapping, not causal pooling | [8, 9, 13] |
| User perception and behavior (accuracy, credibility, sharing, indistinguishability) | Sample size/demographics; health topics; stimulus provenance (AI vs. human); outcome scales; preregistration | AI-generated COVID-19 fake news was shared at equal rates to human fakes despite lower perceived accuracy [18]. | Converted to common direction (higher = greater susceptibility/sharing); reported mean differences/proportions as given | Not pooled (insufficient homogeneous RCTs) | Fewer than three health-relevant experiments with comparable outcomes; self-report outcomes; low missingness; results consistent across studies | [18, 21, 22, 26] |
| Detection/diagnostic evaluations (AI vs. human deception; multimodal health benchmarks) | Index tests; features/models; datasets/splits; reference standards; evaluation protocol | AI-misinfo’s credibility cues (e.g., faux citations) degraded classifier performance [24]. | AUROC prioritized; otherwise author-reported F1/accuracy; origin/reliability tasks analyzed separately | Not pooled | Substantial task/dataset diversity; reference-standard and domain-shift concerns; arXiv preprint [20] included in sensitivity check | [20, 24, 25] |
| Platform-scale propagation (prevalence/virality, temporal spikes) | Platform; sampling/annotation source; engagement/virality metrics; account characteristics; time series | AI-generated content on X originated from smaller accounts but was more likely to go viral [15]. | Direction harmonized (AI-generated > or < conventional); no unit conversion due to differing denominators | Not pooled | Cross-domain topic mix and differing metrics; used descriptively to contextualize health findings | [14, 15] |
| Sociotechnical mitigation (labels, manipulation warnings) | Label type/wording/placement; outcome instruments; timing; analysis plan | Manipulative-content warnings reduced sharing intentions; generic AI-misinfo labels reduced perceived accuracy [21, 22]. | Common direction (labels reduce perceived accuracy/credibility/sharing); effect sizes recorded when available | Not pooled | Only two studies with overlapping outcomes; effects label- and truth-dependent; survey/online panel settings | [21, 22] |
| Technical mitigation (aligned responses to myths) | Prompting/version; correctness/clarity/exhaustiveness ratings; rater procedures | Aligned LLMs (e.g., ChatGPT) produced accurate responses to vaccine myths under benign prompting [28]. | Descriptive accuracy and quality scores; no harmonization needed | Not applicable | Single study; narrative use as counter-messaging signal, not deterrence of misuse | [28] |
| Governance/frameworks (audits, risk taxonomy) | Reporting pathways; incident handling; marketplace oversight; taxonomy dimensions; funding/conflicts | Audits revealed persistent transparency gaps and API-level vulnerabilities for generating health disinformation [9, 13]. | Qualitative synthesis; mapped to governance layer | Not applicable | medRxiv and conceptual pieces integrated qualitatively; sensitivity exclusion did not alter conclusions | [9, 13, 19, 27] |
*Individual studies could contribute to multiple synthesis domains; therefore, some references appear in more than one row
Abbreviations: API Application Programming Interface, AUROC Area Under the Receiver Operating Characteristic Curve, RCT randomized controlled trial
Quality assessment results
Across the 15 included studies, methodological quality was mixed and varied with study aims: feasibility/audit reports were adequate to demonstrate capability and raise questions about governance deficits but not to establish causality or population-level prevalence, while randomized experiments provided the most internally valid data on user perceptions and sharing but relied on self-reported outcomes from online panels (Table 3). For randomized trials, RoB 2 assessments indicated low risk for randomization and deviations from intended interventions in the label/warning trials and experiments, and preregistration reduced the risk of selective reporting bias in two trials [18, 21, 22, 26]. Main outcomes, however, were self-reported perceptions or sharing intentions, raising concerns about measurement bias and external behavioral validity; missing data were rare since completion was required for reimbursement [18, 21, 22, 26]. ROBINS-I appraisals of nonrandomized generation/audit demonstrations detected serious risk of confounding and selection (no counterfactuals, model/version changes over time), although protocols were precisely described and replicable within specified prompts; these studies are appropriate for capability mapping but not impact estimation [8, 9, 13, 28]. QUADAS-2/PROBAST evaluations of detection-oriented work detected selection bias arising from convenience corpora, domain shift, and post-hoc thresholding, with reference-standard limitations such that “AI origin” was established from generation logs rather than independent validation; external validation was limited, especially for multimodal settings, although larger curated datasets mitigated concerns by documenting construction and splits [20, 24, 25]. JBI checklists applied to observational platform studies indicated moderate to serious risk for outcome measurement and selection: reliance on Community Notes or crowd-sourced flags raises representativeness issues and unmeasured confounding, although time trends and clear eligibility criteria improved internal consistency; health-specific inferences remain tentative in these cross-domain samples [14, 15]. CASP evaluation of framework/qualitative work reported clear aims and appropriately selected methodology with adequate analytic transparency, but reflexivity and triangulation were inconsistently reported, limiting confidence in generalizability [19, 27]. In general, our strength-of-evidence narrative weights (i) randomized experiments most heavily for user-level effects, mindful of self-report limitations, (ii) audits/demonstrations for establishing feasibility and assessing system-prompt/API vulnerabilities, and (iii) detection/platform studies as suggestive yet prone to selection and reference-standard biases; sensitivity analyses omitting the highest-risk items did not alter the bottom-line conclusions regarding scalable production, altered propagation dynamics, and the need for layered mitigations [8, 9, 13–15, 18–22, 24, 25, 27, 28].
Table 3.
Methodological quality and domain-level risk-of-bias assessment by study category
| Study category | Appraisal tool | Key strengths | Main risk-of-bias concerns | Qualitative judgment | Ref. |
|---|---|---|---|---|---|
| Randomized/experimental user studies (perception/sharing; labels/warnings) | RoB 2* | Random assignment; preregistration in [18, 26]; clear outcome definitions; low missing data | Self-reported outcomes; online panel generalizability; potential demand characteristics; limited behavioral follow-up | Low to some concerns | [18, 21, 22, 26] |
| Generation/audit demonstrations (LLM guardrails, API/system prompts, chatbot accuracy to myths) | ROBINS-I* (study-appropriate) | Transparent procedures; repeated testing in [9]; realistic prompts and assets | No controls; temporal/model-version confounding; prompting choice sensitivity; not designed for causal claims | Serious risk for causal inference; appropriate for capability/audit aims | [8, 9, 13, 28] |
| Detection/diagnostic evaluations and datasets (linguistic differences, classifiers, multimodal benchmarks) | QUADAS-2* (+ PROBAST items where applicable) | Clear index tests; documented datasets/splits (esp. [20]); comparative AI vs. human analyses | Convenience sampling; domain shift; post-hoc threshold tuning; imperfect reference standards for “AI origin” and “reliability”; limited external validation | Some concerns to high, depending on study | [20, 24, 25] |
| Observational platform prevalence/propagation studies | JBI* (analytical cross-sectional/prevalence) | Large-scale data; explicit inclusion criteria; temporal analyses | Selection via crowd flags; outcome misclassification; unmeasured confounding; limited health-specific stratification | Moderate to serious risk | [14, 15] |
| Qualitative/framework and methodology pieces | CASP Qualitative* | Clear aims; appropriate qualitative design; actionable frameworks | Limited reflexivity; sampling representativeness; sparse triangulation and member checking | Some concerns | [19, 27] |
Abbreviations: RoB 2 Revised Cochrane Risk-of-Bias Tool for Randomized Trials, ROBINS-I Risk Of Bias In Non-randomized Studies of Interventions, QUADAS-2 Quality Assessment of Diagnostic Accuracy Studies, PROBAST Prediction Model Risk of Bias Assessment Tool, JBI Joanna Briggs Institute, CASP Critical Appraisal Skills Programme. * Note: Risk-of-bias judgments are reported using the standard terminology of each appraisal tool: RoB 2 uses “Low risk,” “Some concerns,” and “High risk”; ROBINS-I uses “Low,” “Moderate,” “Serious,” and “Critical” risk of bias; QUADAS-2 uses “Low,” “High,” or “Unclear” risk across domains; JBI checklists use descriptive judgments (e.g., Low, Moderate, High risk); CASP uses qualitative descriptors (e.g., Some concerns)
For clarity, the qualitative judgments presented in Table 3 follow standard interpretations used in the respective appraisal tools. ‘Low risk’ indicates that methodological safeguards were robust and concerns were minimal; ‘Some concerns’ reflects minor methodological limitations unlikely to substantially affect validity; and ‘Serious risk’ denotes important methodological issues (e.g., lack of controls, high potential for bias) that could meaningfully influence results. These categories were applied consistently across study types to summarize domain-level risk-of-bias assessments.
Generative AI’s amplification of health misinformation production
The findings in this section are primarily derived from demonstration and audit studies, several of which are assessed as having a serious risk of bias for causal inference, and therefore should be interpreted as evidence of technical feasibility rather than established real-world prevalence or impact. Across modalities, these studies suggest that large-scale generation of health disinformation is technically feasible with current generative systems. Public-facing LLMs remain jailbreakable or promptable to produce fake health stories despite some safeguard upgrades [9], and API-level “system instruction” attacks can turn top models into authoritative disinformation agents (88/100 responses were disinformation), signaling vulnerabilities beyond front-end guardrails [13]. End-to-end pipelines can mass-produce, localize, and embellish narratives with fabricated citations at high throughput (e.g., 102 targeted vaccine/vaping blog posts totaling more than 17,000 words in 65 min, plus 20 realistic images in under 2 min), enabling demographic tailoring and rapid campaign construction [8]. AI-generated outputs also mimic credibility signals—fabricated but scientific-appearing citations, structured uncertainty phrasing, and personal voice—often passing surface inspection checklists and degrading detector performance [24], while human raters both struggle to distinguish AI from human content and may find AI-generated disinformation especially convincing [26]. Health-specific multimodal datasets that benchmark features such as linguistic stylometry, image–text consistency, and provenance signals further reveal that current models struggle to judge reliability and AI origin, illustrating how generated text plus images can evade automated screening [20]. Beyond text, platform data record surges in synthetic images and video after major model releases, suggesting reduced cost and latency for visual disinformation, though health-specific audio/video evidence is sparser in this corpus [14] (Table 4).
Table 4.
Generative AI’s impact on health disinformation production
| Production dimension | What changes with generative systems | Modality evidence | Key quantitative/qualitative signals | Notes/limitations | Ref. |
|---|---|---|---|---|---|
| Scale | High-volume generation of diverse health narratives | Text; Text + Image | Quantitative case study: 102 blog posts (> 17,000 words) generated in 65 min using multi-topic prompts, producing numerous variants. | Evidence is based on single-study demonstrations or audits with serious risk of bias for causal inference and is intended to illustrate capability rather than frequency or population-level effects. | [8, 9, 13] |
| Speed/latency | Rapid assembly of campaigns and assets | Text; Image | Quantitative case study: 20 realistic images generated in under 2 min; immediate conversion of APIs into disinformation chatbots through system prompts. | API-layer risks exceed UI guardrails. | [8, 13] |
| Targeting/localization | Tailored messaging to demographics and contexts | Text; Text + Image | Qualitative demonstration: Targeted posts crafted for young parents, older adults, pregnant people, and individuals with chronic conditions. | Demonstrated in a production workflow; broader field data needed. | [8] |
| Credibility signals | Faux citations, uncertainty framing, and personal tone mimic expertise | Text | Quantitative/qualitative evidence: AI-misinfo meets surface “credibility” checklist criteria; classifier performance drops when faced with these signals. | Surface cues are unreliable; motivates claim-level verification. | [24] |
| Human detectability/persuasiveness | Hard to tell AI from human; AI disinfo can be more compelling | Text (tweets/posts) | Quantitative user study: Human raters could not reliably identify AI authorship; AI-generated disinformation rated as highly persuasive. | User studies outside clinical settings; still health-topic stimuli. | [26] |
| Guardrail circumvention | Prompt/jailbreak and system-instruction vulnerabilities | Text (LLMs, APIs) | Quantitative audits: Persistent disinformation generation via chat UIs (with/without jailbreaks); 88% disinfo rate under adversarial system prompts. | Partial improvements observed over 12 weeks but gaps remain. | [9, 13] |
| Multimodality and visuals | Low-cost, fast synthesis of health-related visuals | Image; Text + Image | Qualitative/observational evidence: Campaign-ready images paired with text; platform-level spike in synthetic visuals observed after Midjourney V5 release. | Health-specific audio/video evidence limited in this set. | [8, 14] |
| Detector/evaluator slippage | Generated content evades current reliability and origin checks | Text + Image | Quantitative benchmark: SOTA models struggled with reliability/origin detection in a 34,746-item health corpus. | Benchmarking indicates gaps for production-time screening. | [20, 24] |
Abbreviations: API Application Programming Interface, LLM large language model, SOTA state-of-the-art
Altered propagation dynamics of AI-generated health misinformation
Generatively produced health disinformation propagates effectively for several reasons. First, it is often difficult for users to distinguish from human-written content. Second, it can be as persuasive as—or even more persuasive than—human-generated misinformation. Third, it exploits platform mechanisms that favor engaging and novel formats. In health-themed studies, GPT-3 outputs were harder to identify as AI-generated and more persuasive than human-written disinformation [26], and GPT-4-produced COVID-19 fake news, while rated slightly less accurate than fakes written by humans, elicited equal sharing intentions, decoupling accuracy perceptions from dissemination behavior [18]. AI-generated narratives also satisfy surface “credibility” heuristics and degrade detectors, lowering friction to dissemination [24]. At platform scale (mixed-topic but informative), AI-written disinformation on X more often originates from smaller accounts yet is far likelier to go viral and skews toward an “entertaining/positive” tone, with synthetic media volumes spiking after major model releases. These patterns suggest a plausible translation to health contexts, including reduced costs for cross-account seeding and rapid diffusion; however, dedicated health-only field studies are needed to confirm these dynamics [14, 15]. User-facing cues can affect propagation: manipulative-content warnings on health posts and generic “AI-generated content” labels lower perceived accuracy/credibility and sharing in nuanced ways, suggesting that user experience (UX) interventions could suppress spread but must be carefully designed so as not to suppress legitimate information [21, 22] (Table 5).
Table 5.
Propagation dynamics of AI-generated health misinformation
| Aspect | Propagation pattern | User perception/interaction | Evidence scope and caveats | Ref. |
|---|---|---|---|---|
| Virality from small accounts | AI-generated misinfo more likely to go viral despite originating from smaller accounts; tone more entertaining/positive | Greater engagement potential independent of perceived authority | Large-scale X data; not health-specific but indicative of platform mechanics | [15] |
| Temporal/media spikes | Synthetic images/video increased over time with spikes after major model releases (e.g., Midjourney V5) | Visual novelty may aid cross-audience spread | Observational platform study; topic mix broader than health | [14] |
| Indistinguishability | Humans struggle to tell AI vs. human health tweets | AI disinformation rated more compelling than human | Health-focused user study; tweet-length stimuli | [26] |
| Sharing vs. accuracy | AI COVID fake news perceived less accurate than human but shared at equal rates | Sharing intentions not tightly coupled to perceived accuracy | Pre-registered experiment (N ≈ 988) with health misinformation | [18] |
| Credibility heuristics and detectors | AI-misinfo mimics citations/uncertainty/personal tone; degrades classifier performance | Surface cues can mislead users and moderators, easing spread | COVID-focused corpus; classifier drop indicates moderation challenges | [24] |
| Labeling/warnings | Manipulative-content warnings on health posts and generic AI-misinfo labels reduce perceived accuracy/credibility and sometimes sharing | UX interventions can curb spread but effects vary by label type and truth status | Health-specific warning experiment; general AI-misinfo label RCT with nuanced effects | [21, 22] |
| Health-specific field propagation | Limited direct tracing across platforms for health-only content | — | Current platform-scale evidence is mixed-topic; need health-specific propagation studies | [14, 15] |
Abbreviations: AI artificial intelligence, UX user experience
Direct health-specific evidence from user studies
Health-themed experimental studies provide direct evidence on the propagation dynamics of AI-generated health misinformation. In these controlled settings, GPT-3 outputs were harder to identify as AI-generated and more persuasive than human-written disinformation [26], and GPT-4-produced COVID-19 fake news, while rated slightly less accurate than human-authored fakes, elicited equal sharing intentions, decoupling accuracy perceptions from dissemination behavior [18]. Furthermore, AI-generated health narratives mimic surface credibility heuristics (e.g., faux citations, uncertainty framing), which can mislead users and degrade the performance of automated detectors, thereby potentially lowering friction to dissemination [24]. User-facing interventions, such as manipulative-content warnings on health posts and generic “AI-generated content” labels, have been shown to lower perceived accuracy/credibility and sometimes sharing, though effects are nuanced and depend on label design and truth status [21, 22].
Extrapolated evidence from general platform studies
Observational studies of platform-scale, mixed-topic data offer indicative, yet indirect, insights into mechanisms that may translate to health contexts. Analyses of platform X (formerly Twitter) suggest that AI-written disinformation more often originates from smaller accounts but has a higher likelihood of achieving virality, and often adopts a more entertaining or positive tone [15]. Furthermore, volumes of synthetic images and video have been observed to spike following major model releases (e.g., Midjourney V5), indicating reduced cost and latency for visual disinformation production [14]. However, it is crucial to note an inferential gap: these studies do not specifically analyze health content, and the behavioral, emotional, and topical dynamics driving the spread of health misinformation may differ from those in general or political contexts. Therefore, while these platform-scale patterns suggest the possibility of similar propagation mechanics for health misinformation—such as a “many small seeds” strategy exploiting recommender algorithms—this remains an extrapolation. Dedicated, health-only field studies tracing content across platforms are needed to confirm these dynamics [14, 15]. Table 5 synthesizes these two streams of evidence, clearly distinguishing between direct health-specific findings and extrapolated insights from general platform data.
Current limitations and promising directions in mitigation strategies
Across the included literature, mitigation of model-produced health misinformation is framed as a multi-layered problem with early but incomplete evidence of effectiveness. Technical mitigation entails strengthening guardrails, especially API/system-prompt defenses: audits report partial progress but persistent leakage and suboptimal reporting, and system-instruction attacks remain broadly effective, prompting calls for hardening and marketplace regulation [9, 13]. Detection must be health- and provenance-sensitive, because AI-misinfo’s linguistic signatures degrade classifiers and state-of-the-art models struggle with multimodal reliability/origin tasks, even as new datasets and feature research chart paths to improvement [20, 24, 25]. Sociotechnical interventions—manipulative-content warnings and generic AI-generated-content labels—can reduce perceived accuracy/credibility and sometimes sharing, but effects are label- and truth-dependent, requiring careful UX design to avoid suppressing accurate content [21, 22]. As a counter-measure, aligned LLMs can produce accurate, transparent responses to vaccine myths under benign prompting, indicating promise for guided rebuttal when coupled with safeguards [28]. Governance proposals and audits call for standardized incident reporting and transparency, API/marketplace regulation, and risk-taxonomy-informed deployment in public health, but field-scale health-specific trials, multilingual/multimodal (particularly audio/video) testing, and adversarially robust, claim-level verification remain important gaps [9, 13, 19, 20, 24] (Table 6).
Table 6.
Mitigation strategies and their effectiveness against AI-generated health misinformation
| Layer | Strategy/intervention | Evidence on effectiveness | Key limitations/risks | Ref. |
|---|---|---|---|---|
| Technical (model safeguards) | Harden guardrails with focus on API/system prompts; monitoring/reporting pathways | Repeated audits show some improvements but continued disinfo generation; 88% disinfo under malicious system instructions highlights need for API-layer controls | Evaluations are short-horizon; not stress-tested across languages/modalities; attackers adapt | [9, 13] |
| Technical (detection/ verification) | Style-/provenance-aware classifiers; multimodal health benchmarks; differentiate AI vs. human deception | Classifier performance drops on AI-misinfo; SOTA struggles on reliability/origin in a 34,746-item health corpus; AI vs. human disinfo show distinct signatures that detectors can exploit | Benchmarks are mostly English/text + image; limited adversarial/multilingual tests; few claim-level medical verification pipelines evaluated | [20, 24, 25] |
| Technical (counter-messaging) | Use aligned LLMs to answer myths accurately | ChatGPT produced mostly accurate, clear answers to WHO vaccine myths under benign prompts | Performance contingent on prompts/version; does not prevent malicious use; no real-world propagation outcomes | [28] |
| Sociotechnical (labels/warnings) | Manipulative-content warnings on health posts | Reduced sharing intentions and shifted perceptions in health-post experiments; effects vary by label type | Risk of over-suppressing true content; tested in survey settings, not live platforms | [21] |
| Sociotechnical (AI-origin labels) | Generic AI-misinfo labels | Lower perceived accuracy/credibility and sometimes sharing of misinformation; nuanced effects by content type | Not health-specific stimuli in all cases; labels may be gamed or ignored; possible chilling effects | [22] |
| Sociotechnical (prebunking/ inoculation) | LLM-assisted analysis to design inoculation | Proposed framework and case study outline process | Limited empirical validation to date; preprint status; requires topic tailoring | [27] |
| Governance (audits/transparency) | Routine safeguard audits; clear reporting/response processes | Cross-sectional audit surfaces transparency/reporting gaps; motivates standardized practices | Audit scope limited to selected tools/time windows; does not prove remediation efficacy | [9] |
| Governance (API/store oversight) | Vet and govern custom assistants and marketplaces; log system prompts/tool use in health contexts | API study shows feasibility of converting models into disinfo bots and flags disinfo-prone custom assistants | Evidence is exploratory; governance mechanisms and enforcement not empirically evaluated | [13] |
| Governance (risk assessment) | Public-health–specific risk taxonomy and reflection tool | Provides structured harms/accountability dimensions to guide deployment and evaluation | Conceptual framework; effectiveness depends on adoption and integration into policy | [19] |
Abbreviations: API Application Programming Interface, AI artificial intelligence, LLM large language model, WHO World Health Organization
Discussion
Taken together, the evidence indicates that generative AI is a significant disruptive force within the health misinformation ecosystem, reconfiguring production economics, reshaping propagation channels, and challenging prevailing models of mitigation in ways that differ from earlier, human-produced misinformation. On the production side, public chat interfaces and API or system-instruction channels both reduce the cost and time needed to create personalized health falsehoods. These systems also reproduce credibility signals that imitate expert communication. The contrast between front-end guardrails, which show incremental progress yet remain bypassable, and back-end API layers that can be reprogrammed into authoritative disinformation sources suggests that the primary attack surface is shifting from user prompts to system prompts and marketplaces [8, 9, 13]. Scale is what makes this important: the demonstrated ability to produce dozens of demographically targeted health narratives populated with fabricated citations in under an hour, together with rapid image synthesis, points to a production mechanism that is faster, larger, and more modular than human effort, and one whose rhetorical structure often satisfies superficial “credibility” checklists while degrading detector accuracy [8, 24]. In short, generative systems do not merely add more content; they change the nature of the signals users and moderators employ to assess trustworthiness.
This review advances prior syntheses by specifically focusing on the generative AI paradigm, whereas earlier systematic reviews have largely examined either general health misinformation [5] or traditional AI methods for detecting human-generated falsehoods [1]. Methodologically, we integrate post-2023 empirical studies that directly test LLM-driven content creation, platform-scale propagation, and multi-layered mitigation, covering technical, sociotechnical, and governance strategies. This represents a substantive shift from viewing AI primarily as a detection tool to systematically evaluating it as a novel source and amplifier of health misinformation. Together, these distinctions demonstrate that the contribution of this study is substantive, mapping the first coherent evidence base on how generative AI alters the entire health misinformation ecosystem rather than offering an incremental extension of prior reviews.
Propagation dynamics amplify these changes in production, though not necessarily through the channels theory would predict. Health content experiments uncover a paradox: people rate some AI-generated fakes as slightly less accurate than human-authored fakes yet are no less inclined to share them, decoupling accuracy perception from sharing intent and underscoring the role of style, novelty, and affect in diffusion [18]. Complementary work finds short-form AI replies to be more realistic than their human counterparts and difficult for participants to identify as artificial, implying that perceived realism can be converted into belief or engagement even in the absence of overt authority cues [26]. Platform-level observations complement this picture, even if they are cross-domain: AI-generated content increasingly populates feeds and goes viral despite originating from smaller accounts, often adopts a positive or entertaining tone, and spikes following major model releases [14, 15]. Although several included studies report platform-scale patterns, this review does not provide a granular comparison of platform-specific mechanisms, such as recommender algorithms, engagement metrics, and moderation practices, which likely play a critical role in amplifying AI-generated health misinformation and warrant focused investigation in future research.
Viewed alongside the production-side evidence, these trends show that generative pipelines enable a "many small seeds" strategy that exploits recommender dynamics: cheaply produced, style-adjusted variations seeded across numerous lightweight accounts can amass disproportionate reach without the telltale fingerprints of traditional botnets or influencer-scale efforts. The overall effect is a health misinformation environment in which provenance is obscured, signals of credibility are imitated, and virality is possible without the customary support of social authority.
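To make the "many small seeds" intuition concrete, the following toy Python simulation (an illustration only, not drawn from any included study; the scale-free follower network, the 5% re-share probability, and the seed counts are all assumed parameters) compares the eventual exposure achieved by a single high-degree "influencer" account with that of thirty low-profile accounts seeding the same content under a simple independent-cascade model.

```python
# Toy illustration of the "many small seeds" strategy (hypothetical parameters).
# Compares one high-degree seeder with many low-profile seeders on a synthetic
# follower network, using a basic independent-cascade spread model.
import random
import networkx as nx

random.seed(42)
G = nx.barabasi_albert_graph(n=5000, m=3, seed=42)  # scale-free "follower" graph
SHARE_PROB = 0.05  # assumed chance that an exposed neighbor re-shares a post

def cascade(graph, seeds, p=SHARE_PROB):
    """Return how many accounts are eventually exposed, starting from `seeds`."""
    exposed, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for node in frontier:
            for nbr in graph.neighbors(node):
                if nbr not in exposed and random.random() < p:
                    exposed.add(nbr)
                    nxt.append(nbr)
        frontier = nxt
    return len(exposed)

hubs = sorted(G.degree, key=lambda kv: kv[1], reverse=True)
one_influencer = [hubs[0][0]]                   # a single high-degree account
small_seeds = random.sample(range(len(G)), 30)  # thirty low-profile accounts

print("influencer reach:", cascade(G, one_influencer))
print("many-small-seeds reach:", cascade(G, small_seeds))
```

The script simply prints the two reach figures for comparison; the structural point is that many independent, low-profile seeds can collectively match or exceed the reach of one prominent account without any single account looking anomalous, which is the property that makes cheap, style-varied AI generation attractive to coordinated actors.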
Mitigation research, evolving concurrently, is only beginning to address these new circumstances and remains uneven across modalities and layers. Technical defenses aimed at chat user interface (UI) guardrails perform especially poorly against the stronger API/system-prompt vector: successive audit evidence shows partial improvement alongside persistent reporting gaps, while API-level testing finds high rates of misaligned output under hostile system prompts, underscoring the need for hardened policy-constrained decoding, tool-use sandboxes, and marketplace governance for custom assistants [9, 13]. Detection work illustrates that "more of the same" classification is not enough: AI-generated health disinformation has stylistic characteristics distinct from human-authored deception yet also exploits credibility mimicry that defeats surface-level heuristics, and current systems struggle to assess both veracity and origin in multimodal, health-news settings, indicating that provenance, style, and claim-level medical grounding must be integrated rather than treated as substitutable signals [20, 24, 25]. Sociotechnical interventions are promising but qualified levers. Manipulative-content warnings in health contexts and default AI-source labels can decrease perceived accuracy and, in some instances, sharing, but label design and the truth status of the labeled content determine their effect, with a risk of penalizing accurate content if labels are applied bluntly; this reinforces the need for careful UX tuning and pretesting [21, 22]. Strikingly, aligned models can produce accurate responses to vaccine disinformation under well-designed prompting, which suggests promise for directed counter-messaging, although this neither addresses malicious use nor demonstrates real-world dampening of misinformation diffusion [28]. To sharpen governance, concrete and auditable controls are needed. These should include: (1) mandatory logging of system prompts and tool use in health-facing applications to enable post-hoc incident analysis; (2) pre-deployment red-teaming specifically targeting system-prompt vulnerabilities and marketplace misconfigurations; and (3) public scorecards for model providers that track metrics such as response latency to reported incidents and transparency of safety fine-tuning data. Moving from principle to practice requires such verifiable measures to hold systems accountable [9, 13, 19].
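As one concrete illustration of control (1), the minimal Python sketch below (assumptions throughout: the `call_model` stub, the record schema, and the hash-chaining scheme are hypothetical, not any provider's actual API or a mandated standard) shows how a health-facing application could append a tamper-evident audit record of the system prompt, model parameters, tool use, and response for every request, which is the raw material that post-hoc incident analysis would need.

```python
# Minimal audit-logging sketch for a health-facing LLM application (illustrative).
# `call_model` stands in for whatever client a deployment uses; the record schema
# and hash chaining are hypothetical, not a standard.
import hashlib
import json
import time
from pathlib import Path

AUDIT_LOG = Path("health_assistant_audit.jsonl")

def _chain_hash(prev_hash: str, record: dict) -> str:
    """Hash each record together with the previous record's hash so edits are detectable."""
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def log_call(system_prompt, user_prompt, model, params, tools_used, response_text):
    prev = "0" * 64
    if AUDIT_LOG.exists():
        last_line = AUDIT_LOG.read_text(encoding="utf-8").strip().splitlines()[-1]
        prev = json.loads(last_line)["hash"]
    record = {
        "ts": time.time(),
        "model": model,
        "params": params,                # temperature, max tokens, etc.
        "system_prompt": system_prompt,  # key artifact for post-hoc incident analysis
        "user_prompt": user_prompt,
        "tools_used": tools_used,        # e.g. retrieval sources, function calls
        "response": response_text,
    }
    record["hash"] = _chain_hash(prev, record)  # hash computed before adding the field
    with AUDIT_LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

def call_model(system_prompt: str, user_prompt: str) -> str:
    """Placeholder for the real model client in a given deployment."""
    return "stub response"

if __name__ == "__main__":
    sys_p = "You are a cautious public-health assistant. Cite sources."
    usr_p = "Is the measles vaccine safe for toddlers?"
    reply = call_model(sys_p, usr_p)
    log_call(sys_p, usr_p, model="example-llm", params={"temperature": 0.2},
             tools_used=["medline_retrieval"], response_text=reply)
```

Logging the system prompt and tool use, rather than only user-visible messages, is what makes system-instruction misuse attributable after the fact.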
A cross-modal comparison highlights asymmetries that future research should remedy. Text and text-image pipelines are relatively well characterized: there is replicable evidence of rapid, concentrated production; laboratory experimentation on persuasion and sharing; and initial multimodal benchmarks documenting detection deficits [8, 18, 20, 24]. Conversely, health-oriented audio and video deepfakes are under-evaluated beyond general platform-level observations of synthetic media growth, even though clinician voice impersonation or doctor-style videos would almost certainly carry stronger authority cues than text [14]. Similarly, English-centered, COVID-themed stimuli dominate user research, limiting generalizability to broader health concerns and lower-resource languages in which guardrails and detectors are less effective. This dominance of COVID-19 and vaccination topics constrains the generalizability of our synthesis, as the behavioral, informational, and emotional dynamics surrounding pandemic misinformation differ from those observed in chronic, mental, and reproductive health contexts. Generative AI's persuasive potential and the mechanisms of misinformation spread may manifest differently in areas such as chronic disease self-management, mental health stigma, or preventive care communication, where user motivations and information-seeking behaviors are distinct. Consequently, caution is warranted when extrapolating the present findings to non-pandemic or long-term health domains.
Furthermore, across the 15 included studies, none examined health misinformation in non-English or low-resource language settings, and nearly all were conducted in high-income, Anglophone contexts such as the United States, the United Kingdom, or Western Europe. No empirical studies assessed Asian, African, or Latin American platforms, nor were Arabic, Hindi, Mandarin, or African-language corpora represented in any included dataset. Likewise, audio and video modalities, particularly voice-based health misinformation or localized video content, remain almost entirely untested in these environments. This evidentiary concentration limits the review’s geographic and linguistic generalizability and underscores the need for targeted, multilingual, and cross-platform research to understand how generative AI interacts with health communication norms and misinformation exposure in low-resource and non-Western contexts. Future studies should purposefully target understudied subfields—including chronic disease management, mental health, and maternal health—using diverse datasets, non-English languages, and real-world communication environments to broaden the evidence base.
Also needed is research on the comparative susceptibility of API-integrated systems, in which system prompts, tools, and retrieval are orchestrated behind the scenes; these systems require different audit and governance instruments from those presently being developed for public chatbots, such as logging system instructions, model and tool parameters, and retrieval sources whenever systems operate in health-facing contexts [9, 13]. Finally, the gap between perceived accuracy and intention to share AI-generated health fakes cautions against sole reliance on accuracy nudges; comparative effectiveness is most likely to hinge on integrating friction and prebunking with claim-aware verification and provenance cues that are resistant to paraphrase and style transfer [18, 21, 22]. In addition, biases embedded in training data, model architecture, or system-level design choices may inadvertently amplify or distort health misinformation, particularly in sensitive or marginalized health contexts, yet this issue remains insufficiently examined in the current evidence base and warrants focused investigation.
These findings point towards a combined policy and research agenda. Technically, detectors will have to shift from generic text classifiers to health-aware verification tools that combine medical evidence bases, temporally aware citation checks, and cross-modal provenance, validated on adversarially paraphrased and multilingual data of the kind now beginning to emerge [20, 24, 25]. Sociotechnically, interventions must be tested in health-specific field experiments that measure not only perceptions but also downstream exposure reduction and sharing, with particular attention to label designs that dampen manipulative styles without penalizing genuine content [21, 22]. Governance needs to move from model-card transparency to effective controls over custom-assistant marketplaces and API usage in healthcare environments, including mandatory incident reporting, red-teaming at the system-prompt level, and audit trails that facilitate post-hoc attribution and remediation [9, 13, 19]. Without such integration across layers, the comparative advantage created by generative systems, namely rapid, concentrated, credible-appearing production combined with virality from dispersed, low-volume accounts, is likely to outpace piecemeal defenses, especially in under-resourced communities and non-textual modalities that current evaluation practice is most likely to underserve [8, 14, 15, 18, 20, 24].
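As an illustration of what a health-aware, claim-grounded verifier could look like structurally, the following minimal Python sketch (the `extract_claims`, `retrieve_evidence`, and `grade_claim` functions are hypothetical stubs; a real system would back them with an information-extraction model, a curated medical evidence index, and a fact-grading model) shows the pipeline stages of claim extraction, evidence retrieval, claim-level grading, and aggregation into per-claim verdicts rather than a single style-based score.

```python
# Sketch of a claim-aware health misinformation verifier (structure only).
# All three stage functions are stubs standing in for real components.
from dataclasses import dataclass
from typing import List

@dataclass
class ClaimVerdict:
    claim: str
    label: str          # "supported" | "refuted" | "insufficient evidence"
    evidence: List[str]

def extract_claims(post_text: str) -> List[str]:
    """Stub: split a post into atomic, checkable health claims."""
    return [s.strip() for s in post_text.split(".") if s.strip()]

def retrieve_evidence(claim: str) -> List[str]:
    """Stub: query a medical evidence base and return candidate passages."""
    return []  # a real system would return ranked, dated passages with sources

def grade_claim(claim: str, evidence: List[str]) -> str:
    """Stub: grade the claim against the retrieved evidence."""
    return "insufficient evidence" if not evidence else "supported"

def verify_post(post_text: str) -> List[ClaimVerdict]:
    verdicts = []
    for claim in extract_claims(post_text):
        evidence = retrieve_evidence(claim)
        verdicts.append(ClaimVerdict(claim, grade_claim(claim, evidence), evidence))
    return verdicts

if __name__ == "__main__":
    post = "Vitamin C cures measles. Vaccines contain microchips."
    for v in verify_post(post):
        print(f"[{v.label}] {v.claim}")
```

The design point is that verdicts attach to individual claims with retrievable, dated evidence rather than to surface style, which is precisely the signal that AI-generated content imitates most convincingly.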
Mechanisms and mediators
The included studies point to specific mechanisms that explain the persuasive power of AI-generated health misinformation and the failure of current detection systems. First, textual and visual features such as false fluency—coherent, authoritative-sounding language—and citation mimicry create a veneer of credibility that satisfies superficial heuristic processing among users [24, 26]. These features exploit cognitive shortcuts (e.g., the expertise heuristic) by imitating the structural and rhetorical patterns of trustworthy health communication. Second, user-level factors moderate detection ability; individuals with lower digital health literacy or higher trust in algorithmic outputs are less likely to question AI-generated content, while motivated reasoning on polarized health topics can override accuracy assessments, leading to sharing regardless of perceived credibility [18, 22]. Third, platform algorithms likely interact with AI-generated content characteristics: the novel, engaging, and often positive affective tone of synthetic media may be rewarded by engagement-based ranking systems, while the ability to generate high volumes of tailored content from small accounts enables a “seed-and-spread” strategy that exploits network virality mechanisms [14, 15]. Together, these mechanisms suggest that AI-generated misinformation operates through a dual pathway: exploiting human cognitive biases via credible surface features and leveraging platform architectures optimized for engagement over authenticity.
Practical implications
Translating the empirical findings of this review into practice requires coordinated, evidence-based action across distinct stakeholder groups. Each recommended action is directly informed by the specific results synthesized in the preceding sections.
For policymakers and regulators, the audit findings revealing persistent transparency gaps and API-level vulnerabilities [4, 6] underscore the need to establish health-specific AI safety standards and mandate transparent incident reporting for model providers. Furthermore, the evidence concentration in English-language, high-income settings highlights the necessity of supporting multilingual and multimodal misinformation research to ensure equitable protection.
For platform engineers and AI developers, the demonstrated efficacy of system-instruction attacks and the high-volume generation of tailored narratives [3, 4, 6] make hardening system-prompt and API-layer defenses an urgent priority. The degradation of detector performance by AI-misinfo’s credibility cues [11, 15] necessitates the integration of claim-aware verification—rather than reliance on stylistic features—into content moderation pipelines. Additionally, governance findings point to the need for audit logging in health-related assistant marketplaces to enable oversight.
For clinicians and public-health communicators, the potential of aligned LLMs to produce accurate responses to myths under benign prompting [19] suggests a role for using these tools to pre-bunk and debunk common misinformation. The decoupling of perceived accuracy from sharing intention [9] further emphasizes the importance of moving beyond fact-correction alone to improve digital health literacy and collaborate on designing trustworthy messaging systems that address affective and behavioral drivers of sharing.
For researchers and funders, the identified gaps—such as the lack of health-specific audio/video deepfake studies, non-English corpora, and real-world propagation trials—directly map to critical future needs. Sustained investment is required to build multimodal benchmark datasets, conduct field trials of labeling interventions [12, 13], and develop cross-disciplinary evaluation frameworks that can keep pace with the evolving threat landscape.
Mapping responsibilities and evidence in this way connects the review’s synthesis with actionable roles, fostering a multi-layered defense strategy to enhance the resilience of health information ecosystems.
Limitations
Despite the comprehensiveness of this systematic review, several limitations should be noted. First, the review included only studies published in English, which may introduce language bias and limit the generalizability of findings to non-English-speaking populations. Second, a substantial portion of the included user perception and sharing studies relied on self-reported outcomes collected in controlled or survey-based settings, which may not fully capture real-world behavior on social media platforms; future research should prioritize observational and platform-level studies that directly track the diffusion of AI-generated health misinformation in everyday online environments. Third, the evidence was dominated by studies of high-salience topics such as COVID-19 and vaccination, leaving knowledge gaps about how generative AI affects misinformation in other health areas, such as chronic disease or mental health. Fourth, the review found a lack of research on audio and video deepfakes in health, despite their potential to worsen misinformation through greater persuasive power. Fifth, while an internal review protocol was developed, the study was not registered on a public platform (e.g., PROSPERO or OSF), which limits the reproducibility and transparency of our systematic review process. In addition, most included studies adopted cross-sectional designs that capture a single point in time, limiting insight into how AI-generated health misinformation, platform responses, and user behaviors evolve; longitudinal studies tracking these dynamics over extended periods are therefore an important priority for future research. Sixth, although text-based misinformation dominates the current evidence base, visual and multimodal AI-generated content (e.g., images and videos) poses distinct detection and governance challenges and remains underrepresented in existing studies, highlighting a key direction for future reviews.
Finally, the rapid evolution of generative AI tools means that these findings may quickly become outdated as new models and mitigation strategies emerge. Moreover, despite efforts to conduct comprehensive searches across multiple databases, a possibility of selection bias in study inclusion remains, particularly given variability in indexing times and the predominance of English-language research during the review period. Additionally, the search window closed in August 2025, and given the pace of model releases and governance changes thereafter, future updates will be essential to capture newly published evidence, evolving detection methods, and shifts in mitigation effectiveness. Our inclusion of preprints may not fully mitigate publication bias, as the corpus likely overrepresents positive findings and may underreport null results given the field's rapid, high-stakes nature. Moreover, in line with our narrative synthesis approach, we did not calculate standardized effect sizes (e.g., Hedges' g, odds ratios) or generate forest plots for outcomes across studies. While this decision was appropriate given the substantial heterogeneity in study designs, interventions, and outcome measures, which precluded meaningful meta-analysis, it limits the ability to quantitatively compare effect magnitudes or visually assess the distribution of effects across the evidence base. As the field matures and methodologies standardize, future systematic reviews in this area would benefit from conducting such quantitative syntheses where feasible.
Strengths
This review has several strengths that enhance the reliability and applicability of its findings. Methodologically, it was conducted in accordance with PRISMA 2020 guidelines and employed a comprehensive, multi-database search strategy spanning both peer-reviewed and preprint literature to capture the rapidly evolving evidence base. The review is among the first to systematically synthesize the emergent literature on generative AI’s role across the entire health misinformation lifecycle—production, propagation, and mitigation—integrating evidence across technical, sociotechnical, and governance layers. By incorporating diverse study designs (from randomized experiments to platform-scale audits) and pre-specifying a narrative synthesis approach to handle substantial heterogeneity, the review provides a nuanced, interdisciplinary perspective that is directly relevant to public health practitioners, platform designers, and policy-makers. The explicit definitions of core concepts and the transparent reporting of protocol deviations further strengthen the reproducibility and clarity of the review process.
Future research directions
The evidence gaps identified in this review point to several critical avenues for future research. First, studies must expand beyond text and text–image modalities to rigorously evaluate audio and video deepfakes in health contexts, assessing their unique persuasive power and detection requirements. Second, the overwhelming focus on English-language content and high-income settings must be countered by multilingual and cross-cultural studies that examine how generative AI interacts with health communication norms, platform dynamics, and sociodemographic vulnerability factors in diverse regions. Understanding these contextual factors is essential for tailoring effective, equitable interventions. Third, research should move beyond controlled experiments and platform-scale observational studies to conduct health-specific field trials that trace the real-world propagation of AI-generated misinformation and test the effectiveness of mitigation strategies (e.g., labels, warnings, prebunking) in ecologically valid environments. Fourth, technical detection and verification tools need to evolve from generic classifiers to health-aware, claim-grounded systems that integrate medical evidence bases, temporal reasoning, and multimodal provenance tracking. Finally, interdisciplinary frameworks are required to operationalize governance and accountability measures—such as mandatory system-prompt logging, red-teaming protocols, and transparent incident reporting—and evaluate their impact on reducing harms in health-facing AI applications. Prioritizing these directions will build a more robust, equitable, and actionable evidence base for mitigating AI-generated health misinformation.
Conclusion
Generative AI fundamentally alters the landscape of health misinformation by drastically lowering the barriers to creating credible, tailored false narratives and by reshaping propagation dynamics through platform algorithms and decoupled user sharing behaviors. While emerging technical, sociotechnical, and governance mitigation strategies show promise, they remain nascent and unevenly evaluated against this novel threat. The current evidence is concentrated in English-language, text-based domains concerning high-salience topics like COVID-19, highlighting critical gaps in research on audio/video deepfakes, non-English contexts, and chronic health domains. Future work must concretely prioritize: developing multimodal benchmarks for audio/video deepfakes, building claim-aware verification systems that integrate medical evidence, testing label designs in field experiments, auditing API-level vulnerabilities for health topics, and studying platform amplification of synthetic content through partnered data access. A collaborative, multi-stakeholder effort—spanning researchers, policymakers, platform operators, and public health practitioners—is essential to develop equitable, resilient defenses against the escalating risks posed by AI-generated health misinformation.
Supplementary Information
Acknowledgements
Not applicable.
Clinical trial number
Not applicable.
Abbreviations
- AI
Artificial Intelligence
- AIGC
AI-Generated Content
- AI-misinfo
Artificial Intelligence–Generated Misinformation
- API
Application Programming Interface
- AUROC
Area Under the Receiver Operating Characteristic Curve
- CASP
Critical Appraisal Skills Programme
- JBI
Joanna Briggs Institute
- LLM
Large Language Model
- PRISMA
Preferred Reporting Items for Systematic Reviews and Meta-Analyses
- PROBAST
Prediction Model Risk of Bias Assessment Tool
- QUADAS
Quality Assessment of Diagnostic Accuracy Studies
- RCT
Randomized Controlled Trial
- ROBINS-I
Risk Of Bias In Non-randomized Studies of Interventions
- RoB 2
Revised Cochrane Risk-of-Bias Tool for Randomized Trials
- UI
User Interface
- UX
User Experience
Authors’ contributions
H.R.S.: Conceptualized the study, designed the methodology, conducted data extraction and synthesis, performed quality assessments, and drafted the manuscript. N.Gh.: Assisted in database searches, screened titles and abstracts, contributed to data extraction, and participated in manuscript revision. H.K.: Conducted risk-of-bias assessments, analyzed governance and transparency findings, and reviewed the final manuscript for critical intellectual content. All authors approved the final version of the manuscript and agreed to be accountable for all aspects of the work.
Funding
Not applicable.
Data availability
All data analyzed during this review are available within the article, particularly in Tables 2 and 3, which summarize the information extracted from the included studies, and in the study-level risk-of-bias reporting. The accompanying Supplementary Files 1 and 2 (search strategy and PRISMA) are also included.
Declarations
Ethics approval and consent to participate
Ethics approval for conducting this systematic review was not required. No participants were involved in this research. Furthermore, all data synthesized from included studies were based on aggregate or secondary data as presented in the original publications; no primary data collection from human subjects or platforms was performed.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Saeidnia HR, Hosseini E, Lund B, Tehrani MA, Zaker S, Molaei S. Artificial intelligence in the battle against disinformation and misinformation: a systematic review of challenges and approaches. Knowl Inf Syst. 2025;67(4):3139–58.
- 2. Hassan A, Ahmad SG, Iqbal T, Munir EU, Ayyub K, Ramzan N. Enhanced model for gestational diabetes mellitus prediction using a fusion technique of multiple algorithms with explainability. Int J Comput Intell Syst. 2025;18(1):47.
- 3. Oda M. Generative AI and foundation models in medical image. Radiol Phys Technol. 2025;18(4):937–48.
- 4. Haghighat F, Nemati Z, Rambodrad A, Negareshifard P, Jafari E. Multimodal deep learning and data fusion in precision breast oncology: clinical applications, fusion strategies, and future directions. Infosci Trends. 2025;2(10):81–115.
- 5. Babaei R, Cheng S, Duan R, Zhao S. Generative artificial intelligence and the evolving challenge of deepfake detection: a systematic analysis. J Sens Actuator Networks. 2025;14(1):1–38.
- 6. Nasiri Bonaki H, Sadeghi S, Asemi Esfahani A, Alishahi F, Ardalani Nasab N. Algorithmic amplification of surgical misinformation: how social media prioritizes engagement over accuracy and distorts patient choices. Infosci Trends. 2025;2(7):38–48.
- 7. Hassan A, Nawaz S, Tahira S, Ahmed A. Preterm birth prediction using an explainable machine learning approach. Artificial Intell Appl. 2025;00(00):1–14.
- 8. Menz BD, Modi ND, Sorich MJ, Hopkins AM. Health disinformation use case highlighting the urgent need for artificial intelligence vigilance: weapons of mass disinformation. JAMA Intern Med. 2024;184(1):92–6.
- 9. Menz BD, Kuderer NM, Bacchi S, Modi ND, Chin-Yee B, Hu T, et al. Current safeguards, risk mitigation, and transparency measures of large language models against the generation of health disinformation: repeated cross-sectional analysis. BMJ. 2024;384:e078538.
- 10. Doumanas D, Karakikes A, Soularidis A, Mainas E, Kotis K. Emerging threat vectors: how malicious actors exploit LLMs to undermine border security. AI. 2025;6(9):232.
- 11. Keshavarz H, Wang T, Vardell E, Saeidnia HR. Challenges that health professionals face to evaluate and trust online health information: the role of conscientiousness. Infosci Trends. 2024;1(1):27–43.
- 12. Mohammadzadeh Z, Marengo A, Santamato V, Raayatpanah MA. Enhancing healthcare efficiency in Iran: a comprehensive analysis of health-oriented APIs using machine learning techniques. Infosci Trends. 2024;1(2):1–33.
- 13. Modi ND, Menz BD, Awaty AA, Alex CA, Logan JM, McKinnon RA, et al. Assessing the system-instruction vulnerabilities of large language models to malicious conversion into health disinformation chatbots. Ann Intern Med. 2025;178(8):1172–80.
- 14. Corsi G, Marino B, Wong W. The spread of synthetic media on X. Harv Kennedy School Misinform Rev. 2024;5(3):1–19.
- 15. Drolsbach C, Pröllochs N. Characterizing AI-generated misinformation on social media. arXiv preprint arXiv:2505.10266v1 [Preprint]. 2025.
- 16. Amerini I, Barni M, Battiato S, Bestagini P, Boato G, Bruni V, et al. Deepfake media forensics: status and future challenges. J Imaging. 2025;11(3):1–42.
- 17. Theodorakopoulos L, Theodoropoulou A, Klavdianos C. Networks, AI integration, and ethical dimensions. J Theor Appl Electron Commer Res. 2025;20(2):115.
- 18. Bashardoust A, Feuerriegel S, Shrestha YR. Comparing the willingness to share for human-generated vs. AI-generated fake news. Proc ACM Hum Comput Interact. 2024;8(CSCW2):1–21.
- 19. Zhou J, Chen AZ, Shah D, Reese LS, De Choudhury M. It's a conversation, not a quiz: a risk taxonomy and reflection tool for LLM adoption in public health. arXiv preprint arXiv:2411.02594v1 [Preprint]. 2024.
- 20. Zhang Z, Zhang Y, Zhou X, Huang L, Razzak I, Nakov P, et al. From generation to detection: a multimodal multi-task dataset for benchmarking health misinformation. arXiv preprint arXiv:2505.18685v1 [Preprint]. 2025.
- 21. Jamieson J, Hara T, Akiyama M. Flagging emotional manipulation: impacts of manipulative content warnings on sharing intentions and perceptions of health-related social media posts. In: Extended Abstracts of the CHI Conference on Human Factors in Computing Systems. 2025;281:1–9.
- 22. Li F, Yang Y. Impact of artificial intelligence–generated content labels on perceived accuracy, message credibility, and sharing intentions for misinformation: web-based, randomized, controlled experiment. JMIR Formative Res. 2024;8(1):e60024.
- 23. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71.
- 24. Zhou J, Zhang Y, Luo Q, Parker AG, De Choudhury M. Synthetic lies: understanding AI-generated misinformation and evaluating algorithmic and human solutions. In: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 2023;436:1–20.
- 25. Mason C, Aleroud A, Melhem A, Halloush Z, Williams JA. Generative AI vs. human deception: a comparative analysis of ChatGPT, Gemini, and human-generated disinformation. In: Proceedings of the 2025 ACM International Workshop on Security and Privacy Analytics. 2025:13–22.
- 26. Spitale G, Biller-Andorno N, Germani F. AI model GPT-3 (dis)informs us better than humans. Sci Adv. 2023;9(26):eadh1850.
- 27. Malek S, Griffin C, Fraleigh R, Lennon RP, Monga V, Shen L. A methodology framework for analyzing health misinformation to develop inoculation intervention using large language models: a case study on COVID-19. medRxiv [Preprint]. 2025:2025.05.22.25327931.
- 28. Deiana G, Dettori M, Arghittu A, Azara A, Gabutti G, Castiglia P. Artificial intelligence and public health: evaluating ChatGPT responses to vaccination myths and misconceptions. Vaccines. 2023;11(7):1217.