Frontiers in Public Health. 2026 Jan 6;13:1724546. doi: 10.3389/fpubh.2025.1724546

The theoretical impact of AI-based quality evaluation of short-video health information on public cognition and treatment adherence: a case study of denosumab combined with PD-1/PD-L1 therapy for lung cancer bone metastasis

Jia-wen Wang 1, Jian-Jun Xun 1,*, Fei-Fei Zhao 1,*
PMCID: PMC12816342  PMID: 41567771

Abstract

Background

Bone metastasis occurs in 30–40% of patients with advanced non-small cell lung cancer (NSCLC), and denosumab combined with PD-1/PD-L1 inhibitors has emerged as a promising treatment strategy. However, the “algorithmic echo chamber” effect on short-video platforms may distort patient cognition and treatment decision-making.

Methods

A cross-sectional study was conducted using a custom-developed web crawler to collect 1,369 videos from Bilibili, Douyin, and Xiaohongshu. A total of 402 videos were included after a three-tier keyword filtering process. An AI-based evaluation system built upon the doubao-seed-1.6 model was established, integrating three international standards—Global Quality Score (GQS), Journal of the American Medical Association (JAMA) benchmark criteria, and the modified DISCERN tool—to assess multidimensional information quality. Kruskal–Wallis tests and Spearman correlation analyses were performed to explore inter-platform differences and the relationship between information quality and user engagement metrics.

Results

Overall video quality was substantially below professional medical standards: the mean GQS was 2.84 ± 1.06 (56.8% of the full score), JAMA was 0.34 ± 0.57 (8.5%), and modified DISCERN was 1.55 ± 0.69 (31.0%). Significant quality differences were observed across platforms (p < 0.001, Cohen’s d = 0.6–0.8): Douyin ranked highest, followed by Xiaohongshu, with Bilibili lowest. Correlation between user engagement and content quality was extremely weak (R2 = 0.004, r = 0.062), indicating substantial decoupling—high engagement did not equate to high-quality content. Medical professionals accounted for only 25.6% of content creators, while patient-generated content reached 52.2%. Evidence-based treatment information comprised merely 20.0–26.7%, whereas misleading or inaccurate claims accounted for 6.7–13.3%.

Conclusion

From a behavioral and cognitive perspective, the low quality of immune-oncology information on short-video platforms, coupled with algorithm-driven amplification of high-engagement but low-quality content, may exacerbate cognitive bias, potentially increasing clinical safety risks such as insufficient hypocalcemia monitoring and inadequate MRONJ prevention. Establishing a professional governance and oversight system is urgently required.

Keywords: AI evaluation, algorithmic echo chamber, cognitive bias, denosumab, health information quality, immune checkpoint inhibitors, lung cancer bone metastasis, short-video platforms

1. Introduction

Lung cancer remains the leading cause of cancer-related mortality worldwide, with more than half of patients diagnosed at a metastatic stage and a low 5-year survival rate of only approximately 5% (1). Bone metastasis is particularly common among patients with advanced non-small cell lung cancer (NSCLC), occurring in up to 30–40% of cases (2). Excessive osteoclast activation promotes osteolytic destruction, while growth factors such as TGF-β and IGF released from bone resorption further enhance tumor proliferation. Additionally, bone marrow stromal cells, myeloid-derived suppressor cells (MDSCs), and regulatory T cells (Tregs) contribute to an immunosuppressive tumor microenvironment, facilitating immune escape. Together, these mechanisms form a self-perpetuating “vicious cycle” that worsens clinical outcomes (3–5).

Denosumab, a fully human monoclonal antibody, specifically binds to receptor activator of nuclear factor-κB ligand (RANKL), thereby inhibiting osteoclast differentiation, maturation, and function. It has demonstrated significant clinical benefit by reducing the overall risk of skeletal-related events (SREs) by approximately 18% in bone metastatic patients (6, 7). Beyond its anti-resorptive role, denosumab also exerts key immunomodulatory functions by blocking RANKL–RANK signaling, including promoting dendritic cell maturation, enhancing CD8+ T-cell cytotoxicity, and alleviating the immunosuppressive microenvironment (8–10). In parallel, PD-1/PD-L1 immune checkpoint inhibitors (ICIs) have become the standard first-line therapy for advanced lung cancer, restoring T-cell function and effectively enhancing antitumor immunity (11, 12).

Given their complementary mechanisms, the combination of denosumab and PD-1/PD-L1 ICIs shows promising synergistic effects in the bone metastatic setting, with improved objective response rate (ORR), prolonged progression-free survival (PFS), and manageable safety profiles in lung cancer patients (13, 14). This provides new therapeutic opportunities for this high-risk population.

With the rise of short-video platforms and social media, medical and health-related information has become widely disseminated through algorithm-driven personalized recommendations, making these platforms a major source of health knowledge for the general public (15). However, while recommendation algorithms improve information accessibility, they also reinforce the “algorithmic echo chamber” effect, characterized by content homogenization and strengthened cognitive biases (16). These issues are particularly evident on Chinese platforms such as Xiaohongshu, Bilibili, and Douyin, where complex medical concepts are frequently oversimplified, anecdotal evidence is generalized, individualized treatment principles are overlooked, and essential evidence-based perspectives such as risk communication and adverse effects are insufficiently addressed—even in videos posted by healthcare professionals (17–19). Continuous exposure to selectively amplified content such as “successful miracles” or extreme negative experiences may distort risk perception, reinforce biased interpretations, and influence real-world health cognition and decision-making behavior (20–22).

Meanwhile, advances in natural language processing (NLP) and machine learning have greatly enhanced the objectivity, scalability, and multimodal analytical capacity of short-video quality assessment. AI-based frameworks can process text, audio, and visual features simultaneously, minimizing subjective bias and enabling comprehensive evaluation according to internationally recognized metrics, such as the Global Quality Score (GQS), DISCERN, and JAMA criteria (23–25). These technologies provide a robust foundation for systematic evaluation of online cancer-related information quality.

Despite the emerging evidence supporting the clinical benefits of denosumab combined with PD-1/PD-L1 inhibitors for lung cancer bone metastasis (13), studies investigating how related information is disseminated on short-video platforms remain scarce. Research on communication characteristics, audience cognition, and potential misinformation in the context of cancer immunotherapy is largely lacking (26, 27). Moreover, little is known about how the quality of disseminated information and resultant cognitive biases may influence treatment understanding, adherence behaviors, and ultimately patient outcomes (28). Therefore, it is imperative to establish AI-driven assessment models to systematically evaluate the quality of short-video content and explore how informational differences may shape patient perception and potential immunotherapy decision-making.

In this study, we employed AI-based multimodal analysis to quantify the quality of short-video content related to denosumab combined with PD-1/PD-L1 therapy on Douyin, Bilibili, and Xiaohongshu (24, 29), and to investigate how varying content quality may contribute to cognitive bias and influence immunologic awareness and related clinical decision-making among patients with lung cancer bone metastasis (Figure 1).

Figure 1.


Conceptual framework of AI-driven evaluation of short-video medical information and its impact on patient decision-making. This framework illustrates how multi-platform content analysis feeds into an AI-driven quality assessment pipeline, ultimately influencing patients’ cognitive pathways and decision-making behaviors. It highlights a systematic approach to evaluating the quality of immunotherapy-related medical information and its clinical significance.

2. Materials and methods

2.1. Data collection

This cross-sectional observational study systematically evaluated the quality of medical short-video content related to “denosumab combined with PD-1/PD-L1 inhibitors for lung cancer bone metastasis” across major Chinese social media platforms. The study adhered to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines to ensure transparency and methodological rigor in observational research (30, 31). Based on user population, content characteristics, and influence on medical science communication, three widely used short-video platforms were selected: Bilibili (the largest video-sharing platform in China, with over 270 million monthly active users), Xiaohongshu (also known as RED, a lifestyle-sharing platform with more than 200 million monthly active users), and Douyin (the Chinese version of TikTok, with over 600 million daily active users) (32, 33).

A unified search strategy was applied using the core keyword phrase “denosumab combined with PD-1/PD-L1 immunotherapy for lung cancer bone metastasis” to conduct a systematic retrieval across the three platforms. Given that Bilibili, Douyin, and Xiaohongshu incorporate intelligent synonym recognition and keyword expansion algorithms, enabling automatic matching with relevant terminologies, the use of a single core search term was considered sufficient for comprehensive dataset coverage and aligned with best practices in current short-video content research (34). Data extraction was performed on August 26, 2025, including all eligible videos available at the time of retrieval. No restrictions were applied regarding the publication date of the videos. A self-developed web crawler was utilized to collect data from the three platforms, yielding a total of 1,369 initial videos.

A standardized manual relevance assessment based on a three-level keyword scoring system was employed for screening (35). Primary keywords (high weight) included denosumab, bone metastasis, and lung cancer bone metastasis; secondary keywords (medium weight) covered immunotherapy, PD-1, PD-L1, immune checkpoint inhibitors, lung cancer, and combination therapy; tertiary keywords (low weight) consisted of bone lesions, osteoporosis, tumor therapy, targeted therapy, clinical trials, and evidence-based medicine.

The relevance scoring criteria were as follows: 3 points (high relevance): videos containing both primary and secondary keywords; 2 points (moderate relevance): videos containing primary keywords only; 1 point (low relevance): videos containing only secondary or tertiary keywords; 0 points (irrelevant): videos with no related keywords.
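As an illustration, the tiered keyword rule above can be sketched in Python; the keyword lists below are abbreviated English stand-ins and the function name is hypothetical, not the study's actual crawler code:

```python
# Illustrative sketch of the three-tier relevance scoring described above.
# Keyword sets are abbreviated; matching is case-insensitive substring search.
PRIMARY = {"denosumab", "bone metastasis", "lung cancer bone metastasis"}
SECONDARY = {"immunotherapy", "pd-1", "pd-l1", "immune checkpoint inhibitors",
             "lung cancer", "combination therapy"}
TERTIARY = {"bone lesions", "osteoporosis", "tumor therapy", "targeted therapy",
            "clinical trials", "evidence-based medicine"}

def relevance_score(text: str) -> int:
    t = text.lower()
    has_primary = any(k in t for k in PRIMARY)
    has_secondary = any(k in t for k in SECONDARY)
    has_tertiary = any(k in t for k in TERTIARY)
    if has_primary and has_secondary:
        return 3   # high relevance
    if has_primary:
        return 2   # moderate relevance
    if has_secondary or has_tertiary:
        return 1   # low relevance
    return 0       # irrelevant

# Videos scoring >= 1 enter the preliminary screening pool
titles = ["Denosumab with PD-1 immunotherapy case", "osteoporosis tips", "cooking vlog"]
included = [t for t in titles if relevance_score(t) >= 1]
```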

Videos with a relevance score ≥1 were included in the preliminary screening. Following the initial screening, manual speech-to-text transcription was performed to supplement content data. Additionally, metadata—including likes, shares, user preference metrics, and video duration—were extracted and completed whenever possible, with the exception of view count data from Xiaohongshu, which is not publicly accessible. Videos with incomplete transcription content were systematically excluded to ensure full textual availability for subsequent analyses.

After de-duplication and quality filtering, 402 videos fulfilled the statistical power requirements for short-video quality assessment research (36), comprising 222 from Bilibili (55.2%), 105 from Douyin (26.1%), and 75 from Xiaohongshu (18.7%).

Inclusion criteria: (1) Content involving denosumab or treatment of lung cancer bone metastasis; (2) Video duration ≥15 s; (3) Chinese-language audio with clear intelligibility; (4) Relevance score ≥1; (5) Complete textual transcription; (6) Complete platform-specific metadata.

Exclusion criteria: (1) Pure advertisements or commercial promotion; (2) Duplicated or re-posted content; (3) Poor or incomprehensible audio quality; (4) Evidently inaccurate or misleading information; (5) Relevance score <1; (6) Missing transcription data; (7) Missing platform-specific metadata.

Additionally, manual verification of all included videos’ titles and publisher accounts was performed to identify potential cross-platform duplication. No systematic instances of the same video being repeatedly included across different platforms were identified. A few videos with similar titles were confirmed to differ in uploader identity, video duration, and presentation format; therefore, they were retained as independent samples.

2.2. Video classification

A systematic classification framework based on the professional background and authority of content creators was applied (37):

  • Healthcare professionals in relevant specialties: oncologists, orthopedic surgeons, and other clinicians with direct experience in managing lung cancer bone metastasis, including those officially verified by Douyin.

  • Healthcare professionals in unrelated fields: licensed clinicians, nurses, pharmacists, and other medical personnel without direct involvement in this disease area.

  • Patients and caregivers: individuals diagnosed with the disease, family caregivers, or patient advocates sharing personal disease experiences.

  • Other individuals: general users without medical training, commercial organizations, media accounts, and popular health-science content creators.

An automated identity-verification system was developed to classify uploader profiles. This system applies intelligent recognition based on uploader names and video content, integrates professional background validation via Doubao API calls, and generates a credibility confidence score (0–1 scale) (38, 39). For Douyin-verified accounts, platform-specific optimization modules were implemented to correct and enhance metadata extraction, ensuring consistent and accurate identification.
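A minimal sketch of rule-based uploader classification with a 0–1 confidence score; the keyword cues and category labels below are illustrative assumptions, and the study's additional professional-background validation via Doubao API calls is omitted:

```python
# Hedged sketch of keyword-cue uploader classification with a 0-1 confidence.
# Cue lists are illustrative; the real system also validates credentials via
# the Doubao API, which is not reproduced here.
CATEGORIES = {
    "relevant_professional": ["oncologist", "orthopedic", "oncology department"],
    "other_medical": ["nurse", "pharmacist", "clinic"],
    "patient_caregiver": ["my treatment", "patient diary", "my mother"],
}

def classify_uploader(name: str, transcript: str):
    text = f"{name} {transcript}".lower()
    # Count matched cues per category
    hits = {cat: sum(k in text for k in kws) for cat, kws in CATEGORIES.items()}
    total = sum(hits.values())
    if total == 0:
        return "other", 0.0
    best = max(hits, key=hits.get)
    confidence = hits[best] / total   # share of matched cues in the top category
    return best, confidence
```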

2.3. Quality assessment

A comprehensive video-quality evaluation system was developed using the state-of-the-art doubao-seed-1.6 model. The system adopts a modular architecture composed of two primary components: (1) a core analytical engine and (2) an integrated visualization generator for quality scoring outputs. The design of this evaluation system was informed by the most recently published framework for assessing the quality of health education short videos (2025 edition) (40). In addition, the system incorporates three internationally recognized quality assessment standards for medical information: the Global Quality Score (GQS), the Journal of the American Medical Association (JAMA) benchmark criteria, and the modified DISCERN instrument ((m)DISCERN) (40, 41).

The Global Quality Score (GQS) is a five-point scale developed by Bernard et al. (36), assessing content accuracy, completeness of information, clarity of expression, logical organization, and practical value (17). The JAMA benchmark criteria, proposed by Silberg et al. (42), range from 0 to 4 points and evaluate four standards: authorship, attribution, disclosure of sources, and currency of information. The modified DISCERN ((m)DISCERN) tool, derived from the original DISCERN instrument developed by Charnock et al. (43), is one of the most frequently used instruments in health-information research. It employs a 5-point scoring system covering treatment options, risk–benefit evaluation, information quality, decision-support capacity, and overall reliability.

Video quality assessment was first conducted using the doubao-seed-1.6 large language model with a structured prompt template designed on the basis of medical domain knowledge. The system incorporates an intelligent speech-to-text module capable of correcting medical terminology errors, including homophones, and accurately mapping drug names, diagnoses, and treatment-related vocabulary. An automated identity-recognition component analyzes uploader names, institutional affiliations, and video content to detect keywords related to authoritative medical organizations and professional credentials. This enables a multidimensional framework for assessing the credibility of uploaders and represents an advanced technical approach in short-video quality evaluation (44).

Instead of traditional inter-rater reliability testing, this study adopted a Quality Management System (QMS) approach to ensure the professionalism and accuracy of evaluations (45, 46). The quality assessment procedures consisted of four stages:

  • (1) AI preliminary scoring: Each video was independently evaluated three times by the doubao-seed-1.6 large language model. A re-evaluation process was automatically triggered when the score difference exceeded 1 point, continuing until consistent results were obtained.

  • (2) Primary expert review: Two clinical specialists in bone oncology independently reviewed AI-generated scores for all 402 videos, evaluating the accuracy and appropriateness of the scoring based on GQS, JAMA, and mDISCERN standards, with adjustments made if necessary.

  • (3) Secondary arbitration: When the discrepancy between the two experts exceeded 2 points, arbitration was conducted by a third attending physician certified in bone oncology.

  • (4) Tertiary quality supervision: A fully blinded supervisory audit was conducted by an independent senior specialist who randomly inspected 10% of the sample (n = 40) without access to prior evaluations. The supervisor confirmed that the reviewed scores met the established standards, verifying the reliability and effective execution of the assessment workflow.
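Stage 1 above amounts to a repeat-until-consistent scoring loop. A minimal sketch, assuming a hypothetical `score_video` callable that wraps the model API; the `max_rounds` cap is an added safeguard for the sketch, not a detail described in the study:

```python
# Minimal sketch of stage 1 (AI preliminary scoring): each video is scored
# three times and re-scored until the spread across runs is <= 1 point.
# `score_video` is a hypothetical stand-in for the doubao-seed-1.6 API call.
def consistent_score(video_id: str, score_video, max_rounds: int = 5) -> float:
    for _ in range(max_rounds):
        scores = [score_video(video_id) for _ in range(3)]
        if max(scores) - min(scores) <= 1:      # consistency criterion met
            return sum(scores) / len(scores)    # accept the mean of this run
    raise RuntimeError(f"{video_id}: no consistent scores after {max_rounds} rounds")
```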

This quality management design reflects an ISO-9001–style quality control framework (47), combining complete inspection with random auditing to ensure scoring integrity—providing more comprehensive quality assurance than conventional sampling-based reliability testing.

Additional quality control measures included automated detection and handling of scoring anomalies. Full-process logging captured timestamps, API request parameters, response outputs, AI preliminary scores, expert revision trajectories, and reasons for adjustments. System monitoring encompassed API success rate, response latency, and scoring consistency. A data protection module ensured secure handling of sensitive information through de-identification, access control, encrypted transmission, and operational log tracking.

Comprehensive quality scores were calculated using the composite indicator methodology recommended by the OECD (2008) (39).

Overall Score = 0.4 × GQS_normalized + 0.3 × JAMA_normalized + 0.3 × DISCERN_normalized

Normative weighting was applied based on expert consensus regarding the relative importance of quality dimensions in digital health information assessments: overall quality and accuracy (GQS) were prioritized with the highest weight (0.4), while source transparency (JAMA) and treatment reliability (mDISCERN) were assigned equal weights (0.3) (48). Similar weighting approaches have been widely adopted in previous studies on online health-information quality (49). To ensure interpretability, individual component scores were reported alongside the overall score.
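As a worked example, the composite can be computed after normalizing each instrument by its own scale maximum; this min–max normalization is an assumption for illustration, since the exact normalization scheme is not specified in the text:

```python
# Hedged sketch of the weighted composite score. Each instrument is scaled
# to [0, 1] by its maximum (assumed normalization), then weighted 0.4/0.3/0.3.
def composite(gqs: float, jama: float, discern: float) -> float:
    gqs_n = gqs / 5.0         # GQS: 5-point scale
    jama_n = jama / 4.0       # JAMA: 0-4 scale
    discern_n = discern / 5.0 # modified DISCERN: 5-point scale
    return 0.4 * gqs_n + 0.3 * jama_n + 0.3 * discern_n

# Example with the study's overall means (GQS 2.84, JAMA 0.34, DISCERN 1.55):
print(round(composite(2.84, 0.34, 1.55), 3))  # prints 0.346
```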

2.4. Statistical analysis

Data processing and statistical analyses were performed using Python version 3.9 or above, and figures were generated in accordance with academic publishing standards (50). Descriptive statistics included measures of central tendency, dispersion, and distribution, stratified by platform. Video content was analyzed using natural language processing techniques for topic classification and keyword extraction (51).

A significance level of α = 0.05 was adopted for inferential analyses. Normality was assessed using the Shapiro–Wilk and Jarque–Bera tests. The Kruskal–Wallis test (for multiple groups) or Mann–Whitney U test (for pairwise comparisons) was used for continuous variables, while categorical variables were compared using the Chi-square test (17). Correlation analyses utilized Pearson or Spearman coefficients depending on distributional characteristics, with Kendall’s tau calculated as a robustness check.
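These tests map directly onto `scipy.stats`; the sketch below uses synthetic scores drawn to match the reported platform means and SDs, not the study data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic per-platform GQS scores (illustrative, not the study data)
bilibili = rng.normal(2.45, 1.03, 222)
douyin = rng.normal(3.40, 0.85, 105)
xiaohongshu = rng.normal(3.17, 0.92, 75)

# Kruskal-Wallis test across the three platforms
h, p = stats.kruskal(bilibili, douyin, xiaohongshu)

# Spearman correlation between quality and a synthetic engagement metric
quality = np.concatenate([bilibili, douyin, xiaohongshu])
likes = rng.lognormal(3, 1.5, quality.size)
rho, p_rho = stats.spearmanr(quality, likes)
print(f"H = {h:.1f}, p = {p:.2e}; rho = {rho:.3f}")
```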

Multiple comparison correction was performed using the Benjamini–Hochberg false discovery rate (FDR) method, with Bonferroni correction also provided for comparison (52). Effect sizes were estimated using R2 for correlation analyses and Cohen’s d for group comparisons (53).
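The Benjamini–Hochberg step-up procedure can be written in a few lines of pure Python (an illustrative implementation, not the study's code):

```python
# Benjamini-Hochberg FDR control: sort p-values, find the largest rank k with
# p_(k) <= (k / m) * alpha, and reject the k hypotheses with smallest p-values.
def benjamini_hochberg(pvals, alpha=0.05):
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected

print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6]))
# prints [True, True, False, False, False]
```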

Multivariable linear regression models were constructed, accompanied by diagnostic evaluations including residual normality, heteroscedasticity, multicollinearity (variance inflation factor >10), and influential point detection (Cook’s distance > 4/n). All analyses were two-tailed, and p < 0.05 was considered statistically significant (52, 53).

2.5. Model selection

The doubao-seed-1.6 model (Volcengine Ark) was selected as the auxiliary scoring tool for text analysis due to its strong semantic understanding and structured output capabilities in the Chinese language environment, making it suitable for large-scale processing of Chinese health-related video content (official documentation available from Volcengine). However, the research design, analytical interpretation, and manuscript writing were entirely conducted by the authors without the assistance of any generative AI tools.

Figure 2 illustrates the overall data processing workflow. Detailed algorithmic parameters, software configurations, computational formulas, prompt templates, and additional datasets are provided in the Supplementary materials.

Figure 2.


Data processing workflow. Visualization of the data collection, cleaning, extraction, AI-driven scoring, and statistical analysis pipeline for the included short-video dataset.

3. Results

3.1. Baseline characteristics and platform distribution

A total of 402 videos were included in the final analysis, with uneven distribution across platforms (Figure 3): Bilibili accounted for 55.2% (n = 222), Douyin 26.1% (n = 105), and Xiaohongshu 18.7% (n = 75) (Figure 3B). Play count availability was 81.3%, while all remaining 20 evaluation indicators achieved 100% completeness (Figure 3A), ensuring analytical robustness.

Figure 3.


Data completeness and platform distribution of included videos. (A) Data completeness across 21 evaluation indicators among 402 videos collected up to August 26, 2025. Indicators include quality scoring (GQS, JAMA, DISCERN, and composite score), content characteristics (author authority score, adjusted score, authority category, content theme, quality grade, creator identity), engagement metrics (likes, comments, shares, favorites, plays), and technical parameters (adjusted duration, speech correction count, speech confidence, creator verification score, Doubao topic confidence, content identity confidence). All indicators except play count achieved 100% completeness; play count completeness was 81.3%. (B) Platform distribution showing proportions of Bilibili (55.2%, n = 222), Douyin (26.1%, n = 105), and Xiaohongshu (18.7%, n = 75). All analyses were performed in Python 3.9+.

Baseline characteristics revealed pronounced platform-level heterogeneity (Table 1). In terms of content quality, significant differences were observed across Bilibili, Douyin, and Xiaohongshu in GQS (2.45 ± 1.03 vs. 3.40 ± 0.85 vs. 3.17 ± 0.92, p < 0.001), JAMA (0.25 ± 0.51 vs. 0.50 ± 0.62 vs. 0.41 ± 0.62, p < 0.001), and DISCERN scores (1.33 ± 0.61 vs. 1.90 ± 0.67 vs. 1.72 ± 0.69, p < 0.001).

Table 1.

Baseline characteristics of the videos on Bilibili, Douyin, and Xiaohongshu.

Characteristics Overall (n = 402) Bilibili (n = 222) Douyin (n = 105) Xiaohongshu (n = 75) p-value
GQS score 2.84 ± 1.06 2.45 ± 1.03 3.40 ± 0.85 3.17 ± 0.92 <0.001 *
JAMA score 0.34 ± 0.57 0.25 ± 0.51 0.50 ± 0.62 0.41 ± 0.62 <0.001 *
DISCERN score 1.55 ± 0.69 1.33 ± 0.61 1.90 ± 0.67 1.72 ± 0.69 <0.001 *
Likes 56.50 (0, 36,939) 26.50 (0, 36,939) 207.00 (5, 5,619) 33.00 (2, 493) <0.001 *
Shares 11.00 (0, 3,275) 4.00 (0, 2,363) 57.00 (0, 3,275) 17.00 (0, 365) <0.001 *
Comments 5.50 (0, 2,849) 2.00 (0, 2,849) 18.00 (0, 738) 3.00 (0, 130) <0.001 *
Collects 23.00 (0, 5,870) 13.00 (0, 3,703) 92.00 (0, 5,870) 28.00 (0, 451) <0.001 *
Play count 32,079.45 ± 75,610.26 13,120.52 ± 41,700.56 72,164.04 ± 108,791.49 N/A <0.001 *
Duration (sec) 100.50 (18, 435,420) 127.50 (18, 435,420) 83.00 (18, 386) 67.00 (19, 414) <0.001 *

The table presents the baseline characteristics of videos across three platforms (Bilibili, Douyin, and Xiaohongshu). Quality scores (GQS, JAMA, and DISCERN) are reported as mean ± SD, while engagement metrics (Likes, Shares, Comments, Collects) and video duration are presented as median (min, max). Play count data is provided as mean ± SD. The statistical significance of differences between platforms was assessed using the Kruskal–Wallis test, with all comparisons showing p-values less than 0.001. Play count data for Xiaohongshu were not publicly available. Outlier Note: The maximum duration on Bilibili reached 435,420 s (~121.5 h), indicating the presence of long-form or archival videos, in contrast to short-form norms on Douyin and Xiaohongshu.

User engagement metrics displayed similar variation: Douyin had a significantly higher median “likes” count (207) than Bilibili (26.5) and Xiaohongshu (33) (p < 0.001), with consistent trends in shares, comments, and favorites. Video duration also differed significantly, with Bilibili showing the longest median length (127.5 seconds), followed by Douyin (83 seconds) and Xiaohongshu (67 seconds) (p < 0.001; Figure 4C).

Figure 4.


Multidimensional distribution of short-video content characteristics. (A) Distribution of creator authority levels across platforms. Authority level was derived from a weighted scoring system including identity (40%), verification status (30%), professional qualifications (20%), and AI-supported validation (10%). (B) Content theme distribution across all videos: medical education (78.6%), advertisements (6.2%), personal opinions (3.5%), and others (11.7%). (C) Platform-specific distribution of video duration grouped into <1 min, 1–3 min, 3–5 min, 5–10 min, 10–20 min, and >20 min categories. (D) Quality grade distribution based on the composite scoring system integrating GQS (40%), JAMA (30%), and revised DISCERN (30%). Only 0.2% (n = 1) were rated “good,” 19.2% (n = 77) “fair,” and 80.6% (n = 324) “poor,” with no videos achieving “excellent”.

These findings imply intrinsic platform differences in content ecosystems, user engagement patterns, and algorithmic dissemination mechanisms.

3.2. Multidimensional evaluation of video content quality

Overall, the included videos exhibited notably low quality relative to established medical information standards. The quality grade distribution was severely skewed: 80.6% (n = 324) were classified as “poor,” 19.2% (n = 77) as “fair,” only 0.2% (n = 1) as “good,” and none reached the “excellent” threshold (Figure 4D).

Across specific quality indicators, the overall mean GQS score was 2.84 ± 1.06 (out of 5; pass rate: 56.8%), with significant inter-platform differences (Kruskal–Wallis H = 66.805, p < 0.001) (Figures 5A, 6A). JAMA performance was even weaker, with a mean of 0.34 ± 0.57 (out of 4; pass rate: 8.5%), and the highest platform (Douyin) scored only 0.50 ± 0.62 (Figure 6B). DISCERN scores were similarly low (mean 1.55 ± 0.69; pass rate: 31.0%), again with significant variations among platforms (H = 37.901, p < 0.001) (Figure 6C).

Figure 5.


Platform differences in multidimensional quality assessments and associations with engagement. (A) Mean ± SD comparison of quality indicators (GQS, JAMA, DISCERN, composite score) across platforms using Kruskal–Wallis test (all p < 0.001). (B) Pearson correlation matrix of quality indicators showing moderate-to-strong positive correlations among scoring systems. (C) Violin plots of log10(x + 1)-transformed engagement metrics (likes, comments, shares) showing distinct distribution patterns across platforms. (D) Scatter plots with fitted regression lines showing a weak association between composite score and log10-transformed likes (Pearson r = 0.062; Spearman ρ = 0.047; R2 = 0.004; n = 402). (E) Histogram of video duration indicating right-skewed distribution and weak negative correlation with composite score (Pearson r = −0.238; p < 0.001). (F) Summary panel presenting overall dataset characteristics: platform distribution, mean quality scores (GQS 2.84 ± 1.05; JAMA 0.34 ± 0.57; DISCERN 1.55 ± 0.69; composite 39.17 ± 6.41), and median engagement metrics (likes 56, comments 5, shares 11).

Figure 6.


Distribution of quality, engagement, and content characteristics across short-video platforms. Upper panel (Quality indicators): Histograms show the platform-specific distributions of four quality indicators. (A) GQS (1–5 scale): Bilibili (2.455 ± 1.031, median = 2.000, n = 222), Douyin (3.398 ± 0.849, median = 4.000, n = 105), Xiaohongshu (3.173 ± 0.921, median = 3.000, n = 75); Kruskal–Wallis H = 66.805, p < 0.000001. (B) JAMA (0–4 scale): Bilibili (0.248 ± 0.510), Douyin (0.495 ± 0.622), Xiaohongshu (0.413 ± 0.617); H = 16.720, p = 0.000234. (C) Modified DISCERN (1–5 scale): Bilibili (1.333 ± 0.607), Douyin (1.905 ± 0.764), Xiaohongshu (1.653 ± 0.714); H = 37.901, p < 0.000001. (D) Composite quality score: Bilibili (constant 37.500 ± 0.000), Douyin (41.474 ± 6.830; range 17.250–59.500), Xiaohongshu (39.757 ± 5.240; range 31.000–58.750); H = 25.798, p = 0.000003. Middle panel (Engagement indicators, log-transformed): Data were transformed using log10(1 + x) to correct skewness. (E) Likes: Bilibili (0–36,939), Douyin (5–5,619), Xiaohongshu (0–1,700). (F) Comments: Bilibili (0–3,703), Douyin (0–5,870), Xiaohongshu (0–451). (G,H) Shares and favorites exhibit similar platform-dependent patterns. All engagement indicators showed significant platform differences (all p < 0.000001). Lower panel (Content characteristics): (I) Video duration (minutes): extreme values above the 95th percentile removed for clarity. (J) Authority score (0–100): platform-specific availability; most Bilibili and Xiaohongshu videos ≈0 due to absence of authority-scoring systems. (K) Verification score (0–100): only present on Douyin (78.581 ± 25.315, median = 90.000; 0–98.000); Bilibili and Xiaohongshu values are uniformly zero due to lack of verification mechanisms.

Internal consistency among quality measures showed moderate correlations: GQS vs. JAMA: r = 0.55, p < 0.001; GQS vs. DISCERN: r = 0.51, p < 0.001; JAMA vs. composite score: r = 0.66, p < 0.001 (Figure 5B), indicating that these tools captured complementary aspects of video quality.
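The statistical workflow behind these comparisons (Kruskal–Wallis H tests for inter-platform differences, Spearman ρ for consistency among quality scales) can be sketched in SciPy. The scores below are randomly generated stand-ins, not the study data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Illustrative GQS-like scores (1-5) for three platforms; NOT the study data.
bilibili = rng.integers(1, 6, size=222)
douyin = rng.integers(2, 6, size=105)
xiaohongshu = rng.integers(1, 6, size=75)

# Kruskal-Wallis H-test: non-parametric comparison of the three platforms.
H, p = stats.kruskal(bilibili, douyin, xiaohongshu)
print(f"H = {H:.3f}, p = {p:.6f}")

# Spearman rank correlation between two quality scales (internal consistency).
gqs = rng.integers(1, 6, size=402)
jama = np.clip(gqs // 2 + rng.integers(-1, 2, size=402), 0, 4)  # loosely related
rho, p_rho = stats.spearmanr(gqs, jama)
print(f"Spearman rho = {rho:.3f}, p = {p_rho:.4g}")
```

Kruskal–Wallis is appropriate here because the quality scores are ordinal and non-normally distributed, which is also why rank-based Spearman coefficients accompany the Pearson values throughout the figures.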

3.3. Characteristics and authority of content creators

Analysis of content creators highlighted a substantial deficit in professional medical involvement. Among the 402 videos, 52.2% (n = 210) were produced by patients or caregivers; only 25.6% (n = 103) originated from healthcare professionals in the relevant field; 18.2% (n = 73) from other medical disciplines; and 4.0% (n = 16) from other creator categories (Figure 7A). Platform-specific distributions differed significantly: Bilibili had the highest proportion of patient-generated content (69.4%), whereas Douyin demonstrated higher professional engagement (relevant field 49.5%; other medical fields 41.9%), with Xiaohongshu showing intermediate values (Figures 7B–D).

Figure 7.


Distribution of creator medical expertise across platforms. (A) Overall identity distribution among 402 videos: patient/caregiver-generated content dominated (52.2%, n = 210), followed by same-field professionals (25.6%, n = 103), other medical fields (18.2%, n = 73), and others (4.0%, n = 16). Mean confidence score for identity recognition: 0.64. (B) Bilibili (n = 222): patient content most prevalent (69.4%), lower identity confidence (0.56). (C) Douyin (n = 105): dominated by same-field professionals (49.5%) and other medical professionals (41.9%); high confidence score (0.87). (D) Xiaohongshu (n = 75): patient content remains dominant (68.0%); moderate identity confidence (0.55). These findings reveal substantial platform heterogeneity in professional participation and identification certainty.

Authority assessment further revealed that high-authority creators (certified medical professionals) accounted for only 36.3%, moderate-authority creators for 12.2%, and low-authority creators for 51.5% (Figures 4A, 8A–D). Authority level exhibited weak-to-moderate positive correlations with quality metrics: GQS: r = 0.355, p < 0.001; JAMA: r = 0.189, p < 0.001; DISCERN: r = 0.342, p < 0.001 (Figures 8E–G).

Figure 8.


Comprehensive analysis of content creator authority levels across short-video platforms. Upper panels (A–D): Pie charts depict the distribution of creator authority levels for videos discussing denosumab combined with PD-1/PD-L1 inhibitors across four datasets: overall (n = 402), Bilibili (n = 222), Douyin (n = 105), and Xiaohongshu (n = 75). Authority levels were classified into three tiers: High authority (green): certified medical professionals with verified clinical credentials; Medium authority (blue): healthcare-associated professionals or medical students with partial medical qualifications; Low authority (orange): non-professional creators including patients, caregivers, and the general public. Authority categorization was based on creator profile information, professional credential verification, and self-reported expertise extracted by the Doubao AI model (doubao-seed-1.6). Lower panels (E–G): Scatter plots with linear regression lines illustrate the relationships between creator authority scores (x-axis, 0–100 scale) and three international health information quality indices (y-axis): Global Quality Score (GQS; 1–5), JAMA benchmark score (0–4), and DISCERN score (1–5). Individual points represent single videos and are color-coded by platform (Bilibili: red; Douyin: blue; Xiaohongshu: pink). Spearman correlation coefficients indicate weak-to-moderate positive associations: GQS: r = 0.355, p < 0.001; JAMA: r = 0.189, p < 0.001; DISCERN: r = 0.342, p < 0.001. These findings suggest that higher creator authority is statistically associated with better information quality, yet the strength of association remains limited, reflecting the persistent presence of low-quality content even among videos with professional creators.

Reliability ratings based on JAMA criteria showed that 95.5% (n = 384) were classified as low reliability (0–1 points), 4.2% (n = 17) as moderate (1–2 points), only 0.2% (n = 1) as high (2–3 points), and none achieved a very high rating (Figures 9A,G). Although differences existed across platforms (H = 16.720, p = 0.000234), even Douyin (0.495 ± 0.622) fell well below acceptable reliability thresholds (Figure 9B).

Figure 9.


Multidimensional reliability analysis of video content across platforms. Upper panel: (A) Histogram of JAMA reliability scores (0–3) across 402 videos: 95.5% (n = 384) fall within the 0–1 range, indicating extremely low reliability. (B) Box plots comparing JAMA scores across platforms: Kruskal–Wallis H = 16.720, p = 0.000234. (C) Box plots of three confidence indicators: speech confidence (0.940 ± 0.068), authority confidence (0.639 ± 0.273), topic confidence (0.682 ± 0.162). Middle panel: Regression analysis of JAMA vs. (D) GQS (r = 0.504/ρ = 0.546; p < 0.001); (E) DISCERN (r = 0.308/ρ = 0.326; p < 0.001); (F) composite score (r = 0.547/ρ = 0.657; p < 0.001). Platforms are represented with distinct colors. Lower panel: (G) Pie chart summarizing JAMA-based reliability categories: low 95.5%, moderate 4.2%, high 0.2%, very high 0%. (H) Histogram of Doubao verification scores: mean 20.525 ± 36.889, median 0.000; 92.4% of non-zero values from Douyin only. (I) Correlation heatmap: speech confidence negatively correlates with other measures (−0.124 to −0.227), while JAMA positively correlates with authority/topic confidence (0.131–0.174).

3.4. Characteristics and topic analysis of video content

The majority of included videos were categorized as medical education content (78.6%, n = 316), although substantial gaps in quality and professional accuracy were observed. The remaining videos included miscellaneous content (11.7%), advertisements (6.2%), and personal opinions (3.5%) (Figure 4B; Figure 10A). Douyin showed the highest proportion of medical education content (91.4%), whereas Bilibili demonstrated greater heterogeneity across content types (73.4% medical education, 15.3% miscellaneous), with Xiaohongshu displaying intermediate characteristics (Figures 10B–D).

Figure 10.


Distribution of video content themes across platforms. (A) Overall distribution of content themes among all 402 included videos related to denosumab combined with PD-1/PD-L1 inhibitors for lung cancer bone metastases. Medical education content predominates (78.6%, n = 316), followed by other content (11.7%, n = 47), advertisements (6.2%, n = 25), and personal opinion videos (3.5%, n = 14), indicating that health education and popular science are the primary focus across platforms. (B) Theme distribution on Bilibili (n = 222). Medical education remains the main category (73.4%, n = 163), with a higher proportion of “other” content (15.3%, n = 34), advertisements (7.2%, n = 16), and personal opinions (4.1%, n = 9), reflecting slightly lower educational dominance but greater thematic diversity. (C) Theme distribution on Douyin (n = 105). Medical education content accounts for the highest proportion across platforms (91.4%, n = 96), while other content (3.8%, n = 4), advertisements (2.9%, n = 3), and personal opinions (1.9%, n = 2) are relatively rare, indicating a strong educational orientation but limited content diversity. (D) Theme distribution on Xiaohongshu (n = 75). Medical education comprises 76.0% (n = 57), followed by other content (12.0%, n = 9), advertisements (8.0%, n = 6), and personal opinions (4.0%, n = 3), showing a balanced pattern with moderate thematic diversity.

Keyword-based thematic analysis further indicated a lack of professional depth. Evidence-based treatment information constituted only 20.0–26.7% of content. Disease and symptom descriptions accounted for 20.0%, general health information for 13.3%, and patient experiences for 13.3%. Although medical terminology appeared in 6.7–20.0% of videos, most lacked accurate explanations of underlying mechanisms. Misinterpretations and misinformation were observed in 6.7–13.3% of content, varying among platforms (Figure 11).

Figure 11.


Distribution of keyword categories related to denosumab plus PD-1/PD-L1 therapy across short-video platforms. Pie charts (A–D) depict the relative proportions of seven keyword categories extracted from 402 videos across four datasets: overall (n = 402), Bilibili (n = 222), Douyin (n = 105), and Xiaohongshu (n = 75). Keywords were identified using a hybrid approach combining the Doubao AI model (doubao-seed-1.6; context length 256 k tokens; temperature 0.3; max output 4,096 tokens), natural language processing techniques, and rule-based categorization. Seven keyword categories were defined based on established health information quality frameworks: 1. Evidence-based treatment. 2. Disease and symptoms. 3. General health information. 4. Medical terminology. 5. Patient experience. 6. Misconceptions and misinformation. 7. Others. Percentages represent the relative frequency of each category within each platform dataset. Data processing was conducted in Python (Pandas 1.3.5, NumPy 1.21.6), and visualizations were generated using Matplotlib 3.5.3. The distribution patterns show that evidence-based treatment content accounts for approximately 20.0–26.7% of keywords, whereas misconceptions and misinformation range from 6.7 to 13.3%, highlighting platform-level differences in scientific rigor and informational accuracy.
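The rule-based arm of the hybrid keyword categorization can be illustrated with a minimal sketch. The lexicon below is hypothetical (the study's actual dictionary and the Doubao/NLP components are not reproduced here); it only demonstrates the counting logic:

```python
import re
from collections import Counter

# Hypothetical keyword lexicon for illustration only; NOT the study's dictionary.
CATEGORIES = {
    "evidence_based_treatment": ["denosumab", "pd-1", "pd-l1", "clinical trial"],
    "disease_and_symptoms": ["bone metastasis", "pain", "fracture"],
    "general_health_information": ["diet", "exercise"],
    "medical_terminology": ["rankl", "osteoclast"],
    "patient_experience": ["my treatment", "my doctor"],
    "misconceptions_misinformation": ["miracle cure", "no side effects"],
}

def categorize(text: str) -> Counter:
    """Count keyword hits per category in a lower-cased transcript."""
    text = text.lower()
    counts = Counter()
    for category, terms in CATEGORIES.items():
        for term in terms:
            counts[category] += len(re.findall(re.escape(term), text))
    return counts

transcript = "Denosumab blocks RANKL and reduces bone metastasis pain."
print(categorize(transcript).most_common())
```

Per-platform category percentages, as shown in the pie charts, would then follow from normalizing these counts over each platform's video set.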

Analysis of video duration demonstrated no significant association with quality. Duration showed no correlation with GQS (r = −0.021, p = 0.676) or DISCERN (r = −0.002, p = 0.961), and a slight negative correlation with JAMA scores (r = −0.162, p = 0.001) (Figures 12A–D). Platform-stratified analyses suggested only isolated associations: Bilibili showed a weak positive correlation with GQS (r = 0.132, p < 0.05), while Xiaohongshu demonstrated a moderate correlation with DISCERN (r = 0.382, p < 0.001) (Figures 12E–H).

Figure 12.


Multi-platform correlation analysis between video duration and quality scores. First row (overall correlations). (A) Duration vs. GQS. Spearman analysis shows no significant correlation (r = −0.021, p = 0.676; n = 402), indicating that longer videos are not associated with higher global quality. (B) Duration vs. JAMA score. A weak but statistically significant negative correlation is observed (Spearman r = −0.162, p = 0.001), suggesting that longer videos may be modestly associated with lower reliability by JAMA criteria. (C) Duration vs. DISCERN. No significant association (Spearman r = −0.002, p = 0.961), implying that video length does not meaningfully influence DISCERN scores. (D) Duration vs. composite score. Although Pearson analysis indicates a weak negative relationship (r = −0.238, p < 0.001), Spearman correlation is non-significant (r = −0.044, p = 0.378), suggesting only a small effect of duration on the composite index. Second row (platform-stratified analyses): (E–H) Platform-specific Spearman correlations between video duration and quality indicators (GQS, JAMA, DISCERN, composite score) for Bilibili (n = 222), Douyin (n = 105), and Xiaohongshu (n = 75). Significant associations are limited and platform specific: Bilibili shows a weak positive correlation between duration and GQS, and a weak negative correlation with JAMA; Xiaohongshu exhibits a moderate positive correlation between duration and DISCERN; No robust or consistent correlation patterns are observed on Douyin. Third row (regression visualizations): (I–L) Linear regression plots illustrate the relationships between duration and GQS, JAMA, DISCERN, and composite scores, respectively. R2 values for GQS, JAMA, and DISCERN are near zero, indicating negligible explanatory power of duration for these metrics. For the composite score, R2 = 0.0568 (p < 0.001), indicating a statistically significant but small effect, with longer videos associated with slightly lower composite quality scores. 
Overall, these results suggest that video duration is not a reliable predictor of content quality across platforms.

The composite score and duration showed a weak negative association (r = –0.238, p < 0.001, R2 = 0.0568), indicating that longer content does not necessarily translate into higher-quality information (Figures 12I–L).
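The duration analyses pair Spearman rank correlations (robust to the long-tailed duration distribution) with ordinary least-squares fits, from which the R² values derive. A minimal sketch with made-up data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Illustrative data: durations (seconds) and composite scores; NOT study data.
duration = rng.exponential(scale=180, size=402)
composite = 40 - 0.005 * duration + rng.normal(scale=6, size=402)

# Spearman: monotonic association, insensitive to the skewed durations.
rho, p_s = stats.spearmanr(duration, composite)

# OLS fit: slope, intercept, and Pearson r; R^2 is the squared r-value.
fit = stats.linregress(duration, composite)
print(f"rho = {rho:.3f} (p = {p_s:.3f}); R^2 = {fit.rvalue ** 2:.4f}")
```

A Pearson-significant but Spearman-non-significant pair, as reported for the composite score, typically signals that a few long-duration outliers drive the linear fit rather than a genuine monotonic trend.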

3.5. User engagement patterns and their relationship with content quality

User engagement was largely decoupled from content quality. Quality scores showed only negligible correlations with log-transformed “likes” counts (Pearson r = 0.062; Spearman ρ = 0.047; R2 = 0.004), indicating that videos with high engagement were not necessarily high-quality medical resources (Figure 5D). Similar patterns were observed across other engagement metrics: comments (r = 0.043), shares (r = 0.051), and favorites (r = 0.039) all showed non-significant associations with content quality (Figure 5C).
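The engagement–quality decoupling analysis rests on a log10(1 + x) transform of the skewed count data followed by a Pearson correlation; R² is simply the squared coefficient. A sketch with synthetic data (not the study dataset):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Illustrative data: quality scores and raw like counts; NOT the study data.
quality = rng.uniform(1, 5, size=402)
likes = rng.lognormal(mean=4, sigma=2, size=402).astype(int)

# Engagement counts are heavily right-skewed; log10(1 + x) compresses the
# tail while keeping zero counts defined.
log_likes = np.log10(1 + likes)

# Pearson r and the corresponding R^2 (proportion of shared variance).
r, p = stats.pearsonr(quality, log_likes)
r2 = r ** 2
# An R^2 near zero (the study reports 0.004) means engagement explains
# almost none of the variance in content quality.
print(f"r = {r:.3f}, R^2 = {r2:.4f}, p = {p:.3f}")
```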

Engagement varied significantly across platforms. Douyin demonstrated the highest participation levels: median likes = 207 (range: 5–5,619), comments = 18 (range: 0–738), shares = 57 (range: 0–3,275), and favorites = 92 (range: 0–5,870). Despite outperforming the other platforms, Douyin content remained far below professional standards (GQS 3.40 ± 0.85; JAMA 0.50 ± 0.62). Bilibili and Xiaohongshu had significantly lower engagement (all p < 0.001 vs. Douyin) (Table 1; Figures 6E–H).

Distribution analysis of engagement indicators demonstrated platform-specific interaction patterns: Bilibili exhibited a long-tail distribution, Douyin showed high centralization, while Xiaohongshu displayed the greatest uniformity (Figure 5C).

3.6. Inter-platform heterogeneity and reliability validation

Platform-specific analyses indicated fundamental differences in quality control mechanisms. The correlation matrix revealed distinct variable association structures across platforms (Figure 13). The most striking contrast was in verification scores: only Douyin implemented a systematic verification system (mean 78.581 ± 25.315; median 90.000), whereas Bilibili and Xiaohongshu uniformly displayed zero verification, suggesting an absence of structured content validation (Figures 6K, 9H).

Figure 13.


Correlation heatmaps of quality, engagement, and verification indicators across short-video platforms. Four Pearson correlation matrices summarize relationships among 14 key variables in: (A) overall dataset (n = 402), (B) Bilibili (n = 222), (C) Douyin (n = 105), and (D) Xiaohongshu (n = 75). Variables include quality indicators (GQS, JAMA, DISCERN, composite score, adjusted score), engagement metrics (likes, comments, favorites, shares), duration (duration_seconds_corrected), and platform-specific verification indicators (Doubao verification score, author authority score, content theme confidence). Color intensity ranges from deep blue (strong negative correlation, r = −1.0) to deep red (strong positive correlation, r = 1.0), with white representing no correlation (r = 0). Key findings: 1. Overall dataset: strongest correlations are observed between author authority score and Doubao verification score (r = 0.990, p < 0.0001), and between GQS and adjusted score (r = 0.917, p < 0.0001). 2. Platform-specific availability: Doubao verification and authority scores are non-applicable (zero-inflated) for Bilibili and Xiaohongshu, indicating that AI-based verification is a unique feature of Douyin. 3. Bilibili: strongest internal consistency among quality metrics (GQS vs. adjusted score, r = 0.922) and high correlation among engagement metrics (collect_count vs. share_count, r = 0.860). 4. Douyin: distinct verification structure with strong coupling between author authority and verification scores (r = 0.911), and the highest correlation among engagement metrics (like_count vs. share_count, r = 0.949). 5. Xiaohongshu: pronounced clustering of engagement indicators (collect_count vs. share_count, r = 0.934) and strong correlation between GQS and adjusted score (r = 0.881). Statistical significance is denoted by *p < 0.05, **p < 0.01, ***p < 0.001.
All correlations were estimated using two-sided Pearson tests (SciPy ≥1.7.0), with heatmaps generated in Matplotlib 3.5.3 using a “coolwarm” color scheme. These patterns highlight substantial inter-platform differences in how content quality, creator authority, engagement, and AI verification are structurally linked.
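A heatmap of this kind can be produced with pandas and Matplotlib's "coolwarm" colormap, as the caption describes. The sketch below uses a random stand-in for the per-video variable table, with a reduced variable set for brevity:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless rendering, no display required
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
# Illustrative stand-in for the per-video variable table; NOT the study data.
df = pd.DataFrame(
    rng.normal(size=(402, 5)),
    columns=["gqs", "jama", "discern", "like_count", "share_count"],
)

corr = df.corr(method="pearson")  # pairwise Pearson correlation matrix

fig, ax = plt.subplots(figsize=(4, 4))
# Fixed [-1, 1] limits keep white anchored at r = 0 across all four panels.
im = ax.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr)), corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr)), corr.columns)
fig.colorbar(im, ax=ax, label="Pearson r")
fig.tight_layout()
fig.savefig("correlation_heatmap.png", dpi=150)
```

Pinning `vmin`/`vmax` to −1 and 1 (rather than each panel's own range) is what makes the four platform heatmaps directly comparable by color.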

Multi-dimensional reliability analysis showed varying degrees of consistency across platforms. JAMA scores demonstrated moderate correlations with GQS (r = 0.504, p < 0.001), DISCERN (r = 0.308, p < 0.001), and composite scores (r = 0.547, p < 0.001) (Figures 9D–F). Among credibility indicators, speech recognition confidence was highest (0.940 ± 0.068), whereas author authority confidence (0.639 ± 0.273) and topic classification confidence (0.682 ± 0.162) were comparatively lower, reflecting reliability variations across AI-assisted assessments (Figures 9C, I).

Distinct correlation patterns further highlighted platform-level system differences: Douyin: exceptionally high association between creator authority and verification score (r = 0.911); Bilibili: strongest internal consistency among quality indicators (GQS vs. adjusted score: r = 0.922); Xiaohongshu: closest clustering of engagement metrics (collect_count vs. share_count: r = 0.934) (Figure 13). Collectively, these divergences reflect structural disparities in platform ecosystems, user demographics, and algorithmic dissemination mechanisms, which exert a systematic influence on the quality of health information distribution.

4. Discussion

4.1. Summary of key findings

This study conducted a systematic quality assessment of 402 videos concerning denosumab combined with PD-1/PD-L1 immunotherapy for lung cancer bone metastasis across three leading Chinese short-video platforms. The overall findings revealed a markedly low quality of information, substantially below accepted medical communication standards (54) and professional digital health information criteria (55).

All three core quality indicators demonstrated significant deficiencies: the mean GQS score was only 2.84 ± 1.06 (56.8% of the full score of 5), the mean JAMA score was 0.34 ± 0.57 (8.5% of the full score of 4), and the modified DISCERN score averaged 1.55 ± 0.69 (31.0% of the full score of 5). These results indicate systematic deficiencies in immunology-related information dissemination, raising concerns regarding the risk of patients being exposed to incomplete or misleading knowledge.

Quality differences between platforms exhibited moderate-to-large effect sizes (Cohen’s d = 0.6–0.8). Douyin performed relatively better (GQS: 3.40 ± 0.85; JAMA: 0.50 ± 0.62; DISCERN: 1.90 ± 0.67), whereas Bilibili performed worst (GQS: 2.45 ± 1.03; JAMA: 0.25 ± 0.51; DISCERN: 1.33 ± 0.61). These disparities may be attributable to heterogeneity in platform demographics, content moderation policies, and algorithmic recommendation designs (55, 56), which directly affect user exposure to credible health information and may further reinforce pre-existing biases.

Moreover, variations in machine-learning-based recommendation systems across platforms likely exacerbate unequal information distribution (57), creating potential hindrances to informed clinical communication, treatment decision-making, therapeutic adherence, and adverse event reporting.

4.2. Platform-driven cognitive biases and clinical safety risks associated with immunotherapy

Short-video platforms have rapidly become primary health information sources for the general public and patients, influencing health perceptions and—indirectly—behavior. According to dual-process cognitive models, individuals exposed to vast online information tend to rely either on: (1) heuristic cues (e.g., likes, creator identity), or (2) systematic processing with higher engagement (58, 59).

However, within short-video ecosystems, heuristic-driven judgment dominates: users often rely on popularity signals rather than informational accuracy (58, 60). Individuals with lower health literacy are particularly vulnerable to persuasive but oversimplified narratives (61–63), while repetitive short-form viewing diminishes cognitive readiness (64), promotes reactive attention (64), fragments memory (65), and increases misunderstanding.

Algorithm-driven reinforcement of familiar content further leads to “algorithmic echo chambers,” amplifying pre-existing beliefs while narrowing exposure to diverse viewpoints (66, 67). Such reinforcement has measurable effects on self-management behaviors (68). Critically, most platform algorithms prioritize engagement over quality (69–71), intensifying confirmation bias and shaping risky health decision-making (72).

Our findings align with these theoretical mechanisms: user engagement was almost entirely decoupled from content quality (R2 = 0.004; r = 0.062), demonstrating that recommendation systems systematically elevate popular but potentially inaccurate content. As a result, misconceptions regarding denosumab or PD-1/PD-L1 inhibitors may be algorithmically amplified, creating information inequity (56) and cognitively influencing clinical expectations and decisions (57).

Low-quality videos commonly presented overly simplified mechanisms, such as “Denosumab inhibits bone destruction and protects bone,” without explaining the TRAF6–NF-κB/MAPK–NFATc1 signaling cascade downstream of the RANKL–RANK interaction (73, 74); likewise, PD-1/PD-L1 pathway explanations lacked mechanistic clarity on ITIM/ITSM-mediated SHP2 recruitment and suppression of TCR/CD28 signaling (12).

Platform disparities may further reinforce such biases: Douyin outperformed Bilibili (GQS: 3.40 ± 0.85 vs. 2.45 ± 1.03; JAMA: 0.50 ± 0.62 vs. 0.25 ± 0.51; all p < 0.001; Cohen’s d = 0.6–0.8), yet even the highest platform-mean JAMA score (12.5% of the full score) remains far below basic medical communication standards (54, 55).

Additionally, most content was produced by patients (52.2%), while only 25.6% originated from medical professionals. When personal anecdotes are algorithmically generalized, subjective or inaccurate beliefs may gain disproportionate influence, increasing the potential for harmful treatment interpretations and safety risks.

Building upon cognitive-behavioral theory, the disparities observed across platform algorithms and content quality may reinforce or exacerbate cognitive biases among short-video users, particularly in relation to mechanisms of denosumab–immunotherapy combinations (75). Repetitive exposure to content emphasizing selective aspects of immunology may foster persistent cognitive errors—such as availability bias and misattribution—which are known contributors to diagnostic inaccuracy (28). This phenomenon is consistent with the algorithmic echo chamber, whereby users’ pre-existing beliefs are continuously reinforced (66, 67), negatively impacting their understanding of treatment complexity (76).

Clinically relevant safety concerns may arise from such distorted perceptions. For instance, insufficient attention to calcium/vitamin D supplementation and serum calcium monitoring could elevate the risk of hypocalcemia during denosumab therapy, which has been reported in 15–30% of patients due to inadequate supplementation, with severe cases resulting in tetany and arrhythmias (77). Similarly, inadequate awareness of the necessity for dental assessment may increase the risk of medication-related osteonecrosis of the jaw (MRONJ). Evidence shows the incidence of MRONJ can reach 1.8–12.6% in patients without adequate oral screening and preventive care (78–81), and is further heightened in those with pre-existing oral infections or recent tooth extractions (82). However, comprehensive evaluation and prophylaxis can reduce incidence to 0.8–4.4% (79, 83). Neglecting pre-treatment dental evaluation therefore jeopardizes accurate assessment of the risk–benefit ratio of combined immunotherapy and can negatively influence clinical prognosis (84).

Moreover, combining denosumab with PD-1/PD-L1 inhibitors increases the likelihood and severity of immune-related adverse events (irAEs), with endocrine irAEs showing a 15–20% increased risk (85). Over-simplified messaging emphasizing “bone protection” may lead to inadequate endocrine monitoring and delayed irAE recognition, further elevating complication risks (86). Misattribution can also occur when patients incorrectly link symptoms such as bone or dental pain solely to treatment toxicity. Such associative misattribution not only interferes with appropriate clinical decision-making but may lead to inflated toxicity signals and substantial heterogeneity in safety reporting in clinical trials (87).

4.3. Strengths and limitations

Although AI-assisted scoring improved evaluation efficiency, limitations persisted in semantic interpretation of complex immunological concepts and contextual reasoning. The inherently individualized nature of immunotherapy—including tumor heterogeneity, immune microenvironment variation, and host immune status—adds challenges to standardized assessment (88). The cross-sectional design prevented assessment of dynamic changes in patient cognition or cumulative information exposure, and real-world impacts on clinical decision-making or treatment outcomes were not directly evaluated, necessitating future longitudinal validation (89).

Furthermore, this study focused solely on Chinese-language platforms, limiting generalizability to diverse cultural contexts where perceptions of immunotherapy may differ (90). While 10% blind quality auditing met ISO recommendations, potential sampling bias cannot be completely excluded. Additionally, the three-tier keyword strategy—despite its systematic design—may have overlooked high-quality videos with more colloquial terminology, potentially under-estimating platform performance.

Despite these limitations, this study represents the first systematic evaluation of denosumab-immunotherapy information quality from an immunology perspective on major Chinese short-video platforms. The findings illuminate critical issues of accuracy deficits, limited professional involvement, and cognitive risk in digital health communication. These results highlight the urgent need to establish professional immunology-based standards and regulatory frameworks for digital health information to safeguard patients’ right to reliable guidance.

From a clinical translation perspective, this study provides actionable implications for immuno-oncology practice: (1) clinicians should proactively contribute to digital science communication to correct misinformation, (2) digital exposure should be systematically assessed during consultations, and (3) healthcare systems should develop authoritative hospital-led patient-education platforms.

Collectively, these strategies may enhance digital health literacy, strengthen shared decision-making, and ultimately improve patient-centered immunotherapy outcomes.

4.4. Future directions

Future studies should adopt prospective cohort designs to track digital health information exposure throughout diagnosis and treatment, evaluating its effects on immunotherapy decision-making, adherence, and clinical outcomes (91). Interventional studies grounded in immunology-related cognitive theory should compare educational strategies—e.g., visualization, case-based learning, interactive Q&A—to enhance patients’ immunological understanding (92), combined with AI-driven personalized education recommendation systems.

Importantly, targeted interventions must correct misconceptions about denosumab and immune checkpoint inhibitors, while establishing a standardized patient-education pathway that ensures appropriate supplementation, toxicity monitoring, and timely reporting of adverse events. Development of unified cognition-assessment frameworks and structured education protocols will be essential to strengthen patient capacity for irAE recognition and management (93).

5. Conclusion

This study provides the first evidence demonstrating a substantial deficiency in the quality of information regarding denosumab combined with immunotherapy for lung cancer bone metastasis on major Chinese short-video platforms. A pronounced disconnect was observed between content quality and user engagement, suggesting that algorithmic recommendation systems may systematically amplify inaccurate or oversimplified immunological information. Such distortions may lead to clinically relevant cognitive risks, including inadequate monitoring for hypocalcemia, insufficient MRONJ prevention, and misinterpretation of irAE-related symptoms, while also introducing bias into adverse-event reporting and clinical research outcomes.

These findings underscore the urgency of establishing standardized regulatory frameworks and professional oversight mechanisms to ensure the accuracy and safety of digital health communication, thereby improving treatment literacy, reducing misinformation-driven harm, and ultimately enhancing patient-centered immunotherapy outcomes.

Funding Statement

The author(s) declared that financial support was not received for this work and/or its publication.

Edited by: Duoyi Zhao, Fourth Affiliated Hospital of China Medical University, China

Reviewed by: Jia Li, The First Affiliated Hospital of Sun Yat-sen University, China

Nadide Koca, University of Health Sciences, Türkiye

Sanjay Kumar, Air Force Central Medical Establishment, India

Abbreviations: AI, Artificial Intelligence; CI, Confidence Interval; DISCERN, DISCERN Instrument for Health Information Quality; Douyin, Chinese version of TikTok; GQS, Global Quality Score; JAMA, Journal of the American Medical Association benchmark criteria; NSCLC, Non-Small Cell Lung Cancer; PD-1, Programmed Death-1; PD-L1, Programmed Death-Ligand 1; RED, Xiaohongshu (RED app); α, Significance Level.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

This study exclusively analyzed publicly available video content on social media platforms and did not involve human participants, clinical datasets, or experimental animals. All extracted data were fully anonymized, and no personally identifiable information was collected, recorded, or utilized during the study process. Ethical approval was therefore not required according to prevailing ethical and data-protection standards (94). In addition, data collection was conducted strictly in accordance with platform accessibility policies and terms of service, ensuring that no violations of data usage regulations occurred.

Author contributions

J-wW: Resources, Formal analysis, Writing – original draft, Visualization, Validation, Data curation, Investigation, Methodology, Conceptualization, Software. J-JX: Project administration, Writing – review & editing, Supervision. F-FZ: Project administration, Supervision, Writing – review & editing.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that Generative AI was not used in the creation of this manuscript.


Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpubh.2025.1724546/full#supplementary-material

Data_Sheet_1.docx (677.1KB, docx)

References

  • 1.Tzschoppe T, Ohlinger J, Vordermark D, Bedir A, Medenwald D. Population based study on the progress in survival of primarily metastatic lung cancer patients in Germany. Sci Rep. (2024) 14:16005. doi: 10.1038/s41598-024-66307-3, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Zhu Y, She J, Sun R, Yan XX, Huang X, Wang P, et al. Impact of bone metastasis on prognosis in non-small cell lung cancer patients treated with immune checkpoint inhibitors: a systematic review and meta-analysis. Front Immunol. (2024) 15:1493773. doi: 10.3389/fimmu.2024.1493773, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Chen S, Lei J, Mou H, Zhang W, Jin L, Lu S, et al. Multiple influence of immune cells in the bone metastatic cancer microenvironment on tumors. Front Immunol. (2024) 15:1335366. doi: 10.3389/fimmu.2024.1335366, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Xiang L, Gilkes DM. The contribution of the immune system in Bone metastasis pathogenesis. Int J Mol Sci. (2019) 20:999. doi: 10.3390/ijms20040999, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Xin Z, Qin L, Tang Y, Guo S, Li F, Fang Y, et al. Immune mediated support of metastasis: implication for bone invasion. Cancer Commun. (2024) 44:967–91. doi: 10.1002/cac2.12584, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Cadieux B, Coleman R, Jafarinasabian P, Lipton A, Orlowski RZ, Saad F, et al. Experience with denosumab (XGEVA®) for prevention of skeletal-related events in the 10 years after approval. J Bone Oncol. (2022) 33:100416. doi: 10.1016/j.jbo.2022.100416, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Lipton A, Fizazi K, Stopeck AT, Henry DH, Brown JE, Yardley DA, et al. Superiority of denosumab to zoledronic acid for prevention of skeletal-related events: a combined analysis of 3 pivotal, randomised, phase 3 trials. Europ J Cancer. (2012) 48:3082–92. doi: 10.1016/j.ejca.2012.08.002, [DOI] [PubMed] [Google Scholar]
  • 8.Ahern E, Smyth MJ, Dougall WC, Teng M. Roles of the RANKL-RANK axis in antitumour immunity - implications for therapy. Nat Rev Clin Oncol. (2018) 15:676–93. doi: 10.1038/s41571-018-0095-y, [DOI] [PubMed] [Google Scholar]
  • 9.Gómez-Aleza C, Nguyen B, Yoldi G, Ciscar M, Barranco A, Hernández-Jiménez E, et al. Inhibition of RANK signaling in breast cancer induces an anti-tumor immune response orchestrated by CD8+ T cells. Nat Commun. (2020) 11:6335. doi: 10.1038/s41467-020-20138-8, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.van Dam PA, Verhoeven Y, Jacobs J, Wouters A, Tjalma W, Lardon F, et al. RANK-RANKL signaling in Cancer of the uterine cervix: a review. Int J Mol Sci. (2019) 20:183. doi: 10.3390/ijms20092183, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Fu J, Yan YD, Wan X, Sun XF, Ma XM, Su YJ. A network comparison on efficacy and safety profiling of PD-1/PD-L1 inhibitors in first-line treatment of advanced non-small cell lung cancer. Front Pharmacol. (2024) 15:1516735. doi: 10.3389/fphar.2024.1516735 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Liu J, Chen Z, Li Y, Zhao W, Wu J, Zhang Z. PD-1/PD-L1 checkpoint inhibitors in tumor immunotherapy. Front Pharmacol. (2021) 12:731798. doi: 10.3389/fphar.2021.731798, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Asano Y, Yamamoto N, Demura S, Hayashi K, Takeuchi A, Kato S, et al. Combination therapy with immune checkpoint inhibitors and denosumab improves clinical outcomes in non-small cell lung cancer with bone metastases. Lung Cancer. (2024) 193:107858. doi: 10.1016/j.lungcan.2024.107858, [DOI] [PubMed] [Google Scholar]
  • 14.Li HS, Lei SY, Li JL, Xing PY, Hao XZ, Xu F, et al. Efficacy and safety of concomitant immunotherapy and denosumab in patients with advanced non-small cell lung cancer carrying bone metastases: a retrospective chart review. Front Immunol. (2022) 13:908436. doi: 10.3389/fimmu.2022.908436, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Jain M, Sharma PK, Kamboj K, Shyam A. The impact of social media on medical education and health-care communication. J Orthop Case Rep. (2024) 14:1–3. doi: 10.13107/jocr.2024.v14.i09.4706, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Zhang X, Cai Y, Zhao M, Zhou Y. Generation mechanism of “information cocoons” of network users: an evolutionary game approach. Systems. (2023) 11:414. doi: 10.3390/systems11080414 [DOI] [Google Scholar]
  • 17.Liu J, Ye Q, Wu H, Ma R, Guo S, Long H. How do health content creators perform well? An integration research of short video and livestream behaviors. Front Public Health. (2024) 12:1446247. doi: 10.3389/fpubh.2024.1446247, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.McBriar JD, Mishra A, Shah HA, Boockvar JA, Langer DJ, D'Amico RS. #neurosurgery: a cross-sectional analysis of neurosurgical content on TikTok. World Neurosurg. (2023) 17:100137. doi: 10.1016/j.wnsx.2022.100137, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Ni CX, Fei YB, Wu R, Cao WX, Liu W, Huang F, et al. Tumor immunotherapy-related information on internet-based videos commonly used by the Chinese population: content quality analysis. JMIR Format Res. (2024) 8:e50561. doi: 10.2196/50561, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.McComb CA, Vanman EJ, Tobin SJ. A meta-analysis of the effects of social media exposure to upward comparison targets on self-evaluations and emotions. Media Psychol. (2023) 26:612–35. doi: 10.1080/15213269.2023.2180647 [DOI] [Google Scholar]
  • 21.Morgan M, Shanahan J, Signorielli N. Yesterday's new cultivation, tomorrow. Mass Commun Soc. (2015) 18:674–99. doi: 10.1080/15205436.2015.1072725 [DOI] [Google Scholar]
  • 22.Chung JE. Medical dramas and viewer perception of health: testing cultivation effects. Hum Commun Res. (2014) 40:333–49. doi: 10.1111/hcre.12026 [DOI] [Google Scholar]
  • 23.Götz-Hahn F, Hosu V, Lin H, Saupe D. 2021 KonVid-150k: a dataset for no-reference video quality assessment of videos in-the-wild. IEEE Access. (2021) 9:72139–60. doi: 10.1109/ACCESS.2021.3077642 [DOI] [Google Scholar]
  • 24.Naik S, Al-Kheraif AA, Vellappally S. Artificial intelligence in dentistry: assessing the informational quality of YouTube videos. PLoS One. (2025) 20:e0316635. doi: 10.1371/journal.pone.0316635, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Tu Z, Yu X, Wang Y, Birkbeck N, Adsumilli B, Bovik AC. 2021 RAPIQUE: rapid and accurate video quality prediction of user generated content. IEEE Open J Signal Process. (2021) 2:425–40. doi: 10.1109/OJSP.2021.3090333 [DOI] [Google Scholar]
  • 26.Dailah HG, Hommdi AA, Koriri MD, Algathlan EM, Mohan S. Potential role of immunotherapy and targeted therapy in the treatment of cancer: a contemporary nursing practice. Heliyon. (2024) 10:e24559. doi: 10.1016/j.heliyon.2024.e24559, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Hyatt A, Morkunas B, Davey D, Thai AA, Trewhella M, Duffy M, et al. Co-design and development of online video resources about immunotherapy with patients and their family. Patient Educ Couns. (2021) 104:290–7. doi: 10.1016/j.pec.2020.09.014, [DOI] [PubMed] [Google Scholar]
  • 28.Saposnik G, Redelmeier D, Ruff CC, Tobler PN. Cognitive biases associated with medical decisions: a systematic review. BMC Med Inform Decis Mak. (2016) 16:138. doi: 10.1186/s12911-016-0377-1, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Tang Y, Wang Z. The predictors of attracting large audience in China: the study of social media platforms. Prof Inform. (2024) 33:217. doi: 10.3145/epi.2024.0217 [DOI] [Google Scholar]
  • 30.Vandenbroucke JP, von Elm E, Altman DG, Gøtzsche PC, Mulrow CD, Pocock SJ, et al. Strengthening the reporting of observational studies in epidemiology (STROBE): explanation and elaboration. PLoS Med. (2007) 4:e297. doi: 10.1371/journal.pmed.0040297, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP. The strengthening the reporting of observational studies in epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet. (2007) 370:1453–7. doi: 10.1016/S0140-6736(07)61602-X [DOI] [PubMed] [Google Scholar]
  • 32.Guan JL, Xia SH, Zhao K, Feng LN, Han YY, Li JY, et al. Videos in short-video sharing platforms as sources of information on colorectal polyps: Cross-sectional content analysis study. J Med Internet Res. (2024) 26:e51655. doi: 10.2196/51655, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Zhang J, Yuan J, Zhang D, Yang Y, Wang C, Dou Z, et al. Short video platforms as sources of health information about cervical cancer: a content and quality analysis. PLoS One. (2024) 19:e0300180. doi: 10.1371/journal.pone.0300180, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Kong W, Song S, Zhao YC, Zhu Q, Sha L. TikTok as a health information source: assessment of the quality of information in diabetes-related videos. J Med Internet Res. (2021) 23:e30409. doi: 10.2196/30409, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Afful-Dadzie E, Afful-Dadzie A, Egala SB. Social media in health communication: a literature review of information quality. Health Inf Manag J. (2021) 52:3–17. doi: 10.1177/1833358321992683 [DOI] [PubMed] [Google Scholar]
  • 36.Bernard A, Langille M, Hughes S, Rose C, Leddin D, Veldhuyzen van Zanten S. A systematic review of patient inflammatory bowel disease information resources on the world wide web. Am J Gastroenterol. (2007) 102:2070–7. doi: 10.1111/j.1572-0241.2007.01325.x, [DOI] [PubMed] [Google Scholar]
  • 37.Kington RS, Arnesen S, Chou WS, Curry SJ, Lazer D, Villarruel AM. Identifying credible sources of health information in social media: principles and attributes. NAM Perspect. (2021) 2021:107. doi: 10.31478/202107a [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Egala S, Liang D, Boateng D. Social media health-related information credibility and reliability: an integrated user perceived quality assessment. IEEE Trans Eng Manag. (2024) 71:5018–29. doi: 10.1109/TEM.2022.3225182 [DOI] [Google Scholar]
  • 39.OECD/European Union/EC-JRC (2008) Handbook on Constructing Composite Indicators: Methodology and User Guide. Paris: OECD Publishing. doi: 10.1787/9789264043466-en [DOI] [Google Scholar]
  • 40.Hu Y, Yang Y, Li W, Zhou Y, Sun J. Developing an evaluation system for quality of health educational short videos on social media (LassVQ) using nominal group technique and analytic hierarchy process: qualitative study. J Med Internet Res. (2025) 27:e72661. doi: 10.2196/72661, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Zheng S, Tong X, Wan D, Hu C, Hu Q, Ke Q. Quality and reliability of liver Cancer-related short Chinese videos on TikTok and Bilibili: Cross-sectional content analysis study. J Med Internet Res. (2023) 25:e47210. doi: 10.2196/47210, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Silberg W, Lundberg G, Musacchio R. Assessing, controlling, and assuring the quality of medical information on the internet: caveant lector et viewor--let the reader and viewer beware. JAMA. (1997) 277:1244–5. [PubMed] [Google Scholar]
  • 43.Charnock D, Shepperd S, Needham G, Gann R. DISCERN: an instrument for judging the quality of written consumer health information on treatment choices. J Epidemiol Community Health. (1999) 53:105–11. doi: 10.1136/jech.53.2.105, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Willis E, Friedel K, Heisten M, Pickett M, Bhowmick A. Communicating health literacy on prescription medications on social media: in-depth interviews with "patient influencers". J Med Internet Res. (2023) 25:e41867. doi: 10.2196/41867, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Cui N, Lu Y, Cao Y, Chen X, Fu S, Su Q. Quality assessment of TikTok as a source of information about mitral valve regurgitation in China: Cross-sectional study. J Med Internet Res. (2024) 26:e55403. doi: 10.2196/55403, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Permenov BA, Zimba O, Yessirkepov M, Qumar AB, Suigenbayev D, Kocyigit BF. Evaluating the quality and reliability of YouTube as a source of information on extracorporeal membrane oxygenation: a call to publish more quality videos by professionals. J Korean Med Sci. (2025) 40:e34. doi: 10.3346/jkms.2025.40.e34, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.ISO. ISO 9001:2015 Quality management systems—Requirements. Geneva: International Organization for Standardization (2015). [Google Scholar]
  • 48.Ghalavand H, Nabiolahi A. Exploring online health information quality criteria on social media: a mixed method approach. BMC Health Serv Res. (2024) 24:1311. doi: 10.1186/s12913-024-11838-8, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Guo F, Ding G, Zhang Y, Liu X. Quality assessment of radiotherapy health information on short-form video platforms of TikTok and Bilibili: Cross-sectional study. JMIR Cancer. (2025) 11:e73455. doi: 10.2196/73455, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Tian Y, Liu S, Zhang D. Clearness qualitative comparative analysis of the spread of TikTok health science knowledge popularization accounts. Digital Health. (2023) 9:20552076231219116. doi: 10.1177/20552076231219116, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Zha X, Yang H, Yan Y, Liu K, Huang C. Exploring the effect of social media information quality, source credibility and reputation on informational fit-to-task: moderating role of focused immersion. Comput Hum Behav. (2018) 79:227–37. doi: 10.1016/j.chb.2017.10.038 [DOI] [Google Scholar]
  • 52.Cohen J. (1969) Statistical Power Analysis for the Behavioral Sciences. New York, NY: Academic Press. doi: 10.2307/2529115 [DOI] [Google Scholar]
  • 53.Lakens D. Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. Front Psychol. (2013) 4:863. doi: 10.3389/fpsyg.2013.00863, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Siani M, Dubovi I, Borushko A, Haskel-Ittah M. Teaching immunology in the 21st century: a scoping review of emerging challenges and strategies. Int J Sci Educ. (2024) 46:1826–47. doi: 10.1080/09500693.2023.2300380 [DOI] [Google Scholar]
  • 55.Xiao L, Min H, Wu Y, Zhang J, Ning Y, Long L, et al. Public's preferences for health science popularization short videos in China: a discrete choice experiment. Front Public Health. (2023) 11:1160629. doi: 10.3389/fpubh.2023.1160629, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Cross JL, Choma MA, Onofrey JA. Bias in medical AI: implications for clinical decision-making. PLOS Digital Health. (2024) 3:e0000651. doi: 10.1371/journal.pdig.0000651, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Gianfrancesco MA, Tamang S, Yazdany J, Schmajuk G. Potential biases in machine learning algorithms using electronic health record data. JAMA Intern Med. (2018) 178:1544–7. doi: 10.1001/jamainternmed.2018.3763, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Gao Y, Gong L, Liu H, Kong Y, Wu X, Guo Y, et al. Research on the influencing factors of users' information processing in online health communities based on heuristic-systematic model. Front Psychol. (2022) 13:966033. doi: 10.3389/fpsyg.2022.966033, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Katz SJ, Erkkinen M, Lindgren B, Hatsukami D. Assessing the impact of conflicting health warning information on intentions to use E-cigarettes -an application of the heuristic-systematic model. J Health Commun. (2018) 23:874–85. doi: 10.1080/10810730.2018.1533052, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Son J, Lee J, Oh O, Lee HK, Woo J. Using a heuristic-systematic model to assess the twitter user profile’s impact on disaster tweet credibility. Int J Inf Manag. (2020) 54:102176. doi: 10.1016/j.ijinfomgt.2020.102176 [DOI] [Google Scholar]
  • 61.Jia X, Pang Y, Liu LS. Online health information seeking behavior: a systematic review. Healthcare. (2021) 9:740. doi: 10.3390/healthcare9121740, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Liu D, Yang S, Cheng CY, Cai L, Su J. Online health information seeking, eHealth literacy, and health behaviors among Chinese internet users: Cross-sectional survey study. J Med Internet Res. (2024) 26:e54135. doi: 10.2196/54135, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Nutakor JA, Zhou L, Larnyo E, Addai-Dansoh S, Cui Y. Impact of health information seeking behavior and digital health literacy on self-perceived health and depression symptoms among older adults in the United States. Sci Rep. (2024) 14:31080. doi: 10.1038/s41598-024-82187-z, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Luo W, Zhao X, Jiang B, Fu Q, Zheng J. Swiping disrupts switching: preliminary evidence for reduced Cue-based preparation following short-form video exposure. Behav Sci. (2025) 15:70. doi: 10.3390/bs15081070, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Li H, Li J, Hao X, et al. Behavioral and eye-tracking evidence for disrupted event segmentation during continuous memory encoding due to short video watching. bioRxiv. (2024) [Preprint]. doi: 10.1101/2024.08.17.608429 [DOI]
  • 66.Metzler H, Garcia D. Social drivers and algorithmic mechanisms on digital media. Perspect Psychol Sci. (2024) 19:735–48. doi: 10.1177/17456916231185057, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Ahmmad M, Shahzad K, Iqbal A, Latif M. Trap of social media algorithms: a systematic review of research on filter bubbles, echo chambers, and their impact on youth. Societies. (2025) 15:301. doi: 10.3390/soc15110301 [DOI] [Google Scholar]
  • 68.Wang H, Li X. The influence of short videos on user cognition in visual communication. Modern Econ Manag Forum. (2023) 4:1488. doi: 10.32629/memf.v4i5.1488 [DOI] [Google Scholar]
  • 69.Hartmann D, Wang SM, Pohlmann L, Berendt B. A systematic review of echo chamber research: comparative analysis of conceptualizations, operationalizations, and varying outcomes. J Comput Soc Sci. (2025) 8:52. doi: 10.1007/s42001-025-00381-z [DOI] [Google Scholar]
  • 70.Xuan W, Tian K, Hao L. Quality assessment of short videos on health science popularization in China: scale development and validation. Front Public Health. (2025) 13:1640105. doi: 10.3389/fpubh.2025.1640105, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Xie Z, Li W, Xie Y, Wang L. Demand and satisfaction analysis of short health videos among Chinese urban youth: a mixed-methods study based on the KANO model. Humanit Soc Sci Commun. (2024) 11:740. doi: 10.1057/s41599-024-03266-0 [DOI] [Google Scholar]
  • 72.Savioni L, Triberti S. Cognitive biases in chronic illness and their impact on patients' commitment. Front Psychol. (2020) 11:579455. doi: 10.3389/fpsyg.2020.579455, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Okamoto K, Nakashima T, Shinohara M, Negishi-Koga T, Komatsu N, Terashima A, et al. Osteoimmunology: the conceptual framework unifying the immune and skeletal systems. Physiol Rev. (2017) 97:1295–349. doi: 10.1152/physrev.00036.2016 [DOI] [PubMed] [Google Scholar]
  • 74.Ono T, Hayashi M, Sasaki F, Nakashima T. RANKL biology: bone metabolism, the immune system, and beyond. Inflammat Regener. (2020) 40:2. doi: 10.1186/s41232-019-0111-3, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75.Yap TA, Parkes EE, Peng W, Moyers JT, Curran MA, Tawbi HA. Development of immunotherapy combination strategies in Cancer. Cancer Discov. (2021) 11:1368–97. doi: 10.1158/2159-8290.CD-20-1209, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Whelehan DF, Conlon KC, Ridgway PF. Medicine and heuristics: cognitive biases and medical decision-making. Ir J Med Sci. (2020) 189:1477–84. doi: 10.1007/s11845-020-02235-1, [DOI] [PubMed] [Google Scholar]
  • 77.Body JJ, Bone HG, de Boer RH, Stopeck A, van Poznak C, Damião R, et al. Hypocalcaemia in patients with metastatic bone disease treated with denosumab. Europ J Cancer. (2015) 51:1812–21. doi: 10.1016/j.ejca.2015.05.016, [DOI] [PubMed] [Google Scholar]
  • 78.Boquete-Castro A, Gómez-Moreno G, Calvo-Guirado JL, Aguilar-Salvatierra A, Delgado-Ruiz RA. Denosumab and osteonecrosis of the jaw. A systematic analysis of events reported in clinical trials. Clin Oral Implants Res. (2016) 27:367–75. doi: 10.1111/clr.12556, [DOI] [PubMed] [Google Scholar]
  • 79.Ikesue H, Mouri M, Tomita H, Hirabatake M, Ikemura M, Muroi N, et al. Associated characteristics and treatment outcomes of medication-related osteonecrosis of the jaw in patients receiving denosumab or zoledronic acid for bone metastases. Support Care Cancer. (2021) 29:4763–72. doi: 10.1007/s00520-021-06018-x, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.Otto S, Pautke C, Van den Wyngaert T, Niepel D, Schiødt M. Medication-related osteonecrosis of the jaw: prevention, diagnosis and management in patients with cancer and bone metastases. Cancer Treat Rev. (2018) 69:177–87. doi: 10.1016/j.ctrv.2018.06.007, [DOI] [PubMed] [Google Scholar]
  • 81.Ruggiero SL, Dodson TB, Aghaloo T, Carlson ER, Ward BB, Kademani D. American Association of Oral and Maxillofacial Surgeons' position paper on medication-related osteonecrosis of the Jaws-2022 update. J Oral Maxillofacial Surg. (2022) 80:920–43. doi: 10.1016/j.joms.2022.02.008, [DOI] [PubMed] [Google Scholar]
  • 82.Okuma S, Matsuda Y, Nariai Y, Karino M, Suzuki R, Kanno T. A retrospective observational study of risk factors for denosumab-related osteonecrosis of the jaw in patients with bone metastases from solid cancers. Cancers. (2020) 12:1209. doi: 10.3390/cancers12051209, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83.Migliorati CA. Oral complications in Cancer patients-medication-related osteonecrosis of the jaw (MRONJ). Front Oral Health. (2022) 3:866871. doi: 10.3389/froh.2022.866871, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84.O'Sullivan ED, Schofield SJ. Cognitive bias in clinical medicine. J R Coll Physicians Edinb. (2018) 48:225–32. doi: 10.4997/JRCPE.2018.306, [DOI] [PubMed] [Google Scholar]
  • 85.de Filette J, Andreescu CE, Cools F, Bravenboer B, Velkeniers B. A systematic review and Meta-analysis of endocrine-related adverse events associated with immune checkpoint inhibitors. Hormone Metab Res. (2019) 51:145–56. doi: 10.1055/a-0843-3366, [DOI] [PubMed] [Google Scholar]
  • 86.Blumenthal-Barby JS, Krieger H. Cognitive biases and heuristics in medical decision making: a critical review using a systematic search strategy. Med Decis Mak. (2015) 35:539–57. doi: 10.1177/0272989X14547740, [DOI] [PubMed] [Google Scholar]
  • 87.Coleman RE, Collinson M, Gregory W, Marshall H, Bell R, Dodwell D, et al. Benefits and risks of adjuvant treatment with zoledronic acid in stage II/III breast cancer. 10 years follow-up of the AZURE randomized clinical trial (BIG 01/04). J Bone Oncol. (2018) 13:123–35. doi: 10.1016/j.jbo.2018.09.008, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88.Mohsen F, Ali H, El Hajj N, Shah Z. Artificial intelligence-based methods for fusion of electronic health records and imaging data. Sci Rep. (2022) 12:17981. doi: 10.1038/s41598-022-22514-4, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89.Kovoor JG, McIntyre D, Chik W, Chow CK, Thiagalingam A. Clinician-created educational video resources for shared decision-making in the outpatient management of chronic disease: development and evaluation study. J Med Internet Res. (2021) 23:e26732. doi: 10.2196/26732, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90.Peng L, Wu YL. Immunotherapy in the Asiatic population: any differences from Caucasian population. J Thorac Dis. (2018) 10:S1482–93. doi: 10.21037/jtd.2018.05.106, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 91.Potter LN, Yap J, Dempsey W, Wetter DW, Nahum-Shani I. Integrating intensive longitudinal data (ILD) to inform the development of dynamic theories of behavior change and intervention design: a case study of scientific and practical considerations. Prevent Sci. (2023) 24:1659–71. doi: 10.1007/s11121-023-01495-4, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92.Stranford SA, Owen JA, Mercer F, Pollock RR. Active learning and technology approaches for teaching immunology to undergraduate students. Front Public Health. (2020) 8:114. doi: 10.3389/fpubh.2020.00114, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93.Hearne S, McDonnell M, Lavan AH, Davies A. Immune checkpoint inhibitors and cognition in adults with cancer: a scoping review. Cancers. (2025) 17:928. doi: 10.3390/cancers17060928, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 94.Eysenbach G, Till JE. Ethical issues in qualitative research on internet communities. BMJ. (2001) 323:1103–5. doi: 10.1136/bmj.323.7321.1103, [DOI] [PMC free article] [PubMed] [Google Scholar]
