2026 Feb 19;14:1781216. doi: 10.3389/fpubh.2026.1781216

Table 2.

Discriminative performance of multimodal feature groups, including representative descriptors, Δp, prevalence ratio (PR), and their interpretability and clinical relevance.

| Feature group | Representative features | Δp and PR | Interpretability and clinical relevance |
| --- | --- | --- | --- |
| Cross-modal coherence | LSE-D, LSE-C, phoneme-viseme mismatch, audiovisual delay, face-voice identity coherence, scene-audio semantic agreement | Δp = 0.21–0.22, PR = 2.5–2.7; identity Δp = 0.19, PR = 2.5; scene-audio Δp = 0.18, PR = 2.6 | Strongest discriminators. Reflect physiological synchrony of speech and orofacial motion; essential for assessing articulation and mandibular movement in teleconsultations. |
| Acoustic features | F0 variability, jitter, shimmer, RT60, DRR, MFCC deviation, formant drift | F0 variability −22 percent; jitter −31 percent; shimmer −27 percent; RT60: 0.12 s vs. 0.28 s (Δp = 0.16, PR = 2.3) | Capture natural microvariability of human speech and room acoustics; relevant for judging clarity of speech and verifying plausibility of clinical environments. |
| Visual dynamic features | Facial micro-motion, mesh topology stability | Micro-motion Δp = 0.19 | Reflect real biomechanical dynamics of perioral and ocular regions; useful for evaluating extraoral symmetry. |
| Visual geometric and photometric features | Radial distortion, rolling shutter deviation, vignetting | Radial distortion Δp = 0.17, PR = 2.1; rolling shutter Δp = 0.16, PR = 2.0 | Derived from physical camera models; reliable in low-bitrate conditions; valuable for forensic transparency. |
| Bitstream and texture descriptors | DCT histogram asymmetry, QP entropy, GOP irregularity | Moderate discriminative power; PR ≈ 1.5 | Identify generator-specific artifacts and compression traces not consistent with real camera pipelines. |

Cross-modal coherence features demonstrate the strongest separation between authentic and synthetic recordings, while acoustic and visual features provide complementary indicators of natural speech dynamics, biomechanical facial motion, optical consistency, and environmental plausibility relevant to dental teleconsultations.
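
The table reports Δp and PR without restating how they are computed. The sketch below shows one common reading, offered as an assumption and not as the authors' confirmed procedure: Δp is taken as the difference between the proportions of recordings in two groups that exhibit a feature, and PR as the ratio of those proportions. Which group is which (for example, synthetic versus authentic) is likewise assumed, so the groups are labeled generically as A and B. The names `Discrimination`, `discrimination`, and `implied_prevalences` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discrimination:
    """Summary of how strongly a binary feature separates two groups."""
    p_a: float               # proportion of group A exhibiting the feature
    p_b: float               # proportion of group B exhibiting the feature
    delta_p: float           # assumed definition: p_a - p_b
    prevalence_ratio: float  # assumed definition: p_a / p_b


def discrimination(flag_a: int, n_a: int, flag_b: int, n_b: int) -> Discrimination:
    """Compute delta-p and the prevalence ratio from raw counts.

    flag_a / flag_b: number of recordings in each group exhibiting the feature.
    n_a / n_b: total number of recordings in each group.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("group sizes must be positive")
    p_a = flag_a / n_a
    p_b = flag_b / n_b
    # The ratio is undefined when group B never exhibits the feature.
    ratio = p_a / p_b if p_b > 0 else float("inf")
    return Discrimination(p_a, p_b, p_a - p_b, ratio)


def implied_prevalences(delta_p: float, prevalence_ratio: float) -> tuple[float, float]:
    """Recover (p_a, p_b) from a reported delta-p and PR.

    Solves p_a - p_b = delta_p and p_a / p_b = PR, which gives
    p_b = delta_p / (PR - 1) and p_a = PR * p_b.
    """
    if prevalence_ratio == 1:
        raise ValueError("PR = 1 implies delta_p = 0; prevalences are not identifiable")
    p_b = delta_p / (prevalence_ratio - 1)
    return prevalence_ratio * p_b, p_b


if __name__ == "__main__":
    # Illustrative counts only; these are not data from the study.
    print(discrimination(flag_a=35, n_a=100, flag_b=14, n_b=100))
    # Lower bound reported for the cross-modal coherence group.
    print(implied_prevalences(delta_p=0.21, prevalence_ratio=2.5))
```

Under these assumed definitions, a reported pair such as Δp = 0.21 and PR = 2.5 would correspond to the feature appearing in about 35 percent of one group and about 14 percent of the other. Reading the two numbers together is useful because PR alone can look large when both proportions are small, while Δp alone hides the relative scale.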