2024 Apr 1;19(4):e0301098. doi: 10.1371/journal.pone.0301098

Fig 3. Examples of MMV audiovisual similarity estimates.

Visual frames from a video (top) with four possible audio streams (below). Each audio stream is shown with the audiovisual (AV) similarity score that MMV estimates when that stream is combined with the visual stream. The original audio stream yields the highest AV similarity; introducing temporal asynchrony (shifting the audio by 1.6 seconds) lowers it. Increasing the semantic distance between the audio and visual streams lowers the AV similarity further (from ‘train’ to ‘vehicle’ to ‘instrument’).
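
To make the two manipulations in the caption concrete, the following is a minimal sketch rather than the authors' pipeline. It assumes that an AV similarity score can be approximated by the cosine similarity between an audio embedding and a visual embedding in a shared space, and that the temporal asynchrony is a delay of the audio padded with silence; the caption specifies neither. The names `shift_audio` and `cosine_similarity` are hypothetical helpers, and no MMV model call is shown.

```python
import numpy as np


def shift_audio(waveform, sample_rate, shift_seconds):
    """Delay a 1-D waveform by shift_seconds, padding the start with silence.

    Assumption: the shift is a delay; the caption does not state the direction.
    The output has the same length as the input.
    """
    n = int(round(shift_seconds * sample_rate))
    if n <= 0:
        return waveform.copy()
    n = min(n, len(waveform))
    silence = np.zeros(n, dtype=waveform.dtype)
    return np.concatenate([silence, waveform[: len(waveform) - n]])


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros).

    Assumption: this stands in for the AV similarity score; the exact
    scoring used with MMV is not given in the caption.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0
    return float(a @ b / denom)


if __name__ == "__main__":
    sample_rate = 16_000  # hypothetical sample rate in Hz
    audio = np.random.default_rng(0).standard_normal(5 * sample_rate)
    delayed = shift_audio(audio, sample_rate, 1.6)  # 1.6 s asynchrony, as in the caption
    print(len(delayed) == len(audio))
```

In a real comparison, the original and shifted waveforms would each be passed through the audio encoder, and each resulting embedding would be scored against the same visual embedding with `cosine_similarity`.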