Algorithm 1 Prompt-level multimodal deception detection (zero-shot GPT-5).

Require: Dataset of videos with ground-truth labels
Require: Ablation flags: UseVideo, UseTranscript, UseEmotion
 1: Initialize metrics containers
 2: for each sample do
 3:     I ← ∅; T ← ∅; E ← ∅        ▹ per-sample frames, transcript, emotion string
 4:     if UseVideo then
 5:         I ← extract 16 uniformly spaced frames
 6:     end if
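Step 5's uniform sampling reduces to a pure index computation; a minimal sketch (the decoding backend that turns indices into frames, e.g. OpenCV, is assumed and not specified by the algorithm):

```python
def uniform_frame_indices(total_frames: int, k: int = 16) -> list[int]:
    """Return k frame indices spread uniformly across [0, total_frames - 1]."""
    if total_frames <= 0 or k <= 0:
        return []
    if k == 1:
        return [0]
    step = (total_frames - 1) / (k - 1)  # spacing so the first and last frames are included
    return [round(i * step) for i in range(k)]
```

For a 300-frame clip this selects index 0 through 299 inclusive; for clips shorter than k frames, indices simply repeat.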
 7:     if UseTranscript then
 8:         Extract audio; obtain ASR transcript T (Whisper-1)
 9:     end if
10:     if UseEmotion then
11:         Extract audio; compute emotion label e and confidence c (SpeechBrain wav2vec2)
12:         E ← "Detected emotion: e (c)"
13:     end if
14:     Build user prompt: include E (if any), T (if any), and an instruction to return strictly JSON
15:     Attach frames I (if any) as images to the same message
16:     System message: safety + research framing
17:
Query GPT-5 with deterministic decoding
-
18:
Parse first valid JSON object: {label, confidence, reasoning}
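Step 18's "first valid JSON object" can be recovered with `json.JSONDecoder.raw_decode`, scanning from each `{`; a minimal sketch assuming the model may wrap the object in prose or code fences:

```python
import json

def first_json_object(text: str):
    """Return the first parseable JSON object embedded in text, else None."""
    dec = json.JSONDecoder()
    i = text.find("{")
    while i != -1:
        try:
            obj, _ = dec.raw_decode(text, i)  # parses a value starting at index i
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass
        i = text.find("{", i + 1)
    return None
```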
19:     label                              ▹ final class: lie or truth
20:     confidence                         ▹ used only for analysis/threshold sweeps
21:     Store and compare with ground truth y
22: end for
23: Compute Accuracy, Precision/Recall/F1 per class, Macro-F1, MCC, Cohen's κ
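All step-23 metrics derive from the confusion counts. A minimal sketch for the binary case, making the formulas explicit (scikit-learn provides the same quantities; the `positive="lie"` convention is an assumption):

```python
import math

def binary_metrics(y_true, y_pred, positive="lie"):
    """Accuracy, macro-F1, MCC, and Cohen's kappa from two label lists."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    n = len(y_true)
    acc = (tp + tn) / n

    def f1(tp_, fp_, fn_):
        prec = tp_ / (tp_ + fp_) if tp_ + fp_ else 0.0
        rec = tp_ / (tp_ + fn_) if tp_ + fn_ else 0.0
        return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

    macro_f1 = (f1(tp, fp, fn) + f1(tn, fn, fp)) / 2  # negative class swaps fp/fn roles
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Cohen's kappa: observed agreement corrected for chance agreement
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (acc - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0
    return {"accuracy": acc, "macro_f1": macro_f1, "mcc": mcc, "kappa": kappa}
```

A degenerate predictor that always outputs one class scores 0 on both MCC and κ even at 50% accuracy, which is why the algorithm reports them alongside accuracy.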