Figure 6.
Early Fusion method pipeline. Given a query video, we extract and pre-process its visual and audio content. Then, we feed these data to one multi-input CNN, composed of two CNNs whose last fully-connected layers are concatenated. Three additional fully-connected layers follow to identify the actual source camera model.