Sci Adv. 2021 Nov 12;7(46):eabi4883. doi: 10.1126/sciadv.abi4883

Fig. 1. Fully unified pipeline for wild chimpanzee behavior recognition and analysis from raw video footage.


The pipeline consists of the following stages: (A) Frames and audio are extracted from raw video. (B) Body detection is performed over the video frames using a deep CNN single-shot detector (SSD) model, and the detections are tracked using a Siamese tracker. (C) The body tracks are classified (e.g., is this individual cracking nuts?) by a deep CNN audiovisual behavior model, using the audio data and the spatiotemporal visual information for each track. The system requires only the raw video as input and produces labeled body tracks, with temporal and spatial information as metadata. (D) This automated system can be used to perform large-scale analysis of behavior. Photo credit: Kyoto University, Primate Research Institute.
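
The data flow of stages (B) and (C) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Siamese tracker is replaced here by a much simpler greedy intersection-over-union (IoU) linker, the behavior model is reduced to an arbitrary callable, and every name (`Detection`, `Track`, `link_tracks`, `label_tracks`, `min_iou`) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Detection:
    frame: int   # frame index within the video
    box: tuple   # (x1, y1, x2, y2) in pixels


@dataclass
class Track:
    track_id: int
    detections: list = field(default_factory=list)
    label: str = ""


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_tracks(detections, min_iou=0.5):
    """Greedy IoU linking: a simple stand-in for the tracker in stage (B).

    Each detection joins the track whose last box is in the previous
    frame and overlaps it most (at least min_iou); otherwise it starts
    a new track.
    """
    tracks = []
    for det in sorted(detections, key=lambda d: d.frame):
        best, best_iou = None, min_iou
        for track in tracks:
            last = track.detections[-1]
            if last.frame == det.frame - 1:
                score = iou(last.box, det.box)
                if score >= best_iou:
                    best, best_iou = track, score
        if best is None:
            best = Track(track_id=len(tracks))
            tracks.append(best)
        best.detections.append(det)
    return tracks


def label_tracks(tracks, classify):
    """Stage (C): label each track with a caller-supplied classifier."""
    for track in tracks:
        track.label = classify(track)
    return tracks
```

In this sketch, `classify` stands in for the audiovisual behavior model: any function that takes a track and returns a label string can be passed in, which keeps the linking step independent of the classifier.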