Author manuscript; available in PMC: 2022 Apr 8.
Published in final edited form as: IEEE Trans Affect Comput. 2018 Sep 3;12(1):215–226. doi: 10.1109/taffc.2018.2868196

Fig. 2:

Audio is analyzed to determine the exact time point at which the practitioner said the child's name during a name-call. The power spectral density (PSD) of the recorded audio signal (2a) shows both audio from the movie stimuli (predominantly music) and instances of vocalization. Root mean square (RMS) values of the audio signal (2b) quantify the signal level at each time point and are used to detect the name-call prompt. Because the practitioner was instructed to deliver the name-call 15 seconds into the stimuli, in this example we can restrict the search to speech around that time point (green box) and detect the exact time point at which speech was at its maximum.
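The detection step described in the caption (frame-wise RMS, then the loudest frame inside a window around the expected 15 s prompt) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 50 ms frame length, 25 ms hop, ±2 s search window, the function names `frame_rms` and `detect_name_call`, and the synthetic demo signal (a quiet tone standing in for movie music, plus two louder bursts) are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def frame_rms(signal: Sequence[float], sample_rate: int,
              frame_s: float = 0.05, hop_s: float = 0.025) -> List[Tuple[float, float]]:
    """Return (frame_center_time_s, rms) for each analysis frame.

    Frame length and hop are hypothetical defaults (50 ms / 25 ms).
    """
    frame = max(1, int(round(frame_s * sample_rate)))
    hop = max(1, int(round(hop_s * sample_rate)))
    out = []
    for start in range(0, len(signal) - frame + 1, hop):
        chunk = signal[start:start + frame]
        # RMS = square root of the mean of the squared samples in the frame
        rms = math.sqrt(sum(x * x for x in chunk) / frame)
        center = (start + frame / 2) / sample_rate
        out.append((center, rms))
    return out


def detect_name_call(signal: Sequence[float], sample_rate: int,
                     expected_s: float = 15.0, half_window_s: float = 2.0) -> float:
    """Time (s) of the highest-RMS frame inside [expected - w, expected + w].

    expected_s = 15.0 follows the caption; the +/- 2 s window is an assumption
    standing in for the green box in Fig. 2b.
    """
    lo, hi = expected_s - half_window_s, expected_s + half_window_s
    candidates = [(t, r) for t, r in frame_rms(signal, sample_rate) if lo <= t <= hi]
    if not candidates:
        raise ValueError("no analysis frames fall inside the search window")
    return max(candidates, key=lambda tr: tr[1])[0]


# --- Synthetic demo (hypothetical data, not from the study) ---
SR = 1000  # low sample rate keeps the pure-Python demo fast
demo = [0.1 * math.sin(2 * math.pi * 220 * n / SR) for n in range(20 * SR)]  # "music"
for n in range(5000, 5300):    # loud event near 5 s, outside the search window
    demo[n] += 1.0 * math.sin(2 * math.pi * 300 * n / SR)
for n in range(15600, 15900):  # simulated name-call shortly after 15 s
    demo[n] += 0.8 * math.sin(2 * math.pi * 300 * n / SR)

detected = detect_name_call(demo, SR)
print(f"name-call detected at {detected:.3f} s")
```

Restricting the search to a window around the instructed prompt time is what lets the louder event near 5 s be ignored here; taking the maximum RMS over the whole recording would select that event instead.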