2017 Mar 8;7:43862. doi: 10.1038/srep43862

Figure 1. This figure illustrates the methodology to extract a sequence of types from an acoustic signal.

Waveform series A(t) are sampled at 16 kHz from the system. In (a) we consider an excerpt from A(t) that coincides with an individual articulating the words “the house”. In the middle panel we plot the instantaneous energy per unit time ε(t) = |A(t)|² for an excerpt of the top panel. The energy threshold Θ, defined as the instantaneous energy level above which a fixed percentage of the entire data lies, lets us unambiguously separate a token or voice event (a subsequence of time stamps for which ε(t) > Θ) from silence events of duration τ (ref. 47). The energy E_i released in voice event i (where i ∈ ℕ) is computed by integrating the instantaneous energy over the duration T_i of that event (the dark area in the figure denotes the energy released in a given voice event). Subsequently, by performing a linear binning, tokens are classified into bins that we call types (in the plot, E_A, E_B, … are different bins). The vocabulary V collects those types that appear at least once.
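
The pipeline described in the caption (threshold the instantaneous energy, integrate it over each voice event, then bin the event energies linearly) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the parameters `above_fraction` and `n_bins`, the rectangle-rule integration, and the toy signal are all assumptions made for the example. The caption does not state the percentage or the number of bins actually used.

```python
import math

def energy_threshold(energy, above_fraction):
    """Level Θ such that roughly `above_fraction` of the samples lie strictly above it."""
    ranked = sorted(energy)
    k = max(0, math.ceil((1.0 - above_fraction) * len(ranked)) - 1)
    return ranked[k]

def voice_events(signal, sample_rate, above_fraction):
    """Return (E_i, T_i) for each maximal run of samples with ε(t) > Θ."""
    dt = 1.0 / sample_rate
    energy = [a * a for a in signal]              # ε(t) = |A(t)|²
    theta = energy_threshold(energy, above_fraction)
    events, run_energy, run_len = [], 0.0, 0
    for e in energy + [theta]:                    # sentinel closes a run that reaches the end
        if e > theta:
            run_energy += e * dt                  # rectangle-rule integration (assumption)
            run_len += 1
        elif run_len:
            events.append((run_energy, run_len * dt))
            run_energy, run_len = 0.0, 0
    return events

def linear_types(energies, n_bins):
    """Assign each event energy to one of `n_bins` equal-width bins (its type)."""
    if not energies:
        return []
    lo, hi = min(energies), max(energies)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(energies)
    # The maximum energy falls on the upper edge, so clamp it into the last bin.
    return [min(n_bins - 1, int((e - lo) / width)) for e in energies]

# Toy signal (hypothetical): three bursts separated by silence, 1 sample per time unit.
signal = [0, 0, 1, 1, 0, 0, 0, 2, 2, 0, 0, 0, 3, 3, 0, 0]
events = voice_events(signal, sample_rate=1, above_fraction=0.375)
types = linear_types([E for E, _ in events], n_bins=4)
vocabulary = sorted(set(types))                   # types that appear at least once
```

In this toy case the three bursts yield three tokens whose energies fall into different bins, and one of the four bins stays empty, so it does not enter the vocabulary. For real recordings at 16 kHz the same functions apply with `sample_rate=16000`.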