CLEESE: An open-source audio-transformation toolbox for data-driven experiments in speech and music cognition"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This Python notebook contains code used in the paper: Burred, J.J., Ponsot, E., Goupil, L., Liuni, M. & Aucouturier, JJ. (2019) CLEESE: An open-source audio-transformation toolbox for data-driven experiments in speech and music cognition. It illustrates the analysis of reverse-correlation data from the paper's case studies #1 and #2. \n",
"Uses the CLEESE toolbox, available from http://forumnet.ircam.fr/product/cleese. \n",
"Author: JJ Aucouturier (CNRS/IRCAM, 2018)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import glob, os\n",
"import numpy as np\n",
"from numpy.polynomial import polynomial as P\n",
"import re, csv\n",
"import pandas as pd\n",
"from math import sqrt\n",
"import seaborn as sns\n",
"from scipy import stats\n",
"from scipy.stats import linregress\n",
"sns.set_style(\"whitegrid\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"Case-study 1: Speech intonation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We give here a proof of concept of how to use CLEESE in a reverse-correlation experiment to uncover what exact pitch contour drives participants' categorization of an utterance as interrogative or declarative."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Stimuli: One male speaker recorded a 426 ms utterance of the French word 'vraiment' ('really'), which can be heard either as a one-word statement or as a question. We used CLEESE to artificially manipulate the pitch contour of the recording. First, the original pitch contour (mean pitch = 105 Hz) was flattened to a constant pitch. Then, we added or subtracted a constant pitch gain ($\\pm$20 cents, i.e. $\\pm$ one fifth of a semitone) to create the 'high-' or 'low-pitch' versions presented in each trial. Finally, we added Gaussian 'pitch noise' (i.e. pitch-shifting) to the contour by sampling pitch values at 6 successive time-points, using a normal distribution (SD = 70 cents; clipped at $\\pm$2.2 SD), linearly interpolated between time-points.\n",
"\n",
"Procedure: 700 pairs of randomly manipulated voices were presented to each of N=5 observers (male: 3, M=22.5yo), all native French speakers with self-reported normal hearing. In each trial, participants listened to a pair of randomly modulated voices and were asked which of the two versions sounded more interrogative. The inter-stimulus interval within each trial was 500 ms, and the inter-trial interval was 1 s. \n",
"\n",
"Analysis: We compute a first-order temporal kernel (i.e., a 7-point vector) for each participant, as the mean pitch contour of the voices classified as interrogative minus the mean pitch contour of the voices classified as non-interrogative. Kernels are then normalized by dividing them by the absolute sum of their values, and averaged over all participants for visualization. We use a one-way repeated-measures ANOVA on the temporal kernels to test for an effect of segment on pitch shift, with post-hoc comparisons computed using Bonferroni-corrected Tukey tests."
]
},
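The kernel computation described in the Analysis paragraph can be sketched on synthetic data before touching the real responses. The block below is a minimal illustration, not the notebook's actual pipeline: the simulated observer, the helper names (`random_contours`, `first_order_kernel`, `normalize_kernel`, `simulate_participant`), the use of hard clipping for the pitch noise, and the reading of "absolute sum" as the sum of absolute values are all assumptions made for the example.

```python
import numpy as np

SD_CENTS = 70.0   # from the text: SD of the Gaussian pitch noise, in cents
CLIP_SD = 2.2     # from the text: noise clipped at +/- 2.2 SD
N_POINTS = 7      # assumption: kernel length quoted in the Analysis paragraph


def random_contours(n_trials, rng, n_points=N_POINTS):
    """Draw Gaussian pitch shifts (cents), one row per stimulus, hard-clipped."""
    shifts = rng.normal(0.0, SD_CENTS, size=(n_trials, n_points))
    limit = CLIP_SD * SD_CENTS
    return np.clip(shifts, -limit, limit)


def first_order_kernel(chosen, rejected):
    """Mean contour of the chosen stimuli minus mean contour of the rejected ones."""
    return np.mean(chosen, axis=0) - np.mean(rejected, axis=0)


def normalize_kernel(kernel):
    """Divide by the sum of absolute values (one reading of 'absolute sum')."""
    return kernel / np.sum(np.abs(kernel))


def simulate_participant(n_trials, rng):
    """Hypothetical observer: picks the stimulus with the higher final pitch."""
    first = random_contours(n_trials, rng)
    second = random_contours(n_trials, rng)
    pick_first = first[:, -1] > second[:, -1]
    chosen = np.where(pick_first[:, None], first, second)
    rejected = np.where(pick_first[:, None], second, first)
    return normalize_kernel(first_order_kernel(chosen, rejected))


rng = np.random.default_rng(0)
# 5 simulated participants, 700 trial pairs each, as in the Procedure paragraph
kernels = np.array([simulate_participant(700, rng) for _ in range(5)])
mean_kernel = kernels.mean(axis=0)
print(np.round(mean_kernel, 2))
```

Because the simulated observer only attends to the last time-point, the averaged kernel should peak there. With real data, `chosen` and `rejected` would instead be built from each participant's recorded responses and the pitch values stored for each stimulus.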
{
"cell_type": "markdown",
"metadata": {},
"source": [
"