Abstract
Background:
The acoustic startle response (ASR) is a simple reflex that produces a whole-body motor response after animals hear a brief, loud sound and is used as a multisensory tool across many disciplines. Unfortunately, a standardized method for recording, processing, and analyzing ASRs has yet to be established, leading to high variability in the collection, analysis, and interpretation of ASRs within and between laboratories.
New Method:
ASR waveforms collected from young adult CBA/CaJ mice were normalized, with features extracted from the waveform itself, the resulting power spectral density estimates, and the continuous wavelet transforms. The features were then partitioned into training and test/validation sets. Machine learning methods from different families of algorithms were used to combine startle-related features into robust predictive models that predict whether an ASR waveform is a startle or a non-startle.
Results:
An ensemble of several machine learning models resulted in an extremely robust model to predict whether an ASR waveform is a startle or non-startle with a mean ROC of 0.9779, training accuracy of 0.9993, and testing accuracy of 0.9301.
Comparison with Existing Methods:
ASR waveforms analyzed using the threshold and RMS techniques resulted in over 80% of accepted startles actually being non-startles when manually classified versus 2.2% for the machine learning method, resulting in statistically significant differences in ASR metrics (such as startle amplitude and pre-pulse inhibition) between classification methods.
Conclusions:
The machine learning approach presented in this paper can be adapted to nearly any ASR paradigm to accurately process, sort, and classify startle responses.
Keywords: acoustic startle response, pre-pulse inhibition, machine learning, random forest, ensemble models
1. Introduction
The acoustic startle reflex (ASR) and its modification using pre-pulse stimuli have consistently been among the most used diagnostic tools for assessing laboratory animals’ internal state over the past several decades (Davis, 1984; Koch, 1999). Modification of this simple reflex resulting from a brief loud sound has been used in many neuropsychological disciplines for evaluating hearing (Lauer et al., 2017), tinnitus (Galazyuk and Hébert, 2015; Gerum et al., 2019; Turner et al., 2006), and many neuropsychiatric conditions such as schizophrenia, bipolar disorder, autism, and other disorders that disrupt sensory-motor gating (Braff et al., 2001; Kohl et al., 2013, 2014). Unfortunately, a standardized method of recording, measuring, processing, and analyzing startle reflex waveforms has yet to be established, which leads to high variability within and between laboratories using this assessment tool. Reasons for this variability within a lab are numerous and include animal awareness, habituation/sensitization rates, neural plasticity, anxiety/stress levels, and neuro-muscular interactions. Even more explanations for variability exist between laboratories, including species/strain variations, loudspeaker/wav file sound quality, recording platform type, platform sensitivity, and mode of assessment, which ranges from whole-body startle in mice/rats (Horlington, 1968; Grimsley et al., 2015) to the Preyer reflex (small ear movements) in guinea pigs (Berger et al., 2013; Böhmer, 1988) and the eye-blink reflex in human and primate research (Säring and von Cramon, 1981; Filion et al., 1993; Grillon et al., 1997; Winslow et al., 2002). These factors result in variable ASR waveform measurements which, if processed and analyzed with the same methodology, could allow verifiable comparisons between experiments and laboratories that use completely different techniques, since the differences likely lie in the characteristics of the startle waveforms included in the analysis.
Standardization of ASR waveform classification is extremely important for many reasons (Lauer et al., 2017). Previous work has included eliminating the highest and lowest startle responses for each frequency (Longenecker and Galazyuk, 2011), using Grubbs’ test for outliers (Longenecker and Galazyuk, 2012), eliminating startle responses whose maximum magnitude after startle presentation is less than that before startle presentation, eliminating startle responses whose RMS after startle presentation is less than that before startle presentation, template matching (Grimsley et al., 2015), and discarding invalid trials containing movement in excess of a threshold prior to stimulus presentation (Schilling et al., 2017) as effective procedures for cleaning startle data by removing “non-startles,” which occur frequently in animals or humans continually presented with loud sounds. Non-startles occur more often when animals are presented with low-intensity startle stimuli as well as when an intense pre-pulse is presented prior to the startle stimulus (Longenecker et al., 2016). Since pre-pulse inhibition using stimuli placed before the startle elicitor is one of the most critical aspects of the startle reflex studied, proper classification of startles when pre-pulses are presented can dramatically influence the results. However, because each laboratory might utilize different hardware (sensors, filters, etc.) and/or waveform filter configurations to record animals’ startle-related movements, suggesting a standardized template is problematic. Thus, an alternative approach should be used to classify startle response waveforms.
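Two of the cleaning rules above (rejecting trials whose post-stimulus peak or RMS does not exceed the pre-stimulus value) can be sketched in a few lines. The sketch below is illustrative only; the function name, the 125-sample pre-stimulus window, and the synthetic trial are assumptions, not part of any cited toolbox.

```python
# Literature-based cleaning rules sketched in Python: reject a trial
# unless both the peak magnitude and the RMS after the startle stimulus
# exceed their pre-stimulus values.
import numpy as np

def passes_peak_and_rms_checks(waveform, onset_index):
    """Return True if the trial survives both cleaning rules."""
    before = waveform[:onset_index]
    after = waveform[onset_index:]
    peak_ok = np.max(np.abs(after)) > np.max(np.abs(before))
    rms_ok = np.sqrt(np.mean(after ** 2)) > np.sqrt(np.mean(before ** 2))
    return bool(peak_ok and rms_ok)

# Synthetic trial: quiet baseline, then a burst of movement after the SES
rng = np.random.default_rng(0)
trial = np.concatenate([0.01 * rng.standard_normal(125),
                        rng.standard_normal(375)])
```

A trial whose baseline movement exceeds its post-stimulus movement fails both checks and would be discarded under these rules.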
Machine learning is an evolving field of computational algorithms which learn from data in order to improve their performance on a particular classification or prediction task (Mjolsness and DeCoste, 2001; El Naqa and Murphy, 2015; Kotsiantis et al., 2006). Machine learning has been successfully used in genomics (Libbrecht and Noble, 2015), medical imaging and pathology (Komura and Ishikawa, 2019; Shen et al., 2017), as well as in the diagnosis and treatment of cancer (Goldenberg et al., 2019; Bejnordi et al., 2017). Machine learning could even be used in clinical settings to aid the navigation of the complex health trajectory of an individual patient through machine learning performed on data from many patients (Rajkomar et al., 2019). In this paper, we describe an automated, supervised machine learning approach to classify ASR waveforms acquired using various stimulus protocols and levels, eliciting ASRs of various magnitudes and shapes.
2. Experimental procedure
Mice were individually tested in wire mesh cages resting on a custom-built platform connected to piezoelectric transducers, located inside one of eight identical sound-attenuated chambers. The custom-built 3D-printed platforms consist of a base and four piezoelectric transducers, one in each of the four quadrants of the platform. The piezoelectric transducers are in physical contact with the top animal compartment via four 3D-printed rods. Acoustic stimuli were presented through Fostex model FT17H speakers (Fostex Company, Tokyo, Japan) located 30 cm directly above the transducer platform and controlled with an RZ6 multi-I/O processor from Tucker-Davis Technologies (TDT, Alachua, FL) and custom MATLAB software (The MathWorks, Inc., Natick, MA). All signals were calibrated prior to testing with a ¼” microphone placed at the level of the animal’s pinna in the sound chamber and fed to a Larson Davis model 2221 preamplifier (PCB Piezotronics, Inc., Depew, NY). Transducer responses to movement (in millivolts) were recorded for 125 msec prior to and 375 msec following the startle stimulus. Ten trials of every frequency/intensity combination were collected during test days. Animals were given 5 min to acclimate inside the sound chamber before testing. Each animal received three testing sessions, every other day, over the course of one week, which included startle input-output using noise bursts (startle I/O) and tonal pre-pulses (TPP). For startle I/O, the startle-eliciting sound (SES) was a broadband Gaussian noise burst (0.5 – 40 kHz) ranging in intensity from 0 to 115 dB SPL. The duration was 20 msec (1 msec rise/fall time), presented with pseudorandom inter-trial intervals between 10 and 20 seconds. The TPP protocol consisted of 50 msec tone bursts of 8, 16, or 24 kHz presented at 45, 55, 65, and 75 dB SPL, presented 50 msec prior to the SES, which was the same broadband Gaussian noise burst as for startle I/O but presented at 110 dB SPL.
In another group of mice, the pre-pulse consisted of a silent gap in an ongoing noise, a paradigm referred to as gap-induced pre-pulse inhibition of the acoustic startle reflex (GPIAS). This test has been adapted and successfully validated as a powerful technique for tinnitus assessment in laboratory animals (Turner et al., 2006). The method relies on a reduction of the acoustic startle reflex by a preceding 50 msec silent gap embedded in a constant carrier noise/tone. Mice with either noise-induced or salicylate-induced tinnitus have a decreased ability to detect gaps of silence, and therefore their reduction of the startle reflex is significantly smaller compared to normal mice. Gap carriers consisted of four 1/3-octave narrowband center frequencies (10, 16, 24, and 32 kHz) and a wideband noise (2 – 64 kHz). These carriers were presented at 70 dB SPL, whereas the startle stimulus was presented at 110 dB SPL. A total of 180 trials for each carrier was presented, including a no-gap control.
A total of 6,911 acoustic startle reflex (ASR) waveforms were collected from 40 young CBA/CaJ mice (20 male/20 female) ranging in age from 2.3 to 6.2 months (mean age of 3.8 months) over a 14-month period. These ASR waveforms were manually classified as startle (4,214) or non-startle (2,697) by two experienced behavioral neuroscientists. The random sampling of 100 startle and 100 non-startle waveforms presented in panel A of Figure 1 shows significant variation in shape, amplitude, and timing. This variability is also shown in Figure 1 panel B, where the mean and one standard deviation around the mean of all 6,911 startle waveforms are presented.
Figure 1:

Initial sorting and characteristics of waveforms separated into startle (blue) and non-startle (red) classes of A: 100 startle and 100 non-startle randomly selected acoustic startle reflex waveforms and B: the mean (solid line) and standard deviation (shading) of all acoustic startle waveforms. The onset of the SES occurs at 0.0 msec.
3. Machine learning procedure
3.1. Acoustic startle response waveform preprocessing
Due to the variability in the ASR waveforms shown in Figure 1, all ASR waveforms were centered (by subtracting the mean) and scaled (by dividing the centered waveform by the standard deviation) using the mean and standard deviation of the ASR waveform before the SES is presented (t < 0), producing normalized ASR waveforms with units of the number of standard deviations from the pre-SES mean (Halaki and Gi, 2012; Lee et al., 2019; Lara-Cueva et al., 2016; Hartmann et al., 2019). Figure 2 shows raw (panel A) and normalized (panel B) startle waveforms from 10 trials from a single 6-month-old animal subjected to the startle I/O protocol at 85 dB SPL. Comparing specific trials presented in panel A to those in panel B shows marked changes in non-startle trials 5 and 6, in which scaling by the pre-startle mean and standard deviation highlights the true relationship between the components of the startle waveform and the startle elicitor.
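The normalization step can be sketched as follows. The paper's analysis was done in R; this Python version, the helper name, and the 4 kHz sampling rate are illustrative assumptions.

```python
# Baseline normalization as described above: center and scale each ASR
# waveform by the mean and standard deviation of its pre-SES segment
# (t < 0), yielding units of pre-stimulus standard deviations.
import numpy as np

def normalize_asr(waveform, fs, pre_ms=125.0):
    """Return the waveform in units of pre-stimulus standard deviations."""
    n_pre = int(round(pre_ms * 1e-3 * fs))  # samples recorded before the SES
    baseline = waveform[:n_pre]
    return (waveform - baseline.mean()) / baseline.std()

fs = 4000.0  # assumed sampling rate [Hz]
t = np.arange(-0.125, 0.375, 1.0 / fs)
rng = np.random.default_rng(0)
# Synthetic trial: low-level sensor noise plus a 100 Hz burst after t = 0
raw = 2e-3 * np.sin(2 * np.pi * 100 * t) * (t > 0) \
      + 1e-4 * rng.standard_normal(t.size)
z = normalize_asr(raw, fs)
```

After normalization the pre-SES segment has zero mean and unit standard deviation, so post-SES excursions are directly comparable across trials of very different raw amplitudes.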
Figure 2:

Representative 85 dB SPL Startle I/O ASR waveforms presented as A: raw and B: normalized (centered and scaled) signals for 10 trials in a single session. Note the difference in scale between startles and non-startles.
3.2. Feature engineering
Machine learning methods use features extracted from a dataset to build a model to predict an outcome (Torkkola, 2003). Feature extraction usually involves transforming the original dataset into features with stronger predictive ability than the original dataset itself (Cai et al., 2018; Vergara and Estévez, 2014). Features consisting of the individual voltages at each time point in the original ASR waveform dataset possess weak predictive ability and thus must undergo several transformations in order to extract features with much stronger predictive capability. These transformations include waveform normalization (described in Section 3.1) and the subsequent extraction of waveform-derived features, power spectral density derived features, and continuous wavelet transform derived features in order to obtain features describing the waveform itself as well as its average frequency and temporal content. Feature selection is the process of selecting a subset of features extracted from a dataset with the goal of constructing the smallest set of features which collectively possess high predictive capability without containing redundant information and which generalize across samples within the current and future datasets (Cai et al., 2018; Vergara and Estévez, 2014; Guyon and Weston, 2003).
3.2.1. Waveform derived features
Several features can be directly derived from the normalized waveforms shown in panel B of Figure 2, such as the maximum magnitude of the normalized waveform and the time at which it occurs. Figure 3 shows significant differences in the maximum magnitude of the normalized ASR waveforms, with startles exhibiting a mean maximum magnitude of 8,745 versus 381 for non-startles, a 23× difference on average. In addition, the majority of times at maximum magnitude occur within a 100 msec window after the startle elicitor is presented, whereas the time at maximum magnitude for non-startles occurs at almost any time. The significant separation between the majority of the maximum magnitudes presented in Figure 3 makes the maximum magnitude of the normalized ASR waveforms a feature with very strong predictive capability. However, Figure 3 also shows overlap between startles and non-startles in the maximum magnitudes, especially when the maximum magnitudes are around 500. This overlap in both the maximum magnitude and the time at maximum magnitude of the normalized ASR waveforms indicates that more features will be needed for a machine learning algorithm to accurately classify the ASR waveforms.
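A minimal sketch of extracting these two waveform-derived features, using Table 1's abbreviations for the feature names; the synthetic single-excursion waveform is hypothetical.

```python
# The two waveform-derived features discussed above: maximum magnitude
# of the normalized waveform and the time (relative to SES onset at
# t = 0) at which it occurs.
import numpy as np

def waveform_features(z, t):
    """Return max.mag and t.max.mag for a normalized waveform z on time grid t."""
    i = int(np.argmax(np.abs(z)))
    return {"max.mag": float(np.abs(z[i])), "t.max.mag": float(t[i])}

t = np.arange(-0.125, 0.375, 1.0 / 4000.0)
z = np.zeros(t.size)
z[np.argmin(np.abs(t - 0.05))] = -10.0  # a single large excursion at 50 msec
feats = waveform_features(z, t)
```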
Figure 3:

Individual maximum magnitude versus time of the maximum magnitude of the normalized ASR waveforms determined to be a startle (blue) or a non-startle (red) in the center with the distribution of the maximum magnitude to the right and the distribution of the time of the maximum magnitude on top. Startle and non-startle labeled points include a startle normalized ASR waveform with the maximum magnitude (A), a non-startle normalized ASR waveform with the minimum magnitude (B), and both a startle and non-startle normalized ASR waveform with closely overlapping maximum magnitude and time of maximum magnitude (C and D).
Example normalized ASR waveforms from the points A through D labeled in Figure 3 are shown in Figure 4. Panels A and B represent startle and non-startle waveforms containing the largest and smallest maximum normalized waveform magnitude respectively with widely varying scales. Panels C and D represent example startle and non-startle waveforms with maximum normalized waveform magnitudes that fall in the overlap region for maximum normalized ASR magnitude and thus have similar scales.
Figure 4:

Normalized ASR waveform from the labeled points in Figure 3 of A: startle with maximum waveform magnitude, B: non-startle with minimum waveform magnitude, C: startle with maximum waveform magnitude overlapping with non-startles, and, D: non-startle with maximum waveform magnitude overlapping with startles.
3.2.2. Power spectrum density derived features
Figures 1 and 2 show ASR waveforms exhibiting periodic behavior; thus, analysis of the frequency content of each waveform will likely yield additional features. The power spectral density (PSD) was estimated from the periodogram of the entire 0.5 s long normalized ASR waveform. Each PSD was smoothed using a modified Daniell moving average smoother (width = 3) (Golibagh Mahyari, 2010; Bloomfield, 2011). The smoothed PSDs of the same 10 ASR waveforms presented in Figure 2 are shown in Figure 5 and demonstrate that most of the energy in the PSD spectrum is concentrated between 10 Hz and 150 Hz. In addition, the analysis shows a marked difference between the magnitudes of the PSD for startles versus non-startles for these 10 trials.
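A periodogram PSD with a modified Daniell smoother can be sketched as below; the span-3 kernel with halved end weights ([0.25, 0.5, 0.25], the usual modified Daniell form) and the 4 kHz sampling rate are assumptions, and the paper's smoothing was done in R.

```python
# Periodogram-based PSD estimate smoothed with a width-3 modified
# Daniell kernel, applied to a synthetic waveform with a dominant
# 100 Hz component (as in Figure 5).
import numpy as np
from scipy.signal import periodogram

def smoothed_psd(z, fs):
    """Periodogram PSD smoothed with a modified Daniell kernel of span 3."""
    freqs, psd = periodogram(z, fs=fs)
    kernel = np.array([0.25, 0.5, 0.25])  # modified Daniell: end weights halved
    return freqs, np.convolve(psd, kernel, mode="same")

fs = 4000.0
t = np.arange(0.0, 0.5, 1.0 / fs)
z = np.sin(2 * np.pi * 100 * t)  # dominant 100 Hz component
freqs, psd = smoothed_psd(z, fs)
peak_freq = freqs[np.argmax(psd)]
```

With a 0.5 s record the frequency resolution is fs/N = 2 Hz, matching the resolution noted in the Figure 6 caption.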
Figure 5:

Smoothed power spectral density estimates of the centered and scaled representative 85 dB SPL Startle I/O ASR waveforms from the same 10 trials presented in Figure 2.
Several distinguishing features of the smoothed PSD estimates of the normalized ASR waveforms include the maximum PSD power as well as the frequency at which that maximum occurs. Figure 6 shows significant differences in the maximum power of the PSD estimate between startles and non-startles but no significant differences in the frequency where that maximum occurs. The mean maximum power in the PSD estimate is 309,580.2 Hz−1 for startles versus 13,628.07 Hz−1 for non-startles, a 22.7× difference on average. Figure 6 also shows a slight overlap in the maximum power of the PSD for startles versus non-startles, indicating additional features may be needed in order to accurately classify the ASR waveforms.
Figure 6:

Individual maximum power of the power spectral density estimate versus frequency of the maximum power of the spectral density estimate of the normalized ASR waveforms determined to be a startle (blue) or a non-startle (red) in the center with the distributions of each next to the axes. Note the frequency resolution in the power spectral density estimate is about 2 Hz and is determined by the sampling frequency and the length of recording.
3.2.3. Continuous wavelet transform derived features
Both the magnitude and frequency of the periodic behavior of the ASR waveforms presented in Figures 1, 2, and 4 are functions of time. However, the PSD estimates presented in Figures 5 and 6 are averages taken over the entire 0.5 s timespan of the normalized ASR waveforms, with the temporal information removed since every time point contributes to the PSD estimate. We then examined the continuous wavelet transform (CWT) of the normalized ASR waveforms, which allows for the analysis of the frequency content of time-varying signals in which the frequency content changes with time (Veer and Agarwal, 2015; Allen, 1977; Sejdić et al., 2009). The CWT has been shown to have several advantages over the short-time Fourier transform (STFT) for the analysis of frequency content as a function of time in time-varying signals, with a balanced compromise between time and frequency resolution as well as more accurate time-frequency power estimates (Zhang et al., 2003). The CWT of each normalized ASR waveform was computed using the Morlet wavelet via the WaveletComp package for R (Roesch and Schmidbauer, 2015).
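Since WaveletComp is an R package, the sketch below hand-rolls a Morlet CWT in Python purely for illustration; the ω0 = 6 center frequency, the scale-to-frequency mapping, and the truncated wavelet support are conventional choices assumed here, not taken from the paper.

```python
# Morlet CWT power (frequencies x time) computed by direct convolution
# with a Morlet wavelet at each analysis frequency.
import numpy as np

def morlet_cwt_power(z, fs, freqs, w0=6.0):
    """Return the CWT power matrix of signal z for the given analysis frequencies."""
    n = z.size
    power = np.empty((len(freqs), n))
    for i, f in enumerate(freqs):
        scale = w0 * fs / (2.0 * np.pi * f)      # scale in samples
        m = int(min(10.0 * scale, n))            # truncate the wavelet support
        k = np.arange(-(m // 2), m - m // 2) / scale
        wavelet = np.pi ** -0.25 * np.exp(1j * w0 * k - 0.5 * k ** 2)
        coef = np.convolve(z, wavelet.conj()[::-1], mode="same") / np.sqrt(scale)
        power[i] = np.abs(coef) ** 2
    return power

fs = 4000.0
t = np.arange(0.0, 0.5, 1.0 / fs)
z = np.sin(2 * np.pi * 100 * t)                  # 100 Hz test signal
power = morlet_cwt_power(z, fs, np.array([50.0, 100.0, 200.0]))
```

For a pure 100 Hz tone, the power matrix concentrates in the row whose analysis frequency matches the signal, which is the time-resolved behavior the PSD cannot provide.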
The wavelet power spectra of the same 4 ASR waveforms labeled in Figure 3 and presented in Figure 4 are shown in Figure 7. These representative wavelet power spectra demonstrate increased maximum wavelet power for startles versus non-startles as well as the wavelet power being concentrated around the high amplitude activity after the startle elicitor for startles versus being widely distributed for non-startles. Several features can be extracted from the CWT power spectra from each normalized ASR waveform, capturing the time-varying frequency content of each waveform at various times (or time spans) and frequencies. These features can then be combined with others thereby increasing the predictive ability (Cai et al., 2018; Vergara and Estévez, 2014).
Figure 7:

Wavelet power spectra with waveforms of A: startle with maximum waveform magnitude, B: non-startle with minimum waveform magnitude, C: startle with maximum waveform magnitude overlapping with non-startles, and D: non-startle with maximum waveform magnitude overlapping with startles.
Figure 8 shows the distribution of eight features extracted from the CWT spectra of each normalized ASR waveform. When considered separately, many CWT-derived features provide good separation between startles and non-startles, but still with some overlap depending on the feature. However, a robust model that can accurately distinguish between startles and non-startles is produced when the CWT-derived features are combined together, and with the other features described above, via machine learning algorithms. Panel A shows the distribution of the maximum CWT power and the time of maximum CWT power for startles and non-startles. Panel A clearly shows the time of maximum CWT power for most startles being concentrated between 50 msec and 110 msec after the startle elicitor, with a mean of 74.4 msec, in comparison to the time of maximum CWT power for non-startles spanning all times, with a mean of 102.8 msec. In addition, the mean maximum wavelet power for startles is 167,963.3 versus 103,573.1 for non-startles, a 1.62× increase over non-startles.
Figure 8:

Distribution of various continuous wavelet transform (CWT) derived features from normalized ASR waveforms determined to be a startle (blue) or a non-startle (red), including A: maximum CWT power versus the time of the maximum CWT power, B: mean CWT power at 350 msec after the SES versus the mean CWT power at 50 msec before the SES, C: mean CWT power at 100 Hz versus the mean CWT power at 2,048 Hz, and D: mean CWT power around the SES versus the mean CWT power at 50 msec after the SES.
Panel B shows the distribution of the mean CWT power, across all periods and frequencies, evaluated at 350 msec after the SES versus the same evaluated at 50 msec before the SES. These data show that most normalized ASR waveforms classified as startles exhibit much less CWT power 50 msec before the SES, with an average of 5.49 compared to an average of 645.05 for non-startles, a 117.5× difference. The mean CWT power 350 msec after the SES, essentially after the startle response, was also much lower for startles versus non-startles, with mean values of 17.34 versus 473.7, a 27.3× difference. Plotting the data of Figure 7 with CWT power on a log scale shows significant differences in CWT power between startles and non-startles both at 100 Hz, the characteristic frequency of the sensor measuring the platform movement, and at high frequencies such as 2,048 Hz, generating features with strong predictive capability. Panel C shows the distribution of the mean CWT power at 100 Hz versus the mean CWT power at 2,048 Hz across all times, with the mean CWT power evaluated at 2,048 Hz being lower for startles versus non-startles, with means of 0.0053 versus 0.148 respectively, a 28.2× difference. The mean CWT power evaluated at 100 Hz is highly concentrated at lower values for non-startles, while startles are distributed across the entire range, with mean values of 6,648.3 for startles versus 4,257.4 for non-startles.
Panel D shows the distribution of the mean CWT power around the startle elicitor (from 5 msec before the SES to 150 msec after the SES) versus the mean CWT power at 50 msec after the SES. The mean CWT power at 50 msec after the SES is concentrated below 2,500 for non-startles versus being relatively evenly distributed for startles. The mean CWT power measured 50 msec after the SES is 4,479.8 on average for startles but 1,850.6 for non-startles, a 2.42× increase. This analysis shows the mean CWT power around the SES to be higher for startles versus non-startles, with means of 2,725.6 and 1,782.5 respectively, a 1.53× difference. In addition, there is a clear dependence of the mean CWT power around the SES on the mean CWT power at 50 msec after the SES for non-startles but not for startles, in that there is a minimum mean CWT power around the SES for each mean CWT power at 50 msec after the SES.
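The time- and frequency-slice features of panels B through D reduce to averaging the CWT power matrix along fixed columns (times), fixed rows (frequencies), or time windows. A minimal sketch, with feature names following Table 1 and a frequencies × times matrix layout assumed:

```python
# Mean CWT power evaluated along the slices described above: at fixed
# times relative to the SES, at fixed frequencies, and over a window
# around the SES.
import numpy as np

def cwt_slice_features(power, freqs, times):
    """Extract slice-averaged features from a CWT power matrix (freqs x times)."""
    def at_time(t0):
        return float(power[:, np.argmin(np.abs(times - t0))].mean())
    def at_freq(f0):
        return float(power[np.argmin(np.abs(freqs - f0)), :].mean())
    window = (times >= -0.005) & (times <= 0.150)  # around the SES
    return {
        "mean.cwt.power.50ms.before.startle": at_time(-0.050),
        "mean.cwt.power.50ms.after.startle": at_time(0.050),
        "mean.cwt.power.350ms.after.startle": at_time(0.350),
        "mean.cwt.power.100.Hz": at_freq(100.0),
        "mean.cwt.power.around.startle": float(power[:, window].mean()),
    }

# Hypothetical power matrix: rows are frequencies, columns are times
freqs = np.array([50.0, 100.0, 200.0])
times = np.arange(-0.125, 0.375, 0.001)
power = np.ones((freqs.size, times.size))
power[1, :] = 2.0  # extra power along the 100 Hz row
feats = cwt_slice_features(power, freqs, times)
```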
Panels A through D of Figure 8 all show the distributions of CWT-based features, with many features showing separation between startles and non-startles. However, no single feature or pair of features extracted from the normalized ASR waveforms, as presented in Figures 3, 6, or 8, provides completely clear separation of startles from non-startles. Thus, the features extracted from the normalized ASR waveforms must be combined via machine learning algorithms in order to develop a robust model that can classify an ASR waveform as a startle versus a non-startle.
3.3. Feature pre-processing
The 17 features extracted from the normalized ASR waveforms are listed in Table 1, with the distributions of 12 of those features shown in Figures 3, 6, and 8. Several standard feature selection and pre-processing steps must be carried out to ensure that only features with high predictive value and without redundant information (either highly correlated and/or linearly dependent features) are used in any machine learning algorithm to predict startles versus non-startles (Guyon and Weston, 2003; Cai et al., 2018). Lastly, the selected features must themselves be centered and scaled in order to remove any bias toward high-magnitude features over low-magnitude features (Kotsiantis et al., 2006; Kuhn and Johnson, 2013).
Table 1:
Summary of features extracted from the normalized ASR waveforms including the feature abbreviation and description as well as the mean feature value and level of statistical significance for startles versus non-startles.
| feature | description | startle | non-startle | signif. |
|---|---|---|---|---|
| before.mean | Mean of the raw ASR waveform before the SES [V] | 1.077e−07 | −1.002e−06 | - |
| before.sd | Standard deviation of the raw ASR waveform before the SES [V] | 3.102e−05 | 0.0002918 | **** |
| max.mag | Maximum magnitude of the normalized ASR waveform [1] | 8745 | 381 | **** |
| t.max.mag | Time of maximum magnitude of the normalized ASR waveform [sec] | 0.07599 | 0.108 | **** |
| max.psd | Maximum power spectral density of the normalized ASR waveform [Hz−1] | 3.096e+05 | 1.363e+04 | **** |
| freq.max.psd | Frequency at maximum power spectral density of the normalized ASR waveform [Hz] | 69.87 | 61.35 | **** |
| mean.cwt.power.before.startle | Mean continuous wavelet transform power before the SES | 6.117 | 502.2 | **** |
| period.max.cwt.power | Period at maximum continuous wavelet transform power [sec] | 0.01274 | 0.01379 | **** |
| max.cwt.power | Maximum continuous wavelet transform power | 1.68e+05 | 1.036e+05 | **** |
| t.max.cwt.power | Time of maximum continuous wavelet transform power [sec] | 0.07438 | 0.1028 | **** |
| mean.cwt.power | Mean continuous wavelet transform power | 935.2 | 930.8 | **** |
| mean.cwt.power.100.Hz | Mean continuous wavelet transform power at a frequency of 100 Hz | 6648 | 4257 | **** |
| mean.cwt.power.around.startle | Mean continuous wavelet transform power around the startle (from 5 msec before the SES to 150 msec after the SES) | 2726 | 1783 | **** |
| mean.cwt.power.50ms.before.startle | Mean continuous wavelet transform power at t = 50 msec before the SES | 5.49 | 645 | **** |
| mean.cwt.power.50ms.after.startle | Mean continuous wavelet transform power at t = 50 msec after the SES | 4480 | 1851 | **** |
| mean.cwt.power.350ms.after.startle | Mean continuous wavelet transform power at t = 350 msec after the SES | 17.34 | 473.7 | **** |
| mean.cwt.power.high.freq | Mean continuous wavelet transform power at a frequency of 2048 Hz | 0.005254 | 0.148 | **** |
The variability of all features was examined to determine if any features have little to no variability, making them poor candidates for machine learning. Most features contained over 99% unique values, with only four features containing fewer: t.max.mag (36.85%), freq.max.psd (0.39%), freq.max.cwt.power (0.32%), and mean.cwt.power (44.93%). However, all features with a lower percentage of unique values still had a ratio of the most prevalent value to the second most prevalent value of less than 3, well below the ratio of 20 used to classify predictors as having near-zero variance (Kuhn and Johnson, 2013). Thus, no features were removed due to near-zero variance. In general, features which are well correlated with the classifications but not highly correlated with any other features are considered “good” features, as they describe the difference between classes well without carrying redundant information (Yu and Liu, 2003; Guyon and Weston, 2003). The correlation matrix between all pairs of features was also examined (see Figure 9) to determine whether any features were highly correlated (r > 0.9). All pairs of features have correlation coefficients less than this threshold (Biesiada and Duch, 2006). Lastly, any feature that is a linear combination of other features does not itself explain any additional variation and thus should be removed from the feature set (Yu and Liu, 2003), as linear combinations are a multivariate extension of the pairwise correlations described above. Fortunately, no feature was determined to be a linear combination of any other features, allowing all features to be used for machine learning.
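The near-zero-variance and correlation screens can be sketched as below. This is a Python stand-in for caret's `nearZeroVar` and `findCorrelation`; the cutoffs follow the text, and the function name and toy columns are illustrative.

```python
# Feature screening as described above: flag near-zero-variance features
# (most-frequent / second-most-frequent value ratio above a cutoff) and
# one member of any feature pair correlated above |r| = 0.9.
import numpy as np

def screen_features(X, names, freq_ratio_cutoff=20.0, corr_cutoff=0.9):
    """Return the set of feature names recommended for removal."""
    drop = set()
    for j, name in enumerate(names):
        _, counts = np.unique(X[:, j], return_counts=True)
        if counts.size < 2:
            drop.add(name)  # a constant feature carries no information
            continue
        top, second = np.sort(counts)[::-1][:2]
        if top / second > freq_ratio_cutoff:
            drop.add(name)
    corr = np.corrcoef(X, rowvar=False)
    for j in range(len(names)):
        for k in range(j + 1, len(names)):
            if abs(corr[j, k]) > corr_cutoff:
                drop.add(names[k])  # keep the first member of the pair
    return drop

rng = np.random.default_rng(0)
a = rng.standard_normal(100)
X = np.column_stack([a, 2.0 * a, np.r_[1.0, np.zeros(99)]])
dropped = screen_features(X, ["max.mag", "max.mag.copy", "nearly.constant"])
```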
Figure 9:

Pairwise correlations between features extracted from normalized ASR waveforms.
3.4. Data partitioning
The feature set was randomly split into two sections, with 80% of the data designated as training data and the remaining 20% as test/validation data. In addition, 10-fold cross-validation was included in all model training in an effort to reduce the training bias of any specific subset of data (Kohavi, 1992; Rodríguez et al., 2010; Michailidis, 2015). All features were then centered and scaled according to the mean and standard deviation of each feature within each partitioned dataset for use in the machine learning methods described below.
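A sketch of this partitioning scheme, using scikit-learn as a stand-in for the R caret workflow and toy data in place of the 17 ASR features:

```python
# 80/20 split with 10-fold cross-validation on the training portion.
# Centering/scaling sits inside the pipeline so each fold (and the
# final fit) is scaled by its own training statistics, avoiding leakage.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.standard_normal((400, 5))
y = (X[:, 0] + 0.1 * rng.standard_normal(400) > 0).astype(int)  # toy labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0)

model = make_pipeline(StandardScaler(), LogisticRegression())
cv_auc = cross_val_score(model, X_train, y_train, cv=10, scoring="roc_auc")
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
```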
3.5. Machine learning methods
Figures 3, 6, and 8 show many features of the normalized ASR waveforms and how those features vary across startles and non-startles. No single feature, pair of features, or simple collection of features allows for the prediction of an ASR waveform as a startle versus a non-startle with high accuracy for a large sampling of ASR waveforms, especially over varying protocols. Machine learning algorithms must be employed to combine the features described above into a robust predictive model in order to predict a startle versus non-startle from ASR waveforms with little to no human intervention (Hastie et al., 2009; Kuhn and Johnson, 2013). A diverse set of machine learning methods from different families of algorithms have been tested including:
Linear classification - logistic regression (Dobson and Barnett, 2018; Hosmer and Lemeshow, 2000; Hastie et al., 2009), linear discriminant analysis (LDA) (Yu and Yang, 2001; Mika et al., 1998; Hastie et al., 2009)
Bagged classification (Dietterich, 1996; Breiman, 1995; Hastie et al., 2009) - bagged tree (Cutler et al., 2008; Hastie et al., 2009), random forests (Cutler et al., 2008; Breiman, 2000; Hastie et al., 2009)
Boosted classification (Breiman, 2003, 1999; Hastie et al., 2009; Dietterich, 1996) - extreme gradient boosted (Chen and Guestrin, 2014; Chen et al., 2018), C5.0 (Quinlan, 1992; Kuhn and Quinlan, 2017; Dietterich, 1996; Hastie et al., 2009)
Discriminative classification via kernels - support vector machine (Cortes and Vapnik, 1995; Meyer et al., 2003; Hastie et al., 2009)
In addition to individual machine learning algorithms, multiple models were combined into an ensemble via stacking (Wolpert, 1992; Caruana et al., 2002), using a generalized linear model (GLM) to determine the weights required to produce an extremely robust model that performs just as well on test and validation data as it does on training data. Any individual machine learning algorithm with a statistically insignificant weight, due to predicting the same classifications as other models, was removed from the stacked ensemble model. The machine learning workflow was implemented in R using the caret and caretEnsemble packages (Kuhn, 2018; Deane-Mayer and Knowles, 2016).
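The stacking step might look as follows in scikit-learn, an illustrative stand-in for caretEnsemble's `caretStack`; the base learners, toy data, and hyperparameters here are assumptions, not the paper's exact configuration.

```python
# Illustrative stacked ensemble: base learners from different algorithm
# families combined by a logistic-regression (GLM-style) meta-learner.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.standard_normal((300, 4))
y = (X[:, 0] - X[:, 1] > 0).astype(int)  # toy two-class target

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svmRadial", SVC(kernel="rbf", probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # the GLM that learns the weights
    cv=5,  # out-of-fold base predictions feed the meta-learner
)
stack.fit(X, y)
train_accuracy = stack.score(X, y)
```

The meta-learner's fitted coefficients play the role of the final-ensemble weights reported in Table 2: a base model whose out-of-fold predictions add nothing receives a negligible weight and could be dropped.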
4. Results
Several classification performance metrics are presented in Table 2, including the mean receiver operating characteristic (ROC) and the area under the ROC curve (AUC) when predicting whether a normalized ASR waveform is a startle or non-startle on the training dataset, as well as the accuracy of predicting the correct classification on the testing dataset, for each of the individual machine learning methods described above. All mean ROCs were over 0.95, with random forests demonstrating the best performance among the individual models with a ROC of 0.9779, an AUC of 0.9787, and a testing accuracy of 0.9266.
Table 2:
Individual machine learning model performance metrics including the mean receiver operating characteristics (ROC), area under the ROC curve (AUC), and accuracy as applied to training data as well as accuracy when applied to testing data. The final column contains the coefficients within the generalized linear model used to create the final ensemble model for each individual model.
| Model name | Model abbrev. | ROC (training) | AUC (training) | Accuracy (training) | Accuracy (testing) | Coefficient in final ensemble model |
|---|---|---|---|---|---|---|
| Generalized linear model | glm | 0.9611 | 0.9498 | 0.9099 | 0.8924 | −2.6860 |
| Linear discriminant analysis | lda | 0.9567 | 0.9461 | 0.9073 | 0.8931 | 1.5094 |
| Bagged tree | treebag | 0.9707 | 0.9732 | 0.9991 | 0.9210 | - |
| Random forests | rf | 0.9779 | 0.9787 | 1.0000 | 0.9273 | −5.6355 |
| eXtreme Gradient Boosting | xgbTree | 0.9757 | 0.9702 | 0.9548 | 0.9161 | −0.5149 |
| C5.0 | C5.0 | 0.9740 | 0.9716 | 0.9850 | 0.9238 | −2.0424 |
| Support Vector Machines with Linear Kernel | svmLinear | 0.9585 | 0.9482 | 0.9115 | 0.8938 | - |
| Support Vector Machines with Radial Basis Function Kernel | svmRadial | 0.9597 | 0.9591 | 0.9197 | 0.9001 | 1.3787 |
| Support Vector Machines with Polynomial Kernel | svmPoly | 0.9624 | 0.9575 | 0.9197 | 0.8980 | - |
| Ensemble model using all models | ensemble.glm | 0.9779 | 0.9778 | 0.9993 | 0.9301 | |
| Ensemble model using statistically significant models | final.ensemble | 0.9779 | 0.9778 | 0.9993 | 0.9301 | |
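The AUC and accuracy columns of Table 2 are standard metrics that can be computed for any probabilistic classifier. A minimal sketch, using entirely hypothetical labels and predicted probabilities for six waveforms:

```python
from sklearn.metrics import accuracy_score, roc_auc_score

# Hypothetical labels (1 = startle) and predicted P(startle) for six waveforms.
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.92, 0.10, 0.35, 0.80, 0.40, 0.22]
y_pred = [int(p >= 0.5) for p in y_prob]  # hard class at a 0.5 cutoff

print(roc_auc_score(y_true, y_prob))   # threshold-free ranking quality
print(accuracy_score(y_true, y_pred))  # fraction classified correctly
```

AUC scores the ranking of probabilities independently of any cutoff, which is why the ROC/AUC and accuracy columns in Table 2 can disagree on which model is best.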
The correlation between the predicted classes of the individual machine learning models described in Table 2, computed on resampled training data, is presented in Figure 10. This analysis shows that several models are highly correlated, with correlation coefficients above 0.9, including combinations of the generalized linear model, linear discriminant analysis, and the polynomial and linear support vector machines, as well as the bagged tree and random forests. This high correlation was confirmed by an ensemble model of all machine learning methods described in Table 2, built using a generalized linear model, in which the coefficients for the bagged tree and for both the linear and polynomial support vector machines were not statistically significant. Thus, a final ensemble model was constructed from only the models with statistically significant coefficients, with random forests carrying the largest coefficient. The ensemble model showed similar performance to random forests alone, with a ROC of 0.9779, an AUC of 0.9778, and a testing accuracy of 0.9301, slightly higher than that of random forests.
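The pairwise correlations behind this pruning step can be computed directly from each model's resampled class predictions. A small sketch with hypothetical predictions for three models (the model names mirror the abbreviations in Table 2):

```python
import numpy as np

# Hypothetical resampled class predictions (1 = startle) from three models.
preds = {
    "glm": np.array([1, 0, 1, 1, 0, 1, 0, 0]),
    "lda": np.array([1, 0, 1, 1, 0, 1, 0, 1]),
    "rf":  np.array([1, 0, 0, 1, 0, 1, 0, 0]),
}

# Pairwise Pearson correlations; pairs above ~0.9 flag models that
# contribute little independent information to the ensemble.
names = list(preds)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        r = np.corrcoef(preds[a], preds[b])[0, 1]
        print(a, b, round(r, 3))
```

Dropping one member of a highly correlated pair simplifies the ensemble without measurably changing its predictions, which matches the insignificant GLM coefficients observed for the redundant models.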
Figure 10:
Correlation between the predicted classes of individual machine learning models on resampled training data.
5. Discussion
The startle reflex has been measured for almost 100 years (Landis and Hunt, 1939) and is still used in many disciplines across hundreds of laboratories as a quick and effective measure of the internal state of animals and humans. However, ASR-related data are not easily compared between these disciplines, or even between researchers in the same discipline. Since resource and data sharing is a key component of progress in any scientific field, our goal was to develop a universal method to process and analyze ASR waveforms so data can be seamlessly compared between laboratories. While a previous attempt was made with ASR template matching (Grimsley et al., 2015), a machine learning paradigm using the free programming language R provides a more universal and open-source approach to ASR waveform analysis. While the results in this paper focus squarely on startle response waveforms from one strain of mouse collected with one ASR modality, the machine learning approaches presented here could be applied to other types of startle responses. The whole body startle response differs between mice (Grimsley et al., 2015) and rats (Horlington, 1968), and both are entirely different from the Preyer reflex (small ear movements) in guinea pigs (Böhmer, 1988; Berger et al., 2013) or the eye blink reflex in humans and primates (Landis and Hunt, 1939). Future studies should apply these types of machine learning techniques to classify startles from all species studied with ASR in order to allow for consensus ASR data analysis.
The machine learning approach provides a robust procedure to classify ASR waveforms as startles versus non-startles given high-quality, human-classified training data. The waveform normalization and feature extraction procedures outlined above are both specific to the ASR waveforms presented and generalizable to most if not all startle waveforms, although the specific timings and/or frequencies selected will likely vary across laboratories, species, and measurement techniques. Using this machine learning method to classify ASR waveforms as startles versus non-startles provides highly reliable classification and thus higher quality measurements when comparing true startle characteristics within animal, within group, or across groups. This is especially true in paradigms in which a relative startle response is required, such as pre-pulse inhibition using tones (Turner and Willott, 1998; Jeskey and Willott, 2000; Joober et al., 2002) for assessing the auditory system or gap detection (Turner et al., 2006; Schilling et al., 2017) for tinnitus screening. Following Joober et al. (2002) and Schilling et al. (2017), pre-pulse inhibition is defined in Equation 1:
PPI = 1 − A_X / A_no,X        (1)
where A_X and A_no,X are the mean startle amplitudes (peak-to-peak or maximum) measured with (A_X) and without (A_no,X) the condition(s) causing the inhibition, respectively.
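Equation 1 amounts to a one-line computation. A minimal sketch with hypothetical mean amplitudes:

```python
def ppi(a_inhibited, a_control):
    """Pre-pulse inhibition per Equation 1: PPI = 1 - A_X / A_no,X,
    where a_inhibited is the mean amplitude with the inhibiting condition
    (A_X) and a_control is the mean amplitude without it (A_no,X)."""
    return 1.0 - a_inhibited / a_control

# Hypothetical means: a 10 mV startle with the pre-pulse vs 40 mV without.
print(ppi(10.0, 40.0))  # -> 0.75, i.e. 75% inhibition
```

A PPI near 1 indicates strong inhibition, 0 indicates no effect of the pre-pulse, and negative values indicate facilitation.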
The amplitudes used in any startle response based computation must include only those from true startles, ignoring non-startles; accurate classification of ASR waveforms is therefore required for accurate PPI and Startle I/O function computations. Several commonly used methods to classify ASR waveforms as startles versus non-startles are presented in Grimsley et al. (2015), including threshold and RMS based methods. These threshold and RMS classification methods were applied to all 6,911 ASR waveforms used in this study, with the resulting numbers of startles and non-startles presented in Table 3 along with the manual classification results. Table 3 clearly shows that the thresholding and RMS techniques classify ASR waveforms as startles at a much higher rate than manual classification, accepting 6,599 and 6,412 of the 6,911 ASR waveforms as startles, respectively. Of the 2,697 waveforms manually classified as non-startles, the threshold and RMS methods erroneously accepted 0.8851 and 0.8183 as startles, respectively. With more than 80% of true non-startles accepted as startles, the set of waveforms used to compute startle response characteristics is heavily contaminated by non-startles, resulting in large differences in mean amplitude computations compared to manual classification.
Table 3:
Comparison of methods to determine startle versus non-startle.
| | Threshold: non-startle | Threshold: startle | RMS: non-startle | RMS: startle | Machine learning: non-startle | Machine learning: startle |
|---|---|---|---|---|---|---|
| Manual non-startle | 310 | 2387 | 490 | 2207 | 2637 | 60 |
| Manual startle | 2 | 4212 | 9 | 4205 | 43 | 4171 |
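The rates quoted above follow directly from the Table 3 counts. A quick check, expressing each rate as the fraction of the 310 + 2387 = 2,697 manually classified non-startles that each automated method accepted as startles:

```python
# Counts copied from Table 3: manually classified non-startles that each
# automated method accepted as startles.
accepted_nonstartles = {"threshold": 2387, "rms": 2207, "machine_learning": 60}
total_manual_nonstartles = 310 + 2387  # same 2697 total for every method

for name, n in accepted_nonstartles.items():
    # threshold -> 0.8851, rms -> 0.8183, machine_learning -> 0.0222
    print(name, round(n / total_manual_nonstartles, 4))
```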
Figure 11 shows the effect of the classification method on the mean maximum magnitude of ASR waveforms classified as startles, pooled over all trials from every session including the Startle I/O and TPP protocols. This analysis shows no statistical difference in startle amplitude between manual classification and the ensembled machine learning method presented in this paper. It also shows no statistical difference among accepting all waveforms as startles, the threshold method, and the RMS method, with average startle amplitudes of 34.2 mV, 35.1 mV, and 35.5 mV, respectively. However, there is a clear statistical difference between the machine learning method (or manual classification) and all other classification methods: the mean startle amplitude differs significantly when ASR waveforms are classified using the machine learning method presented in this paper compared to the other techniques that have been, and still are, used in many laboratories. When ASR waveforms are accurately classified, non-startle waveforms, which usually have low amplitudes, are much more reliably separated from startles, resulting in higher mean startle amplitudes under the machine learning method.
Figure 11:
Comparison of mean maximum magnitudes of all startles as classified manually, via machine learning, by accepting all startles, and by the threshold and RMS methods. Statistical significance was computed using the non-parametric Wilcoxon rank-sum test.
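The Wilcoxon rank-sum comparisons used throughout Figures 11-13 can be reproduced with standard tools. A sketch using SciPy on hypothetical amplitude samples (the means, spreads, and sample sizes below are invented for illustration):

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(0)

# Hypothetical startle-amplitude samples (mV) under two classification methods.
ml_amplitudes = rng.normal(loc=45.0, scale=8.0, size=50)
rms_amplitudes = rng.normal(loc=35.0, scale=8.0, size=50)

# Non-parametric Wilcoxon rank-sum test: compares the two groups without
# assuming the amplitude distributions are normal.
stat, p = ranksums(ml_amplitudes, rms_amplitudes)
print(p < 0.05)
```

The rank-based test is a sensible default here because startle amplitude distributions are typically skewed and their tails depend on how many low-amplitude non-startles slip through the classifier.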
The analysis shown in Figure 11 compares the distributions of startle amplitudes as a function of startle classification method without regard to protocol, level, or frequency. The startle amplitudes of Startle I/O ASR waveforms collected from a single animal, not included in the training or test datasets used to train the machine learning classifier, both before (baseline) and after acoustic trauma, are shown in Figure 12 for each classification method. There are no statistical differences between the startle amplitudes of the ASR waveforms presented in Figure 12 when startles were classified by accepting all startles or by using the threshold or RMS techniques. Higher startle elicitor levels tend to produce high startle amplitudes, with most presentations resulting in true startles, so the classification method might be expected to be irrelevant. However, the machine learning method shows statistically significant differences from the other classification methods for the after-trauma ASR waveforms when the elicitor is presented at 95, 105, and 115 dB SPL, showing that correct classification of ASR waveforms has a significant impact on subsequent analysis even in instances where most if not all ASR waveforms are true startles.
Figure 12:
Comparison of startle amplitudes elicited by the Startle I/O protocol in a single animal before (baseline) and after acoustic trauma as a function of level, when classified by machine learning, by accepting all startles, and by the threshold and RMS methods. Statistical significance was computed using the non-parametric Wilcoxon rank-sum test comparing startle amplitudes classified via machine learning versus all other methods.
In addition, Figure 12 shows statistically significant differences in startle amplitude between the machine learning method and all other methods for levels below 95 dB at both the baseline and after-trauma timepoints. These differences arise because low-amplitude, non-startle ASR waveforms are correctly classified as non-startles by the machine learning method and are therefore excluded from the analysis. The startle response is expected to decrease with decreasing startle elicitor level in the Startle I/O protocol, resulting in low-amplitude startles. Low-amplitude non-startles are often misclassified as startles by the threshold and RMS methods, which are based on magnitude information and do not consider the shape and timing of the ASR waveform. The features extracted from the normalized ASR waveforms and included in the machine learning models presented above were designed to capture shape and timing information through established signal analysis techniques, allowing for robust classification.
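As a simplified illustration of such shape, timing, and spectral features, the sketch below extracts a few descriptors from a synthetic waveform; the function name, the specific features, and the test signal are all hypothetical, and the paper's actual feature set (drawn from the waveform, its PSD estimate, and its continuous wavelet transform) is richer than this.

```python
import numpy as np
from scipy.signal import welch

def waveform_features(x, fs):
    """Illustrative shape/timing/spectral features for a normalized ASR
    waveform; a stand-in for the paper's full feature set."""
    t_peak = np.argmax(np.abs(x)) / fs          # latency of the largest peak (s)
    p2p = x.max() - x.min()                     # peak-to-peak amplitude
    rms = np.sqrt(np.mean(x ** 2))              # overall signal energy
    f, pxx = welch(x, fs=fs, nperseg=min(256, len(x)))  # PSD estimate
    f_centroid = np.sum(f * pxx) / np.sum(pxx)  # spectral centroid (Hz)
    return {"t_peak": t_peak, "p2p": p2p, "rms": rms, "f_centroid": f_centroid}

# Hypothetical 100 ms waveform sampled at 10 kHz: a decaying 200 Hz burst.
fs = 10_000
t = np.arange(0, 0.1, 1 / fs)
x = np.sin(2 * np.pi * 200 * t) * np.exp(-t / 0.02)
print(waveform_features(x, fs))
```

Features of this kind encode when and how the response unfolds, not just how large it is, which is exactly the information magnitude-only threshold and RMS rules discard.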
The gap pre-pulse inhibition of the acoustic startle reflex (GPIAS) can be used for tinnitus screening in a variety of species (Turner et al., 2006; Schilling et al., 2017; Gerum et al., 2019). The startle amplitudes of both the gap and no-gap conditions, and the derived pre-pulse inhibition (PPI), for three different animals, each tested with different sounds, are presented in Figure 13 both before (baseline) and after acoustic trauma. These ASR waveforms were not used in either the training or test datasets and were classified by the machine learning method, by accepting all startles, and by the threshold and RMS techniques. Panel A shows statistically significant differences in startle amplitude when ASR waveforms were classified by the machine learning method compared to all other methods, for both gap and no-gap conditions and for all sounds. Higher startle amplitudes are seen under the machine learning method because low-amplitude waveforms are correctly classified as non-startles and therefore excluded.
Figure 13:
Comparison of startle amplitudes (A) and pre-pulse inhibition (B) from the GPIAS paradigm using multi-frequency and wideband gap carriers for 3 different animals when classified by machine learning, accepting all startles, threshold, and RMS methods. Panel A shows the startle amplitudes before (baseline) and after acoustic trauma plotted as a function of gap/no-gap SES presentations and pre-pulse sound frequency. Panel B shows the pre-pulse inhibition derived from the gap (inhibitory)/no-gap (control) startle amplitudes presented in Panel A. Statistical significance was computed using the non-parametric Wilcoxon rank-sum test comparing startle amplitudes from waveforms classified by machine learning versus the other methods (Panel A) and comparing pre-pulse inhibition before (baseline) and after acoustic trauma for each classification method (Panel B).
Panel B of Figure 13 shows three cases in which different conclusions are drawn from pre-pulse inhibition data depending on the method used to classify the ASR waveforms from which PPI is computed. In the first, the 10 kHz narrowband carrier produced a statistically significant difference in PPI between the baseline and post-trauma timepoints when ASR waveforms were classified by the machine learning method, but no difference for any other method. In the second, the 16 kHz carrier shows no difference for the machine learning method whereas all other methods show statistically significant differences. In the third, a wideband noise produced statistically significant differences for all methods. Since pre-pulse inhibition is derived from the startle amplitudes of both the gap (inhibitory) and no-gap (reference) conditions, correctly classifying ASR waveforms as startles is of utmost importance to ensure accurate, reliable, and repeatable PPI calculations.
The Startle I/O function and/or pre-pulse inhibition characteristics are important outcome measures based on the acoustic startle reflex, in which only true startles should be used in data analysis. Non-startles should be considered background noise: they offer no valuable information for describing the inner workings of the auditory system via the Startle I/O and Tone Pre-Pulse paradigms, nor for the assessment of tinnitus via the Gap in Noise paradigm. Thus, startles must be accurately separated from non-startles to provide the highest quality data possible and to support sound neuroscience conclusions from any ASR measurement. Figures 11, 12, and 13 showed how the startle amplitudes of ASR waveforms vary with classification method both in general and among the Startle I/O, TPP, and GPIAS paradigms. The machine learning method presented in this paper was shown to be extremely robust, mimicking manual classification with over 93% accuracy and mislabeling only 2.2% of non-startles as startles when tested on data the models were not trained on. In addition, the machine learning method was applied to unseen ASR waveforms, demonstrating large differences in startle amplitudes and derived pre-pulse inhibition depending on the classification method. The use of machine learning to classify acoustic startle reflex waveforms as startles versus non-startles reliably produces higher quality startle response and relative startle response measures, allowing neuroscientists to draw more reliable conclusions when using startle paradigms to assess behavioral measures of sensory-motor central nervous system function.
Acknowledgements
This work was supported by the National Institutes of Health [NIH-NIA AG00954]. The authors would like to acknowledge the use of the services provided by Research Computing at the University of South Florida. We thank Demitri Brunnell and Mary Reith for their technical assistance and oversight of behavioral experiments and Rachal Love for oversight of animal care.
Footnotes
Declaration of Competing Interest
All authors have no conflicts of interest to disclose.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
References
- Allen J, 1977. Short term spectral analysis, synthesis, and modification by discrete Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing 25, 235–238. doi: 10.1109/tassp.1977.1162950.
- Bejnordi BE, Veta M, Van Diest PJ, Van Ginneken B, Karssemeijer N, Litjens G, Van Der Laak JA, Hermsen M, Manson QF, Balkenhol M, Geessink O, Stathonikos N, Van Dijk MC, Bult P, Beca F, Beck AH, Wang D, Khosla A, Gargeya R, Irshad H, Zhong A, Dou Q, Li Q, Chen H, Lin HJ, Heng PA, Haß C, Bruni E, Wong Q, Halici U, Öner MÜ, Cetin-Atalay R, Berseth M, Khvatkov V, Vylegzhanin A, Kraus O, Shaban M, Rajpoot N, Awan R, Sirinukunwattana K, Qaiser T, Tsang YW, Tellez D, Annuscheit J, Hufnagl P, Valkonen M, Kartasalo K, Latonen L, Ruusuvuori P, Liimatainen K, Albarqouni S, Mungal B, George A, Demirci S, Navab N, Watanabe S, Seno S, Takenaka Y, Matsuda H, Phoulady HA, Kovalev V, Kalinovsky A, Liauchuk V, Bueno G, Fernandez-Carrobles MM, Serrano I, Deniz O, Racoceanu D, Venâncio R, 2017. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA - Journal of the American Medical Association 318, 2199–2210. doi: 10.1001/jama.2017.14585.
- Berger JI, Coomber B, Shackleton TM, Palmer AR, Wallace MN, 2013. A novel behavioural approach to detecting tinnitus in the guinea pig. Journal of neuroscience methods 213, 188–95. doi: 10.1016/j.jneumeth.2012.12.023.
- Biesiada J, Duch W, 2006. Feature Selection for High-Dimensional Data — A Pearson Redundancy Based Filter. Advances in Soft Computing, 242–249. doi: 10.1007/978-3-540-75175-5_30.
- Bloomfield P, 2011. Fourier Analysis of Time Series: An Introduction. Wiley Series in Probability and Statistics. 2nd ed., Wiley.
- Böhmer A, 1988. The Preyer reflex–an easy estimate of hearing function in guinea pigs. Acta oto-laryngologica 106, 368–72. doi: 10.3109/00016488809122259.
- Braff DL, Geyer MA, Light GA, Sprock J, Perry W, Cadenhead KS, Swerdlow NR, 2001. Impact of prepulse characteristics on the detection of sensorimotor gating deficits in schizophrenia. Schizophrenia research 49, 171–8.
- Breiman L, 1995. Bagging Predictors. Machine Learning 24, 123–140.
- Breiman L, 1999. Prediction Games and Arcing Algorithms. Neural Computation 11, 1493–1517. doi: 10.1162/089976699300016106.
- Breiman L, 2000. Random Forests. Machine Learning 45, 5–32.
- Breiman L, 2003. Population theory for boosting ensembles. The Annals of Statistics 32, 1–11. doi: 10.1214/aos/1079120126.
- Cai J, Luo J, Wang S, Yang S, 2018. Feature selection in machine learning: A new perspective. Neurocomputing 300, 70–79. doi: 10.1016/j.neucom.2017.11.077.
- Caruana R, Niculescu-Mizil A, Crew G, Ksikes A, 2002. Ensemble selection from libraries of models, ACM Press. doi: 10.1145/1015330.1015432.
- Chen T, Guestrin C, 2014. XGBoost, ACM Press. doi: 10.1145/2939672.2939785.
- Chen T, He T, Benesty M, Khotilovich V, Tang Y, Cho H, Chen K, Mitchell R, Cano I, Zhou T, Li M, Xie J, Lin M, Geng Y, Li Y, 2018. xgboost: Extreme Gradient Boosting. Technical Report.
- Cortes C, Vapnik V, 1995. Support-vector networks. Machine Learning 20, 273–297. doi: 10.1007/bf00994018.
- Cutler A, Cutler RD, Stevens RJ, 2008. Tree-Based Methods. High-Dimensional Data Analysis in Cancer Research, 1–19. doi: 10.1007/978-0-387-69765-9_5.
- Davis M, 1984. The Mammalian Startle Response, in: Neural Mechanisms of Startle Behavior. Springer US, Boston, MA, pp. 287–351. doi: 10.1007/978-1-4899-2286-1_10.
- Deane-Mayer ZA, Knowles JE, 2016. caretEnsemble: Ensembles of Caret Models.
- Dietterich GT, 1996. An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization. Machine Learning 40, 139–157. doi: 10.1023/a:1007607513941.
- Dobson JA, Barnett GA, 2018. An Introduction to Generalized Linear Models. 4th ed., Chapman and Hall/CRC.
- El Naqa I, Murphy MJ, 2015. What Is Machine Learning?, in: Machine Learning in Radiation Oncology. Springer International Publishing, Cham, pp. 3–11. doi: 10.1007/978-3-319-18305-3_1.
- Filion DL, Dawson ME, Schell AM, 1993. Modification of the acoustic startle-reflex eyeblink: A tool for investigating early and late attentional processes. Biological Psychology 35, 185–200. doi: 10.1016/0301-0511(93)90001-0.
- Galazyuk A, Hébert S, 2015. Gap-Prepulse Inhibition of the Acoustic Startle Reflex (GPIAS) for Tinnitus Assessment: Current Status and Future Directions. Frontiers in Neurology 6. doi: 10.3389/fneur.2015.00088.
- Gerum R, Rahlfs H, Streb M, Krauss P, Grimm J, Metzner C, Tziridis K, Günther M, Schulze H, Kellermann W, Schilling A, 2019. Open(G)PIAS: An Open-Source Solution for the Construction of a High-Precision Acoustic Startle Response Setup for Tinnitus Screening and Threshold Estimation in Rodents. Frontiers in Behavioral Neuroscience 13. doi: 10.3389/fnbeh.2019.00140.
- Goldenberg SL, Nir G, Salcudean SE, 2019. A new era: artificial intelligence and machine learning in prostate cancer. doi: 10.1038/s41585-019-0193-3.
- Golibagh Mahyari A, 2010. Spectral estimation using modified Daniell method. International Journal of Electronics 97, 1311–1316. doi: 10.1080/00207217.2010.488908.
- Grillon C, Pellowski M, Merikangas KR, Davis M, 1997. Darkness facilitates the acoustic startle reflex in humans. Biological Psychiatry 42, 453–460. doi: 10.1016/S0006-3223(96)00466-0.
- Grimsley CA, Longenecker RJ, Rosen MJ, Young JW, Grimsley JM, Galazyuk AV, 2015. An improved approach to separating startle data from noise. Journal of neuroscience methods 253, 206–17. doi: 10.1016/j.jneumeth.2015.07.001.
- Guyon I, Weston J, 2003. An Introduction to Variable and Feature Selection. Machine Learning Research 3, 1157–1182.
- Halaki M, Gi K, 2012. Normalization of EMG Signals: To Normalize or Not to Normalize and What to Normalize to?, in: Computational Intelligence in Electromyography Analysis - A Perspective on Current Applications and Future Challenges. InTech. chapter 7, pp. 175–194. doi: 10.5772/49957.
- Hartmann V, Liu H, Chen F, Qiu Q, Hughes S, Zheng D, 2019. Quantitative comparison of photoplethysmographic waveform characteristics: Effect of measurement site. Frontiers in Physiology 10. doi: 10.3389/fphys.2019.00198.
- Hastie T, Tibshirani R, Friedman J, 2009. The Elements of Statistical Learning. Springer Series in Statistics, Springer New York, New York, NY. doi: 10.1007/978-0-387-84858-7.
- Horlington M, 1968. A method for measuring acoustic startle response latency and magnitude in rats: Detection of a single stimulus effect using latency measurements. Physiology & Behavior 3, 839–844. doi: 10.1016/0031-9384(68)90164-9.
- Hosmer WD, Lemeshow S, 2000. Applied Logistic Regression. John Wiley & Sons, Inc. doi: 10.1002/0471722146.
- Jeskey JE, Willott JF, 2000. Modulation of prepulse inhibition by an augmented acoustic environment in DBA/2J mice. Behavioral neuroscience 114, 991–7. doi: 10.1037//0735-7044.114.5.991.
- Joober R, Zarate JM, Rouleau GA, Skamene E, Boksa P, 2002. Provisional mapping of quantitative trait loci modulating the acoustic startle response and prepulse inhibition of acoustic startle. Neuropsychopharmacology 27, 765–781. doi: 10.1016/S0893-133X(02)00333-0.
- Koch M, 1999. The neurobiology of startle. Progress in Neurobiology 59, 107–128. doi: 10.1016/S0301-0082(98)00098-7.
- Kohavi R, 1992. A study of cross-validation and bootstrap for accuracy estimation and model selection, Morgan Kaufmann Publishers Inc. pp. 1137–1143.
- Kohl S, Heekeren K, Klosterkötter J, Kuhn J, 2013. Prepulse inhibition in psychiatric disorders–apart from schizophrenia. Journal of psychiatric research 47, 445–52. doi: 10.1016/j.jpsychires.2012.11.018.
- Kohl S, Wolters C, Gruendler TOJ, Vogeley K, Klosterkötter J, Kuhn J, 2014. Prepulse inhibition of the acoustic startle reflex in high functioning autism. PloS one 9, e92372. doi: 10.1371/journal.pone.0092372.
- Komura D, Ishikawa S, 2019. Machine learning approaches for pathologic diagnosis. doi: 10.1007/s00428-019-02594-w.
- Kotsiantis SB, Zaharakis ID, Pintelas PE, 2006. Machine learning: A review of classification and combining techniques. Artificial Intelligence Review 26, 159–190. doi: 10.1007/s10462-007-9052-3.
- Kuhn M, 2018. caret: Classification and Regression Training. Technical Report.
- Kuhn M, Johnson K, 2013. Applied Predictive Modeling. Springer New York, New York, NY. doi: 10.1007/978-1-4614-6849-3.
- Kuhn M, Quinlan R, 2017. C50: C5.0 Decision Trees and Rule-Based Models. Technical Report.
- Landis C, Hunt WA, 1939. The startle pattern. Farrar and Rinehart, Inc., New York.
- Lara-Cueva RA, Benítez DS, Carrera EV, Ruiz M, Rojo-Álvarez JL, 2016. Feature selection of seismic waveforms for long period event detection at Cotopaxi Volcano. Journal of Volcanology and Geothermal Research 316, 34–49. doi: 10.1016/j.jvolgeores.2016.02.022.
- Lauer AM, Behrens D, Klump G, 2017. Acoustic startle modification as a tool for evaluating auditory function of the mouse: Progress, pitfalls, and potential. Neuroscience & Biobehavioral Reviews 77, 194–208. doi: 10.1016/j.neubiorev.2017.03.009.
- Lee EJ, Liao WY, Lin GW, Chen P, Mu D, Lin CW, 2019. Towards Automated Real-Time Detection and Location of Large-Scale Landslides through Seismic Waveform Back Projection. Geofluids 2019, 1–14. doi: 10.1155/2019/1426019.
- Libbrecht MW, Noble WS, 2015. Machine learning applications in genetics and genomics. doi: 10.1038/nrg3920.
- Longenecker RJ, Alghamdi F, Rosen MJ, Galazyuk AV, 2016. Prepulse inhibition of the acoustic startle reflex vs. auditory brainstem response for hearing assessment. Hearing Research 339, 80–93. doi: 10.1016/j.heares.2016.06.006.
- Longenecker RJ, Galazyuk AV, 2011. Development of tinnitus in CBA/CaJ mice following sound exposure. JARO - Journal of the Association for Research in Otolaryngology 12, 647–658. doi: 10.1007/s10162-011-0276-1.
- Longenecker RJ, Galazyuk AV, 2012. Methodological optimization of tinnitus assessment using prepulse inhibition of the acoustic startle reflex. Brain Research 1485, 54–62. doi: 10.1016/j.brainres.2012.02.067.
- Meyer D, Leisch F, Hornik K, 2003. The support vector machine under test. Neurocomputing 55, 169–186. doi: 10.1016/s0925-2312(03)00431-4.
- Michailidis M, 2015. Investigating machine learning methods in recommender systems. Ph.D. thesis.
- Mika S, Ratsch G, Weston J, Scholkopf B, Mullers RK, 1998. Fisher discriminant analysis with kernels, IEEE. pp. 41–48.
- Mjolsness E, DeCoste D, 2001. Machine learning for science: State of the art and future prospects. doi: 10.1126/science.293.5537.2051.
- Quinlan RJ, 1992. C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc.
- Rajkomar A, Dean J, Kohane I, 2019. Machine learning in medicine. doi: 10.1056/NEJMra1814259.
- Rodríguez JD, Pérez A, Lozano JA, 2010. Sensitivity Analysis of k-Fold Cross Validation in Prediction Error Estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence. doi: 10.1109/TPAMI.2009.187.
- Roesch A, Schmidbauer H, 2015. WaveletComp: Computational Wavelet Analysis. Technical Report.
- Säring W, von Cramon D, 1981. The acoustic blink reflex: Stimulus dependence, excitability and localizing value. Journal of Neurology 224, 243–252. doi: 10.1007/BF00313287.
- Schilling A, Krauss P, Gerum R, Metzner C, Tziridis K, Schulze H, 2017. A new statistical approach for the evaluation of gap-prepulse inhibition of the acoustic startle reflex (GPIAS) for tinnitus assessment. Frontiers in Behavioral Neuroscience 11. doi: 10.3389/fnbeh.2017.00198.
- Sejdić E, Djurović I, Jiang J, 2009. Time–frequency feature representation using energy concentration: An overview of recent advances. Digital Signal Processing 19, 153–183. doi: 10.1016/j.dsp.2007.12.004.
- Shen D, Wu G, Suk HI, 2017. Deep Learning in Medical Image Analysis. Annual Review of Biomedical Engineering 19, 221–248. doi: 10.1146/annurev-bioeng-071516-044442.
- Torkkola K, 2003. Feature Extraction by Non-Parametric Mutual Information Maximization. Journal of Machine Learning Research 3, 1415–1438.
- Turner JG, Brozoski TJ, Bauer CA, Parrish JL, Myers K, Hughes LF, Caspary DM, 2006. Gap detection deficits in rats with tinnitus: A potential novel screening tool. Behavioral Neuroscience 120, 188–195. doi: 10.1037/0735-7044.120.1.188.
- Turner JG, Willott JF, 1998. Exposure to an augmented acoustic environment alters auditory function in hearing-impaired DBA/2J mice. Hearing Research 118, 101–113. doi: 10.1016/S0378-5955(98)00024-0.
- Veer K, Agarwal R, 2015. Wavelet and short-time Fourier transform comparison-based analysis of myoelectric signals. Journal of Applied Statistics 42, 1591–1601. doi: 10.1080/02664763.2014.1001728.
- Vergara JR, Estévez PA, 2014. A review of feature selection methods based on mutual information. doi: 10.1007/s00521-013-1368-0.
- Winslow JT, Parr LA, Davis M, 2002. Acoustic startle, prepulse inhibition, and fear-potentiated startle measured in rhesus monkeys. Biological Psychiatry 51, 859–866. doi: 10.1016/S0006-3223(02)01345-8.
- Wolpert HD, 1992. Stacked generalization. Neural Networks 5, 241–259. doi: 10.1016/s0893-6080(05)80023-1.
- Yu H, Yang J, 2001. A direct LDA algorithm for high-dimensional data — with application to face recognition. Pattern Recognition 34, 2067–2070. doi: 10.1016/s0031-3203(00)00162-x.
- Yu L, Liu H, 2003. Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution, pp. 856–863.
- Zhang Y, Guo Z, Wang W, He S, Lee T, Loew M, 2003. A comparison of the wavelet and short-time fourier transforms for Doppler spectral analysis. Medical engineering & physics 25, 547–557.
