Table 1.
Dataset | Classes | Description |
---|---|---|
UCLASS (2009) [11] | Interjection, sound repetition, part-word repetition, word repetition, phrase repetition, prolongation, and no stutter | The University College London’s Archive of Stuttered Speech (UCLASS) is a widely used dataset in stuttering research. It includes monologs, conversations, and readings, totaling 457 audio recordings. Although small, UCLASS is offered in two releases by UCL’s Department of Psychology and Language Sciences. Release 1 contains 138 monolog samples (120 from male and 18 from female participants) from 81 individuals who stutter, aged 5–47 years. Release 2 contains a total of 318 monolog, reading, and conversation samples from 160 speakers who stutter, aged 5–20 years (279 samples from male and 39 from female participants). Transcriptions, including orthographic versions, are available for some recordings, making them suitable for stutter labeling. |
VoxCeleb (2017) [15] | No stutter-specific classes; the dataset focuses on identifying and verifying individual speakers | Developed by the Visual Geometry Group (VGG), Department of Engineering Science, University of Oxford, UK, VoxCeleb is a large-scale dataset designed for speaker-recognition and verification tasks. It contains a vast collection of speech segments extracted from celebrity interviews, talk shows, and online videos. The dataset covers a diverse set of speakers and is widely employed in research on speaker recognition, speaker diarization, and voice biometrics. |
SEP-28k (2021) [12] | Prolongations, repetitions, blocks, interjections, and instances of fluent speech | Comprising a total of 28,177 samples, SEP-28k is the first publicly available annotated dataset to include stuttering labels. These labels encompass various disfluencies, such as prolongations, repetitions, blocks, interjections, and instances of fluent speech without disfluencies. Alongside these, the dataset covers nondisfluent labels such as natural pauses, unintelligible speech, uncertain segments, periods of no speech, poor audio quality, and even musical content. |
FluencyBank (2021) [13] | Individuals who stutter (IWS) and individuals who do not stutter (IWN) | The FluencyBank dataset is a collection of audio recordings of people who stutter. It was created by researchers from the United States and Canada and contains over 1000 h of recordings from 300 speakers. The dataset is divided into two parts, namely research and teaching. The research data are password-protected, and the teaching data are open-access. The teaching data include audio recordings of 10 speakers who stutter, transcripts, and annotations of stuttering disfluencies. The dataset is valuable for researchers and clinicians studying stuttering. |
LibriStutter (2021) [14] | Sound, word, and phrase repetitions; prolongations; and interjections | The LibriStutter dataset is a corpus of audio recordings of speech with synthesized stutters. It was created by the Speech and Language Processing group at Queen’s University in Canada. The dataset contains 100 h of audio recordings from 10 speakers, each of whom stutters differently. The stutters were synthesized using a hidden Markov model-based technique. LibriStutter is a valuable resource for researchers developing automatic speech-recognition (ASR) systems for people who stutter, and it can also be used to train models for detecting and classifying different types of stutters. |
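A common first step with any of the annotated corpora above (e.g., SEP-28k) is tallying the per-class sample distribution from the annotation table. The sketch below shows this for a toy annotation file; the CSV layout, column names, and label strings are illustrative assumptions for this example, not the actual schema of any dataset in the table.

```python
import csv
import io
from collections import Counter

# Toy annotation rows -- the column names and label values here are
# assumptions for this sketch, not a real dataset's schema.
SAMPLE_CSV = """clip_id,label
clip_0001,Prolongation
clip_0002,Block
clip_0003,Interjection
clip_0004,Prolongation
clip_0005,NoStutteredWords
"""

def class_distribution(csv_text: str) -> Counter:
    """Count how many annotated clips carry each disfluency label."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["label"] for row in reader)

counts = class_distribution(SAMPLE_CSV)
print(counts.most_common())  # "Prolongation" appears twice in the toy rows
```

Such a tally is useful before training, since stuttering corpora are typically imbalanced (fluent segments far outnumber blocks or prolongations), which often motivates class weighting or resampling.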