VOICE, VIDEO & OTHER SOUND STIMULI

In no particular order...

(Please note that you will probably need to seek permission to use these resources.)

Voice databases

The Speech Accent Archive: in addition to voice samples that can be downloaded and chopped up, the site also offers further web links.
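If you do want to chop a downloaded sample into shorter clips, something along these lines may help. It is only a minimal sketch using Python's standard `wave` module, and it assumes the recording has already been saved as an uncompressed PCM .wav file (the archive's own download format may differ). The function name `extract_segment` and the file names in the usage comment are made up for illustration.

```python
import wave


def extract_segment(src_path, dst_path, start_s, end_s):
    """Copy the audio between start_s and end_s (in seconds) into a new WAV file.

    Assumes src_path is an uncompressed PCM WAV file.
    Returns the number of frames written.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        # Convert times to frame indices and clamp them to the file length.
        start_frame = max(0, int(start_s * rate))
        end_frame = min(src.getnframes(), int(end_s * rate))
        if end_frame <= start_frame:
            raise ValueError("segment is empty or lies outside the recording")
        src.setpos(start_frame)
        frames = src.readframes(end_frame - start_frame)

    with wave.open(dst_path, "wb") as dst:
        # Reuse channel count, sample width and sample rate from the source;
        # the frame count in the header is corrected when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)
    return end_frame - start_frame


# Hypothetical usage: keep the stretch from 1.5 s to 3.0 s of a recording.
# extract_segment("speaker01.wav", "speaker01_clip.wav", 1.5, 3.0)
```

Remember that permission to edit and reuse the samples still has to come from the archive.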

A collection of useful voice-related papers and voice stimuli can be found here:

CURRENTLY NOT AVAILABLE (but if you find it, let me know): Montreal Affective Voices (MAV) Audio Collection. This has been billed as the voice equivalent of the Ekman faces. It contains 90 NON-VERBAL emotional sounds (anger, disgust, fear, pain, sadness, surprise, happiness and pleasure, plus some neutral expressions). The sounds come from 10 voices: 5 male and 5 female.

CITE: Belin, P., Fillion-Bilodeau, S., & Gosselin, F. (2008). The Montreal Affective Voices: A validated set of nonverbal affect bursts for research on auditory affective processing. Behavior Research Methods, 40(2), 531-539.

UCL Speaker Database. This is a database of recordings that was originally developed for a Wellcome Trust-funded project (perception of speaker variability in children and adults). It contains spoken recordings of 45 speakers (with a South-Eastern British English accent) and has been made available for researchers exploring speaker style and variability.

Emotional speech sets (multiple languages and other uses):
- Estonian Emotional Speech Corpus
- SEMAINE Database
- SAVEE Database
- Acted Emotional Speech Dynamic Dataset (AESDD) - Publicly available (Greek speakers)

KAPODI database of emotional stimuli sets: The database contains lots of free-to-use, searchable stimuli sets. Don't forget to cite this paper if you find anything useful that you want to use.
- Diconne, K., Kountouriotis, G. K., Paltoglou, A. E., Parker, A., & Hostler, T. J. (2022). Presenting KAPODI – The Searchable Database of Emotional Stimuli Sets. Emotion Review, 14(1), 84-95. https://doi.org/10.1177/17540739211072803

Massive Auditory Lexical Decision (MALD) database:
Brought to you by the University of Alberta: The MALD database “is an end-to-end, freely available auditory and production data set for speech and psycholinguistic research, providing time-aligned stimulus recordings for over 26,000 words and 9,500 pseudowords, and response data for auditory lexical decisions. The data set is meant to make it easy to explore, build and test theories, and compare a wide range of models.”

Auditory tests:

The Profile of Music Perception Skills test - this is an online test that assesses musical perception across a range of aspects/facets; the available versions vary in length and design.

Glasgow Voice Memory Test (GVMT) - this test includes both voices (vowels) and non-voice sounds (bells). It uses an old/new design. It takes about 5 mins to administer. British voices. You will need to contact the author for access to the test.

The Bangor Voice Matching Test - this is a same/different identity task using pairs of voices (sustained vowels, vowel-consonant combinations (CVC; VCV) and spoken paragraphs). British male and female voices. You will need to contact the author for access to the test.

DYNAMIC AUDIO VIDEO STIMULI

(There’s not a lot of this stuff around and this is free)

The GRID audiovisual sentence corpus. This is a collection of high-quality video and auditory stimuli (multispeaker). It is free to download (but the files are quite big). Information and access can be found here: http://spandh.dcs.shef.ac.uk/gridcorpus/

The VidTIMIT Audio-Video Dataset. These are video and audio recordings of individuals (N=43) reciting fairly short sentences. They are ideal for a variety of research questions involving person identification, voice/face processing, etc.

The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS OR THE ENHANCED RAVDESS)
- Courtesy of Livingstone & Russo
  - ORIGINAL: The RAVDESS (open access database) contains 7356 files. Each file was rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained adult research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity, interrater reliability, and test-retest intrarater reliability were reported.
- The construction and validation of the RAVDESS is described in:

CITE: Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. https://doi.org/10.1371/journal.pone.0196391.

ENHANCED RAVDESS Speech Dataset (open access database). This is a modified version of the speech audio contained within the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset.

The enhancement method cited for this dataset is described in:

CITE: Su, Jiaqi, Zeyu Jin, and Adam Finkelstein. "HiFi-GAN: High-fidelity denoising and dereverberation based on speech deep features in adversarial networks." Proc. Interspeech. October 2020.

The Geneva Faces and Voices (GEFAV) database “The GEFAV has been developed and is distributed by the Swiss Center for Affective Sciences at the University of Geneva, Switzerland. It is a collection of European faces and voices of 111 individuals, including 61 women and 50 men, aged 18-35 years old. For each individual, we provide three kinds of facial stimuli (static neutral, static smiling and dynamic neutral) and two kinds of vocal stimuli (a three-vowel sequence /i/-/a/-/o/ and a sentence in French: “Bonjour. Il est deux heures moins dix”). The facial and vocal stimuli are available for download.”

Cite: Ferdenzi C, Delplanque S, Mehu-Blantar I, Da Paz Cabral KM, Domingos Felicio M, Sander D. (in press). The GEneva Faces And Voices (GEFAV) database. Behavior Research Methods.

The Dynamic Variability in Speech (DyVis) database comes from an ESRC-funded project. It is a forensic phonetic study of British English. It contains recordings of 100 male speakers of Standard Southern British English, aged 18-25, speaking in a variety of styles.

Cite: Nolan, F. (2011). Dynamic Variability in Speech: a Forensic Phonetic Study of British English, 2006-2007. [data collection]. UK Data Service. SN: 6790, http://doi.org/10.5255/UKDA-SN-6790-1

OTHER SOUND STUFF

Sound effects archive from the BBC (British Broadcasting Corporation)
Sound Archive from the British Library
VoiceLab v0.2: reproducible automated voice analysis for beginners and experts.
Create your very own sine waves and download them as .wav files. Courtesy of Audio Check (please give generously if you use this function: these things are not free to the creator).
Two very useful references with information about voice characteristics can be found here (HAL.INRA) and here (DEA.BRUNEL)
Find Sounds: a searchable web site for free auditory stimuli (e.g. bells, whistles, barks etc). It is free from copyright.
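
If you would rather generate a sine wave tone yourself than download one, the sketch below shows one way to do it with Python's standard library only. It is not how the Audio Check tool works; it is just a simple illustration. The function name `write_sine_wav` and its default values (440 Hz, 1 second, 44.1 kHz, half amplitude) are arbitrary choices made for this example.

```python
import math
import struct
import wave


def write_sine_wav(path, freq_hz=440.0, duration_s=1.0,
                   sample_rate=44100, amplitude=0.5):
    """Write a mono, 16-bit PCM sine tone to `path`.

    amplitude is a fraction of full scale (0.0 to 1.0).
    Returns the number of frames written.
    """
    if not 0.0 <= amplitude <= 1.0:
        raise ValueError("amplitude must be between 0.0 and 1.0")

    n_frames = int(round(duration_s * sample_rate))
    peak = int(amplitude * 32767)  # 32767 is the largest 16-bit sample value

    frames = bytearray()
    for n in range(n_frames):
        value = peak * math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
        frames += struct.pack("<h", int(value))  # little-endian signed 16-bit

    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)       # mono
        wav_file.setsampwidth(2)       # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return n_frames


# Hypothetical usage: a 2-second 1000 Hz tone.
# write_sine_wav("tone_1000hz.wav", freq_hz=1000.0, duration_s=2.0)
```

Note that the tone starts and stops abruptly, which can produce audible clicks; for listening experiments you may want to add short onset and offset ramps.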