PhonBank

Derived Corpora

Researchers have constructed several phonologically coded derived corpora based on segments of the PhonBank and CHILDES databases.

French phonologized IDS: Maria Julia Carbajal, Camillia Bouchon, Emmanuel Dupoux and Sharon Peperkamp contributed this corpus of phonologized French infant-directed speech, which is based on eight French CHILDES corpora and takes several French phonological rules into account. It was created using a combination of a phonological dictionary (Lexique 3.80) and a script designed to apply these rules.
Hungarian-Italian IDS: Judit Gervain contributed this phonological transcription of the Infant-Directed Speech in the Hungarian and Italian segments of CHILDES.
English MPMC corpus: Lise Menn, Yvan Rose, and Ann Peters contributed this corpus, which is derived from the data in the Menn corpus. The Menn Phonetic Mini Corpus (MPMC) consists of the first set of files from a new version of the original Menn corpus, which Lise Menn is currently re-transcribing with CLAN and Phon in collaboration with Ann Peters and Yvan Rose. The MPMC is the subject of this report: Menn, Lise, Ann M. Peters & Yvan Rose. (2021). The Menn Phonetic Mini-Corpus: Articulatory Gestures as Precursors to the Emergence of Segments. (Eds.) Brian MacWhinney, Vera Kempe, Ping Li & Patricia J. Brookes. Frontiers in Psychology: Special issue on Emergentist Approaches to Language.
Polish IDS: Luc Borota contributed this corpus of phonological transcriptions of the Infant-Directed Speech in the Polish segments of CHILDES.
PERCEPT-GFTA corpus version 2.2.2 in Python-ready format.
PERCEPT-R corpus version 2.2.2 in Python-ready format.
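
The French phonologized IDS entry above describes a two-step process: look each word up in a phonological dictionary, then apply phonological rules with a script. The following is only a minimal sketch of that general idea, not the authors' actual script. The lexicon entries, the ASCII transcription symbols, the function name `phonologize`, and the single simplified liaison rule are all hypothetical, chosen purely for illustration; nothing here is taken from Lexique 3.80.

```python
# Hypothetical toy lexicon: word -> (citation form, latent liaison consonant).
# The ASCII symbols are made up for this sketch and are NOT Lexique's codes.
TOY_LEXICON = {
    "les": ("le", "z"),
    "petit": ("p@ti", "t"),
    "amis": ("ami", None),
    "ours": ("uRs", None),
    "chat": ("Sa", None),
}

# Symbols treated as vowels in this toy transcription scheme (assumption).
VOWELS = set("aeiouy@")


def phonologize(utterance):
    """Return one phonological form per word of `utterance`.

    Step 1: dictionary lookup of each word's citation form.
    Step 2: a simplified liaison rule: a word's latent final consonant
    surfaces when the next word begins with a vowel. Real French liaison
    also depends on syntactic context, which this sketch ignores.
    """
    words = utterance.lower().split()
    for word in words:
        if word not in TOY_LEXICON:
            raise KeyError(f"word not in toy lexicon: {word!r}")

    forms = []
    for i, word in enumerate(words):
        form, liaison = TOY_LEXICON[word]
        # Citation form of the following word, or "" at the end of the utterance.
        next_form = TOY_LEXICON[words[i + 1]][0] if i + 1 < len(words) else ""
        if liaison and next_form and next_form[0] in VOWELS:
            form += liaison
        forms.append(form)
    return forms


if __name__ == "__main__":
    print(phonologize("les amis"))    # liaison applies before a vowel
    print(phonologize("petit chat"))  # no liaison before a consonant
```

Keeping the lookup and the rule application as separate steps means that further rules could be added in the second step without changing the dictionary.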