Researchers have constructed several phonologically coded derived corpora based on segments of the PhonBank and CHILDES databases.
Derived Corpora
- French phonologized IDS: Maria Julia Carbajal, Camillia Bouchon, Emmanuel Dupoux, and
Sharon Peperkamp contributed this corpus of
phonologized French infant-directed speech (IDS) based on eight French CHILDES corpora,
taking into account several French phonological rules.
It was created by combining a phonological dictionary (Lexique 3.80) with a script designed to apply these rules.
- Hungarian-Italian IDS: Judit Gervain contributed this phonological transcription of
the infant-directed speech in the Hungarian and Italian segments of CHILDES.
- English MPMC corpus: Lise Menn, Yvan Rose, and Ann Peters contributed this corpus derived
from the data in the Menn corpus. The Menn Phonetic Mini Corpus
(MPMC) consists of the first set of files from a new version of
the original Menn corpus currently being re-transcribed using
CLAN and Phon by Lise Menn, in collaboration with Ann Peters and
Yvan Rose. The MPMC is the subject of this report:
Menn, Lise, Ann M. Peters & Yvan Rose (2021). The Menn Phonetic
Mini-Corpus: Articulatory Gestures as Precursors to the Emergence of
Segments. In Brian MacWhinney, Vera Kempe, Ping Li & Patricia J.
Brooks (Eds.), Frontiers in Psychology: Special issue on Emergentist
Approaches to Language.
- Polish IDS: Luc Borota contributed this corpus of phonological transcriptions of
the infant-directed speech in the Polish segments of CHILDES.
- PERCEPT-GFTA corpus version 2.2.2 in Python-ready format.
- PERCEPT-R corpus version 2.2.2 in Python-ready format.