Paidologos Project
Jan Edwards, Hearing and Speech Sciences, University of Maryland, edwards@umd.edu
Mary Beckman, Linguistics, Ohio State, beckman.2@osu.edu
Citation Information
Edwards, J. & Beckman, M. E. 2008. Methodological questions in studying
consonant acquisition. Clinical Linguistics and Phonetics 22(12):939-958.
Edwards, J. & Beckman, M. E. 2008. Some cross-linguistic evidence for
modulation of implicational universals by language-specific frequency
effects in the acquisition of consonant phonemes. Language Learning and
Development 4(2):122-156.
In accordance with CHILDES rules, any use of data from this corpus
must be accompanied by at least one of the above references.
General Information
The paidologos project is a large,
cross-linguistic investigation of phonological development. The project was
inspired by the fact that words in different languages are made up of
different sounds and sound sequences, and sounds that are common in some
languages are rare in others. The focus of the paidologos project is to
examine how these differences affect how children learn to speak
different languages.
The name “paidologos” is a new word that we made up from Greek roots
meaning “child” and “word,” in order to capture our idea of looking at
children’s words in parallel across different languages. To do that, we
traveled to day care centers and children’s homes in Hong Kong, eastern
Japan, northern Greece, the far northeast of China, South Korea, and
central Ohio to record the speech of two- to five-year-old children
learning to speak either Cantonese, Japanese, Greek, Mandarin Chinese,
Korean, or English. The corpora that are made available here include
child and adult speech of Cantonese, Japanese, Greek, and English.
Participant Information
For each language, there are 10 males
and 10 females in each age group (2-, 3-, 4-, and 5-year-olds, plus 20
adults). The children all come from middle socioeconomic status
families. The adults all self-reported normal speech and hearing. All
children were tested with a hearing screening (pure tone screening at 25
dB HL for 500, 1000, 2000, and 4000 Hz or otoacoustic emissions at 2000,
3000, 4000, and 5000 Hz), and norm-referenced measures of expressive
vocabulary (Williams 1997), receptive vocabulary (Brownell 2000), and
articulatory accuracy (Goldman & Fristoe 2000). Any child who did not
pass the hearing screening in at least one ear or who scored more than
one standard deviation below the mean on the norm-referenced measures
was excluded from the study.
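The exclusion rule above can be sketched as a small check. This is only an illustration under stated assumptions, not the project's actual screening code: the function name `is_eligible` and its parameters are hypothetical, scores are assumed to be standard scores with a normative mean of 100 and standard deviation of 15, and a low score on any one measure is assumed to be enough for exclusion (the text does not say whether one or all measures must fall below the cutoff).

```python
from typing import Iterable


def is_eligible(
    passed_left_ear: bool,
    passed_right_ear: bool,
    standard_scores: Iterable[float],
    norm_mean: float = 100.0,  # assumed normative mean, not stated in the text
    norm_sd: float = 15.0,     # assumed normative SD, not stated in the text
) -> bool:
    """Return True if a child meets the inclusion criteria described above."""
    # The screening must be passed in at least one ear.
    passed_hearing = passed_left_ear or passed_right_ear
    # Scores more than one SD below the mean lead to exclusion
    # (assumption: this applies to each measure separately).
    cutoff = norm_mean - norm_sd
    scores_ok = all(score >= cutoff for score in standard_scores)
    return passed_hearing and scores_ok
```

For example, `is_eligible(True, False, [100, 90, 85])` accepts a child who passed in one ear and scored at or above the cutoff on all three measures.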
Recording & Transcription Procedure
For each language, the
recorded stimuli used as elicitation prompts were spoken by an adult
female native speaker, in a child-directed manner, and digitally
recorded and presented at a sampling rate of 22,500 Hz.
Each trial consisted of a picture and the associated sound file,
which were presented simultaneously to the participant over a laptop
with a 14-inch screen. The computer program included an on-screen VU
meter to help the children monitor their volume and a picture of an
animal (duck or frog or koala bear) walking up a ladder on the left side
of the screen to provide visual feedback to the children about how close
they were to completing the task. The children were instructed to repeat
each word exactly as they heard it.
The first audible response to each prompt was transcribed in a Praat
TextGrid. A native speaker who is also a trained phonetician transcribed
each repetition after listening to the response and examining its
acoustic waveform. A second native-speaking, trained phonetician blindly
re-transcribed 20 percent of the data for each language.
The children's recordings took place in a quiet room at one or more
preschools in each of the four countries. The adult speakers of English
were recorded in a sound booth at the Ohio State University using the
same protocol and equipment. The adult speakers of Japanese were
recorded in a quiet room at Daito Bunka University, using the same
protocol and recording equipment. The adult speakers of Greek were
recorded in a quiet room in their own homes, or in the home of the
interviewer, using the same protocol and recording equipment. The adult
speakers of Cantonese were recorded in a quiet room at the Chinese
University of Hong Kong using the same protocol and recording equipment.
Naming Conventions
Each file name is a sequence of characters that indicates language, age,
sex, list, and participant ID, as follows:
1) – The initial character indexes the language, with "c"
for Cantonese, "e" for English, "g" for Greek, and "j" for Japanese.
2:3) – The next two characters give the child's age, within a
6-month band. The number represents years, while the letters "a" and "b"
indicate whether they are in the first or second half of that year,
respectively. That is, for example, the child whose productions are in
files c2at01fw_canw211 and c2at01fw_canw212 is a "young" 2-year-old,
aged between 2;0 and 2;5, whereas the child whose productions are in
files e5bt25fw_enrw111 and e5bt25fw_enrw112 is an "old" 5-year-old, aged
between 5;6 and 5;11. While finer divisions of the children's ages than
these 6-month bands are not permitted, it may be useful to know that the
two digits after the "t" are the identification numbers for the 20 or 21
children within each year band, and these numbers are ordered by age, so
that c2at01fw is the youngest Cantonese-speaking two-year-old and
e5bt25fw is the oldest English-speaking five-year-old. For the adults,
a "g" replaces the "a" or "b".
4) – The fourth
character – “t” – stands for typically developing, and is constant
across these files because there are no developmentally atypical
participants.
5:6) – The following two-digit identification number
indicates the individual speaker in each age group, in order from
youngest to oldest.
7) – The seventh character, either "f" or "m", follows the
identification number and encodes the participant's gender.
8) – The underscore signifies that everything following it indicates
language, session, and list.
9:10) – These characters are a two-letter language code ("ca" for
Cantonese, "gr" for Greek, "en" for English, "ja" for Japanese).
11:12) – These characters are either "nw" (nonword) or "rw" (real word);
only the real words are included in these corpora.
13:14) – This is the list number.
15) – This is the block number (1 or 2).
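The naming scheme can be illustrated with a small parser. This is a minimal sketch rather than an official tool: the names `parse_filename`, `ParsedName`, and `FILENAME_PATTERN` are hypothetical, and the pattern is modeled on the example names quoted above (such as c2at01fw_canw211). In those examples one letter ("w") appears between the gender character and the underscore that the list does not describe, so the sketch accepts any single lowercase letter there and ignores it. It also assumes that the age is a single digit, which may not hold for adult files.

```python
import re
from typing import NamedTuple, Optional

LANGUAGES = {"c": "Cantonese", "e": "English", "g": "Greek", "j": "Japanese"}
LANGUAGE_CODES = {"ca": "Cantonese", "en": "English", "gr": "Greek", "ja": "Japanese"}
AGE_BANDS = {"a": "first half of year", "b": "second half of year", "g": "adult"}
WORD_TYPES = {"nw": "nonword", "rw": "real word"}

FILENAME_PATTERN = re.compile(
    r"(?P<lang>[cegj])"        # language letter
    r"(?P<years>\d)"           # age in years (assumed to be one digit)
    r"(?P<band>[abg])"         # half-year band, or "g" for adults
    r"t"                       # typically developing
    r"(?P<speaker>\d{2})"      # identification number, ordered by age
    r"(?P<gender>[fm])"        # gender
    r"[a-z]"                   # letter seen in the examples, not described in the list
    r"_"
    r"(?P<code>ca|en|gr|ja)"   # two-letter language code
    r"(?P<word_type>nw|rw)"    # nonword or real word
    r"(?P<word_list>\d{2})"    # list number
    r"(?P<block>[12])"         # block number
)


class ParsedName(NamedTuple):
    language: str
    age_years: int
    age_band: str
    speaker_id: str
    gender: str
    word_type: str
    word_list: str
    block: int


def parse_filename(name: str) -> Optional[ParsedName]:
    """Split a file name (without extension) into its labeled parts."""
    m = FILENAME_PATTERN.fullmatch(name)
    if m is None:
        return None
    # The leading letter and the two-letter code should name the same language.
    if LANGUAGES[m["lang"]] != LANGUAGE_CODES[m["code"]]:
        return None
    return ParsedName(
        language=LANGUAGES[m["lang"]],
        age_years=int(m["years"]),
        age_band=AGE_BANDS[m["band"]],
        speaker_id=m["speaker"],
        gender=m["gender"],
        word_type=WORD_TYPES[m["word_type"]],
        word_list=m["word_list"],
        block=int(m["block"]),
    )
```

Calling `parse_filename("e5bt25fw_enrw112")` yields an English-speaking five-year-old in the second half of the year, speaker 25, list 11, block 2. Names that do not fit the pattern return `None` instead of raising an error.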
There are two sets of files for each child, one for each of the two
lists of words that were presented for repetition. The word lists were
designed to elicit three words starting with each target consonant-vowel
sequence. The stimulus tokens were distributed across lists and children
so that no child was presented with all three words in the same block,
and each of the stimulus words for a sequence was taken from a different
position in the series that the experimenter had recorded.
Acknowledgements
This project was supported by funding from the National Institute on Deafness and Other Communication
Disorders (NIDCD) between 2003 and 2010.