PhonBank Polish Weist-Jarosz Corpus
Richard Weist
Department of Psychology
SUNY Fredonia
weist@a12t.cc.fredonia.edu

Gaja Jarosz
Department of Linguistics
UMass Amherst
jarosz@linguist.umass.edu
Participants: 4
Type of Study: naturalistic, longitudinal
Location: Poland
Media type: audio
DOI: doi:10.21415/T51974
Browsable transcripts
Phon data
CHAT data
Link to media folder
Citation information
- Weist, Richard, & Witkowska-Stadnik, Katarzyna. (1986). Basic
relations in child language and the word order myth. International
Journal of Psychology, 21, 363–381.
- Weist, Richard, Wysocka, Hanna, Witkowska-Stadnik, Katarzyna,
Buczowska, Ewa, & Konieczna, Emilia (1984). The defective tense
hypothesis: On the emergence of tense and aspect in child Polish.
Journal of Child Language, 11, 347–374.
- Jarosz, Gaja (2010). Implicational markedness and frequency in
constraint-based computational models of phonological learning. Journal
of Child Language, 37(3) (Special Issue on Computational Models of Child
Language Learning), 565–606. Cambridge University Press.
- Jarosz, Gaja, Calamaro, Shira, & Zentz, Jason (2017).
Input frequency and inductive bias in the acquisition of syllable
structure in Polish. Manuscript, Linguistics Department, Yale University.
In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.
For this corpus, we recommend citing one reference from Weist and one from Jarosz.
Project Description
Participant Name | Age Range | Sessions | Sex
Bartosz | 1;7-1;11 | 6 | M
Kubuś | 2;1-2;6 | 7 | M
Marta | 1;7-1;10 | 6 (3 audio) | F
Wawrzon | 2;2-3;2 | 20 (19 audio) | M
All of the children came from middle-class families and were raised in the urban environment of Poznań, Poland. In general, their parents were highly educated. The children were recorded in their homes (typically an apartment) by two experimenters. One experimenter carried a small bag containing the tape recorder, and the other took context notes, which were integrated into the transcripts during transcription.
Phonetic Transcription Description
The children’s productions were transcribed using broad phonetic transcription with the help of the open-source Phon software (Rose et al. 2006). The orthographic transcripts were used as the basis for creating phonetic transcriptions of the children’s target pronunciations, and the audio recordings were used to phonetically transcribe the children’s actual productions and align them with the target transcriptions word by word. The transcription of all child productions was first performed independently by two transcribers trained in phonetic transcription, at least one of whom was a native speaker of Polish. Then, two Polish speakers trained in phonetic transcription worked together to create a consensus transcription of all productions, relying on a third phonetically trained native speaker of Polish to adjudicate in cases when agreement could not be reached. The resulting corpus includes phonetic transcriptions of the children’s productions in all the available audio files, providing word-by-word alignment of target pronunciations and actual pronunciations.
Transcription Conventions
Boundaries: We use word groups to delineate phonological word
boundaries. In all cases except one, orthographic word boundaries
correspond to phonological word boundaries. The only exception involves the
proclitics 'w' [v]/[f] and 'z' [z]/[s], which attach to the following
word and cannot be pronounced independently. In this case, the
orthography tier encodes the orthographic word boundaries, putting the
proclitic in its own word group, while the IPA Target and IPA Actual
tiers encode the proclitic together with the next word. So, for example,
'[z][kotem]' would be '[][skotɛm]' on the Target tier and potentially
something like '[][sotɛm]' on the Actual tier.
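The proclitic convention can be illustrated with a minimal Python sketch. It assumes the tiers are available as bracketed strings in the notation of the example above; this is only the notation of this description, not the actual Phon session file format, and the function names `parse_groups` and `pair_groups` are hypothetical.

```python
import re
from typing import List, Tuple


def parse_groups(tier: str) -> List[str]:
    """Split a bracketed tier string such as '[z][kotem]' into word groups."""
    return re.findall(r"\[([^\]]*)\]", tier)


def pair_groups(orthography: str, ipa: str) -> List[Tuple[str, str]]:
    """Pair orthographic word groups with IPA word groups.

    An empty IPA group is taken to mark a proclitic whose pronunciation is
    carried by the next group, so the orthographic pieces are joined with '+'.
    """
    ortho, phon = parse_groups(orthography), parse_groups(ipa)
    if len(ortho) != len(phon):
        raise ValueError("tiers must have the same number of word groups")
    pairs: List[Tuple[str, str]] = []
    pending: List[str] = []  # proclitics waiting for their host word
    for o, p in zip(ortho, phon):
        if p == "":
            pending.append(o)
            continue
        pairs.append(("+".join(pending + [o]), p))
        pending = []
    if pending:
        raise ValueError("proclitic without a following host word")
    return pairs


print(pair_groups("[z][kotem]", "[][skotɛm]"))  # [('z+kotem', 'skotɛm')]
```

Keeping the empty group on the IPA tiers preserves a one-to-one count of word groups across tiers, which is why the sketch can simply zip the two lists.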
Tier Conventions: We have maintained many of the conventions
from the original CHAT transcripts and introduced several codes to
denote special situations regarding phonetic transcription.
The following codes were used on the orthography tier:
- @c - child-specific form
- @n - neologism
- @o - onomatopoeia
- @f - family-specific form
- @q - (for 'quote') things the child is reciting from memory or by repetition
- @i - interjection
- @wp - whispered
- A comment (error:) after a word indicates a morphological or syntactic error
- A comment (t:trail off) is used to indicate an incomplete word or utterance
- A comment (++) at the beginning of an utterance means this is a completion of an adult's prompt
- Angle brackets < > around a word portion indicate this portion of the word was not uttered and is not present on the IPA tiers
- 0 at the beginning of a word group indicates unpronounced words
- [yyy] as a word group indicates material for which a target could not be identified but which is transcribed
- [xxx] as a word group indicates material for which a target could not be identified and which could not be transcribed
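As a rough illustration of how these orthography-tier codes might be read programmatically, the following Python sketch decodes the contents of a single word group. It is a sketch under stated assumptions: the function name `describe_token`, the returned dictionary layout, and the example tokens are hypothetical, and comment-style annotations such as (error:) or (++) are not handled.

```python
import re

# Suffix codes from the list above, mapped to short labels.
WORD_CODES = {
    "c": "child-specific form",
    "n": "neologism",
    "o": "onomatopoeia",
    "f": "family-specific form",
    "q": "quote",
    "i": "interjection",
    "wp": "whispered",
}


def describe_token(token: str) -> dict:
    """Decode the content of one orthographic word group (without brackets)."""
    info = {
        "form": token,         # orthographic form with markup removed
        "code": None,          # label for an @-suffix code, if any
        "unpronounced": False, # True when the group starts with 0
        "omitted": [],         # word portions inside < > that were not uttered
        "target_known": True,  # False for yyy and xxx
        "transcribed": True,   # False for xxx
    }
    if token == "yyy":
        info["target_known"] = False
        return info
    if token == "xxx":
        info["target_known"] = False
        info["transcribed"] = False
        return info
    if token.startswith("0"):
        info["unpronounced"] = True
        token = token[1:]
    base, sep, code = token.rpartition("@")
    if sep and code in WORD_CODES:
        info["code"] = WORD_CODES[code]
        token = base
    info["omitted"] = re.findall(r"<([^>]*)>", token)
    info["form"] = re.sub(r"[<>]", "", token)
    return info


# Made-up example tokens, not taken from the corpus.
for example in ["hau@o", "kot<em>", "0jest", "xxx"]:
    print(example, describe_token(example))
```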
IPA Conventions
- For the most part each word group is just a sequence of individual
IPA symbols that can be treated literally.
- One exception is that we've used ligatures for affricates for
convenience and to make sure the affricates were consistently
differentiated from stop-fricative sequences (which are contrastive in
Polish).
- We used the postalveolar affricate ligatures for convenience, but
these sounds are usually transcribed as retroflex and belong with the
retroflex fricative series, which we do transcribe as retroflex.
- Our level of transcription is relatively broad and fairly standard
for Polish, but it does encode some non-contrastive phonetic
characteristics of the targets and actuals. In particular:
- nasal vowels are transcribed on the target tiers according to their
standard pronunciation by context (as vowel+nasal stop homorganic with
the following noncontinuant consonant and as vowel+nasalized glide
otherwise)
- we paid special attention to voicing of obstruents, and target
transcriptions account for word-final devoicing and voicing assimilation
(including across word boundaries)
- due to the longer phonetic length of the palatal portion of
palatalized labials (e.g. 'piesek'), we coded these palatals as
labial-palatal sequences (e.g. [pjɛsɛk]), and we coded the palatalized
velars (e.g. 'kiedy') using secondary palatalization (e.g. [kʲɛdɨ]) in
the standard pronunciation and when children produced them in an
adult-like way. These are not contrastive distinctions (there is no
contrast between [Cʲ] and [Cj] in Polish).
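The context-dependent treatment of nasal vowels on the target tiers can be sketched as a small lookup in Python. This is only an illustration of the rule as stated above: the consonant inventory, the place-of-articulation mapping, the use of [w̃] for the nasalized glide, and the function name `expand_nasal_vowel` are all assumptions made for the sketch and may not match the exact symbols used in the corpus.

```python
from typing import Optional

# Orthographic nasal vowels and the oral vowel assumed to represent them.
NASAL_VOWELS = {"ę": "ɛ", "ą": "ɔ"}

# Assumed map from a following noncontinuant (stop or affricate) to the
# homorganic nasal stop. Illustrative only, not the corpus's symbol set.
HOMORGANIC_NASAL = {
    "p": "m", "b": "m",
    "t": "n", "d": "n", "ʦ": "n", "ʣ": "n", "ʧ": "n", "ʤ": "n",
    "ʨ": "ɲ", "ʥ": "ɲ",
    "k": "ŋ", "g": "ŋ", "ɡ": "ŋ",
}

# Nasalized glide written as 'w' plus a combining tilde (an assumed symbol).
NASALIZED_GLIDE = "w\u0303"


def expand_nasal_vowel(letter: str, following: Optional[str]) -> str:
    """Return the target transcription of a nasal vowel in context.

    Before a noncontinuant the result is vowel + homorganic nasal stop;
    in every other context (including no following segment) it is
    vowel + nasalized glide, following the rule as stated above.
    """
    vowel = NASAL_VOWELS[letter]
    nasal = HOMORGANIC_NASAL.get(following) if following else None
    return vowel + (nasal if nasal else NASALIZED_GLIDE)


print(expand_nasal_vowel("ą", "p"))  # vowel + [m] before a labial stop
print(expand_nasal_vowel("ę", "k"))  # vowel + [ŋ] before a velar stop
print(expand_nasal_vowel("ą", "s"))  # vowel + nasalized glide before a fricative
```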