PhonBank Spanish-Galician Koine Corpus

Milagros Fernández Pérez
Literatura española
Universidad de Santiago de Compostela


Participants: 54
Type of Study: naturalistic
Location: Spain
Media type: audio
DOI: doi:10.21415/T5SW39

Browsable transcripts

Phon data

CHAT data

Link to media folder

Citation information

ENRÍQUEZ MARTÍNEZ, Iván (2015). La adquisición de las construcciones complejas: de la interacción a la gramática. PhD Thesis. Universidade de Santiago de Compostela.

PEREIRA PINHEIRO, Wanessa Raquel (2015). Estudo comparativo da aquisiçao das fricativas nas línguas portuguesa e española. PhD Thesis, Universidade de Santiago de Compostela.

CORTIÑAS ANSOAR, Soraya (2014). Deixis y referencia en el habla infantil. Presentadores espaciales y temporales en el corpus Koiné. PhD Thesis, Universidade de Santiago de Compostela.

FERNÁNDEZ PÉREZ, Milagros (2015). "Lenguaje infantil y medidas de desarrollo", ENSAYOS (monográfico sobre Educación infantil), 30/2, págs. 53-69.

FERNÁNDEZ PÉREZ, Milagros & FERNÁNDEZ LÓPEZ, Isabel (2014). "Sistemas de valoración del habla infantil". Capítulo 3 de Lingüística y déficit comunicativos. ¿Cómo abordar las disfunciones verbales?, Síntesis, Madrid, 2014, 99-138.

PREGO VÁZQUEZ, Gabriela & Isabel FERNÁNDEZ LÓPEZ (2014). "Exploración lingüística del habla infantil", en M. Fernández Pérez (coord.), Lingüística y déficit comunicativos, Madrid, Editorial Síntesis, 45-101.

FERNÁNDEZ LÓPEZ, Isabel & Pablo CANO LÓPEZ (2011). "Apuntes sobre la génesis y evolución de las construcciones biargumentales en el lenguaje infantil: un estudio basado en materiales del corpus Koiné", Revista de Investigación Lingüística, 14, págs. 35-59.

FERNÁNDEZ LÓPEZ, Isabel & Pablo CANO LÓPEZ (2011). "Contribución al estudio del desarrollo fonético-fonológico infantil. Procesos fonológicos comunes en niños castellanohablantes de 2 a 4 años", en M. Fernández Pérez (coord.), Lingüística de corpus y adquisición de la lengua, Madrid, Arco Libros, 37-86.

FERNÁNDEZ PÉREZ, Milagros (2011). "El corpus koiné de habla infantil. Líneas maestras", capítulo 1 de Lingüística de corpus y adquisición de la lengua, Madrid, Arco Libros, 2011, 11-36.

FERNÁNDEZ PÉREZ, Milagros, MADRID CÁNOVAS, Sonia & Soraya CORTIÑAS ANSOAR (2011). "Recapitulando: líneas evolutivas de desarrollo en el corpus koiné", en M. Fernández Pérez (coord.) Lingüística de corpus y adquisición de la lengua, Madrid, Arco Libros, 205-234.

GONZÁLEZ PEREIRA, Miguel, FERNÁNDEZ LÓPEZ, Isabel & Pablo CANO LÓPEZ (2011). "Características construccionales en el corpus Koiné y emergencia de la gramática", en M. Fernández Pérez (coord.), Lingüística de corpus y adquisición de la lengua, Madrid, Arco Libros, 87-148.

PREGO VÁZQUEZ, Gabriela, SOUTO GÓMEZ, Montserrat & Beatriz DIESTE QUIROGA (2011). "El desarrollo pragmático: intenciones y acción comunicativa en edad temprana", en M. Fernández Pérez (coord.) Lingüística de corpus y adquisición de la lengua, Madrid, Arco Libros, 149-204.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

The labelling of the Koiné corpus of child speech has been updated since the corpus became part of CHILDES in 2006. The latest version (2016) adds codes for the phonological processes that play a substantial role in the emergence of the sounds of each language. The processes coded are the following: substitution, omission, addition, fronting, voicing, fricative, and multiple processes.

In addition, sound files have been added so that the original conversational samples can be heard. Productions from 13 children, all of whom have a strong longitudinal presence in the corpus, are also planned to become available shortly through PhonBank. Seven of these children speak Galician (boys: ART, BRE, DRI, UEL, XUN; girls: LAU, RAQ) and six speak Galician Spanish (boys: IAG, RIC, JOR; girls: CEC, ANP, NER).

The previous version (2014) introduced labels identifying the speaking turns, together with codes on the %MOR line. These codes cover word properties (number of syllables of the infinitive, number of syllables of the verb form, tense, person and number, and conjugation type) and properties of complex structures (number of constituents, and type of constituents or arguments).

In its initial form, the Koiné corpus had a basic level of labelling that marked peculiarities of child speech in each component (phonic, morphological, syntactic, lexical and pragmatic), such as agreement, ellipsis, word formation, idiosyncratic constructions and bilingualism. On a first level, every case showing a reasonably systematic particularity is marked, together with the component it belongs to. The label used is %par, followed by the component indicator: $PRA, $PHON, $MOR or $SYN. On a second level, we specify the morphological and relational properties that show the progress reached in different areas. There are markers for verbal inflection ($VER), nominal inflection ($NOM) and word formation ($WFO). Constructional development is coded with $CON, a genuine construction (because of its structure, its constituents, its peculiar reaction or its arguments); $AGR, agreement; $PRES, reference and pointing; and $ELL, ellipsis. Contexts where two or more languages coexist give rise to interesting communicative characteristics, such as the combination of elements from more than one code, which is marked with $MIX, an indicator of a "language mix" of some sort.

Phonological Process %xphon codes

Substitution $SUS
Omission $OMI
Addition $ADI
Fronting $FRON
Voicing $SON
Fricative $FRIC
Multiple Processes $PML
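
As a minimal sketch of how these codes might be tallied, the Python snippet below scans CHAT-style text for %xphon dependent tiers and counts each process code from the list above. The sample utterances are invented for illustration, and the assumption that codes appear as whitespace-separated $-prefixed tokens after the tier name is ours, not a statement about the actual corpus files.

```python
import re
from collections import Counter

# Code-to-process mapping copied from the %xphon list above.
PHON_PROCESSES = {
    "$SUS": "Substitution",
    "$OMI": "Omission",
    "$ADI": "Addition",
    "$FRON": "Fronting",
    "$SON": "Voicing",
    "$FRIC": "Fricative",
    "$PML": "Multiple Processes",
}

# Assumption: a code is a dollar sign followed by capital letters.
CODE_RE = re.compile(r"\$[A-Z]+")


def count_phon_codes(chat_text):
    """Count known process codes found on %xphon tier lines."""
    counts = Counter()
    for line in chat_text.splitlines():
        if not line.startswith("%xphon:"):
            continue  # skip main lines and other dependent tiers
        for code in CODE_RE.findall(line):
            if code in PHON_PROCESSES:
                counts[PHON_PROCESSES[code]] += 1
    return counts


# Hypothetical sample, invented for illustration only.
SAMPLE = """*CHI:\tagua .
%xphon:\t$OMI
*CHI:\tpato .
%xphon:\t$SUS $SON
*CHI:\tsí .
%xphon:\t$SUS
"""

phon_counts = count_phon_codes(SAMPLE)
print(phon_counts)
```

Unknown codes are silently ignored here; a stricter version could collect them in a separate list so that typos in the annotation become visible.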

Error %xpar codes

$PHO phonological error
$NOU noun gender or number error
$VER verb error
$WFO word formation error
$CON case error or omission
$AGR agreement error
$PRE error of NP reference
$ELL erroneous ellipsis
$MIX errors through language mixing

Morphological %xmor codes

$1SIL, $2SIL number of syllables in the infinitive
$PAS, $PRES etc tense codes
$1S etc person codes
$1C $2C $3C conjugation
$ARG1 etc. presence of constituent
$ARG1'v type of argument structure
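
The morphological codes follow a few recognisable shapes, so they can be sorted into categories with simple patterns. The sketch below is one possible way to do this in Python. It relies only on the code shapes listed above, with two assumptions labelled in the comments: that person codes end in S or P, and that $PAS and $PRES are the only tense codes (the list says "etc", so others probably exist).

```python
import re

# Order matters: the more specific argument-type pattern comes before
# the plain argument pattern.
MOR_PATTERNS = [
    ("syllables", re.compile(r"^\$(\d+)SIL$")),          # $1SIL, $2SIL
    ("conjugation", re.compile(r"^\$([123])C$")),        # $1C, $2C, $3C
    ("person", re.compile(r"^\$([123][SP])$")),          # $1S; 'P' is an assumption
    ("argument_type", re.compile(r"^\$(ARG\d+'\w+)$")),  # $ARG1'v
    ("argument", re.compile(r"^\$(ARG\d+)$")),           # $ARG1
    ("tense", re.compile(r"^\$(PAS|PRES)$")),            # only the two codes named
]


def classify_mor_code(code):
    """Return (category, value) for a %xmor code, or ('unknown', code)."""
    for category, pattern in MOR_PATTERNS:
        match = pattern.match(code)
        if match:
            return category, match.group(1)
    return "unknown", code


for example in ["$2SIL", "$PRES", "$1S", "$3C", "$ARG1", "$ARG1'v"]:
    print(example, classify_mor_code(example))
```

Codes that fall through to "unknown" are a useful signal that the pattern table needs extending for this corpus.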

The Koiné database is based on a sample of 71 children (34 boys and 37 girls) aged 18 to 53 months. The communicative materials were taken from interactions between the participants in an educational setting, specifically urban and semi-urban kindergartens in Galicia. The acquisition patterns illustrate the development of the Galician language (7 children use this language) and of the variety of Spanish spoken in Galicia.

School        Type      Province     Town                     Social environment
Breogán       Public    A Coruña     Santiago de Compostela   Middle class, university area
Vite          Public    A Coruña     Santiago de Compostela   Working class area
Sta. Susana   Public    A Coruña     Santiago de Compostela   Working class
Milagrosa     Public    Lugo         Lugo                     Working class
Elfos         Private   Pontevedra   A Estrada                Middle class
Table 1. Social Environment

The communicative interactions were recorded between 1996 and 2000 in sessions of 15-20 minutes each, held once every two weeks. The context was natural, spontaneous conversation. Although the catalogue is not large, its longitudinal nature guarantees its representative value, and the participants have a substantial presence in the developmental sample. Picture 3 illustrates the children who were followed for more than 12 months. Picture 4 shows the children who attended continuously for between 6 and 11 months. Picture 5 displays the participants who appear over 6 months.

The communicative interactions that make up the corpus are distributed across age groups in the following proportions:

The importance of the Koiné catalogue of child speech lies in three main aspects related to its novelty and usefulness. Firstly, its quantitative and configurational properties make it a unique source for the genuine characteristics of the acquisition of Spanish in its early stages. Secondly, the qualitative richness of the data is remarkable, especially with regard to its pragmatic characteristics. Thirdly, it clearly demonstrates that each developmental tendency can be applied in practice to observe and comment on specific cases of language development. In short, the relevance of the Koiné database lies in the following:

(a) It is a broad enough sample to reveal tendencies that serve as a reference in the early stages of:

(b) It is a display that highlights interesting pragmatic indicators such as:

(c) It offers data to compare and calculate the communicative efficiency of the language used by children in a specific age group, facilitating evaluation and comparison in particular cases.

Notes about time intervals and session dating: In the majority of records for which there are no phonetic transcriptions, the time intervals have been assigned automatically in order to preserve the chronological ordering of the utterances. These time intervals should be systematically revised before any work that builds on the orthographic transcriptions.

For anonymity purposes, and in order to preserve the children's ages at each recording session, we removed comments containing specific dates. All sessions were also assigned a fictitious date, set to 1997-01-01.
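
Because every session carries the same fictitious date, a child's age cannot be derived from session dates and has to be read from the recorded age instead. The Python sketch below shows one way to do that, assuming the transcripts use the standard CHAT @ID header, in which fields are separated by "|" and the fourth field holds the age as years;months.days. The sample header line, including the corpus name "Koine" and the age value, is hypothetical.

```python
import re

# CHAT age format: years;months.days (the days part may be empty or absent).
AGE_RE = re.compile(r"^(\d+);(\d{1,2})(?:\.(\d{0,2}))?$")


def age_in_months(id_line):
    """Return the age in whole months from a CHAT @ID header, or None."""
    # Drop the "@ID:" label, then split the pipe-separated fields.
    fields = id_line.split("\t", 1)[-1].split("|")
    if len(fields) < 4:
        return None
    match = AGE_RE.match(fields[3].strip())
    if not match:
        return None  # age missing or not in the expected format
    years, months = int(match.group(1)), int(match.group(2))
    return years * 12 + months


# Hypothetical header line, invented for illustration only.
SAMPLE_ID = "@ID:\tspa|Koine|CHI|2;06.15|male|||Target_Child|||"

months = age_in_months(SAMPLE_ID)
# The corpus description gives an age range of 18 to 53 months.
in_range = months is not None and 18 <= months <= 53
print(months, in_range)
```

Returning None for a missing age keeps the caller in control of how to treat incomplete headers, rather than silently falling back on the fictitious session date.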