PhonBank Portuguese Pereira/Freitas Corpus

Rodrigo Pereira
University of Lisbon

Maria João Freitas
University of Lisbon


Participants: 6
Type of Study: naturalistic
Location: Portugal
Media type: audio
DOI: doi:10.21415/T50P5V

Browsable transcripts

Phon data

CHAT data

Link to media folder

Citation information

Freitas, Maria João (1997). Aquisição da Estrutura Silábica do Português Europeu. Ma, Universidade de Lisboa.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

Name     Age Range            Sessions   Sex
Laura    2;02.29 – 3;03.10    12         F
Luís     1;09.29 – 2;11.02    12         M
Marta    1;02.00 – 2;02.17    12         F
Pedro    2;07.00 – 3;07.24    12         M
Raquel   1;10.02 – 2;10.08    11         F
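
The age ranges listed for each child can be read programmatically. The following is a minimal Python sketch, assuming the ages follow the CHAT years;months.days convention (e.g. 2;02.29 is 2 years, 2 months, 29 days). The function names are hypothetical and are not part of Phon or any TalkBank tool, and the conversion to months treats a month as 30 days, which is only an approximation.

```python
import re

# Assumption: ages are written as years;months.days, e.g. "2;02.29".
AGE_PATTERN = re.compile(r"^(\d+);(\d{1,2})\.(\d{1,2})$")


def parse_age(age: str) -> tuple[int, int, int]:
    """Split an age string into (years, months, days)."""
    match = AGE_PATTERN.match(age.strip())
    if match is None:
        raise ValueError(f"not a years;months.days age: {age!r}")
    years, months, days = (int(part) for part in match.groups())
    return years, months, days


def age_in_months(age: str) -> float:
    """Approximate age in months, treating one month as 30 days."""
    years, months, days = parse_age(age)
    return years * 12 + months + days / 30


def parse_age_range(text: str):
    """Parse a range such as '1;10.02 – 2;10.08' into two age tuples."""
    # Accept either an en dash or a plain hyphen between the two ages.
    start, end = re.split(r"\s*[–-]\s*", text.strip())
    return parse_age(start), parse_age(end)
```

For example, `parse_age_range("1;02.00 – 2;02.17")` yields the start and end ages of Marta's recordings as two (years, months, days) tuples, which can then be compared or converted with `age_in_months`.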

The recordings were made in the child’s home, normally in the child’s bedroom. Naturalistic data were collected longitudinally: each child was videotaped for a period of one year. Each session lasted from 30 to 60 minutes. The recordings were made using a Sony Handycam Video 8, AF Hi-Fi Stereo.

The data collection was supported by Fundação para a Ciência e a Tecnologia (research project PCSH/C/LIN/524/93). This new version of the corpus was also supported by Fundação para a Ciência e a Tecnologia (UID/LIN/00214/2019). The updated version of November 2020 (with João and Pedro) was supported by Fundação para a Ciência e a Tecnologia (UIDB/00214/2020).

The data were manually entered into the Phon application. Orthographic and phonetic transcriptions were made of both the target forms and the children's actual forms. Transcriptions were performed by a native speaker of European Portuguese highly trained in phonetic transcription. All problematic transcriptions were noted and listened to by a second judge, also highly trained in the task. The criteria adopted during the data editing process are listed below. In research contexts, and due to the nature of the transcription task, we advise users to carefully review the selected files.

The data files in this corpus are part of the Acquisition of European Portuguese Databank (AcEP – databank), in CLUL's research group Grammar & Resources. Only audio files and orthographic/phonetic transcriptions are public and available online. Access to video recordings is not allowed (for any further details on this issue, please contact the AcEP – Acquisition of European Portuguese Databank Project). Additional longitudinal data on the acquisition of European Portuguese is available at:

It is important to note that the first 8 sessions of “João” are not segmented or transcribed. Only the audio is provided for these sessions, since the child produced almost no speech. They may be of interest to those who aim to study early language acquisition (including babbling).

Criteria used for the corpus editing

Overall, the transcriptions were organized in terms of orthographic words, not phonological words. The external sandhi rules and their word division are described below. However, words written with a hyphen, namely pronominalized verb forms with clitics such as “foi-se” [ˈfojsɨ], were transcribed within a single transcription group, in square brackets, in the orthography tier. When the speech was unintelligible or contained extralinguistic sounds such as mumbling, singing, screaming, etc., the top tier (orthography) has the mark [xxx] and the other two (IPA target and IPA actual) have [*].
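
The [xxx] and [*] marks make it possible to set unintelligible records aside before an analysis. The following is a minimal Python sketch, assuming each record has already been exported from Phon into a dictionary with the hypothetical keys "orthography", "ipa_target" and "ipa_actual". These key names and functions are illustrative only and do not correspond to any Phon API.

```python
# Marks described in the editing criteria for unintelligible speech.
UNINTELLIGIBLE_ORTHO = "[xxx]"
UNINTELLIGIBLE_IPA = "[*]"


def is_unintelligible(record: dict) -> bool:
    """True when the orthography tier carries the [xxx] mark."""
    return record["orthography"].strip() == UNINTELLIGIBLE_ORTHO


def is_consistently_marked(record: dict) -> bool:
    """Check that a [xxx] record also has [*] on both IPA tiers."""
    if not is_unintelligible(record):
        return True  # nothing to check for ordinary records
    return (
        record["ipa_target"].strip() == UNINTELLIGIBLE_IPA
        and record["ipa_actual"].strip() == UNINTELLIGIBLE_IPA
    )


def analysable(records: list) -> list:
    """Keep only the records that are not marked as unintelligible."""
    return [r for r in records if not is_unintelligible(r)]
```

Running `is_consistently_marked` over a session is one way to spot records where the three tiers disagree and that may deserve a manual review.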

In the notes tier, some abbreviations or indications were used, such as:

Any other notes, however diverse they may be, are fully described in English in the notes tier in Phon. One of the most common was “Incomplete word after initial clue”, which means that the adult provides the first two or three syllables of the target word and the child completes only what is missing from that word. Some choices had to be made for cases of external sandhi (across word boundaries), and a few rules apply: