About the Corpus
MuLaR is a Latgalian spoken-language corpus created by researchers at Rēzekne Academy of Technologies (RTA).
The name MuLaR is derived from MuLa, the Corpus of Modern Latgalian Texts (Mūsdienu latgaliešu tekstu korpuss), which is compiled from written sources.
The R in MuLaR stands for runa ‘speech’.
The corpus includes audio recordings and their transcripts.
Construction periods: 2021-2022, 2022-2024.
Total duration: 26 hours 30 minutes.
MuLaR includes:
- Interviews conducted during fieldwork in Aglona (2021);
- Recordings from earlier field expeditions in Latgale organized by RTA (2009-2021);
- Audio recordings of TV (Latgales Reģionālā televīzija) and radio (Latvijas Radio) broadcasts (2018-2023);
- Speech recordings of Latgalians living in Siberia (2017-2018);
- Latgalian audio recordings and transcriptions from the TriMCo Dialect Corpus project.
The audio recordings were transcribed in ELAN, using an orthographic transcription that preserves dialectal features.
Each audio recording is accompanied by metadata: the location and date of the recording, the duration of the audio segment, and the speaker's gender and age.
This work has been supported by research and development grant No. RTU-PA-2024/1-0063 under the EU Recovery and Resilience Facility funded project No. 5.2.1.1.i.0/2/24/I/CFLA/003 “Implementation of consolidation and management changes at Riga Technical University, Liepaja University, Rezekne Academy of Technology, Latvian Maritime Academy and Liepaja Maritime College for the progress towards excellence in higher education, science, and innovation”.