Geospatial experiments with MuLaR

The project brings together Rezekne Academy of Technologies and Riga Technical University, two Latvian institutions with expertise in interdisciplinary research and digital humanities. Its overarching aim is to advance digital resources and tools that support linguistic and cultural research. Central to the project is the Corpus of Contemporary Latgalian Speech (MuLaR), a collection of transcribed audio recordings of spoken Latgalian registered in the CLARIN-LV repository. The corpus serves as a foundation for multifaceted linguistic and cultural analysis, and for strengthening institutional and individual research capacity through interdisciplinary collaboration.

The project focuses on expanding and enriching the MuLaR corpus and developing a prototype spatiotemporal model that enables complex analysis of spoken Latgalian data by integrating spatial, linguistic, and contextual information. This model supports advanced visualisation, multimodal narrative analysis, and integration with existing digital humanities infrastructures, thereby broadening the corpus’s usability for research and education. Methodologically, the project draws on established practices in digital humanities and geospatial modelling design to address the challenges of representing nuanced spatial references in language and contemporary culture.

About the Corpus

MuLaR is a Latgalian spoken language corpus created by researchers of Rēzekne Academy of Technologies (RTA).

The name MuLaR is derived from MuLa, the Corpus of Modern Latgalian Texts (Mūsdienu latgaliešu tekstu korpuss), which is compiled from written resources.

The R in MuLaR stands for runa ‘speech’.

The corpus includes audio recordings and their transcripts.

Construction periods: 2021-2022, 2022-2024.

Length: 26 hours 30 minutes.

MuLaR includes:

  1. Interviews conducted during fieldwork in Aglona (2021);
  2. Recordings from earlier field expeditions in Latgale organized by RTA (2009-2021);
  3. Audio recordings of TV (Latgales Reģionālā televīzija) and radio (Latvijas Radio) broadcasts (2018-2023);
  4. Speech recordings of Latgalians living in Siberia (2017-2018);
  5. Latgalian audio recordings and transcriptions from the TriMCo Dialect Corpus project.

The transcriptions of the audio recordings were made using the ELAN software; an orthographic transcription was used that preserves dialect features.

Each audio recording is accompanied by metadata: the location and time of the recording, the duration of the audio segment, the speaker's gender and age.
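The metadata fields listed above could be represented roughly as follows. This is a minimal sketch, not the corpus's actual schema: the class name RecordingMetadata, the field names, and the helper total_duration are hypothetical, and the sample values in the usage note are placeholders.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class RecordingMetadata:
    """One audio segment's metadata (hypothetical field names)."""
    location: str                   # where the recording was made
    recorded_on: date               # when the recording was made
    duration: timedelta             # duration of the audio segment
    speaker_gender: Optional[str]   # None if not recorded
    speaker_age: Optional[int]      # None if not recorded


def total_duration(records: Iterable[RecordingMetadata]) -> timedelta:
    """Sum the segment durations, e.g. to report overall corpus length."""
    return sum((r.duration for r in records), timedelta())
```

For example, two placeholder records of 90 and 30 minutes passed to total_duration would give a combined length of two hours.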

This work has been supported by research and development grant No. RTU-PA-2024/1-0063 under the EU Recovery and Resilience Facility funded project No. 5.2.1.1.i.0/2/24/I/CFLA/003 “Implementation of consolidation and management changes at Riga Technical University, Liepaja University, Rezekne Academy of Technology, Latvian Maritime Academy and Liepaja Maritime College for the progress towards excellence in higher education, science, and innovation”.

Prototype of Linguistic Model

Linguistic Map #1

This map shows the localities in Latgale mentioned in the MuLaR corpus, as well as Riga. The objects on the map are labeled with their official Latgalian names. By zooming in and clicking on an object dot, users can see the corresponding Latvian name, administrative unit, and the overall frequency with which the geographical object is mentioned in the corpus.

Linguistic Map #2

This map shows the overall frequency with which Latgalian localities (and Riga) are mentioned in the corpus, including name variants and their respective frequencies. By zooming in and clicking on an object dot, users can see the name variants associated with a locality and how often each variant appears in the corpus, as well as the frequency of the official toponym and the overall toponym frequency.
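The relationship between variant frequencies, official toponym frequency, and overall frequency can be sketched as below. This is an illustration under assumptions, not the project's pipeline: it assumes the mentions have already been extracted from the transcripts and that a lookup table from each variant to its official name exists. The names LocalityStats, aggregate_mentions, and variant_to_official are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class LocalityStats:
    """Mention counts for one locality, keyed by name variant."""
    official_name: str
    variant_counts: Dict[str, int] = field(default_factory=dict)

    @property
    def official_frequency(self) -> int:
        # Mentions that use the official toponym itself.
        return self.variant_counts.get(self.official_name, 0)

    @property
    def overall_frequency(self) -> int:
        # Mentions across all variants, official name included.
        return sum(self.variant_counts.values())


def aggregate_mentions(
    mentions: Iterable[str],
    variant_to_official: Dict[str, str],
) -> Dict[str, LocalityStats]:
    """Group mentions by locality; skip strings not in the lookup table."""
    stats: Dict[str, LocalityStats] = {}
    for variant in mentions:
        official = variant_to_official.get(variant)
        if official is None:
            continue  # not a known toponym variant
        entry = stats.setdefault(official, LocalityStats(official))
        entry.variant_counts[variant] = entry.variant_counts.get(variant, 0) + 1
    return stats
```

Each LocalityStats object then carries the three figures that the popup shows: the per-variant counts, the official toponym frequency, and the overall frequency.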

Linguistic Map #3

This heatmap visualizes the overall frequency of place mentions, summed across all toponym variants, for the geographical points mentioned in the MuLaR corpus. The intensity of the blue-to-pink gradient indicates how frequently each location is referenced: darker blues represent lower frequencies, while brighter pinks represent the highest. This allows users to quickly identify the most talked-about locations in Latgale across all recorded speech.
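One simple way to map a frequency onto a blue-to-pink gradient is linear interpolation between two RGB endpoints. The sketch below assumes linear scaling against the maximum frequency and uses arbitrary endpoint colours (dark blue and hot pink). The actual colours, scaling, and smoothing used by the project's heatmap are not stated here, and the function names are hypothetical.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def lerp_color(low: RGB, high: RGB, t: float) -> RGB:
    """Linearly interpolate between two RGB colours for t in [0, 1]."""
    return tuple(round(l + (h - l) * t) for l, h in zip(low, high))


def frequency_to_color(
    freq: float,
    max_freq: float,
    low: RGB = (0, 0, 139),       # assumed dark blue for low frequencies
    high: RGB = (255, 105, 180),  # assumed bright pink for high frequencies
) -> str:
    """Return a hex colour for a frequency, clamped to the gradient range."""
    if max_freq <= 0:
        raise ValueError("max_freq must be positive")
    t = min(max(freq / max_freq, 0.0), 1.0)
    r, g, b = lerp_color(low, high, t)
    return f"#{r:02x}{g:02x}{b:02x}"
```

A logarithmic scale may be preferable when a few localities dominate the counts, since linear scaling would push most other points towards the blue end.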

Prototype of Cultural (Transmedia Communication) Model

Transmedia Communication Model #1

This map shows the localities in Latgale mentioned in the MuLaR corpus, as well as Riga. The objects on the map are labeled with their official Latgalian names. By zooming in and clicking on an object dot, users can see the corresponding Latvian name, administrative unit, and the overall frequency with which the geographical object is mentioned in the corpus.

Transmedia Communication Model #2

This map displays towns from the MuLaR corpus that have corresponding Latgalian-language hashtags on Instagram. Only locations with active Instagram presence using Latgalian hashtags are shown, reflecting the transmedia dimension of contemporary Latgalian discourse and digital engagement with place names on social media platforms.
