Estonian and big tech: future-proofing small languages
14 February 2025
In a bold effort to secure the future of its language, Estonia recently shared nearly four billion words of linguistic data with Meta, aiming to integrate the Estonian language into AI models. This move is designed to improve chatbots, voice assistants, and translation tools, ensuring seamless digital experiences for Estonian speakers.
For a country of just over 1.3 million people, preserving its language is not only about technology but also about safeguarding national identity. Small languages, like the Baltic and some Nordic ones (Faroese, Sami, and Greenlandic), face significant struggles, ranging from declining numbers of speakers to limited digital resources. Small here refers only to the size of the population that speaks a language, not to its worth or importance. As we’ve reported before, platforms like ChatGPT have struggled to learn small languages, as there is simply not enough data available for the AI to learn from.
Speakers of small (and rare) languages across Northern Europe, aware of this constant challenge, have taken matters into their own hands. Greenland’s Language Secretariat, for example, has been developing a Kalaallisut (Greenlandic) spellchecker since 2005. It uses pre-defined grammatical and morphological rules to recognise and correct words, rather than relying solely on machine learning models that require large datasets.
Similarly, the Sami Language Centre created Sami Voice, an AI speech recognition system for the Northern Sami language that is freely available to everyone and improves digital interactions such as voice assistants and translation systems. In Latvia, the government supported an AI-powered chatbot named Signe, which assists users in Latvian.
Estonia’s decision to share the language data with Meta has faced criticism, with concerns that reliance on large tech companies could lead to the country losing control over its language and culture. Critics argue that, without clear protections, the global corporations might be the primary beneficiaries, rather than Estonia itself.
Experts also agree that AI alone will not solve the challenges facing small languages in the digital age. Both linguists and AI entrepreneurs warn that beyond translation and syntax, AI must capture the unique cultural context of each language. While technology can support the practical use of these languages in digital spaces, true preservation requires integrating cultural nuance and engaging directly with people who speak the languages to maintain linguistic richness and depth.
Emily Mirelle Vutt and Nikola Veisberga
Welcome to The European Correspondent
Europe lacks true European media: in Germany alone, there are more media outlets devoted exclusively to football than news outlets specialising in Europe. The established players mainly focus on Brussels and European institutions. The European Correspondent aims to change that. We cover the whole of Europe and write for a community of citizens who want to look beyond their own national borders. Without European journalism, there is no European civil society.