Latvia ・ Technology

Linguistic diversity in the age of AI

12 July 2024

Why is it difficult for AI language tools to learn Latvian? ChatGPT explains: "There is not enough data and text in Latvian. The Latvian language has a complex grammar and language structure – nine conjugations, different genders, and figures – with multiple dialects and regional differences. It is one of the less common languages in the world making it not a priority for AI models."

Most large language models (LLMs), such as ChatGPT, are trained predominantly on English data sets, simply because English is the lingua franca of the Internet. Several other major, mostly European, languages such as German, French, and Spanish are also well represented. Most other languages are not, which means the worldviews and cultures they carry are less represented as well.

Now, Tilde, a Latvian language technology company, has just won the European Commission's Large AI Grand Challenge, which will allow it to develop a foundational LLM for European languages. The model will focus particularly on underrepresented European languages, especially Eastern European and Baltic languages. These languages are "poorly covered in the current models", the company stated following its win. It also claims that its model will improve AI applications for more than 155 million Europeans.

