Natalia Moskvina (UAB) ‘Multilingual Variation in Language Comprehension in Large Language Models’
CLT Seminar
Friday, 23 January 2026
Time: 15:30
Room 202
Teams link
Abstract:
With recent advances in Artificial Intelligence (AI) tools, and Large Language Models (LLMs) in particular, a growing number of people are turning to these systems for professional or personal assistance. This increasing influence calls for a thorough evaluation of their abilities in order to ensure safe and ethical use. Previous research has demonstrated that LLMs still struggle to reach human-like performance across a range of linguistic domains (e.g., Qiu et al., 2023; Collacciani et al., 2024; Dentella et al., 2024). However, while models are trained and prompted in a variety of natural languages, how their linguistic abilities vary across those languages remains unexplored, with the majority of studies focusing on English as the assumed dominant language of LLMs (Zhang et al., 2023). Our study explores LLMs’ multilingual performance by evaluating their language comprehension across 12 diverse languages: English, German, Dutch, Spanish, Italian, Catalan, Greek, Russian, Turkish, Arabic, Chinese, and Japanese. The results show significant cross-linguistic variation in the accuracy of the models’ responses; however, in contrast with previous findings, the LLMs did not exhibit superior performance in English, whereas Spanish and Italian scored consistently high for all tested models. Additionally, humans significantly outperformed the models across languages, with the exception of Spanish for GPT-4o and Italian for DeepSeek-V3, where the models demonstrated a comparable level of language comprehension. These findings open a discussion about the reliability of LLMs’ multilingual use as well as the challenges posed by different language systems.
