
Assessing how accurately large language models encode and apply the Common European Framework of Reference for Languages

Published version
Peer-reviewed

Repository DOI


Abstract

Large Language Models (LLMs) can have a transformative effect on a variety of domains, including education. It is therefore pressing to understand whether these models have knowledge of – in other words, how they have encoded – the specific pedagogical requirements of different educational domains, and whether they use this knowledge when performing educational tasks. In this work, we propose an approach to evaluate the knowledge – or encoding – that LLMs have of the Common European Framework of Reference for Languages (CEFR), and use it to evaluate five modern LLMs. Our study shows that the suite of tasks we propose is quite challenging for all the LLMs: they often produce unsatisfactory results that would be unusable in educational applications, suggesting that – even if they encode some information about the CEFR – this knowledge is not really leveraged when performing downstream tasks.

Description

Journal Title

Computers and Education: Artificial Intelligence

Conference Name

Journal ISSN

2666-920X

Volume Title

Publisher

Elsevier

Rights and licensing

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Sponsorship
Cambridge Assessment (unknown)
Cambridge University Press & Assessment