EMNLP 2023

Aleksandrs Berdicevskis, Gerlof Bouma, Robin Kurtz, Felix Morger, Joey Öhman, Yvonne Adesam, Lars Borin, Dana Dannélls, Markus Forsberg, Tim Isbister, Anna Lindahl, Martin Malmsten, Faton Rekathati, Magnus Sahlgren, Elena Volodina, Love Börjeson, Simon Hengchen, Nina Tahmasebi

Dec 6, 2023

Abstract

We present Superlim, a multi-task NLP benchmark and analysis platform for evaluating Swedish language models, a counterpart to the English-language (Super)GLUE suite. We describe the dataset, the tasks, and the leaderboard, and report the baseline results yielded by a reference implementation. The tested models do not approach ceiling performance on any of the tasks, which suggests that Superlim is truly difficult, a desirable quality for a benchmark. We address methodological challenges, such as mitigating the Anglocentric bias when creating datasets for a less-resourced language, choosing the most appropriate measures, documenting the datasets, and making the leaderboard convenient and transparent. We also highlight other potential uses of the dataset, such as the evaluation of cross-lingual transfer learning.

Type

Wp1: Word Sense Induction; Wp4: Application

Publication

Superlim: A Swedish Language Understanding Evaluation Benchmark

Date

December 2023
