2nd Workshop on Computational Detection of Language Change 2020

Simon Hengchen, Nina Tahmasebi, Aleksandrs (Sasha) Berdicevskis, Yvonne Adesam

Jun 16, 2020

The workshop will be co-located with SLTC 2020 in Gothenburg, on the 25th of November 2020.

Registration is open! The deadline for registration is November 17.

Programme (all times CET):

12:55: Participants are invited to log in to the Zoom meeting
13:00: Welcome
13:05 - 14:00: Keynote 1: Sparse Usage Graphs as Model for Word Meaning in Context, Dominik Schlechtweg, IMS Stuttgart
14:00 - 14:15: Conditional language models for linguistic variation and change -- Bill Noble SHORT PAPER
14:15 - 14:30: Dynamic Probabilistic Word Embeddings with Laplacian Priors -- Väinö Yrjänäinen and Måns Magnusson SHORT PAPER
14:30 - 15:00: COFFEE BREAK
15:00 - 15:25: Methodological Concerns in Modeling Language Change for Medieval Texts -- Hai Hu, Patrícia Amaral and Sandra Kübler LONG PAPER
15:25 - 15:50: Interpretable Word Embeddings via Informative Prior -- Miriam Hurtado Bodell LONG PAPER
15:50 - 16:10: BREAK
16:10 - 17:00: Keynote 2: Lexical Semantics and Semantic Change - a Report from the Lexicographer's Shop Floor, Stellan Petersson & Emma Sköldberg, University of Gothenburg

Our world changes, and with it we change our language. We learn new words, add new meanings to existing words and change meanings to better describe our time and culture. We forget fast, and when we look back, for example at old newspaper material, these lexical and semantic changes make it difficult to understand what was said. They also hinder us when we want to apply text mining to historical corpora, for example to track sentiment over time.

Automatic detection of language change has received increasing attention over the past decade. As digital corpora have grown in size and time span, and as new and powerful embedding technologies have emerged, methods for detecting, for example, word sense change have become very popular.

Despite these initial efforts, we lack computational tools for studying lexical and semantic change at a large scale. Current methods are limited in what they can find, and the methods for creating (neural) word embeddings that are state of the art in, for example, sense change detection require sufficiently large datasets. For (historical) Swedish and most other languages, the situation is different: the available data is fairly small and has a high error rate.

This workshop aims to bring together a community of researchers in Sweden who focus on different aspects of language change detection, from a qualitative, manual perspective as well as a quantitative, automatic one. We believe that such a workshop is needed for Sweden to be at the forefront of this research, in particular since few others will aim to find solutions particular to Swedish.

We invite presentations but no full papers, to encourage participants from a wide range of fields. We intend a low-key workshop that will start with a keynote and continue with each participant getting a chance to present themselves and their work, to find possible collaborations (preferably across topics and fields) and to make better use of existing efforts and datasets. We hope to bring together technology providers, data providers, and users such as researchers with an interest in historical texts but with expertise outside of language technology (such as digital humanities, historical linguistics, history of ideas, sociology, history etc.). After a coffee break and the presentations, we will have one more keynote and continue with some discussions and a planning session for collaboration and a further workshop.

There will be no published proceedings and the workshop is planned for around half a day. Please contact nina . tahmasebi at gu . se or simon . hengchen at gu . se to propose a presentation, with a preliminary title, a short description (a maximum of one written page), and whether you would prefer a long or a short presentation. We are planning for presentations of 15-20 minutes for finished work and 5-10 minutes for ongoing work, but this can change depending on the number of presentations. Examples of welcome talks include:

You are a linguist, anthropologist, historian etc. who is interested in how a certain word or a certain concept developed and changed in the Swedish language or culture over time. You have done corpus studies, but mainly manually, and wish to explore whether there are computational methods or collaborations that could add more value to your research.
You are a quantitative researcher with an interesting method that finds patterns in e.g. time series data, and wonder if there are good data sets and research questions to use this on.
You are a linguist with a theory about how semantic change proceeds or how new words are added to a language, looking for new methods to falsify your hypotheses.

Confirmed Speakers: Dominik Schlechtweg, IMS Stuttgart.

Title and abstract:
Sparse Usage Graphs as Model for Word Meaning in Context
Usage Graphs represent the uses of a word as nodes in a graph. Edge weights represent semantic proximity between uses, according to which they can be grouped into use clusters representing the same sense. This gives access to the number of senses a word expresses at any point of time (polysemy) and to changes in these senses over time (lexical semantic change). While reliable usage graphs can be obtained from human semantic proximity judgments, this puts strong constraints on the number of edges that can be annotated. Hence, we need sampling techniques presenting a limited number of edges to annotators, while guaranteeing that the correct clustering can be obtained from the resulting sparse graphs. I will present several recent studies on usage graphs and how they can be obtained reliably and efficiently. I will show that they are a useful, simple and well-defined model for word meaning in context, opening up a completely new set of problems in lexical semantics.
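
As a rough illustration of the usage graph idea described in the abstract, the following minimal sketch represents word uses as nodes, proximity judgments as weighted edges, and groups uses into clusters. Everything here is an assumption made for illustration: the function names, the 1-4 judgment scale, the threshold of 2.5 and the toy data are hypothetical, and connected components over high-weight edges stand in for whatever clustering method the talk actually presents.

```python
from collections import defaultdict


def cluster_uses(uses, edges, threshold=2.5):
    """Group uses into clusters: connected components over edges
    whose weight is at least `threshold` (a simplifying assumption)."""
    adjacency = defaultdict(set)
    for (a, b), weight in edges.items():
        if weight >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    clusters, seen = [], set()
    for use in uses:
        if use in seen:
            continue
        # Depth-first search collects every use reachable from `use`.
        stack, component = [use], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


def sense_counts_by_period(clusters, period_of):
    """Count how many distinct clusters (senses) are attested per period."""
    senses = defaultdict(set)
    for index, cluster in enumerate(clusters):
        for use in cluster:
            senses[period_of[use]].add(index)
    return {period: len(ids) for period, ids in senses.items()}


# Hypothetical toy data: five uses of one word, with sparse judgments
# on an assumed scale from 1 (unrelated) to 4 (identical meaning).
uses = ["u1", "u2", "u3", "u4", "u5"]
edges = {("u1", "u2"): 4, ("u2", "u3"): 3, ("u3", "u4"): 1, ("u4", "u5"): 4}
period_of = {"u1": "old", "u2": "old", "u3": "new", "u4": "new", "u5": "new"}

clusters = cluster_uses(uses, edges)
counts = sense_counts_by_period(clusters, period_of)
```

In this toy example only four of the ten possible edges carry a judgment, which mirrors the sparsity constraint mentioned in the abstract; a cluster that appears only in the later period would be read as a candidate new sense.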

Confirmed Speakers: Stellan Petersson & Emma Sköldberg, University of Gothenburg

Title and abstract:
Lexical Semantics and Semantic Change – a Report from the Lexicographer’s Shop Floor
The results from computational studies are relevant to practical dictionary editing. In the ongoing work on the second edition of the dictionary Svensk ordbok utgiven av Svenska Akademien (“The Contemporary Dictionary of the Swedish Academy”), semantic changes on the lexical level are important. But the editorial group, of which we are members, currently lacks formal methods for discovering them. The purpose of this talk is to share some of the research questions and problems dealt with in everyday, hands-on lexicographic work, in order to initiate a dialogue between lexicography and language technology. A first question is: what is lexical semantic change? A second question: are modifications in emotive meaning semantic changes? Third: context dependence of meaning is expected from a synchronic perspective. Finally: if a change is noticed, what kind is it? These issues will be discussed in the talk.

The workshop is organized by Språkbanken. Organizers: Simon Hengchen, Nina Tahmasebi, Aleksandrs (Sasha) Berdicevskis, Yvonne Adesam
