2nd Workshop on Computational Detection of Language Change 2020

The workshop will be co-located with SLTC 2020 in Gothenburg, on the 25th of November 2020.

Due to server crashes at the university, our email has been out of order for the past two weeks. If you have submitted a paper to info@languagechange.org, or are planning to submit a paper, we ask that you resubmit / send it to nina.tahmasebi @ gu.se, or nina @ tahmasebi.se. We look forward to reading your papers.

Important Dates:

  • 1 October 2020: Abstract submission deadline
  • 13 October 2020: Author notification
  • 25 November 2020: Workshop

Our world changes, and with it we change our language. We learn new words, add new meanings to existing words and change meanings to better describe our time and culture. We forget fast, and when we look back, for example at old newspaper material, these lexical and semantic changes make it difficult to understand what has been said. In addition, they hinder us when we want to apply text mining to historical corpora, for example to track sentiments over time.

Automatic detection of language change is a field that has received increasing attention over the past decade. Because digital corpora now contain more data and span longer time periods, and because new and powerful embedding technologies have become available, methods for detecting, e.g., word sense change have become very popular.

Despite these initial efforts, we lack computational tools for studying lexical and semantic changes at a large scale. Current methods are limited in what they can find, and methods for creating the (neural) word embeddings that are state of the art in, e.g., sense change detection require sufficiently large datasets. For (historical) Swedish and most other languages, the situation is different: the available data is fairly small and has a high error rate.

This workshop aims to bring together a community of researchers in Sweden who focus on different aspects of language change detection, from a qualitative, manual perspective as well as a quantitative, automatic one. We believe that such a workshop is needed for Sweden to be at the forefront of this research, in particular since few others will aim to find solutions particular to Swedish.

We invite presentations but no full papers, to encourage participants from a wide range of fields. We intend a low-key workshop that will start with a keynote and continue with each participant getting a chance to present themselves and their work, to find possible collaborations (preferably across topics and fields) and to make better use of existing efforts and datasets. We hope to bring together technology providers, data providers, and users such as researchers with an interest in historical texts but with expertise outside of language technology (such as digital humanities, historical linguistics, history of ideas, sociology, history, etc.). After a coffee break and the presentations, we will have one more keynote and continue with some discussions and a planning session for collaboration and a further workshop.

There will be no published proceedings and the workshop is planned for around half a day. Please contact nina . tahmasebi at gu . se or simon . hengchen at gu . se to propose presentations, with a preliminary title, a short description (maximum of 1 written page), and whether you would prefer a long or a short presentation. We are planning for presentations of 15-20 minutes for finished work and 5-10 minutes for ongoing work, but this can change depending on the number of presentations. Examples of welcomed talks include:

  • You are a linguist, anthropologist, historian etc. who is interested in how a certain word or a certain concept developed and changed in the Swedish language or culture over time. You have done corpus studies, but mainly manually, and wish to explore whether there are computational methods or collaborations that could add more value to your research.
  • You are a quantitative researcher with an interesting method that finds patterns in e.g. time series data, and wonder if there are good data sets and research questions to use this on.
  • You are a linguist with a theory about how semantic change proceeds or how new words are added to a language, looking for new methods to falsify your hypotheses.

Confirmed Speakers: Dominik Schlechtweg, IMS Stuttgart.

Title and abstract:
Sparse Usage Graphs as Model for Word Meaning in Context
Usage Graphs represent the uses of a word as nodes in a graph. Edge weights represent semantic proximity between uses, according to which they can be grouped into use clusters representing the same sense. This gives access to the number of senses a word expresses at any point of time (polysemy) and to changes in these senses over time (lexical semantic change). While reliable usage graphs can be obtained from human semantic proximity judgments, this puts strong constraints on the number of edges that can be annotated. Hence, we need sampling techniques presenting a limited number of edges to annotators, while guaranteeing that the correct clustering can be obtained from the resulting sparse graphs. I will present several recent studies on usage graphs and how they can be obtained reliably and efficiently. I will show that they are a useful, simple and well-defined model for word meaning in context, opening up a completely new set of problems in lexical semantics.
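
The idea in the abstract can be illustrated with a toy sketch. This is not the speaker's method: the names (cluster_uses, senses_per_period), the judgment scale and the threshold of 2.5 are assumptions made for illustration, and the clustering step is deliberately replaced by plain connected components over "close" edges, which is far simpler than what a real study would use. The graph is sparse in the sense that only a few of the possible use pairs carry a judgment.

```python
from collections import defaultdict


def cluster_uses(nodes, edges, threshold=2.5):
    """Group word uses into clusters (hypothetical helper).

    Two uses end up in the same cluster if they are connected by a chain of
    edges whose proximity weight is at least `threshold` (an assumed cutoff).
    Implemented with a small union-find structure.
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the cluster representative, compressing paths.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for u, v, weight in edges:
        if weight >= threshold:
            parent[find(u)] = find(v)

    clusters = defaultdict(set)
    for n in nodes:
        clusters[find(n)].add(n)
    return list(clusters.values())


def senses_per_period(clusters, period_of):
    """Count how many distinct clusters (senses) are attested in each period."""
    seen = defaultdict(set)
    for idx, cluster in enumerate(clusters):
        for use in cluster:
            seen[period_of[use]].add(idx)
    return {period: len(ids) for period, ids in seen.items()}


# Toy data (invented): six uses of one word, three from each time period.
period_of = {"u1": "old", "u2": "old", "u3": "old",
             "u4": "new", "u5": "new", "u6": "new"}

# Sparse set of annotated pairs: (use, use, mean proximity judgment).
edges = [
    ("u1", "u2", 4.0), ("u2", "u3", 3.5), ("u3", "u4", 3.0),
    ("u5", "u6", 4.0), ("u1", "u5", 1.0), ("u4", "u6", 1.5),
]

clusters = cluster_uses(list(period_of), edges)
senses = senses_per_period(clusters, period_of)
print(senses)
```

In this toy graph the old period attests one cluster and the new period attests two, which is the kind of signal (a sense appearing over time) the abstract refers to as lexical semantic change.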

Confirmed Speakers: Stellan Petersson & Emma Sköldberg, University of Gothenburg.

Title and abstract:
Lexical Semantics and Semantic Change - a Report from the Lexicographer's Shop Floor
There are several reports on efforts to automate or semi-automate parts of the process of dictionary compilation, including the building of headword lists and identification of collocations (Cook et al. 2014). Automatic methods for finding linguistic examples have also been developed (e.g. Kilgarriff et al. 2008; Pilán 2016). Furthermore, there are computational linguistic studies that aim to find semantic changes in large text corpora (e.g. Cavallin 2012; Cook et al. 2013, 2014; Nimb et al. 2020). A central aim of these efforts is to make lexicographic work more efficient; another, related aim is to introduce more systematicity into the process of dictionary construction. The results from studies like these are, of course, relevant to practical dictionary editing. In the ongoing work on the second edition of the dictionary Svensk ordbok utgiven av Svenska Akademien ("The Contemporary Dictionary of the Swedish Academy"), semantic changes on the lexical level are important. But the editorial group, of which we are members, currently lacks formal methods for discovering them. The purpose of this talk is to share some of the research questions and problems dealt with in everyday, hands-on lexicographic work, in order to initiate a dialogue between lexicography and language technology.

The workshop is organized by Språkbanken. Organizers: Simon Hengchen, Nina Tahmasebi, Aleksandrs (Sasha) Berdicevskis, Yvonne Adesam
