The workshop will be co-located with SLTC 2020 in Gothenburg, on the 25th of November 2020.
- 1 October 2020: Abstract submission deadline
- 13 October 2020: Author notification
- 25 November 2020: Workshop
Our world changes, and our language changes with it. We learn new words, add new meanings to existing words, and change meanings to better describe our time and culture. We forget quickly, and when we look back, for example at old newspaper material, these lexical and semantic changes make it difficult to understand what was said. They also hinder us when we want to apply text mining to historical corpora, for example to track sentiment over time.
Automatic detection of language change is a field that has received increasing attention over the past decade. As digital corpora contain more data and span longer periods of time, and as new and powerful embedding techniques have become available, methods for detecting, e.g., word sense change have become very popular.
Despite these initial efforts, we still lack computational tools for studying lexical and semantic change at a large scale. Current methods are limited in what they can find, and the methods for creating (neural) word embeddings, which are the state of the art in, e.g., sense change detection, require sufficiently large datasets. For (historical) Swedish and most other languages, however, the situation is different: the available data are fairly small and have a high error rate.
This workshop aims to bring together a community of researchers in Sweden who focus on different aspects of language change detection, from both a qualitative, manual perspective and a quantitative, automatic one. We believe that such a workshop is needed for Sweden to be at the forefront of this research, in particular since few others will aim to find solutions specific to Swedish.
We invite presentations rather than full papers, to encourage participation from a wide range of fields. We intend this to be a low-key workshop: it will start with a keynote and continue with each participant getting a chance to present themselves and their work, with the aim of finding possible collaborations (preferably across topics and fields) and making better use of existing efforts and datasets. We hope to bring together technology providers, data providers, and users, such as researchers with an interest in historical texts but with expertise outside language technology (e.g., digital humanities, historical linguistics, history of ideas, sociology, and history). After a coffee break and the presentations, we will have a second keynote, followed by discussions and a planning session for collaborations and a follow-up workshop.
There will be no published proceedings, and the workshop is planned to last around half a day. To propose a presentation, please contact nina . tahmasebi at gu . se or simon . hengchen at gu . se with a preliminary title, a short description (at most one written page), and whether you would prefer a long or a short presentation. We are planning for presentations of 15-20 minutes for finished work and 5-10 minutes for ongoing work, but this may change depending on the number of presentations. Examples of talks we welcome include:
- You are a linguist, anthropologist, historian, etc., who is interested in how a certain word or concept developed and changed in the Swedish language or culture over time. You have done corpus studies, mainly manual ones, and wish to explore whether computational methods or collaborations could add value to your research.
- You are a quantitative researcher with an interesting method that finds patterns in, e.g., time series data, and you wonder whether there are good datasets and research questions to apply it to.
- You are a linguist with a theory about how semantic change proceeds or how new words are added to a language, and you are looking for new methods to falsify your hypotheses.
Confirmed Speakers: Dominik Schlechtweg, IMS Stuttgart.
Title and abstract:
Sparse Usage Graphs as Model for Word Meaning in Context
Usage graphs represent the uses of a word as nodes in a graph. Edge weights represent the semantic proximity between uses; based on these weights, uses can be grouped into clusters that each represent one sense. This gives access to the number of senses a word expresses at any point in time (polysemy) and to changes in these senses over time (lexical semantic change). Reliable usage graphs can be obtained from human judgments of semantic proximity, but this puts strong constraints on the number of edges that can be annotated. Hence, we need sampling techniques that present only a limited number of edges to annotators while guaranteeing that the correct clustering can still be obtained from the resulting sparse graphs. I will present several recent studies on usage graphs and how they can be obtained reliably and efficiently. I will show that they are a useful, simple and well-defined model for word meaning in context, opening up a completely new set of problems in lexical semantics.
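To make the usage-graph idea in the abstract above more concrete, here is a toy sketch in Python. It is not the method presented in the talk: the uses, their time periods, the 1-4 proximity scale and the threshold of 3 are all hypothetical, and uses are grouped with a simple stand-in (connected components over edges at or above the threshold) rather than a dedicated clustering algorithm.

```python
from collections import Counter

# Hypothetical toy data: each use of a word is a node, labelled with its time period.
USES = {
    "u1": "1800s", "u2": "1800s", "u3": "1800s",
    "u4": "2000s", "u5": "2000s", "u6": "2000s",
}

# Hypothetical annotated edges: semantic proximity between two uses on an assumed
# 1-4 scale (4 = same meaning). Only some pairs are annotated, so the graph is sparse.
EDGES = {
    ("u1", "u2"): 4, ("u2", "u3"): 4, ("u4", "u5"): 4,
    ("u3", "u4"): 1, ("u5", "u6"): 1, ("u1", "u6"): 1,
}


def cluster_uses(nodes, weighted_edges, threshold=3):
    """Stand-in clustering: connected components over edges with weight >= threshold."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the cluster representative (with path halving).
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), weight in weighted_edges.items():
        if weight >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters
    return {n: find(n) for n in nodes}


def senses_per_period(nodes, clusters):
    """For each period, count how many uses fall into each cluster (sense)."""
    counts = {}
    for node, period in nodes.items():
        counts.setdefault(period, Counter())[clusters[node]] += 1
    return counts


if __name__ == "__main__":
    clusters = cluster_uses(USES, EDGES)
    for period, counter in sorted(senses_per_period(USES, clusters).items()):
        # Number of distinct clusters in a period = number of senses (polysemy).
        print(period, "senses:", len(counter), "cluster sizes:", sorted(counter.values()))
```

Running it prints `1800s senses: 1 cluster sizes: [3]` and `2000s senses: 2 cluster sizes: [1, 2]`: the single 1800s sense is no longer attested, and two different senses appear in the 2000s. Because only some pairs are annotated, a missing edge can leave a use isolated in its own cluster (as `u6` is here), which illustrates why sampling strategies for sparse graphs matter.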
The workshop is organized by Språkbanken. Organizers: Simon Hengchen, Nina Tahmasebi, Aleksandrs (Sasha) Berdicevskis, Yvonne Adesam.