Unsupervised Lexical Semantic Change Detection – Past, Present & Future


In this talk, I will introduce lexical semantic change detection from a computational perspective. The motivations for studying lexical semantic change often differ between data science or NLP and more traditional fields such as historical linguistics. This difference has implications for methods, data, and evaluation techniques. Because the field has concentrated on historical English, with its relatively high-quality and large volumes of available digital text, it tends to favor solutions that are not always suitable for other languages. In addition, the complexity of the problem, and of evaluating it over long time spans, has led to ad hoc and non-comparable evaluations. I will discuss our current efforts, and the huge challenges involved, in creating a large annotated test set for SemEval-2020. By creating ground truth for German, Swedish, and Latin in addition to English, we aim to bring the community closer to standardized, comparable testing of methods across languages with different features and digital text quality. In its short history as a subfield of NLP, lexical semantic change detection has seen methods ranging from count-based vectors, through cluster-based and topic-model-based methods, to predictive vectors created from neural embedding spaces and, most recently, contextualized vectors. I will present the great advances made so far and discuss current and future challenges.

Language Change: Theoretical and Empirical Perspectives Conference
Jerusalem, Israel