2nd International Workshop on Computational Approaches to Historical Language Change 2021 (LChange'21)

NEW: the programme is available!

The workshop builds upon its first iteration in 2019, where we received over 50 submissions and had over 65 attendees. Just like the first one, the second LChange workshop will be co-located with ACL (2021) and, depending on the situation, will be a hybrid event with possible online participation. We hope to make our second edition another resounding success!

The call for papers will be similar to last time: all aspects of computational approaches to historical language change, with a focus on digital text corpora. If you have published in the field previously and are interested in helping out on the PC to review papers, send us an email.

We will have two keynote talks. Our confirmed keynote speakers are Alexander Koplenig (Leibniz-Institute for the German Language in Mannheim) with the talk “Two challenges we face when analyzing diachronic corpora”, and Maria Koptjevskaja-Tamm (Stockholm University) and Tatiana Nikitina (LLACAN – “Languages and cultures of Africa”, CNRS), who will present a talk titled “Linguistic diversity as a testing ground for the study of semantic change”.

Important Dates

All dates are Anywhere on Earth and to be understood "at 23:59 on that date".

  • May 3, 2021 (extended from April 26, 2021): Paper submission
  • May 28, 2021: Notification of acceptance
  • June 7, 2021: Camera-ready papers due
  • August 6, 2021: Workshop date

Workshop Topics

Human language changes over time, driven by the dual needs of adapting to ongoing sociocultural and technological development in the world and facilitating efficient communication. In particular, novel words are coined or borrowed from other languages, while obsolete words slide into obscurity. Similarly, words may acquire novel meanings or lose existing meanings. This workshop explores these phenomena by bringing to bear state-of-the-art computational methodologies, theories and digital text resources on exploring the time-varying nature of human language.

Although there exists rich empirical work on language change from historical linguistics, sociolinguistics and cognitive linguistics, computational approaches to the problem of language change, particularly how word forms and meanings evolve, have only begun to take shape over the past decade or so, with exemplary work on semantic change and lexical replacement. The motivation has long been related to search and understanding in diachronic archives. The emergence of long-term and large-scale digital corpora was the prerequisite, and it has given this strand of study a slightly different set of problems than those traditionally studied in historical linguistics. As an example, studies of lexical replacement have largely focused on named entity change (e.g., names of countries and people that change over time) because of the large effect these name changes have on temporal information retrieval.

The aim of this workshop is three-fold. First, within a field that faces many challenges (Hengchen et al., 2021), we want to provide pioneering researchers who work on computational methods, evaluation, and large-scale modelling of language change with an outlet for disseminating cutting-edge research on topics concerning language change. Currently, researchers in this area publish in a wide range of venues, from computational linguistics to cognitive science and digital archiving (Tahmasebi et al., 2021). Building on the previous edition, we want to use this workshop as a platform for sharing state-of-the-art research progress in this fundamental domain of natural language research.

Second, in doing so we want to bring together domain experts across disciplines. We want to connect those who have long worked on language change within historical linguistics and bring with them a deep understanding of general linguistic theories of language change; those who have studied change across languages and language families; those who develop and test computational methods for detecting semantic change and laws of semantic change; and those who need knowledge (of the occurrence and shape) of language change, for example, in digital humanities and computational social sciences where text mining is applied to diachronic corpora subject to lexical semantic change.

Third, the detection and modelling of language change using diachronic text and text mining raise fundamental theoretical and methodological challenges for future research in this area. The representativeness of text is a first critical issue; works using large diachronic corpora and computational methods for detecting change often claim to find changes that are universally true for a language as a whole. But the jury is out on how accurately results derived from digital literature or newspapers represent changes in language as a whole. We hope to engage corpus linguists, big-data scientists, and computational linguists to address these open issues. Besides these goals, this workshop will also support discussion on the evaluation of computational methodologies for uncovering language change. Verifying change using only positive examples of change often confirms a corpus bias rather than reflecting genuine language change. Larger quantities and higher quality of text over time result in the detection of more semantic change. In fact, several laws of semantic change have been proposed recently for which other authors have later shown that the detected effects are linked to frequency rather than to underlying semantic change. The methodological issue of evaluation, together with good evaluation test sets and standards, is of high importance to the research community. We aim to shed some light on these issues and encourage the community to collaborate to find solutions.
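The frequency effect mentioned above can be illustrated with a small simulation. This is a toy sketch under stated assumptions, not a method taken from any of the cited works: a hypothetical word keeps exactly the same context distribution in two time periods (so no semantic change occurs), its contexts follow an assumed Zipf-like distribution over 50 context words, and "change" is measured as the cosine distance between the two count vectors. All function names and parameters are illustrative.

```python
import math
import random


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def sample_context_counts(rng, weights, n_occurrences):
    """Draw n_occurrences context words from a fixed distribution and count them."""
    counts = [0] * len(weights)
    for idx in rng.choices(range(len(weights)), weights=weights, k=n_occurrences):
        counts[idx] += 1
    return counts


def mean_spurious_change(n_occurrences, n_trials=200, n_contexts=50, seed=0):
    """Average measured 'change' for a word whose meaning does not change.

    Both periods are sampled from the SAME context distribution, so any
    non-zero distance is sampling noise, not semantic change.
    """
    rng = random.Random(seed)
    # Assumed Zipf-like context distribution (illustrative only).
    weights = [1.0 / (rank + 1) for rank in range(n_contexts)]
    total = 0.0
    for _ in range(n_trials):
        period_a = sample_context_counts(rng, weights, n_occurrences)
        period_b = sample_context_counts(rng, weights, n_occurrences)
        total += cosine_distance(period_a, period_b)
    return total / n_trials


if __name__ == "__main__":
    # Word frequency per period: rare, medium, frequent.
    for n in (20, 200, 2000):
        print(n, round(mean_spurious_change(n), 3))
```

In this toy setting the measured distance shrinks as the word becomes more frequent, even though its meaning is constant by construction. A real evaluation would need control conditions of this kind alongside positive examples of change.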

The work in semantic change detection [1] has, to a large extent, moved to (neural) embedding techniques in recent years. These methods have several drawbacks: the need for very large datasets to produce stable embeddings, and the fact that all semantic information of a word is encoded in a single vector, thus limiting the possibility to study word senses separately. A move towards multi-sense embeddings will most likely require even more text per time unit, which will limit the applicability of these methods to English and a few other languages. We want to bring about a discussion on the need for methods that can discriminate and disambiguate among a word's senses (meanings) and that can be used for resource-poor languages with little hope of acquiring the order of magnitude of words needed for creating stable embeddings, possibly using dynamic embeddings that seem to require less text. Finally, knowledge of language change is useful not only on its own, but as a basis for other diachronic textual investigations and in search.

A digital humanities investigation into the living conditions of young women through history cannot rely on the word girl in English, as in the past the reference of girl also included young men. Automatic detection of language change is useful for many researchers outside of the communities that study the changes themselves and develop methods for their detection. By reaching out to these other communities, we can better understand how to utilize the results for further research and for presenting them to the interested public. In addition, we need good user interfaces and systems for exploring language changes in corpora, for example, to allow for serendipitous discovery of interesting phenomena. In addition to facilitating research on texts, information about language changes is used for measuring document similarity across time, information retrieval from long-term document archives, the design of OCR algorithms and so on.

We invite original research papers from a wide range of topics, including but not limited to:

  • Novel methods for detecting diachronic semantic change and lexical replacement
  • Automatic discovery and quantitative evaluation of laws of language change
  • Computational theories and generative models of language change
  • Sense-aware (semantic) change analysis
  • Diachronic word sense disambiguation
  • Novel methods for diachronic analysis of low-resource languages
  • Novel methods for diachronic linguistic data visualization
  • Novel applications and implications of language change detection
  • Quantification of sociocultural influences on language change
  • Cross-linguistic, phylogenetic, and developmental approaches to language change
  • Novel datasets for cross-linguistic and diachronic analyses of language

Keynote Talks

Confirmed Speakers:
Alexander Koplenig (Leibniz-Institute for the German Language in Mannheim)
Title of talk: Two challenges we face when analyzing diachronic corpora
Abstract: In my keynote, I want to discuss two important challenges for the quantitative analysis of diachronic corpora that I believe deserve more attention:
  • The first challenge is the systematic influence of the sample size when it comes to basically all measures in quantitative linguistics (Baayen 2001). By analysing the lexical dynamics of the German weekly news magazine “Der Spiegel” (consisting of approximately 365,000 articles and 237,000,000 words that were published between 1947 and 2017), I show that this influence makes it difficult to quantify lexical dynamics and language change. I will also demonstrate that standard sampling approaches do not solve this problem. I will suggest an approach that is able to break the sample size dependence but presupposes access to the full text data (Koplenig, Wolfer & Müller-Spitzer 2019).
  • The second challenge is of a methodological nature and relates to the problem of representativeness of diachronic corpora. Labov (1994) famously stated that “historical documents survive by chance, not by design, and the selection that is available is the product of an unpredictable series of historical accidents.” By using both Google Books Ngram data (Michel et al. 2010; Koplenig 2015; Pechenick, Danforth & Dodds 2015) and publicly available data from the German National Bibliography, I will try to show that the problem is even more fundamental, because there is good reason to believe that the composition of the body of published written works (from which a corresponding corpus is supposed to be sampled) systematically changes as a function of time. This makes it difficult to disentangle actual language change from environmental changes in the textual habitat (Szmrecsanyi 2016).



Maria Koptjevskaja-Tamm (Stockholm University) and Tatiana Nikitina (LLACAN – “Languages and cultures of Africa”, CNRS)

Title of talk: Linguistic diversity as a testing ground for the study of semantic change

Abstract: There are between 6000 and 8000 languages currently spoken in the world. The majority of those still lack decent descriptions, not to mention any written tradition and sizeable documents to rely on while trying to trace semantic changes they have undergone in the past and understanding the mechanisms behind them. Understandably, but likewise regrettably, most of the theoretical thinking in linguistics and adjacent disciplines has been formed by research on a few very big languages with a long written tradition, and the same has to a large extent been carried over to computational approaches, including work on semantic change. In our talk we will focus on two big issues which we believe deserve more awareness and attention among researchers involved in computational approaches to historical language change:

  • A crucial part in any theoretical work consists of formulating hypotheses, generalizations, laws etc. and explaining them, and work on semantic change is, of course, no exception. Linguistic diversity does not imply that any such generalizations are meaningless or premature before these have been studied for all the world’s languages. It does imply, though, that such generalizations gain a lot from careful systematic cross-linguistic research that may unveil cross-linguistic regularities behind diversity – which is foundational for linguistic typology. Here we will discuss several cases whereby such research has questioned earlier generalizations on semantic change based on the familiar languages and/or has come up with new hypotheses.
  • But given that the majority of the world’s languages lack any written tradition and sizeable historical documents, how is it possible to study semantic changes they have undergone in the past? This is indeed a big challenge, but not an insurmountable one. We will discuss several methods which often combine a careful intragenetic comparison (i.e., comparison of closely related languages) and a broader cross-linguistic perspective and some of the results obtained by their application.


Submissions

We accept two types of submissions: long papers and short papers, following the ACL-IJCNLP 2021 style, and the ACL submission policy. Note that the valid dates for submissions to this workshop are those available on this webpage, and not those on the ACL-IJCNLP one.

Long papers may consist of up to eight (8) pages of content, plus unlimited references. Short papers may consist of up to four (4) pages of content, plus unlimited references. Final versions will be given one additional page of content so that reviewers' comments can be taken into account.

Submissions should be sent in electronic form, using the Softconf START conference management system. The submission site is now available.

The workshop is planned to last a full day. Submissions are open to all, and are to be submitted anonymously. All papers will be refereed through a double-blind peer review process by at least three reviewers with final acceptance decisions made by the workshop organizers.

Contact

Contact us if you have any questions.

Organisers: Nina Tahmasebi, Adam Jatowt, Yang Xu, Simon Hengchen, Syrielle Montariol, and Haim Dubossarsky.

Programme

The workshop will take place online on August 6, 2021. All times in the programme are in CEST (Central European Summer Time).

Start–End | Title | Author(s) | LChange'21/Findings | Link(s)
9:00–9:15 | Introduction | Workshop organisers
SESSION 1 Chair: Dominik Schlechtweg
9:15–9:40 | Time-Aware Ancient Chinese Text Translation and Inference | Ernie Chang, Yow-Ting Shiue, Hui-Syuan Yeh and Vera Demberg | LChange'21 | Slides, Paper
9:40–10:05 | Three-part diachronic semantic change dataset for Russian | Andrey Kutuzov and Lidia Pivovarova | LChange'21 | Slides, Paper
10:05–10:30 | The Corpora They Are a-Changing: a Case Study in Italian Newspapers | Pierpaolo Basile, Annalina Caputo, Tommaso Caselli, Pierluigi Cassotti and Rossella Varvara | LChange'21 | Slides, Paper
10:30–11:05 | BREAK
SESSION 2 Chair: Andrey Kutuzov
11:05–11:30 | Linguistic change and historical periodization of Old Literary Finnish | Niko Partanen, Khalid Alnajjar, Mika Hämäläinen and Jack Rueter | LChange'21 | Slides, Paper
11:30–12:00 | Studying the Evolution of Scientific Topics and their Relationships | Ana Sabina Uban, Cornelia Caragea and Liviu P. Dinu | Findings | Slides, Paper
12:00–12:30 | When Time Makes Sense: A Historically-Aware Approach to Targeted Sense Disambiguation | Kaspar Beelen, Federico Nanni, Mariona Coll Arduany, Kasra Hosseini, Giorgia Tolfo and Barbara McGillivray | Findings | Slides, Paper
12:30–13:30 | LUNCH / BREAK
KEYNOTE 1, Moderator: Barbara McGillivray
13:30–14:30 | Linguistic diversity as a testing ground for the study of semantic change | Maria Koptjevskaja-Tamm and Tatiana Nikitina
14:30–16:00 | Online poster session | See list below
KEYNOTE 2, Moderator: Syrielle Montariol
16:00–17:00 | Two challenges we face when analyzing diachronic corpora | Alexander Koplenig | | Slides


Poster presentations:

    LChange'21 papers
  • A diachronic evaluation of gender asymmetry in euphemism – Anna Kapron-King and Yang Xu – Poster, Paper
  • The GLAUx corpus: methodological issues in designing a long-term, diverse, multi-layered corpus of Ancient Greek – Alek Keersmaekers – Poster, Paper
  • Bhāṣācitra: Visualising the dialect geography of South Asia – Aryaman Arora, Adam Farris, Gopalakrishnan R and Samopriya Basu – Poster, Paper
  • Modeling the Evolution of Word Senses with Force-Directed Layouts of Co-occurrence Networks – Tim Reke, Robert Schwanhold and Ralf Krestel – Poster, Paper
  • Tracking Semantic Change in Cognate Sets for English and Romance Languages – Ana Sabina Uban, Alina Maria Cristea, Anca Dinu, Liviu P. Dinu, Simona Georgescu and Laurentiu Zoicas – Poster, Paper
    Findings papers
  • Can Cognate Prediction Be Modelled as a Low-Resource Machine Translation Task? – Clémentine Fourrier, Rachel Bawden, Benoît Sagot – Poster, Paper
  • Event Extraction from Historical Texts: A New Dataset for Black Rebellions – Viet Dac Lai, Minh Van Nguyen, Heidi Kaufman and Thien Huu Nguyen – Poster, Paper
  • Sequence Models for Computational Etymology of Borrowings – Winston Wu, Kevin Duh, David Yarowsky – Poster, Paper
  • A Formidable Ability: Detecting Adjectival Extremeness with DSMs – Farhan Samir, Barend Beekhuizen and Suzanne Stevenson – Poster, Paper

Sponsors

We gratefully acknowledge the contribution of iguanodon.ai as silver sponsor.

Student sponsorship:

Thanks to iguanodon.ai, we are able to sponsor the registration fees for the ACL-IJCNLP 2021 conference (online), including the yearly ACL membership fee, for two students. This year, any type of registration will grant access to the whole event and not solely to our workshop. We therefore would like to invite interested students to apply by email, by July 6th 23:59 CEST, to syrielle.montariol@inria.fr with the following information:

  • Short CV
  • 500-word abstract about current research
  • Whether it would be your first xACL event
  • Whether you have an accepted paper at ACL-IJCNLP 2021
  • Confirmation of your “student” status

We particularly encourage sponsorship applications from diverse backgrounds and underrepresented groups in the community.

Anti-Harassment Policy

Our workshop highly values the open exchange of ideas, the freedom of thought and expression, and respectful scientific debate. We support and uphold the ACL Anti-Harassment policy, and any workshop participant should feel free to contact any of the workshop organisers or Priscilla Rasmussen in case of any issues.

References:

  • Simon Hengchen, Nina Tahmasebi, Dominik Schlechtweg, Haim Dubossarsky. Challenges for Computational Lexical Semantic Change. To appear in: Nina Tahmasebi, Lars Borin, Adam Jatowt, Yang Xu, Simon Hengchen (eds). Computational Approaches to Semantic Change. Berlin: Language Science Press.
  • Nina Tahmasebi, Adam Jatowt, Lars Borin. Survey of Computational Approaches to Lexical Semantic Change Detection. To appear in: Nina Tahmasebi, Lars Borin, Adam Jatowt, Yang Xu, Simon Hengchen (eds). Computational Approaches to Semantic Change. Berlin: Language Science Press.
  • Baayen, R. Harald. 2001. Word Frequency Distributions. Dordrecht: Kluwer Academic Publishers.
  • Koplenig, Alexander. 2015. The Impact of Lacking Metadata for the Measurement of Cultural and Linguistic Change Using the Google Ngram Data Sets—Reconstructing the Composition of the German Corpus in Times of WWII. Digital Scholarship in the Humanities fqv037. https://doi.org/10.1093/llc/fqv037.
  • Koplenig, Alexander, Sascha Wolfer & Carolin Müller-Spitzer. 2019. Studying Lexical Dynamics and Language Change via Generalized Entropies: The Problem of Sample Size. Entropy 21(5). https://doi.org/10.3390/e21050464. http://www.mdpi.com/1099-4300/21/5/464.
  • Labov, William. 1994. Principles of linguistic change (Language in Society 20). Oxford, UK ; Cambridge [Mass.]: Blackwell.
  • Michel, Jean-Baptiste, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K. Gray, The Google Books Team, Joseph P. Pickett, et al. 2010. Quantitative Analysis of Culture Using Millions of Digitized Books (Supporting Online Material II). Science 331(6014). http://www.sciencemag.org/content/331/6014/176/suppl/DC1 (5 March, 2014).
  • Pechenick, Eitan Adam, Christopher M. Danforth & Peter Sheridan Dodds. 2015. Characterizing the Google Books Corpus: Strong Limits to Inferences of Socio-Cultural and Linguistic Evolution. (Ed.) Alain Barrat. PLOS ONE 10(10). e0137041. https://doi.org/10.1371/journal.pone.0137041.
  • Szmrecsanyi, Benedikt. 2016. About text frequencies in historical linguistics: Disentangling environmental and grammatical change. Corpus Linguistics and Linguistic Theory 12(1). 153–171. https://doi.org/10.1515/cllt-2015-0068.

[1] Often, work from the computational community takes a wider view of semantic change than traditional historical linguistics, for example, by including novel words and senses as well as changes to the senses themselves.
