Project Description

Background

Language changes over time, often through processes that span long periods. However, recent events such as the current situation around Covid-19 have underlined that the cultural aspects of words and their meanings can also change radically over short periods of time: isolation today carries a stronger sense of hopelessness and an extremely negative connotation. Vaccine, which previously had both positive and negative connotations, today carries with it a sense of hope: once the vaccine is in place, life will go back to normal. Our linguistic resources and our cultural existence are intertwined and must be studied as a single whole. To understand our contemporary and historical societies, we must understand the language used to describe them.

Researchers in the text-based humanities and social sciences regularly face hurdles caused by semantic change and linguistic variation (words acquire subtle new meanings, or are replaced by other, more prominent words). Despite technological breakthroughs that alleviate these problems, researchers are still left to handle such changes on their own, using resources such as dictionaries, which are slow to update and cover little of our language and its actual use. They risk missing important textual clues and are limited to small-scale manual analysis. In addition, many humanities and social science researchers are interested in changing phenomena as portrayed in language.

We acknowledge that textual resources do not represent all parts of society, and that the socio-economically weak are significantly less represented. Nevertheless, these resources reach an impressive and unprecedented part of our society. By opening up modern and historical Swedish textual resources, social media included, we have enormous possibilities to study our world with reasonable effort in data collection and minimal interference with the objects we study.

Program description

The program will run over six years (2022–2027) and is funded by Riksbankens Jubileumsfond with a total of 33.5 million SEK. It is built around a core language technology research team. In addition, we have researchers from (historical) linguistics, lexicography, analytical sociology, gender studies and literary studies. The program comprises five subprojects: three are core language technology and NLP (SP1, SP2, SP3), one reevaluates existing change hypotheses proposed by historical linguists, and the final one consists of four humanities and social science projects, including lexicography.

We will identify and eliminate linguistic barriers caused by language change to open up our textual accounts of the world to researchers from a wide range of fields: sociology, cultural studies, history, literature, journalism and religion. We will also apply our change detection methodology directly to answer different humanities and social science (HSS) research questions in our application projects.

Historical Linguistics: There are many open questions in the burgeoning field of quantitative semantics that we cannot currently answer with existing computational methods: How do lexical change and semantic change interact? Why do different parts of the vocabulary change at different speeds? How does change spread throughout the lexicon? High-quality case studies of change often produce hypotheses, and we will provide tools to test and quantify these hypotheses using large-scale methods developed within this program. Our corpus-based studies will feed insights into our models, thus improving both the modeling and the theory of meaning, senses, and language change.

Lexicography: Using computational methods, we will advance our understanding of the semantic structures underlying textual data. We will integrate recent advances in computational linguistics into the lexicographic process, transforming it from manual lexicography into a semi-automatic, empirically based workflow. This work will be done together with the lexicographic group at the University of Gothenburg that develops the dictionary Svensk ordbok utgiven av Svenska Akademien (“The Contemporary Dictionary of the Swedish Academy”), and will directly improve their workflow.

Advancing Natural Language Processing and Machine Learning: We will extend the state of the art in lexical semantic change with respect to both theoretical and methodological aspects. In addition, we will adapt our methods to the needs of HSS. A large focus will be on synchronic variation as a complement to diachronic change, building on our work on sense-aware models and, in part, on comparisons across contemporary corpora. We will advance the state of the art in NLP in several ways:

  • Extend semantic modeling beyond the simplistic “one vector per word” or “one vector per sentence” to contextually and culturally defined concepts needed for HSS;
  • Develop sense-aware models that can model all senses of a word, so that we can answer what happened, how the changes relate to what we already know, and when the change took place. We will apply change detection to Swedish, both to individual concepts and to the interplay within a semantic field, setting the state of the art for Swedish and significantly furthering the research field internationally;
  • Adapt methods for diachronic change to the synchronic variation needed in many different humanities and social science studies, including the study of radicalisation, cultural transformation and cultural differences across social groups.

We will have four HSS projects within the program. Each collaboration partner brings research questions and data, and we collaborate on methods that help incorporate expert knowledge and provide answers. These application projects give us the chance to conduct high-quality research that benefits both parties and to leave behind tools and methodology that are useful beyond the scope of the program.

We believe that putting a group of NLP experts in close collaboration with HSS experts, within the field of semantic change, has three benefits. First, it will lead to research results that are only achievable through collaboration across fields. Second, method development is radically improved by close collaboration with the fields in which the methods are needed. Third, our research results and methodology are disseminated to the relevant fields, where they help set the state of the art in research results and, perhaps more importantly, advance research methodology.

Our HSS projects investigate (1) the radicalization of groups (focusing on synchronic variation, because radicalization proceeds rapidly), (2) cultural differences over time (sense-aware diachronic change), (3) how rights, acknowledgement and justice have changed over time in media, legislation, and politics (sense-aware diachronic change), and (4) how the phone, the steamer, and electricity changed society, and how this was reflected in literature (diachronic change and synchronic variation).

The development of methods in NLP is essential to reach our main goal: to integrate our research with, and contribute to, state-of-the-art HSS research. The application projects are at least as important as the theoretical and methodological development, and play a crucial part in driving it. All parties bring cutting-edge research questions that we can answer under the umbrella of this program.