SP3: Annotation and Clustering
To understand the nature of semantic change as it appears in large-scale empirical data, we will use human annotation and sense clustering. Annotated data are crucial for verifying that computational models solve the problem they are intended to solve. We will study how much annotation is needed to reliably reveal the semantics of a word from a corpus sample, across different domains and time spans. The results will be robust sampling and annotation strategies that reduce the manual effort needed while maintaining high annotation quality. We will also investigate the possibility of combining manual annotation with synthetic or simulated evaluation sets. The outcome of this SP will be fundamental insights into lexical semantic change.
SP3 will be crucial for ensuring that we can create ground truth data together with our application partners within the time frames allocated to the application projects in the outer circle. The work will be done in close collaboration with SP2 on sense differentiation and with the lexicography application project.
A particular focus of this project will be on lexicography. A continuous aim for lexicographers is to identify and record changes to the vocabulary of a language. The lexicographic identification process relies heavily on manual effort, so only a small number of high-frequency candidates are inspected, on the basis of nonrandomized, manually drawn samples. As a result, decisions on change of meaning tend to be more subjective than objective. To overcome these issues, we will combine our annotation system with computational methods for lexical semantic change detection to automate the lexicographic process as far as possible. We will predict candidate words on a regular basis from recent language samples, for example, corpora continuously collected by Språkbanken Text and the National Library of Sweden, and present our lexicographers with the words most likely to have undergone change.
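The candidate prediction step described above could be sketched as follows. This is only an illustration under stated assumptions, not the method the project commits to: it assumes each word already has one vector per time period in a comparable (aligned) space, and it uses cosine distance between the two vectors as a stand-in for the likelihood of change. The function names, the `top_k` parameter, and the toy vectors are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # No usable representation in one period: treat as "no evidence of change".
        return 0.0
    return 1.0 - dot / (norm_u * norm_v)


def rank_candidates(earlier, recent, top_k=10):
    """Rank words found in both periods by their change score, highest first.

    `earlier` and `recent` map a word to its vector in the respective period.
    Assumption: the two vector spaces are already aligned and comparable.
    """
    shared = earlier.keys() & recent.keys()
    scored = [(w, cosine_distance(earlier[w], recent[w])) for w in shared]
    # Sort by descending score; break ties alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Toy, made-up vectors for illustration only (not real corpus data).
earlier = {"mus": [1.0, 0.0], "bord": [0.0, 1.0], "moln": [1.0, 1.0]}
recent = {"mus": [0.0, 1.0], "bord": [0.0, 1.0], "moln": [1.0, 0.5]}

shortlist = rank_candidates(earlier, recent, top_k=2)
for word, score in shortlist:
    print(f"{word}\t{score:.3f}")
```

In such a setup, the shortlist would be what is handed to the lexicographers for annotation, and their judgments could in turn serve as ground truth for evaluating the scoring function.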