SP1: Model Evaluation
We will start by performing a first large-scale evaluation of existing methods. Unique to this program is that we will focus on different kinds of data (both historical and modern) and on different research questions (stemming from historical linguistics, lexicography, social sciences, gender studies, and literature studies). We will evaluate for appropriateness, level of detail, and correctness. This will provide us with a map of our current methods, their capacities and limitations, and will point to areas where new methods are needed.
Most LSC detection methods have been evaluated only in a narrow setting, i.e., comparing two time periods on small test data annotated for lexicographic word sense distinctions. To make the methods applicable to external fields of research, such as cultural or sociological studies, a wider evaluation is needed. This includes changes attested over multiple time points, more fine-grained semantic distinctions, and especially a large-scale evaluation that tests the potential of the methods to discover new changes on different kinds of text (text spanning long time periods, highly dynamic modern social media, and so on). Only once such evaluation data exists can we know in practice how each method works for a specific research question and on specific kinds of text.