Computational Approaches to Language Change

Abstract

Language is the medium through which people communicate and express their needs, and communication is meaningful only when language is used effectively and efficiently. As humans evolve, so too do languages, adapting to better encapsulate and convey information. The dynamics underlying language change are often intricate and difficult to disentangle. Linguistic studies tend to focus on small samples, since consulting and analyzing thousands of historical documents requires considerable skill and effort. The computational interpretation and processing of natural language is already a complex endeavor, and the complexity increases when moving from a synchronic to a diachronic (across-time) perspective. In the past few years, interest in computational approaches to understanding language change has grown significantly. This growth can be attributed to two converging trends: the remarkable increase in computational power and the growing availability of textual data. The core objective of this thesis is to investigate computational methods for understanding Language Change, with an emphasis on Lexical Semantic Change. This is achieved by (i) systematically reviewing and comparing current state-of-the-art approaches relying on Temporal Word Embeddings on different languages and benchmarks covering different historical periods, (ii) designing and implementing novel models trained on synchronic data and evaluating their applicability in a diachronic context, (iii) devising datasets and tools to set performance standards for these models and to support further linguistically informed analysis, and (iv) offering methodological perspectives on large-scale, quantitative, longitudinal studies of Language Change, highlighting the critical points of this type of analysis and the advantages to be gained from it.
The first part of this thesis introduces foundational concepts from Historical Linguistics and Natural Language Processing. It also offers a comprehensive overview of state-of-the-art computational approaches to understanding Language Change and of their applications in Culturomics. The main contributions are then presented and, in the last chapter, discussed in relation to the Research Questions introduced in the first chapter.
