FNRS contact group meeting: importance of data quality

Abstract

In this workshop held in French, Simon Hengchen will tackle the impact OCR can have on quantitative text analyses.

Date
Apr 16, 2024 1:30 PM
Location
Bruxelles, Belgium

Introduction

The next meeting of the FNRS contact group “Analyse critique et amélioration de la qualité de l’information numérique” will be held on Tuesday, April 16, 2024, at 1:30 p.m. at the Université libre de Bruxelles (auditorium R42.2.103, building R, Solbosch campus).

The group is multidisciplinary, sitting at the intersection of the applied sciences and the human and political sciences. It celebrates its 30th anniversary this year and reconvened in 2023.

The conference will be presented by Simon Hengchen, who holds a doctorate in Information Science from the Université libre de Bruxelles. His research focuses on the detection of semantic change in historical texts. After pursuing a research career on this subject in Sweden and Finland, he is currently a lecturer at the University of Geneva and a consultant in natural language processing (NLP).

This year's conference, “Approches quantitatives de textes historiques : quelques (non-) problèmes et comment les aborder ?”, will use concrete examples to examine the quality issues that optical character recognition (OCR) raises for NLP when it produces “erratic” results. This subject, still little discussed, is also likely to shift over time as algorithms and the languages processed evolve. Simon Hengchen will set out the problem and suggest concrete ways to remedy it.

The meeting will end with a debate followed by a drink. Access to the meeting, which is financed by the Fonds National pour la Recherche Scientifique (FNRS), is free; however, registration is required no later than April 9, 2024 via this form: https://forms.office.com/e/vGCe3GZSd2

Programme

1:30 p.m. Introduction, by Isabelle Boydens, President of the FNRS contact group “Analyse critique et amélioration de la qualité de l’information numérique”, Full Professor at ULB and head of the “Data Quality Competence Center” within the Smals Research department

1:35 p.m. “Approches quantitatives de textes historiques : quelques (non-) problèmes et comment les aborder ?” by Simon Hengchen, Doctor in Information Science from the Université libre de Bruxelles, lecturer at the University of Geneva and consultant in NLP (natural language processing).

2:35 p.m. Debate and round table. Moderators: Max De Wilde, Doctor in Information Science from the Université libre de Bruxelles, lecturer at the Université libre de Bruxelles and the University of Geneva, and consultant in NLP; and Guillaume Quintin, doctoral student in digital humanities (“Quantitative Digital Humanities” laboratory) and scientific assistant within the Master in Information and Communication Sciences and Technologies at the Université libre de Bruxelles.

3:35 p.m. Reception