Machine Learning for Digital Scholarly Editions (School)

This summer school introduces machine learning methods for digital scholarly editions, combining practical applications with the theoretical foundations of topic modelling and text analysis.

Instructors: Martina Scholger, Roman Bleier, Bernhard Geiger, Sarah Lang et al.

Course Overview

The school introduces participants to machine learning approaches for the analysis and enrichment of textual data in the Digital Humanities. Using the Python library BERTopic, participants explore topic modelling workflows and gain practical experience with key machine learning concepts, including text embeddings, dimensionality reduction, and clustering.
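The three stages named above (text embeddings, dimensionality reduction, clustering) can be illustrated with a toy sketch in plain Python. This is not BERTopic and not taken from the course materials. To the best of our knowledge, BERTopic by default uses transformer-based sentence embeddings, UMAP, and HDBSCAN. Here each stage is replaced by a deliberately simple stand-in: word-count vectors, variance-based feature selection, and a small k-means. The documents and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy corpus: two themes with no shared vocabulary.
docs = [
    "the king signed the charter",
    "the king sealed the charter",
    "the charter named the king",
    "grain prices rose at market",
    "market prices for grain fell",
    "grain sold at market prices",
]

def embed(texts):
    """Stage 1 stand-in: word-count vectors instead of learned embeddings."""
    vocab = sorted({word for text in texts for word in text.lower().split()})
    vectors = []
    for text in texts:
        counts = Counter(text.lower().split())
        vectors.append([float(counts[word]) for word in vocab])
    return vocab, vectors

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def reduce_dimensions(vectors, n_components):
    """Stage 2 stand-in: keep the highest-variance dimensions (not UMAP)."""
    columns = list(zip(*vectors))
    ranked = sorted(range(len(columns)), key=lambda i: -variance(columns[i]))
    keep = sorted(ranked[:n_components])
    return keep, [[vec[i] for i in keep] for vec in vectors]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=10):
    """Stage 3 stand-in: k-means with farthest-point start (not HDBSCAN)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest center, then recompute the centers.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

vocab, vectors = embed(docs)
kept, reduced = reduce_dimensions(vectors, n_components=4)
labels = kmeans(reduced, k=2)

print("kept dimensions:", [vocab[i] for i in kept])
for label, doc in zip(labels, docs):
    print(label, doc)
```

On this toy corpus the charter sentences and the grain-market sentences end up in separate clusters. Real historical texts are far less tidy, which is why learned embeddings and more robust reduction and clustering methods are used in practice.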

Designed for both students and researchers, the school combines hands-on exercises with an accessible introduction to the theoretical foundations of machine learning. Participants learn how to apply BERTopic to historical texts while developing a critical understanding of the strengths, limitations, and potential applications of machine learning methods in digital scholarly editing.

This summer school was co-organised by Sarah Lang, Martina Scholger, Roman Bleier, Bernhard Geiger, and colleagues as the 2025 CLARIAH-AT school.

The teaching materials are available online.