Critical and Data-Centric AI

Recently, my research has expanded to include critical AI. This includes articulating critical concerns regarding the use of large language models in digital and computational humanities (Lang, 2026), as well as addressing issues such as carbon reporting and the environmental impact of computational research. I have also explored the lure of plausibility in using LLMs for OCR in non-Western languages such as Arabic (Lang et al., 2026). Through this work, I am increasingly developing what I would like to call a critical computational humanities that articulates principles for good scholarly practice in contexts that use digital, computational and AI methods. I aim to articulate a disciplinary ethics beyond compliance, providing a framework for critically engaging with the epistemic and ethical challenges posed by data-driven research.

References

2026

Critical Concerns for Using LLMs in the (Computational) Humanities and Beyond

Sarah Lang

In Understanding Science with Large Language Models?, 2026


This chapter maps ethical and epistemic risks associated with LLMs, including opaque and biased training data, “open-washing,” exploitative data work, environmental costs, and threats posed by hallucinations and paper mills. It critiques explainable AI as insufficient and foregrounds dataset documentation and auditing as more meaningful approaches to addressing structural bias. Finally, it advocates for frameworks grounded in care and solidarity and emphasises the need for stronger institutional requirements to promote more ethical AI practices.
@incollection{Lang2026LLM_criticalHPSS,
  author = {Lang, Sarah},
  title = {Critical Concerns for Using LLMs in the (Computational) Humanities and Beyond},
  booktitle = {Understanding Science with Large Language Models?},
  editor = {Simons, Arno and Wüthrich, Adrian and Zichert, Michael and Graßhoff, Gerd},
  publisher = {transcript},
  address = {Bielefeld},
  pages = {33--48},
  year = {2026},
  url = {https://www.transcript-verlag.de/978-3-8376-7994-6/understanding-science-with-large-language-models/?number=978-3-8394-4752-9},
}
Confabulated Transliterations? Managing the Lure of Plausibility in LLM-Detected Arabic Terms in an Early Modern Lexicon

Sarah Lang, Jonas Müller-Laackmann, Hazem Lashen, and Farzad Mahootian

In Critical Approaches to Automated Text Recognition, 2026

Forthcoming


This article examines the problem of confabulated transliterations in the context of large language models (LLMs) used for automated text recognition (ATR) tasks. Focusing on a digital humanities case study, we analyse an early 17th-century Latin-German alchemical dictionary that also contains multilingual entries, including Arabic-derived terms. Our aim was to use LLMs in an OCR post-correction workflow to identify these Arabic terms, which may have been creatively transliterated into early modern Latin. The transcription was only partially clean, further complicating the task with the potential presence of OCR noise. Positioned between post-correction and multilingual retrieval, this use case highlights both the potential and the limitations of LLMs in historical multilingual settings. In particular, we explore the lure of plausibility and the difficulty of verifying LLM-generated outputs, especially when dealing with under-resourced or especially ambiguous languages such as Arabic. The article reflects on existing strategies for managing such challenges while demonstrating how LLMs can mislead through confident yet inaccurate suggestions. We argue for a cautious, critically informed approach to LLM use in the humanities and offer an illustrative example of some of the problems that emerge in historically and linguistically complex scenarios.
@incollection{LangEtAl2026Confabulated,
  author = {Lang, Sarah and M{\"u}ller-Laackmann, Jonas and Lashen, Hazem and Mahootian, Farzad},
  title = {Confabulated Transliterations? Managing the Lure of Plausibility in LLM-Detected Arabic Terms in an Early Modern Lexicon},
  booktitle = {Critical Approaches to Automated Text Recognition},
  editor = {Terras, Melissa and Gooding, Paul and Ames, Sarah and Nockels, Joe},
  publisher = {Facet Publishing},
  address = {London},
  year = {2026},
  note = {Forthcoming},
}