Dataset Documentation, Auditing and Data Work

Data (work) shapes results, yet we don't document or even understand our datasets well enough. I argue for standardized documentation and auditing practices, and for properly valuing data work.

Through my engagement with critical digital humanities, I have become increasingly interested in the study of data work and its influence on processes of knowledge production (Gengnagel & Lang, 2026). Building on preliminary work on dataset documentation practices (Lang, 2025), I have articulated dataset auditing as a practical strategy for identifying and addressing data gaps (missing reference). This work is closely connected to fields such as critical AI studies, which increasingly recognise data and data work as central sources of bias and unethical outcomes in computational systems. Recognising the sheer amount of hidden labour and data work that computational humanities research requires, I have examined what it means to _do_ computational history (Lang, 2026). I have also investigated invisibilised labour in computational humanities contexts through the lenses of data work and the invisible technician discourse (Lang, 2027).

References

2027

  1. Two-tier Computational Humanities: A Labour History of Undervalued Contributions in DH
    Sarah Lang
    In The De Gruyter Handbook of Feminist Digital Scholarship, Berlin/Boston, 2027

2026

  1. A Discipline, Divided: On the Digital Humanities and Ideologies of Knowledge Work
    Tessa Gengnagel and Sarah Lang
    In DH2026 Book of Abstracts, 2026
  2. (Doing) Computational History: On the Role of Data Work in Computational Approaches
    Sarah Lang
    Histories, 2026

2025

  1. Documenting Datasets as a Tool for Change
    Sarah Lang
    In Digital Humanities 2025: Book of Abstracts, 2025