
The Early Modern Text Lab

Supervisor

Suitable for

MSc in Advanced Computer Science
Computer Science, Part C
Computer Science, Part B

Abstract

The Early Modern Text Lab is a proposed digital humanities platform that uses AI to turn large quantities of historical text into richly structured, queryable data. At its core is a tagging engine that does two things in a single pass: (1) it recognizes and applies an existing controlled vocabulary of people, places, commodities, events, legal roles, and conceptual categories, and (2) it identifies new words, phrases, and concepts that ought to be tagged but are not yet part of the vocabulary. Every upload triggers this dual process: the AI applies known tags with contextual sensitivity (handling variant spellings, sense distinctions, and case-specific roles) and proposes new entries that the scholar can approve or reject. As scholars review these proposals, the system improves: its vocabulary expands, its tagging becomes more accurate, and its sense disambiguation and entity matching become more precise.
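The dual process described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the Lab's design: the `Vocabulary` class, the `tag_document` function, and the sample spellings are all hypothetical, and the "propose new entries" step uses a crude capitalisation heuristic where the real platform would presumably call an AI model.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Vocabulary:
    """Hypothetical controlled vocabulary: canonical tag -> variant spellings."""
    entries: dict[str, set[str]] = field(default_factory=dict)

    def lookup(self, token: str) -> str | None:
        """Return the canonical tag for a token, or None if it is unknown."""
        t = token.lower()
        for tag, variants in self.entries.items():
            if t == tag or t in variants:
                return tag
        return None

    def approve(self, term: str) -> None:
        """Record a scholar's approval: the proposed term joins the vocabulary."""
        self.entries.setdefault(term.lower(), set())


def tag_document(text: str, vocab: Vocabulary):
    """Return (applied, proposals) for one uploaded document.

    applied   -- list of (surface form, canonical tag, character offset)
    proposals -- unknown terms offered to the scholar for review
    """
    applied, proposals = [], []
    for match in re.finditer(r"[A-Za-z]+", text):
        token = match.group()
        tag = vocab.lookup(token)
        if tag is not None:
            applied.append((token, tag, match.start()))
        elif token[0].isupper() and token not in proposals:
            # Placeholder heuristic (an assumption): treat unknown
            # capitalised words as candidate new vocabulary entries.
            proposals.append(token)
    return applied, proposals


if __name__ == "__main__":
    # Invented sample vocabulary and sentence, for illustration only.
    vocab = Vocabulary({"merchant": {"marchant"}, "wool": {"wolle"}})
    applied, proposals = tag_document("the marchant Hawkyns shipped wolle", vocab)
    print(applied)
    print(proposals)
    vocab.approve(proposals[0])  # human-in-the-loop step
```

Once a proposal is approved, the same term is tagged rather than proposed on later uploads, which is the feedback loop the abstract describes.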

This growing layer of structured annotations supports an environment for serious historical analysis. Once documents are tagged, the Lab allows scholars to query their corpus in sophisticated ways: finding all events of a particular type, mapping relationships among people and places, tracing changes over time, or identifying patterns that are invisible in raw text and impractical to find by hand at this scale. A student taking on this project would design and implement the core platform: AI tagging and suggestion pipelines, human-in-the-loop review workflows, metadata handling, and a knowledge-graph backend. The result would be a tool that lets historians do at scale what is extremely difficult now: ask complex, data-driven research questions directly of thousands of early modern documents.
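The three kinds of query mentioned above (events of a type, relationships among people and places, change over time) can be illustrated over a list of flat annotation records. This is a minimal sketch, assuming a hypothetical `Annotation` record with one event, person, and place per document; a real knowledge-graph backend would use a graph store and a richer schema, and every name here is invented for illustration.

```python
from collections import Counter
from typing import NamedTuple


class Annotation(NamedTuple):
    """Hypothetical tagged record produced by the tagging engine."""
    doc_id: str
    year: int
    event_type: str  # e.g. "debt_dispute"
    person: str
    place: str


def events_of_type(annotations, event_type):
    """Find all annotations whose event is of the given type."""
    return [a for a in annotations if a.event_type == event_type]


def person_place_links(annotations):
    """Map each person to the set of places they are recorded with."""
    links = {}
    for a in annotations:
        links.setdefault(a.person, set()).add(a.place)
    return links


def counts_by_decade(annotations, event_type):
    """Count events of one type per decade, to trace change over time."""
    decades = Counter(
        (a.year // 10) * 10 for a in annotations if a.event_type == event_type
    )
    return dict(sorted(decades.items()))
```

Keeping each query as a small function over plain records makes the intended semantics easy to test before committing to a particular graph database.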