An AI Co-Scholar for Economic History
Supervisor
Suitable for
Abstract
Vision
Progress in economic history depends on researchers formulating novel hypotheses, constructing datasets from primary sources, and embedding statistical findings within the historical context. The field’s central bottleneck is that datasets remain extremely scarce: information extraction from primary sources is still performed manually due to their heterogeneity, which severely limits the scale and scope of empirical analysis.
We propose to develop a multi-agent reasoning system that accelerates each stage of the empirical research pipeline in economic history. The system will include: (1) a Hypothesis and Literature Agent that surveys existing research, identifies theoretical mechanisms and existing datasets, and generates new hypotheses; (2) a Primary Source Discovery Agent that locates and retrieves image scans from public digital archives, including associated copyright information; (3) a Dataset Construction Agent that performs information extraction on heterogeneous primary sources (Gothic, Antiqua, handwriting, complex layouts), whether provided by the economic historian from their own scanned materials or retrieved from public digital archives; (4) a Cleaning and Linking Agent that standardizes extracted data and links individuals, firms, and locations across the constructed datasets; and (5) an Analysis Agent that evaluates hypotheses using econometric methods (difference-in-differences, instrumental variables, and other causal identification strategies). All agents must operate with transparent, traceable decision logs, and the system should remain steerable by economic historians through a human-in-the-loop interface.
This system also provides an environment for AI safety and mechanistic interpretability work on historical reasoning. In the long term, the goal is to build agent-based simulations to study historical economic systems through the lens of complexity economics, using the collected archival image scans as their empirical foundation.
Proposals
1. Hypothesis and Literature Agent for Economic History
The Hypothesis and Literature Agent will read the relevant literature, locate publicly available datasets, and identify the mechanisms proposed by economic historians. It will synthesize this information to generate new hypotheses and rank them by their feasibility. The agent will also clearly distinguish between hypotheses that can be examined with existing datasets after minimal linking or restructuring and those that require constructing new datasets from primary sources because key variables are missing.
Related Work from different domains:
Towards an AI Co-Scientist, https://arxiv.org/abs/2502.18864
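To make the feasibility triage concrete, here is a minimal Python sketch; the record structure, field names, and example variables are purely illustrative, not a committed design:

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    statement: str
    required_variables: set[str]   # variables needed to test the hypothesis

@dataclass
class Dataset:
    name: str
    variables: set[str]            # variables the dataset already provides

def triage(h: Hypothesis, catalog: list[Dataset]) -> str:
    """Classify a hypothesis by the data it needs."""
    available = set().union(*(d.variables for d in catalog))
    missing = h.required_variables - available
    if not missing:
        return "testable with existing datasets (linking/restructuring only)"
    return f"requires new dataset construction; missing: {sorted(missing)}"

# Illustrative usage with invented names:
catalog = [Dataset("census_1871", {"occupation", "location", "age"})]
h = Hypothesis("Patenting raised local wages", {"patents", "location", "wages"})
print(triage(h, catalog))   # -> requires new dataset construction; ...
```

In the full agent, the catalog would be populated by the literature survey itself, and the ranking would weigh archival effort rather than apply a binary missing-variable check.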
2. Primary Source Discovery Agent for Archival Image Scans
Governments and private companies worldwide have invested substantial resources in scanning and digitizing historical sources, and many of these materials are now publicly accessible online. Yet they remain scattered across thousands of archive, museum, and company websites. Agentic methods make it possible to automate the retrieval of these archival image scans and assemble them into a unified database. Such a database must contain not only the images but also all relevant copyright information and metadata. Developing an agentic web scraper for historical sources would create a large, centralized corpus of primary source images. This would provide the foundation for large-scale empirical research, since multimodal large language models can read, transcribe, and extract structured information from these scans to generate datasets suitable for statistical analysis.
One example of a publicly accessible digital archive:
University of Mannheim, https://digi.bib.uni-mannheim.de
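The storage side of such a scraper is straightforward; below is a minimal sketch using requests and SQLite, where the schema is illustrative and the rights field would hold whatever statement the archive publishes alongside each scan. The crawling and URL-discovery logic, which is the genuinely agentic part, is not shown:

```python
import sqlite3
import requests

conn = sqlite3.connect("archival_scans.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS scans (
        url     TEXT PRIMARY KEY,
        archive TEXT,
        title   TEXT,
        rights  TEXT,   -- copyright statement as published by the archive
        image   BLOB
    )
""")

def ingest(url: str, archive: str, title: str, rights: str) -> None:
    """Download one scan and store it together with its provenance metadata."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    conn.execute(
        "INSERT OR REPLACE INTO scans VALUES (?, ?, ?, ?, ?)",
        (url, archive, title, rights, resp.content),
    )
    conn.commit()
```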
3. Dataset Construction Agent for Archival Image Scans
This agent builds on our experience using multimodal LLMs to process archival image scans, including double-column patent records from Imperial Germany (forthcoming) and eighteenth- and nineteenth-century German city directories. These sources vary widely in layout, structure, and font, and our work demonstrates that current models perform reliably when pages are processed independently but remain unsuitable for full-volume processing. Although frontier models accept extremely large input context windows, often in the millions of tokens, their maximum output window remains far smaller: Gemini 2.5 Pro, for example, is limited to 65,536 output tokens. This restricts how much transcription or extracted data a single model call can return, and it compounds the well-documented degradation in model performance as context length increases. Developing this agent is therefore a prerequisite for converting large archival PDFs into structured datasets suitable for statistical analysis.
Relevant literature:
Multimodal LLMs for OCR, OCR Post-Correction, and Named Entity Recognition in Historical Documents, https://arxiv.org/abs/2504.00414
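The core engineering consequence is that volumes must be decomposed before transcription. A minimal orchestration sketch follows, where transcribe_page stands in for a provider-specific multimodal LLM call:

```python
import io
from pypdf import PdfReader, PdfWriter

def transcribe_page(single_page_pdf: bytes) -> str:
    # Hypothetical wrapper around a multimodal LLM call; the actual request
    # format depends on the provider's API.
    raise NotImplementedError

def transcribe_volume(path: str) -> list[str]:
    """Transcribe a volume one page at a time to stay within output limits."""
    reader = PdfReader(path)
    transcriptions = []
    for page in reader.pages:
        writer = PdfWriter()
        writer.add_page(page)          # re-wrap the page as its own PDF
        buf = io.BytesIO()
        writer.write(buf)
        transcriptions.append(transcribe_page(buf.getvalue()))
    return transcriptions              # merged downstream into one dataset
```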
4. LLM-based Dataset Linking Agent for Historical Microdata
Any historical source provides only a single snapshot. To study human lives in the past, we must link many such snapshots across sources and across time. Economic historians have made progress by linking individuals across historical censuses, but existing methods rely on rule-based algorithms with limited accuracy and scale. Our aim is to develop an LLM-based system for record linkage that can trace individuals both across different years of the same source and across entirely different categories of sources. The same person may appear in birth registers, marriage records, city directories, newspapers, probate files, and other documents throughout their lifetime. We will build on our city-directory work, which already covers thousands of German directories from the eighteenth and nineteenth centuries, and will use location, occupation, kinship terms, and other contextual cues to anchor identities. Our objective is to establish a state-of-the-art approach to LLM-based dataset linking. A successful solution would allow researchers to reconstruct the lives of ordinary people as coherent life histories rather than isolated fragments, thereby opening new possibilities for historical microdata and quantitative economic history.
Current state-of-the-art approach in economic history:
Automated Linking of Historical Data, https://www.aeaweb.org/articles?id=10.1257/jel.20201599
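A plausible architecture combines cheap blocking with LLM adjudication of ambiguous pairs. A minimal sketch, where llm_judge stands in for a prompted model call and the thresholds are illustrative:

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def llm_judge(rec_a: dict, rec_b: dict) -> bool:
    # Hypothetical: prompt an LLM with both records plus location, occupation,
    # and kinship context and ask whether they denote the same person.
    raise NotImplementedError

def same_person(rec_a: dict, rec_b: dict) -> bool:
    sim = name_similarity(rec_a["name"], rec_b["name"])
    if sim < 0.6:
        return False    # cheap blocking: clearly different names
    if sim > 0.95 and rec_a.get("occupation") == rec_b.get("occupation"):
        return True     # near-identical name and matching occupation
    return llm_judge(rec_a, rec_b)   # ambiguous pairs go to the LLM
```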
5. Historical Georeferencing with Multimodal LLMs
Historical city directories contain rich address information for individuals, firms and institutions, yet the physical structure of cities has changed dramatically over the past centuries. Researchers who wish to study urban development, spatial inequality or neighborhood dynamics must therefore spend years manually georeferencing these addresses. We propose to develop a system that uses multimodal LLMs, historical maps and modern coordinate data to automate this process. Combined with the large corpus of German and European city directories we have already collected, this approach would make it possible to reconstruct historical urban environments at scale, and ultimately, to analyze long-run urban change with far greater precision and speed.
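A minimal sketch of the deterministic core of such a pipeline, using an invented toy concordance and gazetteer; addresses the concordance cannot resolve would be escalated to the multimodal LLM together with historical map imagery:

```python
# Historical -> modern street-name concordance and modern gazetteer,
# both invented toy data for illustration.
CONCORDANCE = {"Haupt-Straße": "Hauptstrasse"}
GAZETTEER = {"Hauptstrasse": (49.4875, 8.4660)}   # name -> (lat, lon)

def georeference(historical_street: str) -> tuple[float, float] | None:
    modern = CONCORDANCE.get(historical_street, historical_street)
    return GAZETTEER.get(modern)   # None: escalate to the LLM + map stage

print(georeference("Haupt-Straße"))   # -> (49.4875, 8.466)
```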
6. Mechanistic Interpretability of Multimodal LLMs on Historical Data: Images vs Text
A central open question in AI for historical research is whether large language models reason in the same way when they receive historical information as raw text or as images containing the same text. Public archives hold vast numbers of image scans. Many of these can also be processed through OCR before being fed into an LLM. This raises a straightforward but important question: do models extract and interpret information differently depending on whether the input is an image or OCR text?
We aim to study how multimodal LLMs internally represent historical scripts such as Gothic and Antiqua, and then compare these representations to those obtained when the same content is provided as OCR text. The objective is to map differences in the model’s activation space and to see whether specialized pathways emerge for particular scripts and input formats. We will also investigate whether the model develops something resembling “OCR attention heads” for historical writing systems.
Relevant literature:
Interpreting Attention Heads for Image-to-Text Information Flow in Large Vision-Language Models, https://arxiv.org/abs/2509.17588
How Do Large Vision-Language Models See Text in Image? Unveiling the Distinctive Role of OCR Heads, https://arxiv.org/abs/2505.15865
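A minimal analysis sketch for the comparison, where get_activations stands in for forward hooks on a multimodal LLM and returns one activation matrix per layer:

```python
import numpy as np

def get_activations(page, modality: str) -> list[np.ndarray]:
    # Hypothetical: hook the model (e.g. PyTorch forward hooks) and return
    # one (tokens x d_model) activation matrix per layer.
    raise NotImplementedError

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    a, b = a.mean(axis=0), b.mean(axis=0)   # mean-pool over tokens/patches
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def layerwise_similarity(page) -> list[float]:
    img = get_activations(page, "image")
    txt = get_activations(page, "ocr_text")
    # One score per layer; layers where the two modalities diverge are
    # candidates for script- or modality-specific pathways.
    return [cosine(i, t) for i, t in zip(img, txt)]
```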