AI Safety and Theoretical Computer Science
- 14:00, 31st May 2024 (Trinity Term 2024), Lecture Theatre A, Department of Computer Science, Wolfson Building, Parks Road, Oxford, OX1 3QD
Title: AI Safety and Theoretical Computer Science
Abstract: Progress on AI safety and alignment, like the current AI revolution more generally, has been almost entirely empirical. In this talk, however, I'll survey a few areas where I think theoretical computer science can contribute to AI safety, including:
- How can we robustly watermark the outputs of Large Language Models and other generative AI systems, to help identify academic cheating, deepfakes, and AI-enabled fraud? I'll explain my proposal and its basic mathematical properties, as well as what remains to be done. (A toy sketch of this kind of watermarking rule appears after this list.)
- Can one insert undetectable cryptographic backdoors into neural nets, for good or ill? In what senses can those backdoors also be unremovable? How robust are they against fine-tuning?
- Should we expect neural nets to be "generically" interpretable? I'll discuss a beautiful formalization of that question due to Paul Christiano, along with some initial progress on it, and an unexpected connection to quantum computing.
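The watermarking bullet refers to a distortion-free sampling rule of the kind that has been described publicly: derive pseudorandom numbers r_i from a keyed function of the preceding tokens and emit the token maximizing r_i^(1/p_i), which leaves the model's output distribution unchanged on average while making the choices detectable to anyone holding the key. The sketch below is a minimal illustration of that idea under stated assumptions, not the speaker's exact proposal; in particular, the use of SHA-256 as the PRF, the full prefix as context, and all function names are assumptions for the example.

```python
import hashlib
import math

def prf_scores(key: bytes, context: tuple, vocab_size: int) -> list:
    """Derive a pseudorandom number r in (0, 1) for every vocabulary token,
    keyed on the secret key and the preceding tokens. SHA-256 stands in for
    a proper keyed PRF here (an assumption for this sketch)."""
    scores = []
    for token_id in range(vocab_size):
        digest = hashlib.sha256(key + repr((context, token_id)).encode()).digest()
        # Map the first 8 bytes of the hash to a float strictly inside (0, 1).
        r = (int.from_bytes(digest[:8], "big") + 1) / (2.0**64 + 2)
        scores.append(r)
    return scores

def watermarked_sample(probs: list, key: bytes, context: tuple) -> int:
    """Emit the token i maximizing r_i ** (1 / p_i). Averaged over keys this
    samples from probs exactly (the Gumbel-max trick in disguise), but for a
    fixed key the choice is deterministic given the context, which is what a
    verifier can later test for."""
    rs = prf_scores(key, context, len(probs))
    best_i, best_score = 0, -1.0
    for i, (p, r) in enumerate(zip(probs, rs)):
        if p > 0 and r ** (1.0 / p) > best_score:
            best_i, best_score = i, r ** (1.0 / p)
    return best_i

def detection_score(tokens: list, key: bytes, vocab_size: int) -> float:
    """Verifier side: recompute each r with the key and sum ln(1/(1 - r))
    over the emitted tokens. For unwatermarked text each term has
    expectation 1; watermarked sampling pushes the chosen r values toward 1,
    inflating the sum well above the token count."""
    total = 0.0
    for pos, tok in enumerate(tokens):
        r = prf_scores(key, tuple(tokens[:pos]), vocab_size)[tok]
        total += math.log(1.0 / (1.0 - r))
    return total
```

In a real deployment the context would be a sliding window over the last few tokens rather than the whole prefix, so the signal survives insertions and deletions, and the detection sum would be compared against a threshold calibrated to the text length; robustness to paraphrasing and other attacks is among the open problems the abstract alludes to.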