
Foundations of Self-Programming Agents: 2022-2023

Lecturer

Degrees

  • Schedule C1 (CS&P): Computer Science and Philosophy
  • Schedule C1: Computer Science
  • Schedule C1: Mathematics and Computer Science
  • MSc in Advanced Computer Science

Term

Hilary Term

Overview

This course studies the foundations of how to build autonomous agents that have the ability to self-program their behavior to accomplish desired tasks in their environment. This is related to Reasoning about Actions and Planning in AI and to Program Synthesis in Formal Methods, but focuses on tasks (goals) expressed over finite traces. Specifically, borrowing from Formal Methods, the course will consider tasks and environment specifications expressed in LTL, as well as in its finite-trace variant LTLf. It will review the main results and algorithmic techniques for planning in nondeterministic domains. Then it will draw connections with verification and reactive synthesis, together with their game-theoretic solution techniques. The key insight is that working with these logics reduces to devising suitable 2-player games and finding strategies, i.e., plans, to win them. Specifically, we will cover the following topics: planning in nondeterministic domains, temporal logics, LTL, LTLf, game-theoretic techniques, safety games, reachability games, games for LTLf objectives, and LTLf reactive synthesis. Connections with decision processes and reinforcement learning will also be considered.

Learning outcomes

On completion of this course, the student will be able to:

  1. Critically appreciate and discuss various concepts at the basis of reasoning about actions, planning, and program synthesis, and understand that concepts and techniques from both Artificial Intelligence and Formal Methods can be applied in synergy to develop significant self-programming capabilities in intelligent agents.
  2. Apply the program synthesis algorithms and techniques learned during the course to diverse contexts within the development of intelligent agents.
  3. Use the concepts, techniques, and algorithms learned in the course to devise forms of self-programming capabilities for a variety of systems, even outside classical AI.

Mode of Examination

The course will be examined by a take-home mini-project that requires students to put into practice some of the concepts learned during the course.

Prerequisites

Artificial Intelligence, Computer-Aided Formal Verification (or equivalent courses).

Synopsis

Part 1: Exploiting models of the environment to reason on consequences of actions (3 lectures)

In this part of the course, we study classical Reasoning about Actions as developed in Knowledge Representation: the notion of acting in an environment, how the environment is classically specified, how actions and their effects are modeled, and how to reason about whether a given course of action can indeed be executed and what effects it will bring about.
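
To make the modeling concrete: in Reiter's situation calculus, the effects of actions on each fluent are captured by a successor state axiom. Below is a representative axiom for an illustrative blocks-world fluent (the fluent and action names here are our own, not taken from the course materials):

```latex
% Successor state axiom (Reiter-style): On(x,y) holds after doing action a
% in situation s iff a put x onto y, or it already held and a did not move x.
\[
  \mathit{On}(x, y, \mathit{do}(a, s)) \;\equiv\;
    a = \mathit{move}(x, y) \;\lor\;
    \bigl( \mathit{On}(x, y, s) \land \neg \exists z.\, a = \mathit{move}(x, z) \bigr)
\]
```

Reasoning about whether a course of action brings about a desired effect then amounts to regressing the goal condition through such axioms.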

References: [1]

  • John McCarthy: Programs with Common Sense. Teddington Conference on the Mechanization of Thought Processes, December 1958. *
  • John McCarthy and Patrick J. Hayes: Some Philosophical Problems from the Standpoint of Artificial Intelligence. Machine Intelligence 4, 463-502, 1969. *
  • Ray Reiter: Knowledge in Action: Logical Foundations for Specifying and Implementing Dynamical Systems. MIT Press, 2001 (Chapters 2, 3, 4). *

 

Part 2: Planning courses of action in the environment (3 lectures + 1 class)

In this part of the course, we study advanced forms of planning in AI, in particular planning in so-called “nondeterministic domains”. We show how languages for expressing planning domains build on the insights gained from studying Reasoning about Actions. We consider what kinds of solution concepts are needed when planning in nondeterministic domains, and how these are related to strategic reasoning in 2-player games.
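
As a concrete illustration of the solution concept, a strong plan (one that reaches the goal no matter how the nondeterminism is resolved) can be computed by a backward fixpoint: a state becomes solved once some action has all of its possible outcomes already solved. A minimal sketch over an invented toy domain; all state and action names are illustrative:

```python
from typing import Dict, Set, Tuple

State = str
Action = str
# transitions[(s, a)] = set of possible successor states (nondeterminism)
Transitions = Dict[Tuple[State, Action], Set[State]]

def strong_plan(states, actions, transitions: Transitions, goal: Set[State]):
    """Backward fixpoint: add states having some action whose *every*
    outcome lands in the already-solved region."""
    solved = set(goal)
    policy: Dict[State, Action] = {}
    changed = True
    while changed:
        changed = False
        for s in states:
            if s in solved:
                continue
            for a in actions:
                succs = transitions.get((s, a))
                if succs and succs <= solved:   # all outcomes already solved
                    solved.add(s)
                    policy[s] = a
                    changed = True
                    break
    return solved, policy

# Toy domain: 'go' from s0 may end in s1 or s2; both can 'finish' into g.
trans: Transitions = {("s0", "go"): {"s1", "s2"},
                      ("s1", "finish"): {"g"}, ("s2", "finish"): {"g"}}
solved, policy = strong_plan({"s0", "s1", "s2", "g"}, {"go", "finish"},
                             trans, {"g"})
print(policy)   # e.g. {'s1': 'finish', 's2': 'finish', 's0': 'go'}
```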

References:

  • Malik Ghallab, Dana Nau, Paolo Traverso: Automated Planning and Acting, Cambridge University Press, 2016, Chapter 5 *
  • Patrik Haslum, Nir Lipovetzky, Daniele Magazzeni, Christian Muise: An Introduction to the Planning Domain Definition Language. Morgan & Claypool, 2019 (Chapters 2, 3, and 7). *

 

Part 3: LTL and LTLf as a specification language (2 lectures)

In this part of the course, we review Linear Temporal Logic on infinite traces (LTL) and on finite traces (LTLf), as used in Formal Methods for specifying dynamic systems. We use these logics to specify complex tasks as well as environment properties. We review Safety Properties, Guarantee Properties, and forms of Fairness and Stability, including GR(1). We also review connections to automata and their use in model checking.
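
To fix intuitions about the finite-trace semantics, below is a minimal recursive evaluator for LTLf formulas over a finite trace; the tuple encoding of formulas is an illustrative choice, not from the course materials. Note the finite-trace readings of next, eventually, and always:

```python
# Formulas: an atom is a string; compound formulas are tuples such as
# ("not", f), ("and", f, g), ("X", f), ("F", f), ("G", f), ("U", f, g).
# A trace is a list of sets of atomic propositions, one set per instant.

def holds(phi, trace, i=0):
    """Does formula phi hold at position i of the finite trace?"""
    if isinstance(phi, str):                 # atomic proposition
        return phi in trace[i]
    op = phi[0]
    if op == "not":
        return not holds(phi[1], trace, i)
    if op == "and":
        return holds(phi[1], trace, i) and holds(phi[2], trace, i)
    if op == "or":
        return holds(phi[1], trace, i) or holds(phi[2], trace, i)
    if op == "X":                            # strong next: instant i+1 must exist
        return i + 1 < len(trace) and holds(phi[1], trace, i + 1)
    if op == "F":                            # eventually, within the trace
        return any(holds(phi[1], trace, j) for j in range(i, len(trace)))
    if op == "G":                            # always, until the trace ends
        return all(holds(phi[1], trace, j) for j in range(i, len(trace)))
    if op == "U":                            # until
        return any(holds(phi[2], trace, j) and
                   all(holds(phi[1], trace, k) for k in range(i, j))
                   for j in range(i, len(trace)))
    raise ValueError(f"unknown operator: {op}")

trace = [{"req"}, set(), {"grant"}]          # a trace with three instants
print(holds(("F", "grant"), trace))          # True
print(holds(("G", ("not", "grant")), trace)) # False
```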

References:

  • Allen Emerson: Temporal and Modal Logic. Handbook of Theoretical Computer Science 1990: 995-1072 *
  • Moshe Y. Vardi: An Automata-Theoretic Approach to Linear Temporal Logic. Banff Higher Order Workshop 1995, LNCS 1042: 238-266, Springer 1995 *
  • Zohar Manna, Amir Pnueli: A Hierarchy of Temporal Properties. PODC 1990: 377-410 *
  • Giuseppe De Giacomo, Moshe Y. Vardi: Linear Temporal Logic and Linear Dynamic Logic on Finite Traces. IJCAI 2013: 854-860 *
  • Bowen Alpern, Fred B. Schneider: Recognizing Safety and Liveness. Distributed Comput. 2(3): 117-126 (1987)
  • Giuseppe De Giacomo, Antonio Di Stasio, Francesco Fuggitti, Sasha Rubin: Pure-Past Linear Temporal and Dynamic Logic on Finite Traces. IJCAI 2020: 4959-4965

 

Part 4: LTL and LTLf Reactive Synthesis (4 lectures + 1 class)

In this part of the course, we introduce synthesis from LTL and LTLf specifications. The idea is that, given a logical specification, we can automatically synthesize a program that is guaranteed to fulfill it. We focus specifically on “reactive synthesis”, i.e., synthesizing programs for interactive/reactive ongoing computations (protocols, operating systems, controllers, robots, etc.). This kind of synthesis is deeply related to planning in nondeterministic environments. We consider mainly synthesis for LTLf specifications, but we will also study synthesis for some restricted forms of LTL specifications, such as safety properties and limited forms of guarantee, reactivity, fairness (including GR(1)), and stability.
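
For LTLf, the standard route is: translate the formula into a DFA, then solve a reachability game on that DFA in which the agent chooses its outputs and the environment its inputs. A minimal sketch of the game-solving step, assuming the agent commits to its move first in each round; the tiny DFA here is hand-made rather than compiled from a formula:

```python
from itertools import product

def winning_states(states, agent_moves, env_moves, delta, accepting):
    """Least fixpoint of W := accepting ∪ {q | ∃Y ∀X. delta[(q,Y,X)] ∈ W}:
    the states from which the agent can force the run into acceptance."""
    win = set(accepting)
    while True:
        new = {q for q in states if q not in win and
               any(all(delta[(q, y, x)] in win for x in env_moves)
                   for y in agent_moves)}
        if not new:
            return win
        win |= new

# Toy DFA game: from q0, agent move 'b' reaches q1 regardless of the
# environment; q1 is accepting and absorbing.
states = {"q0", "q1"}
delta = {(q, y, x): (("q1" if y == "b" else "q0") if q == "q0" else "q1")
         for q, y, x in product(states, "ab", "01")}
print("q0" in winning_states(states, "ab", "01", delta, {"q1"}))  # True
```

A winning strategy, i.e., the synthesized program, can then be read off by recording, for each winning state, a move that keeps the play inside the winning region.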

References:

  • Amir Pnueli, Roni Rosner: On the Synthesis of a Reactive Module. POPL 1989: 179-190 *
  • Bernd Finkbeiner: Synthesis of Reactive Systems. Dependable Software Systems Engineering 2016: 72-98 *
  • Rüdiger Ehlers, Stéphane Lafortune, Stavros Tripakis, Moshe Y. Vardi: Supervisory control and reactive synthesis: a comparative introduction. Discret. Event Dyn. Syst. 27(2): 209-260 (2017)
  • Giuseppe De Giacomo, Moshe Y. Vardi: Synthesis for LTL and LDL on Finite Traces. IJCAI 2015: 1558-1564 *
  • Giuseppe De Giacomo, Moshe Y. Vardi: LTLf and LDLf Synthesis under Partial Observability. IJCAI 2016: 1044-1050 *
  • Roderick Bloem, Barbara Jobstmann, Nir Piterman, Amir Pnueli, Yaniv Sa'ar: Synthesis of Reactive(1) designs. J. Comput. Syst. Sci. 78(3): 911-938 (2012)
  • Giuseppe De Giacomo, Moshe Y. Vardi, Paolo Felli, Natasha Alechina, Brian Logan: Synthesis of Orchestrations of Transducers for Manufacturing. AAAI 2018: 6161-6168
  • Blai Bonet, Giuseppe De Giacomo, Hector Geffner, Fabio Patrizi, Sasha Rubin: High-level Programming via Generalized Planning and LTL Synthesis. KR 2020: 152-161

 

Part 5: LTL and LTLf Reactive Synthesis under Environment Specifications (4 lectures + 1 class)

In this part of the course, we introduce Reactive Synthesis under Environment Specifications: we take advantage of knowledge about the environment's behavior when achieving the agent's tasks. We consider how to obtain the game arena from the LTL and LTLf specifications of both the agent's task and the environment's behavior, and how to solve safety games, reachability games, games for LTLf objectives, and games for objectives expressed in fragments of LTL. We also discuss how the winning regions of such arenas are related to the notion of “being able” to achieve desired properties (without necessarily committing to a specific strategy for doing so). We focus on agent tasks that eventually terminate and hence are specified in LTLf, while for the environment we focus on safety specifications and limited forms of guarantee, reactivity, fairness, and stability.
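
As a concrete instance, safety games are solved by a greatest fixpoint that repeatedly discards states from which the environment can force an escape from the safe set; membership in the resulting winning region is precisely the “being able” notion above, independent of any fixed strategy. A minimal sketch, assuming the environment moves first in each round; all state and move names are illustrative:

```python
def safe_region(env_moves, agent_moves, delta, safe):
    """Greatest fixpoint: keep states where, for every environment move,
    some agent reply stays inside the current region."""
    region = set(safe)
    while True:
        keep = {q for q in region
                if all(any(delta[(q, x, y)] in region for y in agent_moves)
                       for x in env_moves)}
        if keep == region:
            return region
        region = keep

# Toy arena: from q0 the agent can always dodge 'bad'; q1 is absorbing.
delta = {("q0", "x0", "y0"): "q0", ("q0", "x0", "y1"): "q0",
         ("q0", "x1", "y0"): "bad", ("q0", "x1", "y1"): "q1",
         ("q1", "x0", "y0"): "q1", ("q1", "x0", "y1"): "q1",
         ("q1", "x1", "y0"): "q1", ("q1", "x1", "y1"): "q1"}
print(sorted(safe_region(["x0", "x1"], ["y0", "y1"], delta, {"q0", "q1"})))
# ['q0', 'q1']
```

Reachability games are the dual case: a least fixpoint (attractor) computation, as in the LTLf game sketch above.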

References:

  • Martin Zimmermann, Felix Klein, and Alexander Weinert: Infinite Games (Lecture Notes), 2016 (Chapters 1 and 2). *
  • Giuseppe De Giacomo, Sasha Rubin: Automata-Theoretic Foundations of FOND Planning for LTLf and LDLf Goals. IJCAI 2018: 4729-4735 *
  • Benjamin Aminof, Giuseppe De Giacomo, Aniello Murano, Sasha Rubin: Planning under LTL Environment Specifications. ICAPS 2019: 31-39 *
  • Giuseppe De Giacomo, Antonio Di Stasio, Moshe Y. Vardi, Shufang Zhu: Two-Stage Technique for LTLf Synthesis Under LTL Assumptions. KR 2020: 304-314 *
  • Giuseppe De Giacomo, Antonio Di Stasio, Lucas M. Tabajara, Moshe Y. Vardi, Shufang Zhu: Finite-Trace and Generalized-Reactivity Specifications in Temporal Synthesis. IJCAI 2021: 1852-1858 *
  • Giuseppe De Giacomo, Antonio Di Stasio, Giuseppe Perelli, Shufang Zhu: Synthesis with Mandatory Stop Actions. KR 2021: 237-246 *
  • Alberto Camacho, Meghyn Bienvenu, Sheila A. McIlraith: Finite LTL Synthesis with Environment Assumptions and Quality Measures. KR 2018: 454-463
  • Ronen I. Brafman, Giuseppe De Giacomo: Planning for LTLf/LDLf Goals in Non-Markovian Fully Observable Nondeterministic Domains. IJCAI 2019: 1602-1608
  • Benjamin Aminof, Giuseppe De Giacomo, Sasha Rubin: Stochastic Fairness and Language-Theoretic Fairness in Planning in Nondeterministic Domains. ICAPS 2020: 20-28
  • Shufang Zhu, Giuseppe De Giacomo, Geguang Pu, Moshe Y. Vardi: LTLf Synthesis with Fairness and Stability Assumptions. AAAI 2020: 3088-3095
  • Benjamin Aminof, Giuseppe De Giacomo, Sasha Rubin: Best-Effort Synthesis: Doing Your Best Is Not Harder Than Giving Up. IJCAI 2021: 1766-1772

 

Part 6: LTL and LTLf Reactive Synthesis Techniques (2 lectures + 1 class)

In this part of the course, we further examine synthesis algorithms based on 2-player game-theoretic techniques. Specifically, we consider both symbolic algorithms and algorithms that explore the search space on the fly while synthesizing the program.
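
To illustrate the symbolic style: the algorithm manipulates whole sets of states at once rather than enumerating them individually. In the sketch below, Python integers used as bitsets stand in for the BDDs a real implementation would employ, and the toy game is hand-made rather than compiled from an LTLf formula:

```python
def pre(trans_masks, target):
    """States with some agent move all of whose outcomes lie in `target`.
    trans_masks[y][s] is the bitset of possible successors of state s
    under agent move y (environment choices already folded in)."""
    result = 0
    for y_masks in trans_masks.values():
        for s, succ in enumerate(y_masks):
            if succ and (succ & ~target) == 0:   # every successor in target
                result |= 1 << s
    return result

def reach_fixpoint(trans_masks, goal):
    """Least fixpoint W = goal | pre(W), computed set-wide on bitsets."""
    win = goal
    while True:
        nxt = win | pre(trans_masks, win)
        if nxt == win:
            return win
        win = nxt

# Toy game over states 0, 1, 2 (state 2 is the goal, encoded as bit 2).
trans_masks = {"a": [0b110, 0b100, 0b100]}
print(bin(reach_fixpoint(trans_masks, 0b100)))   # 0b111: all states winning
```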

References:

  • Shufang Zhu, Lucas M. Tabajara, Jianwen Li, Geguang Pu, Moshe Y. Vardi: Symbolic LTLf Synthesis. IJCAI 2017: 1362-1369 *
  • Shufang Zhu, Geguang Pu, Moshe Y. Vardi: First-Order vs. Second-Order Encodings for LTLf-to-Automata Translation. TAMC 2019: 684-705 *
  • Shengping Xiao, Jianwen Li, Shufang Zhu, Yingying Shi, Geguang Pu, Moshe Y. Vardi: On-the-fly Synthesis for LTL over Finite Traces. AAAI 2021: 6530-6537 *
  • Shufang Zhu, Lucas M. Tabajara, Jianwen Li, Geguang Pu, Moshe Y. Vardi: A Symbolic Approach to Safety LTL Synthesis. Haifa Verification Conference 2017: 147-162
  • Lucas M. Tabajara, Moshe Y. Vardi: LTLf Synthesis under Partial Observability: From Theory to Practice. GandALF 2020: 1-17
  • Suguman Bansal, Yong Li, Lucas M. Tabajara, Moshe Y. Vardi: Hybrid Compositional Reasoning for Reactive Synthesis from Finite-Horizon Specifications. AAAI 2020: 9766-9774
  • Giuseppe De Giacomo, Marco Favorito: Compositional Approach to Translate LTLf/LDLf into Deterministic Finite Automata. ICAPS 2021: 122-130

 

Part 7: Learning environment behaviors and performing tasks (2 lectures + 1 class)

In this last part of the course, we make connections with Decision Processes and Reinforcement Learning. The point is that in some cases the agent has a simulator of the environment rather than a formal specification, so it needs to learn the strategies that achieve its task in the environment. Sometimes even the task itself is only implicitly specified through rewards. The key issue is that the kinds of properties we are often interested in (the ones we specify in LTL or LTLf) are non-Markovian, and hence we need to introduce non-Markovian characteristics into decision processes and reinforcement learning.
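
A recurring device in the literature below is the reward machine: an automaton that reads the trace of labels produced by the environment and issues rewards on its transitions, so that learning over the product of environment state and machine state restores the Markov property. A minimal sketch with illustrative names (the machine encodes “reward once b is seen after a”):

```python
class RewardMachine:
    """A DFA whose transitions carry rewards; missing entries self-loop."""
    def __init__(self, delta, rewards, q0):
        self.delta, self.rewards, self.q = delta, rewards, q0

    def step(self, labels):
        key = (self.q, labels)
        reward = self.rewards.get(key, 0.0)
        self.q = self.delta.get(key, self.q)
        return reward

rm = RewardMachine(
    delta={("u0", frozenset({"a"})): "u1",
           ("u1", frozenset({"b"})): "u2"},
    rewards={("u1", frozenset({"b"})): 1.0},
    q0="u0")

# Feed the machine the labels observed along an episode.
episode_labels = [frozenset(), frozenset({"a"}), frozenset({"b"})]
print([rm.step(l) for l in episode_labels])   # [0.0, 0.0, 1.0]
```

An RL agent would then learn a policy over pairs of environment state and machine state, since the reward is Markovian on that product.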

References:

  • Ronen I. Brafman, Giuseppe De Giacomo, Fabio Patrizi: LTLf/LDLf Non-Markovian Rewards. AAAI 2018: 1771-1778 *
  • Rodrigo Toro Icarte, Toryn Q. Klassen, Richard Anthony Valenzano, Sheila A. McIlraith: Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning. ICML 2018: 2112-2121 *
  • Giuseppe De Giacomo, Luca Iocchi, Marco Favorito, Fabio Patrizi: Foundations for Restraining Bolts: Reinforcement Learning with LTLf/LDLf Restraining Specifications. ICAPS 2019: 128-136 *
  • Ronen I. Brafman, Giuseppe De Giacomo: Regular Decision Processes: A Model for Non-Markovian Domains. IJCAI 2019: 5516-5522 *
  • Bruno Lacerda, David Parker, Nick Hawes: Optimal Policy Generation for Partially Satisfiable Co-Safe LTL Specifications. IJCAI 2015: 1587-1593
  • Alberto Camacho, Rodrigo Toro Icarte, Toryn Q. Klassen, Richard Anthony Valenzano, Sheila A. McIlraith: LTL and Beyond: Formal Languages for Reward Function Specification in Reinforcement Learning. IJCAI 2019: 6065-6073
  • Giuseppe De Giacomo, Marco Favorito, Luca Iocchi, Fabio Patrizi, Alessandro Ronca: Temporal Logic Monitoring Rewards via Transducers. KR 2020: 860-870
  • Alessandro Ronca, Giuseppe De Giacomo: Efficient PAC Reinforcement Learning in Regular Decision Processes. IJCAI 2021: 2026-2032
  • Kishor Jothimurugan, Suguman Bansal, Osbert Bastani, Rajeev Alur: Compositional Reinforcement Learning from Logical Specifications. NeurIPS 2021.
  • Alberto Camacho, Oscar Chen, Scott Sanner, Sheila A. McIlraith: Non-Markovian Rewards Expressed in LTL: Guiding Search Via Reward Shaping. SOCS 2017: 159-160
  • Rodrigo Toro Icarte, Ethan Waldie, Toryn Q. Klassen, Richard Anthony Valenzano, Margarita P. Castro, Sheila A. McIlraith: Learning Reward Machines for Partially Observable Reinforcement Learning. NeurIPS 2019: 15497-15508
  • Maor Gaon, Ronen I. Brafman: Reinforcement Learning with Non-Markovian Rewards. AAAI 2020: 3980-3987
  • Eden Abadi, Ronen I. Brafman: Learning and Solving Regular Decision Processes. IJCAI 2020: 1948-1954
  • Mohammadhosein Hasanbeig, Natasha Yogananda Jeppu, Alessandro Abate, Tom Melham, Daniel Kroening: DeepSynth: Automata Synthesis for Automatic Task Segmentation in Deep Reinforcement Learning. AAAI 2021: 7647-7656

 

[1] Main references are marked with “*”.

Syllabus

This course brings together competences from two distinct research areas, namely Reasoning about Actions and Planning in AI, and Verification and Synthesis in Formal Methods.  The aim is to demonstrate how the cross-fertilization of these two areas charts a novel path for building AI agents that can program themselves to handle rich objectives in complex, partially known, unpredictable environments.

Reading list

  • Course lecture notes
  • Scientific articles as mentioned above

