
Relaxing Decisions under Uncertainty

Supervisors

Suitable for

MSc in Advanced Computer Science
Computer Science, Part C

Abstract

Prerequisites: CAFV, PMC 

 

Background 

Motivation: Markov Decision Processes (MDPs) are a foundational framework for sequential decision-making in stochastic environments and play a central role in artificial intelligence and, particularly, reinforcement learning. However, a key limitation of MDPs is their reliance on precisely specified transition probabilities. In realistic settings, accurately estimating these probabilities is often difficult or infeasible. To overcome this issue, robust MDPs (RMDPs) extend standard MDPs by allowing uncertainty sets over transition probabilities rather than fixed values. The standard objective in an RMDP is to find an optimal robust policy, i.e., one that maximizes the expected return under the worst-case (adversarial) transition probabilities within the uncertainty set.
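
For concreteness, writing $\mathcal{P}$ for the uncertainty set of transition kernels and $\gamma \in [0,1)$ for a discount factor (notation assumed here, since the proposal does not fix any), the standard robust objective can be written as

$$\pi^{\star} \in \arg\max_{\pi} \; \min_{P \in \mathcal{P}} \; \mathbb{E}^{\pi}_{P}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right],$$

i.e., the agent maximizes the expected return while the adversary picks the least favourable transition kernel from the uncertainty set.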

While this adversarial assumption ensures robustness, it can also be overly conservative in practical applications where the environment does not actively oppose the agent. For example, consider an autonomous drone navigating through uncertain wind conditions: clearly, the wind might behave stochastically but not adversarially (the wind does not depend on the drone’s actions). Hence, optimizing only for the worst-case scenario can lead to unnecessarily cautious behaviour. This motivates our central research question:

Can we compute the “best” policy, while taking into account that the environment might not act fully adversarially?

Related Work: Several research directions have considered this question by exploring relaxations of decision-making under uncertainty. Notably, the notion of best-effort strategies introduced by Faella (2009) offers a game-theoretic relaxation of strictly winning strategies. These ideas have since been extended to reactive synthesis, where, in the absence of a winning strategy, best-effort strategies can be computed efficiently. Building on this, recent work has adapted these concepts to stochastic games (Aminof, De Giacomo, Rubin, & Zuleger, 2023) and, most recently, to robust MDPs as a tie-breaking criterion among optimal robust policies (Abate, Badings, De Giacomo, & Fabiano, 2026). This project aims to extend this line of research by developing a more flexible and theoretically grounded notion of best-effort robustness for RMDPs.

Additionally, related perspectives from fairness in decision processes (Wen, 2021), and policy permissiveness (Zhu & De Giacomo, 2022) might offer valuable insights. Part of the project will involve a thorough literature review to identify conceptual and methodological connections among these lines of research and our proposed approach.

Focus

This project explores how to design policies for decision-making with robust MDPs that are robust yet not overly conservative. The main research question is: can we compute the “best” policy while accounting for the fact that the environment may not act fully adversarially? Expected contributions include one or more of the following:

Define an ε-relaxed “best-effort optimal robustness” concept for RMDPs.

Leverage best-effort to develop more efficient yet “robust enough” alternatives to standard methods, e.g., robust value iteration (a baseline version of which is sketched after this list).

Explore related concepts and propose new ways to balance robustness with expected environmental behavior.
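
As a point of reference for the second item, below is a minimal sketch of standard robust value iteration for an (s, a)-rectangular RMDP whose uncertainty set is a finite collection of candidate transition kernels. The function name, array shapes, and the finite-set adversary are simplifying assumptions made for illustration; they are not part of the proposal.

```python
import numpy as np

def robust_value_iteration(P_candidates, R, gamma=0.95, tol=1e-8, max_iter=10_000):
    """Robust value iteration for an (s, a)-rectangular RMDP.

    P_candidates: array of shape (K, S, A, S) -- K candidate transition
                  kernels forming a finite uncertainty set for each (s, a).
    R:            array of shape (S, A) -- immediate rewards.
    Returns the robust value function V (shape (S,)) and a greedy robust policy.
    """
    K, S, A, _ = P_candidates.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        # Expected next-state value under every candidate kernel: shape (K, S, A)
        next_vals = P_candidates @ V
        # Adversary picks the worst candidate independently for each (s, a)
        worst_case = next_vals.min(axis=0)          # shape (S, A)
        Q = R + gamma * worst_case                  # robust Q-values
        V_new = Q.max(axis=1)                       # agent maximizes over actions
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    policy = Q.argmax(axis=1)
    return V, policy
```

A best-effort or ε-relaxed variant would modify the inner minimization (the `worst_case` step), which is precisely where the relaxations proposed in this project would intervene.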

Method

The project will begin with a literature review of the relevant works discussed above. Building on these foundations, we will develop and formalize a new notion of optimality for RMDPs, e.g., “ε-relaxed best-effort optimal robustness”, examining its theoretical properties and its relationship to existing approaches. The final phase will focus on implementing and evaluating the proposed notions through numerical experiments on common robust MDP benchmarks. The expected outcomes include a proof-of-concept implementation, a clear theoretical framework, and, if the results are promising, a publication-quality paper summarizing the findings.

Bibliography

Abate, A., Badings, T., De Giacomo, G., & Fabiano, F. (2026). Best-Effort Policies for Robust Markov Decision Processes. 40th AAAI Conference on Artificial Intelligence, (p. TBD). Retrieved from https://arxiv.org/abs/2508.07790

Aminof, B., De Giacomo, G., Rubin, S., & Zuleger, F. (2023). Stochastic Best-Effort Strategies for Borel Goals. LICS (pp. 1--13). IEEE.

Faella, M. (2009). Admissible Strategies in Infinite Games over Graphs. MFCS -- Lecture Notes in Computer Science, 5734, 307--318. Retrieved from https://link.springer.com/chapter/10.1007/978-3-642-03816-7_27

Wen, M., et al. (2021). Algorithms for Fairness in Sequential Decision Making. 24th International Conference on Artificial Intelligence and Statistics (pp. 1144--1152). PMLR.

Zhu, S., & De Giacomo, G. (2022). Synthesis of Maximally Permissive Strategies for LTLf Specifications. IJCAI, (pp. 2783--2789). Retrieved from https://doi.org/10.24963/ijcai.2022/386