Conditional expected accumulated rewards in Markov decision processes
- 11:00, 27th February 2017 (Week 7, Hilary Term 2017), Room 441, Wolfson Building, Parks Road
The talk addresses the problem of computing maximal conditional expected accumulated rewards until reaching a target state, briefly called maximal conditional expectations, in finite-state Markov decision processes (MDPs), where the condition is given as a reachability constraint. Conditional expectations of this type can stand, e.g., for the maximal expected termination time of probabilistic programs with non-determinism, under the condition that the program eventually terminates, or for the worst-case expected penalty to be paid, assuming that at least three deadlines are missed. The problem of checking whether the maximal conditional expectation is finite is in P. Assuming finiteness, the problem of checking whether the maximal conditional expectation exceeds a given rational threshold is PSPACE-hard for general MDPs and PSPACE-complete for acyclic MDPs. An iterative strategy-improvement algorithm can be used to compute the maximal conditional expectation and an optimal scheduler in exponential time.
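To illustrate the quantity being optimised, the following sketch computes a conditional expected accumulated reward for a fixed scheduler, i.e. on a Markov chain rather than a full MDP, and does not implement the strategy-improvement algorithm from the talk. The toy chain, the state names, and the value-iteration approach are all illustrative assumptions. It uses the standard decomposition E[reward | reach goal] = E[reward · 1{reach goal}] / P(reach goal), computing both the reachability probability y(s) and the partial expectation x(s) by fixed-point iteration.

```python
# Hypothetical 4-state Markov chain (an MDP under one fixed scheduler).
# transitions[s] = list of (probability, reward, successor); absorbing
# states ("goal", "fail") have no outgoing transitions.
transitions = {
    "s0": [(0.5, 1, "s1"), (0.5, 0, "fail")],
    "s1": [(0.8, 2, "goal"), (0.2, 1, "s0")],
    "goal": [],
    "fail": [],
}

def conditional_expectation(transitions, start, goal, sweeps=200):
    """E[accumulated reward until goal | goal is reached], via value
    iteration on two coupled fixed-point equations:
      y(s) = P(reach goal from s)
      x(s) = E[accumulated reward * 1{reach goal} from s]
    """
    y = {s: (1.0 if s == goal else 0.0) for s in transitions}
    x = {s: 0.0 for s in transitions}
    for _ in range(sweeps):
        for s, succ in transitions.items():
            if succ:  # skip absorbing states
                y[s] = sum(p * y[t] for p, r, t in succ)
                # Reward r is earned on the transition, and only counts
                # on paths that eventually reach the goal (weight y[t]).
                x[s] = sum(p * (r * y[t] + x[t]) for p, r, t in succ)
    return x[start] / y[start]
```

For this chain, y(s0) = 4/9 and x(s0) = 116/81, so the conditional expectation from s0 is 29/9 ≈ 3.222. In an MDP one would additionally maximise over schedulers, and, as the talk notes, the optimal scheduler for the conditional value may need to depend on the reward accumulated so far, which is what makes the problem harder than ordinary expected-reward optimisation.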