Transfer via Inter-Task Mappings in Policy Search Reinforcement Learning
Matthew E. Taylor, Shimon Whiteson and Peter Stone
The ambitious goal of transfer learning is to accelerate learning on a target task after training on a different, but related, source task. One recent reinforcement learning transfer approach, transfer via inter-task mapping for value function methods (TVITM-VF), uses a transfer functional to map a value function learned in the source task to an initial value function in the target task. While TVITM-VF has successfully enabled transfer across tasks with different state and action spaces, and with several kinds of function approximators, it applies only to learning methods that use value functions, such as temporal difference methods. This paper extends TVITM-VF to policy search methods (TVITM-PS) and, in particular, shows how to construct a transfer functional that translates a population of neural network policies trained via policy search from a source task to a target task. Empirical results in robot soccer Keepaway and Server Job Scheduling show that TVITM-PS can markedly reduce learning time when full inter-task mappings are available. The results also demonstrate that TVITM-PS still succeeds when given only incomplete inter-task mappings. Furthermore, we present a novel method for learning such mappings when they are not available, and give results showing that the learned mappings perform comparably to hand-coded mappings.
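To make the idea of a transfer functional over policies concrete, the following is a minimal sketch and not the construction used in the paper. It assumes each policy is a single-layer network stored as a weight table indexed by action and state variable, and that the inter-task mappings are given as lists from target indices to source indices. The names `transfer_policy`, `transfer_population`, `chi_x`, and `chi_a` are hypothetical. An entry of `None` models an incomplete mapping, in which case the corresponding target weights fall back to a default value.

```python
from typing import List, Optional

# Hypothetical representation: weights[action][state_variable].
Weights = List[List[float]]


def transfer_policy(
    source_w: Weights,
    chi_x: List[Optional[int]],  # target state variable -> source state variable
    chi_a: List[Optional[int]],  # target action -> source action
    default: float = 0.0,        # assumed fallback for unmapped entries
) -> Weights:
    """Build initial target-task weights by copying mapped source weights."""
    target_w: Weights = []
    for a_s in chi_a:
        row = []
        for x_s in chi_x:
            if a_s is None or x_s is None:
                # Incomplete mapping: no source weight corresponds to this link.
                row.append(default)
            else:
                row.append(source_w[a_s][x_s])
        target_w.append(row)
    return target_w


def transfer_population(
    population: List[Weights],
    chi_x: List[Optional[int]],
    chi_a: List[Optional[int]],
) -> List[Weights]:
    """Apply the same mapping to every policy in a source-task population."""
    return [transfer_policy(w, chi_x, chi_a) for w in population]


# Source task: 2 actions, 2 state variables.
source = [[0.5, -1.0], [2.0, 0.25]]
# Target task: 3 actions, 3 state variables; the third action is unmapped.
chi_x = [0, 1, 1]
chi_a = [0, 1, None]

target = transfer_policy(source, chi_x, chi_a)
print(target)
```

In this sketch the two mapped target actions inherit the weights of their source counterparts, the extra target state variable reuses the weights of the source variable it maps to, and the unmapped action starts from the default weights and must be learned in the target task.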