Sample and Computation Efficient Online Adaptation through Offline Reinforcement Learning

Supervisors

Suitable for

MSc in Advanced Computer Science

Abstract

AI agents that tackle real-world problems are increasingly driven by large neural networks, such as Large Language Models (LLMs) like GPT, and incur substantial computational and sample costs during both training and deployment. In particular, Reinforcement Learning (RL) agents are often trained on vast offline datasets under a relaxed computation budget, and then deployed or fine-tuned in an online stage that demands rapid computation. This project will investigate methods that exploit this disparity between offline and online computation budgets to streamline the training and deployment of RL agents.

The student undertaking this project will first study the relevant literature, starting from a set of recommended papers. They will then implement offline RL methods that support downstream online learning, and in a third phase experiment with variants of online RL algorithms. The resulting algorithm will be evaluated empirically and compared against baselines to validate its efficiency under large models and differing computation budgets.
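As a rough illustration of the offline-to-online pattern described above (not the project's prescribed method), the following minimal sketch pretrains a tabular Q-function offline on a large fixed dataset of transitions, where many passes over the data are affordable, and then fine-tunes it online under a tight interaction and computation budget. The chain MDP, tabular Q-learning update, and all hyperparameters here are hypothetical toy choices made purely for the example.

```python
# Hypothetical toy setup: offline pretraining from a fixed transition dataset,
# then a short online fine-tuning phase with few environment interactions.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 10, 2

def step(s, a):
    """Toy chain MDP: action 1 moves right (reward at the last state), action 0 resets."""
    s_next = min(s + 1, n_states - 1) if a == 1 else 0
    r = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, r

# --- Offline stage: large dataset, relaxed computation budget ---
dataset, s = [], 0
for _ in range(5000):
    a = int(rng.integers(n_actions))
    s_next, r = step(s, a)
    dataset.append((s, a, r, s_next))
    s = s_next

Q = np.zeros((n_states, n_actions))
gamma, lr = 0.95, 0.1
for _ in range(50):                      # many sweeps are affordable offline
    for s, a, r, s_next in dataset:
        target = r + gamma * Q[s_next].max()
        Q[s, a] += lr * (target - Q[s, a])

# --- Online stage: few interactions, cheap per-step updates ---
s, eps = 0, 0.1
for _ in range(200):                     # tight sample and computation budget
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
    s_next, r = step(s, a)
    Q[s, a] += lr * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print("Greedy policy after online fine-tuning:", Q.argmax(axis=1))
```

In the project itself, the tabular Q-function would be replaced by a large function approximator, making the contrast between the generous offline budget and the constrained online budget far more pronounced.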