Time-Bandits:

Supervisors

Thomas Orton

Suitable for

Mathematics and Computer Science, Part C

Abstract

Consider the following problem: at each time step, you receive a dataset D_t you would like to fit (using e.g. a neural network or statistical model). You need to choose a hyper-parameter p_t for the model, and then you can run an optimization procedure (e.g. stochastic gradient descent) for time T_{p_t} with this hyper-parameter. Afterwards, you can measure the validation loss l_{t} of the fitted model on the dataset. If you are happy with the validation loss, you can move to the next time step. Otherwise, you can try to choose a new hyper-parameter (spending more compute time) and re-fit the model until you have an acceptable validation loss. > > The field of Multi Armed Bandits ( MAB ) is a rich research area which broadly answers the question of how to make decisions while minimizing regret. This has many applications in Machine Learning, Reinforcement Learning, Portfolio Optimization and Economics. However, there is an obstacle in applying solutions to the MAB for the above problem. This is because MAB is typically formalized by considering a loss function over actions, and considering how to pick a single action at each time step to minimize regret. In contrast, in our problem you may pick as many actions as you like (provided you pay for them with time), and you receive the minimum loss over the actions which you took at each time step. The idea of modelling time trade-offs is also a fundamental issue in practical machine learning which has only more recently been receiving attention in a theoretical context. > > The following project is ideal for a mathematically inclined 4th year student with the following objectives: > > 1. Develop some practically motivated formalisms for modelling time resources in the MAB setting. > 2. Develop a good understanding of existing MAB literature and how it relates to the time-resource MAB variants considered. > 3. Provide algorithmic solutions and theoretical lower-bounds to the time-resource MAB variants considered.

Time-Bandits:

Supervisors

Suitable for

Abstract

Student Space