Research project web page

Optimisation for Game Theory and Machine Learning (funded by EPSRC)

10.11.23. The project will run for 3 years, starting on 1st Feb 2024. It will support two postdoctoral researchers: Dr. Edwin Lock (who was the named PDRA), plus another that I’m in the process of recruiting, to join for a 2-year term. The project includes funds to support travel and work with external collaborators.

Background

Optimisation algorithms in practice often use local search: for example, when the objective function is continuous and smooth, gradient descent is the standard choice (as in training neural networks). In game-theoretic settings, local search arises naturally when multiple agents attempt to improve their payoffs by best-responding to their peers' behaviour. There is no general guarantee that this process converges; it depends on the structure of the game. Local search often works surprisingly well in practice (even when its worst-case performance is known to be poor), and it is of interest to understand why. The project aims to develop the theory of what is going on, and hopefully to lead to improved algorithms. As part of this agenda, we will study various specific optimisation problems in more detail.
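As a minimal illustration (not project code), gradient descent as a local search procedure can be sketched as follows; the quadratic objective and the step size here are arbitrary toy choices:

```python
import numpy as np

def gradient_descent(grad, x0, step=0.1, iters=1000):
    """Plain gradient descent: repeatedly step against the gradient."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x = x - step * grad(x)
    return x

# Toy objective f(x, y) = (x - 1)^2 + 2*(y + 3)^2,
# whose gradient is (2(x - 1), 4(y + 3)); minimum at (1, -3).
grad = lambda v: np.array([2 * (v[0] - 1), 4 * (v[1] + 3)])
x_star = gradient_descent(grad, [0.0, 0.0])
```

On smooth convex objectives like this one the iterates contract towards the unique minimiser; the interesting questions discussed in this project arise when the objective is non-convex or game-theoretic.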

The project builds, to some extent, on a recent paper I co-authored: The Complexity of Gradient Descent: CLS=PPAD∩PLS, link. Both machine learning and game theory have given rise to diverse problems of local optimisation, and it is of interest to classify them according to their computational hardness. The project aims to study a general issue: the “hard in theory, easy in practice” phenomenon exhibited by these problems (an aspect of the “beyond worst-case analysis” agenda). We are also interested in designing novel algorithms with performance guarantees. In settings of continuous optimisation, where gradient descent is applicable, there are new and interesting variants of the technique, for example ‘optimistic’ gradient descent, and the question of how to adjust the step size, or learning rate, as the algorithm runs.
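To give a flavour of the ‘optimistic’ variant: one common formulation (the “past-gradient” update, used here purely as an illustrative sketch) extrapolates using the previous gradient. On the bilinear saddle-point problem min_x max_y xy, plain simultaneous gradient descent-ascent cycles around the saddle, while the optimistic correction pulls the iterates in:

```python
def ogda(x0, y0, eta=0.1, iters=2000):
    """Optimistic gradient descent-ascent on f(x, y) = x * y.
    Each update uses the current gradient twice minus the previous one:
        x <- x - 2*eta*g_t + eta*g_{t-1}   (descent in x)
        y <- y + 2*eta*h_t - eta*h_{t-1}   (ascent in y)
    Plain simultaneous GDA cycles on this problem; the optimistic
    correction drives the iterates to the saddle point (0, 0)."""
    x, y = x0, y0
    gx_prev, gy_prev = y, x  # df/dx = y, df/dy = x
    for _ in range(iters):
        gx, gy = y, x
        x = x - 2 * eta * gx + eta * gx_prev
        y = y + 2 * eta * gy - eta * gy_prev
        gx_prev, gy_prev = gx, gy
    return x, y
```

The step size eta and iteration count are arbitrary toy choices; for small enough eta the iterates on this bilinear example contract linearly towards the origin.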

Example problems

By way of example, another recent paper, Solving Strong-Substitutes Product-Mix Auctions (Baldwin et al. 2023), introduces the problem of minimising the difference of two functions f and g, each belonging to a class known as M-convex functions. In general, f−g need not be convex; the paper obtains a bound on the cost of computing a local optimum, but the analysis is not tight, and experimentally the algorithm works well. Our proposed agenda: seek an improved analysis; investigate generalisations to larger classes of convex functions; and study more general problems of minimising non-convex functions composed of multiple “simple” functions.
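The paper's algorithm is specialised to strong-substitutes structure; as a generic illustration only, greedy local search for minimising a difference of two simple discrete functions might look like the sketch below. The toy functions f and g are separable stand-ins chosen for readability, not the M-convex functions of the paper:

```python
import itertools

def local_search(h, x0, max_iter=10_000):
    """Greedy coordinate local search on the integer lattice:
    move to any unit-step neighbour that strictly lowers h,
    and stop at a point none of whose neighbours is better."""
    x = list(x0)
    for _ in range(max_iter):
        best, best_val = None, h(x)
        for i, d in itertools.product(range(len(x)), (-1, 1)):
            y = list(x)
            y[i] += d
            if h(y) < best_val:
                best, best_val = y, h(y)
        if best is None:      # no improving neighbour: local optimum
            return tuple(x)
        x = best
    return tuple(x)

# Toy instance: h = f - g with f, g separable; h is not convex,
# yet local search still reaches a local optimum.
f = lambda x: sum(v * v for v in x)
g = lambda x: sum(abs(v - 2) for v in x)
h = lambda x: f(x) - g(x)
opt = local_search(h, (5, -5))
```

The open question raised above is how fast such a procedure converges on structured instances, where the worst-case bound appears pessimistic.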

In a “Tullock contest” (Tullock 2001), agents compete for a prize, and each agent's probability of winning is its own effort divided by the total effort exerted by all contestants. Equilibrium search can sometimes be formulated as a local optimisation problem (Ewerhart 2017), but more general versions may have multiple equilibria (Chowdhury and Sheremeta 2011), posing a new challenge. There are many open questions about the efficacy of local optimisation algorithms (for example, continuous fictitious play) and about their convergence rates.
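As a toy two-player sketch of equilibrium search by iterated improvement: with prize value V, a player facing opponents' total effort s maximises x·V/(x+s) − x, giving the closed-form best response x = √(Vs) − s. Alternating best responses converge quickly here to the symmetric equilibrium effort V/4 (the harder multi-equilibrium cases mentioned above are beyond this sketch):

```python
import math

def best_response(s, V=1.0):
    """Best response to opponents' total effort s in a Tullock contest:
    maximise x*V/(x+s) - x, which gives x = sqrt(V*s) - s when positive."""
    return max(math.sqrt(V * s) - s, 0.0)

def br_dynamics(x1=0.1, x2=0.9, V=1.0, iters=100):
    """Alternating best-response dynamics for two players."""
    for _ in range(iters):
        x1 = best_response(x2, V)
        x2 = best_response(x1, V)
    return x1, x2
```

In this two-player example the best-response map has zero derivative at the fixed point, so convergence is very fast; quantifying convergence rates for richer contest models is one of the open questions above.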

A recent paper, DOTE: Rethinking (Predictive) WAN Traffic Engineering (link), studies a scenario of continuous local optimisation in which there are no spurious (non-global) local optima. Gradient descent works well in practice, but there remain questions of analysing it in more detail, and of optimising aspects such as the step size, for the class of problems of interest.

General algorithmic techniques

A second work package seeks improved general-purpose algorithms for classes of optimisation problems arising in theory and applications, together with techniques to identify when such improvements are not possible, building on computational hardness results for gradient descent (Fearnley et al. 2023) and related work in multi-objective optimisation (Daskalakis et al. 2021).

Q: Can we identify a broad class of discrete optimisation problems for which there is a dichotomy theorem, saying that problems in the class are either PLS-complete or solvable in polynomial time? One natural way to formalise this question is with reference to Schaefer’s dichotomy theorem: we seek a “locally maximal” assignment to the variables of a collection of weighted relations, as opposed to a satisfying assignment. (“Locally maximal” means that if we flip any single variable in the assignment, the total weight of satisfied relations cannot increase.) Motivation: PLS-hardness is the strongest kind of intractability for local optimisation problems, and it is important to understand it in detail.
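The flip-based local search just described can be sketched generically. The clause encoding (signed integers as literals) and first-improvement tie-breaking below are illustrative choices, not part of the formal question:

```python
def locally_maximal(clauses, n, assignment=None):
    """Weighted flip local search: clauses is a list of (weight, clause)
    pairs, each clause a list of literals (+i / -i for variable i,
    1-indexed). Flip single variables while some flip strictly increases
    the total weight of satisfied clauses; the result is locally maximal."""
    a = assignment or [False] * (n + 1)  # index 0 unused

    def sat_weight():
        return sum(w for w, clause in clauses
                   if any((lit > 0) == a[abs(lit)] for lit in clause))

    improved = True
    while improved:
        improved = False
        base = sat_weight()
        for v in range(1, n + 1):
            a[v] = not a[v]
            if sat_weight() > base:   # strict improvement: keep the flip
                improved = True
                break
            a[v] = not a[v]           # otherwise undo it
    return a[1:], sat_weight()

# Toy instance: three weighted clauses over three variables.
clauses = [(3, [1, 2]), (2, [-1, 3]), (1, [-2, -3])]
assignment, weight = locally_maximal(clauses, 3)
```

Each accepted flip strictly increases the satisfied weight, so the procedure terminates; the PLS question above asks when such a locally maximal assignment can be found in polynomial time.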

Understanding the efficacy of local optimisation

Main approach: continuous local optima have two distinct existence principles (corresponding to the complexity classes PLS and PPAD), each with associated algorithms. These algorithms (greedy local search, and Sperner’s-lemma-style path-following) both avoid brute-force enumeration in a fundamental way. We will exploit this property to derive fast convergence, subject to benign assumptions on how problem instances are generated.

Other approaches: (a) Investigate problems where local optimisation may be hard in practice: gradient descent is noted to converge slowly for complicated PDE problems. Look for further experimental negative results, which may indicate problem features that lead to intractability. (b) Extend average-case analyses that have been applied to related computational search problems. (c) Identify classes of continuous local optimisation problems where local search can be shown to converge rapidly, or establish probabilistic results showing convergence with high probability. (d) Investigate the price of local optimality: in some cases it may be easy to find local optima, but only ones of poor quality.

We envisage additional work on discrete local optimisation.