Leveraging Machine Learning for Multi-Agent Path Finding

Supervisor

Sara Bernardini

Suitable for

MSc in Advanced Computer Science

Computer Science, Part C

Computer Science, Part B

Abstract

Prerequisites: Artificial Intelligence, programming

Background

Multi-Agent Path Finding (MAPF) is a core optimisation problem in multi-agent systems: given many agents sharing a space, compute time-coordinated, collision-free paths from their start locations to their assigned goals. Contemporary applications are many and include automated warehouses, UAV/drone flight scheduling and urban air-traffic management, autonomous-vehicle and robotaxi fleets, port and airport ground-vehicle coordination, factory and semiconductor-fab AGVs/AMHS, sidewalk delivery robots, hospital logistics robots, and crowd/character routing in games and simulation. All of these applications involve large agent populations and complex environments, making the MAPF problem very challenging to solve.

MAPF has been driven primarily by search-based approaches, with decades of work in heuristics and combinatorial optimisation. Although remarkable progress has been made in MAPF technologies, their deployment in real-world scenarios still encounters significant challenges. Computing optimal MAPF plans is NP-hard, so practical deployments typically use bounded-suboptimal or anytime methods that trade exact optimality for speed while trying to preserve high solution quality. In addition, most MAPF planners are centralised, i.e. one coordinator computes and dispatches all agents’ paths, which simplifies global reasoning but creates serious practical limits. First, scalability is poor: planning and re-planning latency and memory blow up with agent count and map size. Second, the architecture is brittle: a single point of failure and communication bottlenecks make performance degrade under packet loss or overload. Third, centralised schedules are hard to execute robustly in the real world: small timing errors, sensor noise, and model mismatch (e.g., grid abstractions that ignore robot kinematics and dynamics) quickly invalidate tightly coordinated plans. Finally, responsiveness is limited: disturbances such as delays, blockages, or new tasks often require expensive global re-planning, and heterogeneity in agent sizes and speeds further exacerbates these issues.

Focus

This project aims to explore how ML can be leveraged to overcome these challenges. Two approaches are viable. The first is to augment existing solvers through ML. This involves using ML techniques to enhance the optimisation process in traditional non-ML algorithms. The goal is to improve these solvers’ efficiency, scalability, and adaptability. For example, consider Prioritised Planning, a widely used MAPF algorithm that plans based on each agent's assigned priorities. Finding effective priority orders is typically hard and is currently done manually. ML techniques can enhance this prioritisation process.

The second approach to using ML in the context of MAPF is to develop effective decentralised MAPF methods that employ ML techniques such as imitation or reinforcement learning to achieve decentralised agent behaviours. These behaviours are influenced by the agents’ local observations and, if applicable, by their interactions with one another, to develop more responsive and self-governing decentralised MAPF approaches. For example, the application of graph neural networks (GNNs) to the development of adaptive multi-robot systems is an emerging field of research.

Method

Goals:

Explore different ways in which ML can be used to solve MAPF efficiently
Design an approach to leverage ML for MAPF (e.g. using GNNs)
Development of efficient hybrid algorithms to solve MAPF, leveraging both search and ML
Implementation for representative real-world problems