Geometric and Topological ML for Next-generation Drugs and Food
The aim of this project is to develop a novel mathematical framework for geometric and graph machine learning, and apply these new methods to some of the most challenging problems in the domains of drug and food design. This approach will overcome the limitations of existing machine learning (ML) methods and enable a quantitative and qualitative leap that will lead to new capabilities.
Over the past decade, AI and ML methods have had a revolutionary impact in several fields, adding billions in business value, creating new markets, and transforming entire industrial segments. At the same time, in other fields such as medicine and drug design, the hopes for a similarly rapid impact of AI have not yet materialised. One of the key challenges for AI in the next decade is delivering on these unfulfilled promises and addressing notoriously hard problems in the biomedical and physical sciences, where an AI-driven breakthrough can have a dramatic economic and societal impact. Achieving this goal requires developing a new generation of AI methods that meaningfully exploit domain-specific knowledge, have performance guarantees, are interpretable, and address the needs and concerns of domain experts and the broader society.
The past few years have seen the emergence of "Geometric ML" approaches leveraging broad mathematical principles of symmetry and invariance. Geometric ML makes it possible to derive the majority of modern ML architectures from first principles and also provides a general blueprint for incorporating domain-specific inductive biases in a mathematically principled way, paving the way for future ML systems. Instances of Geometric ML such as Graph Neural Networks (GNNs) and equivariant neural networks have brought a series of breakthroughs in applications ranging from particle physics and fake news detection to pure mathematical proofs and molecule design. The triumph of Geometric ML in biological sciences is perhaps best exemplified by the ground-breaking DeepMind AlphaFold 2 model for protein structure prediction based on geometric equivariant attention. In these and other problems, Geometric ML makes it possible to reason at the right level of abstraction, leading to models that are computationally tractable and at the same time physically correct.
The vast majority of today's GNNs rely on the message passing paradigm, where a graph representation is formed by an exchange of information between graph nodes connected by edges. This "node-and-edge"-centric mindset constitutes a major limitation of current Geometric and Graph ML schemes. From a theoretical viewpoint, the expressive power of message passing is equivalent to that of the iterative Weisfeiler-Lehman graph isomorphism test. As a result, message passing GNNs have limited expressive power and poorly understood generalisation properties, which puts them at a disadvantage in chemical and biological applications. Furthermore, the use of the input graph as a computational device for message passing often leads to bottleneck and over-squashing phenomena and is poorly compatible with existing hardware.
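The expressivity limit described above can be illustrated with a minimal sketch of Weisfeiler-Lehman colour refinement in plain Python. It is an illustration only, not part of the project's methodology: the function name `wl_colours`, the adjacency-dictionary graph format, and the three toy graphs are all assumptions made for this example. Each round mirrors one message passing step, in which a node combines its own colour with the multiset of its neighbours' colours.

```python
from collections import Counter


def wl_colours(adjacency, rounds=3):
    """Run colour refinement and return the histogram of final node colours.

    `adjacency` maps each node to a list of its neighbours (hypothetical
    format chosen for this sketch). Colours are kept as nested tuples so
    that they can be compared directly across different graphs.
    """
    colours = {node: 0 for node in adjacency}  # every node starts identical
    for _ in range(rounds):
        # New colour = (own colour, sorted multiset of neighbour colours),
        # the analogue of one aggregate-and-update message passing step.
        colours = {
            node: (colours[node],
                   tuple(sorted(colours[nb] for nb in adjacency[node])))
            for node in adjacency
        }
    return Counter(colours.values())


# Toy graphs (assumed for illustration): one six-membered ring,
# two disjoint three-membered rings, and a six-node chain.
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

same_rings = wl_colours(hexagon) == wl_colours(two_triangles)
same_chain = wl_colours(hexagon) == wl_colours(chain)
print(same_rings, same_chain)
```

In this sketch the six-membered ring and the pair of three-membered rings receive identical colour histograms, because every node in both graphs has exactly two neighbours, whereas the chain is told apart by its two end nodes. A model whose power is bounded by this test cannot separate the two ring structures either, even though they are different graphs.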
This project aims to use tools from differential geometry, algebraic topology, and differential equations to derive a new methodology for deep learning on graphs. The goal is to replace the computational fabric of GNNs with richer and more suitable structures that will allow us to overcome the current limitations of GNNs. These techniques, so far insufficiently explored in the field of ML, will lead to a new generation of geometric and graph machine learning models that are more interpretable, have guarantees of expressive power and performance, require less data and compute, and better exploit existing hardware.
Deep learning models based on these mathematical foundations will be developed into efficient and scalable software implementations and, in collaboration with industrial partners, applied to some of today's most important and challenging problems from the domains of drug and food design. In the long term, the new methods are expected to contribute to accelerating drug development, mapping the "dark matter" of food-based bioactive molecules to help cure cancer, and creating non-meat alternative foods to reduce the impact of traditional food industries on the global climate.