Attention-based representation learning on graphs
Abstract
As data that does not conform to simple, regular structures such as images or text becomes more readily available, the field of representation learning has continued to evolve through approaches that seek to describe, understand, and even unify deep learning strategies for data such as sets, grids, and graphs. A remarkably successful application of this field of geometric deep learning is learning on graphs: abstractions that represent relationships between items of a set and naturally describe real-world phenomena such as social, biological, or transportation networks. Recently, breakthroughs in graph learning have translated into impactful applications such as protein structure prediction, the inverse problem of designing an amino acid sequence that folds to a target structure, and even Earth-scale weather forecasting. Despite these achievements, there is uncertainty regarding the balance of algorithmic complexity, computational resource utilisation, and task performance, with few graph methods performing consistently well across multiple datasets, benchmarks, and settings.
This rapid growth of the field brings new challenges while exposing shortcomings of existing methodologies. In this dissertation, I investigate several pertinent aspects of deep learning, including graph neural networks, graph transformers, and transfer learning in the graph domain. Conceptually, the main limitation being addressed is the fixed and handcrafted nature of graph learning operators. Instead, I propose attention as a universal mechanism capable of augmenting and even superseding current architectural choices. My first contribution targets graph neural networks and consists of replacing classical readout functions with neural network-based adaptive readouts, in particular attention-based pooling. Second, I study transfer learning in the context of high-throughput screening funnels specific to early-stage drug discovery. In this setting, I demonstrate empirically that classical readouts are unable to model molecular data at the multi-million scale, and show that adaptive readouts unlock the transfer learning potential of graph neural networks. Finally, motivated by these conclusions and by recent advances in efficient and exact attention, I propose an end-to-end attention-based framework for learning on graphs: edge set attention, which operates directly on sets of edges, is simpler than message passing and graph transformers, and achieves state-of-the-art results. The findings and advances proposed in this thesis have been empirically validated across hundreds of experiments, consistently outperforming conventional approaches and supporting the hypotheses put forward here.
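To make the central idea of an attention-based readout concrete, the sketch below shows one possible form: node embeddings are scored by a small gating network and combined into a single graph-level vector via a softmax-weighted sum. This is a minimal illustration assuming PyTorch; the class and layer names here are hypothetical and do not reproduce the specific architectures studied in the thesis.

```python
import torch
import torch.nn as nn

class AttentionReadout(nn.Module):
    """Pool node embeddings into one graph embedding with learned attention."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, 1)    # scores each node's importance
        self.proj = nn.Linear(dim, dim)  # transforms node features before pooling

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (num_nodes, dim) node embeddings for a single graph
        weights = torch.softmax(self.gate(h), dim=0)  # (num_nodes, 1), sums to 1
        return (weights * self.proj(h)).sum(dim=0)    # weighted sum -> (dim,)

# Usage: pool five 64-dimensional node embeddings into one graph vector.
h = torch.randn(5, 64)
readout = AttentionReadout(64)
graph_embedding = readout(h)  # shape: (64,)
```

Unlike fixed sum, mean, or max readouts, the pooling weights here are learned jointly with the rest of the network, which is what allows the readout to adapt to the task and dataset at hand.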
