The resurgence of structure in deep neural networks

Veličković, Petar

Machine learning with deep neural networks ("deep learning") allows for learning complex features directly from raw input data, completely eliminating hand-crafted, "hard-coded" feature extraction from the learning pipeline. This has led to state-of-the-art performance being achieved across several previously disconnected problem domains, including computer vision, natural language processing, reinforcement learning and generative modelling. These success stories nearly universally go hand-in-hand with the availability of immense quantities of labelled training examples ("big data") exhibiting simple grid-like structure (e.g. text or images), exploitable through convolutional or recurrent layers. This reliance on big data is due to the extremely large number of degrees of freedom in neural networks, which leaves their generalisation ability vulnerable to effects such as overfitting.

However, there remain many domains where extensive data gathering is not always appropriate, affordable, or even feasible. Furthermore, data is generally organised in more complicated kinds of structure, which most existing approaches would simply discard. Examples of such tasks are abundant in the biomedical space, e.g. small numbers of subjects available for any given clinical study, or relationships between proteins specified via interaction networks. I hypothesise that, if deep learning is to reach its full potential in such environments, we need to reconsider "hard-coded" approaches, integrating assumptions about inherent structure in the input data directly into our architectures and learning algorithms through structural inductive biases. In this dissertation, I directly validate this hypothesis by developing three structure-infused neural network architectures (operating on sparse multimodal and graph-structured data), and a structure-informed learning algorithm for graph neural networks, demonstrating significant outperformance of conventional baseline models and algorithms.

Liò, Pietro
structural inductive biases, machine learning, deep learning, deep neural networks, graph neural networks, graph convolutional networks, graph attention networks, deep graph infomax, mutual information, unsupervised learning, infomax, graph convolutions, attention, self-attention, cross-modal, antibody, antigen, à trous, paratope, audiovisual, multimodal, cross-connections, classification, human weight fluctuation, weight objective prediction, convolutional neural networks, recurrent neural networks, fitness data, x-cnn, x-lstm, gat, dgi, sparse datasets, multi-omics, bioinformatics, cortical meshes, graph attention, parcellation, neuroimaging, cortex parcellation, network embeddings, graph embeddings, model selection, structure learning, evolutionary neural networks, optimisation algorithms, neural networks, unsupervised node embedding
Doctor of Philosophy (PhD)
Awarding Institution
University of Cambridge
The work depicted in this dissertation was in part supported by funding from the European Union's Horizon 2020 research and innovation programme PROPAG-AGEING under grant agreement No 634821.