The resurgence of structure in deep neural networks
Repository URI
Repository DOI
Change log
Authors
Abstract
Machine learning with deep neural networks ("deep learning") allows for learning complex features directly from raw input data, completely eliminating hand-crafted, "hard-coded" feature extraction from the learning pipeline. This has lead to state-of-the-art performance being achieved across several---previously disconnected---problem domains, including computer vision, natural language processing, reinforcement learning and generative modelling. These success stories nearly universally go hand-in-hand with availability of immense quantities of labelled training examples ("big data") exhibiting simple grid-like structure (e.g. text or images), exploitable through convolutional or recurrent layers. This is due to the extremely large number of degrees-of-freedom in neural networks, leaving their generalisation ability vulnerable to effects such as overfitting.
However, there remain many domains where extensive data gathering is not always appropriate, affordable, or even feasible. Furthermore, data is generally organised in more complicated kinds of structure---which most existing approaches would simply discard. Examples of such tasks are abundant in the biomedical space; with e.g. small numbers of subjects available for any given clinical study, or relationships between proteins specified via interaction networks. I hypothesise that, if deep learning is to reach its full potential in such environments, we need to reconsider "hard-coded" approaches---integrating assumptions about inherent structure in the input data directly into our architectures and learning algorithms, through structural inductive biases. In this dissertation, I directly validate this hypothesis by developing three structure-infused neural network architectures (operating on sparse multimodal and graph-structured data), and a structure-informed learning algorithm for graph neural networks, demonstrating significant outperformance of conventional baseline models and algorithms.