
Learning Transferable Representations


Type

Thesis

Authors

Rojas-Carulla, Mateo 

Abstract

The first contribution of this thesis is to propose causality as a language for problems of distribution shift. We begin with domain generalisation, where no data from the test distribution are observed during training. What assumptions about the relation between the training and test distributions are needed for transfer to succeed? We argue that assuming the data in both tasks originate from the same causal graph leads to a natural solution: use only causal features for prediction, since the mechanism mapping causes to effects remains invariant to the shifts in probability distribution induced by the causal structure. We provide optimality results when the test task is adversarial, and introduce a method for exploiting the remaining features when data from the test task are observed. This motivates learning invariant mechanisms from features to outputs as a route to machine learning modules that are robust under transfer.

Second, we consider a classification problem in which only a few examples are available for each label. How should a large initial dataset be leveraged to improve performance on this task? We argue that such a dataset should be used to learn powerful features for batch classification with a neural network. We present a framework that transfers knowledge between classes by placing a probabilistic model on the weights of the network. Our results suggest that practitioners should use the original dataset to build features that can then be exploited during few-shot learning.

Finally, we extend causal discovery to problems such as distinguishing a painting from its counterfeit. Given two such static entities, a proxy random variable introduces the randomness necessary to construct two features that preserve the entities' causal footprint, which can then be measured by a standard causal discovery procedure. Experiments on vision and language data provide evidence that the causal relation between the static entities can often be identified.

Date

2018-09-26

Advisors

Schölkopf, Bernhard
Turner, Richard

Keywords

Machine Learning, Transfer Learning, Dataset Shift, Learning to Learn, Few-Shot Learning, Causality, Causal Learning

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge