
Learning Transferable Representations





Rojas-Carulla, Mateo 


A first contribution of this thesis is to propose causality as a language for problems of distribution shift. First, we consider domain generalisation, where no data from the test distribution are observed during training. What assumptions can be made about the relation between the train and test distributions for transfer to succeed? We argue that assuming the data in both tasks originate from the same causal graph leads to a natural solution: use only causal features for prediction, since the mechanism mapping causes to effects is invariant to the shifts in probability distribution induced by the causal structure. We provide optimality results when the test task is adversarial, and introduce a method for exploiting all remaining features when data from the test task are observed. We argue that learning such invariant mechanisms mapping features to outputs yields machine learning modules that are robust under transfer.
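The invariance argument can be illustrated with a toy sketch (a hypothetical example, not the method developed in the thesis): in two environments, a causal feature drives the target through a fixed mechanism, while the relationship between the target and a non-causal feature shifts, so only the causal feature supports transfer.

```python
# Toy illustration (hypothetical, not the thesis's actual procedure):
# X1 causes Y through an invariant mechanism; X2 is an effect of Y
# whose strength differs across environments.
import numpy as np

rng = np.random.default_rng(0)

def sample_env(n, effect_to_x2):
    x1 = rng.normal(size=n)                      # cause of Y
    y = 2.0 * x1 + rng.normal(size=n)            # invariant mechanism: Y = 2*X1 + noise
    x2 = effect_to_x2 * y + rng.normal(size=n)   # effect of Y; strength varies per environment
    return x1, x2, y

def fit_coef(x, y):
    # least-squares slope of y regressed on x
    return float(np.dot(x, y) / np.dot(x, x))

# Two training environments with different Y -> X2 strengths
for env, a in [("env A", 0.5), ("env B", 2.0)]:
    x1, x2, y = sample_env(10_000, a)
    print(env, "coef on X1:", round(fit_coef(x1, y), 2),
          "coef on X2:", round(fit_coef(x2, y), 2))
```

The coefficient on X1 stays near 2.0 in both environments, while the coefficient on X2 changes with the environment; a predictor built on X2 would therefore break under distribution shift.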

Second, we consider a classification problem in which only a few examples are available for each label. How should an initial large dataset be leveraged to improve performance on this task? We argue that such a dataset should be used to learn powerful features for batch classification with a neural network. We present a framework that transfers between classes by building a probabilistic model on the weights of the network. Our results suggest that practitioners should use the original dataset to build features whose power can be exploited during few-shot learning.
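The idea of reusing features while placing a probabilistic model on the final weights can be sketched as follows (an illustrative toy, not the thesis's model): a frozen feature map stands in for a network pretrained on a large dataset, and the few-shot classifier's weights get a Gaussian prior, whose MAP estimate reduces to ridge regression on one-hot labels.

```python
# Toy few-shot sketch (illustrative only, not the thesis's model).
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(20, 64))                   # frozen "pretrained" feature map

def features(x):
    return np.maximum(x @ W, 0.0)               # ReLU features

# Three novel classes, 5 labelled examples each (the few-shot set)
means = rng.normal(scale=2.0, size=(3, 20))
def sample(k, n):
    return means[k] + rng.normal(size=(n, 20))

Xtr = np.vstack([sample(k, 5) for k in range(3)])
ytr = np.repeat(np.arange(3), 5)
Phi = features(Xtr)

# Gaussian prior on the classifier weights -> MAP weights via ridge regression
lam = 1.0
T = np.eye(3)[ytr]                              # one-hot targets
Wmap = np.linalg.solve(Phi.T @ Phi + lam * np.eye(64), Phi.T @ T)

Xte = np.vstack([sample(k, 100) for k in range(3)])
yte = np.repeat(np.arange(3), 100)
acc = ((features(Xte) @ Wmap).argmax(1) == yte).mean()
print("few-shot accuracy:", acc)
```

With only five examples per class, the fixed features plus the regularised (MAP) linear head already separate the novel classes well, which is the point of transferring the feature map rather than retraining it.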

Finally, we extend causal discovery to problems such as distinguishing a painting from its counterfeit. Given two such static entities, a proxy random variable introduces the randomness needed to construct two features of the static entities that preserve their causal footprint, which can then be measured by a standard causal discovery procedure. Experiments on vision and language data provide evidence that the causal relation between the static entities can often be identified.
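A minimal sketch of the proxy-variable idea (hypothetical, not the thesis's exact procedure): two static entities are represented as fixed arrays, the second derived from the first; a random proxy index injects randomness, and a simple additive-noise-style asymmetry score stands in for a full causal discovery procedure.

```python
# Hypothetical proxy-variable sketch (not the thesis's exact method).
import numpy as np

rng = np.random.default_rng(1)
original = rng.normal(size=2000)                          # static entity A
derived = np.tanh(original) + 0.2 * rng.normal(size=2000) # entity B, noisy transform of A

j = rng.integers(0, 2000, size=5000)                      # proxy random variable
xa, xb = original[j], derived[j]                          # paired features of the two entities

def dependence_score(x, y, deg=5):
    """Regress y on x polynomially; measure dependence of squared residuals on x."""
    res = y - np.polyval(np.polyfit(x, y, deg), x)
    return abs(np.corrcoef(res**2, x**2)[0, 1])

# Residuals of B given A look independent of A (consistent with A -> B),
# while residuals of A given B remain dependent on B.
print("A -> B score:", dependence_score(xa, xb))
print("B -> A score:", dependence_score(xb, xa))
```

The asymmetry in the two scores is what a causal discovery procedure exploits: the direction with the smaller residual dependence is the plausible causal one, here correctly pointing from the original to the derived entity.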





Schölkopf, Bernhard
Turner, Richard


Machine Learning, Transfer Learning, Dataset Shift, Learning to Learn, Few-shot Learning, Causality, Causal Learning


Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge