On Principled Modeling of Inductive Bias in Machine Learning
Repository URI
Repository DOI
Change log
Authors
Abstract
The inductive bias of a learning algorithm is the set of assumptions it uses to predict outputs for unseen inputs, and it governs the algorithm's generalization power. This thesis focuses on principled approaches to modeling the inductive bias of learning algorithms. We start with a unifying view of inductive bias modeling. By decomposing the regularized empirical risk minimization objective, this thesis introduces two complementary perspectives: value-guided modeling through regularization, and data-centric modeling through training data manipulation. In value-guided modeling, we define a quantity called hyperspherical uniformity, which characterizes the diversity of neurons in neural networks. Based on this quantity, we develop a general principle, maximum hyperspherical uniformity, for regularizing neural networks. Beyond using maximum hyperspherical uniformity as a regularizer, we propose a principled training algorithm for neural networks that provably maximizes hyperspherical uniformity and improves generalization. Finally, we adapt this training algorithm to the efficient adaptation of foundation models (e.g., text-to-image diffusion models and large language models). To further improve parameter efficiency, we also develop a new parameterization based on butterfly factorization. In data-centric modeling, we introduce the iterative machine teaching framework for studying the effect of training data manipulation on inductive bias. Specifically, we consider two teaching scenarios: (1) label synthesis teaching, where, given data points, the teacher model adaptively synthesizes suitable labels for them; and (2) data hallucination teaching, where the teacher model directly generates data points. In both settings, we prove that the proposed teaching algorithms converge faster than a random teacher (i.e., stochastic gradient descent). Empirically, our algorithms show that feeding training samples from easy to hard benefits both convergence and generalization.
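To make the value-guided idea concrete: maximizing hyperspherical uniformity can be understood as spreading the (normalized) neuron weight vectors as evenly as possible over the unit hypersphere, which is commonly formalized by minimizing a pairwise Riesz s-energy. The following minimal sketch assumes that formulation; the function name and the energy form are illustrative, not the thesis's exact regularizer.

```python
import numpy as np

def hyperspherical_energy(W, s=1.0):
    """Pairwise Riesz s-energy of neuron weight vectors (rows of W)
    projected onto the unit hypersphere. Lower energy corresponds to
    higher hyperspherical uniformity (more diverse neurons)."""
    # Project each neuron's weight vector onto the unit sphere.
    V = W / np.linalg.norm(W, axis=1, keepdims=True)
    n = V.shape[0]
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # Close-by directions contribute large inverse-distance terms.
            energy += np.linalg.norm(V[i] - V[j]) ** (-s)
    return energy

# Orthogonal neurons are well spread; two nearly parallel neurons are not.
W_uniform = np.eye(3)
W_clustered = np.array([[1.0, 0.0, 0.0], [0.99, 0.1, 0.0], [0.0, 0.0, 1.0]])
```

Used as a regularizer, such an energy term would be added to the training loss so that gradient descent pushes neuron directions apart; here `hyperspherical_energy(W_clustered)` exceeds `hyperspherical_energy(W_uniform)`, reflecting the redundancy of the two nearly parallel neurons.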
Finally, we discuss open problems and future directions, addressing specific challenges in both value-guided and data-centric modeling. This thesis aims to provide new insights into modeling inductive bias in machine learning.
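The data-centric perspective summarized above can likewise be sketched in miniature. In iterative machine teaching, a teacher that knows the target model greedily selects, at each step, the training example whose gradient update moves the learner closest to that target. The toy below assumes a linear learner with squared loss and a fixed candidate pool; all names and settings (pool size, step size, loop length) are illustrative, not the thesis's algorithm as stated.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
w_star = rng.normal(size=d)        # target model, known to the teacher
w = np.zeros(d)                    # learner's initial parameters
eta = 0.1                          # learner's step size
pool = rng.normal(size=(200, d))   # candidate teaching examples
labels = pool @ w_star             # labels produced by the target model

def sgd_step(w, x, y, eta):
    # One gradient step on the squared loss 0.5 * (w @ x - y) ** 2.
    return w - eta * (w @ x - y) * x

for t in range(50):
    # Teacher greedily picks the example whose induced update lands
    # the learner nearest to w_star (an "omniscient" teacher).
    candidates = [sgd_step(w, x, y, eta) for x, y in zip(pool, labels)]
    dists = [np.linalg.norm(c - w_star) for c in candidates]
    w = candidates[int(np.argmin(dists))]

final_dist = np.linalg.norm(w - w_star)  # shrinks toward zero
```

A random teacher corresponds to sampling from `pool` uniformly instead of taking the argmin; the greedy choice contracts the distance to `w_star` at every step, which is the intuition behind the faster-than-SGD convergence guarantees mentioned in the abstract.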
Description
Date
Advisors
Schölkopf, Bernhard
