Towards the Efficient, Scientific and Accessible Development of Small Language Models


Abstract

Language models continue to grow in size, yet our understanding of their inner workings, and our ability to train them efficiently, particularly at smaller scales, remain limited. Small (sub-1-billion-parameter) models offer practical advantages, including reduced financial and environmental costs and greater accessibility, motivating the need for more effective training methodologies. This thesis addresses the challenge of developing small language models through two complementary lenses: cognitive inspiration and analytical investigation. First, drawing parallels with efficient human language acquisition, I explore cognitively inspired techniques for training small models. I investigate curriculum learning strategies informed by human language acquisition in data-constrained settings within a framework called CLIMB, and I introduce Syntactic Smoothing, a cognitively motivated method that enhances the representation of infrequent words by leveraging syntactic structure. Second, I adopt an analytical perspective to study the training dynamics and bottlenecks of small models. By analysing the layer-wise behaviour of the Pythia model suite, I identify convergence challenges and saturation phenomena in small models. This analysis exposes a broader shortcoming in current language model development: the disconnect between training and analysis tools, which hinders a scientific, iterative approach to model improvement. To address this, I introduce Pico, an open-source, lightweight, modular development framework for small models that integrates training with fine-grained analysis of model learning dynamics. Comprising pico-train and pico-analyze, Pico enables a principled, experiment-driven methodology for developing small language models. Ultimately, this thesis contributes novel techniques and tools aimed at making the training of small language models more efficient, more scientific, and more accessible to a wider range of users.

Date

2025-09-18

Advisors

Buttery, Paula

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Rights and licensing

Except where otherwise noted, this item's license is described as All rights reserved
Sponsorship

Gates Cambridge