Neural parameter inference for large-scale multi-agent systems
Repository URI
Repository DOI
Authors
Abstract
In this thesis, I present a computational framework for integrating deep learning into mechanistic models. A neural network is trained to learn unknown system components, such as constant or time-varying parameters, by differentiating through the full set of governing equations. I argue that incorporating machine learning in a targeted way, to augment rather than replace physics-informed models (e.g. partial differential equations), addresses several of the limitations typically associated with machine learning in scientific applications. First, the reliance on large quantities of training data is reduced, since the underlying system dynamics are explicitly modelled and can be leveraged for extrapolation. Second, the outputs of the neural network remain interpretable within the broader context of the mechanistic model. Third, the need for ad hoc parametrisation, whether through hand-picked covariates or pre-specified basis functions, is avoided: neural networks offer a flexible, data-driven alternative, leveraging their universal function approximation capabilities to model unknown system components expressively. This is particularly beneficial in the social and economic sciences, where human behaviour is often modelled using overly simplified parametric forms calibrated via regression techniques.
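The core idea of fitting unknown parameters by differentiating through the governing equations can be illustrated with a minimal sketch. Everything here is illustrative and not taken from the thesis: a toy exponential-decay ODE stands in for the governing equations, a single scalar parameter stands in for the neural network's output, and a central finite difference stands in for automatic differentiation through the solver.

```python
import numpy as np

def simulate(k, x0=1.0, dt=0.05, n_steps=100):
    """Forward-Euler solve of the toy governing equation dx/dt = -k * x."""
    xs = np.empty(n_steps + 1)
    xs[0] = x0
    for i in range(n_steps):
        xs[i + 1] = xs[i] + dt * (-k * xs[i])
    return xs

def loss(k, data):
    """Mean-squared error between the simulated trajectory and observations."""
    return np.mean((simulate(k) - data) ** 2)

# Synthetic "observations" generated from a known ground-truth parameter.
k_true = 0.5
data = simulate(k_true)

# Gradient descent on the parameter, differentiating *through* the solver.
# (A central finite difference replaces automatic differentiation to keep
# the sketch dependency-free; the principle is the same.)
k, lr, eps = 1.5, 2.0, 1e-5
for _ in range(500):
    grad = (loss(k + eps, data) - loss(k - eps, data)) / (2 * eps)
    k -= lr * grad
```

In the thesis's setting, the scalar `k` is replaced by the output of a neural network, so the same loss gradient trains the network's weights rather than the parameter directly.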
The thesis begins with a brief overview of modern deep learning methods and their emerging role in the applied sciences, followed by a presentation of the proposed computational framework. Its performance is first demonstrated on a synthetic model of infectious disease dynamics, where a neural network is trained to infer constant parameters from a single time series; the approach is then extended to learn time-dependent components using recurrent neural networks. The next two chapters of the thesis focus on real-world, multi-agent systems in the social sciences: the global trade of agricultural commodities and international human migration since 1990. Both exhibit complex spatio-temporal correlations that traditional models struggle to capture. In the trade study, a deep neural network is used to fit an optimal transport model to bilateral trade flows, significantly improving upon the accuracy and flexibility of classical gravity models. The optimal transport formulation captures spatial interactions, while the network learns to solve the inverse problem of estimating abstract cost structures from noisy flow data. In the migration case study, we train a recurrent neural network on a set of socio-economic, political, cultural, and geographic covariates to model international migration patterns. The recurrent architecture enables the learning of long-term temporal dependencies, resulting in a high-resolution dataset of annual bilateral migration flows and migrant stock estimates on a global scale.
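The forward half of the trade problem, mapping a cost matrix and supply/demand marginals to a flow matrix, can be sketched with entropically regularised optimal transport (Sinkhorn iterations). This is only the forward computation under assumed toy numbers; the thesis's contribution is the inverse problem, in which a neural network learns the cost matrix from observed flows, which is not shown here.

```python
import numpy as np

def sinkhorn(cost, supply, demand, eps=0.5, n_iter=500):
    """Entropically regularised optimal transport plan between two marginals."""
    K = np.exp(-cost / eps)                 # Gibbs kernel
    u = np.ones_like(supply)
    for _ in range(n_iter):
        u = supply / (K @ (demand / (K.T @ u)))
    v = demand / (K.T @ u)
    return u[:, None] * K * v[None, :]      # transport plan (flow matrix)

rng = np.random.default_rng(0)
cost = rng.uniform(size=(4, 3))             # abstract bilateral "trade costs"
supply = np.array([0.3, 0.3, 0.2, 0.2])     # exporter production shares
demand = np.array([0.5, 0.25, 0.25])        # importer consumption shares
plan = sinkhorn(cost, supply, demand)
```

The resulting plan respects both marginals while concentrating flow on low-cost routes, which is the spatial-interaction structure that gravity models only approximate.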
A key theme throughout this work is the quantification of uncertainty in neural network predictions, approached in a tractable and computationally efficient manner. We explore various strategies, including propagating input uncertainty through the network, ensemble training, and a novel method inspired by weighted importance sampling, where the training process itself is used to gather samples of the inferred parameters. These approaches are benchmarked against classical Markov chain Monte Carlo (MCMC) methods on both synthetic datasets and two real-world systems: the spread of COVID-19 in Berlin, and urban retail dynamics in Greater London. In both cases, our framework yields more accurate parameter distributions at significantly reduced computational cost.
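The training-as-sampling idea can be caricatured in a few lines: record the parameter values visited during (noisy) training, weight each by a pseudo-likelihood exp(-loss / T), and use the weighted samples as a cheap posterior approximation. This is a schematic of the flavour of the method, not the thesis's exact estimator; the quadratic toy loss, the gradient-noise scale, and the temperature T are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def loss(theta):
    """Stand-in for the model's training loss, minimised at theta = 2."""
    return (theta - 2.0) ** 2

# Noisy gradient descent; every visited parameter value is kept as a sample.
theta, lr, T = -1.0, 0.05, 0.5
samples, losses = [], []
for _ in range(2000):
    grad = 2.0 * (theta - 2.0) + rng.normal(scale=0.5)   # noisy gradient
    theta -= lr * grad
    samples.append(theta)
    losses.append(loss(theta))

samples, losses = np.array(samples), np.array(losses)

# Importance-style weights: early high-loss (burn-in) samples contribute
# almost nothing, so no explicit burn-in removal is needed.
weights = np.exp(-losses / T)
weights /= weights.sum()
post_mean = np.sum(weights * samples)
post_std = np.sqrt(np.sum(weights * (samples - post_mean) ** 2))
```

The appeal over MCMC is that the samples are a by-product of a single training run rather than the output of a separate, expensive sampling phase.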
In the final chapter, we extend the framework to the inference of network adjacency matrices from time series data. We analyse its scaling behaviour, demonstrating favourable performance compared to both MCMC and standard regression methods. This approach is then applied to a case study on the British electricity grid, where the framework is used to localise power line failures and quantify the uncertainty of these inferences, thereby enabling rigorous statistical hypothesis testing.
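The inverse problem of the final chapter can be sketched under strong simplifying assumptions: take linear network dynamics x_{t+1} = A x_t + noise and recover the coupling matrix A from consecutive snapshots by least squares. This is the classical regression baseline against which the neural framework is benchmarked, not the framework itself, and the dynamics, dimensions, and noise level are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 5, 2000

# Ground-truth sparse coupling matrix with small entries so that the
# linear dynamics remain stable (spectral radius well below 1).
A_true = (rng.random((n, n)) < 0.3) * rng.uniform(0.1, 0.4, size=(n, n))

# Simulate noisy linear network dynamics x_{t+1} = A x_t + noise.
X = np.empty((T + 1, n))
X[0] = rng.normal(size=n)
for t in range(T):
    X[t + 1] = A_true @ X[t] + rng.normal(scale=0.05, size=n)

# Least-squares recovery of A from consecutive snapshots:
# solve X[:-1] @ B = X[1:] for B = A^T in the least-squares sense.
A_hat = np.linalg.lstsq(X[:-1], X[1:], rcond=None)[0].T
```

Uncertainty over the recovered entries is what turns edge detection (here, a power line failure showing up as a changed entry of A) into a statistical hypothesis test rather than a point estimate.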
Taken together, this thesis offers a practical and theoretically grounded approach to blending deep learning with classical numerical modelling. The framework is simple to implement, computationally efficient, and broadly applicable—opening the door to a new generation of data-driven yet interpretable models across the computational sciences. Future research may explore extensions to weakly differentiable systems; applications in forecasting, where long-term predictive accuracy and robust uncertainty quantification are essential; and connections to manifold learning, where the latent, lower-dimensional structure of the parameter space is inferred.
Description
Date
Advisors
Schoenlieb, Carola-Bibiane

