Understanding Biology with Machine Learning: Compression, Intelligibility, and Dependency

Accepted version
Peer-reviewed

Abstract

Machine learning (ML) is increasingly used to interrogate biological systems whose complexity resists law-like, deductive explanation. As a result, embeddings, clusters, and attributions are often overinterpreted, dependencies are left implicit, and claims about explainability are insufficiently bounded. In this work, we present a framework for contextualizing how machine learning contributes to scientific understanding in biology via compression, qualitative intelligibility, and dependency modelling. Compression is achieved when inductive biases encode biological structure, reducing the effective hypothesis space and yielding representations aligned with known biology. Qualitative intelligibility is supported when high-dimensional measurements are mapped to human-graspable objects, such as embeddings, clusters, and trajectories, that enable accurate qualitative reasoning without exact calculation. Dependency modelling is realized when learned models make explicit the pattern of relations among system components and thereby guide prediction and intervention. We examine how these principles manifest in successful ML applications and discuss considerations that emerge from this framework. Overall, when viewed through these lenses, ML can transform predictive success into intervention-guiding knowledge in the life sciences.

Journal Title

Artificial Intelligence in the Life Sciences

Journal ISSN

2667-3185

Publisher

Elsevier

Rights and licensing

Except where otherwise noted, this item's license is described as Attribution 4.0 International

Sponsorship

Horizon Europe UKRI Underwrite ERC (EP/X024733/1)
Royal Society (URF\R1\201461)