Transferability of Data Sets between Machine-Learned Interatomic Potential Algorithms

Accepted version
Peer-reviewed

Abstract

The emergence of Foundational Machine Learning Interatomic Potential (FMLIP) models trained on extensive data sets motivates attempts to transfer data between different ML architectures. Using a common battery electrolyte solvent as a test case, we examine the extent to which training data optimized for one machine-learning method may be reused by a different learning algorithm, aiming to accelerate FMLIP fine-tuning and to reduce the need for costly iterative training. We consider several types of training configurations and compare the benefits they bring to feedforward neural networks (the Deep Potential model) and message-passing networks (MACE). We propose a simple metric to assess model performance and demonstrate that MACE models perform well with even the simplest training sets, whereas simpler architectures require further iterative training to describe the target liquids correctly. We find that configurations designed by human intuition to correct systematic deficiencies of a model often transfer well between algorithms, but that reusing configurations that were generated automatically by one MLIP does not necessarily benefit a different algorithm. We also compare the performance of these bespoke models against two pretrained FMLIPs, demonstrating that system-specific training data are usually necessary for realistic models. Finally, we examine how training data sets affect a model's ability to generalize to unseen molecules, finding that model stability is conserved for small changes in molecule shape but not changes in functional chemistry. Our results provide insight into how training set properties affect the behavior of an MLIP and principles to enhance training sets for molecular liquid models with minimal computational effort. These approaches may be used in tandem with FMLIPs to dramatically accelerate the rate at which new chemical systems can be simulated.

Journal Title

Journal of Chemical Theory and Computation

Journal ISSN

1549-9618
1549-9626

Publisher

American Chemical Society (ACS)

Rights and licensing

Except where otherwise noted, this item's license is described as Attribution 4.0 International
Sponsorship
European Commission Horizon 2020 (H2020) ERC (835073)
Faraday Institution (via University Of St Andrews) (NEXGENNA)
European Commission Horizon 2020 (H2020) Research Infrastructures (RI) (957189)
This work was performed using computational resources provided by the Cambridge Service for Data Driven Discovery (CSD3). Furthermore, this work made use of the facilities of the N8 Centre of Excellence in Computationally Intensive Research (N8 CIR), provided and funded by the N8 research partnership and EPSRC (Grant No. EP/T022167/1). The Centre is coordinated by the Universities of Durham, Manchester and York. S.P.N., C.P.G. and G.C. were supported by the European Union’s Horizon 2020 Research and Innovation Program under Grant Agreement No. 957189 (BIG-MAP project). S.P.N. and C.P.G. were also supported by an ERC Advanced Investigator Grant for Prof. Clare P. Grey, "BATNMR", Grant No. 835073, and by the Faraday Institution’s NEXGENNA project (FIRG064). P.K. was supported by the Engineering and Physical Sciences Research Council (Grant No. EP/W524700/1).

Relationships

Is supplemented by: