On the capacity and superposition of minima in neural network loss function landscapes

Abstract

Minima of the loss function landscape (LFL) of a neural network are locally optimal sets of weights that extract and process information from the input data to make outcome predictions. In underparameterised networks, the capacity of the weights may be insufficient to fit all the relevant information. We demonstrate that different local minima specialise in certain aspects of the learning problem, and process the input information differently. This effect can be exploited using a meta-network in which the predictive power from multiple minima of the LFL is combined to produce a better classifier. With this approach, we can increase the area under the receiver operating characteristic curve by around 20% for a complex learning problem. We propose a theoretical basis for combining minima and show how a meta-network can be trained to select the representative that is used for classification of a specific data item. Finally, we present an analysis of symmetry-equivalent solutions to machine learning problems, which provides a systematic means to improve the efficiency of this approach.

Keywords

ensemble learning, interpretability, loss function landscape, theoretical chemistry

Journal Title

Machine Learning: Science and Technology

Journal ISSN

2632-2153

Volume Title

3

Publisher

IOP Publishing

Publisher DOI

https://doi.org/10.1088/2632-2153/ac64e6

Rights

Attribution 4.0 International

Sponsorship

Agence Nationale de la Recherche (ANR-19-P3IA-0002)

Collections

Jisc Publications Router