Loss surface of XOR artificial neural networks.

Accepted version
Peer-reviewed


Type

Article

Authors

Mehta, Dhagash 
Zhao, Xiaojun 
Bernal, Edgar A 
Wales, David J 

Abstract

Training an artificial neural network involves an optimization process over the landscape defined by the cost (loss) as a function of the network parameters. We explore these landscapes using optimization tools developed for potential energy landscapes in molecular science. The number of local minima and transition states (saddle points of index one), as well as the ratio of transition states to minima, grow rapidly with the number of nodes in the network. There is also a strong dependence on the regularization parameter, with the landscape becoming more convex (fewer minima) as the regularization term increases. We demonstrate that in our formulation, stationary points for networks with N_{h} hidden nodes, including the minimal network required to fit the XOR data, are also stationary points for networks with N_{h}+1 hidden nodes when all the weights involving the additional node are zero. Hence, smaller networks trained on XOR data are embedded in the landscapes of larger networks. Our results clarify certain aspects of the classification and sensitivity (to perturbations in the input data) of minima and saddle points for this system, and may provide insight into dropout and network compression.
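The embedding property described in the abstract can be checked numerically. The sketch below is a hypothetical formulation, not necessarily the paper's exact one: a 2-input MLP with tanh hidden units, a linear output, and an L2 regularization term on the weights, fit to XOR data in a &plusmn;1 encoding. Padding an N_h-node parameter vector with a zero-weight extra node leaves the network function unchanged, and every gradient component belonging to the new node vanishes identically, so any stationary point of the smaller network is also a stationary point of the larger one.

```python
import numpy as np

# Hypothetical sketch (not the paper's exact formulation): a 2-N_h-1 MLP
# with tanh hidden units, a linear output, and L2 regularization on the
# weights only.  Inputs and targets use a +/-1 encoding of XOR.
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], float)
T = np.array([-1.0, 1.0, 1.0, -1.0])       # XOR targets, +/-1 encoded
LAM = 0.01                                 # regularization parameter

def unpack(p, nh):
    """Split a flat parameter vector into (V, b, w, c) for nh hidden nodes."""
    V = p[:2 * nh].reshape(nh, 2)          # input -> hidden weights
    b = p[2 * nh:3 * nh]                   # hidden biases
    w = p[3 * nh:4 * nh]                   # hidden -> output weights
    c = p[4 * nh]                          # output bias
    return V, b, w, c

def loss(p, nh):
    V, b, w, c = unpack(p, nh)
    H = np.tanh(X @ V.T + b)               # hidden activations
    r = H @ w + c - T                      # residuals
    return 0.5 * np.sum(r**2) + LAM * (np.sum(V**2) + np.sum(w**2))

def num_grad(p, nh, eps=1e-6):
    """Central-difference gradient of the loss."""
    g = np.zeros_like(p)
    for i in range(len(p)):
        e = np.zeros_like(p)
        e[i] = eps
        g[i] = (loss(p + e, nh) - loss(p - e, nh)) / (2 * eps)
    return g

def embed(p, nh):
    """Embed an nh-node parameter vector into an (nh+1)-node network,
    with every parameter of the additional node set to zero."""
    V, b, w, c = unpack(p, nh)
    V2 = np.vstack([V, np.zeros(2)])
    b2 = np.append(b, 0.0)
    w2 = np.append(w, 0.0)
    return np.concatenate([V2.ravel(), b2, w2, [c]])

rng = np.random.default_rng(0)
p_small = rng.normal(size=4 * 2 + 1)       # arbitrary point, N_h = 2
p_big = embed(p_small, 2)                  # same function, N_h = 3

g_small = num_grad(p_small, 2)
g_big = num_grad(p_big, 3)

# Gradient components belonging to the extra node all vanish, so a
# stationary point of the small network embeds as one of the big network.
gV, gb, gw, gc = unpack(g_big, 3)
print(np.max(np.abs(gV[2])), abs(gb[2]), abs(gw[2]))   # all ~0
```

Because the extra node's output weight is zero, perturbing its input weights or bias never reaches the output, and because its activation is tanh(0) = 0, perturbing its output weight never changes the residuals either; the two facts together give the embedding.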

Keywords

0801 Artificial Intelligence and Image Processing

Journal Title

Phys Rev E

Journal ISSN

2470-0045 (print)
2470-0053 (online)

Volume Title

97

Publisher

American Physical Society (APS)

Sponsorship

Engineering and Physical Sciences Research Council (EP/N035003/1)