Loss surface of XOR artificial neural networks.


Type
Article
Authors
Mehta, Dhagash
Zhao, Xiaojun
Bernal, Edgar A.
Wales, David J.
Abstract

Training an artificial neural network involves an optimization process over the landscape defined by the cost (loss) as a function of the network parameters. We explore these landscapes using optimization tools developed for potential energy landscapes in molecular science. The number of local minima and transition states (saddle points of index one), as well as the ratio of transition states to minima, grow rapidly with the number of nodes in the network. There is also a strong dependence on the regularization parameter, with the landscape becoming more convex (fewer minima) as the regularization term increases. We demonstrate that in our formulation, stationary points for networks with N_{h} hidden nodes, including the minimal network required to fit the XOR data, are also stationary points for networks with N_{h}+1 hidden nodes when all the weights involving the additional node are zero. Hence, smaller networks trained on XOR data are embedded in the landscapes of larger networks. Our results clarify certain aspects of the classification and sensitivity (to perturbations in the input data) of minima and saddle points for this system, and may provide insight into dropout and network compression.
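As a concrete illustration of the embedding property described in the abstract, the sketch below defines a regularized loss for a single-hidden-layer XOR network, embeds a minimum of the N_h-node network into the (N_h+1)-node network by zeroing every weight and bias attached to the extra node, and checks numerically that the gradient at the embedded point is still (approximately) zero. This is not the authors' code (they use potential-energy-landscape tools); the tanh activation, quadratic data term, L2 penalty, and parameter layout are assumptions made for the example.

```python
# Minimal sketch of the embedding property: a stationary point of the
# N_h-node XOR network remains stationary in the (N_h+1)-node network
# when all weights/biases touching the extra node are zero.
# Architecture, activation and loss are assumptions, not the paper's exact setup.
import numpy as np
from scipy.optimize import minimize

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # XOR inputs
Y = np.array([0., 1., 1., 0.])                            # XOR targets

def unpack(theta, n_h):
    """Split a flat parameter vector into layer weights and biases."""
    i = 0
    W1 = theta[i:i + 2 * n_h].reshape(n_h, 2); i += 2 * n_h
    b1 = theta[i:i + n_h];                     i += n_h
    w2 = theta[i:i + n_h];                     i += n_h
    b2 = theta[i]
    return W1, b1, w2, b2

def loss(theta, n_h, lam):
    """Quadratic loss on the XOR data plus an L2 regularization term (strength lam)."""
    W1, b1, w2, b2 = unpack(theta, n_h)
    hidden = np.tanh(X @ W1.T + b1)          # hidden-layer activations
    out = hidden @ w2 + b2                   # linear output node
    return 0.5 * np.sum((out - Y) ** 2) + 0.5 * lam * np.sum(theta ** 2)

def grad(theta, n_h, lam, eps=1e-6):
    """Central finite-difference gradient; adequate for a small sketch."""
    g = np.zeros_like(theta)
    for k in range(theta.size):
        e = np.zeros_like(theta); e[k] = eps
        g[k] = (loss(theta + e, n_h, lam) - loss(theta - e, n_h, lam)) / (2 * eps)
    return g

def embed(theta_small, n_h):
    """Embed an n_h-node parameter vector into an (n_h+1)-node network,
    setting every weight and bias attached to the new node to zero."""
    W1, b1, w2, b2 = unpack(theta_small, n_h)
    W1b = np.vstack([W1, np.zeros((1, 2))])
    b1b = np.append(b1, 0.0)
    w2b = np.append(w2, 0.0)
    return np.concatenate([W1b.ravel(), b1b, w2b, [b2]])

# Usage: minimize the 2-node network, embed into a 3-node network, check the gradient.
n_h, lam = 2, 1e-4
res = minimize(loss, np.random.randn(4 * n_h + 1), args=(n_h, lam))
theta_big = embed(res.x, n_h)
print(np.max(np.abs(grad(theta_big, n_h + 1, lam))))   # ~0: still stationary
```

The check works because the new hidden node outputs tanh(0) = 0 and its outgoing weight is zero, so neither the data term nor the quadratic penalty contributes a gradient component in the added directions, while the gradient with respect to the original parameters is unchanged.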

Keywords
0801 Artificial Intelligence and Image Processing
Journal Title
Phys Rev E
Journal ISSN
2470-0045 (print)
2470-0053 (electronic)
Volume
97
Publisher
American Physical Society (APS)
Sponsorship
Engineering and Physical Sciences Research Council (EP/N035003/1)