Loss surface of XOR artificial neural networks.

Mehta, Dhagash 
Zhao, Xiaojun 
Bernal, Edgar A 
Wales, David J 

Training an artificial neural network involves an optimization process over the landscape defined by the cost (loss) as a function of the network parameters. We explore these landscapes using optimization tools developed for potential energy landscapes in molecular science. Both the number of local minima and transition states (saddle points of index one) and the ratio of transition states to minima grow rapidly with the number of nodes in the network. There is also a strong dependence on the regularization parameter, with the landscape becoming more convex (fewer minima) as the regularization term increases. We demonstrate that, in our formulation, stationary points for networks with N_{h} hidden nodes, including the minimal network required to fit the XOR data, are also stationary points for networks with N_{h}+1 hidden nodes when all the weights involving the additional node are zero. Hence, smaller networks trained on XOR data are embedded in the landscapes of larger networks. Our results clarify certain aspects of the classification and sensitivity (to perturbations in the input data) of minima and saddle points for this system, and may provide insight into dropout and network compression.
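
The embedding statement in the abstract can be illustrated numerically. The sketch below is not the authors' formulation, which the abstract does not spell out. It assumes a hypothetical architecture with tanh hidden units, a single linear output, a squared-error loss, and an L2 penalty with an arbitrary strength `LAMBDA`. Under those assumptions, adding a hidden node whose weights are all zero leaves the gradient of the shared parameters unchanged and gives zero gradient for the new parameters, so any stationary point of the smaller network remains stationary in the larger one.

```python
import math
import random

# XOR training data: ((x1, x2), target)
XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
       ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
LAMBDA = 0.01  # hypothetical L2 regularization strength (assumption)


def loss(theta):
    """Regularized squared-error loss for an assumed tanh network.

    theta holds [w1, w2, b, v] for each hidden node, followed by one
    output bias. This layout is an illustrative assumption.
    """
    n_hidden = (len(theta) - 1) // 4
    c = theta[-1]
    total = 0.0
    for (x1, x2), y in XOR:
        out = c
        for j in range(n_hidden):
            w1, w2, b, v = theta[4 * j:4 * j + 4]
            out += v * math.tanh(w1 * x1 + w2 * x2 + b)
        total += 0.5 * (out - y) ** 2
    return total + LAMBDA * sum(t * t for t in theta)


def grad(theta, h=1e-6):
    """Central-difference estimate of the gradient of the loss."""
    g = []
    for i in range(len(theta)):
        up = theta[:i] + [theta[i] + h] + theta[i + 1:]
        down = theta[:i] + [theta[i] - h] + theta[i + 1:]
        g.append((loss(up) - loss(down)) / (2 * h))
    return g


def embed(theta):
    """Insert one extra hidden node with all of its weights set to zero."""
    return theta[:-1] + [0.0, 0.0, 0.0, 0.0] + theta[-1:]


random.seed(0)
small = [random.uniform(-1, 1) for _ in range(4 * 2 + 1)]  # N_h = 2
large = embed(small)                                       # N_h + 1 = 3

g_small = grad(small)
g_large = grad(large)

extra = g_large[8:12]                # gradient for the added node
shared = g_large[:8] + g_large[12:]  # gradient for the original parameters

max_extra = max(abs(x) for x in extra)
max_diff = max(abs(a - b) for a, b in zip(shared, g_small))
print("largest gradient component on the added node:", max_extra)
print("largest change in the shared gradient:", max_diff)
```

The check works at an arbitrary point rather than at a converged minimum, which is enough to show the mechanism: because tanh(0) = 0 and the outgoing weight is zero, the added node contributes nothing to the output or to the gradient. Whether the embedded point is a minimum or a saddle in the larger landscape is a separate question that this sketch does not address.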

0801 Artificial Intelligence and Image Processing
Journal Title: Phys Rev E
Publisher: American Physical Society (APS)
Sponsorship: Engineering and Physical Sciences Research Council (EP/N035003/1)