Exact natural gradient in deep linear networks and application to the nonlinear case

cam.issuedOnline: 2018-12-07
dc.contributor.author: Bernacchia, Alberto
dc.contributor.author: Lengyel, Máté
dc.contributor.author: Hennequin, Guillaume
dc.contributor.orcid: Lengyel, Máté [0000-0001-7266-0049]
dc.contributor.orcid: Hennequin, Guillaume [0000-0002-7296-6870]
dc.date.accessioned: 2019-01-17T09:13:40Z
dc.date.available: 2019-01-17T09:13:40Z
dc.date.issued: 2018
dc.description.abstract: Stochastic gradient descent (SGD) remains the method of choice for deep learning, despite the limitations arising for ill-behaved objective functions. In cases where it could be estimated, the natural gradient has proven very effective at mitigating the catastrophic effects of pathological curvature in the objective function, but little is known theoretically about its convergence properties, and it has yet to find a practical implementation that would scale to very deep and large networks. Here, we derive an exact expression for the natural gradient in deep linear networks, which exhibit pathological curvature similar to the nonlinear case. We provide for the first time an analytical solution for its convergence rate, showing that the loss decreases exponentially to the global minimum in parameter space. Our expression for the natural gradient is surprisingly simple, computationally tractable, and explains why some approximations proposed previously work well in practice. This opens new avenues for approximating the natural gradient in the nonlinear case, and we show in preliminary experiments that our online natural gradient descent outperforms SGD on MNIST autoencoding while sharing its computational simplicity.
dc.description.sponsorship: This work was supported by Wellcome Trust Seed Award 202111/Z/16/Z (G.H.) and Wellcome Trust Investigator Award 095621/Z/11/Z (A.B., M.L.).
dc.identifier.doi: 10.17863/CAM.35433
dc.identifier.issn: 1049-5258
dc.identifier.uri: https://www.repository.cam.ac.uk/handle/1810/288118
dc.language.iso: eng
dc.publisher: NIPS
dc.publisher.url: https://papers.nips.cc/paper/7834-exact-natural-gradient-in-deep-linear-networks-and-its-application-to-the-nonlinear-case
dc.title: Exact natural gradient in deep linear networks and application to the nonlinear case
dc.type: Conference Object
dcterms.dateAccepted: 2018-09-05
prism.publicationName: 32nd Conference on Neural Information Processing Systems (NIPS 2018), Montréal, Canada
pubs.conference-finish-date: 2018-12-08
pubs.conference-name: Neural Information Processing Systems
pubs.conference-start-date: 2018-12-02
pubs.funder-project-id: Wellcome Trust (202111/Z/16/Z)
pubs.funder-project-id: Wellcome Trust (095621/Z/11/Z)
rioxxterms.licenseref.startdate: 2018-09-05
rioxxterms.licenseref.uri: http://www.rioxx.net/licenses/all-rights-reserved
rioxxterms.type: Conference Paper/Proceeding/Abstract
rioxxterms.version: AM
rioxxterms.versionofrecord: 10.17863/CAM.35433
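For context on the abstract above: the natural-gradient update preconditions the loss gradient by the inverse Fisher information, W <- W - eta * F^-1 grad L. The following is a minimal NumPy sketch, not the paper's exact expression for deep networks: it uses a one-layer linear least-squares model, for which the Fisher reduces to the input covariance, to illustrate how natural gradient escapes the pathological curvature that stalls plain gradient descent. All names and constants are illustrative.

# Minimal sketch (not the paper's method): natural gradient vs plain gradient
# descent on a one-layer linear least-squares model. For this model the Fisher
# information reduces to the input covariance Sigma_x, so the natural gradient
# is grad @ inv(Sigma_x). All names and constants here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, n_samples = 20, 5, 1000

# Ill-conditioned inputs mimic the pathological curvature discussed above.
scales = np.logspace(0, -3, n_in)
X = rng.standard_normal((n_samples, n_in)) * scales
W_true = rng.standard_normal((n_out, n_in))
Y = X @ W_true.T                                   # noiseless targets

Sigma_x = X.T @ X / n_samples                      # empirical input covariance
Sigma_inv = np.linalg.inv(Sigma_x + 1e-8 * np.eye(n_in))  # small ridge for stability

def loss(W):
    return 0.5 * np.mean(np.sum((X @ W.T - Y) ** 2, axis=1))

def grad(W):
    return (X @ W.T - Y).T @ X / n_samples         # dL/dW, shape (n_out, n_in)

for name, natural in [("gradient descent", False), ("natural gradient", True)]:
    W = np.zeros((n_out, n_in))
    for _ in range(200):
        g = grad(W)
        W -= 0.5 * (g @ Sigma_inv if natural else g)
    print(f"{name:>16s}: final loss {loss(W):.3e}")

With the same step size, plain gradient descent barely moves along the low-curvature directions, while the preconditioned dynamics decouple and the loss contracts by roughly a factor of four per step, echoing the exponential convergence to the global minimum that the abstract describes for the deep linear case.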

Files

Original bundle
Name: bernacchia-nips-2018.pdf
Size: 359.81 KB
Format: Adobe Portable Document Format
Description: Accepted version
Licence: http://www.rioxx.net/licenses/all-rights-reserved

License bundle
Name: DepositLicenceAgreementv2.1.pdf
Size: 150.9 KB
Format: Adobe Portable Document Format