Accelerating variance-reduced stochastic gradient methods
dc.contributor.author | Driggs, Derek | |
dc.contributor.author | Ehrhardt, MJ | |
dc.contributor.author | Schönlieb, CB | |
dc.date.accessioned | 2022-02-23T17:01:15Z | |
dc.date.available | 2022-02-23T17:01:15Z | |
dc.date.issued | 2022-02 | |
dc.date.submitted | 2019-10-08 | |
dc.identifier.issn | 0025-5610 | |
dc.identifier.other | s10107-020-01566-2 | |
dc.identifier.other | 1566 | |
dc.identifier.uri | https://www.repository.cam.ac.uk/handle/1810/334374 | |
dc.description | Funder: Gates Cambridge Trust (GB) | |
dc.description.abstract | Variance reduction is a crucial tool for improving the slow convergence of stochastic gradient descent. Only a few variance-reduced methods, however, have yet been shown to directly benefit from Nesterov's acceleration techniques to match the convergence rates of accelerated gradient methods. Such approaches rely on "negative momentum", a technique for further variance reduction that is generally specific to the SVRG gradient estimator. In this work, we show for the first time that negative momentum is unnecessary for acceleration and develop a universal acceleration framework that allows all popular variance-reduced methods to achieve accelerated convergence rates. The constants appearing in these rates, including their dependence on the number of functions n, scale with the mean-squared-error and bias of the gradient estimator. In a series of numerical experiments, we demonstrate that versions of SAGA, SVRG, SARAH, and SARGE using our framework significantly outperform non-accelerated versions and compare favourably with algorithms using negative momentum. | |
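For readers unfamiliar with the variance-reduction idea the abstract refers to, the following is a minimal, illustrative sketch of a plain SVRG-style gradient estimator on a toy least-squares problem. It is not the paper's accelerated framework; the data matrix A, vector b, step size, and epoch count are all hypothetical choices made only for the example.

```python
# Minimal SVRG sketch (illustrative, not the paper's method):
# objective f(w) = (1/n) * sum_i 0.5 * (A[i] @ w - b[i])**2
import numpy as np

def grad_i(w, A, b, i):
    """Gradient of the i-th component f_i(w) = 0.5 * (A[i] @ w - b[i])**2."""
    return (A[i] @ w - b[i]) * A[i]

def svrg_estimator(w, snapshot, full_grad, A, b, rng):
    """Unbiased SVRG estimate: grad_i(w) - grad_i(snapshot) + full_grad."""
    i = rng.integers(len(b))
    return grad_i(w, A, b, i) - grad_i(snapshot, A, b, i) + full_grad

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))   # hypothetical data
b = rng.standard_normal(100)
w = np.zeros(5)
step = 0.01

for epoch in range(20):
    snapshot = w.copy()
    # Full gradient at the snapshot point, recomputed once per epoch.
    full_grad = A.T @ (A @ snapshot - b) / len(b)
    for _ in range(len(b)):
        w = w - step * svrg_estimator(w, snapshot, full_grad, A, b, rng)
```

The estimator stays unbiased while its variance shrinks as w approaches the snapshot, which is what lets variance-reduced methods use constant step sizes where plain SGD cannot.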
dc.language | en | |
dc.publisher | Springer Science and Business Media LLC | |
dc.subject | Full Length Paper | |
dc.subject | Stochastic optimisation | |
dc.subject | Convex optimisation | |
dc.subject | Variance reduction | |
dc.subject | Accelerated gradient descent | |
dc.subject | 90C06 | |
dc.subject | 90C15 | |
dc.subject | 90C25 | |
dc.subject | 90C30 | |
dc.subject | 90C60 | |
dc.subject | 68Q25 | |
dc.title | Accelerating variance-reduced stochastic gradient methods | |
dc.type | Article | |
dc.date.updated | 2022-02-23T17:01:15Z | |
prism.endingPage | 715 | |
prism.issueIdentifier | 2 | |
prism.publicationName | Mathematical Programming | |
prism.startingPage | 671 | |
prism.volume | 191 | |
dc.identifier.doi | 10.17863/CAM.81790 | |
dcterms.dateAccepted | 2020-09-07 | |
rioxxterms.versionofrecord | 10.1007/s10107-020-01566-2 | |
rioxxterms.version | VoR | |
rioxxterms.licenseref.uri | http://creativecommons.org/licenses/by/4.0/ | |
dc.contributor.orcid | Driggs, Derek [0000-0003-1582-5884] | |
dc.identifier.eissn | 1436-4646 | |
pubs.funder-project-id | Engineering and Physical Sciences Research Council (EP/M00483X/1) | |
pubs.funder-project-id | Engineering and Physical Sciences Research Council (EP/N014588/1) | |
pubs.funder-project-id | European Commission Horizon 2020 (H2020) Marie Skłodowska-Curie actions (691070) | |
pubs.funder-project-id | European Commission Horizon 2020 (H2020) Marie Skłodowska-Curie actions (777826) | |
pubs.funder-project-id | EPSRC (EP/S026045/1) | |
pubs.funder-project-id | Engineering and Physical Sciences Research Council (EP/H023348/1) | |
pubs.funder-project-id | Leverhulme Trust (PLP-2017-275) | |
pubs.funder-project-id | Alan Turing Institute (Unknown) | |
cam.issuedOnline | 2020-09-15 | |