SAAGs: Biased stochastic variance reduction methods for large-scale learning

Change log

Stochastic approximation is one of the effective approach to deal with the large-scale machine learning problems and the recent research has focused on reduction of variance, caused by the noisy approximations of the gradients. In this paper, we have proposed novel variants of SAAG-I and II (Stochastic Average Adjusted Gradient) (Chauhan et al. 2017), called SAAG-III and IV, respectively. Unlike SAAG-I, starting point is set to average of previous epoch in SAAG-III, and unlike SAAG-II, the snap point and starting point are set to average and last iterate of previous epoch in SAAG-IV, respectively. To determine the step size, we have used Stochastic Backtracking-Armijo line Search (SBAS) which performs line search only on selected mini-batch of data points. Since backtracking line search is not suitable for large-scale problems and the constants used to find the step size, like Lipschitz constant, are not always available so SBAS could be very effective in such cases. We have extended SAAGs (I, II, III and IV) to solve non-smooth problems and designed two update rules for smooth and non-smooth problems. Moreover, our theoretical results have proved linear convergence of SAAG-IV for all the four combinations of smoothness and strong-convexity, in expectation. Finally, our experimental studies have proved the efficacy of proposed methods against the state-of-art techniques.

Stochastic gradient descent, Stochastic optimization, Variance reduction, Strongly covex, Smooth and non-smooth, SGD, Large-scale learning
Journal Title
Applied Intelligence
Conference Name
Journal ISSN
Volume Title
Springer Science and Business Media LLC
All rights reserved