Calibrated Stochastic Gradient Descent for Convolutional Neural Networks

  • Li’an Zhuo, Beihang University
  • Baochang Zhang, Beihang University
  • Chen Chen, University of North Carolina at Charlotte
  • Qixiang Ye, University of Chinese Academy of Sciences
  • Jianzhuang Liu, Huawei Technologies Company, Ltd.
  • David Doermann, State University of New York at Buffalo

Abstract

In stochastic gradient descent (SGD) and its variants, the optimized gradient estimators may be as expensive to compute as the true gradient in many scenarios. This paper introduces a calibrated stochastic gradient descent (CSGD) algorithm for deep neural network optimization. A theorem is developed to prove that an unbiased estimator for the network variables can be obtained in a probabilistic way based on the Lipschitz hypothesis. Our work differs significantly from existing gradient optimization methods by providing a theoretical framework for unbiased variable estimation in the deep learning paradigm to optimize the model parameter calculation. In particular, we develop a generic gradient calibration layer that can be easily used to build convolutional neural networks (CNNs). Experimental results demonstrate that CNNs with our CSGD optimization scheme can improve on state-of-the-art performance in natural image classification, digit recognition, ImageNet object classification, and object detection tasks. This work opens new research directions for developing more efficient SGD updates and analyzing the backpropagation algorithm.

Published: 2019-07-17
Section: AAAI Technical Track: Vision