Efficient Quantization for Neural Networks with Binary Weights and Low Bitwidth Activations

Kun Huang; Bingbing Ni; Xiaokang Yang

doi:10.1609/aaai.v33i01.33013854

Authors

Kun Huang, Shanghai Jiao Tong University
Bingbing Ni, Shanghai Jiao Tong University
Xiaokang Yang, Shanghai Jiao Tong University

Abstract

Quantization has shown stunning efficiency on deep neural networks, especially for portable devices with limited resources. Most existing works uncritically extend weight quantization methods to activations. However, we take the view that the best performance can be obtained by applying different quantization methods to weights and to activations. In this paper, we design a new activation function, dubbed CReLU, from the quantization perspective and complement this design with an appropriate initialization method and training procedure. Moreover, we develop a specific quantization strategy in which we formulate the forward and backward approximation of weights with binary values and quantize the activations to low bitwidth using a linear or logarithmic quantizer. We show, for the first time, that our final quantized model with binary weights and ultra-low-bitwidth activations outperforms the previous best models by large margins on ImageNet while achieving nearly a 10.85× theoretical speedup with ResNet-18. Furthermore, ablation experiments and theoretical analysis demonstrate the effectiveness and robustness of CReLU in comparison with other activation functions.
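
The abstract names three building blocks (binary weights, a linear quantizer and a logarithmic quantizer) without giving formulas. The following is a minimal generic sketch of what such components commonly look like, not the paper's formulation. The function names, the mean-absolute-value scale for binary weights, the clipping range `max_val`, and the power-of-two level layout are all assumptions made for illustration. CReLU is not shown because the abstract does not define it.

```python
import math


def binarize_weights(weights):
    """Approximate each weight by alpha * sign(w).

    Assumption: alpha is the mean absolute value of the weights, a common
    choice in binary-weight schemes (not taken from the paper).
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [alpha if w >= 0 else -alpha for w in weights]


def linear_quantize(x, bits, max_val=1.0):
    """Uniform quantizer: clip x to [0, max_val], then round to the
    nearest of 2**bits evenly spaced levels (including 0 and max_val)."""
    steps = 2 ** bits - 1
    clipped = min(max(x, 0.0), max_val)
    return round(clipped / max_val * steps) / steps * max_val


def log_quantize(x, bits, max_val=1.0):
    """Logarithmic quantizer: snap x to the nearest (in log2 space) level
    max_val * 2**(-k), k = 0 .. 2**bits - 2, with one code reserved for 0."""
    if x <= 0:
        return 0.0
    clipped = min(x, max_val)
    exponent = round(math.log2(clipped / max_val))
    smallest_exponent = -(2 ** bits - 2)
    if exponent < smallest_exponent:
        return 0.0  # too small to represent: flush to zero
    return max_val * 2.0 ** exponent


if __name__ == "__main__":
    print(binarize_weights([0.5, -1.5, 1.0]))
    print([linear_quantize(v, bits=2) for v in (0.1, 0.4, 0.9)])
    print([log_quantize(v, bits=2) for v in (0.1, 0.3, 0.9)])
```

With 2 bits, the linear quantizer spaces its four levels evenly over the clipping range, while the logarithmic one concentrates levels near zero, which is the practical difference between the two options the abstract mentions.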
