Yuriy Mishchenko Papers:
Mishchenko Y., Goren Y., Sun M., Beauchene C., Matsoukas S., Rybakov O., Vitaladevuni S. (2019) "Low-bit quantization and quantization-aware training for small-footprint keyword spotting." In Proc. 2019 18th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 706-711.

In this paper, we investigate novel quantization approaches to reduce the memory and computational footprint of deep neural network (DNN) based keyword spotters (KWS). We propose a new method for offline and online quantization of KWS models, which we call dynamic quantization: DNN weight matrices are quantized column-wise, using each column's exact individual min-max range, and the DNN layers' inputs and outputs are quantized for every input audio frame individually, using the exact min-max range of each input and output vector. We further apply a new quantization-aware training approach that allows us to incorporate quantization errors into the KWS model during training. Together, these approaches allow us to significantly improve the performance of KWS in 4-bit and 8-bit quantized precision, achieving end-to-end accuracy close to that of full-precision models while reducing the models' on-device memory footprint by up to 80%.
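Below is a minimal NumPy sketch of the two ideas described in the abstract: column-wise quantization of a weight matrix using each column's exact min-max range, and dynamic per-frame quantization of an activation vector using its exact min-max range computed at run time. It is an illustration under assumed details (asymmetric affine quantization, unsigned integer codes), not the authors' implementation; all function names here are hypothetical.

```python
import numpy as np

def quantize_per_column(W, n_bits=8):
    """Quantize a weight matrix column-wise using each column's exact
    min-max range. Asymmetric affine quantization is assumed here; the
    paper's exact scheme may differ."""
    qmax = 2 ** n_bits - 1
    col_min = W.min(axis=0)                      # one range per column
    col_max = W.max(axis=0)
    scale = np.where(col_max > col_min, (col_max - col_min) / qmax, 1.0)
    zero_point = np.round(-col_min / scale)
    W_q = np.clip(np.round(W / scale + zero_point), 0, qmax).astype(np.int32)
    return W_q, scale, zero_point

def dequantize_per_column(W_q, scale, zero_point):
    """Reconstruct an approximate float matrix from column-wise codes."""
    return (W_q.astype(np.float32) - zero_point) * scale

def quantize_vector_dynamic(x, n_bits=8):
    """Dynamically quantize one activation vector (e.g. a layer input for a
    single audio frame) with its exact min-max range at run time."""
    qmax = 2 ** n_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / qmax if x_max > x_min else 1.0
    zero_point = round(-x_min / scale)
    x_q = np.clip(np.round(x / scale + zero_point), 0, qmax).astype(np.int32)
    return x_q, scale, zero_point

# Example: 4-bit column-wise weights, 8-bit per-frame activations.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32)).astype(np.float32)
x = rng.standard_normal(64).astype(np.float32)

W_q, w_scale, w_zp = quantize_per_column(W, n_bits=4)
x_q, x_scale, x_zp = quantize_vector_dynamic(x, n_bits=8)

# Dequantize-and-multiply reference; an on-device kernel would instead run
# an integer matmul and fold the per-column scales into the output.
y_ref = dequantize_per_column(W_q, w_scale, w_zp).T @ ((x_q - x_zp) * x_scale)
print(y_ref.shape)  # (32,)
```

For the quantization-aware training mentioned in the abstract, a common way to expose such quantization error during training is to apply the quantize/dequantize round trip in the forward pass and back-propagate through it with a straight-through estimator; whether the paper uses exactly this mechanism is not stated in the abstract.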