Yuriy Mishchenko Papers:
Gao Y., Mishchenko Y., Shah A., Matsoukas S., Vitaladevuni S. (2020) "Towards data-efficient modeling for wake word spotting." In Proc. 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)

In this paper we present data-efficient solutions to challenges in wake word (WW) modeling such as domain mismatch, noisy conditions, and limited annotation. The proposed system combines a multi-condition training pipeline with stratified data augmentation, which improves model robustness to a variety of predefined acoustic conditions, and a semi-supervised learning pipeline that accurately extracts WW and confusable examples from an untranscribed speech corpus. Starting from only 10 hours of domain-mismatched WW audio, we are able to enlarge and enrich the training dataset by 20-100 times to capture the acoustic complexity. Our experiments on real user data show that the proposed solutions achieve performance comparable to that of a production-grade model while saving 97% of the WW-specific data collection and 86% of the annotation bandwidth.
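
The abstract does not spell out how the stratified data augmentation is carried out, so the following is only a minimal sketch of one plausible reading: each utterance is assigned to one of a fixed set of acoustic conditions so that every condition is represented about equally. The condition names, the function `stratified_assign`, and the round-robin scheme are all assumptions made for illustration, not the authors' method.

```python
import random
from collections import Counter

# Hypothetical acoustic conditions; the abstract does not list the actual ones.
CONDITIONS = ["clean", "reverb", "noise", "reverb_noise"]


def stratified_assign(utterance_ids, conditions, seed=0):
    """Map each utterance to one condition so that the per-condition
    counts differ by at most one (an assumed notion of 'stratified')."""
    rng = random.Random(seed)
    ids = list(utterance_ids)
    # Shuffle so that the assignment does not depend on the input order.
    rng.shuffle(ids)
    # Round-robin over the shuffled utterances yields balanced strata.
    return {utt: conditions[i % len(conditions)] for i, utt in enumerate(ids)}


# Ten placeholder utterance IDs standing in for real WW audio segments.
assignment = stratified_assign([f"utt{i:03d}" for i in range(10)], CONDITIONS)
counts = Counter(assignment.values())
```

In a real pipeline each assigned condition would then drive the corresponding signal transformation (for example, adding noise or reverberation); that step is omitted here because the abstract gives no details about it.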