Yuriy Mishchenko Papers:
Gao Y., Mishchenko Y., Shah A., Matsoukas S., Vitaladevuni S. (2020) "Towards data-efficient
modeling for wake word spotting." In Proc. 45th International Conference on Acoustics, Speech,
and Signal Processing (ICASSP 2020).
In this paper we present data-efficient solutions to
address the challenges in wake word (WW) modeling, such as domain mismatch,
noisy conditions, and limited annotation. The proposed system is
composed of a multi-condition training pipeline with a stratified data
augmentation, which improves the model robustness to a variety
of predefined acoustic conditions, together with a semi-supervised
learning pipeline to accurately extract the WW and confusable examples from an untranscribed
speech corpus. Starting from only 10 hours
of domain-mismatched WW audio, we are able to enlarge and enrich
the training dataset by 20-100 times to capture the acoustic complexity.
Our experiments on real user data show that the proposed solutions
can achieve performance comparable to that of a production-grade
model while reducing WW-specific data collection by 97%
and the annotation bandwidth by 86%.