Papers

People usually use 40-dimensional features.
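As an illustration of what a 40-dim feature front end can look like, here is a minimal NumPy sketch that computes 40 log-mel filterbank energies per frame. The feature type (log-mel filterbank), the 16 kHz sample rate, the 25 ms window, the 10 ms shift, the Hann window, and the FFT size of 512 are all assumptions made for this example, and the function names are hypothetical; real toolkits differ in details such as pre-emphasis, dithering, and mel-scale variants.

```python
import numpy as np


def hz_to_mel(hz):
    # One common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=np.float64) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=np.float64) / 2595.0) - 1.0)


def mel_filterbank(num_bins, fft_size, sample_rate):
    """Return triangular mel filters with shape (num_bins, fft_size // 2 + 1)."""
    # num_bins + 2 edge frequencies, equally spaced on the mel scale.
    edges_hz = mel_to_hz(
        np.linspace(0.0, hz_to_mel(sample_rate / 2.0), num_bins + 2)
    )
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, fft_size // 2 + 1)
    bank = np.zeros((num_bins, fft_freqs.size))
    for i in range(num_bins):
        left, center, right = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        # Triangle: rises from left to center, falls from center to right.
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def log_mel_features(samples, sample_rate=16000, num_bins=40,
                     frame_ms=25.0, shift_ms=10.0, fft_size=512):
    """Return log-mel energies with shape (num_frames, num_bins)."""
    samples = np.asarray(samples, dtype=np.float64)
    frame_len = int(sample_rate * frame_ms / 1000)
    shift = int(sample_rate * shift_ms / 1000)
    # Only full frames are kept; the input must hold at least one frame.
    num_frames = 1 + (len(samples) - frame_len) // shift
    idx = np.arange(frame_len)[None, :] + shift * np.arange(num_frames)[:, None]
    frames = samples[idx] * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, n=fft_size, axis=1)) ** 2
    mel_energy = power @ mel_filterbank(num_bins, fft_size, sample_rate).T
    # A small floor avoids log(0) on silent frames.
    return np.log(mel_energy + 1e-10)


if __name__ == "__main__":
    one_second = np.random.default_rng(0).standard_normal(16000)
    print(log_mel_features(one_second).shape)
```

Each row of the result is one 40-dim feature vector; a KWS model would typically consume a stack of neighbouring rows as its input.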

Small-footprint keyword spotting using deep neural networks [CPH14] is the first paper to use a deep neural network (DNN) for keyword spotting (KWS).

Convolutional neural networks for small-footprint keyword spotting [SP15] replaces the DNN with a convolutional neural network (CNN) to reduce the computational overhead and the number of parameters.
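To see why a convolutional layer can need fewer parameters than a fully connected one, the following sketch counts the weights of each. The layer sizes (32 stacked frames of 40-dim features, 128 hidden units, 64 filters of size 20 x 8) are hypothetical values chosen for illustration and are not taken from [CPH14] or [SP15].

```python
def dense_params(num_inputs, num_outputs):
    # One weight per (input, output) pair plus one bias per output.
    return num_inputs * num_outputs + num_outputs


def conv2d_params(kernel_time, kernel_freq, in_channels, out_channels):
    # The same kernel is reused at every position, so the parameter count
    # does not depend on the size of the input.
    return kernel_time * kernel_freq * in_channels * out_channels + out_channels


if __name__ == "__main__":
    # Hypothetical input: 32 stacked frames of 40-dim features.
    num_frames, feat_dim = 32, 40
    dense = dense_params(num_frames * feat_dim, 128)
    conv = conv2d_params(20, 8, 1, 64)
    print("fully connected first layer:", dense)
    print("convolutional first layer:  ", conv)
```

Parameter count alone does not determine the compute cost: the number of multiplications of a convolutional layer also depends on how many positions the kernel is applied at, which is controlled by the stride and by pooling.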

CPH14: Guoguo Chen, Carolina Parada, and Georg Heigold. Small-footprint keyword spotting using deep neural networks. In 2014 IEEE international conference on acoustics, speech and signal processing (ICASSP), 4087–4091. IEEE, 2014.
DBK+20: Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, and others. An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020. URL: https://arxiv.org/pdf/2010.11929.
GKD+22: Amir Gholami, Sehoon Kim, Zhen Dong, Zhewei Yao, Michael W Mahoney, and Kurt Keutzer. A survey of quantization methods for efficient neural network inference. In Low-Power Computer Vision, pages 291–326. Chapman and Hall/CRC, 2022. URL: https://arxiv.org/pdf/2103.13630.
GN98: Robert M. Gray and David L. Neuhoff. Quantization. IEEE transactions on information theory, 44(6):2325–2383, 1998. URL: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=720541.
HJA20: Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840–6851, 2020. URL: https://proceedings.neurips.cc/paper/2020/file/4c5bcfec8584af0d967f1ab10179ca4b-Paper.pdf.
LVS+24: Matthew Le, Apoorv Vyas, Bowen Shi, Brian Karrer, Leda Sari, Rashel Moritz, Mary Williamson, Vimal Manohar, Yossi Adi, Jay Mahadeokar, and others. Voicebox: text-guided multilingual universal speech generation at scale. Advances in neural information processing systems, 2024. URL: https://arxiv.org/pdf/2306.15687.
MNA+17: Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, and others. Mixed precision training. arXiv preprint arXiv:1710.03740, 2017. URL: https://openreview.net/pdf?id=r1gs9JgRZ.
SP15: Tara N Sainath and Carolina Parada. Convolutional neural networks for small-footprint keyword spotting. In Interspeech, 1478–1482. 2015.
SME20: Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020. URL: https://arxiv.org/pdf/2010.02502.
VSM+11: Vincent Vanhoucke, Andrew Senior, Mark Z Mao, and others. Improving the speed of neural networks on CPUs. In Proc. deep learning and unsupervised feature learning NIPS workshop, volume 1, 4. 2011. URL: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37631.pdf.
VSP+17: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 2017. URL: https://arxiv.org/pdf/1706.03762.
Wir71: Niklaus Wirth. Program development by stepwise refinement. Communications of the ACM, 14(4):221–227, 1971. URL: https://dl.acm.org/doi/pdf/10.1145/362575.362577.
WJZ+20: Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev, and Paulius Micikevicius. Integer quantization for deep learning inference: principles and empirical evaluation. arXiv preprint arXiv:2004.09602, 2020. URL: https://arxiv.org/pdf/2004.09602.