FPGA Implementation of Keyword Spotting System Using Depthwise Separable Binarized and Ternarized Neural Networks.
Seongwoo BaeHaechan KimSeongjoo LeeYunho JungPublished in: Sensors (Basel, Switzerland) (2023)
Keyword spotting (KWS) systems are used for human-machine communications in various applications. In many cases, KWS involves a combination of wake-up-word (WUW) recognition for device activation and voice command classification tasks. These tasks present a challenge for embedded systems due to the complexity of deep learning algorithms and the need for optimized networks for each application. In this paper, we propose a depthwise separable binarized/ternarized neural network (DS-BTNN) hardware accelerator capable of performing both WUW recognition and command classification on a single device. The design achieves significant area efficiency by redundantly utilizing bitwise operators in the computation of the binarized neural network (BNN) and ternary neural network (TNN). In a complementary metal-oxide semiconductor (CMOS) 40 nm process environment, the DS-BTNN accelerator demonstrated significant efficiency. Compared with a design approach where BNN and TNN were independently developed and subsequently integrated as two separate modules into the system, our method achieved a 49.3% area reduction while yielding an area of 0.558 mm2. The designed KWS system, which was implemented on a Xilinx UltraScale+ ZCU104 field-programmable gate array (FPGA) board, receives real-time data from the microphone, preprocesses them into a mel spectrogram, and uses this as input to the classifier. Depending on the order, the network operates as a BNN or a TNN for WUW recognition and command classification, respectively. Operating at 170 MHz, our system achieved 97.1% accuracy in BNN-based WUW recognition and 90.5% in TNN-based command classification.