A simplified adversarial architecture for cross-subject silent speech recognition using electromyography.

Qiang Cui Xingyu Zhang Yakun ZhangChangyan ZhengLiang XieYe YanEdmond Q WuErwei Yin

Published in: Journal of neural engineering (2024)

Objective . The decline in the performance of electromyography (EMG)-based silent speech recognition is widely attributed to disparities in speech patterns, articulation habits, and individual physiology among speakers. Feature alignment by learning a discriminative network that resolves domain offsets across speakers is an effective method to address this problem. The prevailing adversarial network with a branching discriminator specializing in domain discrimination renders insufficiently direct contribution to categorical predictions of the classifier. Approach . To this end, we propose a simplified discrepancy-based adversarial network with a streamlined end-to-end structure for EMG-based cross-subject silent speech recognition. Highly aligned features across subjects are obtained by introducing a Nuclear-norm Wasserstein discrepancy metric on the back end of the classification network, which could be utilized for both classification and domain discrimination. Given the low-level and implicitly noisy nature of myoelectric signals, we devise a cascaded adaptive rectification network as the front-end feature extraction network, adaptively reshaping the intermediate feature map with automatically learnable channel-wise thresholds. The resulting features effectively filter out domain-specific information between subjects while retaining domain-invariant features critical for cross-subject recognition. Main results . A series of sentence-level classification experiments with 100 Chinese sentences demonstrate the efficacy of our method, achieving an average accuracy of 89.46% tested on 40 new subjects by training with data from 60 subjects. Especially, our method achieves a remarkable 10.07% improvement compared to the state-of-the-art model when tested on 10 new subjects with 20 subjects employed for training, surpassing its result even with three times training subjects. Significance . Our study demonstrates an improved classification performance of the proposed adversarial architecture using cross-subject myoelectric signals, providing a promising prospect for EMG-based speech interactive application.

Keyphrases