Monaural Speech Dereverberation Using Temporal Convolutional Networks with Self Attention

Yan Zhao, DeLiang Wang, Buye Xu, Tao Zhang

Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing (2020)

In daily listening environments, human speech is often degraded by room reverberation, especially under highly reverberant conditions. Such degradation poses a challenge for many speech processing systems, whose performance degrades considerably compared with anechoic conditions. To combat the effect of reverberation, we propose a monaural (single-channel) speech dereverberation algorithm using temporal convolutional networks with self-attention. Specifically, the proposed system includes a self-attention module that produces dynamic representations of the input features, a temporal convolutional network that learns a nonlinear mapping from these representations to the magnitude spectrum of anechoic speech, and a one-dimensional (1-D) convolution module that smooths the enhanced magnitude across adjacent frames. Systematic evaluations demonstrate that the proposed algorithm improves objective metrics of speech quality across a wide range of reverberant conditions. In addition, it generalizes well to untrained reverberation times, room sizes, measured room impulse responses, real-world recorded noisy-reverberant speech, and different speakers.
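To make the three-stage pipeline concrete, here is a minimal, stdlib-only Python sketch of a forward pass through the stages in the order the abstract gives them: self-attention, then a temporal convolutional network (TCN), then 1-D smoothing across frames. Everything beyond that ordering is an assumption, not the authors' design. The input is taken to be a reverberant magnitude spectrogram (frames by frequency bins). Attention uses Q = K = V = input with no learned projections. The TCN is reduced to depthwise dilated causal convolutions in residual ReLU blocks with fixed kernel weights. The smoothing module is a fixed [0.25, 0.5, 0.25] kernel rather than a learned convolution. All function names (`self_attention`, `tcn`, `smooth_frames`, `dereverb_forward`) are hypothetical. In a trained system, every weight would be learned so that the output approximates the anechoic magnitude spectrum.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]  # rows = time frames, columns = frequency bins


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(x: Matrix) -> Matrix:
    """Scaled dot-product self-attention over all frames.

    Simplification (assumption): Q = K = V = x, with no learned projections,
    so each output frame is a similarity-weighted average of all input frames.
    """
    d = len(x[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in x:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in x]
        w = softmax(scores)
        out.append([sum(w[t] * x[t][f] for t in range(len(x))) for f in range(d)])
    return out


def dilated_causal_conv(x: Matrix, kernel: Sequence[float], dilation: int) -> Matrix:
    """Depthwise causal 1-D convolution along time:
    y[t][f] = sum_k kernel[k] * x[t - k * dilation][f], treating x[t < 0] as zero."""
    T, F = len(x), len(x[0])
    y = [[0.0] * F for _ in range(T)]
    for t in range(T):
        for k, w in enumerate(kernel):
            src = t - k * dilation
            if src < 0:  # later taps reach even further into the past
                break
            for f in range(F):
                y[t][f] += w * x[src][f]
    return y


def tcn(x: Matrix, kernel: Sequence[float], dilations: Sequence[int] = (1, 2, 4)) -> Matrix:
    """Toy TCN: residual blocks h <- h + ReLU(conv_d(h)) with growing dilation d,
    followed by a ReLU so the estimated magnitude is non-negative."""
    h = x
    for d in dilations:
        conv = dilated_causal_conv(h, kernel, d)
        h = [[hv + max(0.0, cv) for hv, cv in zip(hr, cr)] for hr, cr in zip(h, conv)]
    return [[max(0.0, v) for v in row] for row in h]


def smooth_frames(x: Matrix, kernel: Sequence[float] = (0.25, 0.5, 0.25)) -> Matrix:
    """Centered 1-D convolution across adjacent frames, replicating edge frames."""
    T, F = len(x), len(x[0])
    half = len(kernel) // 2
    out = []
    for t in range(T):
        row = []
        for f in range(F):
            acc = 0.0
            for k, w in enumerate(kernel):
                src = min(max(t + k - half, 0), T - 1)
                acc += w * x[src][f]
            row.append(acc)
        out.append(row)
    return out


def dereverb_forward(reverb_mag: Matrix) -> Matrix:
    """Attention -> TCN mapping -> frame smoothing (fixed toy weights throughout)."""
    attended = self_attention(reverb_mag)
    mapped = tcn(attended, kernel=[0.5, 0.3, 0.2])
    return smooth_frames(mapped)


if __name__ == "__main__":
    # Toy "reverberant magnitude spectrogram": 6 frames x 4 frequency bins.
    spec = [[abs(math.sin(0.7 * t + f)) for f in range(4)] for t in range(6)]
    est = dereverb_forward(spec)
    print(len(est), len(est[0]))
```

The example run prints the output shape (6 frames by 4 bins), since every stage preserves the time-frequency layout. Note the design split in this sketch: the TCN convolutions are causal (each frame sees only the past), while the smoothing kernel is centered and looks one frame ahead. Whether the published system makes the same choices is not stated in the abstract.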
