
Monaural Speech Dereverberation Using Temporal Convolutional Networks with Self Attention.

Yan Zhao, DeLiang Wang, Buye Xu, Tao Zhang
Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing (2020)
In daily listening environments, human speech is often degraded by room reverberation, especially under highly reverberant conditions. Such degradation poses a challenge for many speech processing systems, whose performance becomes much worse than in anechoic environments. To combat the effect of reverberation, we propose a monaural (single-channel) speech dereverberation algorithm using temporal convolutional networks with self attention. Specifically, the proposed system includes a self-attention module to produce dynamic representations given input features, a temporal convolutional network to learn a nonlinear mapping from such representations to the magnitude spectrum of anechoic speech, and a one-dimensional (1-D) convolution module to smooth the enhanced magnitude across adjacent frames. Systematic evaluations demonstrate that the proposed algorithm improves objective metrics of speech quality in a wide range of reverberant conditions. In addition, it generalizes well to untrained reverberation times, room sizes, measured room impulse responses, real-world recorded noisy-reverberant speech, and different speakers.
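The three-stage pipeline described in the abstract (self-attention, then a temporal convolutional network, then 1-D smoothing over frames) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer sizes, the random untrained weights, the non-causal "same" padding, the residual ReLU blocks, the dilation schedule 1, 2, 4, and the fixed smoothing taps are all assumptions chosen only to make the data flow concrete. All function and parameter names are hypothetical.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over frames; x has shape (T, F)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v

def dilated_conv1d(x, kernel, dilation):
    """Length-preserving dilated convolution along time.

    x: (T, C_in); kernel: (K, C_in, C_out) with odd K (assumed).
    """
    taps, frames = kernel.shape[0], x.shape[0]
    pad = dilation * (taps - 1) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros((frames, kernel.shape[2]))
    for i in range(taps):
        start = i * dilation
        out += xp[start:start + frames] @ kernel[i]
    return out

def tcn(x, kernels):
    """Residual blocks with dilations 1, 2, 4, ... (assumed schedule)."""
    for level, kernel in enumerate(kernels):
        x = x + np.maximum(dilated_conv1d(x, kernel, 2 ** level), 0.0)
    return x

def smooth_frames(mag, taps):
    """Filter each frequency bin across adjacent frames with a 1-D kernel."""
    return np.apply_along_axis(
        lambda col: np.convolve(col, taps, mode="same"), 0, mag
    )

def dereverb_sketch(feats, params):
    """Features (T, F) -> smoothed non-negative magnitude estimate (T, F_out)."""
    h = self_attention(feats, params["wq"], params["wk"], params["wv"])
    h = tcn(h, params["tcn"])
    mag = np.maximum(h @ params["out"], 0.0)  # magnitudes are non-negative
    return smooth_frames(mag, params["taps"])

# Toy run with random, untrained weights (illustrative sizes only).
rng = np.random.default_rng(0)
T, F, C = 50, 40, 16  # frames, feature bins, hidden channels (assumed)
params = {
    "wq": 0.1 * rng.standard_normal((F, C)),
    "wk": 0.1 * rng.standard_normal((F, C)),
    "wv": 0.1 * rng.standard_normal((F, C)),
    "tcn": [0.1 * rng.standard_normal((3, C, C)) for _ in range(3)],
    "out": 0.1 * rng.standard_normal((C, F)),
    "taps": np.array([0.25, 0.5, 0.25]),  # fixed here; learned in a real model
}
feats = rng.standard_normal((T, F))
enhanced = dereverb_sketch(feats, params)
```

In a trained system the attention, TCN, output, and smoothing weights would all be learned from pairs of reverberant and anechoic speech; the sketch only shows how the shapes pass from one module to the next.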
Keyphrases
  • working memory
  • neural network
  • hearing loss
  • machine learning
  • deep learning
  • physical activity
  • mass spectrometry
  • body composition
  • high intensity