
Three-stage hybrid neural beamformer for multi-channel speech enhancement.

Kelan Kuang, Feiran Yang, Junfeng Li, Jun Yang
Published in: The Journal of the Acoustical Society of America (2023)
This paper proposes a hybrid neural beamformer for multi-channel speech enhancement, called TriU-Net, which comprises three stages: beamforming, post-filtering, and distortion compensation. The TriU-Net first estimates a set of masks that are used within a minimum variance distortionless response (MVDR) beamformer. A deep neural network (DNN)-based post-filter then suppresses the residual noise. Finally, a DNN-based distortion compensator further improves speech quality. To characterize long-range temporal dependencies more efficiently, a network topology called the gated convolutional attention network is proposed and used in the TriU-Net. The advantage of the proposed model is that speech distortion compensation is considered explicitly, yielding higher speech quality and intelligibility. The proposed model achieved an average wb-PESQ score of 2.854 and an average ESTOI of 92.57% on the CHiME-3 dataset. In addition, extensive experiments on synthetic data and real recordings confirm the effectiveness of the proposed method in noisy reverberant environments.
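
The first stage described above, masks feeding an MVDR beamformer, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say which MVDR formulation TriU-Net uses, so the sketch assumes the common reference-channel form w = (Φn⁻¹ Φs) u / tr(Φn⁻¹ Φs), and it assumes the speech and noise masks have already been produced by some network. The function names `masked_covariance`, `mvdr_weights`, and `apply_beamformer`, the array layouts, and the diagonal-loading constant are all hypothetical choices made for illustration.

```python
import numpy as np


def masked_covariance(Y, mask, eps=1e-8):
    """Mask-weighted spatial covariance per frequency bin.

    Y:    complex STFT, shape (C, F, T) = (channels, bins, frames)
    mask: real time-frequency mask in [0, 1], shape (F, T)
    Returns an array of shape (F, C, C).
    """
    weighted = Y * mask[None, :, :]
    # Sum over frames of mask * y y^H for each frequency bin.
    cov = np.einsum("cft,dft->fcd", weighted, Y.conj())
    return cov / (mask.sum(axis=1)[:, None, None] + eps)


def mvdr_weights(phi_s, phi_n, ref=0, diag_load=1e-6):
    """Reference-channel MVDR weights (assumed formulation, see lead-in).

    phi_s, phi_n: speech and noise covariances, shape (F, C, C)
    Returns weights of shape (F, C).
    """
    C = phi_n.shape[-1]
    # Diagonal loading keeps the noise covariance well conditioned.
    load = diag_load * np.trace(phi_n, axis1=-2, axis2=-1).real / C
    phi_n = phi_n + load[:, None, None] * np.eye(C)
    numer = np.linalg.solve(phi_n, phi_s)                   # Phi_n^{-1} Phi_s
    denom = np.trace(numer, axis1=-2, axis2=-1)[:, None]    # tr(Phi_n^{-1} Phi_s)
    return numer[:, :, ref] / (denom + 1e-12)


def apply_beamformer(w, Y):
    """Apply w^H y per bin and frame. w: (F, C), Y: (C, F, T) -> (F, T)."""
    return np.einsum("fc,cft->ft", w.conj(), Y)
```

In a pipeline of the kind the abstract outlines, the single-channel output of `apply_beamformer` would then be passed to the post-filter and the distortion compensator; those stages are DNNs whose details the abstract does not give, so they are not sketched here.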