Deep learning in automatic detection of dysphonia: Comparing acoustic features and developing a generalizable framework.

Zhen Chen, Peixi Zhu, Wei Qiu, Jiajie Guo, Yike Li
Published in: International Journal of Language & Communication Disorders (2022)
What is already known on this subject
Auditory-perceptual assessment is the current gold standard in the clinical evaluation of voice quality, but its value may be limited by rater reliability and accessibility. Deep learning (DL) is a new method of artificial intelligence that can overcome these disadvantages and promote automatic voice assessment. This study explored the feasibility of a DL approach for the automatic detection of dysphonia, along with a quantitative comparison of two common sets of acoustic features.

What this study adds to existing knowledge
A convolutional neural network (CNN) model is excellent at decoding multidimensional acoustic features, outperforming the baseline parameter-based models in identifying dysphonic voices. The first 13 mel-frequency cepstral coefficients (MFCCs) are sufficient for this task. The mel-spectrogram yields greater performance, indicating that it presents the acoustic features to the CNN model in a more favourable way than the MFCCs.

What are the potential or actual clinical implications of this work?
DL is a feasible method for the detection of dysphonia. The current DL framework may be used for remote vocal health screening or for documenting voice recovery after treatment. In future, DL models may potentially be used to perform auditory-perceptual tasks in an automatic, efficient, reliable and low-cost manner.
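For readers unfamiliar with the two feature sets compared above, the sketch below shows one common way to extract the first 13 MFCCs and a mel-spectrogram from a voice recording using the librosa library in Python. This is an illustrative example only, not the authors' code; the file name, sampling rate and mel-band count are assumptions and may differ from the recording settings used in the study.

# Illustrative sketch: extracting the two acoustic feature sets compared in the study.
import librosa
import numpy as np

# Hypothetical input file and sampling rate (assumptions, not from the paper).
y, sr = librosa.load("sustained_vowel.wav", sr=16000)

# First 13 mel-frequency cepstral coefficients (shape: 13 x n_frames).
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Mel-spectrogram converted to decibels (shape: n_mels x n_frames).
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)

# Either 2-D feature matrix can be fed to a CNN as a single-channel "image",
# e.g. by adding a channel axis: mel_db[np.newaxis, ...]
print(mfccs.shape, mel_db.shape)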