
Performance analysis of various fundamental frequency estimation algorithms in the context of pathological speech.

Robin Vaysse, Corine Astesano, Jérôme Farinas
Published in: The Journal of the Acoustical Society of America (2022)
Reliable fundamental frequency (f0) extraction algorithms are crucial in many fields of speech research. Most studies testing the robustness of different algorithms have focused on healthy speech and/or measurements of sustained vowels. Few studies have tested f0 estimation in the context of pathological speech, and even fewer on continuous speech. The present study evaluated 12 available pitch detection algorithms on a corpus of read speech by 24 speakers (8 healthy speakers, 8 speakers with Parkinson's disease, and 8 with head and neck cancer). Two fusion methods were also tested: one based on the median of the algorithms' outputs, and one that combines the best algorithm for voicing detection with the algorithm that generates the most accurate f0 estimates on voiced parts. Our results show that time-domain algorithms, like REAPER, are best for voicing detection, while deep neural network algorithms, like FCN-f0, yield better accuracy for the f0 values on voiced parts. The combination of REAPER and FCN-f0 offers the best trade-off between performance and implementation complexity, since it produces less than 4% errors on voicing detection and less than 5% gross errors in the estimation of the f0 values for all speaker groups.
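The two fusion strategies described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the convention that 0 Hz marks an unvoiced frame, and the majority rule used to decide voicing in the median fusion are all assumptions made for illustration. The tracks are assumed to be already aligned frame by frame.

```python
from statistics import median

UNVOICED = 0.0  # assumed convention: 0 Hz marks an unvoiced frame


def median_fusion(tracks):
    """Frame-wise median over several aligned f0 tracks (hypothetical helper).

    Assumption: a frame counts as voiced when more than half of the
    trackers report it voiced; the fused value is then the median of
    the voiced estimates only.
    """
    fused = []
    for frame in zip(*tracks):
        voiced = [f for f in frame if f > UNVOICED]
        if 2 * len(voiced) > len(frame):
            fused.append(median(voiced))
        else:
            fused.append(UNVOICED)
    return fused


def voicing_value_fusion(voicing_track, value_track):
    """Take voicing decisions from one tracker and f0 values from another.

    In the abstract's best combination, the voicing track would come from
    a REAPER-like tracker and the values from an FCN-f0-like tracker.
    """
    return [
        value if decision > UNVOICED else UNVOICED
        for decision, value in zip(voicing_track, value_track)
    ]


# Toy per-frame f0 values in Hz (made up for illustration only)
voicing_source = [0.0, 118.0, 121.0, 0.0]
value_source = [95.0, 120.0, 122.0, 80.0]
third_tracker = [0.0, 240.0, 119.0, 0.0]

print(median_fusion([voicing_source, value_source, third_tracker]))
print(voicing_value_fusion(voicing_source, value_source))
```

In the second function, frames that the voicing source marks as unvoiced are zeroed out even if the value source reports an f0 there, which is the point of delegating the voicing decision to the more reliable detector.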
Keyphrases