Learning to localize sounds in a highly reverberant environment: Machine-learning tracking of dolphin whistle-like sounds in a pool.
Sean F Woodward, Diana Reiss, Marcelo O Magnasco. Published in: PloS one (2020)
Tracking the origin of propagating wave signals in an environment with complex reflective surfaces is, in its full generality, a nearly intractable problem which has engendered multiple domain-specific literatures. We posit that, if the environment and sensor geometries are fixed, machine learning algorithms can "learn" the acoustical geometry of the environment and accurately track signal origin. In this paper, we propose the first machine-learning-based approach to identifying the source locations of semi-stationary, tonal, dolphin-whistle-like sounds in a highly reverberant space, specifically a half-cylindrical dolphin pool. Our algorithm works by supplying a learning network with an overabundance of location "clues", which are then selected under supervised training for their ability to discriminate source location in this particular environment. More specifically, we deliver estimated time differences of arrival (TDOAs) and normalized cross-correlation values computed from pairs of hydrophone signals to a random forest model for high-feature-volume classification and feature selection, and subsequently deliver the selected features to linear discriminant analysis, linear and quadratic Support Vector Machine (SVM), and Gaussian process models. Based on data from 14 sound source locations and 16 hydrophones, our classification models yielded perfect accuracy at predicting novel sound source locations. Our regression models yielded better accuracy than the established Steered-Response Power (SRP) method when all training data were used, and comparable accuracy along the pool surface when deprived of training data at testing sites; our methods additionally offer improved computation time and the potential for superior localization accuracy in all dimensions with more training data. Because of the generality of our method, we argue it may be useful in a much wider variety of contexts.
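As an illustration of the pairwise "clues" described above, the following minimal sketch computes one TDOA estimate and one peak normalized cross-correlation value for every pair of hydrophone channels. It is not the authors' implementation: the function names `pair_features` and `feature_vector`, the use of the cross-correlation peak as the TDOA estimator, and the layout of the resulting feature vector are all assumptions made for illustration.

```python
import itertools

import numpy as np


def pair_features(x, y, fs):
    """Return (tdoa_seconds, peak_ncc) for two equal-length signals.

    Assumption: the TDOA is taken as the lag of the cross-correlation
    peak. A positive value means x arrived later than y.
    """
    x = x - x.mean()
    y = y - y.mean()
    cc = np.correlate(x, y, mode="full")
    # Normalize so that identical signals give a peak of 1.
    norm = np.sqrt(np.dot(x, x) * np.dot(y, y))
    if norm > 0:
        cc = cc / norm
    k = int(np.argmax(cc))
    # In "full" mode, index len(y) - 1 corresponds to zero lag.
    lag = k - (len(y) - 1)
    return lag / fs, float(cc[k])


def feature_vector(signals, fs):
    """Concatenate (tdoa, peak_ncc) over all channel pairs.

    `signals` is a hypothetical array of shape (n_hydrophones, n_samples).
    """
    feats = []
    for i, j in itertools.combinations(range(len(signals)), 2):
        feats.extend(pair_features(signals[i], signals[j], fs))
    return np.array(feats)
```

With 16 hydrophones this simplified layout gives 120 pairs and therefore 240 features per sound. Vectors of this kind, labeled by source location, are what a random forest could then rank for feature selection before the selected features are passed to the downstream classifiers and regressors. For tonal signals the cross-correlation peak can be ambiguous in a reverberant pool, which is one reason a learned model fed with many such features may be preferable to any single TDOA estimate.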