Optimal Training Configurations of a CNN-LSTM-Based Tracker for a Fall Frame Detection System.

Nur Ayuni Mohamed Mohd Asyraf Zulkifley Ahmad Asrul Ibrahim Mustapha Aouache

Published in: Sensors (Basel, Switzerland) (2021)

In recent years, there has been an immense amount of research into fall event detection. Generally, a fall event is defined as a situation in which a person unintentionally drops down onto a lower surface. It is crucial to detect the occurrence of fall events as early as possible so that any severe fall consequences can be minimized. Nonetheless, a fall event is a sporadic incidence that occurs seldomly that is falsely detected due to a wide range of fall conditions and situations. Therefore, an automated fall frame detection system, which is referred to as the SmartConvFall is proposed to detect the exact fall frame in a video sequence. It is crucial to know the exact fall frame as it dictates the response time of the system to administer an early treatment to reduce the fall's negative consequences and related injuries. Henceforth, searching for the optimal training configurations is imperative to ensure the main goal of the SmartConvFall is achieved. The proposed SmartConvFall consists of two parts, which are object tracking and instantaneous fall frame detection modules that rely on deep learning representations. The first stage will track the object of interest using a fully convolutional neural network (CNN) tracker. Various training configurations such as optimizer, learning rate, mini-batch size, number of training samples, and region of interest are individually evaluated to determine the best configuration to produce the best tracker model. Meanwhile, the second module goal is to determine the exact instantaneous fall frame by modeling the continuous object trajectories using the Long Short-Term Memory (LSTM) network. Similarly, the LSTM model will undergo various training configurations that cover different types of features selection and the number of stacked layers. The exact instantaneous fall frame is determined using an assumption that a large movement difference with respect to the ground level along the vertical axis can be observed if a fall incident happened. The proposed SmartConvFall is a novel technique as most of the existing methods still relying on detection rather than the tracking module. The SmartConvFall outperforms the state-of-the-art trackers, namely TCNN and MDNET-N trackers, with the highest expected average overlap, robustness, and reliability metrics of 0.1619, 0.6323, and 0.7958, respectively. The SmartConvFall also managed to produce the lowest number of tracking failures with only 43 occasions. Moreover, a three-stack LSTM delivers the lowest mean error with approximately one second delay time in locating the exact instantaneous fall frame. Therefore, the proposed SmartConvFall has demonstrated its potential and suitability to be implemented for a real-time application that could help to avoid any crucial fall consequences such as death and internal bleeding if the early treatment can be administered.

Keyphrases