Visual action recognition using deep learning in video surveillance systems

D Kumar, T Priyanka, A Murugesh, VP Kafle
2020 ITU Kaleidoscope: Industry-Driven Digital Transformation (ITU K), 2020. ieeexplore.ieee.org
Skeleton tracking makes the skeleton information of human-like objects available for action recognition. The major challenge for action recognition in a video surveillance system is the large variability across and within subjects. In this paper, we propose a novel deep-learning-based framework that recognizes human actions using skeleton estimation. The main component of the framework is pose estimation using a stacked hourglass network (HGN), which provides the skeleton joint points of humans. Since the position of the skeleton varies with the point of view, we apply transformations to the skeleton points to make them invariant to rotation and position. The skeleton joint positions are identified using HGN-based deep neural networks (HGN-DNN), and feature extraction and classification are then carried out to obtain the action class. The skeleton action sequence is encoded using a Fisher Vector before classification. The proposed system complies with Recommendation ITU-T H.626.5, "Architecture for intelligent visual surveillance systems", and has been evaluated on benchmark human action recognition data sets. The evaluation results show that the system achieves a precision of 85% and an accuracy of 95.6% in recognizing actions such as wave, punch, and kick. The HGN-DNN model meets the requirements and service description specified in Recommendation ITU-T F.743.
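
The abstract does not say which transformations are applied to the skeleton points. As an illustration only, the following minimal sketch shows one common way to make a 2D skeleton invariant to position and in-plane rotation: translate a root joint to the origin, then rotate so that the root-to-neck axis points along +y. The function name `normalize_skeleton`, the joint indices `root` and `neck`, and the choice of the torso axis as the reference direction are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def normalize_skeleton(joints, root=0, neck=1):
    """Return a copy of `joints` (shape (J, 2)) with the root joint at the
    origin and the root-to-neck axis aligned with +y.

    `root` and `neck` are hypothetical joint indices chosen for this sketch.
    """
    joints = np.asarray(joints, dtype=float)

    # Position invariance: express every joint relative to the root joint.
    centered = joints - joints[root]

    # Rotation invariance: find the angle that maps the torso axis onto +y.
    axis = centered[neck]
    if np.linalg.norm(axis) == 0.0:
        return centered  # degenerate pose: no reference direction available

    theta = np.arctan2(axis[0], axis[1])  # angle of the axis measured from +y
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s],
                    [s,  c]])  # counterclockwise rotation by theta

    # Apply the rotation to every joint (rows are points).
    return centered @ rot.T
```

With this normalization, two copies of the same pose that differ only by a shift and an in-plane rotation map to the same coordinates, so downstream feature extraction sees the pose rather than the camera placement.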