Abstract:
To address the high computational cost of human action recognition methods based on two-dimensional convolutional neural networks (2D CNN) and three-dimensional convolutional neural networks (3D CNN), a human action recognition method based on joint-point spatio-temporal information fusion (Joint-trajectory) is proposed. Firstly, a high-resolution network (HigherHRNet) is used to extract the spatial coordinates of each human joint point in every video frame, and a row vector of the joint spatial information is constructed for each frame. Secondly, the row vectors of all frames in the video are vertically concatenated along the time dimension to obtain the spatio-temporal information fusion matrix of the video. Finally, a residual network is used to learn and classify this joint-point spatio-temporal fusion matrix. Experimental results on the KTH dataset show that the proposed method effectively reduces the complexity of human action recognition while achieving a higher recognition rate and strong robustness.
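The fusion-matrix construction described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `build_fusion_matrix` and the assumption of 17 joints (the COCO keypoint layout commonly used with HigherHRNet) are hypothetical choices for the example.

```python
import numpy as np

def build_fusion_matrix(joint_coords):
    """Flatten each frame's (x, y) joint coordinates into a row vector,
    then stack the rows along the time axis into a T x (2J) matrix."""
    rows = [np.asarray(frame, dtype=np.float32).reshape(-1)
            for frame in joint_coords]
    return np.vstack(rows)

# Illustrative input: T = 4 frames, J = 17 joints, 2 coordinates per joint.
T, J = 4, 17
coords = np.random.rand(T, J, 2)

M = build_fusion_matrix(coords)
print(M.shape)  # (4, 34): one row per frame, 2J columns per row
```

The resulting matrix can then be fed to a 2D classifier such as a residual network, which is the source of the claimed computational saving relative to running a 3D CNN over raw video frames.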