HUANG Jingui, HUANG Yiju. Video prediction based on attention spatiotemporal decoupling 3D convolution LSTM[J]. Microelectronics & Computer, 2022, 39(9): 63-72. DOI: 10.19304/J.ISSN1000-7180.2022.0023

Video prediction based on attention spatiotemporal decoupling 3D convolution LSTM

  • To extract spatio-temporal features from video efficiently and thereby improve video prediction accuracy, an attention-based spatio-temporal decoupling 3D convolutional LSTM algorithm is proposed. First, the conventional 2D convolution inside the convolutional LSTM unit is replaced with 3D convolution, which additionally extracts short-term spatial motion information between video frames, while an attention mechanism automatically captures the correlation of long-term dynamic information between frames. Because the zigzag path along which feature information passes through all layers of a convolutional LSTM network causes vanishing gradients, inter-layer highway channels are added to the network structure to optimize the flow of video information between LSTM units in different layers. In addition, temporal and spatial features in the network interfere with each other and lead to redundant features being learned, which makes feature acquisition inefficient and degrades prediction quality; a decoupling operation is therefore added to the loss function to separate the learning of temporal and spatial features. For data input in the training (encoding) phase and the prediction (decoding) phase, an input resampling scheme is proposed that reduces the discrepancy between the encoder and the decoder by applying analogous but opposite input strategies in the two phases. Experimental results on synthetic datasets and human action datasets show that the proposed model achieves better performance in spatio-temporal feature extraction.
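
The abstract does not give the form of the decoupling term, so the following is only a minimal sketch of one common way to realize such a term: penalizing the absolute cosine similarity between paired temporal and spatial feature vectors and adding that penalty to the prediction loss. The function names (`abs_cosine`, `decoupling_penalty`, `total_loss`), the cosine-based penalty itself, and the `weight` value are assumptions made for illustration, not the authors' formulation.

```python
import math
from typing import Sequence


def abs_cosine(u: Sequence[float], v: Sequence[float], eps: float = 1e-8) -> float:
    """Absolute cosine similarity between two flattened feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # eps guards against division by zero for all-zero vectors.
    return abs(dot) / (norm_u * norm_v + eps)


def decoupling_penalty(
    temporal_feats: Sequence[Sequence[float]],
    spatial_feats: Sequence[Sequence[float]],
) -> float:
    """Mean absolute cosine similarity over paired (temporal, spatial) vectors.

    The value is near 0 when the paired features are orthogonal (decoupled)
    and near 1 when they point in the same or opposite direction (redundant).
    """
    if len(temporal_feats) != len(spatial_feats):
        raise ValueError("temporal and spatial feature lists must have equal length")
    if not temporal_feats:
        return 0.0
    pairs = zip(temporal_feats, spatial_feats)
    return sum(abs_cosine(t, s) for t, s in pairs) / len(temporal_feats)


def total_loss(
    prediction_loss: float,
    temporal_feats: Sequence[Sequence[float]],
    spatial_feats: Sequence[Sequence[float]],
    weight: float = 0.1,  # hypothetical weighting, not taken from the paper
) -> float:
    """Prediction loss plus the weighted decoupling penalty."""
    return prediction_loss + weight * decoupling_penalty(temporal_feats, spatial_feats)
```

In a real network the two feature lists would be flattened activations taken from the temporal and spatial branches of each LSTM unit, and the penalty would be computed with a differentiable tensor library so that it can be minimized jointly with the frame prediction loss.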
