Citation: ZHAO Kang, LI Xiangfeng, LI Gaoyang, ZUO Dunwen. Dynamic gesture recognition based on lightweight (2+1)D convolution structure[J]. Microelectronics & Computer, 2022, 39(9): 46-54. DOI: 10.19304/J.ISSN1000-7180.2022.0115

Dynamic gesture recognition based on lightweight (2+1)D convolution structure

Abstract: Dynamic gesture recognition based on convolutional neural networks has made great progress, but these models carry a large number of parameters, their computation and memory costs are high, and they are difficult to deploy where device resources are limited. To reduce computation and parameter count, a lightweight (2+1)D convolution structure is proposed. Building on the (2+1)D convolution structure, the 3D convolution within it is replaced by a 3D depthwise separable convolution, which further reduces the computation and parameter count of the (2+1)D structure while keeping the output vector dimensions unchanged. To compensate for the limitations of spatio-temporal features in representing dynamic gestures, an attention mechanism module dedicated to extracting motion features is integrated; combined with the spatio-temporal features extracted by the lightweight (2+1)D convolution structure, gesture actions can be represented more effectively. Experimental results show that inserting the attention mechanism module further improves recognition accuracy without adding much extra computation or memory cost. The model built on these structures achieves recognition accuracies of 96.62%, 91.83%, and 60.1% on the 20BN-jester, EgoGesture, and IsoGD datasets, respectively, with 5.05 M parameters and 12.81 GFLOPs, greatly reducing computation cost and memory footprint compared with other gesture recognition models, and runs real-time gesture recognition at 70 frames per second.
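To make the block described in the abstract more concrete, the following is a minimal PyTorch sketch of a lightweight (2+1)D unit in which the temporal 3D convolution is implemented as a 3D depthwise separable convolution. The layer ordering, kernel sizes, channel counts, and the `LightweightR21D` name are illustrative assumptions, not the authors' exact implementation, and the attention module is omitted.

```python
# Minimal sketch (not the authors' code) of a lightweight (2+1)D block:
# a 1 x k x k spatial convolution followed by a temporal stage built as a
# depthwise k x 1 x 1 convolution plus a pointwise 1 x 1 x 1 convolution,
# i.e. a 3D depthwise separable convolution.
import torch
import torch.nn as nn


class LightweightR21D(nn.Module):
    def __init__(self, in_channels, out_channels, mid_channels=None):
        super().__init__()
        # Intermediate channel count is an assumption; the paper keeps the
        # output dimension unchanged relative to a plain (2+1)D block.
        mid_channels = mid_channels or out_channels

        # Spatial convolution: 1 x 3 x 3 kernel over (T, H, W).
        self.spatial = nn.Sequential(
            nn.Conv3d(in_channels, mid_channels,
                      kernel_size=(1, 3, 3), padding=(0, 1, 1), bias=False),
            nn.BatchNorm3d(mid_channels),
            nn.ReLU(inplace=True),
        )

        # Temporal stage as a depthwise separable 3D convolution:
        # depthwise 3 x 1 x 1 conv (groups = channels) + pointwise 1 x 1 x 1 conv.
        self.temporal = nn.Sequential(
            nn.Conv3d(mid_channels, mid_channels,
                      kernel_size=(3, 1, 1), padding=(1, 0, 0),
                      groups=mid_channels, bias=False),
            nn.Conv3d(mid_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm3d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):  # x: (N, C, T, H, W)
        return self.temporal(self.spatial(x))


if __name__ == "__main__":
    clip = torch.randn(1, 3, 16, 112, 112)   # one 16-frame RGB clip
    block = LightweightR21D(3, 64)
    print(block(clip).shape)                  # torch.Size([1, 64, 16, 112, 112])
```

With this factorization, the temporal stage costs roughly C·k + C·C′ weights instead of the C·C′·k of a full temporal 3D convolution, which is the kind of parameter and FLOP saving the abstract refers to.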

     
