• Peking University Core Journal (A Guide to the Core Journals of China, 2017 edition)
  • China Science and Technology Core Journal (statistical source journal for Chinese sci-tech papers)
  • Indexed in the JST (Japan Science and Technology Agency) database


A review of behavior recognition methods based on deep learning

YUAN Shou, QIAO Yongjun, SU Hang, CHEN Qinghua, LIU Xing

Citation: YUAN Shou, QIAO Yongjun, SU Hang, CHEN Qinghua, LIU Xing. A review of behavior recognition methods based on deep learning[J]. Microelectronics & Computer, 2022, 39(8): 1-10. doi: 10.19304/J.ISSN1000-7180.2021.1327


doi: 10.19304/J.ISSN1000-7180.2021.1327
Funding:

Military "13th Five-Year Plan" modeling and simulation system construction project 2021-HCX-MN-050014

Details
    About the authors:

    YUAN Shou, male (b. 1998), master's student. Research interests: informatized operations and decision support. E-mail: 1151190542@qq.com

    QIAO Yongjun, male (b. 1974), Ph.D., associate professor. Research interests: behavior recognition and trajectory prediction

    SU Hang, male (b. 1998), master's student. Research interest: intelligent interpretation of remote sensing images

    CHEN Qinghua, female (b. 1979), Ph.D., associate professor. Research interests: equipment informatization and information security

    LIU Xing, male (b. 1982), Ph.D., assistant engineer. Research interest: testing and diagnosis of complex electronic systems

  • CLC number: TP301.6


  • Abstract:

    Behavior recognition is a research hotspot in computer vision, with wide applications in intelligent security, intelligent surveillance, smart healthcare, and related fields. Applying deep learning, which has performed outstandingly in computer vision, to behavior recognition yields even more striking results. Compared with traditional methods based on hand-crafted feature extraction, deep-learning-based behavior recognition is faster, more robust, and more accurate; this article therefore surveys deep-learning-based video behavior recognition methods. Drawing on the latest literature published at home and abroad, it first reviews traditional behavior recognition methods and their refinements, then systematically organizes deep-learning-based methods by network architecture, next compares common recognition datasets and the performance of each algorithm on them, and finally summarizes research in this field, focusing on open problems and future prospects, in the hope of informing and assisting later researchers.
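As context for the deep methods surveyed, the core operation behind C3D-style spatiotemporal networks is a 3D convolution that slides over time as well as space. The NumPy sketch below is illustrative only (real models such as C3D stack many learned filters in a deep-learning framework); the temporal-difference kernel and toy clip are invented for the example:

```python
import numpy as np

def conv3d(video, kernel):
    """Valid 3D convolution of a single-channel video clip.

    video:  (T, H, W) stack of grayscale frames
    kernel: (t, h, w) spatio-temporal filter
    Returns a (T-t+1, H-h+1, W-w+1) feature volume.
    """
    T, H, W = video.shape
    t, h, w = kernel.shape
    out = np.empty((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(video[i:i+t, j:j+h, k:k+w] * kernel)
    return out

# A 3x3x3 temporal-difference filter responds to motion, not appearance:
clip = np.zeros((8, 16, 16))
clip[4:, :, 8:] = 1.0                       # an "object" appears at frame 4
kernel = np.zeros((3, 3, 3))
kernel[0], kernel[2] = -1.0 / 9, 1.0 / 9    # later frames minus earlier frames
features = conv3d(clip, kernel)
print(features.shape)                       # (6, 14, 14)
```

A 2D convolution applied frame by frame would give an identical response to every frame of a static scene; the temporal axis is what lets the filter above fire only where the clip changes.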

     

  • Figure 1.  Flow chart of the behavior recognition method based on hand-crafted features
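The pipeline of Figure 1 (feature extraction, feature encoding, then a classifier) can be sketched end to end on toy data. The gradient-orientation histogram and nearest-centroid classifier below are simplified stand-ins for real descriptors such as HOG/HOF and the SVM typically used; the action labels are hypothetical:

```python
import numpy as np

def orientation_histogram(frame, bins=8):
    """Hand-crafted feature: magnitude-weighted histogram of gradient orientations."""
    gy, gx = np.gradient(frame.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)                        # orientation in [-pi, pi]
    hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)               # normalized descriptor

def video_descriptor(frames):
    """Feature encoding: pool per-frame descriptors into one clip descriptor."""
    return np.mean([orientation_histogram(f) for f in frames], axis=0)

class NearestCentroid:
    """Simplified stand-in for the SVM at the end of the pipeline."""
    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.centroids_ = {c: np.mean([x for x, l in zip(X, y) if l == c], axis=0)
                           for c in self.classes_}
        return self
    def predict(self, x):
        return min(self.classes_,
                   key=lambda c: np.linalg.norm(x - self.centroids_[c]))

rng = np.random.default_rng(0)

def make_clip(vertical):
    """Toy 'action': a vertical or horizontal edge, plus a little noise."""
    f = np.zeros((16, 16))
    if vertical:
        f[:, 8:] = 1.0
    else:
        f[8:, :] = 1.0
    return [f + 0.01 * rng.standard_normal((16, 16)) for _ in range(4)]

X = [video_descriptor(make_clip(v)) for v in [True] * 5 + [False] * 5]
y = ["wave"] * 5 + ["walk"] * 5                     # hypothetical labels
clf = NearestCentroid().fit(X, y)
print(clf.predict(video_descriptor(make_clip(True))))   # → wave
```

Every stage here is fixed by hand; the deep methods surveyed later replace the descriptor and encoder with layers learned from data, which is the main source of their accuracy advantage.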

    Table 1.   Common behavior recognition datasets

    Table 2.   Performance of typical algorithms on behavior recognition datasets

    | Category | Algorithm | UCF-101/% | HMDB-51/% | Kinetics/% | NTU/% | Strengths | Weaknesses |
    |---|---|---|---|---|---|---|---|
    | Hand-crafted features | iDT | 85.9 | 57.2 | | | Stable and accurate; combines readily with deep learning | Computationally heavy and slow |
    | CNN | C3D | 82.3 | 51.6 | | | Simple structure, fast training | High learning overhead and hardware requirements |
    | | I3D | 93.4 | 66.4 | 90.0 | | | |
    | | P3D | 88.6 | | | | | |
    | | R(2+1)D | 96.3 | 74.1 | 91.4 | | | |
    | RNN | LSTM | 88.6 | 88.1 | 70.9 | | Low computational cost; sensitive to slender limb information | Vanishing gradients; limited applicability |
    | | Video-LSTM | 88.9 | 56.4 | | | | |
    | | AC-LSTM | 94.6 | 69.8 | 89.3 | 71.3 | | |
    | | SD-LSTM | 95.2 | 71.6 | 89.6 | 71.4 | | |
    | CNN+RNN | LRCN | 82.9 | | | | Rich input modalities, wide processing range | Shallow networks, prone to dropped frames |
    | Two-stream | Two-Stream | 88.0 | 59.4 | 91.3 | | High accuracy, strong extensibility | Poor temporal-scale handling and real-time performance |
    | | TSN | 94.2 | 69.4 | 72.3 | | | |
    | | STResNet | 94.6 | 70.3 | | | | |
    | Others | Chen-ShuffleNet | 96.0 | 74.6 | | | High accuracy; reduces model development and computation costs | Practical usability remains to be verified |
    | | ST-GCN | | | 52.8 | 88.3 | | |
    | | AS-GCN | | | 56.5 | 94.2 | | |
    | | AGC-LSTM | | | 89.2 | 94.2 | | |
    | | SGN | | | 94.5 | 75.3 | | |
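The two-stream family in Table 2 classifies an appearance (RGB) stream and a motion (optical-flow) stream separately and then fuses their scores. Late fusion itself is just a weighted average of per-stream class posteriors; the sketch below uses made-up logits and hypothetical class names:

```python
import numpy as np

def softmax(z):
    """Convert raw logits into a probability distribution."""
    e = np.exp(z - z.max())
    return e / e.sum()

def late_fusion(spatial_logits, temporal_logits, w=0.5):
    """Average the class posteriors of the spatial and temporal streams."""
    return w * softmax(spatial_logits) + (1 - w) * softmax(temporal_logits)

classes = ["walk", "run", "jump"]            # hypothetical action labels
spatial = np.array([2.0, 1.0, 0.1])          # RGB stream mildly favors "walk"
temporal = np.array([0.5, 3.0, 0.2])         # flow stream strongly favors "run"
fused = late_fusion(spatial, temporal)
print(classes[int(np.argmax(fused))])        # → run
```

Because the motion stream is far more confident than the appearance stream here, the fused prediction follows it; this complementarity between the two streams is what the two-stream architecture exploits.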
Publication history
  • Received: 2021-12-13
  • Revised: 2022-02-12
  • Published online: 2022-08-15
