Human motion recognition based on key frame two-stream convolutional network
Author: ZHANG Congcong, HE Ning
Abstract:

To address the high information redundancy and low accuracy of human motion recognition in video sequences, we propose a human motion recognition method based on a key-frame two-stream convolutional network. The framework consists of three modules: feature extraction, key frame extraction, and spatial-temporal feature fusion. First, single RGB frames from the spatial domain and stacks of multi-frame optical flow images from the temporal domain are fed into VGG16 network models to extract deep video features. Second, the importance of each video frame is predicted continuously; informative frames are pooled and a neural network is trained to select key frames and discard redundant ones. Finally, the Softmax outputs of the two streams are combined by weighted fusion to produce the recognition result. The resulting multi-model human motion recognizer performs key frame processing of the video and makes full use of the spatial-temporal information of the action. Experimental results on the public UCF-101 dataset show that, compared with mainstream human motion recognition methods, the proposed method achieves a higher recognition rate while reducing network complexity.
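    The pipeline described in the abstract (VGG16 features from single RGB frames and stacked optical flow, importance-based key frame selection, and weighted fusion of the two Softmax outputs) can be sketched roughly as below. This is a minimal illustrative sketch, not the authors' implementation: the class name TwoStreamKeyFrameNet, the linear importance head, the top-k key frame selection with mean pooling, and the fusion weight spatial_weight are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16


class TwoStreamKeyFrameNet(nn.Module):
    """Illustrative sketch of a key-frame two-stream network:
    a spatial stream over single RGB frames, a temporal stream over a
    stack of optical-flow images, a per-frame importance score used to
    keep only informative (key) frames, and a weighted fusion of the
    two Softmax outputs. Hyperparameters and the importance head are
    assumptions, not the authors' exact design."""

    def __init__(self, num_classes=101, flow_channels=20,
                 num_keyframes=5, spatial_weight=0.4):
        super().__init__()
        # Spatial stream: standard VGG16 on 3-channel RGB frames.
        self.spatial = vgg16(weights=None)
        self.spatial.classifier[6] = nn.Linear(4096, num_classes)

        # Temporal stream: VGG16 whose first convolution accepts a stack of
        # flow_channels optical-flow maps (e.g. 10 frames x 2 directions).
        self.temporal = vgg16(weights=None)
        self.temporal.features[0] = nn.Conv2d(flow_channels, 64,
                                              kernel_size=3, padding=1)
        self.temporal.classifier[6] = nn.Linear(4096, num_classes)

        # Hypothetical importance head: one scalar score per RGB frame.
        self.importance = nn.Linear(4096, 1)

        self.num_keyframes = num_keyframes
        self.spatial_weight = spatial_weight  # assumed weight of the spatial stream

    def _frame_descriptors(self, frames):
        # frames: (T, 3, 224, 224) RGB frames sampled from one video.
        x = self.spatial.features(frames)
        x = self.spatial.avgpool(x)
        x = torch.flatten(x, 1)
        # The first six classifier layers of VGG16 yield 4096-d descriptors.
        return self.spatial.classifier[:6](x)

    def forward(self, frames, flow):
        # --- Key frame selection in the spatial stream ---
        feats = self._frame_descriptors(frames)        # (T, 4096)
        scores = self.importance(feats).squeeze(-1)    # (T,) frame importance
        k = min(self.num_keyframes, feats.size(0))
        keep = torch.topk(scores, k).indices           # keep key frames, drop the rest
        key_feats = feats[keep].mean(dim=0, keepdim=True)
        spatial_logits = self.spatial.classifier[6](key_feats)

        # --- Temporal stream on the stacked optical flow ---
        # flow: (flow_channels, 224, 224) for one video clip.
        temporal_logits = self.temporal(flow.unsqueeze(0))

        # --- Weighted fusion of the two Softmax outputs ---
        p_spatial = F.softmax(spatial_logits, dim=1)
        p_temporal = F.softmax(temporal_logits, dim=1)
        return self.spatial_weight * p_spatial + (1 - self.spatial_weight) * p_temporal
```

    In practice the two streams would typically be pretrained and fine-tuned separately, and the fusion weight chosen on a validation split; those training details are outside this sketch.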

Get Citation

ZHANG Congcong, HE Ning. Human motion recognition based on key frame two-stream convolutional network[J]. Journal of Nanjing University of Information Science & Technology, 2019, 11(6): 716-721

History
  • Received: October 07, 2019
  • Online: January 19, 2020