Human motion recognition based on key frame two-stream convolutional network

Authors: ZHANG Congcong, HE Ning

Funding: National Natural Science Foundation of China (61872042, 61572077); Joint Key Project of the Beijing Natural Science Foundation and the Beijing Municipal Education Commission (KZ201911417048)

    Abstract:

    Aiming at the problems of large information redundancy and low accuracy in human motion recognition from video sequences, a human motion recognition method based on a key frame two-stream convolutional network is proposed. The method constructs a network framework consisting of three modules: feature extraction, key frame extraction, and spatial-temporal feature fusion. First, the single-frame RGB image from the spatial domain and the optical flow image stacked from multiple frames in the temporal domain are fed into the VGG16 network model to extract deep features of the video. Second, key frames are extracted: the importance of each video frame is continuously predicted, frames carrying sufficient information are pooled and fed to the neural network for training, and key frames are thereby selected while redundant frames are discarded. Finally, the Softmax outputs of the two models are weighted and fused as the output, yielding a multi-model fusion human motion recognizer that realizes key frame processing of the video and full utilization of the spatial-temporal information of the action. Experimental results on the public UCF-101 dataset show that, compared with current mainstream methods for human motion recognition, the proposed method achieves a higher recognition rate while relatively reducing network complexity.

Cite this article:

ZHANG Congcong, HE Ning. Human motion recognition based on key frame two-stream convolutional network[J]. Journal of Nanjing University of Information Science & Technology (Natural Science Edition), 2019, 11(6): 716-721

History
  • Received: 2019-10-07
  • Published online: 2020-01-19

Address: 219 Ningliu Road, Nanjing, Jiangsu Province    Postcode: 210044

Tel: 025-58731025    E-mail: nxdxb@nuist.edu.cn

Journal of Nanjing University of Information Science & Technology ® 2025 All rights reserved