Video description based on relationship feature embedding
Author: HUANG Yi, BAO Bingkun, XU Changsheng
Abstract:

Video description has received increasing interest in the field of computer vision. Generating video descriptions requires natural language processing techniques and the capacity to handle variable lengths for both the input (a sequence of video frames) and the output (a sequence of description words). To this end, this paper draws on recent advances in machine translation and designs a two-layer LSTM (Long Short-Term Memory) model based on the encoder-decoder architecture. Since deep neural networks can learn appropriate representations of input data, we extract feature vectors from the video frames with a convolutional neural network (CNN) and feed them to the LSTM model as the input sequence. Finally, we compare the influence of different feature extraction methods on the LSTM video description model. The results show that the proposed model learns to transform a sequence of knowledge representations into natural language.
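The CNN-feature → encoder-decoder pipeline in the abstract can be sketched minimally as below. This is an illustrative simplification, not the paper's exact model: the weights are random (untrained), the two layers are run as a sequential encode-then-decode pass rather than the stacked two-layer arrangement, and all dimensions (`feat_dim`, `hidden`, `vocab`, the 8-frame input) are hypothetical placeholders. It only demonstrates the key point the abstract makes: one LSTM cell can consume a variable-length frame-feature sequence and emit a variable-length word sequence.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gates stacked in the weights in the order i, f, o, g."""
    n = h.size
    z = W @ x + U @ h + b
    i, f, o = sigmoid(z[:n]), sigmoid(z[n:2*n]), sigmoid(z[2*n:3*n])
    g = np.tanh(z[3*n:])
    c_new = f * c + i * g          # update cell state
    h_new = o * np.tanh(c_new)     # expose gated hidden state
    return h_new, c_new

feat_dim, hidden, vocab = 64, 32, 10   # hypothetical sizes

def init(in_dim):
    # Stacked weights for the four gates of one LSTM layer.
    return (rng.normal(0, 0.1, (4 * hidden, in_dim)),
            rng.normal(0, 0.1, (4 * hidden, hidden)),
            np.zeros(4 * hidden))

enc_W, enc_U, enc_b = init(feat_dim)          # layer reading CNN frame features
dec_W, dec_U, dec_b = init(vocab)             # layer emitting word ids
W_out = rng.normal(0, 0.1, (vocab, hidden))   # hidden state -> vocabulary logits

def describe(frame_feats, max_words=5):
    h = c = np.zeros(hidden)
    for x in frame_feats:                      # encode: variable-length frame input
        h, c = lstm_step(x, h, c, enc_W, enc_U, enc_b)
    words, prev = [], np.zeros(vocab)
    for _ in range(max_words):                 # decode: variable-length word output
        h, c = lstm_step(prev, h, c, dec_W, dec_U, dec_b)
        w = int(np.argmax(W_out @ h))          # greedy word choice
        words.append(w)
        prev = np.eye(vocab)[w]                # feed back one-hot of chosen word
    return words

frames = rng.normal(size=(8, feat_dim))        # 8 frames of CNN feature vectors
caption_ids = describe(frames)                 # a list of max_words word ids
```

In a trained model the decoder would stop at an end-of-sentence token instead of a fixed `max_words`, and the CNN features would come from a real network rather than random vectors.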

Get Citation

HUANG Yi, BAO Bingkun, XU Changsheng. Video description based on relationship feature embedding[J]. Journal of Nanjing University of Information Science & Technology, 2017, 9(6): 642-649

History
  • Received: August 28, 2017
  • Online: November 25, 2017
