YU Jun, TAN Min, ZHANG Hongyuan, ZHANG Haichao
2017, 9(6):567-574. DOI: 10.13878/j.cnki.jnuist.2017.06.001
Abstract: In recent years, fine-grained image recognition has become a hot topic in computer vision. Because the visual differences among image categories are subtle and the semantic gap is severe, traditional image recognition algorithms mostly perform unsatisfactorily on fine-grained images. To overcome these challenges, many researchers have turned to image recognition with user click data. This paper focuses on the three key modules of a fine-grained recognition system based on user click data: data pre-processing, feature extraction, and model construction. Existing algorithms for click-data-based image recognition are summarized, and the latest related progress is presented.
ZHU Muyijie, BAO Bingkun, XU Changsheng
2017, 9(6):575-582. DOI: 10.13878/j.cnki.jnuist.2017.06.002
Abstract: Knowledge graph technology has attracted wide attention and study in recent years. This paper introduces the construction methods and recent development of knowledge graphs in detail, and summarizes their interdisciplinary applications and future research directions. It details the key technologies of textual, visual, and multi-modal knowledge graphs, such as information extraction, knowledge fusion, and knowledge representation. As an important part of knowledge engineering, the knowledge graph, and especially the development of the multi-modal knowledge graph, is of great significance for efficient knowledge management, knowledge acquisition, and knowledge sharing in the era of big data.
JIN Zhiwei, CAO Juan, WANG Bo, WANG Rui, ZHANG Yongdong
2017, 9(6):583-592. DOI: 10.13878/j.cnki.jnuist.2017.06.003
Abstract: Social media, such as microblogs, has developed rapidly and accelerates information diffusion on the Internet. However, numerous false rumors fostered on social media spread widely across social networks and can have serious consequences. Automatically detecting rumors on social media has become a major concern in both research and industry. Focusing on the rumor detection task, this paper summarizes multimodal fusion approaches to the problem. Starting from the basic concepts, we give formal definitions of rumors and introduce the characteristics of social media. We divide the studies on rumor detection into two major parts: extracting effective multimodal features to identify rumors, and constructing robust models to detect rumors. For each research aspect, we give a detailed introduction based on existing studies. This paper can serve as basic guidance for building state-of-the-art rumor detection models and as a reference for future research.
DENG Yingying, TANG Fan, DONG Weiming
2017, 9(6):593-598. DOI: 10.13878/j.cnki.jnuist.2017.06.004
Abstract: In recent years, image artistic style transfer has become a flourishing research field. Scientific challenges and industrial needs have driven more and more activity in this field, which makes image artistic style transfer worth studying. In this paper, we analyze the current state of image artistic style transfer, the characteristics of different style transfer methods, the shortcomings of current methods, and the development trend of image style transfer. Finally, we suggest directions for further style transfer research.
QIAN Shengsheng, ZHANG Tianzhu, XU Changsheng
2017, 9(6):599-612. DOI: 10.13878/j.cnki.jnuist.2017.06.005
Abstract: In recent years, with the rapid development of the Internet, more and more social networking sites have appeared that allow users to conveniently share their ideas, pictures, posts, and activities. As a result, when a popular event happens, it can spread very fast across different social media sites, together with substantial amounts of multimedia data including images, videos, and texts. It is therefore important to study multimedia social event analysis in order to automatically understand how social events evolve over time. This paper surveys and summarizes major progress in multimedia social event analysis. We focus on four areas: (1) multimedia social event representation; (2) multimedia social event detection and tracking; (3) multimedia social event evolutionary analysis; and (4) multimedia social event topic-opinion analysis. The development trend of multimedia social event analysis is then highlighted. Finally, possible future research topics in multimedia social event analysis are discussed.
ZHOU Wengang, LI Houqiang, TIAN Qi
2017, 9(6):613-634. DOI: 10.13878/j.cnki.jnuist.2017.06.006
Abstract: The explosive increase and ubiquitous accessibility of visual data on the Web have led to a surge of research activity in image search and retrieval. Because they ignore visual content as a ranking clue, methods that apply text search techniques to visual retrieval may suffer from inconsistency between the text words and the visual content. Content-based image retrieval (CBIR), which uses representations of visual content to identify relevant images, has attracted sustained attention over the past two decades. The problem is challenging because of the intention gap and the semantic gap. Numerous techniques have been developed for content-based image retrieval in the last decade. The purpose of this paper is to categorize and evaluate the algorithms proposed from 2003 to 2016. We conclude with several promising directions for future research.
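As a rough illustration of the core CBIR idea named in the abstract above (ranking images by the similarity of their visual representations rather than by text), the following minimal sketch ranks a toy database by cosine similarity. The feature vectors, the image ids, and the choice of cosine similarity are assumptions made for illustration only; the survey covers many representations and similarity measures and does not prescribe this one.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_images(query, database):
    """Return image ids sorted from most to least similar to the query."""
    scored = [(cosine_similarity(query, feat), image_id)
              for image_id, feat in database.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored]


# Hypothetical 3-dimensional features; real systems use much larger descriptors.
database = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.9, 0.1, 0.0],
    "img_c": [0.0, 1.0, 0.0],
}
ranking = rank_images([1.0, 0.0, 0.0], database)
```

Here `ranking` lists the ids in decreasing order of similarity to the query vector.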
MEI Shuhuan, MIN Weiqing, LIU Linhu, DUAN Hua, JIANG Shuqiang
2017, 9(6):635-641. DOI: 10.13878/j.cnki.jnuist.2017.06.007
Abstract: Automatic understanding of food images has applications in various fields, such as food intake monitoring and food calorie estimation. Food-related tasks such as food image retrieval and classification have therefore recently become a hot research topic in multimedia analysis and applications. Existing methods mainly extract visual features from the whole food image for further analysis. The extracted features lack robustness because of background interference in the images. To solve this problem, we propose a food retrieval and classification method based on Faster R-CNN (Region-based Convolutional Neural Network). We first detect food candidate regions using Faster R-CNN and then use a CNN to extract visual features from the detected food regions. Features extracted this way are more discriminative because background interference is reduced. Furthermore, we select annotated food images from the Visual Genome dataset to fine-tune Faster R-CNN to guarantee its performance. We conduct experiments on two datasets: Food-101, with 101 classes and 10 641 food images, and Dish-233, with 233 dishes and 49 168 images. Extensive evaluation demonstrates the effectiveness of the proposed Faster R-CNN based food visual feature extraction method in food image retrieval and classification.
HUANG Yi, BAO Bingkun, XU Changsheng
2017, 9(6):642-649. DOI: 10.13878/j.cnki.jnuist.2017.06.008
Abstract: Video description has received increasing interest in computer vision. Generating video descriptions requires natural language processing techniques and the ability to handle variable-length input (a sequence of video frames) and output (a sequence of description words). To this end, this paper draws on recent advances in machine translation and designs a two-layer LSTM (Long Short-Term Memory) model based on the encoder-decoder architecture. Since deep neural networks can learn appropriate representations of input data, we extract feature vectors from the video frames with a convolutional neural network (CNN) and use them as the input sequence of the LSTM model. Finally, we compare the influence of different feature extraction methods on the LSTM video description model. The results show that the proposed model can learn to transform a sequence of knowledge representations into natural language.
JIA Huimiao, LI Chunping, ZHOU Dengwen
2017, 9(6):650-655. DOI: 10.13878/j.cnki.jnuist.2017.06.009
Abstract: To accurately restore texture on oblique edges and improve the overall resolution of the demosaiced image, a convolutional neural network demosaicing algorithm based on residual interpolation is proposed. The algorithm uses the information in the Bayer color filter array to calculate the gradients of diagonal edges, which are used to determine the edge directions. A corresponding interpolation formula is then proposed for each edge type. We incorporate convolutional neural networks into our method to refine the interpolated images. To evaluate the proposed algorithm, several experiments were conducted on the IMAX dataset. The experimental results show that the proposed algorithm achieves better visual quality, higher PSNR, and shorter running time than commonly used Bayer demosaicing algorithms.
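The abstract above reports results in terms of PSNR. For readers unfamiliar with the metric, the sketch below computes the standard peak signal-to-noise ratio, 10 log10(peak^2 / MSE), between a reference image and a reconstructed one. The image layout (lists of rows of grayscale values) and the default peak of 255 are assumptions for illustration; the paper's exact evaluation protocol is not given in the abstract.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """PSNR in dB between two equally sized grayscale images (lists of rows)."""
    squared_error = 0.0
    count = 0
    for ref_row, est_row in zip(reference, estimate):
        for ref_px, est_px in zip(ref_row, est_row):
            squared_error += (ref_px - est_px) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0.0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(peak * peak / mse)


# Hypothetical 2x2 images used only to exercise the function.
reference = [[10, 20], [30, 40]]
estimate = [[12, 18], [30, 41]]
score = psnr(reference, estimate)
```

A higher value means the estimate is closer to the reference; identical images give an infinite PSNR.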
LIU Zhonggeng, LIAN Zhichao, FENG Changju
2017, 9(6):656-660. DOI: 10.13878/j.cnki.jnuist.2017.06.010
Abstract: Multiple object tracking (MOT) algorithms fail when the target is occluded or in fast motion, and they cannot recover from drifting. To solve these problems, we first employ integrated information, including the target's appearance, shape, and motion, to enhance the representation of objects. With this integrated information, we can accurately calculate a similarity that is as high as possible for the same target and as low as possible for different targets. Second, we propose a novel real-time single object tracker based on the combination of discriminative correlation filters (DCF) and Kalman filters, which is robust to occlusion and fast motion. Extensive experiments show that the proposed MOT algorithm can accurately track targets under occlusion or fast motion in real time.
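To make the role of the Kalman filter in such a tracker more concrete, the sketch below shows a one-dimensional constant-velocity Kalman filter that keeps predicting a target position while no measurement is available, as would happen during occlusion. It covers only the Kalman part, not the DCF component or the paper's actual tracker; the class name, the constant-velocity motion model, and all noise values are assumptions chosen for illustration.

```python
class ConstantVelocityKalman1D:
    """1-D Kalman filter with state [position, velocity] (assumed motion model)."""

    def __init__(self, process_var=0.01, measurement_var=1.0, initial_var=100.0):
        self.pos = 0.0
        self.vel = 0.0
        # 2x2 covariance stored as four scalars: [[p00, p01], [p10, p11]].
        self.p00, self.p01 = initial_var, 0.0
        self.p10, self.p11 = 0.0, initial_var
        self.q = process_var      # assumed process noise added to each diagonal term
        self.r = measurement_var  # assumed measurement noise variance

    def predict(self, dt=1.0):
        """Propagate state and covariance one step: x = F x, P = F P F^T + Q."""
        self.pos += dt * self.vel
        p00 = self.p00 + dt * (self.p10 + self.p01) + dt * dt * self.p11 + self.q
        p01 = self.p01 + dt * self.p11
        p10 = self.p10 + dt * self.p11
        p11 = self.p11 + self.q
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11
        return self.pos

    def update(self, measured_pos):
        """Correct the prediction with a position measurement (H = [1, 0])."""
        innovation = measured_pos - self.pos
        s = self.p00 + self.r
        k0 = self.p00 / s
        k1 = self.p10 / s
        self.pos += k0 * innovation
        self.vel += k1 * innovation
        p00 = (1.0 - k0) * self.p00
        p01 = (1.0 - k0) * self.p01
        p10 = self.p10 - k1 * self.p00
        p11 = self.p11 - k1 * self.p01
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11


# Hypothetical target moving 2 units per frame, observed for 10 frames.
kf = ConstantVelocityKalman1D()
for frame in range(10):
    kf.predict()
    kf.update(2.0 * frame)

# Three occluded frames: no update is available, so only predict() is called.
occluded = [kf.predict() for _ in range(3)]
```

During the occluded frames the filter extrapolates along the learned velocity, which is what lets a tracker re-acquire the target near the predicted location once it reappears.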
2017, 9(6):661-668. DOI: 10.13878/j.cnki.jnuist.2017.06.011
Abstract: With the development of big data and social networks, electronic albums and online services have become basic uses of computers and the Internet. In recent years especially, the number of electronic albums has exploded with the popularity of social networks, so improving the user experience of music albums has become particularly important. A photo album on a certain topic usually carries some emotional information. This paper studies the automatic generation of family music albums based on multi-modal fusion, so that users can enjoy music with a matching emotion while browsing album photos. According to the emotions in music and images, representative sentence-level features are selected for both music and images, and LPP (Locality Preserving Projection) is employed to learn the relevance between music and images with the same emotion. The image features and the music features are mapped into a latent space with greater emotion classification ability to realize the automatic generation of music albums. In the experiments, the objective evaluation shows that the LPP method achieves higher precision than the pure CCA (Canonical Correlation Analysis) method; in the subjective evaluation, the proposed LPP method achieves a satisfaction level of 72.06%, which is close to that of the manually recommended approach (78.09%) and higher than those of the randomly recommended approach and the pure CCA approach.
LI Chunping, ZHOU Dengwen, JIA Huimiao
2017, 9(6):669-674. DOI: 10.13878/j.cnki.jnuist.2017.06.012
Abstract: Although super-resolution (SR) reconstruction algorithms based on Convolutional Neural Networks (CNN) have achieved great success, they cannot reconstruct the high-frequency texture of the image well. As a result, obvious jitter appears along local edges of the high-resolution (HR) image. We present an edge-guided dual-channel CNN SR reconstruction algorithm integrated with Morphological Component Analysis (MCA). The low-resolution (LR) image to be processed is decomposed by MCA into a texture part and a structure part; the texture part and the original LR image together form a dual-channel input, which is fed into the modified network structure to reconstruct the HR texture part. The reconstruction losses of both the HR image and the HR texture are used simultaneously for training. As a post-processing step, we perform histogram matching between the network output and the LR input to strengthen the visual effect, and apply iterative back projection refinement to improve the PSNR. Experimental results show that this method with dual-channel input can restore the texture details of images, especially images with rich texture.
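The iterative back projection refinement mentioned in the abstract above can be illustrated on a one-dimensional signal: the current HR estimate is degraded to LR, compared with the observed LR input, and the residual is projected back onto the HR estimate. The sketch below is a simplified instance of that idea, not the paper's implementation; the degradation model (averaging neighbouring pairs), the back-projection kernel (sample repetition), and the function names are assumptions.

```python
def downsample(signal):
    """Average each pair of neighbouring samples (assumed degradation model)."""
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal) - 1, 2)]


def upsample(signal):
    """Repeat every sample twice (assumed back-projection kernel)."""
    out = []
    for sample in signal:
        out.extend([sample, sample])
    return out


def iterative_back_projection(hr_estimate, lr_observed, iterations=5):
    """Refine an HR estimate so that its degraded version matches the LR input."""
    hr = list(hr_estimate)
    for _ in range(iterations):
        simulated_lr = downsample(hr)
        residual = [obs - sim for obs, sim in zip(lr_observed, simulated_lr)]
        correction = upsample(residual)
        hr = [h + c for h, c in zip(hr, correction)]
    return hr


# Hypothetical signals: the HR estimate has twice as many samples as the LR input.
lr_observed = [10.0, 20.0]
hr_initial = [8.0, 9.0, 21.0, 25.0]
hr_refined = iterative_back_projection(hr_initial, lr_observed)
```

After refinement, degrading `hr_refined` reproduces `lr_observed`, while the sample-to-sample detail of the initial estimate is preserved.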
XIE Dawei, LIU Yujuan, LEI Ting
2017, 9(6):675-680. DOI: 10.13878/j.cnki.jnuist.2017.06.013
Abstract: To address the problem of excessive short-circuit current in power grid systems, we analyzed the application of various short-circuit current limiting measures, including switching the system operating mode, improving the power grid structure, using high-impedance transformers, and adding fault current limiters. Based on simulation and comparison of these measures, this paper puts forward the most reasonable short-circuit current limiting measures according to the actual situation of the power grid.
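Several of the measures listed in the abstract above (high-impedance transformers, fault current limiters) work by increasing the impedance seen by the fault. A back-of-the-envelope sketch of that effect, using the textbook relation I = V_LL / (sqrt(3) * X) for a symmetrical three-phase fault on a purely reactive source, is given below. The voltage level and reactance values are hypothetical and are not taken from the paper, whose results rest on grid simulation rather than this simplified formula.

```python
import math


def short_circuit_current_ka(line_voltage_kv, reactance_ohm):
    """Symmetrical three-phase fault current in kA for a purely reactive source."""
    return line_voltage_kv / (math.sqrt(3) * reactance_ohm)


# Hypothetical 220 kV bus with 2.0 ohm equivalent source reactance.
base_ka = short_circuit_current_ka(220.0, 2.0)

# Same bus with an assumed 0.5 ohm series limiting reactance added.
limited_ka = short_circuit_current_ka(220.0, 2.0 + 0.5)

reduction_ratio = limited_ka / base_ka
```

With these assumed numbers the fault current falls from roughly 63.5 kA to roughly 50.8 kA, that is, in proportion to the increase in total reactance.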