DENG Xuran, MIN Shaobo, XU Jingyuan, LI Pandeng, XIE Hongtao, ZHANG Yongdong
2019, 11(6):625-637. DOI: 10.13878/j.cnki.jnuist.2019.06.001
Abstract: Fine-grained image classification is a fundamental and important task in the field of computer vision. Its purpose is to distinguish between object categories that have subtle inter-class differences (e.g., different sub-categories of birds, flowers, or animals). Unlike traditional image classification tasks, which can employ large numbers of non-expert annotators, fine-grained image classification usually requires expert-level knowledge. In addition to the common classification challenges of pose, lighting, and viewpoint changes, fine-grained datasets have larger inter-class similarity and intra-class variability. This places high demands on models to capture both the subtle visual differences between classes and the characteristics shared within a class. Furthermore, owing to the difficulty of obtaining samples for some categories, fine-grained datasets suffer from the long-tail distribution problem. In summary, fine-grained data are small in scale, non-uniformly distributed, and have inter-class differences that are hard to distinguish, which poses a huge challenge even to powerful deep learning algorithms. In this paper, we first introduce the formulation and challenges of fine-grained visual categorization tasks, and then describe the two mainstream families of methods, based on local features and on global features, together with their advantages and disadvantages. Finally, we compare the performance of related works on commonly used datasets, summarize the field, and discuss future directions.
2019, 11(6):638-650. DOI: 10.13878/j.cnki.jnuist.2019.06.002
Abstract: Visual object tracking has long been a fundamental building block in the field of computer vision. Object tracking scenarios include single object tracking and multi-object tracking. In this work, we present a comprehensive and up-to-date review of single object tracking. First, we review the algorithms developed in recent decades. Then, the existing approaches to single object tracking are divided into categories, and each category is discussed in detail in terms of its principles, representative models, advances, and drawbacks. In addition, we discuss the remaining difficulties and some interesting directions that could become research hotspots in the future. This work can serve as a reference for researchers who want to learn quickly about single object tracking technology.
YANG Yijun, SHAO Wenze, WANG Liqian, GE Qi, BAO Bingkun, DENG Haisong, LI Haibo
2019, 11(6):651-659. DOI: 10.13878/j.cnki.jnuist.2019.06.003
Abstract: Deep learning has become one of the hottest research directions in machine learning. It has achieved great success in a wide range of fields such as image recognition, object detection, speech processing, and question answering systems. However, the emergence of adversarial examples has prompted new thinking about deep learning. The performance of deep learning models can be severely degraded by adversarial examples, which are constructed by adding specially designed subtle perturbations to the input. The existence of adversarial examples exposes many safety-critical technical fields to new threats and challenges, especially autonomous driving systems, which rely on visual perception as their main technology. Therefore, research on adversarial attacks and active defenses has become an extremely important cross-cutting topic in deep learning and computer vision. In this paper, we first summarize the relevant concepts of adversarial examples, and then introduce a series of typical adversarial attack methods and defense algorithms in detail. Subsequently, we introduce a number of physical-world attacks against visual perception and discuss their potential impact on autonomous driving. Finally, we give a technical outlook on future studies of adversarial attacks and defenses.
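To make the idea of a "specially designed subtle perturbation" concrete, the following is a minimal sketch of one well-known attack, the fast gradient sign method (FGSM), applied to a toy logistic regression classifier. The abstract does not say which attacks the survey covers, so the choice of FGSM, the toy model, and all names and values here (`fgsm`, `weights`, `epsilon`) are illustrative assumptions; the step size is also much larger than a realistic "subtle" perturbation so that the effect is visible on two features.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability of class 1 under a toy logistic regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def fgsm(weights, bias, x, label, epsilon):
    """FGSM step: move each input feature by epsilon in the direction
    that increases the cross-entropy loss for the true label."""
    p = predict_proba(weights, bias, x)
    # Gradient of the cross-entropy loss with respect to the input x.
    grad = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical model and input, chosen only for illustration.
weights, bias = [2.0, -1.0], 0.0
x, label = [1.0, 1.0], 1

x_adv = fgsm(weights, bias, x, label, epsilon=0.5)
clean_class = int(predict_proba(weights, bias, x) >= 0.5)
adv_class = int(predict_proba(weights, bias, x_adv) >= 0.5)
print(x_adv, clean_class, adv_class)
```

In this toy setting the perturbed input is classified differently from the clean one, which is the behavior that defense algorithms try to prevent.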
HUANG Fei, GAO Fei, ZHU Jingjie, DAI Lingna, YU Jun
2019, 11(6):660-681. DOI: 10.13878/j.cnki.jnuist.2019.06.004
Abstract: Heterogeneous face synthesis aims at generating visually realistic and identity-preserving portraits in different modalities, such as sketches and caricatures. It is of great significance for both public security and digital entertainment, and has attracted considerable attention. Recently, inspired by the dramatic progress in generative adversarial networks (GANs) and their great success in image-to-image translation tasks, researchers have proposed a number of new GAN-based heterogeneous face synthesis methods. In this paper, we briefly introduce the development of heterogeneous face synthesis, and detail recent progress in terms of applications, GAN architectures, performance evaluation approaches, datasets, and qualitative analysis. Finally, we summarize the challenges and prospects of heterogeneous face synthesis.
2019, 11(6):682-689. DOI: 10.13878/j.cnki.jnuist.2019.06.005
Abstract: With the fast-growing number of images, especially user-generated ones, the semantic content of images has become richer and their labels more complex. As a result, image multi-label learning has become one of the hot research areas in both academia and industry, and a large number of efficient methods have emerged in recent years. This paper surveys recent work on image multi-label learning. First, we briefly describe the concept of multi-label learning and introduce two types of methods, namely single-instance multi-label learning and multi-instance multi-label learning. Then, we summarize three challenges that the characteristics of big data pose for multi-label learning, and review related work that addresses them. Finally, we describe two applications, image recognition and autonomous driving, to show that multi-label learning techniques can be effective in many application scenarios.
DING Zhengtong, XU Lei, ZHANG Yan, LI Piaoyang, LI Yangyang, LUO Bin, TU Zhengzheng
2019, 11(6):690-697. DOI: 10.13878/j.cnki.jnuist.2019.06.006
Abstract: RGB-Thermal (RGB-T) object tracking has developed because thermal information strongly complements visible-light data. In this paper, we introduce the research background of RGB-T object tracking and the challenges of this task, and then summarize the existing RGB-T object tracking methods, including traditional methods and deep learning methods. Finally, we analyze and compare the existing RGB-T datasets and evaluation criteria, and point out aspects of RGB-T object tracking worthy of further study.
PAN Xingjia, ZHANG Xulong, DONG Weiming, YAO Hanxing, XU Changsheng
2019, 11(6):698-705. DOI: 10.13878/j.cnki.jnuist.2019.06.007
Abstract: Research on object detection has developed rapidly in recent years with the progress of deep learning. However, deep learning based object detection systems rely heavily on large-scale labeled training data, which are rarely available in realistic scenarios, so few-shot object detection has attracted great attention from researchers. In this paper, we present a survey of few-shot object detection, and introduce the mainstream approaches together with their characteristics, merits, and limitations. Finally, we suggest possible directions for further research on few-shot object detection.
ZENG Mingrui, YUAN Mengqi, SHAO Xi, BAO Bingkun, XU Changsheng
2019, 11(6):706-715. DOI: 10.13878/j.cnki.jnuist.2019.06.008
Abstract: Text understanding is an important research branch of artificial intelligence that enables effective interaction between humans and computers through natural language. Text feature extraction is one of the basic and key steps for computers to understand and perceive textual data. In this paper, we introduce the development history of text feature extraction and the mainstream feature extraction methods of recent years, and discuss future research directions. Three semantic hierarchies, namely word representation, sentence representation, and discourse relationship mining, are elaborated, and a case study is given to show a typical application of text feature extraction to a question answering system.
2019, 11(6):716-721. DOI: 10.13878/j.cnki.jnuist.2019.06.009
Abstract: To address the problems of high information redundancy and low accuracy in human motion recognition in video sequences, a human motion recognition method based on a key-frame two-stream convolutional network is proposed. We construct a network framework consisting of three modules: feature extraction, key frame extraction, and spatial-temporal feature fusion. First, single-frame RGB images from the spatial domain of the video and stacked multi-frame optical flow images from the temporal domain are fed into the VGG16 network model to extract deep features of the video. Second, the importance of each video frame is continuously predicted, and useful frames with sufficient information are pooled and trained by a neural network to select key frames and discard redundant frames. Finally, the Softmax outputs of the two models are weighted and combined as the output result to obtain a multi-model fusion. The resulting human motion recognizer performs key frame processing of the video and makes full use of the spatial-temporal information of the action. Experimental results on the public UCF-101 dataset show that, compared with mainstream human motion recognition methods, the proposed method achieves a higher recognition rate and relatively reduces the complexity of the network.
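The final fusion step, in which the Softmax outputs of the two streams are weighted and combined, can be sketched as follows. This is only an illustration of weighted late fusion in general: the abstract does not give the weights or the number of classes, so `spatial_weight`, the three-class logits, and the function names are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_streams(spatial_logits, temporal_logits, spatial_weight=0.5):
    """Weighted average of the class probabilities of the two streams."""
    p_spatial = softmax(spatial_logits)
    p_temporal = softmax(temporal_logits)
    w = spatial_weight
    return [w * a + (1.0 - w) * b for a, b in zip(p_spatial, p_temporal)]


# Hypothetical per-class scores from the RGB stream and the optical flow stream.
fused = fuse_two_streams([2.0, 1.0, 0.1], [0.5, 2.5, 0.3], spatial_weight=0.4)
predicted = max(range(len(fused)), key=fused.__getitem__)
print(fused, predicted)
```

Because the two weights sum to one, the fused vector remains a probability distribution, and the predicted action is simply its largest entry.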
JIAN Muwei, WANG Ruihong, JU Yakun, ZHU Chengzhan, DONG Junyu
2019, 11(6):722-726. DOI: 10.13878/j.cnki.jnuist.2019.06.010
Abstract: In this paper, an algorithm for detecting brain tumors in MRI images based on directional features and saliency modeling is proposed. The designed model first preprocesses the MRI brain images to remove interference from the skull region, and then uses saliency detection based on directional features to increase the contrast of the lesion region, so that the tumor region can be extracted more accurately. Extensive experiments on the brain image dataset, including comparisons with several other saliency detection methods, demonstrate the effectiveness of the algorithm, which provides reliable auxiliary diagnosis and clinical reference for doctors.
HE Bin, LI Xinyu, CHEN Beilei, XIA Meng, ZENG Zhizhong
2019, 11(6):727-734. DOI: 10.13878/j.cnki.jnuist.2019.06.011
Abstract: Online learning systems need to perform the fundamental task of annotating a large number of raw questions in order to provide students with high-quality learning materials. The existing methods for this task rely either on labeling by human experts or on traditional machine learning methods. In practical applications, these methods are limited by being either labor intensive or inaccurate. In this paper, we propose a method based on the mining of attribute relations to annotate the knowledge points of questions. We first define and extract the explicit attribute relations from the text and diagram of a given question. We then extract the implicit attribute relations of the question using the Monte Carlo Tree Search (MCTS) algorithm. Next, we map the attribute relations to the knowledge graph space using a transform model to generate the knowledge points of the question. The experimental results confirm the effectiveness of the proposed method, which is practical for the cognitive diagnosis of students and for personalized question recommendation.
LIU Tianliang, LU Panyu, DAI Xiubin, LIU Feng, LUO Jiebo
2019, 11(6):735-742. DOI: 10.13878/j.cnki.jnuist.2019.06.012
Abstract: To perceive indoor spatial layout, we present a scene layout estimation method based on informative edges and multi-modality features. First, a VGG-16 fully convolutional neural network is applied to predict an informative edge map using the prior of spatial layout. Then, Canny edge detection and a voting strategy are used to estimate the horizontal and vertical vanishing points, and rays cast at equal intervals from the given vanishing points finely resample the regions with high informative edge energy to generate layout candidates. Next, a spatial multi-scale VGG-16-based convolutional neural network is adopted to estimate the geometric depth and normal vectors of the scene surfaces. After that, integral geometry is applied to accumulate the multi-modality regional features as the unary occurrence potential within the polygons of candidate layouts, and the pairwise label constraints are captured by the smoothness of surface normals and the location relationships of candidate layouts. Finally, the model parameters are learned by a structural SVM, and the scene layout is inferred by maximizing the scores of the layout candidates. Experimental results show that, compared with traditional methods, the proposed estimation method can effectively improve the completeness of the resulting spatial layouts.
LI Hui, MIN Weiqing, WANG Zhiling, PENG Xin
2019, 11(6):743-750. DOI: 10.13878/j.cnki.jnuist.2019.06.013
Abstract: People's awareness of their nutrition habits is increasing. Keeping track of what we eat helps us follow a healthier diet. Currently, nutrient recognition from food images mainly focuses on food category recognition, or is tackled as a multi-label recognition task. These two approaches, however, are not very discriminative because they neglect the potential relationships between ingredients. In this paper, building on previous work, we introduce the relationships between ingredients to identify food nutrients. The recognition approach includes two modules, namely the image feature extraction module and the ingredient relationship learning module. Low-dimensional image feature vectors are extracted by a convolutional neural network (CNN), and the relationships between ingredients are learned through a graph convolutional network (GCN). Specifically, the GCN operates on graph data in which nodes represent food ingredients as word embeddings and edges represent the correlations between nodes. The GCN then directly maps the graph data into a set of interdependent classifiers. Finally, the low-dimensional image feature vectors are fused to make the detailed classification. We conducted experiments on the Food-101 and VireoFood-172 food datasets. Compared with state-of-the-art food recognition methods, our GCN-based multi-label food image classification method offers very promising results and can effectively improve recognition performance.
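As a rough illustration of how a GCN can turn an ingredient graph into interdependent classifiers, the sketch below applies a single linear graph-convolution step and then scores an image feature against each resulting classifier. Everything specific here is an assumption rather than the paper's design: the row normalization of the adjacency matrix, the single layer without activation, the dot-product fusion with the image feature, and all names and numbers.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def row_normalize(adj):
    """Divide each row of the adjacency matrix by its sum."""
    return [[v / sum(row) for v in row] for row in adj]


def gcn_classifiers(adj, embeddings, weight):
    """One linear graph-convolution step: normalized_adj @ embeddings @ weight.
    Each output row is treated as the classifier of one ingredient."""
    return matmul(matmul(row_normalize(adj), embeddings), weight)


def ingredient_scores(classifiers, image_feature):
    """Score each ingredient as the dot product of its classifier
    with the image feature vector."""
    return [sum(c * f for c, f in zip(row, image_feature)) for row in classifiers]


# Hypothetical graph: ingredients 0 and 1 co-occur; ingredient 2 is isolated.
adj = [[1, 1, 0],
       [1, 1, 0],
       [0, 0, 1]]  # self-loops included
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # stand-ins for word embeddings
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity, so the propagation is easy to follow
image_feature = [2.0, -1.0]  # stand-in for a CNN feature vector

classifiers = gcn_classifiers(adj, embeddings, weight)
scores = ingredient_scores(classifiers, image_feature)
print(scores)
```

In this toy graph the two connected ingredients end up with identical classifiers, which shows how edges make the per-ingredient classifiers depend on one another.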
2019, 11(6):751-756. DOI: 10.13878/j.cnki.jnuist.2019.06.014
Abstract: Recognizing the speaker's intention has greatly promoted the development of natural language understanding. In previous studies, the bidirectional long short-term memory (Bi-LSTM) model has mostly been employed in natural language processing to extract the features of words and the relationships between them. However, Bi-LSTM cannot establish a sufficiently strong relation between the information contained in a sentence and its individual words. Another previously proposed model, the S-LSTM (Sentence-state LSTM), can establish a relation between sentence information and its individual words. This, in turn, facilitates establishing the relationship between intention detection and slot filling, which makes it possible to propose a joint model that better understands the semantics in a question answering system. Therefore, in this paper, a slot-gate mechanism is introduced to avoid wasting the sentence state of the latest iteration when S-LSTM is applied to the joint task of intention detection and slot filling. Experimental results on the ATIS and Snips datasets confirm that the proposed mechanism is superior to other state-of-the-art approaches.
HE Chunming, XU Lei, LU Guosheng, DENG Lizhen
2019, 11(6):757-763. DOI: 10.13878/j.cnki.jnuist.2019.06.015
Abstract: Image segmentation is a basic and important problem in the field of computer vision. Entropy threshold image segmentation, as an effective segmentation method, is widely used in pattern recognition and image processing. Traditional image segmentation methods cannot obtain enough effective image features. To solve this problem and further explore the application of entropy thresholds in image segmentation, a GLLE (Gray Level and Local Entropy) two-dimensional histogram is introduced to improve the entropy threshold image segmentation model, and a method based on fuzzy entropy is proposed to calculate the threshold on the established two-dimensional histogram model. Comparison experiments on standard datasets show that the proposed GLLE entropy threshold segmentation method based on fuzzy entropy obtains more accurate thresholds and improves segmentation accuracy. Compared with traditional algorithms, our method performs better on different types of images and is more robust.
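For readers unfamiliar with entropy thresholding, the sketch below shows the classical one-dimensional maximum-entropy criterion, which picks the gray level that maximizes the summed entropies of the background and foreground distributions. It is a deliberately simplified stand-in: it uses neither the GLLE two-dimensional histogram nor the fuzzy entropy proposed in the paper, and the function name and test pixels are hypothetical.

```python
import math


def max_entropy_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the sum of the entropies
    of the two classes {0..t} and {t+1..levels-1}."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    probs = [h / len(pixels) for h in hist]

    best_t, best_score = 0, float("-inf")
    for t in range(levels - 1):
        w0 = sum(probs[:t + 1])   # probability mass at or below t
        w1 = sum(probs[t + 1:])   # probability mass above t
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this split is not meaningful
        h0 = -sum((p / w0) * math.log(p / w0) for p in probs[:t + 1] if p > 0)
        h1 = -sum((p / w1) * math.log(p / w1) for p in probs[t + 1:] if p > 0)
        if h0 + h1 > best_score:
            best_t, best_score = t, h0 + h1
    return best_t


# Hypothetical image with a dark group and a bright group of pixels.
pixels = [10] * 50 + [12] * 50 + [200] * 50 + [202] * 50
threshold = max_entropy_threshold(pixels)
foreground = [p for p in pixels if p > threshold]
print(threshold, len(foreground))
```

A two-dimensional variant would replace the gray-level histogram with a joint histogram over two per-pixel quantities and search over a pair of thresholds instead of one.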
ZHONG Jinsong, NIE Qi, ZENG Feitong, LI Ning, HU Haoliang, ZHANG Jianwen
2019, 11(6):764-770. DOI: 10.13878/j.cnki.jnuist.2019.06.016
Abstract: In this article, we propose a wireless DC voltage transformer calibration method and the related hardware design for the Jinping-Suzhou ±800 kV DC converter station, in order to conduct error validation tests on DC voltage transformers. The proposed method is based on wireless communication and Beidou synchronization technology, and solves the problem of the long distance between the DC transformer voltage divider and the secondary communication measuring system. The test results show that only a minor deviation is observed in the wireless calibration test when measuring low voltage. When measuring high voltage, however, GPS positioning and synchronization are affected by the high-voltage, strong electromagnetic environment, which causes instability and a minor deviation in the calibration. To resolve this problem, the characteristics of the wireless calibration system for DC voltage transformers are analyzed under the interference of a complex electromagnetic environment, and a solution is proposed to reduce the error generated when calibrating DC voltage transformers with the wireless transmission approach. This will benefit the development of wireless calibration technology for DC voltage transformers.