Abstract: With the development of big data and social networks, electronic albums and online services have become basic uses of computers and the Internet. In recent years in particular, the number of electronic albums has exploded with the popularity of social networks, so improving the user experience of music albums has become particularly important. A photo album with a particular topic usually carries emotional information. This paper studies the problem of automatically generating family music albums based on multi-modal fusion, so that users can enjoy music with matching emotion while browsing album photos. According to the emotions in the music and images, representative sentence-level features are selected for both modalities, and LPP (Locality Preserving Projection) is employed to learn the relevance between music and images of the same emotion. The image features and music features are mapped into a latent space with stronger emotional discrimination ability to realize the automatic generation of music albums. In the experiments, the objective evaluation shows that the LPP method achieves higher precision than the pure CCA (Canonical Correlation Analysis) method; in the subjective evaluation, the proposed LPP method achieves a satisfaction level of 72.06%, which is close to that of the manually recommended approach (78.09%) and higher than those of the randomly recommended and pure CCA approaches.
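The LPP-based mapping mentioned above can be illustrated with a minimal sketch. The code below is not the paper's implementation: it assumes image and music features share the same dimensionality, that the affinity graph simply connects samples with the same emotion label, and that a single projection is applied to both modalities; all feature arrays and label values are hypothetical placeholders.

```python
import numpy as np
from scipy.linalg import eigh

def lpp_projection(X, labels, n_components=2, reg=1e-6):
    """Sketch of Locality Preserving Projection (LPP).

    X      : (n_samples, n_features) feature matrix
    labels : (n_samples,) emotion labels used to build the affinity graph
    Returns a (n_features, n_components) projection matrix.
    """
    # Affinity matrix: connect samples that share the same emotion label
    # (an assumption; other neighborhood definitions are possible).
    W = (labels[:, None] == labels[None, :]).astype(float)
    D = np.diag(W.sum(axis=1))        # degree matrix
    L = D - W                         # graph Laplacian

    # Generalized eigenproblem: X^T L X a = lambda X^T D X a
    A = X.T @ L @ X
    B = X.T @ D @ X + reg * np.eye(X.shape[1])  # regularized for stability
    eigvals, eigvecs = eigh(A, B)

    # Smallest eigenvalues give the most locality-preserving directions.
    return eigvecs[:, :n_components]

# Hypothetical usage: project image and music features into a shared latent
# space, then match each photo to the nearest music clip by distance.
rng = np.random.default_rng(0)
img_feat = rng.normal(size=(40, 16))    # placeholder image features
mus_feat = rng.normal(size=(40, 16))    # placeholder music features
emotions = rng.integers(0, 4, size=40)  # placeholder emotion labels

X = np.vstack([img_feat, mus_feat])
y = np.concatenate([emotions, emotions])
P = lpp_projection(X, y, n_components=4)

img_latent = img_feat @ P
mus_latent = mus_feat @ P
dists = np.linalg.norm(img_latent[:, None, :] - mus_latent[None, :, :], axis=-1)
matches = dists.argmin(axis=1)          # nearest music clip for each image
```

In this sketch, matching in the learned latent space replaces the raw-feature comparison; the paper's actual pipeline may project the two modalities with separate mappings and use different affinity and matching criteria.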