Design and Implementation of a Deep Learning-Based Speaker Recognition System for Open-Set Scenarios
Authors: GUO Xin, LUO Chengfang, DENG Aiwen

CLC number: TN912.3; TP18

Fund project: Guangdong Province Young Innovative Talents Project (2018GkQNCX005)


English title: A deep learning-based speaker recognition system for open set scenarios

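    Abstract (from the Chinese edition):

    To address the low accuracy of speaker recognition on short utterances and on speech corrupted by overlapping noise in real-world applications, this paper designs an improved deep learning-based speaker recognition algorithm that makes the model more robust to short utterances and noisy environments, and deploys the model on an embedded device. The improvements target two parts of the algorithm: the encoding layer and the loss function. For the encoding layer, NeXtVLAD, a technique based on differential encoding, is used to model both the static and the dynamic speaker characteristics contained in the frame-level features. For the loss function, the cosine-Prototypical loss, which is based on a few-shot learning framework, is fused with the additive margin classification loss AM-Softmax to train the model, so that features of the same class cluster as tightly as possible and features of different classes are separated as widely as possible in the feature space. In addition, the speaker recognition algorithm is deployed on a Raspberry Pi platform, yielding a speaker recognition system with fast inference. Experimental results show that the improved system completes speaker recognition tasks accurately and in real time in a variety of open-set scenarios and meets the requirements of practical applications.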
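The NeXtVLAD encoding layer pools a variable number of frame-level features into one fixed-length utterance embedding. Below is a minimal pure-Python sketch of the general NeXtVLAD aggregation step (feature expansion, splitting into G groups, a sigmoid attention gate per group, soft assignment to K shared cluster centers, and summation of weighted residuals). It is not the authors' implementation: the sizes D, `lam` (expansion factor), G and K, the randomly initialised weights from the hypothetical helper `random_params`, and the two normalization steps are assumptions, and any final dimension-reduction layer is omitted.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    es = [math.exp(z - m) for z in zs]
    total = sum(es)
    return [e / total for e in es]


def l2_normalize(v, eps=1e-12):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / max(norm, eps) for a in v]


def random_params(D, lam, G, K, seed=0):
    """Hypothetical helper: random NeXtVLAD weights (learned by training in practice)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    E = lam * D  # expanded feature dimension
    return {
        "W_e": mat(E, D),                                  # expansion D -> lam*D
        "W_att": mat(G, E), "b_att": [0.0] * G,            # one sigmoid gate per group
        "W_asg": mat(G * K, E), "b_asg": [0.0] * (G * K),  # assignment logits per group
        "centers": mat(K, E // G),                         # K centers shared by all groups
    }


def nextvlad(frames, p, G, K):
    """Pool any number of D-dim frame vectors into one vector of length K * (lam*D / G)."""
    Dg = len(p["W_e"]) // G             # dimension of each group
    V = [[0.0] * Dg for _ in range(K)]  # accumulated residuals, one row per cluster
    for x in frames:
        xe = matvec(p["W_e"], x)
        att = [sigmoid(z + b) for z, b in zip(matvec(p["W_att"], xe), p["b_att"])]
        logits = [z + b for z, b in zip(matvec(p["W_asg"], xe), p["b_asg"])]
        for g in range(G):
            assign = softmax(logits[g * K:(g + 1) * K])  # soft assignment of group g
            xg = xe[g * Dg:(g + 1) * Dg]
            for k in range(K):
                weight = att[g] * assign[k]
                for j in range(Dg):
                    V[k][j] += weight * (xg[j] - p["centers"][k][j])  # weighted residual
    V = [l2_normalize(row) for row in V]                # intra-normalization per cluster
    return l2_normalize([v for row in V for v in row])  # final L2 normalization


rng = random.Random(1)
frames = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(50)]  # 50 frames, D = 8
params = random_params(D=8, lam=2, G=4, K=3)
emb = nextvlad(frames, params, G=4, K=3)
print(len(emb))  # 12 = K * (lam * D / G)
```

Because the weighted residuals are summed over frames, the output length depends only on K and the group size, not on the number of frames, so short and long utterances end up in the same embedding space.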

    Abstract:

    Speaker recognition accuracy is low for short utterances and for speech corrupted by overlapping noise. To address this, an improved deep learning-based speaker recognition algorithm is proposed and deployed on an embedded device. Robustness is improved in two places: the encoding layer and the loss function. For the encoding layer, the NeXtVLAD technique, which is based on differential encoding, is used to model both static and dynamic speaker features at the frame level. For the loss function, the cosine-prototypical loss, which is based on a few-shot learning framework, is fused with the additive margin classification loss AM-Softmax to train the model, so that features of the same speaker cluster together and features of different speakers are separated as much as possible in the feature space. The improved algorithm is then deployed on a Raspberry Pi platform to provide speaker recognition with fast inference. Experimental results show that the system performs speaker recognition accurately and in real time in various open-set scenarios and meets the requirements of practical applications.
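The fused training objective can be sketched as follows, assuming a simple weighted sum L = L_AM-Softmax + λ·L_proto. AM-Softmax subtracts a margin m from the target-class cosine before scaling by s; the cosine-prototypical term builds one prototype per speaker from the mean of that speaker's support utterances and classifies a held-out query utterance by scaled cosine similarity to every prototype. The values s = 30, m = 0.2, the fixed scale w = 10 and offset b = -5 (learnable in some formulations), the weight λ = 1, and the choice of the last utterance as the query are illustrative assumptions, not the paper's settings; all functions are hypothetical helpers, not a library API.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cross_entropy(logits, target):
    """-log softmax(logits)[target], computed with the log-sum-exp trick."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return lse - logits[target]


def am_softmax_loss(emb, label, class_weights, s=30.0, m=0.2):
    """AM-Softmax: subtract margin m from the target cosine, then scale the cosines by s."""
    cos = [cosine(emb, w) for w in class_weights]
    logits = [s * (c - m) if k == label else s * c for k, c in enumerate(cos)]
    return cross_entropy(logits, label)


def cosine_prototypical_loss(batch, w=10.0, b=-5.0):
    """batch[i] holds M >= 2 embeddings of speaker i. The last one is the query;
    the mean of the others is speaker i's prototype."""
    dim = len(batch[0][0])
    protos = []
    for utts in batch:
        support = utts[:-1]
        protos.append([sum(u[d] for u in support) / len(support) for d in range(dim)])
    total = 0.0
    for i, utts in enumerate(batch):
        logits = [w * cosine(utts[-1], p) + b for p in protos]  # scaled cosine to each prototype
        total += cross_entropy(logits, i)                       # correct class = own prototype
    return total / len(batch)


def fused_loss(batch, labels, class_weights, lam=1.0):
    """Assumed fusion: mean AM-Softmax over all utterances + lam * cosine-prototypical loss."""
    am = [am_softmax_loss(u, labels[i], class_weights)
          for i, utts in enumerate(batch) for u in utts]
    return sum(am) / len(am) + lam * cosine_prototypical_loss(batch)


class_weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # one vector per training speaker
batch = [  # 3 speakers x 2 utterances, toy 3-D embeddings
    [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]],
    [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
    [[0.1, 0.0, 1.0], [0.0, 0.1, 0.9]],
]
print(fused_loss(batch, [0, 1, 2], class_weights))  # small: labels agree with the clusters
print(fused_loss(batch, [1, 2, 0], class_weights))  # large: labels contradict the clusters
```

With the margin, a training embedding must exceed every other class cosine by roughly m rather than merely edge past it, which pulls same-speaker features together; the prototypical term adds a batch-level, few-shot-style signal that compares utterances of different speakers directly.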

Cite this article

GUO Xin, LUO Chengfang, DENG Aiwen. A deep learning-based speaker recognition system for open set scenarios[J]. Journal of Nanjing University of Information Science & Technology, 2021, 13(5): 526-532

History
  • Received: 2021-08-08
  • Published online: 2021-12-02

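Address: 219 Ningliu Road, Nanjing, Jiangsu Province    Postcode: 210044

Tel: 025-58731025    E-mail: nxdxb@nuist.edu.cn

Journal of Nanjing University of Information Science & Technology ® 2024. All rights reserved.  Technical support: Beijing Qinyun Technology Development Co., Ltd.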