Optimal design of short-time speech speaker recognition system in open scenarios

Affiliations:

1. School of Mechanical and Electrical Engineering, Guangdong Communication Polytechnic; 2. School of Automation Science and Engineering, South China University of Technology

Fund projects:

Characteristic Innovation Projects of Ordinary Universities of Guangdong Province (2022KTSCX258, 2021KTSCX224); Guangzhou Basic Research Program (202002030476); Guangdong Communication Polytechnic Project (GDCP-ZX-2021-004-N1)

Abstract:

Speaker recognition has been described as a second ID card. To meet the needs of short-utterance speaker recognition in open scenarios, this paper optimizes the speaker recognition model to improve its accuracy and robustness. First, to select the important frequency components of the input acoustic features, a Reweighted-based Feature Enhancement Layer (RFEL) and a Reweighted-based Feature Enhancement Network (RFEN) are proposed to strengthen the feature representation. Second, the Mis-Classified Vector Guided Softmax Loss (MV) from face recognition is introduced to speaker recognition for the first time, improving the mining of hard samples. Third, a combined loss function is proposed that joins the MV classification loss with the Angular Prototypical Loss (AP) from the few-shot learning framework; it addresses both the mismatch between the classification loss and the actual evaluation requirements of speaker recognition and the strong dependence of the metric loss on the sampling strategy. Experimental results show that, compared with the baseline model, the proposed model reduces the equal error rate (EER) by 12.45% and achieves a relative reduction of 14.09% in the minimum detection cost function (minDCF), a strong result among existing speaker recognition methods.
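The combined loss named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so the sketch assumes the fixed-weight variant of MV-Softmax with an additive cosine margin, the usual formulation of the angular prototypical loss (last utterance of each speaker as query, centroid of the rest as prototype), and a simple weighted sum. All function names and the hyperparameters `s`, `m`, `t`, `w`, `b`, and `lam` are hypothetical choices for illustration; in the published AP loss, `w` and `b` are learnable rather than fixed.

```python
import numpy as np


def log_softmax(z, axis=-1):
    # Numerically stable log-softmax.
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def l2_normalize(x, axis=-1, eps=1e-12):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def mv_softmax_loss(emb, weights, labels, s=30.0, m=0.2, t=0.3):
    """MV-Softmax sketch (assumed: fixed reweighting, additive cosine margin).

    emb: (N, D) embeddings, weights: (C, D) class centers, labels: (N,) ints.
    """
    cos = l2_normalize(emb) @ l2_normalize(weights).T   # (N, C) cosines
    idx = np.arange(len(labels))
    target = cos[idx, labels] - m                       # margin on the true class
    hard = cos > target[:, None]                        # mis-classified negatives
    hard[idx, labels] = False
    logits = np.where(hard, cos + t, cos)               # emphasize hard negatives
    logits[idx, labels] = target
    return -log_softmax(s * logits)[idx, labels].mean()


def angular_prototypical_loss(emb, w=10.0, b=-5.0):
    """AP loss sketch. emb: (N_speakers, M_utterances, D), M >= 2."""
    query = l2_normalize(emb[:, -1, :])                 # one query per speaker
    proto = l2_normalize(emb[:, :-1, :].mean(axis=1))   # centroid of the rest
    sim = w * (query @ proto.T) + b                     # scaled cosine similarity
    n = sim.shape[0]
    return -log_softmax(sim)[np.arange(n), np.arange(n)].mean()


def combined_loss(emb, weights, labels, lam=1.0):
    """Weighted sum of the two losses (the weighting scheme is an assumption).

    labels: (N_speakers,) class index of each speaker in the batch.
    """
    n, m_utts, d = emb.shape
    flat = emb.reshape(n * m_utts, d)
    flat_labels = np.repeat(labels, m_utts)
    return mv_softmax_loss(flat, weights, flat_labels) + lam * angular_prototypical_loss(emb)
```

In this sketch the classification term sees every utterance in the batch, while the prototypical term compares each speaker's query against the prototypes of all speakers in the same batch, so the metric term is trained under the same similarity comparison used at verification time.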
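For readers unfamiliar with the two reported metrics, the following sketch shows one common way to compute EER and a normalized minDCF from verification trial scores. It is a generic illustration, not the evaluation code used in the paper: the function names are hypothetical, and the cost parameters `p_target`, `c_miss`, and `c_fa` are assumed defaults, since the abstract does not state which operating point was used.

```python
import numpy as np


def far_frr(scores, labels):
    """False acceptance and false rejection rates over all thresholds.

    scores: similarity scores; labels: 1 for target trials, 0 for non-target.
    A trial is accepted when its score is >= the threshold.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    thresholds = np.append(np.unique(scores), np.inf)
    far = np.array([(scores[~labels] >= t).mean() for t in thresholds])
    frr = np.array([(scores[labels] < t).mean() for t in thresholds])
    return far, frr


def eer(scores, labels):
    """Equal error rate: the point where FAR and FRR are closest."""
    far, frr = far_frr(scores, labels)
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2


def min_dcf(scores, labels, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Normalized minimum detection cost (cost parameters are assumptions)."""
    far, frr = far_frr(scores, labels)
    dcf = c_miss * p_target * frr + c_fa * (1 - p_target) * far
    return dcf.min() / min(c_miss * p_target, c_fa * (1 - p_target))


def relative_reduction(baseline, improved):
    """Relative reduction in percent, e.g. for comparing against a baseline."""
    return 100.0 * (baseline - improved) / baseline
```

A relative reduction compares the change to the baseline value, so a metric that falls from 2.0 to 1.5 is reduced by 25% in relative terms, not by 0.5 percentage points.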

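Cite this article

Guo Xin, Deng Aiwen, Luo Chengfang, Deng Feiqi. Optimal design of short-time speech speaker recognition system in open scenarios[J]. Journal of Nanjing University of Information Science & Technology,,():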

History
  • Received: 2022-11-08
  • Last revised: 2022-11-21
  • Accepted: 2022-11-28
