Optimal design of short-time speech speaker recognition system in open scenarios

CLC number: TN912.3; TP18

Fund projects: Characteristic Innovation Projects of Ordinary Universities of Guangdong Province (2022KTSCX258, 2021KTSCX224); Guangzhou Basic Research Program (202002030476); Guangdong Communication Polytechnic Project (GDCP-ZX-2021-004-N1)
    Abstract:

    To meet the application needs of speaker recognition on short-duration speech in open scenarios, this paper optimizes the speaker recognition model to improve its accuracy and robustness. First, to select the important frequency features of the input acoustic data, a reweighting-based feature enhancement layer (RFEL) and a reweighting-based feature enhancement network (RFEN) are proposed to strengthen the feature representation. Second, the Misclassified Vector guided Softmax loss (MVSoftmax) from face recognition is introduced into speaker recognition for the first time to improve the mining of hard samples. Third, a combined loss function is proposed that pairs MVSoftmax, a classification loss based on misclassified-sample mining, with the cosine-based Angular Prototypical loss (AP), which is built on a few-shot learning framework; the combination addresses the mismatch between the classification loss and the actual evaluation requirements of speaker recognition, and relieves the strong dependence of the metric loss on the sampling strategy. Finally, experimental results show that, compared with the baseline model, the equal error rate (EER) of the proposed model is reduced by 12.45% and the minimum detection cost function (minDCF) by 14.09%, an excellent result in the speaker recognition field.
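The abstract reports its results with two verification metrics, EER and minDCF. As a reading aid, the following is a minimal sketch of how these metrics are commonly computed from trial scores; it is not the authors' evaluation code. The function names are hypothetical, and the cost parameters (`p_target=0.05`, `c_miss=1.0`, `c_fa=1.0`) are assumptions, since the abstract does not state the settings used.

```python
def error_rates(target_scores, nontarget_scores):
    """List (threshold, miss rate, false-alarm rate) for each candidate threshold."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    # One extra threshold above the maximum covers the "reject every trial" case.
    thresholds.append(thresholds[-1] + 1.0)
    rates = []
    for th in thresholds:
        # A trial is accepted when its score is >= the threshold.
        p_miss = sum(s < th for s in target_scores) / len(target_scores)
        p_fa = sum(s >= th for s in nontarget_scores) / len(nontarget_scores)
        rates.append((th, p_miss, p_fa))
    return rates


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER at the threshold where miss and false-alarm rates are closest."""
    rates = error_rates(target_scores, nontarget_scores)
    _, p_miss, p_fa = min(rates, key=lambda r: abs(r[1] - r[2]))
    return (p_miss + p_fa) / 2.0


def min_dcf(target_scores, nontarget_scores, p_target=0.05, c_miss=1.0, c_fa=1.0):
    """Minimum normalized detection cost over all candidate thresholds.

    p_target, c_miss and c_fa are assumed values, not taken from the paper.
    """
    rates = error_rates(target_scores, nontarget_scores)
    # Normalize by the cost of the best trivial system (accept all or reject all).
    norm = min(c_miss * p_target, c_fa * (1.0 - p_target))
    best = min(
        c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
        for _, p_miss, p_fa in rates
    )
    return best / norm


if __name__ == "__main__":
    # Toy scores: higher means "more likely the same speaker".
    same_speaker = [0.9, 0.8, 0.7, 0.4]
    different_speaker = [0.6, 0.3, 0.2, 0.1]
    print("EER:", equal_error_rate(same_speaker, different_speaker))
    print("minDCF:", min_dcf(same_speaker, different_speaker))
```

The threshold sweep is quadratic in the number of trials, which is fine for illustration; a sort-based sweep would be the usual choice for large trial lists.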

Cite this article

GUO Xin, DENG Aiwen, LUO Chengfang, DENG Feiqi. Optimal design of short-time speech speaker recognition system in open scenarios[J]. Journal of Nanjing University of Information Science & Technology, 2023, 15(5): 585-591

History
  • Received: 2022-11-08
  • Published online: 2023-10-24
