Optimal design of short-time speech speaker recognition system in open scenarios

doi:10.13878/j.cnki.jnuist.20221108003

Author:

GUO Xin
School of Electrical and Mechanical Engineering, Guangdong Communication Polytechnic, Guangzhou 510650, China
DENG Aiwen
School of Automation Science and Engineering, South China University of Technology, Guangzhou 510641, China
LUO Chengfang
School of Automation Science and Engineering, South China University of Technology, Guangzhou 510641, China
DENG Feiqi
School of Automation Science and Engineering, South China University of Technology, Guangzhou 510641, China

CLC number: TN912.3; TP18
Fund Project: Characteristic Innovation Project of Ordinary Universities of Guangdong Province (2022KTSCX258, 2021KTSCX224); Guangzhou Basic Research Program (202002030476); Guangdong Communication Polytechnic Project (GDCP-ZX-2021-004-N1)


Abstract:

To meet the needs of speaker recognition on short-duration speech in open scenarios, this paper optimizes the speaker recognition model to improve its accuracy and robustness. First, to select important frequency features from the input acoustic data, a Reweighted-based Feature Enhancement Layer (RFEL) and a Reweighted-based Feature Enhancement Network (RFEN) are proposed to enhance the feature representation. Second, the Misclassified Vector guided Softmax loss (MVSoftmax) from face recognition is introduced into speaker recognition for the first time, improving the ability to mine hard samples. Third, a combined loss function of MVSoftmax and the few-shot-learning-based Angular Prototypical loss (AP) is proposed, which resolves the mismatch between the classification loss function and the actual evaluation requirements of speaker recognition and relieves the strong dependence of the metric loss function on the sampling strategy. Finally, experimental results show that, compared with the baseline model, the proposed model reduces the equal error rate (EER) by 12.45% and the minimum detection cost function (minDCF) by 14.09%, achieving excellent performance in speaker recognition.
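
Because the results are reported in terms of EER, a minimal sketch of how an equal error rate can be estimated from verification trial scores may help. This is not the authors' evaluation code: the function name `compute_eer`, the threshold sweep over observed scores, and the convention that a higher score means "same speaker" are all assumptions made for illustration.

```python
def compute_eer(scores, labels):
    """Estimate the equal error rate of a verification system.

    scores: similarity score per trial (assumed: higher = more likely same speaker).
    labels: 1 for a target (same-speaker) trial, 0 for a non-target trial.
    Returns (eer, threshold) at the point where the false acceptance rate
    and the false rejection rate are closest to each other.
    """
    targets = [s for s, y in zip(scores, labels) if y == 1]
    nontargets = [s for s, y in zip(scores, labels) if y == 0]
    if not targets or not nontargets:
        raise ValueError("need at least one target and one non-target trial")

    best = None
    # Sweep every distinct observed score as a candidate decision threshold.
    for t in sorted(set(scores)):
        # A trial is accepted when its score is >= the threshold.
        far = sum(s >= t for s in nontargets) / len(nontargets)  # false accepts
        frr = sum(s < t for s in targets) / len(targets)         # false rejects
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Hypothetical toy trials, not data from the paper.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
eer, threshold = compute_eer(toy_scores, toy_labels)
print(eer, threshold)
```

With few trials the two error curves rarely cross exactly, so this sketch averages them at the closest point; evaluation toolkits often interpolate instead.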

Key words: speaker recognition; reweighted; feature enhancement layer; classification loss function; metric loss function
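
To make the "metric loss function" concrete, the following is a minimal sketch of an angular prototypical loss in its commonly described form: for each speaker in a batch, one utterance serves as the query, the centroid of the remaining utterances serves as the prototype, and a softmax cross-entropy is taken over scaled cosine similarities. The paper's exact formulation and its combination with MVSoftmax are not given in the abstract, so everything here is an assumption: the function name, the choice of the last utterance as the query, and the fixed values of `w` and `b` (which would normally be learnable parameters).

```python
import math


def _cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def angular_prototypical_loss(embeddings, w=10.0, b=-5.0):
    """Sketch of an angular prototypical loss for one mini-batch.

    embeddings[j][i] is the embedding of utterance i of speaker j;
    every speaker needs at least two utterances.
    w and b are placeholder constants standing in for learnable scale and bias.
    """
    queries = [spk[-1] for spk in embeddings]  # assumed: last utterance is the query
    prototypes = []
    for spk in embeddings:
        support = spk[:-1]
        dim = len(support[0])
        # Prototype = centroid of the speaker's support utterances.
        prototypes.append(
            [sum(u[d] for u in support) / len(support) for d in range(dim)]
        )

    total = 0.0
    for j, q in enumerate(queries):
        logits = [w * _cosine(q, c) + b for c in prototypes]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the query's own speaker as the correct class.
        total += log_sum - logits[j]
    return total / len(queries)


# Hypothetical 2-D embeddings: two speakers, two utterances each.
well_separated = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
confused = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
print(angular_prototypical_loss(well_separated))
print(angular_prototypical_loss(confused))
```

The loss is small when each query lies close to its own speaker's prototype and large when it lies closer to another speaker's. Because the batch itself defines the classes, a loss of this kind depends on how speakers and utterances are sampled, which is the dependence the abstract says the combined loss is meant to relieve.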

Cite this article

GUO Xin, DENG Aiwen, LUO Chengfang, DENG Feiqi. Optimal design of short-time speech speaker recognition system in open scenarios[J]. Journal of Nanjing University of Information Science & Technology, 2023, 15(5): 585-591

History

Received: 2022-11-08
Published online: 2023-10-24
