Optimal design of short-time speech speaker recognition system in open scenarios

doi:10.13878/j.cnki.jnuist.20221108003


Volume 15, Issue 5, 2023, pp. 585-591

Author:
GUO Xin
School of Electrical and Mechanical Engineering, Guangdong Communication Polytechnic, Guangzhou 510650, China
DENG Aiwen
School of Automation Science and Engineering, South China University of Technology, Guangzhou 510641, China
LUO Chengfang
School of Automation Science and Engineering, South China University of Technology, Guangzhou 510641, China
DENG Feiqi
School of Automation Science and Engineering, South China University of Technology, Guangzhou 510641, China

                    
CLC Number: TN912.3; TP18


Abstract:

To meet the application needs of speaker recognition for short-duration speech in open scenarios, we optimize the speaker recognition model for accuracy and robustness. First, to select important frequency features from the input acoustic data, a Reweighted-based Feature Enhancement Layer (RFEL) and a Reweighted-based Feature Enhancement Network (RFEN) are proposed to enhance the feature representation. Second, the Misclassified Vector guided Softmax loss (MVSoftmax), originally proposed for face recognition, is introduced into speaker recognition to improve the mining of hard samples. Third, a combined loss function of MVSoftmax and the few-shot-learning-based Angular Prototypical loss (AP) is proposed, which resolves the mismatch between the classification loss function and the actual evaluation requirements of speaker recognition and relieves the strong dependence of the metric loss function on the sampling strategy. Finally, experimental results show that, compared with the baseline model, the proposed model reduces the EER by 12.45% and the minDCF by 14.09%, achieving excellent performance in speaker recognition.
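
The two metrics quoted in the abstract, the equal error rate (EER) and the minimum detection cost function (minDCF), can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names are hypothetical, and the cost parameters `p_target`, `c_miss`, and `c_fa` are assumed values, since the abstract does not state the ones used in the paper. Scores are taken to be similarity scores, where a higher score means the two utterances are more likely to come from the same speaker.

```python
def error_rates(target_scores, nontarget_scores):
    """Return a list of (threshold, FRR, FAR), one entry per candidate threshold."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    # Add one threshold above every score so "reject everything" is included.
    thresholds.append(thresholds[-1] + 1.0)
    rates = []
    for th in thresholds:
        # A trial is accepted when its score is >= the threshold.
        # FRR (miss rate): same-speaker trials that are rejected.
        frr = sum(s < th for s in target_scores) / len(target_scores)
        # FAR (false-alarm rate): different-speaker trials that are accepted.
        far = sum(s >= th for s in nontarget_scores) / len(nontarget_scores)
        rates.append((th, frr, far))
    return rates


def compute_eer(target_scores, nontarget_scores):
    """EER: the error rate at the threshold where FRR and FAR are closest."""
    rates = error_rates(target_scores, nontarget_scores)
    _, frr, far = min(rates, key=lambda r: abs(r[1] - r[2]))
    return (frr + far) / 2.0


def compute_min_dcf(target_scores, nontarget_scores,
                    p_target=0.05, c_miss=1.0, c_fa=1.0):
    """Normalized minimum detection cost over all candidate thresholds.

    p_target, c_miss and c_fa are assumed values for illustration only.
    """
    rates = error_rates(target_scores, nontarget_scores)
    raw = min(c_miss * frr * p_target + c_fa * far * (1.0 - p_target)
              for _, frr, far in rates)
    # Normalize by the cost of the best trivial system (always accept or always reject).
    norm = min(c_miss * p_target, c_fa * (1.0 - p_target))
    return raw / norm


if __name__ == "__main__":
    same_speaker = [0.9, 0.8, 0.7, 0.3]       # toy scores, not data from the paper
    different_speaker = [0.6, 0.4, 0.2, 0.1]
    print("EER:", compute_eer(same_speaker, different_speaker))
    print("minDCF:", compute_min_dcf(same_speaker, different_speaker))
```

Both metrics are computed from the same threshold sweep: EER weights misses and false alarms equally, whereas minDCF weights them by the assumed prior and costs, so a model can improve by different amounts on the two, as the figures in the abstract do.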

Key words: speaker recognition; reweighted; feature enhancement layer; classification loss function; metric loss function

Get Citation

GUO Xin, DENG Aiwen, LUO Chengfang, DENG Feiqi. Optimal design of short-time speech speaker recognition system in open scenarios[J]. Journal of Nanjing University of Information Science & Technology, 2023, 15(5): 585-591


History

Received: November 08, 2022
Online: October 24, 2023
