Abstract: Speaker recognition has become a second ID card for people. To meet the application needs of speaker recognition on short utterances in open scenarios, this paper optimizes the speaker recognition model to improve its accuracy and robustness. First, to select the important frequency features in the input acoustic features, we propose a Reweighted-based Feature Enhancement Layer (RFEL) and a Reweighted-based Feature Enhancement Network (RFEN), which enhance the feature representation. Second, the Mis-Classified Vector Guided Softmax Loss (MV) from face recognition is introduced to the speaker recognition field for the first time to improve the mining of hard samples. Third, we propose a combined loss function based on MV and the Angular Prototypical Loss (AP) within a few-shot framework, which resolves the mismatch between the classification loss function and the actual evaluation requirements of speaker recognition. It also resolves the strong dependence of the metric function on the sampling strategy. Finally, experimental results show that the proposed model reduces the EER by 12.45% compared to the baseline and reduces the minDCF by a relative 14.09%, achieving excellent results among existing speaker recognition methods.
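To make the few-shot component more concrete, the following is a minimal sketch of an angular prototypical loss as it is commonly formulated: each speaker's last utterance serves as the query, the mean of the remaining utterances serves as the prototype, and a softmax cross-entropy is applied to scaled cosine similarities. It is an illustration under stated assumptions, not the implementation used in this paper. The function names, the fixed scale `w` and bias `b` (usually learnable), and the weighted-sum form of `combined_loss` are all hypothetical, and the MV term is passed in as a precomputed value rather than implemented.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def angular_prototypical_loss(batch, w=10.0, b=-5.0):
    """Sketch of an angular prototypical loss.

    batch[j] is a list of embeddings (lists of floats) for speaker j.
    The last embedding is the query; the others form the support set.
    w and b are assumed fixed here; in practice they are usually learned.
    """
    queries = [utterances[-1] for utterances in batch]

    # Prototype (centroid) of each speaker: mean of the support embeddings.
    centroids = []
    for utterances in batch:
        support = utterances[:-1]
        dim = len(support[0])
        centroids.append(
            [sum(e[d] for e in support) / len(support) for d in range(dim)]
        )

    total = 0.0
    for j, query in enumerate(queries):
        # Scaled cosine similarity of this query to every prototype.
        logits = [w * cosine(query, c) + b for c in centroids]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the query's own speaker as the target class.
        total += log_norm - logits[j]
    return total / len(queries)


def combined_loss(classification_loss, ap_loss, weight=1.0):
    """Assumed combination: a weighted sum of the two loss terms."""
    return classification_loss + weight * ap_loss
```

Because the loss compares each query against the prototypes of all speakers in the batch, it trains the embedding on the same kind of similarity decision that verification relies on, which is the mismatch with pure classification losses that the abstract refers to.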