Abstract: Despite the continuous enrichment of music content, underlying music features are often overlooked in traditional collaborative filtering. This paper proposes a multi-modal music recommendation system that fuses audio features with lyric information and supplements collaborative filtering with the fused features. The study primarily discusses the extraction of audio features and lyric information, using the LDA topic model to reduce the dimensionality of the lyric text. For the multi-modal fusion problem, an EFFC fusion method is proposed, and its multi-modal fusion results are compared with those of single-modal approaches. For recommendation, a user interest model is built from the multi-modal features, which are fed into an LSTM network to filter and optimize the user group. Experimental results show that the multi-modal music recommendation system reduces the SSE from 2.009 to 0.3886, verifying the effectiveness of the method.