Abstract: Micro-expressions are facial expressions that people reveal involuntarily when they try to hide their true emotions, and they have become an active research topic in affective computing in recent years. However, micro-expression recognition remains challenging because micro-expressions are short in duration and low in intensity. Motivated by the strong performance of CrossViT in image classification, this paper uses CrossViT as the backbone network and improves its cross-attention mechanism. A Dual Attention (DA) module is proposed that extends the traditional cross-attention mechanism and determines the correlation between attention results, thereby improving the precision of micro-expression recognition. The network learns from three optical flow features (optical strain and the horizontal and vertical optical flow fields), which are computed from the onset (starting) frame and the apex (peak) frame of each micro-expression sequence. Finally, the micro-expression is classified by Softmax. On the fused micro-expression dataset, the unweighted F1-score (UF1) and unweighted average recall (UAR) reach 0.7275 and 0.7272, respectively. These results are better than those of mainstream micro-expression recognition algorithms, indicating the effectiveness of the proposed network.
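To make the three-channel input concrete, the following is a minimal sketch of how optical strain can be derived from the horizontal and vertical flow fields and stacked with them. It assumes the flow between the onset and apex frames has already been estimated by some optical flow method, which is not specified here. It uses the common definition of optical strain magnitude, sqrt(e_xx^2 + e_yy^2 + 2 e_xy^2), which is assumed rather than confirmed to match the formulation in this paper. The function names `optical_strain` and `flow_features` and the channel order are illustrative assumptions.

```python
import numpy as np


def optical_strain(u, v):
    """Optical strain magnitude from horizontal (u) and vertical (v) flow fields.

    Uses the common definition based on the symmetric part of the flow
    gradient; this is an assumption about the exact formulation.
    """
    # np.gradient returns the derivative along axis 0 (rows, y) first,
    # then along axis 1 (columns, x).
    du_dy, du_dx = np.gradient(u)
    dv_dy, dv_dx = np.gradient(v)
    e_xx = du_dx                   # normal strain in x
    e_yy = dv_dy                   # normal strain in y
    e_xy = 0.5 * (du_dy + dv_dx)   # shear strain
    return np.sqrt(e_xx ** 2 + e_yy ** 2 + 2.0 * e_xy ** 2)


def flow_features(u, v):
    """Stack (strain, u, v) into a 3 x H x W array; channel order is assumed."""
    u = np.asarray(u, dtype=np.float64)
    v = np.asarray(v, dtype=np.float64)
    return np.stack([optical_strain(u, v), u, v], axis=0)


# Toy flow field: horizontal displacement grows linearly with x, so the
# only non-zero strain component is e_xx = 1 everywhere.
h, w = 4, 5
u = np.tile(np.arange(w, dtype=np.float64), (h, 1))
v = np.zeros((h, w))
features = flow_features(u, v)
```

In practice, `u` and `v` would come from a dense optical flow estimate between the onset and apex frames, and the stacked array would typically be resized and normalized before being fed to the network. Those preprocessing choices are not stated in the abstract.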