Abstract: Micro-expressions are facial expressions that people reveal involuntarily when they try to hide their true emotions, and they have become a focus of affective computing research in recent years. Because a micro-expression is a subtle facial movement, it is difficult to recognize. Given its strong performance in image classification and its ability to capture subtle feature information, the cross-attention multi-scale vision transformer (CrossViT) is adopted as the backbone network, and its cross-attention mechanism is improved: a Dual Attention (DA) module is proposed that extends the traditional cross-attention mechanism by determining the correlation between attention results, thereby improving micro-expression recognition accuracy. The proposed network learns from three optical flow features (optical strain and the horizontal and vertical optical flow fields), which are computed from the onset frame and the apex frame of each micro-expression sequence, and it classifies the micro-expression with Softmax. Experiments on the fused micro-expression dataset show that the proposed network reaches an unweighted F1-score (UF1) of 0.7275 and an unweighted average recall (UAR) of 0.7272, both higher than those of mainstream micro-expression recognition algorithms, which verifies the effectiveness of the dual-attention CrossViT-based network.
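
For illustration only, the following Python sketch shows one common way to derive the optical strain channel from the two flow fields and to compute UF1 and UAR as macro-averaged F1 and recall. It is not the authors' implementation. The function names, the finite-difference strain formula, and the class count used in the usage note are assumptions, and the flow fields u and v are assumed to have been computed already between the onset and apex frames by an optical flow method that the abstract does not name.

```python
import numpy as np


def optical_strain(u, v):
    """Optical strain magnitude from horizontal (u) and vertical (v) flow.

    Assumed formulation: infinitesimal strain tensor built from
    finite-difference gradients of the flow field.
    """
    # np.gradient returns derivatives along axis 0 (rows, y) first,
    # then along axis 1 (columns, x).
    du_dy, du_dx = np.gradient(u)
    dv_dy, dv_dx = np.gradient(v)
    e_xx = du_dx                      # normal strain along x
    e_yy = dv_dy                      # normal strain along y
    e_xy = 0.5 * (du_dy + dv_dx)      # shear strain (e_xy == e_yx)
    return np.sqrt(e_xx ** 2 + e_yy ** 2 + 2.0 * e_xy ** 2)


def stack_flow_features(u, v):
    """Stack the three input channels as (3, H, W): u, v, optical strain."""
    return np.stack([u, v, optical_strain(u, v)], axis=0)


def uf1_uar(y_true, y_pred, n_classes):
    """Unweighted (macro-averaged) F1 and recall over n_classes labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    f1_scores, recalls = [], []
    for c in range(n_classes):
        tp = np.sum((y_true == c) & (y_pred == c))
        fp = np.sum((y_true != c) & (y_pred == c))
        fn = np.sum((y_true == c) & (y_pred != c))
        f1_den = 2 * tp + fp + fn
        rec_den = tp + fn
        # Classes with no samples and no predictions contribute 0 here
        # (an assumption; other conventions skip such classes).
        f1_scores.append(2 * tp / f1_den if f1_den > 0 else 0.0)
        recalls.append(tp / rec_den if rec_den > 0 else 0.0)
    return float(np.mean(f1_scores)), float(np.mean(recalls))
```

As a usage example, calling stack_flow_features(u, v) on two H x W flow arrays yields a 3 x H x W input tensor, and uf1_uar(y_true, y_pred, n_classes=3) returns the (UF1, UAR) pair for a hypothetical three-class labelling. Because both metrics average per-class scores with equal weight, a class with few samples counts as much as a frequent one.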