Abstract: To address the difficulty convolutional neural networks have in balancing local detail and global semantic consistency during image inpainting, a multi-scale semantic learning model for face image inpainting based on generative adversarial networks is proposed. First, gated convolution decomposes the face image into components with different receptive fields and feature resolutions, and convolution kernels of different sizes extract multi-scale features, so that appropriate local features sharpen the detail of the restoration results. Second, the extracted multi-scale features are fed into a semantic learning module that learns the semantic relationships between features from both the channel and spatial perspectives, strengthening the global consistency of the restoration results. Finally, skip connections carry encoder-side features to the decoder side, reducing the loss of detail caused by downsampling and improving the texture of the restoration results. Experiments on the CelebA-HQ face dataset show that the proposed model improves on three performance metrics, peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and $\ell_1$ error, and that its inpainting results are visually more plausible in both local detail and global semantics.
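The abstract names four building blocks: gated convolution, multi-scale feature extraction with different kernel sizes, a channel-and-spatial semantic learning module, and encoder-to-decoder skip connections. The PyTorch sketch below is a minimal, illustrative interpretation of how these pieces could fit together; it assumes a CBAM-style attention layer for the semantic learning module and a single-level encoder-decoder, and none of the module names, channel widths, or kernel choices come from the paper itself.

```python
# Minimal sketch of the components described in the abstract.
# All layer sizes and design details are illustrative assumptions,
# not the authors' exact architecture.
import torch
import torch.nn as nn

class GatedConv(nn.Module):
    """Gated convolution: a feature branch modulated by a learned soft mask."""
    def __init__(self, in_ch, out_ch, kernel_size):
        super().__init__()
        pad = kernel_size // 2
        self.feature = nn.Conv2d(in_ch, out_ch, kernel_size, padding=pad)
        self.gate = nn.Conv2d(in_ch, out_ch, kernel_size, padding=pad)

    def forward(self, x):
        return torch.relu(self.feature(x)) * torch.sigmoid(self.gate(x))

class MultiScaleBlock(nn.Module):
    """Parallel gated convolutions with different kernel sizes (receptive fields)."""
    def __init__(self, in_ch, out_ch, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            GatedConv(in_ch, out_ch, k) for k in kernel_sizes)
        self.fuse = nn.Conv2d(out_ch * len(kernel_sizes), out_ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

class SemanticAttention(nn.Module):
    """Channel attention followed by spatial attention (CBAM-style stand-in)."""
    def __init__(self, ch, reduction=8):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(
            nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel(x)                    # reweight channels
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.amax(1, keepdim=True)], dim=1)
        return x * self.spatial(pooled)            # reweight spatial positions

class InpaintGenerator(nn.Module):
    """One-level encoder-decoder with a skip connection from encoder to decoder."""
    def __init__(self, base=32):
        super().__init__()
        self.enc1 = MultiScaleBlock(4, base)       # input: masked RGB + mask
        self.down = nn.Conv2d(base, base * 2, 3, stride=2, padding=1)
        self.enc2 = MultiScaleBlock(base * 2, base * 2)
        self.attn = SemanticAttention(base * 2)
        self.up = nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1)
        self.dec = MultiScaleBlock(base * 2, base)  # base*2: decoder + skip features
        self.out = nn.Conv2d(base, 3, 3, padding=1)

    def forward(self, image, mask):
        x = torch.cat([image * (1 - mask), mask], dim=1)
        e1 = self.enc1(x)
        e2 = self.attn(self.enc2(self.down(e1)))
        d = self.up(e2)
        d = self.dec(torch.cat([d, e1], dim=1))    # skip connection restores detail
        return torch.tanh(self.out(d))

# Quick shape check with a hypothetical 64x64 image and a square hole.
img = torch.rand(1, 3, 64, 64)
mask = torch.zeros(1, 1, 64, 64)
mask[..., 16:48, 16:48] = 1
print(InpaintGenerator()(img, mask).shape)         # torch.Size([1, 3, 64, 64])
```

In a full GAN setup this generator would be trained jointly with a discriminator and the usual adversarial plus reconstruction losses; those components are omitted here since the abstract does not specify them.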