To accurately infer the content of missing image regions that are closely tied to the surrounding textures and structures, we propose a single-stage, end-to-end image inpainting model. The model first compresses, reconstructs, and enhances features through convolutional layers and the FastStage module, while self-attention and a multi-layer perceptron are incorporated to capture contextual relationships among features. Furthermore, to strengthen the model's attention to features and its perception of their importance, we propose EMMA, which suppresses the shaking and oscillation that arise when updating the model parameters, thereby improving the performance of the generator and the quality of the generated results. Lastly, we introduce a discriminator to evaluate the consistency between the inpainted image and the original image. Experimental results on the CelebA, Places2, and Paris StreetView datasets demonstrate that, compared with classical methods, the inpainting results of the proposed model exhibit better visual semantics, and the model finely restores the details, textures, and local features of images.
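The abstract does not specify how EMMA damps oscillation during parameter updates; one common mechanism with this effect is exponential-moving-average (EMA) smoothing of the generator's parameters. The sketch below is purely illustrative under that assumption, and the class and parameter names (`EmaParams`, `decay`) are hypothetical, not taken from the paper.

```python
# Hypothetical sketch: EMA smoothing of generator parameters, one plausible
# way to damp oscillation during updates. All names are illustrative; the
# paper's actual EMMA module is not specified in the abstract.

class EmaParams:
    """Maintain a smoothed shadow copy of model parameters."""

    def __init__(self, params, decay=0.999):
        self.decay = decay
        # The shadow starts as a copy of the raw parameters.
        self.shadow = {name: value for name, value in params.items()}

    def update(self, params):
        # shadow <- decay * shadow + (1 - decay) * current
        # Large decay means the shadow changes slowly, filtering out
        # step-to-step oscillation in the raw parameters.
        for name, value in params.items():
            self.shadow[name] = (
                self.decay * self.shadow[name] + (1.0 - self.decay) * value
            )

    def smoothed(self):
        # Return the smoothed parameters, e.g. for evaluation/inference.
        return dict(self.shadow)


# Toy usage: even if the raw parameter jumps straight to 1.0, the shadow
# approaches it gradually instead of following every jump.
ema = EmaParams({"w": 0.0}, decay=0.9)
for _ in range(3):
    ema.update({"w": 1.0})
print(ema.smoothed()["w"])  # 0.271 after three updates
```

In a training loop, the raw parameters would be updated by the optimizer as usual, with `update` called once per step and the smoothed copy used for generating the final inpainted outputs.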