Abstract: This paper proposes a fusion algorithm based on TransNeXt to address the detail loss and artifact generation that arise when fusing infrared and visible images. First, shallow and deep features are extracted from the source images using convolutional neural networks and TransNeXt, and an information compensation module enriches the semantic content of the shallow infrared features. Second, a cross-attention-based fusion module integrates these features and dynamically adjusts the fusion weights according to the importance of different regions in the source images, adapting to scene variations and improving the robustness and accuracy of the fusion. The final fused image is obtained through Transformer-based image reconstruction. In addition, the proposed method constrains the fusion process with a VGG19-based saliency mask loss function, preserving richer information in the key regions of the fused results. Experimental results indicate that, compared with seven other methods, the proposed approach improves the objective evaluation metrics of information entropy, standard deviation, sum of correlation differences, peak signal-to-noise ratio, and pixel-feature mutual information by an average of 10.92%, 14.85%, 24.80%, 2.26%, and 1.30%, respectively. Furthermore, it effectively preserves rich texture information while suppressing artifacts, performing particularly well on nighttime low-light fusion scenes, and it achieves superior object detection results relative to the comparison methods.
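To make the cross-attention fusion step more concrete, the sketch below shows one plausible form of a region-adaptive cross-attention fusion module in PyTorch. It is a minimal illustration, not the paper's implementation: the module name, head count, dimensions, and the softmax-weighted per-pixel combination of the two branches are all assumptions introduced for exposition.

```python
# Minimal sketch of cross-attention fusion between infrared (IR) and visible (VIS)
# feature maps. NOT the paper's implementation; module name, head count, and the
# per-pixel weighting scheme are illustrative assumptions.
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Each branch attends to the other: IR queries attend to VIS keys/values and vice versa.
        self.ir_to_vis = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.vis_to_ir = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # A 1x1 convolution predicts two per-pixel weights, softmax-normalized so the
        # contribution of each branch adapts to the local region (assumed design choice).
        self.weight_head = nn.Conv2d(2 * channels, 2, kernel_size=1)

    def forward(self, ir_feat: torch.Tensor, vis_feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = ir_feat.shape
        # Flatten spatial dimensions into token sequences: (B, H*W, C).
        ir_seq = ir_feat.flatten(2).transpose(1, 2)
        vis_seq = vis_feat.flatten(2).transpose(1, 2)
        # Cross-attention in both directions.
        ir_att, _ = self.ir_to_vis(ir_seq, vis_seq, vis_seq)
        vis_att, _ = self.vis_to_ir(vis_seq, ir_seq, ir_seq)
        ir_att = ir_att.transpose(1, 2).reshape(b, c, h, w)
        vis_att = vis_att.transpose(1, 2).reshape(b, c, h, w)
        # Region-adaptive weights: softmax over the two branches at every pixel.
        weights = torch.softmax(self.weight_head(torch.cat([ir_att, vis_att], dim=1)), dim=1)
        return weights[:, 0:1] * ir_att + weights[:, 1:2] * vis_att


if __name__ == "__main__":
    fusion = CrossAttentionFusion(channels=64)
    ir = torch.randn(1, 64, 32, 32)   # infrared feature map
    vis = torch.randn(1, 64, 32, 32)  # visible feature map
    print(fusion(ir, vis).shape)      # torch.Size([1, 64, 32, 32])
```

The softmax over the two branch weights is one simple way to realize the "dynamically adjusted weights based on region importance" described in the abstract; the actual mechanism used by the paper may differ.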