(1.School of Information Science and Engineering, Yunnan University, Kunming 650000, China; 2.Yunnan Communications Investment and Construction Group Co., Ltd., Kunming 650000, China)
To address the shortcomings of existing target tracking algorithms, namely the inability to extract deep-level features, the failure to fully exploit cross-modal information, and the weak representation of target features, a feature fusion shift Siamese network for RGB-T target tracking is proposed. First, a target tracking framework based on the visible-modality SiameseRPN++ is extended with an infrared-modality branch to obtain a multimodal target tracking framework, and an improved ResNet50 with adjusted stride is adopted as the feature extraction network to capture deep-level features of the target. Subsequently, a multimodal feature interactive learning module (FIM) is designed, which leverages the discriminative information of one modality to guide the learning of target appearance features in the other; by mining cross-modal information in the feature space and channels, the module strengthens the network's attention to foreground information. Thereafter, a multimodal feature fusion module (FAM) is designed, which computes the degree of fusion between the input visible and infrared images, spatially fuses the salient features of the different modalities to effectively eliminate redundant information, and reconstructs the multimodal image via a cascade fusion strategy. Finally, a feature space shift module (FSM) is designed, which partitions the feature maps of the infrared branch and shifts the partitions in four different directions to enhance the edge representation of the heat source target. Extensive experiments on two RGB-T datasets validate the effectiveness of the proposed algorithm, and ablation studies demonstrate the effectiveness of each designed module.
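To illustrate the kind of cross-modal guidance the FIM describes, the following is a minimal sketch (not the paper's implementation): each modality's global channel statistics gate the other modality's features, so that discriminative channels in one branch guide appearance learning in the other. The function name and the residual re-weighting scheme are assumptions for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_modal_guidance(feat_rgb, feat_ir):
    """Hypothetical sketch of cross-modal interactive learning:
    per-channel descriptors from one modality re-weight the other
    modality's feature map (channel-wise attention with a residual)."""
    # global average pooling -> per-channel descriptor, shape (C, 1, 1)
    w_rgb = sigmoid(feat_rgb.mean(axis=(1, 2), keepdims=True))
    w_ir = sigmoid(feat_ir.mean(axis=(1, 2), keepdims=True))
    # cross-guided re-weighting: each branch is gated by the other
    out_rgb = feat_rgb + feat_rgb * w_ir
    out_ir = feat_ir + feat_ir * w_rgb
    return out_rgb, out_ir
```

The residual form keeps the original features intact while letting the other modality emphasize foreground channels, matching the abstract's goal of guiding rather than replacing each branch's representation.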
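The feature space shift operation in the FSM can be sketched as follows. This is an assumed reading of the abstract, not the authors' code: the channel dimension is split into four groups, each group is shifted one step in a different spatial direction, and the vacated positions are zero-padded, which tends to sharpen responses at heat-source edges.

```python
import numpy as np

def feature_space_shift(feat, shift=1):
    """Hypothetical FSM sketch: split channels of a (C, H, W) feature
    map into four groups and shift each group spatially (up, down,
    left, right), zero-padding the vacated border positions."""
    c, h, w = feat.shape
    out = np.zeros_like(feat)
    g = c // 4
    # group 0: shift up
    out[:g, :-shift, :] = feat[:g, shift:, :]
    # group 1: shift down
    out[g:2 * g, shift:, :] = feat[g:2 * g, :-shift, :]
    # group 2: shift left
    out[2 * g:3 * g, :, :-shift] = feat[2 * g:3 * g, :, shift:]
    # group 3: shift right
    out[3 * g:4 * g, :, shift:] = feat[3 * g:4 * g, :, :-shift]
    # any leftover channels (c % 4) pass through unchanged
    out[4 * g:] = feat[4 * g:]
    return out
```

Because each group sees its neighborhood from a different offset, subsequent convolutions can compare shifted and unshifted channels, which is one plausible way a shift module enhances edge representation at negligible computational cost.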