China is the world’s largest strawberry producer, and accurate detection of strawberry diseases is an effective means of safeguarding strawberry quality and yield. To address low detection accuracy under complex backgrounds and the difficulty of detecting subtle disease features, a strawberry disease detection method based on an improved real-time detection transformer (RT-DETR) network is proposed. First, the backbone feature extraction network is reconstructed using the AdditiveBlock-CGLU module to strengthen the model’s representation of deep critical features under complex background interference. Second, a multi-scale cross-layer block feature fusion pyramid network (MS-CBFPN) is proposed to optimize the feature fusion stage of the model; it integrates information across layers more effectively and captures more of the contextual information in an image, improving the model’s ability to detect subtle disease features. Finally, a progressive re-parameterized batch normalization (PRepBN) structure is introduced into the attention-based intra-scale feature interaction (AIFI) module, enabling dynamic adjustment of the learning rate and re-parameterization methods so that the model adapts better to the different stages of training, which further enhances its disease detection capability. Experimental results show that the improved model raises accuracy, recall, mAP@0.5, mAP@0.5:0.95, and F1 score by 3.4, 7.6, 3.3, 8.0, and 5.6 percentage points, respectively, and also outperforms the other models compared, indicating that the improved RT-DETR model is effective for strawberry disease detection in complex scenarios.
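The progressive blending idea behind PRepBN can be illustrated with a minimal sketch. The abstract does not give the formulation, so everything below is an assumption: the form commonly described for PRepBN, in which the output is a weighted mix of layer normalization and a re-parameterized batch normalization (batch normalization plus a scaled identity term), with the mixing weight decaying linearly from 1 to 0 over training. The function names, the `eta` parameter, and the linear schedule are hypothetical choices for illustration, not the authors' implementation, and learnable affine parameters are omitted.

```python
import math

def layer_norm(x, eps=1e-5):
    # Normalize each sample (row) across its own features.
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out

def batch_norm(x, eps=1e-5):
    # Normalize each feature (column) across the batch.
    n = len(x)
    cols = list(zip(*x))
    means = [sum(c) / n for c in cols]
    variances = [sum((v - m) ** 2 for v in c) / n for c, m in zip(cols, means)]
    return [
        [(v - m) / math.sqrt(s + eps) for v, m, s in zip(row, means, variances)]
        for row in x
    ]

def blend_weight(step, total_steps):
    # Assumed schedule: linear decay from 1 (pure LayerNorm) to 0 (pure RepBN).
    return max(0.0, 1.0 - step / total_steps)

def prepbn(x, step, total_steps, eta=0.0):
    # Assumed form: lam * LN(x) + (1 - lam) * (BN(x) + eta * x).
    lam = blend_weight(step, total_steps)
    ln, bn = layer_norm(x), batch_norm(x)
    return [
        [lam * l + (1.0 - lam) * (b + eta * v) for l, b, v in zip(lrow, brow, xrow)]
        for lrow, brow, xrow in zip(ln, bn, x)
    ]

if __name__ == "__main__":
    batch = [[1.0, 2.0, 3.0], [2.0, 4.0, 9.0]]  # hypothetical 2x3 activations
    for step in (0, 50, 100):
        print(step, blend_weight(step, 100), prepbn(batch, step, 100))
```

Under these assumptions the layer behaves like layer normalization at the start of training and like batch normalization at the end; once the blend weight reaches zero, the remaining batch normalization can in principle be folded into adjacent linear layers at inference time.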