With the rapid development of the wind power industry, the proportion of wind turbine failures that result in downtime is also increasing; yaw system failures alone account for 28.7% of total downtime. To reduce downtime and operating costs, this paper proposes a deep learning model based on SCADA data, named CNN-smart_Linformer (CNN-SLinformer), for predicting when yaw system failures in wind turbines will occur. The model computes the weights of the linear projection matrix dynamically through self-attention, so that the projection adapts to changes in the input sequence, which significantly enhances the model's generalization ability across operating environments. It combines the strength of convolutional neural networks (CNNs) in local feature extraction with the ability of SLinformer to capture long-term dependencies. Experimental results on actual SCADA data from wind farms show that CNN-SLinformer significantly improves prediction accuracy on the yaw failure prediction task, reducing the score to 144.50, while also achieving a shorter runtime, which makes it an effective predictive tool for wind farms.
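To make the idea of an input-dependent projection concrete, the following is a minimal sketch in plain Python rather than the model itself. It assumes one plausible reading of "dynamic self-attention weight calculations for the linear projection matrix": a small set of hypothetical learned vectors (`slots`) attends over the time steps of the input to produce the k × n projection that Linformer-style attention uses to compress keys and values. The names `slots`, `dynamic_projection`, and `slinformer_attention`, the random weights, and the omission of the CNN front end are all assumptions made for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]


def scale(a, s):
    """Multiply every entry by the scalar s."""
    return [[v * s for v in row] for row in a]


def softmax_rows(a):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def dynamic_projection(x, slots):
    """Assumed form of the dynamic projection (not taken from the paper).

    x is n x d_in, slots is k x d_in. Each slot attends over the n time
    steps, giving a k x n matrix whose rows sum to 1 and depend on x.
    """
    d_in = len(x[0])
    return softmax_rows(scale(matmul(slots, transpose(x)), 1.0 / math.sqrt(d_in)))


def slinformer_attention(x, w_q, w_k, w_v, slots):
    """Linformer-style attention with an input-dependent projection."""
    d = len(w_q[0])
    q = matmul(x, w_q)                    # n x d
    keys = matmul(x, w_k)                 # n x d
    values = matmul(x, w_v)               # n x d
    proj = dynamic_projection(x, slots)   # k x n
    keys_small = matmul(proj, keys)       # k x d: compressed keys
    values_small = matmul(proj, values)   # k x d: compressed values
    scores = scale(matmul(q, transpose(keys_small)), 1.0 / math.sqrt(d))
    return matmul(softmax_rows(scores), values_small)  # n x d


def rand_matrix(rows, cols):
    """Random stand-in for learned weights or SCADA features."""
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
n, d_in, d, k = 8, 4, 4, 2   # sequence length, input width, model width, projected length
x = rand_matrix(n, d_in)
w_q, w_k, w_v = rand_matrix(d_in, d), rand_matrix(d_in, d), rand_matrix(d_in, d)
slots = rand_matrix(k, d_in)
out = slinformer_attention(x, w_q, w_k, w_v, slots)
print(len(out), len(out[0]))  # 8 4
```

Because the attention scores have shape n × k rather than n × n, the cost grows linearly with sequence length for fixed k, which is consistent with the shorter runtime reported above. In a fixed Linformer projection the k × n matrix would be a learned constant; here it is recomputed for every input sequence.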