Abstract: An improved method, CS-Voxel-RCNN, is proposed to address the insufficient accuracy of the Voxel-RCNN algorithm when detecting small, distant targets and occluded targets. First, three data augmentation methods (random order, random dropout, and random noise) are introduced to enrich the diversity of training samples and thereby enhance the robustness of the model. Second, the Convolutional Block Attention Module (CBAM) is integrated into the 2D backbone network; its channel and spatial attention mechanisms process multi-scale features in finer detail and optimize feature fusion. Finally, a Distance-IoU (DIoU) loss branch is added to the original loss function to emphasize the distance between predicted and ground-truth bounding boxes, improving the accuracy of bounding box regression. Comparative experiments against several classic 3D object detection algorithms are conducted on the KITTI dataset. The results show that, compared with the original Voxel-RCNN algorithm, the proposed algorithm improves detection performance by 2.91 and 0.87 percentage points for pedestrians and cyclists, respectively. Ablation experiments verify the effectiveness of each improvement module. Together, these improvements enhance the practicality and accuracy of 3D object detection in real scenes.
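As an illustration of two of the components named above, the following minimal sketch shows one possible reading of the three point cloud augmentations and of a DIoU-style loss term. It is not the implementation used in this work. The function names `augment_points` and `diou_loss_3d`, the parameters `drop_prob` and `noise_std`, and the interpretation of "random order" as shuffling, "random dropout" as independent point removal, and "random noise" as Gaussian jitter are all assumptions. The loss is written for axis-aligned boxes with positive volume, whereas detection boxes in practice are usually rotated; that simplification is also an assumption.

```python
import math
import random


def augment_points(points, drop_prob=0.1, noise_std=0.01, rng=None):
    """Apply three simple augmentations to a list of (x, y, z) points.

    Hypothetical helper: the exact augmentation settings are assumed.
    """
    rng = rng or random.Random()
    pts = list(points)
    # Random order: shuffle the point sequence.
    rng.shuffle(pts)
    # Random dropout: discard each point independently with drop_prob.
    kept = [p for p in pts if rng.random() >= drop_prob]
    if not kept:
        # Keep at least one point so the sample is never empty.
        kept = pts[:1]
    # Random noise: add zero-mean Gaussian jitter to every coordinate.
    return [tuple(c + rng.gauss(0.0, noise_std) for c in p) for p in kept]


def diou_loss_3d(box_a, box_b):
    """DIoU loss for axis-aligned 3D boxes (x1, y1, z1, x2, y2, z2).

    Computes 1 - IoU + d^2 / c^2, where d is the distance between the
    box centers and c is the diagonal of the smallest enclosing box.
    """
    a_min, a_max = box_a[:3], box_a[3:]
    b_min, b_max = box_b[:3], box_b[3:]

    # Intersection volume: per-axis overlap, clamped at zero.
    inter = 1.0
    for i in range(3):
        inter *= max(0.0, min(a_max[i], b_max[i]) - max(a_min[i], b_min[i]))

    vol_a = math.prod(a_max[i] - a_min[i] for i in range(3))
    vol_b = math.prod(b_max[i] - b_min[i] for i in range(3))
    iou = inter / (vol_a + vol_b - inter)

    # Squared distance between the two box centers.
    center_dist_sq = sum(
        ((a_min[i] + a_max[i]) / 2 - (b_min[i] + b_max[i]) / 2) ** 2
        for i in range(3)
    )
    # Squared diagonal of the smallest box enclosing both boxes.
    diag_sq = sum(
        (max(a_max[i], b_max[i]) - min(a_min[i], b_min[i])) ** 2
        for i in range(3)
    )
    return 1.0 - iou + center_dist_sq / diag_sq
```

In this sketch, two non-overlapping boxes both have an IoU of zero, yet the box whose center lies farther from the reference receives the larger loss. This is the distance information that a plain IoU term does not provide.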