Abstract: Existing deep learning-based methods for source code vulnerability detection mainly learn features from a single programming language, which makes it difficult to detect vulnerabilities that arise when code units written in different languages are associated with and invoke one another in hybrid-language software projects. To address this issue, a deep learning-based hybrid language vulnerability detection method, DL-HLVD, is proposed. First, a BERT layer converts the code text into low-dimensional vectors, which are fed to a bidirectional gated recurrent unit (BiGRU) to capture contextual features, and a conditional random field (CRF) captures the dependencies between adjacent labels. Second, functions written in the different programming languages of the hybrid software are identified by treating the task as named entity recognition, and the identified functions are combined with the program slicing results to reduce the loss of syntactic and semantic information during code representation. Finally, a bidirectional long short-term memory (BiLSTM) network is designed to extract vulnerability code features and detect vulnerabilities in hybrid-language software. Experimental results on the SARD and CrossVul datasets show that the overall recall of DL-HLVD on the two types of vulnerability datasets is 95.0% and its F1 score reaches 93.6%, improving on VulDeePecker, SySeVR, and Project Achilles across all reported metrics. These results indicate that DL-HLVD can improve the overall performance of source code vulnerability detection in hybrid-language scenarios.
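
The role of the conditional random field in the function-identification step can be illustrated with a minimal Viterbi decoding sketch. This is not the DL-HLVD implementation: the BIO tag set (`O`, `B-FUNC`, `I-FUNC`), the hand-picked emission scores (which in the method would come from the BERT and recurrent layers), and the single transition penalty are all assumptions chosen for illustration. The sketch shows how transition scores between adjacent labels can override a per-token choice that would otherwise yield an invalid tag sequence.

```python
from typing import Dict, List

# Hypothetical BIO tag set for marking function names among code tokens.
LABELS = ["O", "B-FUNC", "I-FUNC"]


def viterbi_decode(emissions: List[Dict[str, float]],
                   transitions: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain CRF."""
    if not emissions:
        return []
    # best[t][y]: best score of any label path that ends in y at position t.
    best = [{y: emissions[0][y] for y in LABELS}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: best[-1][p] + transitions[p][y])
            scores[y] = best[-1][prev] + transitions[prev][y] + emissions[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final label.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Assumed transition scores: everything is neutral except O -> I-FUNC,
# which is penalized because an "inside" tag cannot follow an "outside" tag.
TRANSITIONS = {p: {y: 0.0 for y in LABELS} for p in LABELS}
TRANSITIONS["O"]["I-FUNC"] = -10.0

# Assumed per-token emission scores for a three-token snippet.
EMISSIONS = [
    {"O": 2.0, "B-FUNC": 0.0, "I-FUNC": 0.0},
    {"O": 0.0, "B-FUNC": 1.0, "I-FUNC": 1.5},
    {"O": 0.5, "B-FUNC": 0.0, "I-FUNC": 2.0},
]

# Per-token argmax ignores label dependencies and yields O, I-FUNC, I-FUNC.
greedy = [max(LABELS, key=lambda y: e[y]) for e in EMISSIONS]
# Viterbi decoding accounts for the transition penalty.
decoded = viterbi_decode(EMISSIONS, TRANSITIONS)
print(greedy)
print(decoded)
```

With these assumed scores, the per-token choice starts a function name with an "inside" tag, whereas the decoded sequence begins it with `B-FUNC`, which is the kind of adjacent-label consistency the CRF layer is meant to provide.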