DL-HLVD: Deep Learning-based Hybrid Language Source Code Vulnerability Detection Method

2025, Vol. 52, No. 4, pp. 103-113



Authors:

ZHANG Xuejun†, GUO Meifeng, ZHANG Xiao, ZHANG Bin, HUANG Haiyan, CAI Teli
(School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China)


Abstract:

Existing deep learning-based source code vulnerability detection methods mainly learn features from a single programming language, which makes it difficult to detect vulnerabilities that arise from the associations and calls between code units in software projects written in multiple programming languages. To address this issue, this paper proposes DL-HLVD, a deep learning-based hybrid language source code vulnerability detection method. First, a BERT layer converts the code text into low-dimensional vectors, which are fed into a bidirectional gated recurrent unit (BiGRU) to capture contextual features, while a conditional random field (CRF) captures the dependencies between adjacent labels. Second, named entity recognition is performed on the functions written in the different programming languages of the hybrid language software, and the recognized functions are recombined with the program slicing results to reduce the loss of syntactic and semantic information during code representation. Finally, a bidirectional long short-term memory (BiLSTM) network is designed to extract features of vulnerable code and detect vulnerabilities in hybrid language software. Experimental results on the SARD and CrossVul datasets show that DL-HLVD achieves an overall recall of 95.0% and an F1 score of 93.6% on the two types of vulnerability datasets, improving on the recent deep learning methods VulDeePecker, SySeVR, and Project Achilles on every metric. This indicates that DL-HLVD can improve the overall performance of source code vulnerability detection in hybrid language scenarios.

Keywords: vulnerability detection; named entity recognition; program slicing; hybrid language
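
The abstract states that a conditional random field captures the dependency between adjacent labels during named entity recognition of functions. As a rough illustration only, the sketch below runs Viterbi decoding for a linear-chain CRF in plain Python. Everything in it is an assumption made for the example and is not taken from the paper: the BIO tag set (`O`, `B-FUNC`, `I-FUNC`), the token sequence, and the hand-picked emission and transition scores, which in a real model would come from the BERT and BiGRU layers and from training.

```python
from typing import Dict, List

# Hypothetical BIO tag set for function-name entities (not from the paper).
LABELS = ["O", "B-FUNC", "I-FUNC"]


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain CRF."""
    if not emissions:
        return []
    # best[t][y]: best score of any label path that ends in y at position t.
    best = [{y: emissions[0][y] for y in LABELS}]
    back = []  # back[t-1][y]: best previous label when position t is y
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: best[-1][p] + transitions[p][y])
            scores[y] = best[-1][prev] + transitions[prev][y] + emissions[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Made-up per-token scores for the tokens ["x", "=", "load", "_lib"].
EMISSIONS = [
    {"O": 2.0, "B-FUNC": 0.0, "I-FUNC": 0.0},
    {"O": 2.0, "B-FUNC": 0.0, "I-FUNC": 0.0},
    {"O": 1.0, "B-FUNC": 0.8, "I-FUNC": 0.0},
    {"O": 0.0, "B-FUNC": 0.0, "I-FUNC": 2.0},
]
# Made-up transition scores; O -> I-FUNC is heavily penalized because an
# entity cannot continue without having started.
TRANSITIONS = {
    "O":      {"O": 0.0, "B-FUNC": 0.0, "I-FUNC": -10.0},
    "B-FUNC": {"O": 0.0, "B-FUNC": 0.0, "I-FUNC": 0.5},
    "I-FUNC": {"O": 0.0, "B-FUNC": 0.0, "I-FUNC": 0.0},
}

# Picking the best label per token independently ignores label dependencies.
greedy = [max(LABELS, key=lambda y: e[y]) for e in EMISSIONS]
decoded = viterbi(EMISSIONS, TRANSITIONS)
print(greedy)   # ['O', 'O', 'O', 'I-FUNC']
print(decoded)  # ['O', 'O', 'B-FUNC', 'I-FUNC']
```

In this toy input, choosing each label independently yields an `I-FUNC` with no preceding `B-FUNC`, while joint decoding with transition scores recovers a well-formed entity span. That is the kind of adjacent-label dependency a CRF layer is meant to model.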


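Cite this article

ZHANG Xuejun, GUO Meifeng, ZHANG Xiao, ZHANG Bin, HUANG Haiyan, CAI Teli. Deep learning-based hybrid language source code vulnerability detection method[J]. Journal of Hunan University (Natural Sciences), 2025, 52(4): 103-113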

History

Published online: 2025-04-28
