Deep Learning-Based Hybrid-Language Source Code Vulnerability Detection Method (基于深度学习的混合语言源代码漏洞检测方法)

Authors: Zhang Xuejun (张学军), Guo Meifeng (郭梅凤), Zhang Xiao (张潇), Zhang Bin (张斌), Huang Haiyan (黄海燕), Cai Teli (蔡特立)
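Affiliation: Lanzhou Jiaotong University
Funding: National Natural Science Foundation of China (No. 61762058); Industrial Support Project of the Gansu Provincial Department of Education (No. 2022CYZC-38); State Grid Science and Technology Project (No. W32KJ2722010, No. 522722220013)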

DL-HLVD: Deep Learning-based Hybrid Language Source Code Vulnerability Detection

Author:

Zhang Xuejun, Guo Meifeng, Zhang Xiao, Zhang Bin, Huang Haiyan, Cai Teli

Affiliation:

Lanzhou Jiaotong University

Fund Project:

National Natural Science Foundation of China (No. 61762058); Natural Science Foundation of Gansu Province (No. 21JR7RA282); Industrial Support Project of the Gansu Provincial Department of Education (No. 2022CYZC-38); State Grid Science and Technology Project (No. W32KJ2722010, No. 522722220013)

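Abstract (translated from Chinese):

To improve development efficiency and give developers more design choices, many open-source software systems are now written in several programming languages. Code units written in different languages usually depend on and call one another, and the security vulnerabilities arising from these interactions are increasingly common in practice. Existing vulnerability detection techniques mainly learn features from a single programming language and struggle to detect vulnerabilities effectively in mixed-language software projects. This paper therefore proposes DL-HLVD (Deep Learning-based Hybrid Language Source Code Vulnerability Detection), a deep learning-based method for detecting vulnerabilities in mixed-language source code. DL-HLVD first uses a BERT layer to convert code text into low-dimensional vectors, feeds these vectors into a Bidirectional Gated Recurrent Unit (BGRU) to capture contextual features, and uses a Conditional Random Field (CRF) to capture dependencies between adjacent labels. This model performs named entity recognition on functions written in the different programming languages of a mixed-language project; the recognized entities are then reconstructed together with program slicing results, which reduces the loss of syntactic and semantic information during code representation. Finally, a bidirectional long short-term memory (BiLSTM) network is designed to extract features of vulnerable code and detect vulnerabilities in mixed-language software. Comprehensive experiments on the SARD and CrossVul datasets show that DL-HLVD reaches an overall recall of 95.0% and an F1 score of 93.6% on the two vulnerability datasets. Compared with the recent deep learning methods VulDeePecker, SySeVR, and Project Achilles, DL-HLVD improves on every metric, indicating that it raises the overall performance of source code vulnerability detection in mixed-language scenarios.

Keywords: vulnerability detection; named entity recognition; program slicing; hybrid language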
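
The role of the CRF layer in the BERT-BGRU-CRF tagger can be illustrated with a minimal Viterbi decoding sketch. This is not the authors' implementation: the tag set (`O`, `B-FUNC`, `I-FUNC`), the token sequence, and every score below are hypothetical values chosen for illustration, and the emission scores that BERT and the BGRU would produce are simply hard-coded. The sketch shows how transition scores between adjacent labels can overrule a per-token choice that would yield an invalid label sequence.

```python
from typing import Dict, List, Sequence

TAGS = ["O", "B-FUNC", "I-FUNC"]  # hypothetical BIO tag set for function names


def viterbi_decode(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    tags: Sequence[str],
) -> List[str]:
    """Return the highest-scoring tag sequence for one token sequence."""
    if not emissions:
        return []
    # best[t][tag]: best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []  # back[t - 1][tag]: previous tag on that best path
    for scores in emissions[1:]:
        row, ptr = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            row[cur] = best[-1][prev] + transitions[prev][cur] + scores[cur]
            ptr[cur] = prev
        best.append(row)
        back.append(ptr)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Hypothetical tokens and scores; a real tagger would get the emission
# scores from the BERT + BGRU layers and learn the transition scores.
TOKENS = ["ret", "=", "strcpy", "("]
EMISSIONS = [
    {"O": 2.0, "B-FUNC": 0.1, "I-FUNC": 0.0},
    {"O": 2.0, "B-FUNC": 0.0, "I-FUNC": 0.1},
    {"O": -0.5, "B-FUNC": 1.0, "I-FUNC": 1.2},
    {"O": 2.0, "B-FUNC": 0.0, "I-FUNC": 0.3},
]
TRANSITIONS = {
    "O": {"O": 0.5, "B-FUNC": 0.0, "I-FUNC": -10.0},  # O -> I-FUNC is invalid
    "B-FUNC": {"O": 0.0, "B-FUNC": -1.0, "I-FUNC": 0.5},
    "I-FUNC": {"O": 0.0, "B-FUNC": -1.0, "I-FUNC": 0.5},
}

greedy = [max(TAGS, key=lambda t: e[t]) for e in EMISSIONS]
decoded = viterbi_decode(EMISSIONS, TRANSITIONS, TAGS)
print(list(zip(TOKENS, greedy)))
print(list(zip(TOKENS, decoded)))
```

With these made-up scores, the per-token choice labels `strcpy` as `I-FUNC` directly after an `O`, which is not a valid BIO sequence, while the decoded path labels it `B-FUNC` because the `O` to `I-FUNC` transition is heavily penalized.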

Abstract:

To improve development efficiency and give developers more options, many open-source software systems are now written in multiple programming languages. Code units in different languages usually depend on and invoke one another, and the resulting security vulnerabilities are increasingly common in practice. Existing vulnerability detection techniques mainly learn features from a single programming language and struggle to detect vulnerabilities in mixed-language software projects. Based on the idea of deep learning model fusion, DL-HLVD (Deep Learning-based Hybrid Language Source Code Vulnerability Detection) is proposed. DL-HLVD first uses a BERT layer to convert code text into low-dimensional vectors, then feeds these vectors into a Bidirectional Gated Recurrent Unit (BGRU) to capture contextual features, and finally uses a Conditional Random Field (CRF) to capture dependencies between adjacent labels. This model performs named entity recognition on functions written in the different programming languages of a mixed-language project, and the recognized entities are reconstructed with program slicing results to reduce the loss of syntactic and semantic information during code representation. Comprehensive experiments on the SARD and CrossVul datasets show that DL-HLVD achieves an overall recall of 95.0% and an F1 score of 93.6% across the two vulnerability datasets, improving on every metric over the recent deep learning methods VulDeePecker, SySeVR, and Project Achilles. The results show that DL-HLVD can improve the overall performance of source code vulnerability detection in mixed-language scenarios.

Keywords: vulnerability detection; named entity recognition; program slicing; hybrid language
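
For readers less familiar with the reported metrics, the sketch below shows how recall and F1 relate to the counts of a binary vulnerability classifier. The function name and the counts in the usage line are hypothetical and are not taken from the paper's experiments; the formulas are the standard definitions.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from confusion-matrix counts.

    tp: vulnerable samples correctly flagged
    fp: non-vulnerable samples wrongly flagged
    fn: vulnerable samples that were missed
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, it stays close to the lower of the two values, so a high F1 requires both few missed vulnerabilities and few false alarms.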

History

Received: 2024-01-04
Last revised: 2024-03-15
Accepted: 2024-05-13
