DL-HLVD: Deep Learning-based Hybrid Language Source Code Vulnerability Detection Method
Authors: Zhang Xuejun, Guo Meifeng, Zhang Xiao, Zhang Bin, Huang Haiyan, Cai Teli
    Abstract:

    Existing deep learning-based source code vulnerability detection methods mainly learn features from a single programming language, so they struggle to detect vulnerabilities that arise from associations and calls between code units in software projects written in multiple programming languages. To address this, the paper proposes DL-HLVD, a deep learning-based vulnerability detection method for hybrid-language source code. First, a BERT layer converts the code text into low-dimensional vectors, which are fed into a bidirectional gated recurrent unit (BiGRU) to capture contextual features, while a conditional random field (CRF) captures the dependencies between adjacent labels. Second, named entity recognition is performed on the functions written in the different programming languages of the hybrid-language software, and the recognized functions are recombined with the program slicing results to reduce the loss of syntactic and semantic information during code representation. Finally, a bidirectional long short-term memory (BiLSTM) network extracts vulnerability code features and performs vulnerability detection on hybrid-language software. Experiments on the SARD and CrossVul datasets show that DL-HLVD achieves an overall recall of 95.0% and an F1 score of 93.6% on the two types of vulnerability datasets, improving on the recent deep learning methods VulDeePecker, SySeVR, and Project Achilles on every metric. These results indicate that DL-HLVD improves the overall performance of source code vulnerability detection in hybrid-language scenarios.
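The abstract notes that a conditional random field captures the dependency between adjacent labels during named entity recognition of functions. The sketch below illustrates that idea with Viterbi decoding over a linear-chain score; it is not the authors' implementation. The label set (`O`, `B-FUNC`, `I-FUNC`), the tokens, and all emission and transition scores are hypothetical values chosen for illustration, with the emission scores standing in for what a BiGRU layer might output.

```python
from typing import Dict, List, Tuple

# Hypothetical BIO label set for tagging function names (an assumption).
LABELS = ["O", "B-FUNC", "I-FUNC"]


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
) -> List[str]:
    """Return the highest-scoring label sequence for a linear-chain score.

    emissions[t][y] is the per-token score of label y at position t.
    transitions[(p, y)] is the score of moving from label p to label y;
    missing pairs default to 0.0.
    """
    if not emissions:
        return []
    # best[t][y]: best score of any label path that ends in y at position t.
    best = [{y: emissions[0][y] for y in LABELS}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in LABELS:
            prev = max(
                LABELS,
                key=lambda p: best[-1][p] + transitions.get((p, y), 0.0),
            )
            scores[y] = (
                best[-1][prev] + transitions.get((prev, y), 0.0) + emissions[t][y]
            )
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical tokens and scores, invented for this example.
TOKENS = ["return", "Py", "Init", ";"]
EMISSIONS = [
    {"O": 2.0, "B-FUNC": 0.1, "I-FUNC": 0.0},
    {"O": 1.0, "B-FUNC": 0.9, "I-FUNC": 0.2},
    {"O": 0.3, "B-FUNC": 0.4, "I-FUNC": 1.5},
    {"O": 2.0, "B-FUNC": 0.0, "I-FUNC": 0.1},
]
TRANSITIONS = {
    ("O", "I-FUNC"): -5.0,      # an entity cannot start with an inside tag
    ("B-FUNC", "I-FUNC"): 1.0,  # a begin tag is often followed by an inside tag
}

# Per-token argmax ignores neighbouring labels.
greedy = [max(LABELS, key=scores.get) for scores in EMISSIONS]
decoded = viterbi_decode(EMISSIONS, TRANSITIONS)
print(list(zip(TOKENS, greedy)))
print(list(zip(TOKENS, decoded)))
```

With these made-up scores, choosing each label independently yields `O, O, I-FUNC, O`, which places an inside tag directly after `O`. Scoring the sequence as a whole lets the transition terms steer the second token to `B-FUNC`, giving `O, B-FUNC, I-FUNC, O`.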

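Cite this article

Zhang Xuejun, Guo Meifeng, Zhang Xiao, Zhang Bin, Huang Haiyan, Cai Teli. DL-HLVD: Deep Learning-based Hybrid Language Source Code Vulnerability Detection Method [J]. Journal of Hunan University (Natural Sciences), 2025, 52(4): 103-113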

History
  • Online publication date: 2025-04-28