QEMU-based framework for SIMD instruction replacement of floating-point instructions
DOI:
Author:
Affiliation:

School of Artificial Intelligence and Computing, Jiangnan University

Author biography:

Corresponding author:

Fund Project:

National Key R&D Program of China (2022YFE0112400); National Natural Science Foundation of China (21706096); Youth Project of Natural Science Foundation of Jiangsu Province (BK20160162)

    Abstract:

    Almost every modern processor architecture supports SIMD (Single Instruction, Multiple Data) instructions, which apply the same operation to a group of data elements at once and raise processor performance through data parallelism. Most dynamic binary translators, however, do not exploit the host's native SIMD instructions and instead emulate floating-point computation in software. This paper proposes FP-QEMU, a framework built on the QEMU translation system that uses SIMD instructions to replace floating-point instructions, and it provides a complete floating-point implementation on the X86 and ARM baseline platforms. The framework identifies opportunities to accelerate floating-point computation in a dynamic binary translation system and uses SIMD instructions to improve the system's translation performance. With SPEC2006 as the benchmark, experiments show that, compared with QEMU, FP-QEMU achieves a maximum speedup of 51.5% and an average speedup of 37.42% when running ARM applications on X86 computers.

History
  • Received: 2023-11-30
  • Last revised: 2024-01-31
  • Accepted: 2024-04-09
  • Published online:
  • Publication date: