王绍刚,徐炜遐,吴丹,庞征斌,夏军.一种面向不可靠网络的快速RDMA通信方法[J].湖南大学学报:自然科学版,2015,42(8):100~107
A Fast RDMA Offload Method for Unreliable Interconnection Networks
(Chinese title: 一种面向不可靠网络的快速RDMA通信方法)
  
DOI:
Keywords (translated from Chinese): remote memory access; RDMA; MPI; sliding window
Keywords (English): remote data memory access, RDMA, MPI, sliding window approach
Funding:
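Authors and affiliation:
Wang Shaogang, Xu Weixia, Wu Dan, Pang Zhengbin, Xia Jun (College of Computer, National University of Defense Technology, Changsha, Hunan 410073)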
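Abstract (translated from Chinese):
      Large-volume remote memory access (RDMA) transfer is one of the most basic communication patterns in parallel computers and has a large impact on overall system performance. As parallel computer systems grow in scale, fault-tolerant design faces major challenges: the interconnection network has unreliable links and uses adaptive routing, so implementing reliable end-to-end RDMA transfer over an unreliable network is a difficult problem in parallel system architecture design. This paper proposes a fast RDMA transfer method for unreliable networks. The method can be implemented efficiently on the node controller chip and provides a reliable end-to-end RDMA transfer service to upper-layer driver software and applications. Compared with traditional connection-based methods, the hardware design complexity is greatly reduced. Another advantage is on-demand retransmission: traditional methods must retransmit the entire RDMA payload when an error occurs during a transfer, and the new method avoids that overhead. At the same error probability, the transfer efficiency of the new method is greatly improved.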
English abstract:
      Large-volume RDMA (Remote Direct Memory Access) transfer is one of the most commonly used communication modes in parallel computers and has a great impact on overall system performance. As system size increases, fault-tolerant architecture design faces new challenges. The interconnection network typically uses adaptive routing and becomes less reliable. This paper proposes a fast RDMA offload method for unreliable interconnection networks, which can be implemented efficiently in NIC hardware and provides reliable RDMA communication to upper-layer drivers and programs. Compared with traditional approaches, the hardware overhead is greatly reduced. Another benefit is that only the faulty data is retransmitted, which greatly reduces the total RDMA delay. Simulation results show that the RDMA delay is greatly reduced compared with traditional methods.
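The benefit of on-demand retransmission described in the abstract can be illustrated with a rough cost model. The sketch below is not the authors' hardware design. It assumes, hypothetically, that a transfer is split into n_packets packets, that each packet is lost independently with probability p_loss, and that the traditional scheme resends every packet of the transfer after any loss. The function names and parameters are illustrative only.

```python
def _check(n_packets: int, p_loss: float) -> None:
    """Validate the (assumed) model parameters."""
    if n_packets < 0:
        raise ValueError("n_packets must be non-negative")
    if not 0.0 <= p_loss < 1.0:
        raise ValueError("p_loss must be in [0, 1)")


def expected_packets_full_retransmit(n_packets: int, p_loss: float) -> float:
    """Expected packets sent when any loss forces the whole transfer to be resent."""
    _check(n_packets, p_loss)
    # An attempt succeeds only if all n packets arrive: probability (1 - p)^n.
    # The number of attempts is geometric with mean 1 / (1 - p)^n,
    # and every attempt sends all n packets.
    return n_packets / (1.0 - p_loss) ** n_packets


def expected_packets_on_demand(n_packets: int, p_loss: float) -> float:
    """Expected packets sent when only the lost packets are resent."""
    _check(n_packets, p_loss)
    # Each packet is resent independently until it arrives,
    # which takes 1 / (1 - p) sends on average.
    return n_packets / (1.0 - p_loss)


if __name__ == "__main__":
    # Hypothetical example values, not taken from the paper.
    n, p = 1024, 0.001
    print("full retransmit:", expected_packets_full_retransmit(n, p))
    print("on demand:      ", expected_packets_on_demand(n, p))
```

Under these assumptions the full-retransmit cost grows exponentially with the number of packets, while the on-demand cost grows linearly, which is consistent with the efficiency gain the abstract reports at equal error probability.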