CHI Li-hua, LIU Jie, YAN Yi-hui, XIE Lin-chuan, GAN Xin-biao, HU Qin-feng, JIANG Jie, LI Sheng-guo
(National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha, Hunan 410073, China)
Abstract:
BLAS is the fundamental linear algebra library and plays an important role in many large scientific applications. This paper presents FitenBLAS, a linear algebra library developed for the massively multithreaded FT1000 processor. Multilevel loop unrolling methods, chosen according to the memory hierarchy and the number of available registers, were developed for vector-vector, matrix-vector and matrix-matrix operations. The FitenBLAS code was further optimized with instruction scheduling and data prefetching. For parallel matrix-matrix multiplication, a method that avoids redundant packing was proposed and a parallel implementation was developed. The kernel matrix-matrix multiplication code was optimized with instruction scheduling, overlapping of data access with computation, and data blocking. The other BLAS3 subroutines were built on top of the matrix multiplication code. The kernel codes of FitenBLAS were written in assembly language. The key matrix multiplication subroutine reaches 6.91 Gflops/s, nearly 86.4% of the peak performance of the FT1000.
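As an illustration only, the idea of unrolling a matrix-matrix loop nest so that partial sums stay in registers can be sketched in C as below. This is not the FitenBLAS kernel, which the abstract says is written in assembly. The function names `gemm_ref` and `gemm_unrolled_2x2`, the 2x2 block size, the row-major layout, and the requirement that n be even are all assumptions made for the sketch.

```c
#include <stddef.h>

/* Reference triple loop: C += A * B for row-major n x n matrices. */
void gemm_ref(size_t n, const double *a, const double *b, double *c) {
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double s = 0.0;
            for (size_t k = 0; k < n; ++k) {
                s += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] += s;
        }
    }
}

/* Hypothetical 2x2 register-blocked variant: the i and j loops are
 * unrolled by two, so four partial sums live in scalar variables and
 * each loaded element of A and B is used twice.
 * Assumption: n is even (no cleanup loops in this sketch). */
void gemm_unrolled_2x2(size_t n, const double *a, const double *b, double *c) {
    for (size_t i = 0; i < n; i += 2) {
        for (size_t j = 0; j < n; j += 2) {
            double c00 = 0.0, c01 = 0.0, c10 = 0.0, c11 = 0.0;
            for (size_t k = 0; k < n; ++k) {
                double a0 = a[i * n + k];
                double a1 = a[(i + 1) * n + k];
                double b0 = b[k * n + j];
                double b1 = b[k * n + j + 1];
                c00 += a0 * b0;
                c01 += a0 * b1;
                c10 += a1 * b0;
                c11 += a1 * b1;
            }
            c[i * n + j]           += c00;
            c[i * n + j + 1]       += c01;
            c[(i + 1) * n + j]     += c10;
            c[(i + 1) * n + j + 1] += c11;
        }
    }
}
```

Compared with the reference loop, the unrolled version performs four multiply-adds per four loads instead of one per two, which is the kind of register reuse that loop unrolling is meant to expose. A production kernel would additionally block for cache, pack operands, and prefetch, none of which is shown here.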