PDM: A Parallel Data Analysis System Based on Hadoop
    Abstract:

    This paper presents PDM (Parallel Data Mining), a parallel data analysis system built on Hadoop. PDM provides a large collection of parallel data analysis algorithms implemented on the MapReduce computational framework. Besides traditional ETL, data mining, statistical, and text analysis algorithms, it introduces graph-based SNA (social network analysis) algorithms. The paper describes in detail the principle and implementation of the parallel multiple linear regression algorithm and the multi-source shortest path algorithm; the proposed “Message-passing model” effectively addresses the difficulty MapReduce has in processing adjacency matrices. It also presents typical applications on telecommunications data, such as “Package recommendation” built on parallel k-means and decision tree algorithms and “Marketing key point discovery” built on the parallel PageRank algorithm. Finally, performance tests show that the system is suitable for processing large-scale data efficiently.
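The abstract names a “Message-passing model” for the multi-source shortest path algorithm but does not spell it out. The sketch below is not the authors' implementation; it only illustrates, under stated assumptions, how shortest-path relaxation is commonly phrased as map and reduce steps over adjacency lists rather than an adjacency matrix. The function names (`map_phase`, `reduce_phase`, `multi_source_shortest_paths`), the in-memory loop standing in for repeated MapReduce jobs, and the non-negative edge weights are all assumptions made for illustration.

```python
from collections import defaultdict

INF = float("inf")


def map_phase(node, dist, adjacency):
    """Emit the node's own distances plus one message per outgoing edge."""
    # Re-emit the current state so the reducer never loses known distances.
    yield node, dist
    for neighbor, weight in adjacency.get(node, []):
        # Message: candidate distance from every known source to `neighbor`
        # when travelling through `node`.
        yield neighbor, {src: d + weight for src, d in dist.items()}


def reduce_phase(messages):
    """Keep, per source, the smallest distance among all received messages."""
    best = {}
    for message in messages:
        for src, d in message.items():
            if d < best.get(src, INF):
                best[src] = d
    return best


def multi_source_shortest_paths(adjacency, sources):
    """Iterate map/reduce rounds until no distance changes.

    `adjacency` maps a node to a list of (neighbor, weight) pairs with
    non-negative weights (an assumption of this sketch).
    Returns {node: {source: distance}} for every reachable (node, source).
    """
    nodes = set(adjacency) | {v for edges in adjacency.values() for v, _ in edges}
    state = {n: ({n: 0} if n in sources else {}) for n in nodes}
    while True:
        # Shuffle step: group the emitted messages by destination node.
        grouped = defaultdict(list)
        for node, dist in state.items():
            for key, value in map_phase(node, dist, adjacency):
                grouped[key].append(value)
        new_state = {n: reduce_phase(grouped[n]) for n in nodes}
        if new_state == state:  # converged: one more round changed nothing
            return state
        state = new_state


if __name__ == "__main__":
    graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
    print(multi_source_shortest_paths(graph, {"a", "b"}))
```

Each pass of the `while` loop corresponds to one MapReduce job in this sketch; every node only needs its own adjacency list and the messages addressed to it, so no full adjacency matrix is ever materialized.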

Cite this article

段松青, 吴斌, 于乐, 王柏. PDM: A Parallel Data Analysis System Based on Hadoop [J]. Journal of Hunan University (Natural Sciences), 2012, 39(10): 87-92
