+高级检索
异构分布式环境中的并行离群点检测算法
DOI:
作者:
作者单位:

作者简介:

通讯作者:

基金项目:


Parallel Outlier Detection Algorithm in Heterogeneous Distributed Environment
Author:
Affiliation:

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
    摘要:

    离群点检测是数据挖掘领域研究的热点之一,主要目的是识别出数据集中异常但有价值的数据点. 随着数据规模不断扩大,使得处理海量数据的效率降低,随即引入分布式算法. 目前现有的分布式算法大都用于解决同构分布式的处理环境,但在实际应用中,由于参与分布式计算的处理机配置的差异,现有的分布式离群点检测算法不能很好地适用于异构分布式环境. 针对上述问题,本文提出一种面向异构分布式环境的离群点检测算法. 首先提出基于网格的动态数据划分方法(Gird-based Dynamic Data Partitioning,GDDP),充分利用各处理机的计算资源,同时根据数据点的空间位置信息进行数据划分,可有效减少网络通信. 其次基于GDDP算法,提出了异构分布式环境中并行的离群点检测算法(GDDP-based Outlier Detection Algorithm,GODA). 该算法包括2个阶段:在每个处理机本地,按照索引中数据点的顺序进行过滤,通过2次扫描得到离群点候选集;判断候选离群点需要进行网络通信的处理机,使用较低网络开销得出全局离群点. 最后,通过大量实验验证了本文提出的GDDP和GODA算法的有效性.

    Abstract:

    Outlier detection is one of the hotspots in the field of data mining. The main purpose is to identify the abnormal but valuable data points in the data set. With the expansion of data scale,the efficiency of processing massive data is reduced,and then a distributed algorithm is introduced. At present,most of the existing distributed algorithms are used to solve the homogeneous distributed processing environment. However,in practical applications,due to the differences in processor configuration involved in distributed computing,the existing distributed outlier detection algorithms cannot be well applied to heterogeneous distributed environments. In view of the above problems,this paper proposes an outlier detection algorithm for heterogeneous distributed environments. Firstly,a grid-based dynamic data partitioning (GDDP) method is proposed,which makes full use of the computing resources of each processor and divides the data according to the spatial location information of the data points,which can effectively reduce the network communication. Secondly,based on the GDDP algorithm,a parallel GDDP-based Outlier Detection Algorithm (GODA) is proposed. The algorithm consists of two stages:in each processor,filtering according to the order of the data points in the index and obtaining the outlier candidate set by two scans;determining the candidate outliers requiring network communication and using low network overhead leads to global outliers. Finally,the effectiveness of the proposed GDDP and GODA algorithms is verified by a large number of experiments.

    参考文献
    相似文献
    引证文献
文章指标
  • PDF下载次数:
  • HTML阅读次数:
  • 摘要点击次数:
  • 引用次数:
引用本文

王习特?覮,朱宗梅,于雪苹,白梅.异构分布式环境中的并行离群点检测算法[J].湖南大学学报:自然科学版,2020,47(10):100~110

复制
历史
  • 收稿日期:
  • 最后修改日期:
  • 录用日期:
  • 在线发布日期: 2020-10-26
  • 出版日期:
作者稿件一经被我刊录用,如无特别声明,即视作同意授予我刊论文整体的全部复制传播的权利,包括但不限于复制权、发行权、信息网络传播权、广播权、表演权、翻译权、汇编权、改编权等著作使用权转让给我刊,我刊有权根据工作需要,允许合作的数据库、新媒体平台及其他数字平台进行数字传播和国际传播等。特此声明。
关闭