
Density Based Clustering on Large Scale Spatial Data Using Resilient Distributed Dataset
Author:
  • LI Lu-ming, JIANG Xin-hua, LIAO Lyu-chao

    (1. School of Information Science and Engineering, Central-South Univ, Changsha, Hunan 410075, China; 2. Hunan Key Laboratory for Special Road Environment, Changsha Univ of Science and Technology, Changsha, Hunan 410004, China; 3. Fujian Key Laboratory for Automotive Electronics and Electric Drive, Fujian Univ of Technology, Fuzhou, Fujian 350108, China)
    Abstract:

    This paper proposes PClusterdp, a density-based parallel clustering algorithm for mining the features of large-scale spatial data. PClusterdp builds on the cluster-dp algorithm. First, we introduce an RDD partitioning algorithm based on data object counts, which balances the workload across the compute nodes of a computing cluster. Second, we redefine the local density of each data point to suit parallel computation. To remove the original algorithm's dependence on a decision graph, we also propose a method that automatically determines the center point of each cluster. Finally, we discuss a cluster merging strategy that combines the partially clustered data into the final clustering result. We implemented the algorithm on Spark using Resilient Distributed Datasets (RDDs). The experimental results show that the proposed algorithm clusters large-scale spatial data effectively, outperforms traditional density-based clustering methods, and achieves rapid clustering of massive spatial data.
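
For orientation, the following is a minimal single-machine sketch of the baseline idea, assuming that cluster-dp refers to the density-peaks method: each point gets a local density `rho` (the number of neighbours within a cutoff distance) and a distance `delta` to the nearest point of higher density, and cluster centers are the points where both are large. It does not reproduce PClusterdp. The abstract does not give the redefined local density, the count-based RDD partitioning, the automatic center rule, or the merging strategy, so the function name `density_peaks`, the cutoff `d_c`, and the "top `n_centers` by `rho * delta`" selection used in place of a decision graph are all illustrative assumptions.

```python
import math


def density_peaks(points, d_c, n_centers):
    """Single-machine density-peaks sketch (hypothetical helper, not PClusterdp)."""
    n = len(points)
    # Full pairwise distance matrix; O(n^2), fine only for small inputs.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: number of other points strictly closer than d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Indices sorted by decreasing density (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    delta = [0.0] * n
    nearest_higher = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point: use its largest distance to any other point.
            delta[i] = max(dist[i])
            continue
        # Nearest point among those ranked denser than i.
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_higher[i] = j

    # Assumed center rule: largest rho * delta. The stable sort over `order`
    # guarantees the densest point is always selected first.
    centers = sorted(order, key=lambda i: -rho[i] * delta[i])[:n_centers]

    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    # Visit points from dense to sparse so the neighbour is already labelled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    print(density_peaks(pts, d_c=1.0, n_centers=2))
```

The quadratic distance matrix is what makes the baseline impractical for large datasets, which is the motivation the abstract gives for partitioning the data across compute nodes and merging partial clusters afterwards.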

History
  • Online: October 09, 2015