
Density Based Clustering on Large Scale Spatial Data Using Resilient Distributed Dataset

    Abstract:

    This paper proposes PClusterdp, a density-based parallel clustering algorithm for mining features of large-scale spatial data. PClusterdp builds on the cluster-dp algorithm. First, we introduce an RDD partitioning algorithm based on data object counts to balance the workload across the compute nodes of a cluster. Second, we redefine the local density of each data point to suit parallel computation. To eliminate the original algorithm's reliance on a decision graph, we propose a method that automatically determines the center point of each cluster. Finally, we discuss a cluster merging strategy that combines the partial clustering results into the final clustering. We implement the Resilient Distributed Dataset (RDD) based algorithm on Spark. Experimental results show that the proposed algorithm clusters large-scale spatial data effectively, outperforms traditional density-based clustering methods, and enables rapid clustering of massive spatial data.
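
    The abstract does not give the redefined local density, the center selection rule, or the partitioning details, so the following is only a minimal single-machine sketch of the underlying ideas. It assumes the standard cluster-dp quantities (a cutoff-based local density and the distance to the nearest denser point), uses a hypothetical largest-gap rule on their product as a stand-in for automatic center selection, and splits data into equally sized chunks as a stand-in for count-based partitioning. The names `balanced_partitions`, `density_peaks`, and `d_c` are illustrative and not taken from the paper.

```python
import math


def balanced_partitions(points, num_parts):
    """Split points into num_parts chunks whose sizes differ by at most one.

    Assumption: a plain count-balanced split, not the paper's RDD partitioner.
    """
    base, extra = divmod(len(points), num_parts)
    parts, start = [], 0
    for i in range(num_parts):
        size = base + (1 if i < extra else 0)
        parts.append(points[start:start + size])
        start += size
    return parts


def density_peaks(points, d_c):
    """Cluster points (at least two) with a cluster-dp style procedure.

    Returns (centers, labels): indices of the chosen centers and one
    cluster label per point.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit points from densest to sparsest (ties keep index order).
    order = sorted(range(n), key=lambda i: -rho[i])

    # delta: distance to the nearest point that comes earlier in `order`.
    delta = [0.0] * n
    nearest_denser = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # densest point gets the largest distance
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_denser[i] = j

    # Hypothetical automatic rule: rank by gamma = rho * delta and cut the
    # ranking at its largest gap instead of reading a decision graph.
    gamma = [rho[i] * delta[i] for i in range(n)]
    ranked = sorted(order, key=lambda i: -gamma[i])
    gaps = [gamma[ranked[k]] - gamma[ranked[k + 1]] for k in range(n - 1)]
    num_centers = max(range(n - 1), key=lambda k: gaps[k]) + 1
    centers = ranked[:num_centers]

    # Each center starts a cluster; every other point inherits the label of
    # its nearest denser neighbor, which is always labeled before it.
    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return centers, labels
```

    In a distributed setting, each chunk returned by `balanced_partitions` would presumably be clustered on its own and the partial results merged afterwards; the merge step is omitted here because the abstract does not describe it.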

History
  • Online: October 09, 2015