Improved density peak clustering for large datasets
Abstract
Clustering is the usual way of classifying data when there is no a priori knowledge,
especially about the number of classes. In big data analysis, the computational
effort needed to perform the clustering task may become prohibitive, which has
motivated the construction of several new algorithms and the adaptation of existing
ones, such as the well-known K-means algorithm. Recently, Rodriguez and Laio [17]
proposed an algorithm that clusters efficiently by quickly searching for local density
peaks that are sufficiently distant from one another. However, it works only on small
datasets and is highly sensitive to the values of its tunable parameters. In this paper
we propose Improved Density Peak Clustering (IDPC), a new algorithm designed for
large datasets that builds on [17] and corrects the shortcomings mentioned above. Thanks
to our Cover Map (CM) procedure iterated with a decreasing locally-adaptive window
(ICMDW), we are able to build both a localisation map and a multidimensional
density map. The nature of the density map, which fits perfectly with the approach
of [17], allows us to compute the different steps with far fewer operations. IDPC is
insensitive to its parameters, supports the latest improvements in cluster center
selection, and potentially allows further improvements.
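
For orientation, the baseline density peak idea that the abstract refers to can be
sketched as follows. This is a minimal illustration of the original Rodriguez and Laio
scheme, not of IDPC, the Cover Map procedure, or ICMDW. The function name
`density_peaks`, the cutoff parameter `d_c`, the choice of the `n_centers` points with
the largest density-times-distance product as centers, and the tie-breaking by index
are all assumptions made for this example. The full pairwise distance matrix it builds
is what makes this naive form impractical for large datasets.

```python
import numpy as np

def density_peaks(points, d_c, n_centers):
    """Naive density peak clustering sketch (baseline idea, not IDPC)."""
    n = len(points)
    # Full pairwise Euclidean distances: O(n^2) time and memory.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Local density rho_i: number of other points closer than the cutoff d_c.
    rho = (dist < d_c).sum(axis=1) - 1

    # Visit points by decreasing density; ties are broken by index (assumption).
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)                   # distance to nearest denser point
    parent = np.full(n, -1)               # index of that nearest denser point
    delta[order[0]] = dist[order[0]].max()  # convention for the global peak
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]             # points visited earlier (denser or tied)
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        parent[i] = j

    # Centers combine high density with a large distance to any denser point.
    gamma = rho * delta
    gamma[order[0]] = np.inf              # the global peak is always a center
    centers = np.argsort(-gamma, kind="stable")[:n_centers]

    # Every other point inherits the label of its nearest denser neighbour.
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_centers)
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers
```

Both the cutoff `d_c` and the number of centers must be chosen by the user in this
sketch, which illustrates the parameter sensitivity mentioned above.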
Domains
Statistics [math.ST]
Origin: Files produced by the author(s)