
Improved density peak clustering for large datasets

Vincent Courjault-Rade 1 Ludovic d'Estampes 2 Stéphane Puechmorel 1
MAIAA - ENAC - Laboratoire de Mathématiques Appliquées, Informatique et Automatique pour l'Aérien
Abstract : Clustering is the usual way of classifying data when there is no a priori knowledge, especially about the number of classes. In big data analysis, the computational effort needed to perform clustering may become prohibitive, which has motivated the construction of several new algorithms and the adaptation of existing ones, such as the well-known K-means algorithm. Recently, Rodriguez and Laio proposed an algorithm that clusters efficiently by quickly searching for local density peaks that are sufficiently distant from one another. However, it works on small datasets only and is highly sensitive to the values of its tunable parameters. In this paper we propose Improved Density Peak Clustering (IDPC), a new algorithm for large datasets, based on [17], that corrects the shortcomings mentioned above. Thanks to our Cover Map (CM) procedure, iterated with a decreasing locally-adaptive window (ICMDW), we are able to build both a localisation map and a multidimensional density map. The nature of the density map, which fits perfectly with the approach of [17], allows us to compute the different steps with far fewer operations. IDPC has parameters to which it is not sensitive, supports the latest improvements in cluster center selection, and potentially allows further improvements.
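For orientation, the baseline idea the abstract attributes to Rodriguez and Laio (density peaks that lie far from any denser point) can be sketched as follows. This is a minimal illustration of that baseline only, not of IDPC or the Cover Map procedure, whose details are not given on this page. The function name `density_peaks`, the cutoff `d_c`, and the choice of centers by the largest product of density and distance are assumptions made for the example; the original method leaves center selection to inspection of a decision graph.

```python
import numpy as np

def density_peaks(points, d_c, n_clusters):
    """Baseline density-peak clustering sketch (hypothetical helper, not IDPC)."""
    n = len(points)
    # Full pairwise Euclidean distance matrix: O(n^2) memory and time.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Local density: number of other points closer than the cutoff d_c.
    rho = (dist < d_c).sum(axis=1) - 1

    # Visit points from densest to sparsest; ties are broken by index.
    order = np.argsort(-rho, kind="stable")

    # delta: distance to the nearest point that comes earlier in that order.
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    delta[order[0]] = dist[order[0]].max()  # convention for the densest point
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j

    # Assumed selection rule: centers are the points with the largest rho * delta.
    gamma = rho * delta
    gamma[order[0]] = np.inf  # the densest point is always taken as a center
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]

    # Every other point inherits the label of its nearest denser neighbour.
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, rho, delta
```

The sketch stores the full pairwise distance matrix, whose size grows quadratically with the number of points. That cost is one plausible reading of why the abstract describes the original method as limited to small datasets, and the dependence of `rho` on `d_c` illustrates the kind of parameter sensitivity it mentions.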
Document type :
Preprints, Working Papers, ...

Cited literature [26 references]
Contributor : Stephane Puechmorel
Submitted on : Friday, August 12, 2016 - 10:55:48 AM
Last modification on : Wednesday, November 3, 2021 - 5:37:51 AM
Long-term archiving on : Sunday, November 13, 2016 - 11:14:55 AM


Files produced by the author(s)


Distributed under a Creative Commons Attribution 4.0 International License


  • HAL Id : hal-01353574, version 1



Vincent Courjault-Rade, Ludovic d'Estampes, Stéphane Puechmorel. Improved density peak clustering for large datasets. 2016. ⟨hal-01353574⟩


