Improved density peak clustering for large datasets

Vincent Courjault-Rade; Ludovic d'Estampes; Stéphane Puechmorel

Preprint, Working Paper. Year: 2016

Vincent Courjault-Rade

Role: Author

Ecole Nationale de l'Aviation Civile

Ludovic d'Estampes

Role: Author
PersonId : 8813
IdHAL : ludovic-destampes
IdRef : 07604520X

MAIAA-PROBA team

Stéphane Puechmorel

Role: Author
PersonId : 1385
IdHAL : stephane-puechmorel
IdRef : 078931185

Ecole Nationale de l'Aviation Civile

Abstract

Clustering is the usual way of classifying data when no a priori knowledge is available, especially about the number of classes. In big data analysis, the computational effort needed to perform the clustering task may become prohibitive, which has motivated the construction of several new algorithms and the adaptation of existing ones, such as the well-known K-means algorithm. Recently, Rodriguez and Laio proposed an algorithm that clusters efficiently by quickly searching for local density peaks that are sufficiently distant from one another. However, it works only on small datasets and is highly sensitive to the values of its tunable parameters. In this paper we propose Improved Density Peak Clustering (IDPC), a new algorithm for large datasets, based on [17], that corrects the shortcomings mentioned above. Thanks to our Cover Map (CM) procedure, iterated with a decreasing locally adaptive window (ICMDW), we are able to build both a localisation map and a multidimensional density map. The nature of the density map, which fits perfectly with the approach of [17], allows us to compute the different steps with far fewer operations. The algorithm is insensitive to its parameter values, supports the latest improvements in cluster center selection, and potentially allows further improvements.
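For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal Python sketch of density peak clustering as it is commonly described for the Rodriguez and Laio method. It is not the IDPC algorithm and does not implement the Cover Map procedure. The function name `density_peaks`, the cutoff distance `d_c`, the parameter `n_centers`, and the choice of ranking centers by the product of density and distance are all assumptions made for illustration.

```python
import numpy as np

def density_peaks(points, d_c, n_centers):
    """Illustrative baseline density peak clustering (not IDPC)."""
    n = len(points)
    # Full pairwise Euclidean distance matrix: O(n^2) time and memory,
    # which is why this baseline is practical only for small datasets.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)

    # Local density rho_i: number of other points closer than the cutoff d_c.
    rho = (dist < d_c).sum(axis=1) - 1

    # Visit points by decreasing density; the stable sort breaks ties by index.
    order = np.argsort(-rho, kind="stable")

    # delta_i: distance to the nearest point that comes earlier in that order
    # (higher density, or equal density and lower index).
    delta = np.zeros(n)
    parent = np.full(n, -1)
    for rank in range(1, n):
        i = order[rank]
        earlier = order[:rank]
        j = earlier[np.argmin(dist[i, earlier])]
        delta[i] = dist[i, j]
        parent[i] = j
    # Convention: the densest point gets its largest distance to any point.
    delta[order[0]] = dist[order[0]].max()

    # Assumed selection rule: centers are the points with the largest
    # rho * delta. The densest point is always kept as a center.
    score = rho * delta
    score[order[0]] = np.inf
    centers = np.argsort(-score, kind="stable")[:n_centers]

    # Each remaining point inherits the label of its nearest denser neighbour.
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_centers)
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho, delta
```

The quadratic distance matrix and the dependence of `rho` on the single cutoff `d_c` correspond to the two limitations the abstract attributes to the baseline: restriction to small datasets and sensitivity to tunable parameters.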

Keywords

Clustering algorithm; density-based clustering; large datasets

Domains

Statistics [math.ST]

Main file

improved-density-peak.pdf (1.24 MB)

Origin: Files produced by the author(s)

https://hal.science/hal-01353574

Submitted on: Friday, 12 August 2016, 10:55:48

Last modified on: Wednesday, 17 April 2024, 11:06:45

Long-term archiving on: Sunday, 13 November 2016, 11:14:55

Dates and versions

hal-01353574, version 1 (12-08-2016)

Licence

Attribution

Identifiers

HAL Id: hal-01353574, version 1

Cite

Vincent Courjault-Rade, Ludovic d'Estampes, Stéphane Puechmorel. Improved density peak clustering for large datasets. 2016. ⟨hal-01353574⟩

Collections

ENAC MAIAA MAIAA-PROBA DEVI

818 views

1046 downloads
