Invited Speaker---Prof. Juanying Xie


School of Computer Science, Shaanxi Normal University, China


Biography: Juanying Xie is a professor in the School of Computer Science at Shaanxi Normal University, PR China. She is a senior member of the CCF and an associate editor of HISS. She was awarded her Ph.D. in signal and information processing by Xidian University in 2012. From 2010 to 2011 she collaborated with Prof. Xiaohui Liu at Brunel University in the UK on gene selection research. Her research interests include machine learning, data mining, and biomedical data analysis. She received a master's degree in engineering in computer application technology from Xidian University in 2004 and a Bachelor of Science in computer science from Shanxi Normal University in 1993, and she has worked at Shaanxi Normal University ever since.

Speech Title: Robust clustering algorithms by detecting density peaks and assigning points based on K-nearest neighbors and fuzzy weighted K-nearest neighbors
Abstract: Clustering by fast search and find of Density Peaks (referred to as DPC) was introduced by Alex Rodríguez and Alessandro Laio in Science in June 2014. The DPC algorithm is based on the idea that cluster centers are characterized by a higher density than their neighbors and by a relatively large distance from points of higher density. The power of DPC was demonstrated on several test cases: it can intuitively find the number of clusters, can detect and exclude outliers automatically, and recognizes clusters regardless of their shape and of the dimensionality of the space containing them. However, DPC has some drawbacks that must be addressed before it can be widely applied. First, the local density ρi of point i depends on the cutoff distance dc and is computed in different ways depending on the size of the dataset, which can influence the clustering, especially for small real-world datasets. Second, the assignment strategy for the remaining points, once the density peaks (that is, the cluster centers) have been found, can create a "domino effect": once one point is assigned erroneously, many more points may subsequently be misassigned. This is especially likely in real-world datasets where several clusters of arbitrary shape overlap one another. To overcome these deficiencies, we proposed two robust clustering algorithms, based on the K-nearest neighbors and the fuzzy weighted K-nearest neighbors and named KNN-DPC and FKNN-DPC respectively, which were published in Scientia Sinica Informationis (in Chinese) and in Information Sciences in 2016. To find the density peaks, KNN-DPC and FKNN-DPC compute the local density ρi of point i from its K nearest neighbors, for a dataset of any size and independently of the cutoff distance dc, and then assign the remaining points to their most probable clusters using two new point-assignment strategies.
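The density-peak idea above can be sketched in a few lines: a KNN-based local density removes the dependence on the cutoff distance dc, and center candidates are points that combine a high density ρi with a large distance δi to any denser point. This is a minimal illustration under assumed formulas (brute-force Euclidean distances, an exponential KNN weighting), not the exact definitions from the published algorithms.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def local_density_cutoff(points, dc):
    # DPC-style local density: rho_i = number of points within cutoff dc of i.
    return [sum(1 for j, q in enumerate(points)
                if j != i and euclidean(p, q) < dc)
            for i, p in enumerate(points)]

def local_density_knn(points, k):
    # KNN-style local density (sketch): rho_i depends only on the distances
    # to the K nearest neighbors of i, not on a global cutoff dc.
    # The exponential weighting exp(-d) is an illustrative choice.
    rhos = []
    for i, p in enumerate(points):
        dists = sorted(euclidean(p, q) for j, q in enumerate(points) if j != i)
        rhos.append(sum(math.exp(-d) for d in dists[:k]))
    return rhos

def delta_distances(points, rho):
    # delta_i: distance from i to the nearest point of higher density;
    # the global density maximum gets its largest distance to any point.
    n = len(points)
    deltas = []
    for i in range(n):
        higher = [euclidean(points[i], points[j])
                  for j in range(n) if rho[j] > rho[i]]
        if higher:
            deltas.append(min(higher))
        else:
            deltas.append(max(euclidean(points[i], points[j])
                              for j in range(n) if j != i))
    return deltas
```

Cluster centers are then the points with both large ρi and large δi, e.g. the top values of the product ρi·δi.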
The first strategy, shared by KNN-DPC and FKNN-DPC, assigns non-outliers by a breadth-first search of the K nearest neighbors of each point, starting from the cluster centers. The second strategies of KNN-DPC and FKNN-DPC differ: KNN-DPC assigns outliers and the points left unassigned by the first strategy using the K nearest neighbors among the already assigned points, whereas FKNN-DPC assigns them using the fuzzy weighted K nearest neighbors among the assigned points. The proposed KNN-DPC and FKNN-DPC are benchmarked on publicly available synthetic and real-world datasets commonly used for testing the performance of clustering algorithms. Their clustering results are compared not only with those of DPC but also with those of Affinity Propagation (AP), Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and K-means. The metrics used to assess each algorithm are clustering accuracy (Acc), Adjusted Mutual Information (AMI), and Adjusted Rand Index (ARI). The experimental results demonstrate that KNN-DPC and FKNN-DPC can find the cluster centers, recognize clusters regardless of their shape and of the dimensionality of the space in which they are embedded, remain unaffected by outliers, and often outperform DPC, AP, DBSCAN, and K-means. Furthermore, FKNN-DPC is superior to KNN-DPC.
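The two assignment strategies can be sketched as follows. Both the breadth-first spread of labels from the cluster centers and the distance-weighted vote are deliberate simplifications: the `radius` parameter stands in for the papers' non-outlier criterion, and the exp(-d) fuzzy weights are an assumed choice rather than the published formulas.

```python
import math
from collections import deque

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_indices(points, k):
    # Indices of the K nearest neighbors of each point (brute force).
    out = []
    for i, p in enumerate(points):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: euclidean(p, points[j]))
        out.append(order[:k])
    return out

def assign_by_knn_bfs(points, centers, k, radius):
    # Strategy 1 (sketch): breadth-first search starting from the cluster
    # centers, propagating a point's label to its still-unlabeled K nearest
    # neighbors. `radius` is a stand-in for the non-outlier test.
    neigh = knn_indices(points, k)
    labels = {c: ci for ci, c in enumerate(centers)}
    queue = deque(centers)
    while queue:
        i = queue.popleft()
        for j in neigh[i]:
            if j not in labels and euclidean(points[i], points[j]) <= radius:
                labels[j] = labels[i]
                queue.append(j)
    return labels

def assign_remaining_fuzzy(points, labels, k):
    # Strategy 2, FKNN-DPC flavour (sketch): each still-unassigned point takes
    # the label with the largest distance-weighted vote among its already
    # assigned K nearest neighbors.
    neigh = knn_indices(points, k)
    for i in range(len(points)):
        if i in labels:
            continue
        votes = {}
        for j in neigh[i]:
            if j in labels:
                w = math.exp(-euclidean(points[i], points[j]))
                votes[labels[j]] = votes.get(labels[j], 0.0) + w
        if votes:
            labels[i] = max(votes, key=votes.get)
    return labels
```

In this sketch the first pass labels the dense cores reachable from the centers, and the second pass mops up the leftover points (outliers included) by the weighted vote of their assigned neighbors, which is what limits the domino effect of a single bad assignment.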