
## A general framework for increasing the robustness of PCA-based correlation clustering algorithms (2008)


Venue: Proc. SSDBM

Citations: 18 (7 self)

### Citations

1714 | A density-based algorithm for discovering clusters in large spatial databases with noise
- Ester, Kriegel, et al.
- 1996
Citation Context: ...i-dimensional indexing. Initially, however, the eigensystems in both methods are based on the local neighborhood in the Euclidean space. The algorithm 4C [2] is based on a density-based clustering paradigm [11]. Thus, the number of clusters is not decided beforehand but clusters grow from a seed as long as a density criterion is fulfilled. Otherwise, another seed is picked to start a new cluster. The densit...
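
The density-based paradigm described in this context (clusters grow from a seed while a density criterion is fulfilled, otherwise a new seed is picked) can be sketched as follows. This is a minimal simplification of the DBSCAN idea, not the 4C implementation; the names `density_based_clusters`, `eps`, and `min_pts` are illustrative, and border-point subtleties are ignored:

```python
import numpy as np

def density_based_clusters(points, eps=1.0, min_pts=3):
    """Grow clusters from seeds while a density criterion holds:
    a point joins the current cluster if it has >= min_pts neighbors
    (including itself) within radius eps. Simplified DBSCAN-style sketch."""
    points = np.asarray(points, dtype=float)
    labels = [-1] * len(points)  # -1 = noise / unvisited
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] != -1:
            continue  # already assigned to a cluster
        seeds = [i]
        members = []
        while seeds:
            j = seeds.pop()
            if labels[j] != -1:
                continue
            dists = np.linalg.norm(points - points[j], axis=1)
            neighbors = [k for k in range(len(points)) if dists[k] <= eps]
            if len(neighbors) >= min_pts:  # density criterion fulfilled
                labels[j] = cluster_id
                members.append(j)
                seeds.extend(k for k in neighbors if labels[k] == -1)
        if members:
            cluster_id += 1  # cluster finished growing; pick a new seed next
    return labels
```

The number of clusters falls out of the data rather than being fixed beforehand: two well-separated dense groups yield two labels, regardless of any user-specified K.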

507 | OPTICS: Ordering Points To Identify the Clustering Structure
- Ankerst, Breunig, et al.
- 1999
Citation Context: ...e distance between points according to their local correlation dimensionality and subspace orientation – thus again based on a local neighborhood query – and uses hierarchical density-based clustering [12] to derive a hierarchy of correlation clusters. COPAC [5] is based on similar ideas as 4C but disposes of some problems like meaningless similarity matrices due to sparse ε-neighborhoods instead takin...

397 | When is "nearest neighbor" meaningful
- Beyer, Goldstein, et al.
- 1999
Citation Context: ...perplane of the cluster. Intuitively, the eigenvectors are chosen such that the corresponding eigenvalues explain more than α of the total variance. The number of those eigenvectors is called local dimensionality (of a cluster) (7), denoted by λE, formally λE = min{λ ∈ {1, ..., d} | ex(E, λ) ≥ α}. (8) Let us note that almost all correlation clustering algorithms use this notion of local dimensionality. ...

194 | On the surprising behavior of distance metrics in high dimensional space
- Aggarwal, Hinneburg, et al.
- 2001

193 | Finding generalized projected clusters in high dimensional spaces
- Aggarwal, Yu
Citation Context: ...onstrates the impact of the increased robustness of PCA on several data sets. The paper is concluded in Section 7. 2 Related Work The first approach to generalized projected clustering, called ORCLUS [1], is a K-means-like approach. It picks Kc > K seeds at first and assigns the database objects to these seeds according to a distance function that is based on an eigensystem of the corresponding cluste...
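
The eigensystem-based distance mentioned in this context can be illustrated as follows. This is a minimal sketch, assuming distance is measured only along a cluster's low-variance eigenvectors so that points on the cluster's hyperplane appear close to the seed; the actual ORCLUS projected distance differs in detail, and `weak_subspace` and `projected_distance` are hypothetical helpers:

```python
import numpy as np

def weak_subspace(cluster_points, n_weak):
    """Eigenvectors of the cluster covariance with the n_weak smallest
    eigenvalues, i.e. the directions of least variance."""
    cov = np.cov(np.asarray(cluster_points, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, :n_weak]

def projected_distance(point, seed, weak_basis):
    """Distance from point to seed measured only along the weak directions:
    points lying on the cluster's hyperplane have a small distance even if
    they are far from the seed in full Euclidean space."""
    diff = np.asarray(point, dtype=float) - np.asarray(seed, dtype=float)
    return float(np.linalg.norm(weak_basis.T @ diff))
```

For a cluster stretched along the x-axis, the weak direction is essentially the y-axis: a point at (10, 0) is near-zero distance from a seed at the origin, while (0, 5) is about 5 away.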

134 | What is the nearest neighbor in high dimensional spaces
- Hinneburg, Aggarwal, et al.
- 2000
Citation Context: ...ponding eigenvalues explain more than α of the total variance. The number of those eigenvectors is called local dimensionality (of a cluster) (7), denoted by λE, formally λE = min{λ ∈ {1, ..., d} | ex(E, λ) ≥ α}. (8) Let us note that almost all correlation clustering algorithms use this notion of local dimensionality. Typical values for α are 0.85, i.e. the eigenvectors that span the hyperplane explain ...
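
The local dimensionality λE quoted in this context can be computed directly from the eigenvalue spectrum. In this sketch, `local_dimensionality` is a hypothetical helper that takes ex(E, λ) to be the fraction of total variance explained by the λ largest eigenvalues, with the typical threshold α = 0.85 from the context:

```python
import numpy as np

def local_dimensionality(eigenvalues, alpha=0.85):
    """Smallest lambda such that the lambda largest eigenvalues explain
    at least an alpha fraction of the total variance: min{l | ex(E,l) >= alpha}."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # descending
    explained = np.cumsum(ev) / ev.sum()   # ex(E, l) for l = 1 .. d
    return int(np.argmax(explained >= alpha)) + 1  # first l meeting threshold

# Eigenvalues of a local covariance matrix: two strong directions, two weak ones.
print(local_dimensionality([5.0, 3.0, 0.4, 0.1], alpha=0.85))  # -> 2
```

Here the two leading eigenvalues explain 8.0/8.5 ≈ 0.94 of the variance, so the cluster is treated as a 2-dimensional correlation cluster.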

118 | Local dimensionality reduction: A new approach to indexing high dimensional spaces
- Chakrabarti, Mehrotra
- 2000
Citation Context: ...current neighborhood of the cluster center). The number Kc of clusters is reduced iteratively by merging closest pairs of clusters until the user-specified number K is reached. The method proposed in [10] is a slight variant of ORCLUS designed for enhancing multi-dimensional indexing. Initially, however, the eigensystems in both methods are based on the local neighborhood in the Euclidean space. The algorith...

47 | Computing clusters of correlation connected objects
- Böhm, Kailing, et al.
- 2003
Citation Context: ... slight variant of ORCLUS designed for enhancing multi-dimensional indexing. Initially, however, the eigensystems in both methods are based on the local neighborhood in the Euclidean space. The algorithm 4C [2] is based on a density-based clustering paradigm [11]. Thus, the number of clusters is not decided beforehand but clusters grow from a seed as long as a density criterion is fulfilled. Otherwise, ano...

15 | CURLER: Finding and visualizing nonlinear correlated clusters
- Tung, Xu, et al.
- 2005
Citation Context: ...ly assigning complex patterns of intersecting clusters, COPAC and ERiC improve considerably over ORCLUS and 4C. Another approach based on PCA, said to find even non-linear correlation clusters, CURLER [3], seems not to be restricted to correlations of attributes but, according to its restrictions, finds any narrow trajectory and does not provide a model describing its findings. However, even in this approac...

10 | Mining hierarchies of correlation clusters
- Achtert, Böhm, et al.
- 2006
Citation Context: ...ices computed from the eigensystems of two points. The eigensystem of a point is based on the covariance matrix of the ε-neighborhood of the point in Euclidean space. As a hierarchical approach, HiCO [4] defines the distance between points according to their local correlation dimensionality and subspace orientation – thus again based on a local neighborhood query – and uses hierarchical density-based ...

10 | On exploring complex relationships of correlation clusters
- Achtert, Böhm, et al.
- 2007
Citation Context: ...sing k > λ ensures a meaningful definition of a λ-dimensional hyperplane. Still, the Euclidean neighborhood critically influences the results. The latest PCA-based correlation clustering algorithm is ERiC [6], also deriving a local eigensystem for a point based on the k nearest neighbors in Euclidean space. Here, the neighborhood criterion for two points in a DBSCAN-like procedure is an approximate linear...

9 | Robust, complete, and efficient correlation clustering
- Achtert, Böhm, et al.
Citation Context: ...tion dimensionality and subspace orientation – thus again based on a local neighborhood query – and uses hierarchical density-based clustering [12] to derive a hierarchy of correlation clusters. COPAC [5] is based on similar ideas as 4C but disposes of some problems like meaningless similarity matrices due to sparse ε-neighborhoods, instead taking a fixed number k of neighbors — which raises the questi...

9 | Very high compliance in an expanded MS-MS-based newborn screening program despite written parental consent
- Liebl, Nennstiel-Ratzel, et al.
- 2002
Citation Context: ...our novel concepts, the algorithm ERiC is now able to detect some meaningful clusters on the NBA set. In addition, we applied our novel concepts in combination with ERiC to the Metabolome data set of [14] consisting of the concentrations of 43 metabolites in 20,391 human newborns. The newborns were labelled according to some specific metabolic diseases. The data contains 19,730 healthy newborns (“cont...