
## Mining Projected Clusters in High-Dimensional Spaces (2008)

### Citations

11966 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context: ...The number of components that minimizes BIC(m) is considered to be the optimal value for m. Typically, the maximum likelihood estimate of the parameters of the distribution is obtained using the EM algorithm [26]. This algorithm requires initial parameters for each component. Since EM is highly dependent on initialization [27], it is helpful to perform initialization by means of a clustering algorithm ...

4704 | Self-organizing maps
- Kohonen
- 1995
Citation Context: ...In [39] the standard K-means algorithm and three unsupervised competitive neural network algorithms – the neural gas network [40], the growing neural gas network [41] and the self-organizing feature map [42] – are used to cluster the WDBC data. The accuracy achieved by these algorithms on this data is between 90% and 92%, which is very close to the accuracy achieved by PCKA on the same data (WDBC). Such r...

4319 | Estimating the dimension of a model
- Schwarz
- 1978
Citation Context: ...estimate the number of components in a dataset [23] [24]. In our method, we use a penalized likelihood criterion, called the Bayesian Information Criterion (BIC). BIC was first introduced by Schwarz [25] and is given by BIC(m) = −2L_m + N_p log(N) (9), where L_m is the logarithm of the likelihood at the maximum likelihood solution for the mixture model under investigation and N_p is the number of parameters...
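
The penalized criterion in equation (9) is simple to compute; here is a minimal sketch (the function name and the toy numbers are illustrative, not from the paper):

```python
import math

def bic(log_likelihood, n_params, n_points):
    """BIC(m) = -2 * L_m + N_p * log(N), as in equation (9);
    lower values indicate a better fit/complexity trade-off."""
    return -2.0 * log_likelihood + n_params * math.log(n_points)

# Toy comparison: a 3-component model may fit slightly better yet still
# lose to a 2-component model once its extra parameters are penalized.
bic_2 = bic(log_likelihood=-1050.0, n_params=5, n_points=500)
bic_3 = bic(log_likelihood=-1045.0, n_params=8, n_points=500)
best_m = 2 if bic_2 < bic_3 else 3
```

In the paper's setting, the candidate value of m minimizing BIC(m) would be retained as the number of mixture components.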

3141 | Data mining: concepts and techniques
- Han, Kamber, et al.
- 2006
Citation Context: ...data are also characterized by the presence of outliers. Outliers can be defined as a set of data points that are considerably dissimilar, exceptional, or inconsistent with respect to the remaining data [32]. Most of the clustering algorithms, including PCKA, consider outliers as points that are not located in clusters and should be captured and eliminated because they hinder the clustering process. A co...

2069 | Pattern Recognition with Fuzzy Objective Function Algorithms
- Bezdek
- 1981
Citation Context: ...component. Since EM is highly dependent on initialization [27], it is helpful to perform initialization by means of a clustering algorithm [27] [28]. For this purpose we implement the FCM algorithm [29] to partition the set of sparseness degrees into m components. Based on such a partition we can estimate the parameters of each component and set them as initial parameters to the EM algorithm. Once t...
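
The FCM-based seeding described in this context can be sketched as a one-dimensional fuzzy c-means over the scalar sparseness degrees; this is a minimal illustrative implementation, not the authors' code, and the deterministic center initialization is an assumption:

```python
def fcm_1d(values, m, fuzzifier=2.0, iters=200, tol=1e-9):
    """Minimal 1-D fuzzy c-means: partitions scalar values into m fuzzy
    components and returns the component centers."""
    lo, hi = min(values), max(values)
    # deterministic spread of initial centers (an assumption, for clarity)
    centers = [lo + (k + 0.5) * (hi - lo) / m for k in range(m)]
    exp = 2.0 / (fuzzifier - 1.0)
    for _ in range(iters):
        u = []                         # membership of each point in each component
        for x in values:
            d = [abs(x - c) for c in centers]
            if min(d) == 0.0:          # point coincides with a center
                u.append([1.0 if dk == 0.0 else 0.0 for dk in d])
            else:
                u.append([1.0 / sum((dk / dj) ** exp for dj in d) for dk in d])
        new_centers = []
        for k in range(m):
            w = [u[i][k] ** fuzzifier for i in range(len(values))]
            new_centers.append(sum(wi * x for wi, x in zip(w, values)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:
            break
    return centers

# Hard-assigning each degree to its nearest center then yields per-component
# means and weights that can seed EM, as the context suggests.
degrees = [0.9, 1.0, 1.1, 9.9, 10.0, 10.1]
centers = sorted(fcm_1d(degrees, m=2))
```

On well-separated data like the toy list above, the two centers converge near the two group means.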

1467 | The EM Algorithm and Extensions
- McLachlan, Krishnan
- 1997
Citation Context: ...s algorithm requires the initial parameters of each component. Since EM is highly dependent on initialization [27], it is helpful to perform initialization by means of a clustering algorithm [27] [28]. For this purpose we implement the FCM algorithm [29] to partition the set of sparseness degrees into m components. Based on such a partition we can estimate the parameters of each component and set ...

1035 | Statistical pattern recognition: a review
- Jain, Duin, et al.
Citation Context: ...other two datasets, it seems there are few studies in the literature that use this set to evaluate clustering algorithms. Most of the work on this dataset is related to classification algorithms [43] [44]. Hence, in order to have an intuitive idea about the behavior of a method that considers all dimensions in the clustering process, we have chosen to run the standard K-means on this data. The accurac...

724 | Automatic subspace clustering of high dimensional data for data mining applications
- Agrawal, Gehrke, et al.
- 1998
Citation Context: ...c and real datasets. Index Terms: Data mining, clustering, high dimensions, projected clustering. I. INTRODUCTION. Data mining is the process of extracting potentially useful information from a dataset [1]. Clustering is a popular data mining technique which is intended to help the user discover and understand the structure or grouping of the data in the set according to a certain similarity measure [2...

599 | Stochastic complexity
- Rissanen
- 1987
Citation Context: ...ed on ranks. In order to identify dense regions in each dimension, we are interested in all components with small values of locq. We therefore propose to use the MDL principle [30] to separate small and large values of locq. The MDL-selection technique that we use in our approach is similar to the MDL-pruning technique described in [1]. The authors in [1] use this technique to ...

543 | Statistical Models and Methods for Lifetime Data
- Lawless
- 1982
Citation Context: ...log(α̂l) − Ψ(α̂l) = log((1/Nl) ∑_{y∈Gl} y) − (1/Nl) ∑_{y∈Gl} log(y) (7), where Ψ(·) is the digamma function, given by Ψ(α) = Γ′(α)/Γ(α). The digamma function can be approximated very accurately using the following expansion [22]: Ψ(α) = log(α) − 1/(2α) − 1/(12α²) + 1/(120α⁴) − 1/(252α⁶) + … (8). The parameter α̂l can be estimated by solving equation (7) using the Newton-Raphson method. α̂l is then substituted into equation (6) to ...
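
The shape-parameter estimation outlined in this context can be reproduced in a few lines. The digamma series is the truncation of equation (8); the upward recurrence Ψ(a) = Ψ(a+1) − 1/a, the closed-form starting point, and the numerical derivative in the Newton step are my additions for a self-contained sketch, not details from the paper:

```python
import math

def digamma(a):
    """Psi(a) via the recurrence psi(a) = psi(a + 1) - 1/a and the
    truncated asymptotic series of equation (8)."""
    acc = 0.0
    while a < 6.0:                 # shift argument up so the series is accurate
        acc -= 1.0 / a
        a += 1.0
    return acc + (math.log(a) - 1.0 / (2 * a) - 1.0 / (12 * a ** 2)
                  + 1.0 / (120 * a ** 4) - 1.0 / (252 * a ** 6))

def solve_shape(s, iters=50):
    """Solve log(alpha) - psi(alpha) = s (equation (7)) by Newton-Raphson,
    where s = log(mean(y)) - mean(log(y)) > 0 for the component's data."""
    # standard closed-form initial guess (an assumption, not from the paper)
    a = (3.0 - s + math.sqrt((s - 3.0) ** 2 + 24.0 * s)) / (12.0 * s)
    h = 1e-6
    f = lambda x: math.log(x) - digamma(x) - s
    for _ in range(iters):
        a = max(a - f(a) / ((f(a + h) - f(a - h)) / (2 * h)), 1e-8)
    return a

# Sanity check: recover a known shape parameter alpha = 2.5
alpha_hat = solve_shape(math.log(2.5) - digamma(2.5))
```

The recovered α̂ would then be plugged back into the scale-parameter equation, as the context describes.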

418 | Unsupervised learning of finite mixture models
- Figueiredo, Jain
- 2002
Citation Context: ...lihood of the parameters of the distribution is estimated using the EM algorithm [26]. This algorithm requires the initial parameters of each component. Since EM is highly dependent on initialization [27], it is helpful to perform initialization by means of a clustering algorithm [27] [28]. For this purpose we implement the FCM algorithm [29] to partition the set of sparseness degrees into m compo...

408 | When is "nearest neighbor" meaningful?
- Beyer, Goldstein, et al.
- 1999
Citation Context: ...the high-dimensional data commonly encountered nowadays, the concept of similarity between objects in the full-dimensional space is often invalid and generally not helpful. Recent theoretical results [3] reveal that data points in a set tend to be more equally spaced as the dimension of the space increases, as long as the components of the data point are i.i.d. (independently and identically distribu...

401 | A growing neural gas network learns topologies
- Fritzke
- 1995
Citation Context: ...nsidered in this set of experiments. In [39] the standard K-means algorithm and three unsupervised competitive neural network algorithms – the neural gas network [40], the growing neural gas network [41] and the self-organizing feature map [42] – are used to cluster the WDBC data. The accuracy achieved by these algorithms on this data is between 90% and 92%, which is very close to the accuracy achieve...

346 | "Neural Gas" network for vector quantization and its application to time series prediction
- Martinetz, Berkovich, et al.
- 1993
Citation Context: ...stering algorithms on the datasets considered in this set of experiments. In [39] the standard K-means algorithm and three unsupervised competitive neural network algorithms – the neural gas network [40], the growing neural gas network [41] and the self-organizing feature map [42] – are used to cluster the WDBC data. The accuracy achieved by these algorithms on this data is between 90% and 92%, which ...

315 | Introduction to Mathematical Statistics
- Hogg, Craig
- 1995
Citation Context: ...suitable to model the distribution of the sparseness degrees. A standard approach for estimating the parameters of the gamma components Gl is the maximum likelihood technique [21]. The likelihood function is defined as L_Gl(αl, βl) = ∏_{y∈Gl} Gl(y, αl, βl) = (βl^(αl·Nl) / Γ(αl)^Nl) ∏_{y∈Gl} y^(αl−1) exp(−βl ∑_{y∈Gl} y) (3), where Nl is the size of the lth component. The logarithm of the likelihoo...
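
The factored likelihood of equation (3) can be checked numerically against a point-by-point evaluation of the gamma density; a small sketch with illustrative names (shape αl, rate βl):

```python
import math

def gamma_log_pdf(y, alpha, beta):
    """Log of the gamma density with shape alpha and rate beta."""
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + (alpha - 1.0) * math.log(y) - beta * y)

def gamma_log_likelihood(ys, alpha, beta):
    """Logarithm of equation (3): N*alpha*log(beta) - N*log(Gamma(alpha))
    + (alpha - 1) * sum(log y) - beta * sum(y)."""
    n = len(ys)
    return (n * alpha * math.log(beta) - n * math.lgamma(alpha)
            + (alpha - 1.0) * sum(math.log(y) for y in ys)
            - beta * sum(ys))

ys = [0.5, 1.2, 2.0, 3.3]
ll = gamma_log_likelihood(ys, alpha=2.0, beta=1.5)
```

Both forms agree because the product over y in equation (3) becomes a sum of per-point log densities.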

273 | Data clustering: a review
- Jain, Murty, Flynn
- 1999

266 | Toward Integrating Feature Selection Algorithms for Classification and Clustering
- Liu, Yu
- 2005
Citation Context: ...me the curse of dimensionality. The most informative dimensions are selected by eliminating irrelevant and redundant ones. Such techniques speed up clustering algorithms and improve their performance [4]. Nevertheless, in some applications, different clusters may exist in different subspaces spanned by different dimensions. In such cases, dimension reduction using a conventional feature selection tec...

230 | Subspace clustering for high dimensional data: a review
- Parsons, Haque, et al.
- 2004
Citation Context: ...oncern of this paper is projected clustering, we will focus only on such techniques. Further details and a survey on subspace clustering algorithms and projected clustering algorithms can be found in [17] and [18]. III. THE ALGORITHM PCKA. A. Problem Statement. To describe our algorithm, we will introduce some notation and definitions. Let DB be a dataset of d-dimensional points, where the set of attrib...

185 | Distance-based outliers: algorithms and applications
- Knorr, Ng, et al.
- 2000
Citation Context: ...because they hinder the clustering process. A common approach to identify outliers is to analyze the relationship of each data point with the rest of the data, based on the concept of proximity [33] [34]. However, in high-dimensional spaces, the notion of proximity is not straightforward [3]. To overcome this problem, our outlier handling mechanism makes efficient use of the properties of the bina...

163 | Unsupervised feature selection using feature similarity
- Mitra, Murthy, et al.
- 2002
Citation Context: ...the other two datasets, it seems there are few studies in the literature that use this set to evaluate clustering algorithms. Most of the work on this dataset is related to classification algorithms [43] [44]. Hence, in order to have an intuitive idea about the behavior of a method that considers all dimensions in the clustering process, we have chosen to run the standard K-means on this data. The ac...

162 | Entropy-based subspace clustering for mining numerical data
- Cheng, Fu, et al.
- 1999
Citation Context: ...t is closely related to projected clustering is subspace clustering. CLIQUE [1] was the pioneering approach to subspace clustering, followed by a number of algorithms in the same field such as ENCLUS [14], MAFIA [15] and SUBCLU [16]. The idea behind subspace clustering is to identify all dense regions in all subspaces, whereas in projected clustering, as the name implies, the main focus is on discover...

104 | A Monte Carlo algorithm for fast projective clustering
- Procopiuc, Jones, et al.
- 2002
Citation Context: ...e aim is to determine how the running time scales with 1) the size and 2) the dimensionality of the dataset. We compare the performance of PCKA to that of SSPC [7], HARP [9], PROCLUS [5] and FASTDOC [10]. The evaluation is performed on a number of generated data sets with different characteristics. Furthermore, experiments on real data sets are also presented. All the experiments reported in this sec...

84 | MAFIA: efficient and scalable subspace clustering for very large data sets
- Goil, Nagesh, et al.
- 1999
Citation Context: ...related to projected clustering is subspace clustering. CLIQUE [1] was the pioneering approach to subspace clustering, followed by a number of algorithms in the same field such as ENCLUS [14], MAFIA [15] and SUBCLU [16]. The idea behind subspace clustering is to identify all dense regions in all subspaces, whereas in projected clustering, as the name implies, the main focus is on discovering clusters...

68 | Density-connected subspace clustering for high-dimensional data
- Kailing, Kriegel, et al.
- 2004
Citation Context: ...ected clustering is subspace clustering. CLIQUE [1] was the pioneering approach to subspace clustering, followed by a number of algorithms in the same field such as ENCLUS [14], MAFIA [15] and SUBCLU [16]. The idea behind subspace clustering is to identify all dense regions in all subspaces, whereas in projected clustering, as the name implies, the main focus is on discovering clusters that are projec...

53 | Unsupervised learning using MML
- Oliver, Baxter, et al.
- 1996
Citation Context: ...olves calculating an associated criterion and selecting the value of m which optimizes the criterion. A variety of approaches have been proposed to estimate the number of components in a dataset [23] [24]. In our method, we use a penalized likelihood criterion, called the Bayesian Information Criterion (BIC). BIC was first introduced by Schwarz [25] and is given by BIC(m) = −2L_m + N_p log(N) (9), where ...

46 | Outlier mining in large high dimensional datasets
- Angiulli, Pizzuti
- 2005
Citation Context: ...nated because they hinder the clustering process. A common approach to identify outliers is to analyze the relationship of each data point with the rest of the data, based on the concept of proximity [33] [34]. However, in high-dimensional spaces, the notion of proximity is not straightforward [3]. To overcome this problem, our outlier handling mechanism makes efficient use of the properties of the...

31 | A Primer on Statistical Distributions
- Balakrishnan, Nevzorov
- 2004
Citation Context: ...onent Gl in equation (2) has two parameters: the shape parameter αl and the scale parameter βl. The shape parameter allows the distribution to take on a variety of shapes, depending on its value [19] [20]. When αl < 1, the distribution is highly skewed and is L-shaped. When αl = 1, we get the exponential distribution. In the case of αl > 1, the distribution has a peak (mode) at (αl − 1)/βl and is skewed ...

30 | HARP: a practical projected clustering algorithm
- Yip, Cheung, et al.
- 2004
Citation Context: ...it is clear that a similarity function that uses all dimensions misleads the relevant-dimension detection mechanism and adversely affects the performance of these algorithms. Another example is HARP [9], a hierarchical projected clustering algorithm based on the assumption that two data points are likely to belong to the same cluster if they are very similar to each other along many dimensions. Howe...

27 | Statistical Distributions in Engineering
- Bury
- 1999
Citation Context: ...component Gl in equation (2) has two parameters: the shape parameter αl and the scale parameter βl. The shape parameter allows the distribution to take on a variety of shapes, depending on its value [19] [20]. When αl < 1, the distribution is highly skewed and is L-shaped. When αl = 1, we get the exponential distribution. In the case of αl > 1, the distribution has a peak (mode) at (αl − 1)/βl and is sk...

21 | Redefining clustering for high-dimensional applications
- Aggarwal, Yu
- 2002
Citation Context: ...difficulties in correctly identifying projected clusters. PROCLUS tends to classify a large number of data points as outliers. Similar behavior of PROCLUS was also observed in [8] and [9]. The same phenomenon occurs for FastDOC. We found that FastDOC performs well on datasets that contain dense projected clusters with the form of a hypercube. E. Scalability. In this subsection,...

20 | Comparing Subspace Clusterings
- Patrikainen, Meila
- 2006
Citation Context: ...Intel Core 2 Duo CPU of 2.4 GHz and 4 GB RAM. A. Performance Measures. A number of new metrics for comparing projected clustering algorithms and subspace clustering algorithms were recently proposed in [36]. The performance measure used in our paper is the Clustering Error (CE) distance for projected/subspace clustering. This metric performs comparisons in a more objective way since it takes into accoun...

19 | On discovery of extremely low-dimensional clusters using semi-supervised projected clustering
- Yip, Cheung, et al.
- 2005
Citation Context: ...s have been successful in discovering clusters in different subspaces, they encounter difficulties in identifying very low-dimensional projected clusters embedded in high-dimensional space. Yip et al. [7] observed that current projected clustering algorithms provide meaningful results only when the dimensionalities of the clusters are not much lower than that of the dataset. For instance, some partiti...

15 | An approach for clustering gene expression data with error information
- Tjaden
- 2006
Citation Context: ...alignant. Saccharomyces Cerevisiae Gene Expression Data (SCGE): This data set, available from http://cs.wellesley.edu/~btjaden/CORE, represents the supplementary material used in Brian Tjaden’s paper [38]. The Saccharomyces Cerevisiae data contains the expression level of 205 genes under 80 experiments. The data set is presented as a matrix. Each row corresponds to a gene and each column to an experim...

14 | Iterative projected clustering by subspace mining
- Yiu, Mamoulis
- 2005
Citation Context: ...ng all relevant dimensions. In some types of data, however, clusters with different widths are more realistic. Another hypercube approach called FPC (Frequent-Pattern-based Clustering) is proposed in [11] to improve the efficiency of DOC. FPC replaces the randomized module of DOC with systematic search for the best cluster defined by a random medoid point p. In order to discover relevant dimensions fo...

12 | A Unified View on Clustering Binary Data
- Li
- 2006
Citation Context: ...tween binary data points zi for (i = 1, . . . , N) in the matrix Z. Given two binary data points z1 and z2, there are four fundamental quantities that can be used to define similarity between the two [35]: a = |{j : z1j = z2j = 1}|, b = |{j : z1j = 1 ∧ z2j = 0}|, c = |{j : z1j = 0 ∧ z2j = 1}| and d = |{j : z1j = z2j = 0}|, where j = 1, . . . , d. One commonly used similarity measure for binary data is the Jaccard coefficien...
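
The four co-occurrence counts a, b, c, d and the Jaccard coefficient a/(a+b+c) mentioned in this context can be sketched directly (helper names are illustrative):

```python
def binary_counts(z1, z2):
    """Return (a, b, c, d): counts of 1-1, 1-0, 0-1 and 0-0 matches
    over the d dimensions of two binary points."""
    a = sum(1 for x, y in zip(z1, z2) if x == 1 and y == 1)
    b = sum(1 for x, y in zip(z1, z2) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(z1, z2) if x == 0 and y == 1)
    d = sum(1 for x, y in zip(z1, z2) if x == 0 and y == 0)
    return a, b, c, d

def jaccard(z1, z2):
    """Jaccard coefficient a / (a + b + c); 0-0 matches (d) are ignored."""
    a, b, c, _ = binary_counts(z1, z2)
    return a / (a + b + c) if (a + b + c) else 0.0

sim = jaccard([1, 1, 0, 0, 1], [1, 0, 1, 0, 1])  # a=2, b=1, c=1 -> 0.5
```

Ignoring d is the point of the Jaccard measure: in sparse binary matrices, shared absences carry little information about similarity.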

11 | An objective approach to cluster validation
- Bouguessa, Wang, et al.
- 2006
Citation Context: ...p involves calculating an associated criterion and selecting the value of m which optimizes the criterion. A variety of approaches have been proposed to estimate the number of components in a dataset [23] [24]. In our method, we use a penalized likelihood criterion, called the Bayesian Information Criterion (BIC). BIC was first introduced by Schwarz [25] and is given by BIC(m) = −2L_m + N_p log(N) (9), w...

9 | Projective clustering by histograms
- Ng, Fu, et al.
- 2005
Citation Context: ...he form of labeled data points and/or labeled dimensions is very limited and not usually available. A density-based algorithm named EPCH (Efficient Projective Clustering by Histograms) is proposed in [12] for projected clustering. EPCH performs projected clustering by histogram construction. By iteratively lowering a threshold, dense regions are identified in each histogram. A "signature" is generated...

6 | A K-Means-Based Algorithm for Projective Clustering
- Bouguessa, Wang, et al.
- 2006
Citation Context: ...of irregular shape. On the other hand, while EPCH avoids the computation of distance between data points in the full-dimensional space, it suffers from the curse of dimensionality. In our experiments [13], we have observed that when the dimensionality of the data space increases and the number of relevant dimensions for clusters decreases, the accuracy of EPCH is affected. A fi...

4 | Identifying Projected Clusters from Gene Expression Profiles
- Yip, Cheung, et al.
Citation Context: ...hat it provides a mechanism to automatically determine relevant dimensions for each cluster and avoid the use of input parameters, whose values are difficult to set. In addition to this, the study in [6] illustrates that HARP provides interesting results on gene expression data. On the other hand, as mentioned in Section 1, it has been shown in [3] that, for a number of common data distributions, as d...

1 | Unsupervised learning with normalised data and non-Euclidean norms
- Doherty, Adams, et al.
- 2007
Citation Context: ...comparisons and to confirm the suitability of our approach, we also analyzed the qualitative behavior of non-projected clustering algorithms on the datasets considered in this set of experiments. In [39] the standard K-means algorithm and three unsupervised competitive neural network algorithms – the neural gas network [40], the growing neural gas network [41] and the self-organizing feature map [42...