Automatic Subspace Clustering of High Dimensional Data
Data Mining and Knowledge Discovery, 2005
Cited by 561 (12 self)
Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high dimensional datasets.
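The abstract does not spell out how dense regions are found. As a rough illustration only, the sketch below assumes a grid-based notion of density: each dimension is cut into `xi` equal-width intervals, and a grid cell counts as dense when it holds more than a fraction `tau` of the points. The function names, the parameters `xi` and `tau`, the restriction to one- and two-dimensional subspaces, and the assumption that coordinates lie in [0, 1) are all choices made for this example, not details taken from the listing.

```python
from collections import Counter
from itertools import combinations


def dense_units(points, dims, xi, tau):
    """Return the grid cells of subspace `dims` holding more than tau * n points.

    Assumes every coordinate lies in [0, 1); xi is the number of equal-width
    intervals per dimension (both are illustrative assumptions).
    """
    counts = Counter(
        tuple(min(int(p[d] * xi), xi - 1) for d in dims) for p in points
    )
    threshold = tau * len(points)
    return {cell for cell, n in counts.items() if n > threshold}


def dense_subspaces_up_to_2d(points, xi=4, tau=0.4):
    """Bottom-up search: 1-D dense units first, then 2-D candidates only."""
    n_dims = len(points[0])
    result = {}
    # Pass 1: dense units in each single dimension.
    for d in range(n_dims):
        units = dense_units(points, (d,), xi, tau)
        if units:
            result[(d,)] = units
    # Pass 2: a 2-D subspace can only hold a dense cell if both of its
    # dimensions already hold a dense 1-D unit, so only those pairs are tried.
    candidates = sorted(d for (d,) in result)
    for a, b in combinations(candidates, 2):
        units = dense_units(points, (a, b), xi, tau)
        if units:
            result[(a, b)] = units
    return result
```

Because the counts depend only on which cell each point falls into, the result is the same for any ordering of the input records, which matches the order-insensitivity requirement named in the abstract.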
Comparison of texture features based on Gabor filters
IEEE Trans. on Image Processing
Cited by 101 (5 self)
Texture features that are based on the local power spectrum obtained by a bank of Gabor filters are compared. The features differ in the type of nonlinear post-processing which is applied to the local power spectrum. The following features are considered: Gabor energy, complex moments, and grating cell operator features. The capability of the corresponding operators to produce distinct feature vector clusters for different textures is compared using two methods: the Fisher criterion and the classification result comparison. Both methods give consistent results. The grating cell operator gives the best discrimination and segmentation results. The texture detection capabilities of the operators and their robustness to nontexture features are also compared. The grating cell operator is the only one that selectively responds only to texture and does not give false response to nontexture features such as object contours. Index Terms: Classification, complex moments, discrimination,
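For readers unfamiliar with the Fisher criterion mentioned above, here is a minimal sketch of its common one-dimensional, two-class form: the squared difference of the class means divided by the sum of the class variances. This is a textbook simplification offered as an assumption; the paper's feature vectors are multidimensional and its exact formulation is not given in this listing. The function name `fisher_criterion` is hypothetical.

```python
from statistics import mean, pvariance


def fisher_criterion(a, b):
    """Separation score for two 1-D samples: larger means better separated.

    Computes (mean_a - mean_b)^2 / (var_a + var_b) using population
    variances. Raises ZeroDivisionError if both samples are constant.
    """
    return (mean(a) - mean(b)) ** 2 / (pvariance(a) + pvariance(b))


# Two feature clusters that are far apart score higher than two that overlap.
far_apart = fisher_criterion([1, 2, 3], [7, 8, 9])
overlapping = fisher_criterion([1, 2, 3], [2, 3, 4])
```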
Unsupervised Learning from Dyadic Data
1998
Cited by 100 (9 self)
Dyadic data refers to a domain with two finite sets of objects in which observations are made for dyads, i.e., pairs with one element from either set. This includes event co-occurrences, histogram data, and single stimulus preference data as special cases. Dyadic data arises naturally in many applications ranging from computational linguistics and information retrieval to preference analysis and computer vision. In this paper, we present a systematic, domain-independent framework for unsupervised learning from dyadic data by statistical mixture models. Our approach covers different models with flat and hierarchical latent class structures and unifies probabilistic modeling and structure discovery. Mixture models provide both a parsimonious yet flexible parameterization of probability distributions with good generalization performance on sparse data and structural information about data-inherent grouping structure. We propose an annealed version of the standard Expectation Maximization algorithm for model fitting which is empirically evaluated on a variety of data sets from different domains.
Range Queries in OLAP Data Cubes
In Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, 1997
Cited by 59 (1 self)
A range query applies an aggregation operation over all selected cells of an OLAP data cube where the selection is specified by providing ranges of values for numeric dimensions. We present fast algorithms for range queries for two types of aggregation operations: SUM and MAX. These two operations cover techniques required for most popular aggregation operations, such as those supported by SQL. For range-sum queries, the essential idea is to precompute some auxiliary information (prefix sums) that is used to answer ad hoc queries at runtime. By maintaining auxiliary information which is of the same size as the data cube, all range queries for a given cube can be answered in constant time, irrespective of the size of the subcube circumscribed by a query. Alternatively, one can keep auxiliary information which is $1/b^d$ of the size of the $d$-dimensional data cube. Response to a range query may now require access to some cells of the data cube in addition to the access to the auxiliary ...
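The prefix-sum idea for range-sum queries can be sketched for a two-dimensional cube as follows. This is an illustrative sketch, not the paper's implementation: the function names are hypothetical, and the prefix array here carries one extra zero row and column (slightly larger than the cube itself) purely to avoid boundary checks.

```python
def build_prefix_sums(cube):
    """Precompute P where P[i][j] is the sum of cube[0..i-1][0..j-1]."""
    rows, cols = len(cube), len(cube[0])
    # Extra zero row and column so queries touching the border need no special case.
    P = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            P[i + 1][j + 1] = cube[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]
    return P


def range_sum(P, r_lo, r_hi, c_lo, c_hi):
    """Sum of cube[r_lo..r_hi][c_lo..c_hi] (inclusive) using four lookups."""
    return (
        P[r_hi + 1][c_hi + 1]
        - P[r_lo][c_hi + 1]
        - P[r_hi + 1][c_lo]
        + P[r_lo][c_lo]
    )


cube = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
P = build_prefix_sums(cube)
total_lower_left = range_sum(P, 1, 2, 0, 1)  # cells 4, 5, 7, 8
```

Each query reads exactly four entries of `P` regardless of how many cells the queried range covers, which is the constant-time behaviour the abstract describes for the full-size auxiliary array.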
Exact and Approximation Algorithms for Clustering
1997
Cited by 57 (5 self)
In this paper we present an $n^{O(k^{1-1/d})}$ time algorithm for solving the $k$-center problem in $\mathbb{R}^d$, under $L_\infty$ and $L_2$ metrics. The algorithm extends to other metrics, and can be used to solve the discrete $k$-center problem as well. We also describe a simple $(1+\epsilon)$-approximation algorithm for the $k$-center problem, with running time $O(n \log k) + (k/\epsilon)^{O(k^{1-1/d})}$. Finally, we present an $n^{O(k^{1-1/d})}$ time algorithm for solving the $L$-capacitated $k$-center problem, provided that $L = \Omega(n/k^{1-1/d})$ or $L = O(1)$. We conclude with a simple approximation algorithm for the $L$-capacitated $k$-center problem. The work on this paper was partially supported by a National Science Foundation Grant CCR-93-01259, by an Army Research Office MURI grant DAAH04-96-1-0013, by a Sloan fellowship, by an NYI award and matching funds from Xerox Corporation, and by a grant from the U.S.-Israeli Binational Science Foundation. † Department of Computer Science, Box ...
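To make the k-center objective concrete (choose k centers so that the largest distance from any point to its nearest center is as small as possible), the sketch below uses the classic greedy farthest-first heuristic, which is known to give a 2-approximation. This is a different and much simpler technique than the exact and $(1+\epsilon)$-approximation algorithms of the paper, substituted here only for illustration; the function name is hypothetical and Euclidean distance is assumed.

```python
import math


def greedy_k_center(points, k):
    """Farthest-first traversal: a simple 2-approximation for k-center.

    Not the paper's algorithm. Starts from the first point (an arbitrary
    choice) and repeatedly adds the point farthest from the current centers.
    Returns the chosen centers and the resulting covering radius.
    """
    centers = [points[0]]
    # dist[i] = distance from points[i] to its nearest chosen center.
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        farthest = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[farthest])
        dist = [
            min(d, math.dist(p, points[farthest]))
            for d, p in zip(dist, points)
        ]
    return centers, max(dist)


centers, radius = greedy_k_center([(0, 0), (1, 0), (10, 0), (11, 0)], k=2)
```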
Pattern Recognition in Images By Symmetries and Coordinate Transformations
1997
Cited by 23 (4 self)
A theory for detecting general curve families by means of symmetry measurements in the coordinate transformed originals is presented. Symmetries are modeled by iso-gray curves of conjugate harmonic function pairs which also define the coordinate transformations. Harmonic function pair coordinates render the target curve patterns as parallel lines, which is defined here as linear symmetry. Detecting these lines, or generalized linear symmetry fitting as it will be called, corresponds to finding invariants of Lie groups of transformations. A technique based on least square error minimization for estimating the invariance parameters is presented. It uses the Lie infinitesimal operators to construct feature extraction methods that are efficient and simple to implement. The technique, which is shown to be an extension of the generalized Hough transform, enables detection by voting and accumulating evidence for the searched pattern. In this approach complex valued votes are permitted, where ...
Histogram Clustering for Unsupervised Image Segmentation
Proceedings of CVPR ’99, 1999
Cited by 23 (1 self)
This paper introduces a novel statistical mixture model for probabilistic grouping of distributional (histogram) data. Adopting the Bayesian framework, we propose to perform annealed maximum a posteriori estimation to compute optimal clustering solutions. In order to accelerate the optimization process, an efficient multiscale formulation is developed. We present a prototypical application of this method for the unsupervised segmentation of textured images based on local distributions of Gabor coefficients. Benchmark results indicate superior performance compared to K-means clustering and proximity-based algorithms.
Texture Boundary Detection for Real-Time Tracking
In European Conference on Computer Vision, 2004
Cited by 21 (5 self)
We propose an approach to texture boundary detection that only requires a line search in the direction normal to the edge. It is therefore very fast and can be incorporated into a real-time 3D pose estimation algorithm that retains the speed of those that rely solely on gradient properties along object contours but does not fail in the presence of highly textured objects and clutter.
Centered Pyramids
IEEE TRANSACTIONS ON IMAGE PROCESSING, 1999
Cited by 10 (5 self)
Quadtree-like pyramids have the advantage of resulting in a multiresolution representation where each pyramid node has four unambiguous parents. Such a centered topology guarantees a clearly defined up-projection of labels. This concept has been successfully and extensively used in applications of contour detection, object recognition and segmentation. Unfortunately, the quadtree-like type of pyramid has poor approximation powers because of the employed piecewise-constant image model. This paper deals with the construction of improved centered image pyramids in terms of general approximation functions. The advantages of the centered topology, such as symmetry, consistent boundary conditions and accurate up-projection of labels, are combined with a more faithful image representation at coarser pyramid levels. We start by introducing a general framework for the design of least squares pyramids using the standard filtering and decimation tools. We give the most general explicit formulas for the computation of the filter coefficients by any (well behaving) approximation function in both the continuous ($L_2$) and the discrete ($\ell_2$) norm. We then define centered pyramids and provide the filter coefficients for odd spline approximation functions. Finally, we compare the centered pyramid to the ordinary one and highlight some applications.
Hyperrectangle-based discriminative data generalization and applications in data mining
2007
Cited by 5 (2 self)
The ultimate goal of data mining is to extract knowledge from massive data. Knowledge is ideally represented as human-comprehensible patterns from which end-users can gain intuitions and insights. Axis-parallel hyperrectangles provide interpretable generalizations for multidimensional data points with numerical attributes. In this dissertation, we study the fundamental problem of rectangle-based discriminative data generalization in the context of several useful data mining applications: cluster description, rule learning, and Nearest Rectangle classification. Clustering is one of the most important data mining tasks. However, most clustering methods output sets of points as clusters and do not generalize them into interpretable patterns. We perform a systematic study of cluster description, where we propose novel description formats leading to enhanced expressive power and introduce novel description problems specifying different tradeoffs between interpretability and accuracy. We also present efficient heuristic algorithms for the introduced problems in the proposed formats. If-then rules are