Data Clustering: A Review
- ACM COMPUTING SURVEYS
, 1999
Cited by 912 (9 self)
Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. However, clustering is a difficult problem combinatorially, and differences in assumptions and contexts in different communities have made the transfer of useful generic concepts and methodologies slow to occur. This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques, and identify cross-cutting themes and recent advances. We also describe some important applications of clustering algorithms such as image segmentation, object recognition, and information retrieval.
Quantization
- IEEE TRANS. INFORM. THEORY
, 1998
Cited by 515 (10 self)
The history of the theory and practice of quantization dates to 1948, although similar ideas had appeared in the literature as long ago as 1898. The fundamental role of quantization in modulation and analog-to-digital conversion was first recognized during the early development of pulse-code modulation systems, especially in the 1948 paper of Oliver, Pierce, and Shannon. Also in 1948, Bennett published the first high-resolution analysis of quantization and an exact analysis of quantization noise for Gaussian processes, and Shannon published the beginnings of rate distortion theory, which would provide a theory for quantization as analog-to-digital conversion and as data compression. Beginning with these three papers of fifty years ago, we trace the history of quantization from its origins through this decade, and we survey the fundamentals of the theory and many of the popular and promising techniques for quantization.
Consistency of spectral clustering
, 2004
Cited by 170 (11 self)
Consistency is a key property of statistical algorithms when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of a popular family of spectral clustering algorithms, which cluster the data with the help of eigenvectors of graph Laplacian matrices. We show that one of the two major classes of spectral clustering (normalized clustering) converges under some very general conditions, while the other (unnormalized) is only consistent under strong additional assumptions, which, as we demonstrate, are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering in practical applications. We believe that methods used in our analysis will provide a basis for future exploration of Laplacian-based methods in a statistical setting.
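To make "clustering with eigenvectors of a graph Laplacian" concrete, here is a minimal sketch of a normalized spectral bipartition. It is not the paper's procedure: the Gaussian similarity, the symmetric normalization I - D^{-1/2} W D^{-1/2}, the power-iteration solver, and all function names are assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
import random

def gaussian_similarity(points, sigma=1.0):
    """Dense similarity matrix W with a Gaussian kernel (an assumed choice)."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return W

def normalized_laplacian(W):
    """Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2} and the degrees."""
    n = len(W)
    deg = [sum(row) for row in W]
    L = [[(1.0 if i == j else 0.0) - W[i][j] / math.sqrt(deg[i] * deg[j])
          for j in range(n)] for i in range(n)]
    return L, deg

def second_eigenvector(L, deg, iters=500, seed=0):
    """Eigenvector for the second-smallest eigenvalue of L.

    Eigenvalues of L lie in [0, 2], so power iteration on 2I - L finds the
    smallest ones; the known eigenvector D^{1/2} 1 (eigenvalue 0) is deflated.
    """
    n = len(L)
    u = [math.sqrt(d) for d in deg]
    u_norm = math.sqrt(sum(x * x for x in u))
    u = [x / u_norm for x in u]

    def deflate(v):
        dot = sum(a * b for a, b in zip(v, u))
        return [a - dot * b for a, b in zip(v, u)]

    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        v = deflate(v)
        v = [2 * v[i] - sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return deflate(v)

def spectral_bipartition(points, sigma=1.0):
    """Split points into two groups by the sign of the second eigenvector."""
    L, deg = normalized_laplacian(gaussian_similarity(points, sigma))
    v = second_eigenvector(L, deg)
    return [1 if x > 0 else 0 for x in v]

if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    print(spectral_bipartition(pts))
```

Which group receives label 1 depends on the arbitrary sign of the eigenvector, so only the grouping itself is meaningful. Practical implementations typically use a sparse eigensolver and run k-means on several eigenvectors rather than thresholding a single one.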
Automated Construction of Classifications: Conceptual Clustering Versus Numerical Taxonomy
, 1983
Cited by 85 (10 self)
A method for automated construction of classifications called conceptual clustering is described and compared to methods used in numerical taxonomy. This method arranges objects into classes representing certain descriptive concepts, rather than into classes defined solely by a similarity metric in some a priori defined attribute space. A specific form of the method is conjunctive conceptual clustering, in which descriptive concepts are conjunctive statements involving relations on selected object attributes and optimized according to an assumed global criterion of clustering quality. The method, implemented in program CLUSTER/2, is tested together with 18 numerical taxonomy methods on two exemplary problems: 1) a construction of a classification of popular microcomputers and 2) the reconstruction of a classification of selected plant disease categories. In both experiments, the majority of numerical taxonomy methods (14 out of 18) produced results which were difficult to interpret and seemed to be arbitrary. In contrast to this, the conceptual clustering method produced results that had a simple interpretation and corresponded well to solutions preferred by people.
Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces
- Journal of Machine Learning Research
, 2004
Cited by 79 (23 self)
We propose a novel method of dimensionality reduction for supervised learning problems. Given a regression or classification problem in which we wish to predict a response variable Y from an explanatory variable X, we treat the problem of dimensionality reduction as that of finding a low-dimensional “effective subspace” for X which retains the statistical relationship between X and Y. We show that this problem can be formulated in terms of conditional independence. To turn this formulation into an optimization problem we establish a general nonparametric characterization of conditional independence using covariance operators on reproducing kernel Hilbert spaces. This characterization allows us to derive a contrast function for estimation of the effective subspace. Unlike many conventional methods for dimensionality reduction in supervised learning, the proposed method requires neither assumptions on the marginal distribution of X, nor a parametric model of the conditional distribution of Y. We present experiments that compare the performance of the method with conventional methods.
Broad-band fading channels: signal burstiness and capacity
- IEEE Trans. Inform. Theory
, 2002
Cited by 54 (5 self)
Médard and Gallager recently showed that very large bandwidths on certain fading channels cannot be effectively used by direct sequence or related spread-spectrum systems. This paper complements the work of Médard and Gallager. First, it is shown that a key information-theoretic inequality of Médard and Gallager can be directly derived using the theory of capacity per unit cost, for a certain fourth-order cost function, called fourthegy. This provides insight into the tightness of the bound. Secondly, the bound is explored for a wide-sense-stationary uncorrelated scattering (WSSUS) fading channel, which entails mathematically defining such a channel. In this context, the fourthegy can be expressed using the ambiguity function of the input signal. Finally, numerical data and conclusions are presented for direct-sequence type input signals. Index Terms: Channel capacity, fading channels, spread spectrum, wide-sense-stationary uncorrelated scattering (WSSUS) fading channels.
Multiple Regimes in Northern Hemisphere Height Fields via Mixture Model Clustering
- J. Atmos. Sci
, 1998
Cited by 37 (24 self)
Mixture model clustering is applied to Northern Hemisphere (NH) 700-mb geopotential height anomalies. A mixture model is a flexible probability density estimation technique, consisting of a linear combination of k component densities. A key feature of the mixture modeling approach to clustering is the ability to estimate a posterior probability distribution for k, the number of clusters, given the data and the model, and thus objectively determine the number of clusters that is most likely to fit the data. A data set of 44 winters of NH 700-mb fields is projected onto its two leading empirical orthogonal functions (EOFs) and analyzed using mixtures of Gaussian components. Cross-validated likelihood is used to determine the best value of k, the number of clusters. The posterior probability so determined peaks at k = 3 and thus yields clear evidence for 3 clusters in the NH 700-mb data. The 3-cluster result is found to be robust with respect to variations in data preprocessing and data an...
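The idea of choosing k by held-out likelihood can be sketched as follows. This is a deliberately simplified illustration, not the paper's procedure: it fits one-dimensional Gaussian mixtures with plain EM (the paper works in the two-dimensional EOF plane), scores a single held-out split instead of full cross-validation, and does not compute a posterior over k. All function names and the toy data are assumptions.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def fit_gmm_1d(data, k, iters=100, var_floor=1e-3):
    """Fit a k-component 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    n = len(data)
    ordered = sorted(data)
    # Deterministic initialisation: means at evenly spaced quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(data) / n
    spread = sum((x - centre) ** 2 for x in data) / n
    variances = [max(spread, var_floor)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            logs = [math.log(weights[j]) + log_gauss(x, means[j], variances[j])
                    for j in range(k)]
            total = logsumexp(logs)
            resp.append([math.exp(l - total) for l in logs])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, var_floor)  # floor guards against collapse
    return weights, means, variances

def mean_log_likelihood(data, params):
    """Average log-likelihood per point under a fitted mixture."""
    weights, means, variances = params
    total = 0.0
    for x in data:
        total += logsumexp([math.log(w) + log_gauss(x, m, v)
                            for w, m, v in zip(weights, means, variances)])
    return total / len(data)

def select_k(train, held_out, candidates):
    """Return the candidate k with the highest held-out mean log-likelihood."""
    scores = {k: mean_log_likelihood(held_out, fit_gmm_1d(train, k))
              for k in candidates}
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    # Toy bimodal data (hypothetical), with a shifted copy held out.
    train = [c + 0.1 * i for c in (-5.0, 5.0) for i in range(-5, 6)]
    held_out = [c + 0.1 * i + 0.05 for c in (-5.0, 5.0) for i in range(-5, 5)]
    print(select_k(train, held_out, candidates=(1, 2, 3)))
```

Scoring on data not used for fitting is what keeps the comparison across k fair: training likelihood never decreases as components are added, whereas held-out likelihood penalises components that only fit noise.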
Kernel measures of conditional dependence
- In Adv. NIPS
, 2008
Cited by 31 (24 self)
We propose a new measure of conditional dependence of random variables, based on normalized cross-covariance operators on reproducing kernel Hilbert spaces. Unlike previous kernel dependence measures, the proposed criterion does not depend on the choice of kernel in the limit of infinite data, for a wide class of kernels. At the same time, it has a straightforward empirical estimate with good convergence behaviour. We discuss the theoretical properties of the measure, and demonstrate its application in experiments.
Intertwining Multiresolution Analyses and the Construction of Piecewise Polynomial Wavelets
, 1994
Cited by 24 (6 self)
Let $(V_p)$ be a local multiresolution analysis (MRA) of $L^2(\mathbb{R})$ of multiplicity $r \ge 1$, i.e., $V_0$ is generated by $r$ compactly supported scaling functions. If the scaling functions generate an orthogonal basis of $V_0$, then $(V_p)$ is called an orthogonal MRA. We prove that there exists an orthogonal local MRA $(V'_p)$ of multiplicity $r'$ such that $V_q \subset V'_0 \subset V_{q+n}$ for some integers $q \ge 0$, $n \ge 1$, and $r' > 1$. In particular, this shows that compactly supported orthogonal polynomial spline wavelets and scaling functions (of multiplicity $r' > 1$) of arbitrary regularity exist, and we give several such examples. 1 Introduction. The starting point for most wavelet constructions is a single function $\phi \in L^2(\mathbb{R})$, called a scaling function, whose integer translates form a Riesz basis for a closed linear subspace $V_0 \subset L^2(\mathbb{R})$. If the scaling function is compactly supported and generates an orthogonal basis of $V_0$, then the associated wavelet will also be compactly supported and generate...
Injective Hilbert space embeddings of probability measures
- In COLT
, 2008
Cited by 23 (19 self)
A Hilbert space embedding for probability measures has recently been proposed, with applications including dimensionality reduction, homogeneity testing and independence testing. This embedding represents any probability measure as a mean element in a reproducing kernel Hilbert space (RKHS). The embedding function has been proven to be injective when the reproducing kernel is universal. In this case, the embedding induces a metric on the space of probability distributions defined on compact metric spaces. In the present work, we consider more broadly the problem of specifying characteristic kernels, defined as kernels for which the RKHS embedding of probability measures is injective. In particular, characteristic kernels can include non-universal kernels. We restrict ourselves to translation-invariant kernels on Euclidean space, and define the associated metric on probability measures in terms of the Fourier spectrum of the kernel and characteristic functions of these measures. The support of the kernel spectrum is important in finding whether a kernel is characteristic: in particular, the embedding is injective if and only if the kernel spectrum has the entire domain as its support. Characteristic kernels may nonetheless have difficulty in distinguishing certain distributions on the basis of finite samples, again due to the interaction of the kernel spectrum and the characteristic functions of the measures.
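The metric induced by the embedding is the RKHS distance between the two mean elements, commonly called the maximum mean discrepancy (a name not used in the abstract above). A minimal sketch of its biased empirical estimate for one-dimensional samples follows. The Gaussian kernel is used because its Fourier spectrum has full support, which is the condition the abstract gives for a characteristic kernel; the function names, bandwidth, and sample values are assumptions for illustration.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Translation-invariant Gaussian kernel on the real line."""
    return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))

def mmd2_biased(xs, ys, kernel=gaussian_kernel):
    """Biased estimate of the squared RKHS distance between the mean embeddings.

    ||mu_P - mu_Q||^2 = E k(x, x') + E k(y, y') - 2 E k(x, y),
    with each expectation replaced by an average over all sample pairs.
    """
    m, n = len(xs), len(ys)
    k_xx = sum(kernel(a, b) for a in xs for b in xs) / (m * m)
    k_yy = sum(kernel(a, b) for a in ys for b in ys) / (n * n)
    k_xy = sum(kernel(a, b) for a in xs for b in ys) / (m * n)
    return k_xx + k_yy - 2 * k_xy

if __name__ == "__main__":
    sample_p = [-1.0, -0.5, 0.0, 0.5, 1.0]
    sample_q = [x + 3.0 for x in sample_p]   # same shape, shifted mean
    print(mmd2_biased(sample_p, sample_p))   # identical samples
    print(mmd2_biased(sample_p, sample_q))   # shifted samples give a larger value
```

With finite samples the estimate depends on the bandwidth sigma, which is one concrete way the abstract's caveat about distinguishing distributions from finite data shows up in practice.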

