Results 1–10 of 226
Database-friendly Random Projections
, 2001
"... A classic result of Johnson and Lindenstrauss asserts that any set of n points in ddimensional Euclidean space can be embedded into kdimensional Euclidean space  where k is logarithmic in n and independent of d  so that all pairwise distances are maintained within an arbitrarily small factor. Al ..."
Abstract

Cited by 241 (3 self)
A classic result of Johnson and Lindenstrauss asserts that any set of n points in d-dimensional Euclidean space can be embedded into k-dimensional Euclidean space, where k is logarithmic in n and independent of d, so that all pairwise distances are maintained within an arbitrarily small factor. All known constructions of such embeddings involve projecting the n points onto a random k-dimensional hyperplane. We give a novel construction of the embedding, suitable for database applications, which amounts to computing a simple aggregate over k random attribute partitions.
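As a rough illustration of the construction this abstract describes, the sketch below builds a sparse projection matrix with entries ±√3 (probability 1/6 each) and 0 (probability 2/3), then checks empirically that pairwise distances survive the embedding; the sizes n, d, k are arbitrary choices for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 1000, 300          # n points, original dim d, target dim k (illustrative)
X = rng.standard_normal((n, d))

# Database-friendly projection matrix: entries are +sqrt(3) w.p. 1/6,
# 0 w.p. 2/3, and -sqrt(3) w.p. 1/6, so each output coordinate is a
# simple signed aggregate over roughly a third of the attributes.
R = np.sqrt(3) * rng.choice([-1.0, 0.0, 1.0], size=(d, k), p=[1/6, 2/3, 1/6])
Y = X @ R / np.sqrt(k)

# Measure how well pairwise distances survive the embedding.
def pairwise(A):
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

mask = ~np.eye(n, dtype=bool)
ratio = pairwise(Y)[mask] / pairwise(X)[mask]
```

For these sizes the distance ratios stay within a modest factor of 1, as the Johnson–Lindenstrauss guarantee predicts.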
Random projection in dimensionality reduction: Applications to image and text data
 in Knowledge Discovery and Data Mining
, 2001
"... Random projections have recently emerged as a powerful method for dimensionality reduction. Theoretical results indicate that the method preserves distances quite nicely; however, empirical results are sparse. We present experimental results on using random projection as a dimensionality reduction t ..."
Abstract

Cited by 239 (0 self)
Random projections have recently emerged as a powerful method for dimensionality reduction. Theoretical results indicate that the method preserves distances quite nicely; however, empirical results are sparse. We present experimental results on using random projection as a dimensionality reduction tool in a number of cases, where the high dimensionality of the data would otherwise lead to burdensome computations. Our application areas are the processing of both noisy and noiseless images, and information retrieval in text documents. We show that projecting the data onto a random lower-dimensional subspace yields results comparable to conventional dimensionality reduction methods such as principal component analysis: the similarity of data vectors is preserved well under random projection. However, using random projections is computationally significantly less expensive than using, e.g., principal component analysis. We also show experimentally that using a sparse random matrix gives additional computational savings in random projection.
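To make the sparse-matrix point concrete, a small numpy sketch (sizes and seed are arbitrary) comparing the average pairwise-distance distortion of a dense Gaussian projection against a sparse one whose entries are zero two thirds of the time:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 40, 2000, 400
X = rng.standard_normal((n, d))

dense = rng.standard_normal((d, k)) / np.sqrt(k)   # dense Gaussian projection
# Sparse projection: two thirds of the entries are exactly zero, so
# computing X @ sparse needs roughly a third of the multiplications.
sparse = np.sqrt(3.0 / k) * rng.choice([-1.0, 0.0, 1.0], size=(d, k), p=[1/6, 2/3, 1/6])

def mean_distortion(R):
    # Average relative error of pairwise distances after projection.
    Y = X @ R
    D0 = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    D1 = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
    m = ~np.eye(n, dtype=bool)
    return np.abs(D1[m] / D0[m] - 1).mean()

err_dense, err_sparse = mean_distortion(dense), mean_distortion(sparse)
```

Both distortions come out small and of comparable size, matching the abstract's claim that the sparse matrix gives computational savings without hurting distance preservation.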
The Global K-Means Clustering Algorithm
, 2003
"... We present the global kmeans algorithm which is an incremental approach to clustering that dynamically adds one cluster center at a time through a deterministic global search procedure consisting of N (with N being the size of the data set) executions of the kmeans algorithm from suitable initial ..."
Abstract

Cited by 131 (6 self)
We present the global k-means algorithm which is an incremental approach to clustering that dynamically adds one cluster center at a time through a deterministic global search procedure consisting of N (with N being the size of the data set) executions of the k-means algorithm from suitable initial positions. We also propose modifications of the method to reduce the computational load without significantly affecting solution quality. The proposed clustering methods are tested on well-known data sets and they compare favorably to the k-means algorithm with random restarts.
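A minimal sketch of the procedure the abstract outlines, on synthetic 2-D blobs (data, sizes, and iteration counts are illustrative; the paper's faster variants are not shown):

```python
import numpy as np

rng = np.random.default_rng(2)
# Three well-separated 2-D blobs (synthetic data for illustration).
X = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in [(0, 0), (5, 0), (0, 5)]])

def kmeans(X, centers, iters=50):
    # Plain Lloyd iterations from the given initial centers.
    centers = centers.copy()
    for _ in range(iters):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(len(centers)):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers, ((X - centers[labels]) ** 2).sum()

def global_kmeans(X, K):
    # Grow the solution one center at a time; for each new center, try every
    # data point as its initial position and keep the best resulting fit.
    centers = X.mean(0, keepdims=True)
    err = ((X - centers) ** 2).sum()
    for _ in range(K - 1):
        candidates = (kmeans(X, np.vstack([centers, x])) for x in X)
        centers, err = min(candidates, key=lambda ce: ce[1])
    return centers, err

centers, err = global_kmeans(X, 3)
```

The deterministic N-restart search is what removes the dependence on random initialization, at the cost of N k-means runs per added center.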
Toward privacy in public databases
, 2005
"... We initiate a theoretical study of the census problem. Informally, in a census individual respondents give private information to a trusted party (the census bureau), who publishes a sanitized version of the data. There are two fundamentally conflicting requirements: privacy for the respondents an ..."
Abstract

Cited by 107 (11 self)
We initiate a theoretical study of the census problem. Informally, in a census individual respondents give private information to a trusted party (the census bureau), who publishes a sanitized version of the data. There are two fundamentally conflicting requirements: privacy for the respondents and utility of the sanitized data. Unlike in the study of secure function evaluation, in which privacy is preserved to the extent possible given a specific functionality goal, in the census problem privacy is paramount; intuitively, things that cannot be learned “safely” should not be learned at all. An important contribution of this work is a definition of privacy (and privacy compromise) for statistical databases, together with a method for describing and comparing the privacy offered by specific sanitization techniques. We obtain several privacy results using two different sanitization techniques, and then show how to combine them via cross training. We also obtain two utility results involving clustering.
Mixture Density Estimation
 in Advances in Neural Information Processing Systems 12
, 1999
"... Gaussian mixtures (or socalled radial basis function networks) for density estimation provide a natural counterpart to sigmoidal neural networks for function fitting and approximation. In both cases, it is possible to give simple expressions for the iterative improvement of performance as component ..."
Abstract

Cited by 85 (2 self)
Gaussian mixtures (or so-called radial basis function networks) for density estimation provide a natural counterpart to sigmoidal neural networks for function fitting and approximation. In both cases, it is possible to give simple expressions for the iterative improvement of performance as components of the network are introduced one at a time. In particular, for mixture density estimation we show that a k-component mixture estimated by maximum likelihood (or by an iterative likelihood improvement that we introduce) achieves log-likelihood within order 1/k of the log-likelihood achievable by any convex combination. Consequences for approximation and estimation using Kullback-Leibler risk are also given. A Minimum Description Length principle selects the optimal number of components k that minimizes the risk bound.
On Spectral Learning of Mixtures of Distributions
"... We consider the problem of learning mixtures of distributions via spectral methods and derive a tight characterization of when such methods are useful. Specifically, given a mixturesample, let i , C i , w i denote the empirical mean, covariance matrix, and mixing weight of the ith component. We ..."
Abstract

Cited by 79 (0 self)
We consider the problem of learning mixtures of distributions via spectral methods and derive a tight characterization of when such methods are useful. Specifically, given a mixture sample, let μ_i, C_i, w_i denote the empirical mean, covariance matrix, and mixing weight of the i-th component. We prove that a very simple algorithm, namely spectral projection followed by single-linkage clustering, properly classifies every point in the sample when each μ_i is separated from all μ_j by 2(1/w_i + 1/w_j) plus a term that depends on the concentration properties of the distributions in the mixture. This second term is very small for many distributions, including Gaussians, log-concave distributions, and many others. As a result, we get the best known bounds for learning mixtures of arbitrary Gaussians in terms of the required mean separation. On the other hand, we prove that given any k means μ_i and mixing weights w_i, there are (many) sets of matrices C_i such that each μ_i is separated from all μ_j by 2(1/w_i + 1/w_j), but applying spectral projection to the corresponding Gaussian mixture causes it to collapse completely, i.e., all means and covariance matrices in the projected mixture are identical.
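The two-stage algorithm the abstract names, spectral projection followed by single-linkage clustering, can be sketched on a synthetic two-Gaussian mixture (dimensions, separation, and sample sizes are arbitrary choices for the demo):

```python
import numpy as np

rng = np.random.default_rng(3)
# Mixture sample: two spherical Gaussians in 50 dimensions, means 10 apart.
d, n = 50, 100
mu2 = np.zeros(d); mu2[0] = 10.0
X = np.vstack([rng.normal(0.0, 1.0, (n, d)), mu2 + rng.normal(0.0, 1.0, (n, d))])

# Spectral projection: project onto the span of the top right singular vectors.
Xc = X - X.mean(0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Xc @ Vt[:2].T

def single_linkage(P, k):
    # Merge the closest pair of clusters (via union-find) until k remain.
    m = len(P)
    parent = list(range(m))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    ii, jj = np.triu_indices(m, 1)
    clusters = m
    for e in np.argsort(np.linalg.norm(P[ii] - P[jj], axis=1)):
        a, b = find(ii[e]), find(jj[e])
        if a != b:
            parent[a] = b
            clusters -= 1
            if clusters == k:
                break
    return np.array([find(i) for i in range(m)])

labels = single_linkage(P, 2)
```

With this much mean separation the projection keeps the two components apart, so single-linkage recovers the planted partition exactly.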
Multi-View Clustering via Canonical Correlation Analysis
"... Clustering data in highdimensions is believed to be a hard problem in general. A number of efficient clustering algorithms developed in recent years address this problem by projecting the data into a lowerdimensional subspace, e.g. via Principal Components Analysis (PCA) or random projections, bef ..."
Abstract

Cited by 75 (6 self)
Clustering data in high dimensions is believed to be a hard problem in general. A number of efficient clustering algorithms developed in recent years address this problem by projecting the data into a lower-dimensional subspace, e.g. via Principal Components Analysis (PCA) or random projections, before clustering. Such techniques typically impose stringent requirements on the separation between the cluster means (in order for the algorithm to be successful). Here, we show how using multiple views of the data can relax these stringent requirements. We use Canonical Correlation Analysis (CCA) to project the data in each view to a lower-dimensional subspace. Under the assumption that, conditioned on the cluster label, the views are uncorrelated, we show that the separation conditions required for the algorithm to be successful are rather mild (significantly weaker than those of prior results in the literature). We provide results for mixtures of ... The multi-view approach to learning is one in which we have ‘views’ of the data (sometimes in a rather abstract sense) and, if we understand the underlying relationship between these views, the hope is that this relationship can be used to alleviate the difficulty of a learning problem of interest [BM98, KF07, AZ07]. In this work, we explore how having ‘two views’ of the data makes ...
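A toy sketch of the CCA step on two synthetic views that are conditionally independent given the cluster label. For brevity it thresholds the 1-D canonical projection at its mean rather than running a full clustering algorithm, which is a simplification of the paper's method:

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 300, 20
z = rng.integers(0, 2, n)                    # hidden cluster labels
mu = np.zeros((2, d)); mu[1, 0] = 4.0        # view-1 cluster means
nu = np.zeros((2, d)); nu[1, 1] = 4.0        # view-2 cluster means
X = mu[z] + rng.standard_normal((n, d))      # view 1
Y = nu[z] + rng.standard_normal((n, d))      # view 2, independent noise

def cca_top_directions(X, Y, reg=1e-3):
    # SVD of the whitened cross-covariance; returns the top canonical pair.
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    Cxx = Xc.T @ Xc / len(X) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / len(Y) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / len(X)
    Lx, Ly = np.linalg.cholesky(Cxx), np.linalg.cholesky(Cyy)
    U, _, Vt = np.linalg.svd(np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T)
    return np.linalg.solve(Lx.T, U[:, 0]), np.linalg.solve(Ly.T, Vt[0])

a, b = cca_top_directions(X, Y)
px = X @ a                                    # 1-D CCA projection of view 1
# Because the views' noises are independent, only the shared cluster signal
# correlates across views, so the top canonical direction finds it.
pred = (px > px.mean()).astype(int)
acc = max((pred == z).mean(), (pred != z).mean())
```

The key design point is that CCA ignores within-view noise directions entirely: any direction whose variation is uncorrelated across views is suppressed, which is why the required mean separation is milder than for single-view PCA-style projections.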
Efficient greedy learning of Gaussian mixture models
 Neural Computation
, 2003
"... This paper concerns the greedy learning of Gaussian mixtures. In the greedy approach, mixture components are inserted into the mixture one after the other. We propose a heuristic for searching for the optimal component to insert. In a randomized manner a set of candidate new components is generated. ..."
Abstract

Cited by 74 (7 self)
This paper concerns the greedy learning of Gaussian mixtures. In the greedy approach, mixture components are inserted into the mixture one after the other. We propose a heuristic for searching for the optimal component to insert. In a randomized manner a set of candidate new components is generated. For each of these candidates we find the locally optimal new component. The best local optimum is then inserted into the existing mixture. The resulting algorithm resolves the sensitivity to initialization of state-of-the-art methods, like EM, and has running time linear in the number of data points and quadratic in the (final) number of mixture components. Due to its greedy nature, the algorithm can be particularly useful when the optimal number of mixture components is unknown. Experimental results comparing the proposed algorithm to other methods on density estimation and texture segmentation are provided.
Tensor decompositions for learning latent variable models
, 2014
"... This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models—including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation—which exploits a certain tensor structure in their loworder observable mo ..."
Abstract

Cited by 72 (5 self)
This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models—including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation—which exploits a certain tensor structure in their low-order observable moments (typically, of second and third order). Specifically, parameter estimation is reduced to the problem of extracting a certain (orthogonal) decomposition of a symmetric tensor derived from the moments; this decomposition can be viewed as a natural generalization of the singular value decomposition for matrices. Although tensor decompositions are generally intractable to compute, the decomposition of these specially structured tensors can be efficiently obtained by a variety of approaches, including power iterations and maximization approaches (similar to the case of matrices). A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin’s perturbation theorem for the singular vectors of matrices. This implies a robust and computationally tractable estimation approach for several popular latent variable models.
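A small numpy sketch of the tensor power method with deflation, run on an exactly orthogonally decomposable synthetic tensor (real moment tensors are only approximately of this form, and the whitening step that produces them is omitted):

```python
import numpy as np

rng = np.random.default_rng(6)
d = 5
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
V, lam = Q[:, :3], np.array([3.0, 2.0, 1.0])
# Symmetric tensor with an orthogonal decomposition: T = sum_i lam_i v_i^(x3).
T = np.einsum('i,ai,bi,ci->abc', lam, V, V, V)

def tensor_power(T, iters=100):
    # Repeated tensor power update u <- T(I, u, u), then normalize;
    # converges to one of the orthogonal components v_i.
    u = rng.standard_normal(T.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(iters):
        u = np.einsum('abc,b,c->a', T, u, u)
        u /= np.linalg.norm(u)
    return np.einsum('abc,a,b,c->', T, u, u, u), u

# Recover all components by deflation: subtract each rank-1 term found.
found = []
Tc = T.copy()
for _ in range(3):
    l, u = tensor_power(Tc)
    found.append(l)
    Tc = Tc - l * np.einsum('a,b,c->abc', u, u, u)
```

For an exactly orthogonal tensor the iteration converges quadratically, so the recovered weights match the planted ones essentially to machine precision; the paper's robustness analysis is what extends this to the perturbed tensors arising from empirical moments.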