Results 1-10 of 158
Centroidal Voronoi tessellations: Applications and algorithms
 SIAM Rev
, 1999
Cited by 237 (25 self)
Abstract. A centroidal Voronoi tessellation is a Voronoi tessellation whose generating points are the centroids (centers of mass) of the corresponding Voronoi regions. We give some applications of such tessellations to problems in image compression, quadrature, finite difference methods, distribution of resources, cellular biology, statistics, and the territorial behavior of animals. We discuss methods for computing these tessellations, provide some analyses concerning both the tessellations and the methods for their determination, and, finally, present the results of some numerical experiments.
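The abstract above concerns centroidal Voronoi tessellations, which are often computed with Lloyd's iteration (one of the methods such surveys analyze): assign points to their nearest generator, then move each generator to its region's centroid. A minimal sketch, assuming a uniform density on the unit square and using Monte Carlo samples to approximate the centroids; all names and parameter values are illustrative:

```python
# Minimal sketch of Lloyd's algorithm for approximating a centroidal
# Voronoi tessellation (CVT) of the unit square. Region centroids are
# estimated from Monte Carlo samples; names/values are illustrative.
import random

def lloyd_cvt(n_generators=8, n_samples=20000, n_iters=25, seed=0):
    rng = random.Random(seed)
    gens = [(rng.random(), rng.random()) for _ in range(n_generators)]
    samples = [(rng.random(), rng.random()) for _ in range(n_samples)]
    for _ in range(n_iters):
        sums = [[0.0, 0.0, 0] for _ in gens]
        for (x, y) in samples:
            # Assign each sample to its nearest generator (its Voronoi region).
            i = min(range(len(gens)),
                    key=lambda j: (x - gens[j][0])**2 + (y - gens[j][1])**2)
            sums[i][0] += x; sums[i][1] += y; sums[i][2] += 1
        # Move each generator to the (estimated) centroid of its region.
        gens = [(sx / c, sy / c) if c else gens[j]
                for j, (sx, sy, c) in enumerate(sums)]
    return gens

generators = lloyd_cvt()
print(len(generators))  # 8 generator points, now near their region centroids
```

At convergence the generators coincide (approximately) with the centroids of their own Voronoi cells, which is exactly the CVT property the abstract defines.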
Self Organization of a Massive Document Collection
 IEEE Transactions on Neural Networks
Cited by 204 (14 self)
This article describes the implementation of a system that is able to organize vast document collections according to textual similarities. It is based on the Self-Organizing Map (SOM) algorithm. As the feature vectors for the documents we use statistical representations of their vocabularies. The main goal in our work has been to scale up the SOM algorithm to be able to deal with large amounts of high-dimensional data. In a practical experiment we mapped 6,840,568 patent abstracts onto a 1,002,240-node SOM. As the feature vectors we used 500-dimensional vectors of stochastic figures obtained as random projections of weighted word histograms. Keywords: Data mining, exploratory data analysis, knowledge discovery, large databases, parallel implementation, random projection, Self-Organizing Map (SOM), textual documents. I. Introduction. A. From simple searches to browsing of self-organized data collections. Locating documents on the basis of keywords and simple search expressions is a c...
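The random-projection step mentioned in the abstract can be illustrated in a toy form: a random matrix maps high-dimensional histogram vectors to a much lower dimension while approximately preserving pairwise distances (the Johnson-Lindenstrauss effect). This is not the paper's actual implementation; dimensions and names are made up:

```python
# Toy illustration of random projection: distances between two
# high-dimensional vectors are roughly preserved after projection by a
# random Gaussian matrix. Dimensions (D, d) are illustrative only.
import math, random

rng = random.Random(0)
D, d = 1000, 100  # original and projected dimensionality (toy values)
# Rows scaled by 1/sqrt(d) so projected norms match originals in expectation.
R = [[rng.gauss(0, 1 / math.sqrt(d)) for _ in range(D)] for _ in range(d)]

def project(x):
    return [sum(r[i] * x[i] for i in range(D)) for r in R]

x = [rng.random() for _ in range(D)]
y = [rng.random() for _ in range(D)]
orig = math.dist(x, y)
proj = math.dist(project(x), project(y))
print(proj / orig)  # ratio should be close to 1
```

In the paper's setting the payoff is that 500 random dimensions stand in for a vocabulary-sized histogram at a tiny fraction of the cost.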
Clustering of the Self-Organizing Map
, 2000
Cited by 159 (1 self)
The self-organizing map (SOM) is an excellent tool in the exploratory phase of data mining. It projects the input space onto prototypes of a low-dimensional regular grid that can be effectively utilized to visualize and explore properties of the data. When the number of SOM units is large, to facilitate quantitative analysis of the map and the data, similar units need to be grouped, i.e., clustered. In this paper, different approaches to clustering of the SOM are considered. In particular, the use of hierarchical agglomerative clustering and partitive clustering using k-means are investigated. The two-stage procedure, first using the SOM to produce the prototypes that are then clustered in the second stage, is found to perform well when compared with direct clustering of the data and to reduce the computation time.
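A minimal sketch of that two-stage procedure, under simplifying assumptions: a tiny online SOM is trained on 2-D toy data, and its prototypes (rather than the raw data) are then clustered with k-means. All hyperparameters and function names are illustrative, not the paper's:

```python
# Two-stage sketch: (1) train a small online SOM on toy 2-D data,
# (2) cluster the SOM prototypes with k-means. Illustrative only.
import math, random

def train_som(data, grid=(5, 5), iters=2000, seed=0, lr0=0.5, sigma0=2.0):
    rng = random.Random(seed)
    h, w = grid
    protos = [[rng.random(), rng.random()] for _ in range(h * w)]
    for t in range(iters):
        x = rng.choice(data)
        lr = lr0 * (1 - t / iters)               # decaying learning rate
        sigma = max(0.5, sigma0 * (1 - t / iters))  # shrinking neighborhood
        # Best-matching unit on the grid.
        b = min(range(h * w),
                key=lambda i: (x[0]-protos[i][0])**2 + (x[1]-protos[i][1])**2)
        br, bc = divmod(b, w)
        for i, p in enumerate(protos):
            r, c = divmod(i, w)
            g = math.exp(-((r-br)**2 + (c-bc)**2) / (2*sigma*sigma))
            p[0] += lr * g * (x[0] - p[0])
            p[1] += lr * g * (x[1] - p[1])
    return protos

def kmeans(points, k=2, iters=20, seed=1):
    rng = random.Random(seed)
    cents = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: (p[0]-cents[c][0])**2 + (p[1]-cents[c][1])**2)
            groups[j].append(p)
        cents = [[sum(q[0] for q in g)/len(g), sum(q[1] for q in g)/len(g)]
                 if g else cents[j] for j, g in enumerate(groups)]
    return cents

# Two well-separated Gaussian blobs; cluster the 25 prototypes, not the 400 points.
rng = random.Random(2)
data = ([[rng.gauss(0.2, 0.05), rng.gauss(0.2, 0.05)] for _ in range(200)]
        + [[rng.gauss(0.8, 0.05), rng.gauss(0.8, 0.05)] for _ in range(200)])
protos = train_som(data)
centers = kmeans(protos, k=2)
print(len(centers))
```

The computational point of the paper is visible here: k-means runs on 25 prototypes instead of 400 samples, and the same idea scales far better for large data sets.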
Data Exploration Using Self-Organizing Maps
 ACTA POLYTECHNICA SCANDINAVICA: MATHEMATICS, COMPUTING AND MANAGEMENT IN ENGINEERING SERIES NO. 82
, 1997
Cited by 96 (4 self)
Finding structures in vast multidimensional data sets, be they measurement data, statistics, or textual documents, is difficult and time-consuming. Interesting, novel relations between the data items may be hidden in the data. The self-organizing map (SOM) algorithm of Kohonen can be used to aid the exploration: the structures in the data sets can be illustrated on special map displays. In this work, the methodology of using SOMs for exploratory data analysis or data mining is reviewed and developed further. The properties of the maps are compared with the properties of related methods intended for visualizing high-dimensional multivariate data sets. In a set of case studies the SOM algorithm is applied to analyzing electroencephalograms, to illustrating structures of the standard of living in the world, and to organizing full-text document collections. Measures are proposed for evaluating the quality of different types of maps in representing a given data set, and for measuring the robu...
Clustering Based on Conditional Distributions in an Auxiliary Space
 Neural Computation
, 2001
Cited by 79 (22 self)
We study the problem of learning groups or categories that are local ...
On Lattice Quantization Noise
 IEEE Trans. Inform. Theory
, 1996
Cited by 73 (20 self)
Abstract. We present several results regarding the properties of a random vector, uniformly distributed over a lattice cell. This random vector is the quantization noise of a lattice quantizer at high resolution, or the noise of a dithered lattice quantizer at all distortion levels. We find that for the optimal lattice quantizers this noise is wide-sense stationary and white. Any desirable noise spectra may be realized by an appropriate linear transformation ("shaping") of a lattice quantizer. As the dimension increases, the normalized second moment of the optimal lattice quantizer goes to 1/(2πe), and consequently the quantization noise approaches a white Gaussian process in the divergence sense. In entropy-coded dithered quantization, which can be modeled accurately as passing the source through an additive noise channel, this limit behavior implies that for large lattice dimension both the error and the bit rate approach the error and the information rate of an Additive White Gaussian Noise (AWGN) channel. Index Terms: Lattice, quantization noise, shaping, normalized second moment, divergence from Gaussianity.
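The dithered-quantizer noise model in the abstract can be illustrated with the simplest lattice, the integer lattice scaled by a step Δ: with subtractive dithering, the reconstruction error always lies within one cell regardless of the source, which is what makes the additive-noise-channel model accurate. A toy sketch (names and values are illustrative):

```python
# Subtractive dithered scalar (scaled-Z-lattice) quantization: the
# error q(x+u) - (x+u) is confined to one lattice cell of width delta,
# independently of the source distribution. Values are illustrative.
import random

def dithered_quantize(x, delta, dither):
    # Quantize x + dither to the nearest lattice point, then subtract dither.
    q = delta * round((x + dither) / delta)
    return q - dither

rng = random.Random(0)
delta = 0.25
errors = []
for _ in range(10000):
    x = rng.gauss(0.0, 1.0)                 # arbitrary source
    u = rng.uniform(-delta / 2, delta / 2)  # dither uniform over one cell
    errors.append(dithered_quantize(x, delta, u) - x)

# Every error equals delta*round(y/delta) - y for y = x + u, so |e| <= delta/2.
print(max(abs(e) for e in errors) <= delta / 2 + 1e-12)  # True
```

In higher dimensions the cell geometry of good lattices drives the normalized second moment toward 1/(2πe), the limit the abstract cites.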
Geodesic entropic graphs for dimension and entropy estimation in manifold learning
 IEEE Trans. on Signal Processing
, 2004
Cited by 66 (4 self)
Abstract—In the manifold learning problem, one seeks to discover a smooth low-dimensional surface, i.e., a manifold embedded in a higher dimensional linear vector space, based on a set of measured sample points on the surface. In this paper, we consider the closely related problem of estimating the manifold's intrinsic dimension and the intrinsic entropy of the sample points. Specifically, we view the sample points as realizations of an unknown multivariate density supported on an unknown smooth manifold. We introduce a novel geometric approach based on entropic graph methods. Although the theory presented applies to this general class of graphs, we focus on the geodesic minimal spanning tree (GMST) to obtain asymptotically consistent estimates of the manifold dimension and the Rényi entropy of the sample density on the manifold. The GMST approach is striking in its simplicity and does not require reconstruction of the manifold or estimation of the multivariate density of the samples. The GMST method simply constructs a minimal spanning tree (MST) sequence using a geodesic edge matrix and uses the overall lengths of the MSTs to simultaneously estimate manifold dimension and entropy. We illustrate the GMST approach on standard synthetic manifolds as well as on real data sets consisting of images of faces. Index Terms—Conformal embedding, intrinsic dimension, intrinsic entropy, manifold learning, minimal spanning tree, nonlinear dimensionality reduction.
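The MST-length idea behind the GMST estimator can be sketched in a simplified form: for n points drawn from a d-dimensional set, the Euclidean MST length grows roughly like n^((d-1)/d), so the empirical growth rate across sample sizes yields a dimension estimate. This toy version uses a Euclidean rather than geodesic MST and is only a rough illustration, not the paper's estimator:

```python
# Rough sketch of MST-based intrinsic dimension estimation: MST length
# scales like n^((d-1)/d), so comparing lengths at two sample sizes
# recovers d. Euclidean MST (Prim, O(n^2)) stands in for the geodesic MST.
import math, random

def mst_length(pts):
    n = len(pts)
    in_tree = [False] * n
    dist = [float("inf")] * n
    dist[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: dist[i])
        in_tree[u] = True
        total += dist[u]
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(pts[u], pts[v])
                if d < dist[v]:
                    dist[v] = d
    return total

rng = random.Random(0)
ns = [200, 800]
lengths = [mst_length([(rng.random(), rng.random()) for _ in range(n)])
           for n in ns]
# Fit the growth exponent alpha = (d-1)/d, then invert for d.
alpha = math.log(lengths[1] / lengths[0]) / math.log(ns[1] / ns[0])
d_hat = 1.0 / (1.0 - alpha)
print(d_hat)  # growth-rate estimate of the dimension (near 2 for the unit square)
```

Note that no density estimate or manifold reconstruction is needed, only edge lengths, which is the simplicity the abstract emphasizes.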
Tradeoff Between Source and Channel Coding
 IEEE TRANS. INFORM. THEORY
, 1997
Cited by 66 (5 self)
A fundamental problem in the transmission of analog information across a noisy discrete channel is the choice of the channel code rate that optimally allocates the available transmission rate between lossy source coding and block channel coding. We establish tight bounds on the channel code rate that minimizes the average distortion of a vector quantizer cascaded with a channel coder and a binary-symmetric channel. Analytic expressions are derived in two cases of interest: small bit-error probability and arbitrary source vector dimension; arbitrary bit-error probability and large source vector dimension. We demonstrate that the optimal channel code rate is often substantially smaller than the channel capacity, and obtain a noisy-channel version of the Zador high-resolution distortion formula.
Theoretical Foundations of Transform Coding
, 2001
Cited by 65 (6 self)
This article explains the fundamental principles of transform coding; these principles apply equally well to images, audio, video, and various other types of data, so abstract formulations are given. Much of the material presented here is adapted from [14, Chap. 2, 4]. The details on wavelet transform-based image compression and the JPEG2000 image compression standard are given in the following two articles of this special issue [38], [37].
Vector Quantization with Complexity Costs
, 1993
Cited by 54 (18 self)
Vector quantization is a data compression method where a set of data points is encoded by a reduced set of reference vectors, the codebook. We discuss a vector quantization strategy which jointly optimizes distortion errors and the codebook complexity, thereby determining the size of the codebook. A maximum entropy estimation of the cost function yields an optimal number of reference vectors, their positions, and their assignment probabilities. The dependence of the codebook density on the data density for different complexity functions is investigated in the limit of asymptotic quantization levels. How different complexity measures influence the efficiency of vector quantizers is studied for the task of image compression, i.e., we quantize the wavelet coefficients of gray-level images and measure the reconstruction error. Our approach establishes a unifying framework for different quantization methods like K-means clustering and its fuzzy version, entropy-constrained vector quantizati...
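The general idea of trading distortion against codebook complexity can be sketched in the spirit of entropy-constrained VQ, one of the methods the abstract says this framework unifies (this is not the paper's exact maximum-entropy formulation). Samples are assigned by distortion plus λ times the codeword length −log2 p_j, so rarely used codewords become expensive and the effective codebook can shrink; all names and parameters below are illustrative:

```python
# Sketch of entropy-constrained scalar VQ: the assignment rule adds a
# rate penalty lambda * (-log2 p_j) to the squared distortion, so the
# codebook size is determined jointly with the distortion. Illustrative.
import math, random

def ecvq(data, k=8, lam=0.05, iters=30, seed=0):
    rng = random.Random(seed)
    cents = rng.sample(data, k)
    probs = [1.0 / k] * k
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in data:
            # Modified nearest-neighbor rule: distortion + lambda * codeword length.
            best = min(
                (j for j in range(k) if probs[j] > 0),
                key=lambda j: (x - cents[j]) ** 2 - lam * math.log2(probs[j]),
            )
            groups[best].append(x)
        cents = [sum(g) / len(g) if g else cents[j]
                 for j, g in enumerate(groups)]
        probs = [len(g) / len(data) for g in groups]
    # Codewords with zero probability drop out: the codebook shrinks itself.
    used = [(c, p) for c, p in zip(cents, probs) if p > 0]
    return [c for c, _ in used], [p for _, p in used]

rng = random.Random(1)
data = [rng.gauss(0, 1) for _ in range(500)]
codebook, probs = ecvq(data)
print(len(codebook) <= 8, abs(sum(probs) - 1.0) < 1e-9)  # True True
```

Raising λ prices bits more heavily and yields fewer, coarser codewords; λ → 0 recovers plain K-means, which is the unification the abstract alludes to.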