Results 1  10
of
240
Centroidal Voronoi tessellations: Applications and algorithms
 SIAM Rev
, 1999
"... Abstract. A centroidal Voronoi tessellation is a Voronoi tessellation whose generating points are the centroids (centers of mass) of the corresponding Voronoi regions. We give some applications of such tessellations to problems in image compression, quadrature, finite difference methods, distributio ..."
Abstract

Cited by 346 (37 self)
 Add to MetaCart
(Show Context)
Abstract. A centroidal Voronoi tessellation is a Voronoi tessellation whose generating points are the centroids (centers of mass) of the corresponding Voronoi regions. We give some applications of such tessellations to problems in image compression, quadrature, finite difference methods, distribution of resources, cellular biology, statistics, and the territorial behavior of animals. We discuss methods for computing these tessellations, provide some analyses concerning both the tessellations and the methods for their determination, and, finally, present the results of some numerical experiments.
Self Organization of a Massive Document Collection
 IEEE Transactions on Neural Networks
"... This article describes the implementation of a system that is able to organize vast document collections according to textual similarities. It is based on the SelfOrganizing Map (SOM) algorithm. As the feature vectors for the documents we use statistical representations of their vocabularies. The m ..."
Abstract

Cited by 244 (14 self)
 Add to MetaCart
(Show Context)
This article describes the implementation of a system that is able to organize vast document collections according to textual similarities. It is based on the SelfOrganizing Map (SOM) algorithm. As the feature vectors for the documents we use statistical representations of their vocabularies. The main goal in our work has been to scale up the SOM algorithm to be able to deal with large amounts of highdimensional data. In a practical experiment we mapped 6,840,568 patent abstracts onto a 1,002,240node SOM. As the feature vectors we used 500dimensional vectors of stochastic figures obtained as random projections of weighted word histograms. Keywords Data mining, exploratory data analysis, knowledge discovery, large databases, parallel implementation, random projection, SelfOrganizing Map (SOM), textual documents. I. Introduction A. From simple searches to browsing of selforganized data collections Locating documents on the basis of keywords and simple search expressions is a c...
Clustering of the SelfOrganizing Map
, 2000
"... The selforganizing map (SOM) is an excellent tool in exploratory phase of data mining. It projects input space on prototypes of a lowdimensional regular grid that can be effectively utilized to visualize and explore properties of the data. When the number of SOM units is large, to facilitate quant ..."
Abstract

Cited by 232 (1 self)
 Add to MetaCart
(Show Context)
The selforganizing map (SOM) is an excellent tool in exploratory phase of data mining. It projects input space on prototypes of a lowdimensional regular grid that can be effectively utilized to visualize and explore properties of the data. When the number of SOM units is large, to facilitate quantitative analysis of the map and the data, similar units need to be grouped, i.e., clustered. In this paper, different approaches to clustering of the SOM are considered. In particular, the use of hierarchical agglomerative clustering and partitive clustering usingmeans are investigated. The twostage procedurefirst using SOM to produce the prototypes that are then clustered in the second stageis found to perform well when compared with direct clustering of the data and to reduce the computation time.
Data Exploration Using SelfOrganizing Maps
 ACTA POLYTECHNICA SCANDINAVICA: MATHEMATICS, COMPUTING AND MANAGEMENT IN ENGINEERING SERIES NO. 82
, 1997
"... Finding structures in vast multidimensional data sets, be they measurement data, statistics, or textual documents, is difficult and timeconsuming. Interesting, novel relations between the data items may be hidden in the data. The selforganizing map (SOM) algorithm of Kohonen can be used to aid the ..."
Abstract

Cited by 107 (4 self)
 Add to MetaCart
Finding structures in vast multidimensional data sets, be they measurement data, statistics, or textual documents, is difficult and timeconsuming. Interesting, novel relations between the data items may be hidden in the data. The selforganizing map (SOM) algorithm of Kohonen can be used to aid the exploration: the structures in the data sets can be illustrated on special map displays. In this work, the methodology of using SOMs for exploratory data analysis or data mining is reviewed and developed further. The properties of the maps are compared with the properties of related methods intended for visualizing highdimensional multivariate data sets. In a set of case studies the SOM algorithm is applied to analyzing electroencephalograms, to illustrating structures of the standard of living in the world, and to organizing fulltext document collections. Measures are proposed for evaluating the quality of different types of maps in representing a given data set, and for measuring the robu...
Geodesic entropic graphs for dimension and entropy estimation in manifold learning
 IEEE Trans. on Signal Processing
, 2004
"... Abstract—In the manifold learning problem, one seeks to discover a smooth low dimensional surface, i.e., a manifold embedded in a higher dimensional linear vector space, based on a set of measured sample points on the surface. In this paper, we consider the closely related problem of estimating the ..."
Abstract

Cited by 86 (5 self)
 Add to MetaCart
Abstract—In the manifold learning problem, one seeks to discover a smooth low dimensional surface, i.e., a manifold embedded in a higher dimensional linear vector space, based on a set of measured sample points on the surface. In this paper, we consider the closely related problem of estimating the manifold’s intrinsic dimension and the intrinsic entropy of the sample points. Specifically, we view the sample points as realizations of an unknown multivariate density supported on an unknown smooth manifold. We introduce a novel geometric approach based on entropic graph methods. Although the theory presented applies to this general class of graphs, we focus on the geodesicminimalspanningtree (GMST) to obtaining asymptotically consistent estimates of the manifold dimension and the Rényientropy of the sample density on the manifold. The GMST approach is striking in its simplicity and does not require reconstruction of the manifold or estimation of the multivariate density of the samples. The GMST method simply constructs a minimal spanning tree (MST) sequence using a geodesic edge matrix and uses the overall lengths of the MSTs to simultaneously estimate manifold dimension and entropy. We illustrate the GMST approach on standard synthetic manifolds as well as on real data sets consisting of images of faces. Index Terms—Conformal embedding, intrinsic dimension, intrinsic entropy, manifold learning, minimal spanning tree, nonlinear dimensionality reduction. I.
Clustering Based on Conditional Distributions in an Auxiliary Space
 Neural Computation
, 2001
"... We study the problem of learning groups or categories that are local ..."
Abstract

Cited by 84 (23 self)
 Add to MetaCart
We study the problem of learning groups or categories that are local
Theoretical Foundations of Transform Coding
, 2001
"... This article explains the fundamental principles of transform coding; these principles apply equally well to images, audio, video, and various other types of data, so abstract formulations are given. Much of the material presented here is adapted from [14, Chap. 2, 4]. The details on wavelet transfo ..."
Abstract

Cited by 75 (6 self)
 Add to MetaCart
This article explains the fundamental principles of transform coding; these principles apply equally well to images, audio, video, and various other types of data, so abstract formulations are given. Much of the material presented here is adapted from [14, Chap. 2, 4]. The details on wavelet transformbased image compression and the JPEG2000 image compression standard are given in the following two articles of this special issue [38], [37]
Tradeoff Between Source and Channel Coding
 IEEE TRANS. INFORM. THEORY
, 1997
"... A fundamental problem in the transmission of analog information across a noisy discrete channel is the choice of channel code rate that optimally allocates the available transmission rate between lossy source coding and block channel coding. We establish tight bounds on the channel code rate that mi ..."
Abstract

Cited by 74 (5 self)
 Add to MetaCart
A fundamental problem in the transmission of analog information across a noisy discrete channel is the choice of channel code rate that optimally allocates the available transmission rate between lossy source coding and block channel coding. We establish tight bounds on the channel code rate that minimizes the average distortion of a vector quantizer cascaded with a channel coder and a binarysymmetric channel. Analytic expressions are derived in two cases of interest: small biterror probability and arbitrary source vector dimension; arbitrary biterror probability and large source vector dimension. We demonstrate that the optimal channel code rate is often substantially smaller than the channel capacity, and obtain a noisychannel version of the Zador highresolution distortion formula.
On Universal Quantization by Randomized Uniform/Lattice Quantizers
 IEEE Trans. Inform. Theory
, 1992
"... Uniform quantization with dither, or lattice quantization with dither in the vector case, followed by a universal lossless source encoder (entropy coder), is a simple procedure for universal coding with distortion of a source that may take continuously many values. The rate of this universal codi ..."
Abstract

Cited by 62 (16 self)
 Add to MetaCart
(Show Context)
Uniform quantization with dither, or lattice quantization with dither in the vector case, followed by a universal lossless source encoder (entropy coder), is a simple procedure for universal coding with distortion of a source that may take continuously many values. The rate of this universal coding scheme is examined, and we derive a general expression for it. An upper bound for the redundancy of this scheme, defined as the difference between its rate and the minimal possible rate, given by the rate distortion function of the source, is derived. This bound holds for all distortion levels. Furthermore, we present a composite upper bound on the redundancy as a function of the quantizer resolution which leads to a tighter bound in the high rate (low distortion) case. Key Words: Uniform and Lattice Quantization, Randomized Quantization, Universal Coding, RateDistortion Performance Meir Feder was also supported by The Andrew W. Mellon Foundation, Woods Hole Oceanographic Institu...