Results 1  10
of
98
Clustering of the SelfOrganizing Map
, 2000
"... The selforganizing map (SOM) is an excellent tool in exploratory phase of data mining. It projects input space on prototypes of a lowdimensional regular grid that can be effectively utilized to visualize and explore properties of the data. When the number of SOM units is large, to facilitate quant ..."
Abstract

Cited by 159 (1 self)
 Add to MetaCart
The selforganizing map (SOM) is an excellent tool in exploratory phase of data mining. It projects input space on prototypes of a lowdimensional regular grid that can be effectively utilized to visualize and explore properties of the data. When the number of SOM units is large, to facilitate quantitative analysis of the map and the data, similar units need to be grouped, i.e., clustered. In this paper, different approaches to clustering of the SOM are considered. In particular, the use of hierarchical agglomerative clustering and partitive clustering usingmeans are investigated. The twostage procedurefirst using SOM to produce the prototypes that are then clustered in the second stageis found to perform well when compared with direct clustering of the data and to reduce the computation time.
Data Exploration Using SelfOrganizing Maps
 ACTA POLYTECHNICA SCANDINAVICA: MATHEMATICS, COMPUTING AND MANAGEMENT IN ENGINEERING SERIES NO. 82
, 1997
"... Finding structures in vast multidimensional data sets, be they measurement data, statistics, or textual documents, is difficult and timeconsuming. Interesting, novel relations between the data items may be hidden in the data. The selforganizing map (SOM) algorithm of Kohonen can be used to aid the ..."
Abstract

Cited by 96 (4 self)
 Add to MetaCart
Finding structures in vast multidimensional data sets, be they measurement data, statistics, or textual documents, is difficult and timeconsuming. Interesting, novel relations between the data items may be hidden in the data. The selforganizing map (SOM) algorithm of Kohonen can be used to aid the exploration: the structures in the data sets can be illustrated on special map displays. In this work, the methodology of using SOMs for exploratory data analysis or data mining is reviewed and developed further. The properties of the maps are compared with the properties of related methods intended for visualizing highdimensional multivariate data sets. In a set of case studies the SOM algorithm is applied to analyzing electroencephalograms, to illustrating structures of the standard of living in the world, and to organizing fulltext document collections. Measures are proposed for evaluating the quality of different types of maps in representing a given data set, and for measuring the robu...
On Lattice Quantization Noise
 IEEE Trans. Inform. Theory
, 1996
"... Abstract We present several results regarding the properties of a random vector, uniformly distributed over a lattice cell. This random vector is the quantization noise of a lattice quantizer at high resolution, or the noise of a dithered lattice quantizer at all distortion levels. We find that for ..."
Abstract

Cited by 73 (20 self)
 Add to MetaCart
Abstract We present several results regarding the properties of a random vector, uniformly distributed over a lattice cell. This random vector is the quantization noise of a lattice quantizer at high resolution, or the noise of a dithered lattice quantizer at all distortion levels. We find that for the optimal lattice quantizers this noise is widesensestationary and white. Any desirable noise spectra may be realized by an appropriate linear transformation (“shaping”) of a lattice quantizer. As the dimension increases, the normalized second.moment of the optimal lattice quantizer goes to 1/2xe, and consequently the quantization noise approaches a white Gaussian process in the divergence sense. In entropycoded dithered quantization, which can be modeled accurately as passing the source through an additive noise channel, this limit behavior implies that for large lattice dimension both the error and the bit rate approach the error and the information rate of an Additive White Gaussian Noise (AWGN) channel. Index TermsLattice, quantization noise, shaping, normalized second moment, divergence from Gaussianity. I I.
Tradeoff Between Source and Channel Coding
 IEEE TRANS. INFORM. THEORY
, 1997
"... A fundamental problem in the transmission of analog information across a noisy discrete channel is the choice of channel code rate that optimally allocates the available transmission rate between lossy source coding and block channel coding. We establish tight bounds on the channel code rate that mi ..."
Abstract

Cited by 66 (5 self)
 Add to MetaCart
A fundamental problem in the transmission of analog information across a noisy discrete channel is the choice of channel code rate that optimally allocates the available transmission rate between lossy source coding and block channel coding. We establish tight bounds on the channel code rate that minimizes the average distortion of a vector quantizer cascaded with a channel coder and a binarysymmetric channel. Analytic expressions are derived in two cases of interest: small biterror probability and arbitrary source vector dimension; arbitrary biterror probability and large source vector dimension. We demonstrate that the optimal channel code rate is often substantially smaller than the channel capacity, and obtain a noisychannel version of the Zador highresolution distortion formula.
Vector Quantization with Complexity Costs
, 1993
"... Vector quantization is a data compression method where a set of data points is encoded by a reduced set of reference vectors, the codebook. We discuss a vector quantization strategy which jointly optimizes distortion errors and the codebook complexity, thereby, determining the size of the codebook. ..."
Abstract

Cited by 54 (18 self)
 Add to MetaCart
Vector quantization is a data compression method where a set of data points is encoded by a reduced set of reference vectors, the codebook. We discuss a vector quantization strategy which jointly optimizes distortion errors and the codebook complexity, thereby, determining the size of the codebook. A maximum entropy estimation of the cost function yields an optimal number of reference vectors, their positions and their assignment probabilities. The dependence of the codebook density on the data density for different complexity functions is investigated in the limit of asymptotic quantization levels. How different complexity measures influence the efficiency of vector quantizers is studied for the task of image compression, i.e., we quantize the wavelet coefficients of gray level images and measure the reconstruction error. Our approach establishes a unifying framework for different quantization methods like Kmeans clustering and its fuzzy version, entropy constrained vector quantizati...
A WaveletBased Analysis of Fractal Image Compression
 IEEE Trans. Image Processing
, 1997
"... Why does fractal image compression work? What is the implicit image model underlying fractal block coding? How can we characterize the types of images for which fractal block coders will work well? These are the central issues we address. We introduce a new waveletbased framework for analyzing block ..."
Abstract

Cited by 42 (2 self)
 Add to MetaCart
Why does fractal image compression work? What is the implicit image model underlying fractal block coding? How can we characterize the types of images for which fractal block coders will work well? These are the central issues we address. We introduce a new waveletbased framework for analyzing blockbased fractal compression schemes. Within this framework we are able to draw upon insights from the wellestablished transform coder paradigm in order to address the issue of why fractal block coders work. We show that fractal block coders of the form introduced by Jacquin[1] are a Haar wavelet subtree quantization scheme. We examine a generalization of this scheme to smooth wavelets with additional vanishing moments. The performance of our generalized coder is comparable to the best results in the literature for a Jacquinstyle coding scheme. Our wavelet framework gives new insight into the convergence properties of fractal block coders, and leads us to develop an unconditionally convergen...
Intrinsic Dimensionality Estimation with Optimally Topology Preserving Maps
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1997
"... A new method for analyzing the intrinsic dimensionality (ID) of low dimensional manifolds in high dimensional feature spaces is presented. The basic idea is to first extract a lowdimensional representation that captures the intrinsic topological structure of the input data and then to analyze this ..."
Abstract

Cited by 39 (3 self)
 Add to MetaCart
A new method for analyzing the intrinsic dimensionality (ID) of low dimensional manifolds in high dimensional feature spaces is presented. The basic idea is to first extract a lowdimensional representation that captures the intrinsic topological structure of the input data and then to analyze this representation, i.e. estimate the intrinsic dimensionality. More specifically, the representation we extract is an optimally topology preserving feature map (OTPM) which is an undirected parametrized graph with a pointer in the input space associated with each node. Estimation of the intrinsic dimensionality is based on local PCA of the pointers of the nodes in the OTPM and their direct neighbors. The method has a number of important advantages compared with previous approaches: First, it can be shown to have only linear time complexity w.r.t. the dimensionality of the input space, in contrast to conventional PCA based approaches which have cubic complexity and hence become computational imp...
Fast Approximate Spectral Clustering
, 2009
"... Spectral clustering refers to a flexible class of clustering procedures that can produce highquality clusterings on small data sets but which has limited applicability to largescale problems due to its computational complexity of O(n 3), with n the number of data points. We extend the range of spe ..."
Abstract

Cited by 36 (1 self)
 Add to MetaCart
Spectral clustering refers to a flexible class of clustering procedures that can produce highquality clusterings on small data sets but which has limited applicability to largescale problems due to its computational complexity of O(n 3), with n the number of data points. We extend the range of spectral clustering by developing a general framework for fast approximate spectral clustering in which a distortionminimizing local transformation is first applied to the data. This framework is based on a theoretical analysis that provides a statistical characterization of the effect of local distortion on the misclustering rate. We develop two concrete instances of our general framework, one based on local kmeans clustering (KASP) and one based on random projection trees (RASP). Extensive experiments show that these algorithms can achieve significant speedups with little degradation in clustering accuracy. Specifically, our algorithms outperform kmeans by a large margin in terms of accuracy, and run several times faster than approximate spectral clustering based on the Nyström method, with comparable accuracy and significantly smaller memory footprint. Remarkably, our algorithms make it possible for a single machine to spectral cluster data sets with a million observations within several minutes. 1
Waveletbased image coding: An overview
 Applied and Computational Control, Signals, and Circuits
, 1998
"... ABSTRACT This paper presents an overview of waveletbased image coding. We develop the basics of image coding with a discussion of vector quantization. We motivate the use of transform coding in practical settings,and describe the properties of various decorrelating transforms. We motivate the use o ..."
Abstract

Cited by 35 (3 self)
 Add to MetaCart
ABSTRACT This paper presents an overview of waveletbased image coding. We develop the basics of image coding with a discussion of vector quantization. We motivate the use of transform coding in practical settings,and describe the properties of various decorrelating transforms. We motivate the use of the wavelet transform in coding using ratedistortion considerations as well as approximationtheoretic considerations. Finally,we give an overview of current coders in the literature. 1
Theoretical aspects of the SOM algorithm
 Neurocomputing
, 1998
"... The SOM algorithm is very astonishing. On the one hand, it is very simple to write down and to simulate, its practical properties are clear and easy to observe. But, on the other hand, its theoretical properties still remain without proof in the general case, despite the great efforts of several aut ..."
Abstract

Cited by 34 (6 self)
 Add to MetaCart
The SOM algorithm is very astonishing. On the one hand, it is very simple to write down and to simulate, its practical properties are clear and easy to observe. But, on the other hand, its theoretical properties still remain without proof in the general case, despite the great efforts of several authors. In this paper, we pass in review the last results and provide some conjectures for the future work. Keywords: Selforganization, Kohonen algorithm, Convergence of stochastic processes, Vectorial quantization.