Survey of clustering algorithms
 IEEE TRANSACTIONS ON NEURAL NETWORKS
, 2005
"... Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the ..."
Abstract

Cited by 270 (3 self)
Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Several tightly related topics, proximity measure, and cluster validation, are also discussed.
Centroidal Voronoi tessellations: Applications and algorithms
 SIAM Rev
, 1999
"... Abstract. A centroidal Voronoi tessellation is a Voronoi tessellation whose generating points are the centroids (centers of mass) of the corresponding Voronoi regions. We give some applications of such tessellations to problems in image compression, quadrature, finite difference methods, distributio ..."
Abstract

Cited by 264 (28 self)
A centroidal Voronoi tessellation is a Voronoi tessellation whose generating points are the centroids (centers of mass) of the corresponding Voronoi regions. We give some applications of such tessellations to problems in image compression, quadrature, finite difference methods, distribution of resources, cellular biology, statistics, and the territorial behavior of animals. We discuss methods for computing these tessellations, provide some analyses concerning both the tessellations and the methods for their determination, and, finally, present the results of some numerical experiments.
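The computational methods the abstract alludes to are typically variants of Lloyd's algorithm: alternately partition the domain into Voronoi regions of the current generators and move each generator to its region's centroid. A minimal Monte Carlo sketch of that iteration on the unit square (not taken from the paper; sample counts and iteration limits are illustrative assumptions):

```python
import random

def lloyd_cvt(generators, num_samples=2000, iterations=20, seed=0):
    """Approximate a centroidal Voronoi tessellation of the unit square.

    Each iteration (1) assigns random sample points to their nearest
    generator, a Monte Carlo estimate of the Voronoi regions, and then
    (2) moves every generator to the centroid of its estimated region.
    """
    rng = random.Random(seed)
    pts = [list(g) for g in generators]
    for _ in range(iterations):
        sums = [[0.0, 0.0] for _ in pts]
        counts = [0] * len(pts)
        for _ in range(num_samples):
            x, y = rng.random(), rng.random()
            # nearest generator = membership in that Voronoi region
            k = min(range(len(pts)),
                    key=lambda i: (pts[i][0] - x) ** 2 + (pts[i][1] - y) ** 2)
            sums[k][0] += x
            sums[k][1] += y
            counts[k] += 1
        for i, c in enumerate(counts):
            if c:  # centroid update; empty regions keep their generator
                pts[i] = [sums[i][0] / c, sums[i][1] / c]
    return pts

cvt = lloyd_cvt([[0.1, 0.1], [0.2, 0.9], [0.9, 0.5], [0.5, 0.4]])
```

At the fixed point, each generator coincides with the centroid of its own region, which is exactly the defining property of a centroidal Voronoi tessellation.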
Controlling the Magnification Factor of Self-Organizing Feature Maps
, 1995
"... The magnification exponents ¯ occuring in adaptive map formation algorithms like Kohonen's selforganizing feature map deviate for the information theoretically optimal value ¯ = 1 as well as from the values which optimize, e.g., the mean square distortion error (¯ = 1=3 for onedimensional map ..."
Abstract

Cited by 45 (7 self)
The magnification exponents µ occurring in adaptive map formation algorithms like Kohonen's self-organizing feature map deviate from the information-theoretically optimal value µ = 1 as well as from the values which optimize, e.g., the mean square distortion error (µ = 1/3 for one-dimensional maps). At the same time, models for categorical perception such as the "perceptual magnet" effect, which are based on topographic maps, require negative magnification exponents µ < 0. We present an extension of the self-organizing feature map algorithm which utilizes adaptive local learning step sizes to actually control the magnification properties of the map. By changing a single parameter, maps with optimal information transfer, with various minimal reconstruction errors, or with an inverted magnification can be generated. Analytic results on this new algorithm are complemented by numerical simulations.
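For reference, the baseline the abstract extends is the plain Kohonen update: find the best-matching unit, then pull every unit toward the input with a neighborhood-weighted step. A minimal one-dimensional sketch (the paper's contribution would replace the single global rate `eta` with an adaptive, unit-local step size; that adaptive part is not reproduced here, and all parameter values are illustrative assumptions):

```python
import math
import random

def train_som_1d(data, num_units=10, epochs=20, eta=0.5, sigma=2.0, seed=0):
    """Minimal 1-D Kohonen self-organizing feature map.

    Plain version with one global learning rate; controlling the
    magnification exponent, as in the cited paper, would make the
    step size local to each unit and data-density dependent.
    """
    rng = random.Random(seed)
    w = sorted(rng.random() for _ in range(num_units))  # codebook init
    for _ in range(epochs):
        for x in data:
            # best-matching unit (BMU) for this input
            bmu = min(range(num_units), key=lambda i: abs(w[i] - x))
            for i in range(num_units):
                # Gaussian neighborhood centered on the BMU
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                w[i] += eta * h * (x - w[i])
    return w

weights = train_som_1d([i / 19 for i in range(20)])
```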
The Enhanced LBG Algorithm
, 2001
"... Clustering applications cover several elds such as audio and video data compression, pattern recognition, computer vision, medical image recognition, etc. In this paper we present a new clustering algorithm called Enhanced LBG (ELBG). It belongs to the hard and Kmeans vector quantization groups an ..."
Abstract

Cited by 23 (1 self)
Clustering applications cover several fields such as audio and video data compression, pattern recognition, computer vision, medical image recognition, etc. In this paper we present a new clustering algorithm called Enhanced LBG (ELBG). It belongs to the hard and K-means vector quantization groups and derives directly from the simpler LBG. The basic idea we have developed is the concept of the utility of a codeword, a powerful instrument to overcome one of the main drawbacks of clustering algorithms: generally, the results achieved are not good in the case of a bad choice of the initial codebook. We present experimental results showing that ELBG is able to find better codebooks than previous clustering techniques, while its computational complexity is virtually the same as that of the simpler LBG.
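The "simpler LBG" that ELBG builds on is batch K-means: assign every sample to its nearest codeword, then replace each codeword by the mean of its cell. A minimal 1-D sketch of that baseline (ELBG's utility-driven migration of codewords toward high-distortion cells is deliberately omitted; data and seed are illustrative assumptions):

```python
import random

def lbg(data, k=3, iterations=30, seed=0):
    """Plain LBG / batch K-means on 1-D data.

    Alternates nearest-codeword assignment with centroid updates.
    ELBG would additionally move low-utility codewords into
    high-distortion cells to escape bad initial codebooks.
    """
    rng = random.Random(seed)
    codebook = rng.sample(data, k)  # random initial codebook
    for _ in range(iterations):
        cells = [[] for _ in range(k)]
        for x in data:
            j = min(range(k), key=lambda i: (codebook[i] - x) ** 2)
            cells[j].append(x)
        # centroid update; an empty cell keeps its old codeword
        codebook = [sum(c) / len(c) if c else codebook[i]
                    for i, c in enumerate(cells)]
    return sorted(codebook)

data = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2, 2.0, 2.1, 2.2]
cb = lbg(data, k=3)
```

The empty-cell case in the centroid update is exactly where plain LBG can get stuck with a poor initialization, which is the drawback ELBG's codeword utility is designed to fix.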
Neural Maps and Topographic Vector Quantization
, 1999
"... Neural maps combine the representation of data by codebook vectors, like a vector quantizer, with the property of topography, like a continuous function. While the quantization error is simple to compute and to compare between different maps, topography of a map is difficult to define and to quantif ..."
Abstract

Cited by 22 (4 self)
Neural maps combine the representation of data by codebook vectors, like a vector quantizer, with the property of topography, like a continuous function. While the quantization error is simple to compute and to compare between different maps, the topography of a map is difficult to define and to quantify. Yet, topography of a neural map is an advantageous property, e.g. in the presence of noise in a transmission channel, in data visualization, and in numerous other applications. In this paper we review some conceptual aspects of definitions of topography, and some recently proposed measures to quantify topography. We apply the measures first to neural maps trained on synthetic data sets, and check the measures for properties like reproducibility, scalability, systematic dependence of the value of the measure on the topology of the map, etc. We then test the measures on maps generated for four real-world data sets: a chaotic time series, speech data, and two sets of image data. The measures ...
Initialization of Adaptive Parameters in Density Networks
 3RD CONF. ON NEURAL NETWORKS, KULE
, 1997
"... Initialization of adaptive parameters in neural networks is of crucial importance to the speed of convergence of the learning procedure. Methods of initialization for the density networks are reviewed and two new methods, based on decision trees and dendrograms, presented. These two methods were app ..."
Abstract

Cited by 13 (12 self)
Initialization of adaptive parameters in neural networks is of crucial importance to the speed of convergence of the learning procedure. Methods of initialization for density networks are reviewed and two new methods, based on decision trees and dendrograms, are presented. These two methods were applied in the Feature Space Mapping framework to artificial and real-world datasets. Results show the superiority of the dendrogram-based method including rotation.
A deep learning architecture comprising homogeneous cortical circuits for scalable spatiotemporal pattern inference
 in Proc. NIPS Workshop Deep Learn. Speech
, 2009
"... A key challenge associated with the design of scalable deep learning architectures pertains to efciently capturing spatiotemporal dependencies in a scalable framework that is modality independent. This paper presents a novel discriminative deep learning architecture, which relies on an identical co ..."
Abstract

Cited by 10 (4 self)
A key challenge associated with the design of scalable deep learning architectures pertains to efficiently capturing spatiotemporal dependencies in a scalable framework that is modality independent. This paper presents a novel discriminative deep learning architecture, which relies on an identical cortical circuit populating the hierarchical structure. Belief states formed across the hierarchy intrinsically capture sequences of patterns, rather than static patterns, thereby facilitating the embedding of temporal dependencies. At the core of the adaptation mechanism are two learned constructs, one of which relies on a fast and stable incremental clustering. Moreover, the proposed methodology does not require layer-by-layer training and lends itself naturally to massively parallel processing platforms. A simple test case demonstrates the validity of the architecture and learning algorithm. The system can be efficiently applied to various modalities, including those associated with complex visual and audio information representation.
Characterizing Computer Systems' Workloads
, 2002
"... The performance of any system cannot be determined without knowing the workload, that is, the set of requests presented to the system. Workload characterization is the process by which we produce models that are capable of describing and reproducing the behavior of a workload. Such models are imp ..."
Abstract

Cited by 8 (6 self)
The performance of any system cannot be determined without knowing the workload, that is, the set of requests presented to the system. Workload characterization is the process by which we produce models that are capable of describing and reproducing the behavior of a workload. Such models are imperative to any performance-related studies such as capacity planning, workload balancing, performance prediction, and system tuning. In this paper, we survey workload characterization techniques used for several types of computer systems. We identify significant issues and concerns encountered during the characterization process and propose an augmented methodology for workload characterization as a framework.
A Fast and Stable Incremental Clustering Algorithm
 in 2010 Seventh International Conference on Information Technology. IEEE
"... Abstract — Clustering is a pivotal building block in many data mining applications and in machine learning in general. Most clustering algorithms in the literature pertain to offline (or batch) processing, in which the clustering process repeatedly sweeps through a set of data samples in an attempt ..."
Abstract

Cited by 6 (5 self)
Clustering is a pivotal building block in many data mining applications and in machine learning in general. Most clustering algorithms in the literature pertain to offline (or batch) processing, in which the clustering process repeatedly sweeps through a set of data samples in an attempt to capture its underlying structure in a compact and efficient way. However, many recent applications require that the clustering algorithm be online, or incremental, in the sense that there is no a priori set of samples to process but rather samples are provided one at a time. Accordingly, the clustering algorithm is expected to gradually improve its prototype (or centroid) constructs. Several problems emerge in this context, particularly relating to the stability of the process and its speed of convergence. In this paper, we present a fast and stable incremental clustering algorithm, which is computationally modest and imposes minimal memory requirements. Simulation results clearly demonstrate the advantages of the proposed framework in a variety of practical scenarios.
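The online setting the abstract describes is commonly illustrated by incremental K-means: each prototype is updated one sample at a time with a per-prototype learning rate 1/n, so each centroid is the running mean of the samples it has won. The sketch below is only this textbook baseline, not the cited paper's algorithm, which adds further stabilization; the seeding rule and example stream are illustrative assumptions:

```python
def incremental_kmeans(stream, k=2):
    """Textbook online K-means: one prototype update per sample.

    The 1/n step size makes every prototype the exact running mean
    of the samples assigned to it, which keeps the update stable.
    """
    prototypes, counts = [], []
    for x in stream:
        if len(prototypes) < k:
            prototypes.append(x)   # first k samples seed the prototypes
            counts.append(1)
            continue
        # winner = nearest prototype to the incoming sample
        j = min(range(k), key=lambda i: (prototypes[i] - x) ** 2)
        counts[j] += 1
        prototypes[j] += (x - prototypes[j]) / counts[j]  # running mean
    return prototypes

protos = incremental_kmeans([0.0, 10.0, 0.2, 9.8, -0.2, 10.2], k=2)
```

No sample set is stored: memory is proportional to the number of prototypes, matching the "minimal memory requirements" constraint of the online setting.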
The Impact of Workload Clustering on Transaction Routing
"... The qualitative and quantitative description of the workload of a system is very important for capacity planning and performance management. In largescale transaction processing systems, dynamic workload control algorithms are applied to optimize system performance. Such algorithms can benefit from ..."
Abstract

Cited by 6 (0 self)
The qualitative and quantitative description of the workload of a system is very important for capacity planning and performance management. In large-scale transaction processing systems, dynamic workload control algorithms are applied to optimize system performance. Such algorithms can benefit from the results of workload clustering algorithms that partition the workload into classes consisting of units of work exhibiting similar characteristics. This paper presents CLUE, a clustering environment for OLTP workload characterization. CLUE provides a library of clustering algorithms that classify transactions into classes, according to their database reference patterns. This paper introduces HALC, a new batch-mode heuristic clustering algorithm, designed to cope with the large volume of input data that is typical for real-life applications. Next, an on-the-fly clustering algorithm based on neural networks is described. This algorithm can be used in an online fashion in systems whose characteristics change through time. This paper provides an evaluation of the performance of HALC and the on-the-fly algorithms in terms of execution times and statistical metrics related to the quality of clusters that they compute, for both synthetic and real-life workload traces. Finally, this paper quantifies the impact of workload clustering on the performance of three dynamic transaction routing algorithms for Shared-Nothing transaction processing systems.