Results 1 – 10 of 10
Parallel Algorithms for Hierarchical Clustering
Parallel Computing, 1995
Abstract

Cited by 80 (1 self)
Hierarchical clustering is a common method used to determine clusters of similar data points in multidimensional spaces. O(n²) algorithms are known for this problem [3, 4, 10, 18]. This paper reviews important results for sequential algorithms and describes previous work on parallel algorithms for hierarchical clustering. Parallel algorithms to perform hierarchical clustering using several distance metrics are then described. Optimal PRAM algorithms using n log n processors are given for the average-link, complete-link, centroid, median, and minimum-variance metrics. Optimal butterfly and tree algorithms using n log n processors are given for the centroid, median, and minimum-variance metrics. Optimal asymptotic speedups are achieved for the best practical algorithm to perform clustering using the single-link metric on an n log n processor PRAM, butterfly, or tree.
Keywords. Hierarchical clustering, pattern analysis, parallel algorithm, butterfly network, PRAM algorithm.
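The single-link metric mentioned in this abstract defines the distance between two clusters as the minimum distance over all cross-cluster point pairs. A minimal sequential sketch of agglomerative clustering under that metric (function names are illustrative, not from the paper; this naive version is O(n³), whereas the O(n²) algorithms cited above and their parallel counterparts are the paper's subject):

```python
import math

def single_link_clustering(points, num_clusters):
    """Naive agglomerative clustering under the single-link metric:
    repeatedly merge the two clusters whose closest pair of points is
    nearest, until num_clusters clusters remain."""
    clusters = [[i] for i in range(len(points))]
    # Pairwise point distances, computed once up front.
    dist = [[math.dist(p, q) for q in points] for p in points]
    while len(clusters) > num_clusters:
        best_pair, best_d = (0, 1), float("inf")
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single-link distance: minimum over all cross-cluster pairs.
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if d < best_d:
                    best_d, best_pair = d, (a, b)
        a, b = best_pair
        clusters[a] += clusters.pop(b)
    return clusters
```

For example, `single_link_clustering([(0, 0), (0, 1), (5, 5), (5, 6)], 2)` groups the two nearby pairs of points into two clusters.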
Computer Vision Algorithms on Reconfigurable Logic Arrays
IEEE Trans. on Parallel and Distributed Systems, 1999
Abstract

Cited by 15 (1 self)
Computer vision algorithms are natural candidates for high-performance computing due to their inherent parallelism and intense computational demands. For example, a simple 3 x 3 convolution on a 512 x 512 gray-scale image at 30 frames per second requires 67.5 million multiplications and 60 million additions to be performed in one second. Computer vision tasks can be classified into three categories based on their computational complexity and communication complexity: low-level, intermediate-level, and high-level. Special-purpose hardware provides better performance than general-purpose hardware for all three levels of vision tasks. With recent advances in very large scale integration (VLSI) technology, an application-specific integrated circuit (ASIC) can provide the best performance in terms of total execution time. However, the long design cycle time, high development cost, and inflexibility of dedicated hardware deter the design of ASICs. In contrast, field-programmable gate arrays (FPGAs) support lower design verification time and easier design adaptability at a lower cost. Hence, FPGAs with an array of reconfigurable logic blocks can be very useful compute elements. FPGA-based custom computing machines are ...
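The operation counts quoted in the abstract can be reproduced with simple arithmetic. Note an assumption on our part: the stated figures (67.5 M multiplications, 60 M additions per second) work out to an effective 500 x 500 pixel area per frame rather than the full 512 x 512, presumably reflecting how the paper counts processed pixels:

```python
# Operation counts for a single-channel 3x3 convolution at video rate.
# Assumption (ours): the paper's figures correspond to an effective
# 500 x 500 pixel area per frame, not the full 512 x 512 image.
effective_pixels = 500 * 500
frames_per_second = 30

mults_per_pixel = 9   # one multiply per kernel tap
adds_per_pixel = 8    # summing 9 products takes 8 additions

mults_per_second = effective_pixels * mults_per_pixel * frames_per_second
adds_per_second = effective_pixels * adds_per_pixel * frames_per_second

print(mults_per_second)  # 67.5 million, as in the abstract
print(adds_per_second)   # 60 million, as in the abstract
```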
Time and Space Efficient Pose Clustering
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1994
Abstract

Cited by 14 (6 self)
This paper shows that the pose clustering method of object recognition can be decomposed into small subproblems without loss of accuracy. Randomization can then be used to limit the number of subproblems that need to be examined to achieve accurate recognition. These techniques are used to decrease the computational complexity of pose clustering. The clustering step is formulated as an efficient tree search of the pose space. This method requires little memory since not many poses are clustered at a time. Analysis shows that pose clustering is not inherently more sensitive to noise than other methods of generating hypotheses. Finally, experiments on real and synthetic data are presented.
1 Introduction
Model-based object recognition systems determine which objects appear in images using a catalog of object models and estimate their positions and orientations (poses) relative to the camera. This paper examines methods of improving the efficiency of the pose clustering method of object ...
Improving the Orthogonal Range Search k-windows Algorithm
In Proceedings of the 14th IEEE International Conference on Tools with Artificial Intelligence, 2002
Abstract

Cited by 13 (10 self)
Clustering, that is, the partitioning of a set of patterns into disjoint and homogeneous meaningful groups (clusters), is a fundamental process in the practice of science. k-windows is an efficient clustering algorithm that reduces the number of patterns that need to be examined for similarity, using a windowing technique. It is based on a well-known spatial data structure, namely the range tree, which allows fast range searches. From a theoretical standpoint, the k-windows algorithm has a lower time complexity than the other well-known existing clustering algorithms. Moreover, it achieves high-quality clustering results. However, it seems that it would not be directly applicable in high-dimensional practical settings due to the super-linear space requirements of the range tree. In this paper we present an improvement of the k-windows algorithm, aimed at attacking this problem, that is based on a different solution to the orthogonal range search problem.
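The orthogonal range search at the heart of k-windows asks which points fall inside an axis-aligned window. A minimal brute-force sketch (illustrative only; the paper's point is replacing the range tree, whose super-linear space is the problem, with a different structure for this same query):

```python
def points_in_window(points, low, high):
    """Return the points inside the axis-aligned window
    [low[0], high[0]] x ... x [low[d-1], high[d-1]].

    Brute force is O(n * d) per query; a range tree answers the same
    query faster but needs super-linear space, the trade-off the
    paper addresses."""
    return [p for p in points
            if all(lo <= x <= hi for x, lo, hi in zip(p, low, high))]
```

For example, `points_in_window([(0, 0), (2, 3), (5, 5)], (1, 1), (4, 4))` returns only `(2, 3)`.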
A stochastic connectionist approach for global optimization with application to pattern clustering
IEEE Transactions on Systems, Man, and Cybernetics, Part B, 2000
Abstract

Cited by 2 (0 self)
Abstract—In this paper, a stochastic connectionist approach is proposed for solving function optimization problems with real-valued parameters. With the assumption of increased processing capability of a node in the connectionist network, we show how a broader class of problems can be solved. As the proposed approach is a stochastic search technique, it avoids getting stuck in local optima. The robustness of the approach is demonstrated on several multimodal functions with different numbers of variables. Optimization of a well-known partitional clustering criterion, the squared-error criterion (SEC), is formulated as a function optimization problem and solved using the proposed approach. This approach is used to cluster selected data sets, and the results obtained are compared with those of the K-means algorithm and a simulated annealing (SA) approach. The amenability of the connectionist approach to parallelization enables effective use of parallel hardware.
Index Terms—Clustering, connectionist approaches, function optimization, global optimization.
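The squared-error criterion (SEC) named in this abstract is the sum, over all clusters, of squared Euclidean distances from each point to its cluster centroid; it is the objective K-means minimizes locally and the quantity the paper optimizes globally. A minimal sketch of its evaluation (function name is illustrative, not from the paper):

```python
def squared_error_criterion(points, assignments, k):
    """SEC: sum over clusters of squared Euclidean distances from
    each point to its cluster centroid.  `assignments[i]` is the
    cluster index (0..k-1) of points[i]; every cluster is assumed
    non-empty."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, assignments):
        counts[c] += 1
        for j in range(dim):
            sums[c][j] += p[j]
    centroids = [[sums[c][j] / counts[c] for j in range(dim)]
                 for c in range(k)]
    return sum((p[j] - centroids[c][j]) ** 2
               for p, c in zip(points, assignments) for j in range(dim))
```

For instance, two clusters `{(0,0), (0,2)}` and `{(10,0), (10,2)}` have centroids `(0,1)` and `(10,1)`, so each point contributes 1 and the SEC is 4.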
Parallelism in knowledge discovery techniques
LNCS 2367: Applied Parallel Computing, 6th International Conference PARA'02, 2002
Abstract

Cited by 1 (0 self)
Abstract. Knowledge discovery in databases, or data mining, is the semi-automated analysis of large volumes of data, looking for the relationships and knowledge that are implicit in large volumes of data and are 'interesting' in the sense of impacting an organization's practice. Data mining and knowledge discovery on large amounts of data can benefit from the use of parallel computers, both to improve performance and to improve the quality of data selection. This paper presents and discusses different forms of parallelism that can be exploited in data mining techniques and algorithms. For the main data mining techniques, such as rule induction, clustering algorithms, decision trees, genetic algorithms, and neural networks, the possible ways to exploit parallelism are presented and discussed in detail. Finally, some promising research directions in the parallel data mining research area are outlined.
Vectorization and Parallelization of Clustering Algorithms
VI Spanish Symposium on Pattern Recognition and Image Analysis, 1995
Abstract

Cited by 1 (0 self)
In this work we present a study on the parallelization of code segments that are typical of clustering algorithms. In order to approach this problem from a practical point of view, we have considered parallelization on the three types of architectures currently available from parallel system manufacturers: vector computers, shared-memory multiprocessors, and distributed-memory multicomputers. We have selected the FC (Fuzzy Covariance) and AD (Affinity Decompositions) algorithms as representative of the different computational structures found in clustering algorithms. We present a comparative study of the results obtained from running these algorithms on three systems: VP2400/10, KSR1, and AP1000.
1 Introduction
The automatic classification of data is one of the basic tasks in pattern recognition. Given its iterative nature and high computational cost (CPU time), the most adequate solution for its numerical treatment is to use concurrent techniques in order to reduce the execution ...
Data Mining and Knowledge Discovery, 3, 263–290 (1999). © 1999 Kluwer Academic Publishers. Manufactured in The Netherlands.
A Fast Parallel Clustering Algorithm for Large Spatial Databases
Abstract
Abstract. The clustering algorithm DBSCAN relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape as well as to distinguish noise. In this paper, we present PDBSCAN, a parallel version of this algorithm. We use the 'shared-nothing' architecture with multiple computers interconnected through a network. A fundamental component of a shared-nothing system is its distributed data structure. We introduce the dR*-tree, a distributed spatial index structure in which the data is spread among multiple computers and the indexes of the data are replicated on every computer. We implemented our method using a number of workstations connected via Ethernet (10 Mbit). A performance evaluation shows that PDBSCAN offers nearly linear speedup and has excellent scale-up and size-up behavior.
A Fast Approach to Clustering Datasets using DBSCAN and Pruning Algorithms
S. Vijayalaksmi
Abstract
Among the various clustering algorithms, DBSCAN is an effective clustering algorithm used in many applications. It has several advantages: it needs no a priori assumption about the number of clusters, it can find arbitrarily shaped clusters, and it can perform well even in the presence of outliers. However, its performance is seriously affected when the dataset size becomes large. Moreover, the selection of the two input parameters, Eps and MinPts, has a great impact on clustering performance. To address these two problems, this paper modifies the traditional DBSCAN algorithm in two ways. The first method uses a k-dimensional tree (k-d tree) instead of the traditional R-tree, while the second method adds a locality-sensitive hashing procedure to speed up the process of clustering and increase its efficiency. The algorithms use a k-distance graph method to automatically calculate Eps and MinPts. Experimental results show that both algorithms are efficient in terms of scalability and speed up the clustering process.
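The k-distance graph mentioned here plots, for every point, the distance to its k-th nearest neighbour in sorted order; a "knee" in that curve is a common heuristic for choosing Eps (with MinPts set to k). A brute-force sketch of computing those values (names are illustrative, not from the paper, and this is O(n²), whereas the paper's point is to accelerate such neighbourhood computations):

```python
import math

def k_distances(points, k):
    """For each point, the Euclidean distance to its k-th nearest
    neighbour, returned in descending order.  Plotting this list is
    the k-distance graph used to pick DBSCAN's Eps parameter."""
    out = []
    for i, p in enumerate(points):
        ds = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        out.append(ds[k - 1])
    return sorted(out, reverse=True)
```

For example, `k_distances([(0, 0), (0, 1), (0, 3)], 1)` yields `[2.0, 1.0, 1.0]`: the isolated point `(0, 3)` shows up as the largest 1-nearest-neighbour distance.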
CLUSTERING
Abstract
Abstract—Clustering of data has numerous applications and has been studied extensively. Though most of the algorithms in the literature are sequential, many parallel algorithms have also been designed. In this paper, we present parallel algorithms with better performance than known algorithms. We consider algorithms that work well in the worst case as well as algorithms with good expected performance.
Index Terms—Reconfigurable networks, mesh-connected computers, meshes with optical buses, hierarchical clustering, PRAMs, single-link metric.