Results 1  10
of
3,195,133
Determining the k in kmeans with MapReduce Abstract
"... In this paper we propose a MapReduce implementation of Gmeans, a variant of kmeans that is able to automatically determine k, the number of clusters. We show that our implementation scales to very large datasets and very large values of k, as the computation cost is proportional to nk. Other techn ..."
Abstract
 Add to MetaCart
In this paper we propose a MapReduce implementation of Gmeans, a variant of kmeans that is able to automatically determine k, the number of clusters. We show that our implementation scales to very large datasets and very large values of k, as the computation cost is proportional to nk. Other
Constrained Kmeans Clustering with Background Knowledge
 In ICML
, 2001
"... Clustering is traditionally viewed as an unsupervised method for data analysis. However, in some cases information about the problem domain is available in addition to the data instances themselves. In this paper, we demonstrate how the popular kmeans clustering algorithm can be pro tably modi ed ..."
Abstract

Cited by 473 (9 self)
 Add to MetaCart
Clustering is traditionally viewed as an unsupervised method for data analysis. However, in some cases information about the problem domain is available in addition to the data instances themselves. In this paper, we demonstrate how the popular kmeans clustering algorithm can be pro tably modi ed
An Efficient kMeans Clustering Algorithm: Analysis and Implementation
, 2000
"... Kmeans clustering is a very popular clustering technique, which is used in numerous applications. Given a set of n data points in R d and an integer k, the problem is to determine a set of k points R d , called centers, so as to minimize the mean squared distance from each data point to its ..."
Abstract

Cited by 405 (4 self)
 Add to MetaCart
Kmeans clustering is a very popular clustering technique, which is used in numerous applications. Given a set of n data points in R d and an integer k, the problem is to determine a set of k points R d , called centers, so as to minimize the mean squared distance from each data point to its
Refining Initial Points for KMeans Clustering
, 1998
"... Practical approaches to clustering use an iterative procedure (e.g. KMeans, EM) which converges to one of numerous local minima. It is known that these iterative techniques are especially sensitive to initial starting conditions. We present a procedure for computing a refined starting condition fro ..."
Abstract

Cited by 308 (5 self)
 Add to MetaCart
Practical approaches to clustering use an iterative procedure (e.g. KMeans, EM) which converges to one of numerous local minima. It is known that these iterative techniques are especially sensitive to initial starting conditions. We present a procedure for computing a refined starting condition
Xmeans: Extending Kmeans with Efficient Estimation of the Number of Clusters
 In Proceedings of the 17th International Conf. on Machine Learning
, 2000
"... Despite its popularity for general clustering, Kmeans suffers three major shortcomings; it scales poorly computationally, the number of clusters K has to be supplied by the user, and the search is prone to local minima. We propose solutions for the first two problems, and a partial remedy for the t ..."
Abstract

Cited by 412 (5 self)
 Add to MetaCart
Despite its popularity for general clustering, Kmeans suffers three major shortcomings; it scales poorly computationally, the number of clusters K has to be supplied by the user, and the search is prone to local minima. We propose solutions for the first two problems, and a partial remedy
Evaluating MapReduce for multicore and multiprocessor systems
 In HPCA ’07: Proceedings of the 13th International Symposium on HighPerformance Computer Architecture
, 2007
"... This paper evaluates the suitability of the MapReduce model for multicore and multiprocessor systems. MapReduce was created by Google for application development on datacenters with thousands of servers. It allows programmers to write functionalstyle code that is automatically parallelized and s ..."
Abstract

Cited by 248 (3 self)
 Add to MetaCart
This paper evaluates the suitability of the MapReduce model for multicore and multiprocessor systems. MapReduce was created by Google for application development on datacenters with thousands of servers. It allows programmers to write functionalstyle code that is automatically parallelized
Mean shift, mode seeking, and clustering
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1995
"... AbstractMean shift, a simple iterative procedure that shifts each data point to the average of data points in its neighborhood, is generalized and analyzed in this paper. This generalization makes some kmeans like clustering algorithms its special cases. It is shown that mean shift is a modeseeki ..."
Abstract

Cited by 620 (0 self)
 Add to MetaCart
AbstractMean shift, a simple iterative procedure that shifts each data point to the average of data points in its neighborhood, is generalized and analyzed in this paper. This generalization makes some kmeans like clustering algorithms its special cases. It is shown that mean shift is a mode
Pig Latin: A NotSoForeign Language for Data Processing
"... There is a growing need for adhoc analysis of extremely large data sets, especially at internet companies where innovation critically depends on being able to analyze terabytes of data collected every day. Parallel database products, e.g., Teradata, offer a solution, but are usually prohibitively e ..."
Abstract

Cited by 584 (12 self)
 Add to MetaCart
expensive at this scale. Besides, many of the people who analyze this data are entrenched procedural programmers, who find the declarative, SQL style to be unnatural. The success of the more procedural mapreduce programming model, and its associated scalable implementations on commodity hardware
Determining Optical Flow
 ARTIFICIAL INTELLIGENCE
, 1981
"... Optical flow cannot be computed locally, since only one independent measurement is available from the image sequence at a point, while the flow velocity has two components. A second constraint is needed. A method for finding the optical flow pattern is presented which assumes that the apparent veloc ..."
Abstract

Cited by 2379 (9 self)
 Add to MetaCart
Optical flow cannot be computed locally, since only one independent measurement is available from the image sequence at a point, while the flow velocity has two components. A second constraint is needed. A method for finding the optical flow pattern is presented which assumes that the apparent velocity of the brightness pattern varies smoothly almost everywhere in the image. An iterative implementation is shown which successfully computes the optical flow for a number of synthetic image sequences. The algorithm is robust in that it can handle image sequences that are quantized rather coarsely in space and time. It is also insensitive to quantization of brightness levels and additive noise. Examples are included where the assumption of smoothness is violated at singular points or along lines in the image.
Results 1  10
of
3,195,133