Results 1  10
of
93,521
Clustering by passing messages between data points
 Science
, 2007
"... Clustering data by identifying a subset of representative examples is important for processing sensory signals and detecting patterns in data. Such “exemplars ” can be found by randomly choosing an initial subset of data points and then iteratively refining it, but this works well only if that initi ..."
Abstract

Cited by 696 (8 self)
 Add to MetaCart
if that initial choice is close to a good solution. We devised a method called “affinity propagation,” which takes as input measures of similarity between pairs of data points. Realvalued messages are exchanged between data points until a highquality set of exemplars and corresponding clusters gradually emerges
Data Streams: Algorithms and Applications
, 2005
"... In the data stream scenario, input arrives very rapidly and there is limited memory to store the input. Algorithms have to work with one or few passes over the data, space less than linear in the input size or time significantly less than the input size. In the past few years, a new theory has emerg ..."
Abstract

Cited by 533 (22 self)
 Add to MetaCart
In the data stream scenario, input arrives very rapidly and there is limited memory to store the input. Algorithms have to work with one or few passes over the data, space less than linear in the input size or time significantly less than the input size. In the past few years, a new theory has
Critical Power for Asymptotic Connectivity in Wireless Networks
, 1998
"... : In wireless data networks each transmitter's power needs to be high enough to reach the intended receivers, while generating minimum interference on other receivers sharing the same channel. In particular, if the nodes in the network are assumed to cooperate in routing each others ' pack ..."
Abstract

Cited by 541 (19 self)
 Add to MetaCart
: In wireless data networks each transmitter's power needs to be high enough to reach the intended receivers, while generating minimum interference on other receivers sharing the same channel. In particular, if the nodes in the network are assumed to cooperate in routing each others &apos
Implementing data cubes efficiently
 In SIGMOD
, 1996
"... Decision support applications involve complex queries on very large databases. Since response times should be small, query optimization is critical. Users typically view the data as multidimensional data cubes. Each cell of the data cube is a view consisting of an aggregation of interest, like total ..."
Abstract

Cited by 548 (1 self)
 Add to MetaCart
Decision support applications involve complex queries on very large databases. Since response times should be small, query optimization is critical. Users typically view the data as multidimensional data cubes. Each cell of the data cube is a view consisting of an aggregation of interest, like
A blocksorting lossless data compression algorithm
, 1994
"... We describe a blocksorting, lossless data compression algorithm, and our implementation of that algorithm. We compare the performance of our implementation with widely available data compressors running on the same hardware. The algorithm works by applying a reversible transformation to a block o ..."
Abstract

Cited by 809 (5 self)
 Add to MetaCart
of input text. The transformation does not itself compress the data, but reorders it to make it easy to compress with simple algorithms such as movetofront coding. Our algorithm achieves speed comparable to algorithms based on the techniques of Lempel and Ziv, but obtains compression close to the best
Automatic Subspace Clustering of High Dimensional Data
 Data Mining and Knowledge Discovery
, 2005
"... Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, enduser comprehensibility of the results, nonpresumption of any canonical data distribution, and insensitivity to the or ..."
Abstract

Cited by 724 (12 self)
 Add to MetaCart
identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high dimensional datasets.
Statistical Comparisons of Classifiers over Multiple Data Sets
, 2006
"... While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but igno ..."
Abstract

Cited by 744 (0 self)
 Add to MetaCart
classifiers and the Friedman test with the corresponding posthoc tests for comparison of more classifiers over multiple data sets. Results of the latter can also be neatly presented with the newly introduced CD (critical difference) diagrams.
MapReduce: Simplified data processing on large clusters.
 In Proceedings of the Sixth Symposium on Operating System Design and Implementation (OSDI04),
, 2004
"... Abstract MapReduce is a programming model and an associated implementation for processing and generating large data sets. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The runtime system takes care of the details of ..."
Abstract

Cited by 3439 (3 self)
 Add to MetaCart
of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required intermachine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large
Fuzzy extractors: How to generate strong keys from biometrics and other noisy data
, 2008
"... We provide formal definitions and efficient secure techniques for • turning noisy information into keys usable for any cryptographic application, and, in particular, • reliably and securely authenticating biometric data. Our techniques apply not just to biometric information, but to any keying mater ..."
Abstract

Cited by 535 (38 self)
 Add to MetaCart
for various measures of “closeness” of input data, such as Hamming distance, edit distance, and set difference.
Data Preparation for Mining World Wide Web Browsing Patterns
 KNOWLEDGE AND INFORMATION SYSTEMS
, 1999
"... The World Wide Web (WWW) continues to grow at an astounding rate in both the sheer volume of tra#c and the size and complexity of Web sites. The complexity of tasks such as Web site design, Web server design, and of simply navigating through a Web site have increased along with this growth. An i ..."
Abstract

Cited by 567 (43 self)
 Add to MetaCart
. An important input to these design tasks is the analysis of how a Web site is being used. Usage analysis includes straightforward statistics, such as page access frequency, as well as more sophisticated forms of analysis, such as finding the common traversal paths through a Web site. Web Usage Mining
Results 1  10
of
93,521