Results 1 - 10 of 940,750
A Scalable Algorithm for Clustering Sequential Data
- IN PROC. OF THE 1ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM
, 2001
"... Many scientific and commercial domains have seen an enormous growth of data in recent years. Such data sets have inherent sequential nature. The clustering of such data is useful for various purposes. Over the years, many methods have been developed for clustering objects according to their similarit ..."
Cited by 33 (3 self)
to their similarity. However, in contexts of sequential data these methods tend to have a computational complexity that is at least quadratic on the number of sequences, as they require an all-against-all initial analysis. In this paper we present an entirely different approach to sequence clustering that does
Scalable Application Layer Multicast
, 2002
"... We describe a new scalable application-layer multicast protocol, specifically designed for low-bandwidth, data streaming applications with large receiver sets. Our scheme is based upon a hierarchical clustering of the application-layer multicast peers and can support a number of different data deliv ..."
Cited by 719 (21 self)
A Sequential Algorithm for Training Text Classifiers
, 1994
"... The ability to cheaply train text classifiers is critical to their use in information retrieval, content analysis, natural language processing, and other tasks involving data which is partly or fully textual. An algorithm for sequential sampling during machine learning of statistical classifiers was ..."
Cited by 626 (10 self)
A universal algorithm for sequential data compression
- IEEE TRANSACTIONS ON INFORMATION THEORY
, 1977
"... A universal algorithm for sequential data compression is presented. Its performance is investigated with respect to a nonprobabilistic model of constrained sources. The compression ratio achieved by the proposed universal code uniformly approaches the lower bounds on the compression ratios attainabl ..."
Cited by 1501 (7 self)
Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks
- In EuroSys
, 2007
"... Dryad is a general-purpose distributed execution engine for coarse-grain data-parallel applications. A Dryad application combines computational “vertices” with communication “channels” to form a dataflow graph. Dryad runs the application by executing the vertices of this graph on a set of availa ..."
Cited by 730 (27 self)
-gle computers, through small clusters of computers, to data centers with thousands of computers. The Dryad execution engine handles all the difficult problems of creating a large distributed, concurrent application: scheduling the use of computers and their CPUs, recovering from communication or computer
Mining Sequential Patterns
, 1995
"... We are given a large database of customer transactions, where each transaction consists of customer-id, transaction time, and the items bought in the transaction. We introduce the problem of mining sequential patterns over such databases. We present three algorithms to solve this problem, and empiri ..."
Cited by 1534 (7 self)
, and empirically evaluate their performance using synthetic data. Two of the proposed algorithms, AprioriSome and AprioriAll, have comparable performance, albeit AprioriSome performs a little better when the minimum number of customers that must support a sequential pattern is low. Scale-up experiments show
On Spectral Clustering: Analysis and an algorithm
- ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS
, 2001
"... Despite many empirical successes of spectral clustering methods -- algorithms that cluster points using eigenvectors of matrices derived from the distances between the points -- there are several unresolved issues. First, there is a wide variety of algorithms that use the eigenvectors in slightly ..."
Cited by 1697 (13 self)
CURE: An Efficient Clustering Algorithm for Large Data sets
- Published in the Proceedings of the ACM SIGMOD Conference
, 1998
"... Clustering, in data mining, is useful for discovering groups and identifying interesting distributions in the underlying data. Traditional clustering algorithms either favor clusters with spherical shapes and similar sizes, or are very fragile in the presence of outliers. We propose a new clustering ..."
Cited by 713 (5 self)
Scalable Recognition with a Vocabulary Tree
- IN CVPR
, 2006
"... A recognition scheme that scales efficiently to a large number of objects is presented. The efficiency and quality is exhibited in a live demonstration that recognizes CD-covers from a database of 40000 images of popular music CD's. The scheme ..."
Cited by 1043 (0 self)
Automatic Subspace Clustering of High Dimensional Data
- Data Mining and Knowledge Discovery
, 2005
"... Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the or ..."
Cited by 724 (12 self)