Results 1 - 10
of
7,110
A comparison of document clustering techniques
- In KDD Workshop on Text Mining
, 2000
"... This paper presents the results of an experimental study of some common document clustering techniques: agglomerative hierarchical clustering and K-means. (We used both a “standard” K-means algorithm and a “bisecting ” K-means algorithm.) Our results indicate that the bisecting K-means technique is ..."
Abstract
-
Cited by 613 (27 self)
- Add to MetaCart
This paper presents the results of an experimental study of some common document clustering techniques: agglomerative hierarchical clustering and K-means. (We used both a “standard” K-means algorithm and a “bisecting ” K-means algorithm.) Our results indicate that the bisecting K-means technique
A Sequential Algorithm for Training Text Classifiers
, 1994
"... The ability to cheaply train text classifiers is critical to their use in information retrieval, content analysis, natural language processing, and other tasks involving data which is partly or fully textual. An algorithm for sequential sampling during machine learning of statistical classifiers was ..."
Abstract
-
Cited by 631 (10 self)
- Add to MetaCart
The ability to cheaply train text classifiers is critical to their use in information retrieval, content analysis, natural language processing, and other tasks involving data which is partly or fully textual. An algorithm for sequential sampling during machine learning of statistical classifiers
A density-based algorithm for discovering clusters in large spatial databases with noise
, 1996
"... Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases rises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clu ..."
Abstract
-
Cited by 1786 (70 self)
- Add to MetaCart
Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases rises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery
BIRCH: an efficient data clustering method for very large databases
- In Proc. of the ACM SIGMOD Intl. Conference on Management of Data (SIGMOD
, 1996
"... Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely st,udied problems in this area is the identification of clusters, or deusel y populated regions, in a multi-dir nensional clataset. Prior work does not adequately address the problem of ..."
Abstract
-
Cited by 576 (2 self)
- Add to MetaCart
is also the first clustering algorithm proposerl in the database area to handle “noise) ’ (data points that are not part of the underlying pattern) effectively. We evaluate BIRCH’S time/space efficiency, data input order sensitivity, and clustering quality through several experiments. We also present a
Parallel Networks that Learn to Pronounce English Text
- COMPLEX SYSTEMS
, 1987
"... This paper describes NETtalk, a class of massively-parallel network systems that learn to convert English text to speech. The memory representations for pronunciations are learned by practice and are shared among many processing units. The performance of NETtalk has some similarities with observed h ..."
Abstract
-
Cited by 549 (5 self)
- Add to MetaCart
task, but differ completely at the levels of synaptic strengths and single-unit responses. However, hierarchical clustering techniques applied to NETtalk reveal that these different networks have similar internal representations of letter-to-sound correspondences within groups of processing units
An extensive empirical study of feature selection metrics for text classification
- J. of Machine Learning Research
, 2003
"... Machine learning for text classification is the cornerstone of document categorization, news filtering, document routing, and personalization. In text domains, effective feature selection is essential to make the learning task efficient and more accurate. This paper presents an empirical comparison ..."
Abstract
-
Cited by 496 (15 self)
- Add to MetaCart
Machine learning for text classification is the cornerstone of document categorization, news filtering, document routing, and personalization. In text domains, effective feature selection is essential to make the learning task efficient and more accurate. This paper presents an empirical comparison
The geometry of graphs and some of its algorithmic applications
- COMBINATORICA
, 1995
"... In this paper we explore some implications of viewing graphs as geometric objects. This approach offers a new perspective on a number of graph-theoretic and algorithmic problems. There are several ways to model graphs geometrically and our main concern here is with geometric representations that res ..."
Abstract
-
Cited by 524 (19 self)
- Add to MetaCart
, small balanced separators can be found efficiently. Faithful low-dimensional representations of statistical data allow for meaningful and efficient clustering, which is one of the most basic tasks in pattern-recognition. For the (mostly heuristic) methods used
Web Document Clustering: A Feasibility Demonstration
, 1998
"... Abstract Users of Web search engines are often forced to sift through the long ordered list of document “snippets” returned by the engines. The IR community has explored document clustering as an alternative method of organizing retrieval results, but clustering has yet to be deployed on the major s ..."
Abstract
-
Cited by 435 (3 self)
- Add to MetaCart
that clusters based on snippets are almost as good as clusters created using the full text of Web documents. To satisfy the stringent requirements of the Web domain, we introduce an incremental, linear time (in the document collection size) algorithm called Suffix Tree Clustering (STC). which creates clusters
Concept Decompositions for Large Sparse Text Data using Clustering
- Machine Learning
, 2000
"... . Unlabeled document collections are becoming increasingly common and available; mining such data sets represents a major contemporary challenge. Using words as features, text documents are often represented as high-dimensional and sparse vectors--a few thousand dimensions and a sparsity of 95 to 99 ..."
Abstract
-
Cited by 407 (27 self)
- Add to MetaCart
empirically demonstrate that, owing to the high-dimensionality and sparsity of the text data, the clusters produced by the algorithm have a certain "fractal-like" and "self-similar" behavior. As our second contribution, we introduce concept decompositions to approximate the matrix
Mean shift: A robust approach toward feature space analysis
- In PAMI
, 2002
"... A general nonparametric technique is proposed for the analysis of a complex multimodal feature space and to delineate arbitrarily shaped clusters in it. The basic computational module of the technique is an old pattern recognition procedure, the mean shift. We prove for discrete data the convergence ..."
Abstract
-
Cited by 2395 (37 self)
- Add to MetaCart
-estimators of location is also established. Algorithms for two low-level vision tasks, discontinuity preserving smoothing and image segmentation are described as applications. In these algorithms the only user set parameter is the resolution of the analysis, and either gray level or color images are accepted as input
Results 1 - 10
of
7,110