Feature Reduction for Document Clustering and Classification (2000)

by S M Rüger, S E Gauch

Results 1 - 6 of 6

Info navigator: A visualization tool for document searching and browsing

by Matthew Carey, Daniel C Heesch, Stefan M Rüger - In Proc. of the Intl. Conf. on Distributed Multimedia Systems (DMS), 2003
Cited by 13 (4 self)
We present a text document search engine with several new visualization front-ends that aid navigation through the set of documents returned by a query (“returned documents” for short).

A Visualization Interface for Document Searching and Browsing

by Matthew Carey, Frank Kriwaczek, Stefan M. Rüger - In Proceedings of CIKM 2000 Workshop on New Paradigms in Information Visualization and Manipulation, 2000
Cited by 3 (0 self)
We present a text document search engine with several new visualization front-ends that aid navigation through the set of documents returned by a query (hit documents). Our methods are based on identifying and selecting keywords on the fly. The choice of keywords depends not only on the frequency of their occurrence within hit documents but also on the specificity of their occurrence just within these hit documents. Keywords are subsequently used to obtain a sparse document representation and to compute document clusters using a variant of the buckshot algorithm. One of the visualization front-ends uses the sparse document representation to obtain keyword clusters. We make use of the clustering to group the documents returned from the search visually, and to label the groups with their most salient keywords. The different front-ends cater for different user needs. Two of them can be employed to browse cluster information as well as to drill down or up in clusters and refine the search...
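
The abstract above says only that clusters are computed with "a variant of the buckshot algorithm" and does not describe the variant. As a rough illustration of the plain buckshot idea (cluster a random sample of about sqrt(k·n) documents agglomeratively, then use the resulting centroids as seeds for a k-means-style refinement), here is a minimal Python sketch. All function names, the cosine similarity, the centroid-based merging rule, and the toy vectors are assumptions made for illustration; they are not taken from the paper.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def agglomerate(vectors, k):
    """Merge the two clusters with the most similar centroids until k remain."""
    clusters = [[v] for v in vectors]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))
    return [centroid(c) for c in clusters]


def buckshot(vectors, k, iterations=5, seed=0):
    """Seed k centres from a sqrt(k*n) sample, then refine over all documents."""
    rng = random.Random(seed)
    sample_size = min(len(vectors), max(k, math.ceil(math.sqrt(k * len(vectors)))))
    centers = agglomerate(rng.sample(vectors, sample_size), k)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every document to its most similar centre.
        labels = [
            max(range(len(centers)), key=lambda c: cosine(v, centers[c]))
            for v in vectors
        ]
        # Recompute each centre from its current members.
        for c in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels


# Hypothetical keyword-count vectors: two obvious topical groups.
docs = [
    [1, 1, 0, 0], [2, 1, 0, 0], [1, 2, 0, 0],
    [0, 0, 1, 1], [0, 0, 2, 1], [0, 0, 1, 2],
]
labels = buckshot(docs, k=2)
```

The expensive quadratic agglomeration runs only on the sample, which is the point of the buckshot seeding step; the refinement pass is linear in the number of documents per iteration.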

Dynamic Clustering using Support Vector Learning with Particle Swarm Optimization

Advisor: Dr. Jiann-horng Lin, 2005
Cited by 2 (0 self)
Institute of Information Management, I-Shou University. This thesis presents a new approach to support vector learning for dynamic clustering based on particle swarm optimization. Support vector clustering requires solving a constrained quadratic optimization problem. This problem often involves a matrix with an extremely large number of entries, which makes off-the-shelf optimization packages unsuitable. Several methods have been used to decompose the problem, many of which require numeric packages for solving the smaller sub-problems. Support vector clustering solves the unsupervised clustering problem by searching for a minimal sphere enclosing all data images in feature space. Data points are mapped
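
For orientation, the "minimal sphere" problem mentioned in this abstract is usually written as follows in the support vector clustering literature. The notation here (feature map $\Phi$, centre $a$, radius $R$, slack variables $\xi_j$, penalty $C$, multipliers $\beta_j$, kernel $K$) is the common textbook form and is an assumption; the thesis may use different symbols or a modified formulation.

```latex
% Primal: smallest sphere enclosing the mapped points, with soft margin
\min_{R,\,a,\,\xi}\; R^2 + C \sum_{j} \xi_j
\quad \text{subject to} \quad
\lVert \Phi(x_j) - a \rVert^2 \le R^2 + \xi_j, \qquad \xi_j \ge 0 .

% Wolfe dual: the constrained quadratic program referred to in the abstract
\max_{\beta}\; \sum_{j} \beta_j K(x_j, x_j) - \sum_{i,j} \beta_i \beta_j K(x_i, x_j)
\quad \text{subject to} \quad
0 \le \beta_j \le C, \qquad \sum_{j} \beta_j = 1 .
```

The kernel matrix $K(x_i, x_j)$ has one entry per pair of data points, which is why the quadratic program grows large quickly.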

News item extraction for text mining in web newspapers

by Kjetil Nørvåg, Randi Øyri - In Proceedings of International Workshop on Challenges in Web Information Retrieval and Integration (in conjunction with ICDE’2005), 2005
Cited by 2 (0 self)
Web newspapers provide a valuable resource for information. In order to benefit more from the available information, text mining techniques can be applied. However, because each newspaper page often covers a lot of unrelated topics, page-based data mining will not always give useful results. In order to improve on complete-page mining, we present an approach based on extracting the individual news items from the web pages and mining these separately. Automatic news item extraction is a difficult problem, and in this paper we also provide strategies for solving that task. We study the quality of the news item extraction, and also provide results from clustering the extracted news items.

Rich Document Representation for Document Clustering

by Azam Jalali, Farhad Oroumchian, 2004
Cited by 1 (1 self)
In traditional document clustering models, a document is considered as a bag of words. In this paper we present a new method for generating feature vectors, using the sentence fragments that are called logical terms and statements in the PLIR system. PLIR is a knowledge-based information system based on the theory of Plausible Reasoning. We have conducted a number of experiments using the OHSUMED document collection and the K-Means clustering method with seven different similarity measures between documents. The experiments seem to indicate that the use of richer features such as logical terms or statements for clustering tends to perform better than the simple bag-of-words approaches within our domain of experiments, which is the second phase of a two-stage retrieval system.

Performance Analysis of Standard k-Means Clustering Algorithm on Clustering TMG format Document Data

by P. Perumal, R. Nedunchezhian
Document clustering is useful in many information retrieval operations such as document browsing, organization and viewing of retrieval results, generation of Yahoo-like hierarchies of documents, etc. The general goal of clustering is to group data elements such that the intra-group similarities are high and the inter-group similarities are low. Generative models based on the multivariate Bernoulli and multinomial distributions have been widely used for text classification. In this work, we explore the k-means clustering algorithm for the document clustering problem. The proposed work implements the standard k-means clustering algorithm and tests it with TMG format document data and L2-normalized document data. The results of the k-means clustering algorithm are compared with the von Mises-Fisher model-based clustering (vMF-based k-means) algorithm.
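
To make the role of L2 normalization concrete, the following Python sketch normalizes document vectors to unit length and runs a standard k-means loop on them. It is an illustrative sketch only: the function names, the toy term-count vectors, and the choice of initial centres are assumptions, and it implements neither the TMG format handling nor the vMF-based method that the paper evaluates.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, initial_centers, iterations=10):
    """Standard k-means: assign to the nearest centre, then recompute the means."""
    centers = [list(c) for c in initial_centers]
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centers)), key=lambda c: squared_euclidean(v, centers[c]))
            for v in vectors
        ]
        for c in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical term-count vectors: documents 0/1 and 2/3 differ only in length.
docs = [[3, 1, 0], [6, 2, 0], [0, 1, 4], [0, 2, 8]]
unit = [l2_normalize(d) for d in docs]

# Seed with one document from each apparent group (an arbitrary choice).
labels, centers = kmeans(unit, [unit[0], unit[2]])

# On unit vectors, squared Euclidean distance equals 2 - 2 * cosine similarity.
gap = abs(squared_euclidean(unit[0], unit[2]) - (2 - 2 * dot(unit[0], unit[2])))
```

Because squared Euclidean distance on unit vectors is a monotone function of cosine similarity, normalizing first removes the effect of document length on the assignment step.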