Results 1 - 4 of 4
An evaluation of techniques for clustering search results
, 1996
Cited by 35 (3 self)
The ability to effectively organize retrieval results becomes more important as the focus of Information Retrieval (IR) shifts towards interactive search processes. Automatic classification techniques are capable of providing the necessary information organization by arranging the retrieved data into groups of documents with common subjects. In this paper, we compare classification methods from IR and Machine Learning (ML) for clustering search results. Issues such as document representation, classification algorithms, and cluster representation are discussed. We introduce several evaluation techniques and use them in preliminary experiments. These experiments indicate that the proposed techniques have promise, but it is clear that user experiments are required to carry out a more thorough evaluation.
Fast Supervised Dimensionality Reduction Algorithm with Applications to Document Categorization & Retrieval
- In Proceedings of CIKM-00
, 2000
Cited by 15 (1 self)
Retrieval techniques based on dimensionality reduction, such as Latent Semantic Indexing (LSI), have been shown to improve the quality of the information being retrieved by capturing the latent meaning of the words present in the documents. Unfortunately, the high computational and memory requirements of LSI and its inability to compute an effective dimensionality reduction in a supervised setting limit its applicability. In this paper we present a fast supervised dimensionality reduction algorithm that is derived from the recently developed cluster-based unsupervised dimensionality reduction algorithms. We experimentally evaluate the quality of the lower dimensional spaces both in the context of document categorization and improvements in retrieval performance on a variety of different document collections. Our experiments show that the lower dimensional spaces computed by our algorithm consistently improve the performance of traditional algorithms such as C4.5, k-nearest-neighbor, ...
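As a rough illustration of what a supervised, cluster-based reduction could look like, the sketch below represents each document by its cosine similarity to one centroid per class. This is an assumption made for illustration only: the abstract does not spell out the algorithm, and the function names (`normalize`, `class_centroids`, `project`) and the toy term-count vectors are hypothetical.

```python
import math

def normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def class_centroids(docs, labels):
    """Average the unit-length document vectors of each class, then renormalize."""
    sums, counts = {}, {}
    for vec, label in zip(docs, labels):
        unit = normalize(vec)
        if label not in sums:
            sums[label] = [0.0] * len(unit)
            counts[label] = 0
        sums[label] = [s + u for s, u in zip(sums[label], unit)]
        counts[label] += 1
    return {
        label: normalize([s / counts[label] for s in total])
        for label, total in sums.items()
    }

def project(vec, centroids):
    """Map a document to one coordinate per class: its cosine similarity
    to that class centroid (classes taken in sorted label order)."""
    unit = normalize(vec)
    return [
        sum(u * c for u, c in zip(unit, centroids[label]))
        for label in sorted(centroids)
    ]

# Toy term-count vectors over a four-word vocabulary (made-up data).
docs = [[2, 0, 0, 1], [1, 0, 0, 0], [0, 3, 1, 0], [0, 1, 1, 0]]
labels = ["sports", "sports", "politics", "politics"]
centroids = class_centroids(docs, labels)
reduced = project([1, 0, 0, 0], centroids)  # order: politics, sports
```

The reduced space here has as many dimensions as there are classes, so a downstream classifier such as k-nearest-neighbor would operate on two coordinates instead of four.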
An Evaluation of Techniques for Clustering Search Results
, 1996
The ability to effectively organize retrieval results becomes more important as the focus of Information Retrieval (IR) shifts towards interactive search processes. Automatic classification techniques are capable of providing the necessary information organization by arranging the retrieved data into groups of documents with common subjects. In this paper, we compare classification methods from IR and Machine Learning (ML) for clustering search results. Issues such as document representation, classification algorithms, and cluster representation are discussed. We introduce several evaluation techniques and use them in preliminary experiments. These experiments indicate that the proposed techniques have promise, but it is clear that user experiments are required to carry out a more thorough evaluation. This material is based on work supported in part by the National Science Foundation, Library of Congress and Department of Commerce under cooperative agreement number EEC-9209623. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect those of the sponsor. This material is based on work supported in part by NRaD Contract Number N66001-94-D-6054.
Binary Methods in Data Mining
, 2008
Data Mining is the process of extracting useful and nontrivial information from databases. Databases tend to be very big. As a consequence, fast and scalable data mining techniques are becoming increasingly important. This dissertation introduces four new data mining techniques which use a binary representation of data and take advantage of bit vectors to allow for fast computation and low memory requirements. Finding frequent item sets is a very important problem that can be solved in exponential time. In general, the algorithms that find all the frequent item sets are not practical. Chapter 2 introduces a fast, approximate algorithm for finding the frequent item sets in quadratic time. In the same chapter, it is shown that the algorithm works very well in practice. Chapter 3 introduces a new clustering algorithm that is designed for text documents. In a text document database, we use a dictionary for finding synonyms and therefore considerably reduce the number of unique words. The main algorithm creates overlapping clusters with cluster descriptions. An extensive set of experiments on
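The bit-vector representation mentioned in this abstract can be illustrated with a small sketch. It is not the dissertation's approximate quadratic-time algorithm, whose details are not given here; it only shows, under that caveat, how storing one bit per transaction lets the support of an item set be counted with bitwise AND. The helper names and the toy transactions are hypothetical.

```python
from itertools import combinations

def build_bit_vectors(transactions):
    """Map each item to an int whose bit t is set when transaction t contains it."""
    vectors = {}
    for t, items in enumerate(transactions):
        for item in items:
            vectors[item] = vectors.get(item, 0) | (1 << t)
    return vectors

def support(itemset, vectors):
    """Count transactions containing every item of a non-empty item set:
    AND the items' bit vectors together, then count the set bits."""
    items = list(itemset)
    combined = vectors.get(items[0], 0)
    for item in items[1:]:
        combined &= vectors.get(item, 0)
    return bin(combined).count("1")

def frequent_pairs(transactions, min_support):
    """Exact frequent pairs: test every pair of individually frequent items."""
    vectors = build_bit_vectors(transactions)
    frequent_items = sorted(
        item for item in vectors if support([item], vectors) >= min_support
    )
    result = {}
    for pair in combinations(frequent_items, 2):
        count = support(pair, vectors)
        if count >= min_support:
            result[pair] = count
    return result

# Toy market-basket data (made up for illustration).
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
vectors = build_bit_vectors(transactions)
pairs = frequent_pairs(transactions, min_support=3)
```

Each support query costs one AND per item plus a bit count, and the memory needed is one bit per item per transaction, which is the kind of saving the abstract attributes to binary representations.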

