Web Document Clustering: A Feasibility Demonstration
, 1998
Cited by 279 (3 self)
Users of Web search engines are often forced to sift through the long ordered list of document “snippets” returned by the engines. The IR community has explored document clustering as an alternative method of organizing retrieval results, but clustering has yet to be deployed on the major search engines. The paper articulates the unique requirements of Web document clustering and reports on the first evaluation of clustering methods in this domain. A key requirement is that the methods create their clusters based on the short snippets returned by Web search engines. Surprisingly, we find that clusters based on snippets are almost as good as clusters created using the full text of Web documents. To satisfy the stringent requirements of the Web domain, we introduce an incremental, linear time (in the document collection size) algorithm called Suffix Tree Clustering (STC), which creates clusters based on phrases shared between documents. We show that STC is faster than standard clustering methods in this domain, and argue that Web document clustering via STC is both feasible and potentially beneficial.
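The idea of clustering snippets by the phrases they share can be illustrated with a minimal sketch. This is not the STC algorithm itself: it indexes word n-grams in a plain dictionary instead of building a suffix tree, so it is neither incremental nor linear time. The function name `base_clusters`, its parameters, and the sample snippets are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def base_clusters(snippets, max_len=3, min_docs=2):
    """Map each word phrase (up to max_len words) to the set of snippets
    containing it, keeping only phrases shared by at least min_docs snippets.
    Assumption: a dictionary of n-grams stands in for the suffix tree."""
    phrase_docs = defaultdict(set)
    for doc_id, text in enumerate(snippets):
        words = text.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                phrase_docs[" ".join(words[i:i + n])].add(doc_id)
    return {p: docs for p, docs in phrase_docs.items() if len(docs) >= min_docs}

# Hypothetical snippets, not taken from the paper.
snippets = [
    "cat ate cheese",
    "mouse ate cheese too",
    "cat ate mouse too",
]
clusters = base_clusters(snippets)
for phrase in sorted(clusters):
    print(phrase, sorted(clusters[phrase]))
```

Each surviving phrase doubles as a readable label for its group of snippets, which is the property the abstract points to when it speaks of clusters based on shared phrases.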
Grouper: A Dynamic Clustering Interface to Web Search Results
, 1999
Cited by 196 (2 self)
Users of Web search engines are often forced to sift through the long ordered list of document "snippets" returned by the engines. The IR community has explored document clustering as an alternative method of organizing retrieval results, but clustering has yet to be deployed on most major search engines. The NorthernLight search engine organizes its output into "custom folders" based on pre-computed document labels, but does not reveal how the folders are generated or how well they correspond to users' interests. In this paper, we introduce Grouper -- an interface to the results of the HuskySearch meta-search engine, which dynamically groups the search results into clusters labeled by phrases extracted from the snippets. In addition, we report on the first empirical comparison of user Web search behavior on a standard ranked-list presentation versus a clustered presentation. By analyzing HuskySearch logs, we are able to demonstrate substantial differences in the number of documents f...
Interactive cluster visualization for information retrieval
- In Proceedings of ECDL'98
, 1997
Cited by 28 (8 self)
In this paper we investigate a general purpose interactive information organization system. The system organizes documents by placing them into 1-, 2-, or 3-dimensional space based on their similarity and a spring-embedding algorithm. We begin by developing a method for estimating the quality of the organization when it is applied to a set of documents returned in response to a query. We show how the relevant documents tend to clump together in space. We proceed by presenting a method for measuring the amount of structure in the organization and explain how this knowledge can be used to refine the system. We also show that increasing the dimensionality of the organization generally improves its quality, albeit only a small amount. We introduce two methods for modifying the organization based on information obtained from the user and show how such feedback improves the organization. All the analysis is done offline without direct user intervention.
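Placing documents in space with a spring embedding can be sketched as follows. This is a generic force-directed layout under stated assumptions, not the system described in the abstract: the target distance between two documents is assumed to be 1 minus their similarity, and the names `spring_embed` and `gap`, the step count, the update rate, and the similarity values are all hypothetical.

```python
import math
import random

def spring_embed(dist, dim=2, steps=500, rate=0.05, seed=0):
    """Place n items in dim-dimensional space so that pairwise Euclidean
    distances approximate the target distances in dist (an n x n matrix)."""
    rng = random.Random(seed)
    n = len(dist)
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    for _ in range(steps):
        for i in range(n):
            move = [0.0] * dim
            for j in range(n):
                if i == j:
                    continue
                delta = [pos[i][k] - pos[j][k] for k in range(dim)]
                length = math.sqrt(sum(d * d for d in delta)) or 1e-9
                # Spring force: pull i toward j when the pair is farther apart
                # than its target distance, push it away when closer.
                stretch = (length - dist[i][j]) / length
                for k in range(dim):
                    move[k] -= stretch * delta[k]
            for k in range(dim):
                pos[i][k] += rate * move[k]
    return pos

def gap(pos, i, j):
    """Euclidean distance between two placed items."""
    return math.dist(pos[i], pos[j])

# Hypothetical similarities: documents 0 and 1 are alike, document 2 is not.
similarity = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
dist = [[1.0 - s for s in row] for row in similarity]  # assumed conversion
layout = spring_embed(dist)
print(gap(layout, 0, 1), gap(layout, 0, 2))
```

Passing `dim=1` or `dim=3` gives the other dimensionalities mentioned in the abstract. Similar documents end up close together, which is what lets relevant documents clump in the layout.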
Improving Interactive Retrieval by Combining Ranked Lists and Clustering
- In Proceedings of RIAO’2000
, 2000
Cited by 18 (5 self)
We study the problem of organizing the documents returned by an information retrieval system in response to a natural language query. We consider two well-known approaches -- the ranked list and clustering of the results -- and we show how they can be integrated. This new procedure is designed to accept user feedback and direct the user toward the relevant material as effectively as the traditional relevance feedback approach. We show how our technique can be explained to the user by visualizing the process in two or three dimensions, providing him or her with complete control of the procedure. We show that increasing the dimensionality of the visualization generally improves its quality, albeit only a small amount. Additionally we present the result of a small user study designed to investigate how effective our visualization is in supporting the user navigating the retrieved results.
Visualization of search results in document retrieval systems, General Examination Report
- University of Washington, SIGTRS Bulletin
, 1998
Cited by 13 (1 self)
Traditional information retrieval systems present search results as a ranked list of documents, ordered by their estimated relevance to the query. Visualization of search results is emerging as a powerful tool for presenting more information to the user in a way that is both intuitive and easy to interpret. This paper describes the various visualization techniques, and presents a novel classification of these methods. Next it discusses several of the important issues concerning these techniques: how they are evaluated, how they scale to large document sets, can they be combined, and will we see them in practice on the Web anytime soon.
Info navigator: A visualization tool for document searching and browsing
- In Proc. of the Intl. Conf. on Distributed Multimedia Systems (DMS
, 2003
Cited by 13 (4 self)
We present a text document search engine with several new visualization front-ends that aid navigation through the set of documents returned by a query (short “returned documents”).
Feature Reduction for Document Clustering and Classification
, 2000
Cited by 12 (2 self)
Often users receive search results which contain a wide range of documents, only some of which are relevant to their information needs. To address this problem, ever more systems not only locate information for users, but also organise that information on their behalf. We look at two main automatic approaches to information organisation: interactive clustering of search results and pre-categorising documents to provide hierarchical browsing structures. To be feasible in real world applications, both of these approaches require accurate yet efficient algorithms. Yet, both suffer from the curse of dimensionality -- documents are typically represented by hundreds or thousands of words (features) which must be analysed and processed during clustering or classification. In this paper, we discuss feature reduction techniques and their application to document clustering and classification, showing that feature reduction improves efficiency as well as accuracy. We validate these algorithms using human relevance assignments and categorisation.
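The abstract does not say which feature reduction techniques the paper uses. As a rough illustration of the general idea only, the sketch below applies document-frequency thresholding, one common technique, which is an assumption here and not necessarily the authors' method. The names `select_features` and `vectorize`, the thresholds, and the sample documents are hypothetical.

```python
from collections import Counter

def select_features(docs, min_df=2, max_df_ratio=0.9):
    """Keep words that occur in at least min_df documents but in no more
    than max_df_ratio of all documents (document-frequency thresholding)."""
    df = Counter()
    for text in docs:
        df.update(set(text.lower().split()))  # count each word once per document
    limit = max_df_ratio * len(docs)
    return sorted(w for w, c in df.items() if min_df <= c <= limit)

def vectorize(text, vocab):
    """Represent a document as term counts over the reduced vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

# Hypothetical documents, not taken from the paper.
docs = [
    "the jaguar is a big cat",
    "the jaguar car is fast",
    "the python is a snake",
    "the python language is popular",
]
vocab = select_features(docs)
print(vocab)
print(vectorize(docs[0], vocab))
```

Here twelve distinct words shrink to three features, so every later similarity computation works on much shorter vectors.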
Using Clustering and Classification Approaches in Interactive Retrieval
, 2001
Cited by 12 (4 self)
Satisfying non-trivial information needs involves collecting information from multiple resources, and synthesizing an answer that organizes that information. Traditional recall/precision-oriented information retrieval focuses on just one phase of that process: how to efficiently and effectively identify documents likely to be relevant to a specific, focused query. The TREC Interactive Track has as its goal the location of documents that pertain to different instances of a query topic, with no reward for duplicated coverage of topic instances. This task is similar to the task of organizing answer components into a complete answer. Clustering and classification are two mechanisms for organizing documents into groups. In this paper, we present an ongoing series of experiments that test the feasibility and effectiveness of using clustering and classification as an aid to instance retrieval and, ultimately, answer construction. Our results show that users prefer such structured p...
Visual Interactions with a Multidimensional Ranked List
- In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (these proceedings
, 1998

