Reexamining the Cluster Hypothesis: Scatter/Gather on Retrieval Results
, 1996
Cited by 331 (5 self)
We present Scatter/Gather, a cluster-based document browsing method, as an alternative to ranked titles for the organization and viewing of retrieval results. We systematically evaluate Scatter/Gather in this context and find significant improvements over similarity search ranking alone. This result provides evidence validating the cluster hypothesis, which states that relevant documents tend to be more similar to each other than to non-relevant documents. We describe a system employing Scatter/Gather and demonstrate that users are able to use this system close to its full potential.
1 Introduction
An important service offered by an information access system is the organization of retrieval results. Conventional systems rank results based on an automatic assessment of relevance to the query [20]. Alternatives include graphical displays of interdocument similarity (e.g., [1, 22, 7]), relationship to fixed attributes (e.g., [21, 14]), and query term distribution patterns (e.g., [12]). I...
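The cluster hypothesis quoted in this abstract can be illustrated with a small numeric check. The sketch below is not the paper's evaluation method. It only compares the mean cosine similarity among relevant documents with the mean similarity between relevant and non-relevant ones. The toy term-weight vectors and the function names `cosine` and `mean_similarities` are assumptions made up for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_similarities(relevant, nonrelevant):
    """Return (mean relevant-relevant, mean relevant-nonrelevant) similarity.

    Requires at least two relevant and one non-relevant document.
    """
    within = [cosine(a, b) for a, b in combinations(relevant, 2)]
    across = [cosine(a, b) for a in relevant for b in nonrelevant]
    return sum(within) / len(within), sum(across) / len(across)


# Hypothetical toy documents (term -> weight), invented for this sketch.
relevant = [
    {"cluster": 1.0, "browse": 1.0},
    {"cluster": 1.0, "browse": 1.0, "rank": 1.0},
    {"cluster": 1.0, "rank": 1.0},
]
nonrelevant = [
    {"recipe": 1.0, "rank": 1.0},
    {"recipe": 1.0, "bake": 1.0},
]

within, across = mean_similarities(relevant, nonrelevant)
# Under the cluster hypothesis we would expect within > across.
print(f"relevant-relevant: {within:.3f}, relevant-nonrelevant: {across:.3f}")
```

On real data the vectors would come from whatever term weighting the retrieval system uses, and the comparison would be made per query.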
Projections for Efficient Document Clustering
, 1997
Cited by 86 (0 self)
Clustering is increasing in importance, but linear- and even constant-time clustering algorithms are often too slow for real-time applications. A simple way to speed up clustering is to speed up the distance calculations at the heart of clustering routines. We study two techniques for improving the cost of distance calculations, LSI and truncation, and determine both how much these techniques speed up clustering and how much they affect the quality of the resulting clusters. We find that the speed increase is significant while, surprisingly, the quality of clustering is not adversely affected. We conclude that truncation yields clusters as good as those produced by full-profile clustering while offering a significant speed advantage.
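A rough sketch of the truncation idea follows, under the assumption that truncation means keeping only the highest-weighted terms of a profile before computing distances. The abstract does not spell out the exact procedure, so the cut-off `k`, the toy weights, and the helper names `truncate` and `cosine` are illustrative assumptions rather than the authors' implementation.

```python
import math


def truncate(profile, k):
    """Keep only the k highest-weighted terms of a sparse term-weight dict."""
    top = sorted(profile.items(), key=lambda item: item[1], reverse=True)[:k]
    return dict(top)


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    # Iterating over u only: cost grows with the number of terms kept in u.
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical cluster centroid and document, invented for this sketch.
centroid = {"cluster": 0.9, "document": 0.8, "retrieval": 0.7,
            "the": 0.1, "of": 0.05}
doc = {"cluster": 1.0, "retrieval": 1.0, "of": 1.0}

short_centroid = truncate(centroid, 3)
full_sim = cosine(centroid, doc)
short_sim = cosine(short_centroid, doc)
print(f"full profile: {full_sim:.3f}, truncated profile: {short_sim:.3f}")
```

In this toy case the truncated profile drops only low-weight terms, so the similarity changes little while fewer terms are touched per comparison. Whether that holds on a real collection is the empirical question the paper studies.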
On-Line New Event Detection, Clustering, And Tracking
, 1999
Cited by 16 (0 self)
In this work, we discuss and evaluate solutions to text classification problems associated with the events that are reported in on-line sources of news. We present solutions to three related classification problems: new event detection, event clustering, and event tracking. The primary focus of this thesis is new event detection, where the goal is to identify news stories that have not previously been reported, in a stream of broadcast news comprising radio, television, and newswire. We present an algorithm for new event detection, and analyze the effects of incorporating domain properties into the classification algorithm. We explore a solution that models the temporal relationship between news stories, and investigate the use of proper noun phrase
The use of categories and clusters for organizing retrieval results
- Natural Language Information Retrieval
, 1999
Cited by 14 (1 self)
An important problem for information access systems is that of organizing large sets of documents that have been retrieved in response to a query. Text categorization and text clustering are two natural language processing tasks whose results can be applied to document organization. This chapter describes user interfaces that use categories and clusters to organize retrieval results, and examines the relationship between the two.
A Study of Search Result Clustering Interfaces: Comparing Textual and Zoomable User Interfaces
- University of Maryland HCIL
, 2003
Cited by 9 (0 self)
There have been several studies that compare sequential search results versus clustered search results, and graphical presentations versus textual presentations. These studies have confirmed the efficiency of, and preference for, clustering over sequential lists. The results of studies comparing graphical and textual presentations have usually been shown to be task-dependent. This study presents a systematic evaluation of zoomable versus textual clustered search results. A controlled experiment with repeated measures design and within-subjects differences was performed with fifteen subjects, comparing Groxis, Inc.’s Grokker – their clustering product – a zoomable user interface, their textual clustering product and Vivisimo’s textual clustering product. No significant differences were found for objective measures. However, there were significant differences for subjective measures. The textual clustering interfaces were preferred and elicited major satisfaction among the users. Results are summarized in both a quantitative and qualitative format. Author Keywords: Information retrieval, textual clusters, graphical clusters,
From Information Retrieval to Hypertext and Back Again: The Role of Interaction in the Information Exploration Interface
, 1997
Cited by 7 (1 self)
This work explores the design space of user interfaces for large-scale full-text database retrieval systems. Research suggests that elements of hypertext interfaces may be merged with traditional information retrieval (IR) algorithms to produce flexible hybrid interfaces for user-directed information exploration. This work examines the effectiveness of multiple-view newspaper-like interfaces, and describes a prototype that uses newspaper-style layouts to organize information retrieval results. Finally, it explores some possible visualization techniques designed to aid browsing performance. The first of two experiments in this thesis examines the effectiveness of the simultaneous display of several documents retrieved by a given query. Experimental results suggest that viewed recall increases with increasing numbers of articles displayed on the screen simultaneously. Subjects' decision-making strategies appear to be independent of user interface factors. The second experiment tests differences in behavior between query-based and link-based browsing. Differences in performance are found between groups of users employing different strategies, but not between interface conditions. These results suggest that dynamic query-mediated hypertext interfaces are viable alternatives to more explicit queries, and that subjects' intrinsic strategies have significant impact on their interaction with the system and on their performance. This work proposes an implementation of dynamic links in the WWW medium. It concludes with a discussion about the nature of hypertext interfaces and about the role of the user interface in information exploration tasks, and suggests some avenues for future research in this area.
Filtering speaker-specific words from electronic discussions
- In: Proceedings of the 20th International Conference on Computational Linguistics
, 2004
Cited by 1 (1 self)
The work presented in this paper is the first step in a project which aims to cluster and summarise electronic discussions in the context of help-desk applications. The eventual objective of this project is to use these summaries to assist help-desk users and operators. In this paper, we identify features of electronic discussions that influence the clustering process, and offer a filtering mechanism that removes undesirable influences. We tested the clustering and filtering processes on electronic newsgroup discussions, and evaluated their performance by means of two experiments: coarse-level clustering and simple information retrieval. Our evaluation shows that our filtering mechanism has a significant positive effect on both tasks.
Search and Retrieval Experiments in Real-Time Information Retrieval
Future operating document retrieval systems may be based on fully automatic information analysis methods instead of manual indexing, and on real-time search procedures which allow the user to interact with the system during the search process. Performance characteristics are first given for fully automatic information retrieval systems, and comparisons are made with presently operating partly manual systems. Thereafter, various user-controlled search strategies are described, and the potential of these strategies in improving systems performance is discussed. The evaluation results for the real-time retrieval procedures are used to derive design criteria for future automatic information systems.
Interactive Search Strategies and Dynamic File Organization in Information Retrieval
A great deal of effort has been devoted in recent years to the evaluation of automatic or semi-automatic information retrieval systems. Recent evaluation results indicate that the search effectiveness presently achieved, or likely to be achievable in the foreseeable future, is much smaller than expected by a majority of the potential user population. Furthermore, theoretical advances in language analysis and data organization promise only relatively modest future improvements. The most significant advances in retrieval effectiveness are likely to be obtained by adaptive interaction techniques that extract information from the user during the search process to improve the organization of the data space, thereby providing more effective search and retrieval operations. The various user feedback techniques described either modify the user queries in such a way as to bring these queries closer to existing groups of relevant

