Results 1 - 10 of 64
Reexamining the Cluster Hypothesis: Scatter/Gather on Retrieval Results
, 1996
Cited by 331 (5 self)
We present Scatter/Gather, a cluster-based document browsing method, as an alternative to ranked titles for the organization and viewing of retrieval results. We systematically evaluate Scatter/Gather in this context and find significant improvements over similarity search ranking alone. This result provides evidence validating the cluster hypothesis which states that relevant documents tend to be more similar to each other than to non-relevant documents. We describe a system employing Scatter/Gather and demonstrate that users are able to use this system close to its full potential. 1 Introduction An important service offered by an information access system is the organization of retrieval results. Conventional systems rank results based on an automatic assessment of relevance to the query [20]. Alternatives include graphical displays of interdocument similarity (e.g., [1, 22, 7]), relationship to fixed attributes (e.g., [21, 14]), and query term distribution patterns (e.g., [12]). I...
Grouper: A Dynamic Clustering Interface to Web Search Results
, 1999
Cited by 196 (2 self)
Users of Web search engines are often forced to sift through the long ordered list of document "snippets" returned by the engines. The IR community has explored document clustering as an alternative method of organizing retrieval results, but clustering has yet to be deployed on most major search engines. The NorthernLight search engine organizes its output into "custom folders" based on pre-computed document labels, but does not reveal how the folders are generated or how well they correspond to users' interests. In this paper, we introduce Grouper -- an interface to the results of the HuskySearch meta-search engine, which dynamically groups the search results into clusters labeled by phrases extracted from the snippets. In addition, we report on the first empirical comparison of user Web search behavior on a standard ranked-list presentation versus a clustered presentation. By analyzing HuskySearch logs, we are able to demonstrate substantial differences in the number of documents f...
Cat-a-Cone: An Interactive Interface for Specifying Searches and Viewing Retrieval Results using a Large Category Hierarchy
, 1997
Cited by 92 (3 self)
This paper introduces a novel user interface that integrates search and browsing of very large category hierarchies with their associated text collections. A key component is the separate but simultaneous display of the representations of the categories and the retrieved documents. Another key component is the display of multiple selected categories simultaneously, complete with their hierarchical context. The prototype implementation uses animation and a three-dimensional graphical workspace to accommodate the category hierarchy and to store intermediate search results. Query specification in this 3D environment is accomplished via a novel method for painting Boolean queries over a combination of category labels and free text. Examples are shown on a collection of medical text.
Information retrieval on the Web
- ACM Computing Surveys
, 2000
Cited by 58 (0 self)
In this paper we review studies of the growth of the Internet and technologies that are useful for information search and retrieval on the Web. We present data on the Internet from several different sources, e.g., current as well as projected number of users, hosts, and Web sites. Although numerical figures vary, overall trends cited
Aspect Windows, 3-D Visualizations, and Indirect Comparisons of Information Retrieval Systems
, 1998
Cited by 46 (4 self)
We built two Information Retrieval systems that were targeted for the TREC-6 "aspect oriented" retrieval track. The systems were built to test the usefulness of different visualizations in an interactive IR setting---in particular, an "aspect window" for the chosen task, and a 3-D visualization of document inter-relationships. We studied 24 users of the system in order to investigate: whether the systems were more effective than a control system, whether experienced users outperformed novices, whether spatial reasoning ability was a good predictor of effective use of 3-D, and whether the systems could be compared indirectly via a control system. Our results show substantial differences in user performance are related to spatial reasoning ability and to a lesser degree other traits. We also obtained markedly different results from the direct and indirect comparisons. 1 Introduction We are interested in building and evaluating high quality information retrieval and organization tools....
The ecological approach to text visualization
- Journal of the American Society for Information Science
, 1999
Cited by 40 (0 self)
“Words and rocks contain a language that follows a syntax of splits and ruptures. Look at any word long enough and you will see it open up into...a terrain of particles, each containing its own void...” Robert Smithson (1996) This article presents both theoretical and technical bases on which to build a “science of text visualization.” These conceptually produce “the ecological approach,” which is rooted in ecological and evolutionary psychology. The basic idea is that humans are genetically selected from their species history to perceptually interpret certain informational aspects of natural environments. If information from text documents is visually spatialized in a manner conformal with these predilections, its meaningful interpretation to the user of a text visualization system becomes relatively intuitive and accurate. The SPIRE text visualization system, which images information from free text documents as natural terrains, serves as an example of the “ecological approach” in its visual metaphor, its text analysis, and its spatializing procedures. This article both formalizes Smithson’s evocative prose and responds to Steven Eick’s recent challenge (Eick, 1997) to proceed to a real “science of information visualization.” It describes the theoretical rationale and technical basis of
The SOMLib Digital Library System
- In Proc. Europ. Conf. on Research and Advanced Technology for Digital Libraries (ECDL99)
, 1999
Cited by 35 (16 self)
Digital Libraries have gained tremendous interest with several research projects addressing the wealth of challenges in this field. While computational intelligence systems are being used for specific tasks in this arena, the majority of projects relies on conventional techniques for the basic structure of the library itself. With the SOMLib project we created a digital library system that uses a neural network-based core for the representation of the library. The self-organizing map, a popular unsupervised neural network model, is used to topically structure a document collection similar to the organization of real-world libraries. Based on this core, additional modules provide information retrieval features, integrate distributed libraries, and automatically label the various topical sections in the document collection. A metaphor graphics based interface further assists the user in intuitively understanding the library providing an instant overview. Keywords: Self-Organizing Map ...
Lighthouse: Showing the Way to Relevant Information
, 2000
Cited by 30 (3 self)
Lighthouse is an on-line interface for a Web-based information retrieval system. It accepts queries from a user, collects the retrieved documents from the search engine, organizes and presents them to the user. The system integrates two known presentations of the retrieved results -- the ranked list and clustering visualization -- in a novel and effective way. It accepts the user's input and adjusts the document visualization accordingly. We give a brief overview of the system. H.3.3 Information Search and Retrieval -- Relevance feedback. H.3.5 Online Information Services -- Web-based services; H.5.2 User Interfaces -- Graphical user interfaces, Screen design; 1. Introduction Locating interesting information on the World Wide Web is the main task of on-line search engines. Such an engine accepts a query from a user and responds with a list of documents or web pages that are considered to be relevant to the query. The pages are ranked by their likelihood of being relevant to the user...
Interactive cluster visualization for information retrieval
- In Proceedings of ECDL'98
, 1997
Cited by 28 (8 self)
In this paper we investigate a general purpose interactive information organization system. The system organizes documents by placing them into 1-, 2-, or 3-dimensional space based on their similarity and a spring-embedding algorithm. We begin by developing a method for estimating the quality of the organization when it is applied to a set of documents returned in response to a query. We show how the relevant documents tend to clump together in space. We proceed by presenting a method for measuring the amount of structure in the organization and explain how this knowledge can be used to refine the system. We also show that increasing the dimensionality of the organization generally improves its quality, albeit only a small amount. We introduce two methods for modifying the organization based on information obtained from the user and show how such feedback improves the organization. All the analysis is done offline without direct user intervention.
Human performance on clustering web pages: a preliminary study
- In Proceedings of The Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98)
, 1998
Cited by 26 (1 self)
With the increase in information on the World Wide Web it has become difficult to quickly find desired information without using multiple queries or using a topic-specific search engine. One way to help in the search is by grouping HTML pages together that appear in some way to be related. In order to better understand this task, we performed an initial study of human clustering of web pages, in the hope that it would provide some insight into the difficulty of automating this task. Our results show that subjects did not cluster identically; in fact, on average, any two subjects had little similarity in their web-page clusters. We also found that subjects generally created rather small clusters, and those with access only to URLs created fewer clusters than those with access to the full text of each web page. Generally the overlap of documents between clusters for any given subject increased when given the full text, as did the percentage of documents clustered. When analyzing individual subjects, we found that each had different behavior across queries, both in terms of overlap, size of clusters, and number of clusters. These results provide a sobering note on any quest for a single clearly correct clustering method for web pages.

