Fast and Intuitive Clustering of Web Documents
- In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, 1997
Cited by 87 (2 self)
Conventional document retrieval systems (e.g., Alta Vista) return long lists of ranked documents in response to user queries. Recently, document clustering has been put forth as an alternative method of organizing retrieval results (Cutting et al. 1992). A person browsing the clusters can discover patterns that could be overlooked in the traditional presentation. This paper describes two novel clustering methods that intersect the documents in a cluster to determine the set of words (or phrases) shared by all the documents in the cluster. We report on experiments that evaluate these intersection-based clustering methods on collections of snippets returned from Web search engines. First, we show that word-intersection clustering produces superior clusters and does so faster than standard techniques. Second, we show that our O(n log n) time phrase-intersection clustering method produces comparable clusters and does so more than two orders of magnitude faster than all methods tested. I...
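The core operation named in this abstract, intersecting the documents in a cluster to find the words they all share, can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names `tokenize` and `shared_words`, the naive whitespace tokenization, and the sample snippets are all assumptions made for illustration.

```python
def tokenize(snippet):
    """Lowercase a snippet and split it on whitespace into a set of words.

    Assumption: a real system would also strip punctuation and stop words.
    """
    return set(snippet.lower().split())


def shared_words(cluster):
    """Return the set of words that occur in every snippet of the cluster."""
    word_sets = [tokenize(snippet) for snippet in cluster]
    if not word_sets:
        return set()
    return set.intersection(*word_sets)


# Hypothetical snippets standing in for search-engine results.
cluster = [
    "fast clustering of web documents",
    "clustering web search results",
    "web document clustering methods",
]
print(sorted(shared_words(cluster)))
```

The size of the shared set could serve as a rough cohesion signal for a cluster; how the paper actually scores or merges clusters is not stated in the abstract.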
The effectiveness of query-specific hierarchic clustering in information retrieval
- Information Processing and Management, 2002
Cited by 29 (2 self)
Hierarchic document clustering has been widely applied to Information Retrieval (IR) on the grounds of its potential improved effectiveness over inverted file search. However, previous research has been inconclusive as to whether clustering does bring improvements. In this paper we take the view that if hierarchic clustering is applied to search results (query-specific clustering), then it has the potential to increase the retrieval effectiveness compared both to that of static clustering and of conventional inverted file search. We conducted a number of experiments using five document collections and four hierarchic clustering methods. Our results show that the effectiveness of query-specific clustering is indeed higher, and suggest that there is scope for its application to IR.
Multiple Search Engines in Database Merging
- Set Size on Retrieval Experiment Error, ACM SIGIR 2002 Proceedings, 1997
Cited by 17 (0 self)
A database merging technique is a strategy for combining the results of multiple independent searches into a single cohesive response. While a variety of techniques have been developed to address a range of problem characteristics, our work focuses on environments in which search engines work in isolation. This paper shows that the behavior of two previously developed isolated techniques is indeed independent of the particular search engines that participate in the search. Two very different search engines, SMART and TOPIC, were each used to retrieve documents from five subcollections. The relative effectiveness of the merged result compared to the effectiveness of a corresponding single collection run is comparable for both engines. The effectiveness of the merged result is improved when both search engines search the same five subcollections but participate in a single merging. The improvement is such that this 10-collection merge is sometimes more effective than the single coll...
Clustering information retrieval search outputs
- In Proceedings of the 21st BCS IRSG Colloquium on Information Retrieval, 1999
Cited by 3 (1 self)
Users are known to experience difficulties in dealing with information retrieval search outputs, especially if those outputs are above a certain size. It has been argued by several researchers that search output clustering can help users in their interaction with IR systems in some retrieval situations, providing them with an overview of their results by exploiting the topicality information that resides in the output but has not been used at the retrieval stage. This overview might enable them to find relevant documents more easily by focusing on the most promising clusters, or to use the clusters as a starting-point for query refinement or expansion. In this paper, the results of experiments carried out to assess the viability of clustering as a search output presentation method are reported and discussed.

