Results 1 - 5 of 5
Projections for Efficient Document Clustering
, 1997
Cited by 86 (0 self)
Clustering is increasing in importance, but linear- and even constant-time clustering algorithms are often too slow for real-time applications. A simple way to speed up clustering is to speed up the distance calculations at the heart of clustering routines. We study two techniques for reducing the cost of distance calculations, LSI and truncation, and determine both how much these techniques speed up clustering and how much they affect the quality of the resulting clusters. We find that the speed increase is significant while, surprisingly, the quality of clustering is not adversely affected. We conclude that truncation yields clusters as good as those produced by full-profile clustering while offering a significant speed advantage.
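The idea of truncation can be illustrated with a minimal sketch. It assumes, since the abstract does not spell this out, that truncation means keeping only the k highest-weight terms of a sparse term-weight profile before computing cosine similarity. The function names, the toy weights, and the choice of k are all hypothetical and are not taken from the paper.

```python
import heapq
import math


def truncate(profile, k):
    """Keep the k terms with the largest absolute weight (assumed meaning of truncation)."""
    return dict(heapq.nlargest(k, profile.items(), key=lambda kv: abs(kv[1])))


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter profile
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy data (hypothetical): a cluster centroid and a document profile.
centroid = {"cluster": 0.9, "distance": 0.7, "speed": 0.4, "the": 0.05, "of": 0.03}
doc = {"cluster": 0.8, "speed": 0.5, "of": 0.1}

short_centroid = truncate(centroid, 3)
full_sim = cosine(doc, centroid)
fast_sim = cosine(doc, short_centroid)
```

In this toy case the truncated centroid drops only low-weight terms, so the similarity barely changes while the dot product touches fewer entries. Whether that holds on real collections is what the paper evaluates.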
Fuzzy Clustering: Related Work
, 2001
"... Two of the most fundamental characteristics of the English 1 ..."
List of Tables
, 2006
Duplicate bug reports, reports which describe problems or enhancements for which there is already a report in a bug repository, consume the time of bug triagers and software developers that might better be spent working on reports that describe unique requests. For many open source projects, the number of duplicate reports represents a significant percentage of the repository, numbering in the thousands of reports for many projects. In this thesis, we introduce an approach to suggest potential duplicate bug reports to a bug triager who is processing a new report. We tested our approach on four popular open source projects, achieving the best precision and recall rates of 29% and 50% respectively on reports from the popular Firefox open source project. We report on a user study in which we investigated whether our approach can help novice bug triagers process reports from the Firefox repository. Despite the relatively low precision and recall rates of our approach, we found that its use does increase the duplicate detection accuracy of novice bug triagers, while significantly reducing the
THESAURUS AND QUERY EXPANSION
The explosive growth of the World Wide Web is making it difficult for users to locate information relevant to their interests. Although existing search engines work well to a certain extent, they still face two problems. The first is word mismatch, which arises because the majority of information retrieval systems compare query and document terms at the lexical level rather than the semantic level. The second is short queries: the average length of a user query is less than two words. Short queries and the incompatibility between the terms in user queries and documents strongly affect the retrieval of relevant documents. Query expansion has long been suggested as a technique to increase the effectiveness of information retrieval. Query expansion is the process of adding terms or phrases to the original query to improve retrieval performance. The central problem of query expansion is the selection of the expansion terms with which the user's original query is expanded. A thesaurus helps to solve this problem. Thesauri have frequently been incorporated into information retrieval systems to identify synonymous expressions and linguistic entities that are semantically similar, and they have been widely used in many applications, including information retrieval and natural language processing.
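Thesaurus-based query expansion, as described above, might be sketched as follows. This is only an illustration under stated assumptions: the thesaurus is modeled as a plain dictionary from a term to a ranked list of related terms, and `expand_query`, `max_per_term`, and the toy entries are hypothetical names and data, not part of the abstract.

```python
def expand_query(query, thesaurus, max_per_term=2):
    """Append up to max_per_term related terms for each original query term."""
    terms = query.lower().split()
    expanded = list(terms)   # original terms come first, in order
    seen = set(terms)        # avoid adding a term twice
    for term in terms:
        added = 0
        for related in thesaurus.get(term, []):
            if added >= max_per_term:
                break
            if related not in seen:
                expanded.append(related)
                seen.add(related)
                added += 1
    return expanded


# Toy thesaurus (hypothetical entries, ordered from most to least related).
toy_thesaurus = {
    "car": ["automobile", "vehicle", "auto"],
    "cheap": ["inexpensive", "affordable"],
}

expanded = expand_query("cheap car", toy_thesaurus)
```

Capping the number of terms added per query word is one simple way to approach the term-selection problem the abstract identifies. A real system would more likely rank candidate terms by some relevance score than take them in list order.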

