A comparison of document clustering techniques
- In KDD Workshop on Text Mining
, 2000
Cited by 306 (18 self)
This paper presents the results of an experimental study of some common document clustering techniques: agglomerative hierarchical clustering and K-means. (We used both a “standard” K-means algorithm and a “bisecting” K-means algorithm.) Our results indicate that the bisecting K-means technique is better than the standard K-means approach and (somewhat surprisingly) as good as or better than the hierarchical approaches that we tested.
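The abstract names bisecting K-means without describing it. As a rough illustration only, the sketch below follows the common formulation: start with one cluster and repeatedly split a cluster in two with basic K-means (K=2) until the requested number of clusters is reached. Choosing the largest cluster to split, using squared Euclidean distance on toy points, and the names `two_means` and `bisecting_kmeans` are assumptions made for this example; the paper's own selection rule and document similarity measure may differ.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def two_means(points, rng, iters=20):
    """Split points into two groups with basic K-means (K=2)."""
    centers = rng.sample(points, 2)  # two initial centers drawn from the data
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            idx = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            groups[idx].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split (e.g. duplicate points)
        new_centers = [mean(groups[0]), mean(groups[1])]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return groups

def bisecting_kmeans(points, k, seed=0):
    """Assumed variant: always bisect the currently largest cluster."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        clusters.sort(key=len)
        largest = clusters.pop()
        if len(largest) < 2:
            clusters.append(largest)
            break  # nothing left to split
        left, right = two_means(largest, rng)
        if not left or not right:
            clusters.append(largest)
            break
        clusters.extend([left, right])
    return clusters
```

Calling `bisecting_kmeans(points, k=2)` on two well-separated groups of points returns one list per group. For real documents, the points would typically be term-weight vectors rather than 2-D tuples.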
Frequent Term-Based Text Clustering
, 2002
Cited by 57 (1 self)
Text clustering methods can be used to structure large sets of text or hypertext documents. The well-known methods of text clustering, however, do not really address the special problems of text clustering: very high dimensionality of the data, very large size of the databases and understandability of the cluster description. In this paper, we introduce a novel approach which uses frequent item (term) sets for text clustering. Such frequent sets can be efficiently discovered using algorithms for association rule mining. To cluster based on frequent term sets, we measure the mutual overlap of frequent sets with respect to the sets of supporting documents. We present two algorithms for frequent term-based text clustering: FTC, which creates flat clusterings, and HFTC, for hierarchical clustering. An experimental evaluation on classical text documents as well as on web documents demonstrates that the proposed algorithms obtain clusterings of comparable quality significantly more efficiently than state-of-the-art text clustering algorithms. Furthermore, our methods provide an understandable description of the discovered clusters by their frequent term sets.
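The idea of clustering by frequent term sets and their supporting documents can be illustrated with a small sketch. This is not the FTC or HFTC algorithm from the paper: frequent sets are enumerated by brute force up to size 2 instead of with an association rule mining algorithm, and the overlap score (how many other candidate sets also cover each supporting document, on average) is a simple stand-in chosen for this example. The function names and the toy data are hypothetical.

```python
from itertools import combinations

def frequent_term_sets(docs, min_support, max_size=2):
    """Map each frequent term set to its cover (ids of supporting documents)."""
    vocab = sorted(set().union(*docs.values()))
    covers = {}
    for size in range(1, max_size + 1):
        for terms in combinations(vocab, size):
            cover = {d for d, words in docs.items() if set(terms) <= words}
            if len(cover) >= min_support:
                covers[frozenset(terms)] = cover
    return covers

def overlap(candidate, covers):
    """Assumed overlap score: average number of other candidates per covered document."""
    cover = covers[candidate]
    shared = sum(len(cover & other) for ts, other in covers.items() if ts != candidate)
    return shared / len(cover)

def greedy_flat_clustering(docs, min_support):
    """Greedily pick the candidate with the lowest overlap, then remove its documents."""
    covers = frequent_term_sets(docs, min_support)
    clusters = {}
    while covers:
        # Ties are broken by larger cover, then alphabetically, to keep runs deterministic.
        best = min(covers, key=lambda ts: (overlap(ts, covers), -len(covers[ts]), sorted(ts)))
        chosen = covers.pop(best)
        clusters[best] = chosen
        # Documents already assigned no longer support the remaining candidates.
        covers = {ts: c - chosen for ts, c in covers.items() if c - chosen}
    return clusters

docs = {
    1: {"cluster", "text"},
    2: {"cluster", "text"},
    3: {"rule", "mining"},
    4: {"rule", "mining"},
}
clusters = greedy_flat_clustering(docs, min_support=2)
```

Each resulting cluster is keyed by the term set that describes it, which mirrors the abstract's point that frequent term sets double as an understandable cluster description.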
The Influence of Semantics in IR using LSI and K-Means Clustering Techniques
- In: Proc. of Workshop on Conceptual Information Retrieval and Clustering of Documents, ACM Int. Conf
, 2003
Cited by 2 (2 self)
In this paper we study the influence of semantics on information retrieval preprocessing. Specifically, we compare the performance achieved with stemming and with semantic lemmatization as preprocessing steps. Three techniques are used in the study: the direct use of a weighted matrix, the SVD technique in the LSI model, and the bisecting spherical k-means clustering technique. Although the results do not seem very promising, we believe that they can be improved in the future.
Mining conference proceedings for corporate technology knowledge management
- Portland International Conference on Management of Engineering and Technology (PICMET
, 2005
Cited by 1 (1 self)
An organization's knowledge gained through technical conference attendance is generally isolated to the individual(s) attending the event. The aggregate corporate knowledge is extremely limited, unless the organization institutes a process to document and transfer that knowledge to the organization. Even if such a process exists, the knowledge gains are limited to the experiences and communication skills of the individuals attending the conference. Many conference proceedings are now published and provided to attendees in electronic format, such as on CD-ROM and/or published on the internet, such as IEEE conference proceedings listed at
Tag Clusters as Information Retrieval Interfaces
- Proceedings of the 43rd Hawaii International Conference on System Sciences
, 2010
The paper presents our design of a next generation information retrieval system based on tag co-occurrences and subsequent clustering. We help users gain access to digital data through information visualization in the form of tag clusters. Current problems, such as the absence of interactivity and semantics between tags or the difficulty of adding search arguments, are solved. In the evaluation, based on SERVQUAL and IT systems quality indicators, we found that tag clusters are perceived as more useful than tag clouds, are much more trustworthy, and are more enjoyable to use.
ICTNET at Web Track 2010 Diversity Task
Our team, “ICTNET”, participated in the diversity task of the Web Track of TREC 2010. The full Category A dataset was used. The same settings as in the ad-hoc task were adopted for retrieval. Different clustering methods, which were then applied to different fields, are elaborated. Query expansion techniques are presented next.

