Frequent Term-Based Text Clustering
, 2002
Cited by 57 (1 self)
Text clustering methods can be used to structure large sets of text or hypertext documents. The well-known methods of text clustering, however, do not really address the special problems of text clustering: very high dimensionality of the data, very large size of the databases and understandability of the cluster description. In this paper, we introduce a novel approach which uses frequent item (term) sets for text clustering. Such frequent sets can be efficiently discovered using algorithms for association rule mining. To cluster based on frequent term sets, we measure the mutual overlap of frequent sets with respect to the sets of supporting documents. We present two algorithms for frequent term-based text clustering, FTC which creates flat clusterings and HFTC for hierarchical clustering. An experimental evaluation on classical text documents as well as on web documents demonstrates that the proposed algorithms obtain clusterings of comparable quality significantly more efficiently than state-of-the-art text clustering algorithms. Furthermore, our methods provide an understandable description of the discovered clusters by their frequent term sets.
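As a rough illustration of the idea in this abstract, the sketch below finds frequent term sets together with the documents that support them, then counts how many candidate sets cover each document. It is not the FTC or HFTC algorithm: brute-force enumeration stands in for an association rule mining algorithm, the per-document count is only a simplified stand-in for the paper's overlap measure, and the names `frequent_term_sets`, `overlap_counts`, `min_support` and `max_size` are hypothetical.

```python
from itertools import combinations


def frequent_term_sets(docs, min_support, max_size=2):
    """Map each frequent term set to its cover (ids of supporting documents).

    docs is a non-empty list of sets of terms. Brute-force enumeration is
    used here for brevity; it is an assumption, not the paper's method.
    """
    n = len(docs)
    vocab = sorted(set().union(*docs))
    result = {}
    for size in range(1, max_size + 1):
        for terms in combinations(vocab, size):
            # A document supports a term set if it contains every term.
            cover = frozenset(i for i, d in enumerate(docs) if set(terms) <= d)
            if len(cover) / n >= min_support:
                result[frozenset(terms)] = cover
    return result


def overlap_counts(covers):
    """For each document id, count how many candidate covers contain it."""
    counts = {}
    for cover in covers.values():
        for doc_id in cover:
            counts[doc_id] = counts.get(doc_id, 0) + 1
    return counts


docs = [
    {"web", "mining", "text"},
    {"web", "mining", "link"},
    {"text", "cluster"},
    {"text", "cluster", "web"},
]
term_sets = frequent_term_sets(docs, min_support=0.5)
doc_overlap = overlap_counts(term_sets)
```

Each key of `term_sets` doubles as a readable cluster description, which is the understandability property the abstract emphasises; a document with a low value in `doc_overlap` is claimed by few competing candidates.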
Towards semantic web mining
- IN INTERNATIONAL SEMANTIC WEB CONFERENCE (ISWC
, 2002
Cited by 44 (9 self)
Semantic Web Mining aims at combining the two fast-developing research areas Semantic Web and Web Mining. The idea is to improve, on the one hand, the results of Web Mining by exploiting the new semantic structures in the Web; and to make use of Web Mining, on the other hand, for building up the Semantic Web. This paper gives an overview of where the two areas meet today, and sketches ways of how a closer integration could be profitable.
Mining Newsgroups Using Networks Arising From Social Behavior
, 2003
Cited by 40 (0 self)
Recent advances in information retrieval over hyperlinked corpora have convincingly demonstrated that links carry less noisy information than text. We investigate the feasibility of applying link-based methods in new application domains. The specific application we consider is to partition authors into opposite camps within a given topic in the context of newsgroups. A typical newsgroup posting consists of one or more quoted lines from another posting followed by the opinion of the author. This social behavior gives rise to a network in which the vertices are individuals and the links represent "responded-to" relationships. An interesting characteristic of many newsgroups is that people more frequently respond to a message when they disagree than when they agree. This behavior is in sharp contrast to the WWW link graph, where linkage is an indicator of agreement or common interest. By analyzing the graph structure of the responses, we are able to effectively classify people into opposite camps. In contrast, methods based on statistical analysis of text yield low accuracy on such datasets because the vocabulary used by the two sides tends to be largely identical, and many newsgroup postings consist of relatively few words of text.
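The abstract does not say which graph algorithm is used, so the following is only a minimal sketch of the underlying intuition: if a reply usually signals disagreement, the two ends of a "responded-to" link should tend to land in opposite camps. A greedy breadth-first two-coloring is swapped in here as an assumption; the function name `split_into_camps` and the sample authors are hypothetical.

```python
from collections import defaultdict, deque


def split_into_camps(replies):
    """Greedy two-coloring of an undirected "responded-to" graph.

    replies is a list of (author, responded_to) pairs. Each newly reached
    author is placed in the camp opposite to the neighbour that reached
    them. This heuristic is an assumption, not the paper's method.
    """
    graph = defaultdict(set)
    for a, b in replies:
        graph[a].add(b)
        graph[b].add(a)

    camp = {}
    for start in sorted(graph):
        if start in camp:
            continue
        camp[start] = 0
        queue = deque([start])
        while queue:
            person = queue.popleft()
            for other in sorted(graph[person]):
                if other not in camp:
                    camp[other] = 1 - camp[person]
                    queue.append(other)
    return camp


replies = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("ann", "dan")]
camps = split_into_camps(replies)
```

On real newsgroup data some replies express agreement, so no split separates every link; a greedy pass like this one ignores such conflicts and would need to be replaced by a method that maximises the number of links crossing between camps.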
Hyperlink Ensembles: A Case Study in Hypertext Classification
- Information Fusion
, 2001
Cited by 20 (4 self)
In this paper, we introduce hyperlink ensembles, a novel type of ensemble classifier for classifying hypertext documents. Instead of using the text on a page for deriving features that can be used for training a classifier, we suggest to use portions of texts from all pages that point to the target page. A hyperlink ensemble is formed by obtaining one prediction for each hyperlink that points to a page.
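A small sketch of how one prediction per incoming hyperlink might be combined into a single label for the target page. The abstract does not state the combination rule or the base classifier, so both are assumptions: majority voting is used for combination, and a toy keyword counter (`classify_link_text`, with a hypothetical `keywords` table) stands in for a trained classifier.

```python
from collections import Counter


def classify_link_text(text, keywords):
    """Toy base classifier: pick the class whose keywords occur most often.

    Stands in for a trained text classifier (an assumption). Ties are
    broken alphabetically by class label.
    """
    words = text.lower().split()
    scores = {label: sum(w in kws for w in words) for label, kws in keywords.items()}
    return max(sorted(scores), key=scores.get)


def hyperlink_ensemble(incoming_texts, keywords):
    """Predict once per incoming link, then take a majority vote (assumed rule)."""
    votes = Counter(classify_link_text(t, keywords) for t in incoming_texts)
    return votes.most_common(1)[0][0]


keywords = {
    "course": {"lecture", "syllabus", "homework"},
    "faculty": {"professor", "research", "publications"},
}
incoming_texts = [
    "Professor Smith research group",
    "syllabus and homework for the lecture",
    "publications of professor Smith",
]
predicted = hyperlink_ensemble(incoming_texts, keywords)
```

Note that the target page's own text is never consulted: every vote comes from text on a page that points to it, which is the distinguishing feature described in the abstract.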
Reverse engineering for web data: From visual to semantic structures
- In Intl. Conf. on Data Engineering (ICDE
, 2002
Cited by 18 (0 self)
Despite the advancement of XML, the majority of documents on the Web is still marked up with HTML for visual rendering purposes only, thus building a huge amount of "legacy" data. In order to facilitate querying Web based data in a way more efficient and effective than just keyword based retrieval, enriching such Web documents with both structure and semantics is necessary. This paper describes a novel approach to the integration of topic specific HTML documents into a repository of XML documents. In particular, we describe how topic specific HTML documents are transformed into XML documents. The proposed document transformation and semantic element tagging process utilizes document restructuring rules and minimum information about the topic in form of concepts. For the resulting XML documents, a majority schema is derived that describes common structures among the documents in the form of a DTD. We explore and discuss different techniques and rules for document conversion and majority schema discovery. We finally demonstrate the feasibility and effectiveness of our approach by applying it to a set of résumé HTML documents gathered by a Web crawler.
Machine Learning in Games: A Survey
- MACHINES THAT LEARN TO PLAY GAMES, CHAPTER 2
, 2000
Cited by 16 (3 self)
This paper provides a survey of previously published work on machine learning in game playing. The material is organized around a variety of problems that typically arise in game playing and that can be solved with machine learning methods. This approach, we believe, allows both researchers in game playing to find appropriate learning techniques for helping to solve their problems and machine learning researchers to identify rewarding topics for further research in game-playing domains. The paper covers learning techniques that range from neural networks to decision tree learning in games that range from poker to chess.
Web Page Classification: Features and Algorithms
, 2007
Cited by 16 (0 self)
Classification of web page content is essential to many tasks in web information retrieval such as maintaining web directories and focused crawling. The uncontrolled nature of web content presents additional challenges to web page classification as compared to traditional text classification, but the interconnected nature of hypertext also provides features that can assist the process. As we review work in web page classification, we note the importance of these web-specific features and algorithms, describe state-of-the-art practices, and track the underlying assumptions behind the use of information from neighboring pages.
A Dynamic Adaptive Self-Organising Hybrid Model for Text Clustering
- Proceedings of The Third IEEE International Conference on Data Mining (ICDM’03
, 2003
Cited by 12 (4 self)
Clustering by document concepts is a powerful way of retrieving information from a large number of documents. This task in general does not make any assumption on the data distribution. In this paper, for this task we propose a new competitive Self-Organising (SOM) model, namely the Dynamic Adaptive Self-Organising Hybrid model (DASH). The features of DASH are a dynamic structure, hierarchical clustering, non-stationary data learning and parameter self-adjustment. All features are data-oriented: DASH adjusts its behaviour not only by modifying its parameters but also by an adaptive structure. The hierarchical growing architecture is a useful facility for such a competitive neural model which is designed for text clustering. In this paper, we have presented a new type of self-organising dynamic growing neural network which can deal with the non-uniform data distribution and the non-stationary data sets and represent the inner data structure by a hierarchical view.
Usage Mining for and on the Semantic Web
- In [57
, 2004
Cited by 12 (7 self)
Semantic Web Mining aims at combining the two fast-developing research areas Semantic Web and Web Mining. Web Mining aims at discovering insights about the meaning of Web resources and their usage. Given the primarily syntactical nature of data Web mining operates on, the discovery of meaning is impossible based on these data only. Therefore, formalizations of the semantics of Web resources and navigation behavior are increasingly being used. This fits exactly with the aims of the Semantic Web: the Semantic Web enriches the WWW by machine-processable information which supports the user in his tasks. In this paper, we discuss the interplay of the Semantic Web with Web Mining, with a specific focus on usage mining.

