Results 1 - 5 of 5
Selection of relevant features and examples in machine learning
- ARTIFICIAL INTELLIGENCE, 1997
Cited by 340 (1 self)
In this survey, we review work in machine learning on methods for handling data sets containing large amounts of irrelevant information. We focus on two key issues: the problem of selecting relevant features, and the problem of selecting relevant examples. We describe the advances that have been made on these topics in both empirical and theoretical work in machine learning, and we present a general framework that we use to compare different methods. We close with some challenges for future work in this area.
Smokey: Automatic Recognition of Hostile Messages
- In Proc. IAAI, 1997
Cited by 48 (0 self)
Abusive messages (flames) can be both a source of frustration and a waste of time for Internet users. This paper describes some approaches to flame recognition, including a prototype system, Smokey. Smokey builds a 47-element feature vector based on the syntax and semantics of each sentence, combining the vectors for the sentences within each message. A training set of 720 messages was used by Quinlan's C4.5 decision-tree generator to determine feature-based rules that were able to correctly categorize 64% of the flames and 98% of the non-flames in a separate test set of 460 messages. Additional techniques for greater accuracy and user customization are also discussed.
Introduction
Flames are one of the current hazards of on-line communication. While some people enjoy exchanging flames, most users consider these abusive and insulting messages to be a nuisance or even upsetting. I describe Smokey, a prototype system to automatically recognize email flames. Smokey combines natural-langu...
EigenTransfer: A Unified Framework for Transfer Learning
Cited by 7 (2 self)
This paper proposes a general framework, called EigenTransfer, to tackle a variety of transfer learning problems, such as cross-domain learning and self-taught learning. Our basic idea is to construct a graph to represent the target transfer learning task. By learning the spectra of a graph which represents a learning task, we obtain a set of eigenvectors that reflect the intrinsic structure of the task graph. These eigenvectors can be used as the new features which transfer the knowledge from auxiliary data to help classify target data. Given an arbitrary non-transfer learner (e.g. SVM) and a particular transfer learning task, EigenTransfer can produce a transfer learner accordingly for the target transfer learning task. We apply EigenTransfer to three different transfer learning tasks, cross-domain learning, cross-category learning and self-taught learning, to demonstrate its unifying ability, and show through experiments that EigenTransfer can greatly outperform several representative non-transfer learners.
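The graph-spectrum idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the task graph or its spectrum is defined, so the document-word bipartite graph, the symmetric normalized Laplacian, and the function names `bipartite_adjacency` and `spectral_features` are all assumptions made for illustration.

```python
import numpy as np

def bipartite_adjacency(doc_term):
    """Build the adjacency matrix of an undirected document-word graph.

    Assumed layout: nodes 0..n_docs-1 are documents (target and auxiliary
    alike); the remaining nodes are words.
    """
    X = np.asarray(doc_term, dtype=float)
    n_docs, n_words = X.shape
    A = np.zeros((n_docs + n_words, n_docs + n_words))
    A[:n_docs, n_docs:] = X      # document-to-word edges
    A[n_docs:, :n_docs] = X.T    # mirrored so the graph is undirected
    return A

def spectral_features(adjacency, k):
    """Return the k smallest eigenvalues and their eigenvectors.

    Uses the symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2}
    (an assumption; the abstract does not name a specific operator).
    Row i of the returned matrix is the k-dimensional feature of node i.
    """
    A = np.asarray(adjacency, dtype=float)
    degree = A.sum(axis=1)
    inv_sqrt = np.zeros_like(degree)
    nonzero = degree > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(degree[nonzero])
    laplacian = np.eye(len(A)) - inv_sqrt[:, None] * A * inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues ascending
    return eigvals[:k], eigvecs[:, :k]
```

Under these assumptions, the rows belonging to document nodes would serve as the new feature vectors handed to an ordinary learner such as an SVM.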
Rules and fuzzy rules in text: Concept, extraction, and usage
- 2003
Cited by 3 (3 self)
Several concepts and techniques have been imported from other disciplines such as Machine Learning and Artificial Intelligence to the field of textual data. In this paper, we focus on the concept of a rule and the management of uncertainty in text applications. The different structures considered for the construction of the rules, the extraction of the knowledge base and the applications and usage of these rules are detailed. We include a review of the most relevant work on the different types of rules, based on their representation and their application to most of the common tasks of Information Retrieval such as categorization, indexing and classification.
Mining context specific similarity relationships using the World Wide Web
- In Proceedings of the 2005 Conference on Human Language Technologies, 2005
Cited by 2 (1 self)
We have studied how a context-specific web corpus can be automatically created and mined to discover semantic similarity relationships between terms (words or phrases) from a given collection of documents (the target collection). These relationships between terms can be used to adjust the standard vector space representation so as to improve the accuracy of similarity computation between text documents in the target collection. Our experiments with a standard test collection (Reuters) have revealed a reduction of similarity errors by up to 50%, twice the improvement obtained with other known techniques.
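One way to picture how term-to-term similarities could adjust a vector space comparison is the sketch below. The abstract does not state the adjustment formula, so this uses a soft-cosine form as a stand-in assumption; the function name `soft_cosine`, the three-word vocabulary, and the 0.8 similarity value are hypothetical.

```python
import numpy as np

def soft_cosine(d1, d2, term_sim):
    """Cosine similarity in which related terms partially match.

    d1, d2   : term-weight vectors over the same vocabulary
    term_sim : symmetric term-by-term similarity matrix with ones on the
               diagonal (assumed positive semidefinite)
    With term_sim equal to the identity this is the standard cosine.
    """
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    S = np.asarray(term_sim, dtype=float)
    numerator = d1 @ S @ d2
    denominator = np.sqrt((d1 @ S @ d1) * (d2 @ S @ d2))
    return numerator / denominator if denominator > 0 else 0.0

# Hypothetical vocabulary: ["car", "automobile", "banana"]
similarity = np.array([[1.0, 0.8, 0.0],
                       [0.8, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])
doc_a = [1, 0, 0]  # mentions only "car"
doc_b = [0, 1, 0]  # mentions only "automobile"

plain = soft_cosine(doc_a, doc_b, np.eye(3))      # no shared terms
adjusted = soft_cosine(doc_a, doc_b, similarity)  # related terms count
```

In this toy case the two documents share no terms, so the unadjusted score carries no signal, while the adjusted score reflects the assumed relatedness of "car" and "automobile".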

