Concept Learning and the Problem of Small Disjuncts
- 1995
Cited by 136 (1 self)
Ideally, definitions induced from examples should consist of all, and only, disjuncts that are meaningful (e.g., as measured by a statistical significance test) and have a low error rate. Existing inductive systems create definitions that are ideal with regard to large disjuncts, but far from ideal with regard to small disjuncts, where a small (large) disjunct is one that correctly classifies few (many) training examples. The problem with small disjuncts is that many of them have high rates of misclassification, and it is difficult to eliminate the error-prone small disjuncts from a definition without adversely affecting other disjuncts in the definition. Various approaches to this problem are evaluated, including the novel approach of using a bias different than the "maximum generality" bias. This approach, and some others, prove partly successful, but the problem of small disjuncts remains open.
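The size and error rate of a disjunct, as defined in the abstract above, can be illustrated with a minimal sketch. It assumes a definition is a list of disjuncts, each a boolean predicate over an example's attributes, and it measures the error rate on the same toy data purely for brevity. The function `disjunct_stats`, the toy rules, and the data are hypothetical and not taken from the paper.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[Dict[str, int], bool]      # (attribute values, is_positive)
Disjunct = Callable[[Dict[str, int]], bool]


def disjunct_stats(disjuncts: List[Disjunct],
                   data: List[Example]) -> List[Tuple[int, float]]:
    """Return (size, error_rate) for each disjunct.

    size       = number of covered examples that are classified correctly
                 (covered positives)
    error_rate = covered negatives / all covered examples
    """
    stats = []
    for covers in disjuncts:
        labels = [label for attrs, label in data if covers(attrs)]
        correct = sum(labels)
        total = len(labels)
        error = (total - correct) / total if total else 0.0
        stats.append((correct, error))
    return stats


# Hypothetical toy data: one large disjunct and one small, error-prone one.
train = [
    ({"a": 1, "b": 0}, True),
    ({"a": 1, "b": 1}, True),
    ({"a": 1, "b": 0}, True),
    ({"a": 0, "b": 1}, True),
    ({"a": 0, "b": 1}, False),
]
rules = [
    lambda x: x["a"] == 1,                   # covers three positives
    lambda x: x["a"] == 0 and x["b"] == 1,   # covers one positive, one negative
]
stats = disjunct_stats(rules, train)
```

In this toy case the second disjunct is small (size 1) and misclassifies half of what it covers, which is the pattern the abstract describes.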
Information Extraction from HTML: Application of a General Machine Learning Approach
- In Proceedings of the Fifteenth National Conference on Artificial Intelligence, 1998
Cited by 134 (6 self)
Because the World Wide Web consists primarily of text, information extraction is central to any effort that would use the Web as a resource for knowledge discovery. We show how information extraction can be cast as a standard machine learning problem, and argue for the suitability of relational learning in solving it. The implementation of a general-purpose relational learner for information extraction, SRV, is described. In contrast with earlier learning systems for information extraction, SRV makes no assumptions about document structure and the kinds of information available for use in learning extraction patterns. Instead, structural and other information is supplied as input in the form of an extensible token-oriented feature set. We demonstrate the effectiveness of this approach by adapting SRV for use in learning extraction rules for a domain consisting of university course and research project pages sampled from the Web. Making SRV Web-ready only involves adding several simple...
Feature Selection for Classification
- Intelligent Data Analysis, 1997
Cited by 127 (7 self)
Feature selection has been the focus of interest for quite some time and much work has been done. With the creation of huge databases and the consequent requirements for good machine learning techniques, new problems arise and novel approaches to feature selection are in demand. This survey is a comprehensive overview of many existing methods from the 1970's to the present. It identifies four steps of a typical feature selection method, and categorizes the different existing methods in terms of generation procedures and evaluation functions, and reveals hitherto unattempted combinations of generation procedures and evaluation functions. Representative methods are chosen from each category for detailed explanation and discussion via example. Benchmark datasets with different characteristics are used for comparative study. The strengths and weaknesses of different methods are explained. Guidelines for applying feature selection methods are given based on data types and domain characteris...
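The pairing of a generation procedure with an evaluation function, as described in the abstract above, can be sketched as follows. This is only an illustration under stated assumptions: the generation procedure is greedy sequential forward generation, the evaluation function is a simple consistency measure, and the search stops when the score no longer improves. The names `consistency` and `forward_selection` and the toy data are hypothetical and not taken from the survey.

```python
from collections import Counter, defaultdict
from typing import Callable, List, Sequence, Tuple

Row = Tuple[Tuple[int, ...], int]   # (feature vector, class label)


def consistency(rows: List[Row], subset: Sequence[int]) -> float:
    """Evaluation function (assumed): fraction of rows that carry the
    majority class among rows agreeing on the selected features."""
    groups = defaultdict(Counter)
    for x, y in rows:
        groups[tuple(x[i] for i in subset)][y] += 1
    return sum(max(c.values()) for c in groups.values()) / len(rows)


def forward_selection(rows: List[Row], n_features: int,
                      evaluate: Callable[[List[Row], Sequence[int]], float]
                      = consistency) -> Tuple[List[int], float]:
    """Generation procedure (assumed): add one feature at a time, keeping
    the candidate that scores best under the evaluation function."""
    selected: List[int] = []
    best = evaluate(rows, selected)
    while True:
        scored = [(evaluate(rows, selected + [i]), i)
                  for i in range(n_features) if i not in selected]
        if not scored:
            break
        score, feat = max(scored, key=lambda t: t[0])  # first max wins ties
        if score <= best:
            break   # assumed stopping rule: no further improvement
        selected.append(feat)
        best = score
    return selected, best


# Hypothetical toy data: the class label equals feature 1; 0 and 2 are noise.
rows = [((0, 0, 1), 0), ((1, 0, 0), 0), ((0, 1, 0), 1), ((1, 1, 1), 1)]
selected, score = forward_selection(rows, n_features=3)
```

Because `evaluate` is a parameter, a different evaluation function can be combined with the same generation procedure, which is the kind of combination the survey categorizes.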
Separate-and-conquer rule learning
- Artificial Intelligence Review, 1999
Cited by 118 (29 self)
This paper is a survey of inductive rule learning algorithms that use a separate-and-conquer strategy. This strategy can be traced back to the AQ learning system and still enjoys popularity as can be seen from its frequent use in inductive logic programming systems. We will put this wide variety of algorithms into a single framework and analyze them along three different dimensions, namely their search, language and overfitting avoidance biases.
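The separate-and-conquer strategy named in the abstract above can be sketched as a covering loop: learn one rule, remove the examples it covers, and repeat on the rest. The sketch below assumes rules are conjunctions of attribute-value tests, uses greedy general-to-specific refinement scored by precision, and includes no overfitting avoidance. All names and the toy data are hypothetical and do not reproduce any particular system from the survey.

```python
from typing import Dict, List, Tuple

Example = Tuple[Dict[str, str], bool]   # (attribute values, is_positive)
Rule = Dict[str, str]                   # conjunction of attribute == value tests


def covers(rule: Rule, attrs: Dict[str, str]) -> bool:
    return all(attrs.get(k) == v for k, v in rule.items())


def learn_one_rule(examples: List[Example]) -> Rule:
    """Specialize the empty rule until it covers no negative examples."""
    rule: Rule = {}
    covered = list(examples)
    while any(not label for _, label in covered):
        best, best_score = None, -1.0
        # Candidate tests come from the positives the rule still covers.
        for attrs, label in covered:
            if not label:
                continue
            for k, v in attrs.items():
                if k in rule:
                    continue
                sub = [l for a, l in covered if a.get(k) == v]
                score = sum(sub) / len(sub)   # precision of the refined rule
                if score > best_score:
                    best, best_score = (k, v), score
        if best is None:
            break   # inconsistent data: no further specialization possible
        rule[best[0]] = best[1]
        covered = [(a, l) for a, l in covered if covers(rule, a)]
    return rule


def separate_and_conquer(examples: List[Example]) -> List[Rule]:
    rules: List[Rule] = []
    remaining = list(examples)
    while any(label for _, label in remaining):
        rule = learn_one_rule(remaining)                     # conquer
        rules.append(rule)
        remaining = [(a, l) for a, l in remaining
                     if not covers(rule, a)]                 # separate
    return rules


# Hypothetical toy data.
data = [
    ({"sky": "sunny", "wind": "weak"}, True),
    ({"sky": "sunny", "wind": "strong"}, True),
    ({"sky": "rainy", "wind": "weak"}, True),
    ({"sky": "rainy", "wind": "strong"}, False),
]
rules = separate_and_conquer(data)
```

The choices made inside `learn_one_rule` (search order, rule language, when to stop refining) correspond to the kinds of search, language, and overfitting avoidance biases along which the survey compares algorithms.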
JAM: Java Agents for Meta-Learning over Distributed Databases
- In Proc. 3rd Intl. Conf. Knowledge Discovery and Data Mining, 1997
Cited by 115 (23 self)
In this paper, we describe the JAM system, a distributed, scalable and portable agent-based data mining system that employs a general approach to scaling data mining applications that we have come to call meta-learning. JAM provides a set of learning programs, implemented either as JAVA applets or applications, that compute models over data stored locally at a site. JAM also provides a set of meta-learning agents for combining multiple models that were learned (perhaps) at different sites. It employs a special distribution mechanism which allows the migration of the derived models or classifier agents to other remote sites. We describe the overall architecture of the JAM system and the specific implementation currently under development at Columbia University. One of JAM's target applications is fraud and intrusion detection in financial information systems. A brief description of this learning task and JAM's applicability are also described. Interested users may download JAM from http...
Learning classification trees
- Statistics and Computing, 1992
Cited by 112 (8 self)
Algorithms for learning classification trees have had successes in artificial intelligence and statistics over many years. This paper outlines how a tree learning algorithm can be derived using Bayesian statistics. This introduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule is similar to Quinlan’s information gain, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, Quinlan’s C4 (1987) and Breiman et al.’s CART (1984) show the full Bayesian algorithm produces more accurate predictions than versions
Cached Sufficient Statistics for Efficient Machine Learning with Large Datasets
- Journal of Artificial Intelligence Research, 1997
Cited by 108 (17 self)
This paper introduces new algorithms and data structures for quick counting for machine learning datasets. We focus on the counting task of constructing contingency tables, but our approach is also applicable to counting the number of records in a dataset that match conjunctive queries. Subject to certain assumptions, the costs of these operations can be shown to be independent of the number of records in the dataset and loglinear in the number of non-zero entries in the contingency table. We provide a very sparse data structure, the ADtree, to minimize memory use. We provide analytical worst-case bounds for this structure for several models of data distribution. We empirically demonstrate that tractably-sized data structures can be produced for large real-world datasets by (a) using a sparse tree structure that never allocates memory for counts of zero, (b) never allocating memory for counts that can be deduced from other counts, and (c) not bothering to expand the tree fully near its...
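The two counting tasks mentioned in the abstract above can be made concrete with a minimal sketch. It is a naive baseline that scans every record, so it is not the ADtree and does not have the cost properties the paper claims. It only shares the idea of never storing counts of zero. The function names and the toy records are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

Record = Dict[str, str]


def contingency_table(records: List[Record], attrs: Sequence[str]) -> Counter:
    """Sparse contingency table over the given attributes: only value
    combinations that actually occur receive an entry."""
    return Counter(tuple(r[a] for a in attrs) for r in records)


def count_matching(records: List[Record], query: Dict[str, str]) -> int:
    """Number of records satisfying a conjunctive query
    (attr1 == v1 AND attr2 == v2 AND ...), by linear scan."""
    return sum(all(r[a] == v for a, v in query.items()) for r in records)


# Hypothetical toy records.
records = [
    {"color": "red", "shape": "round"},
    {"color": "red", "shape": "round"},
    {"color": "blue", "shape": "square"},
]
table = contingency_table(records, ["color", "shape"])
n_red = count_matching(records, {"color": "red"})
```

Looking up a combination that never occurs, such as `table[("red", "square")]`, returns 0 without allocating an entry, because `Counter` does not store missing keys on lookup.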
Hypothesis-driven Constructive Induction in AQ17: A Method and Experiments
- 1992
Cited by 107 (33 self)
This paper presents a method for constructive induction in which new problem-relevant attributes are generated by analyzing consecutively created inductive hypotheses. The method starts by creating a set of rules from given examples using the AQ algorithm. These rules are then evaluated according to a rule quality criterion. Subsets of the best-performing rules for each decision class are selected to form new attributes. These new attributes are used to reformulate the training examples used in the previous step, and the whole inductive process repeats. This iterative process ends when the performance accuracy of the rules exceeds a predefined threshold. In several experiments on learning different well-defined transformations, the method consistently outperformed (in terms of predictive accuracy) the AQ15 rule learning method, the GREEDY3 and GROVE decision list learning methods, and the REDWOOD and FRINGE decision tree learning methods.
Concept Learning and Heuristic Classification in Weak-Theory Domains
- Artificial Intelligence, 1990
Cited by 101 (7 self)
This paper describes a successful approach to concept learning for heuristic classification. Almost all current programs for this task create or use explicit, abstract generalizations. These programs are largely ineffective for domains with weak or intractable theories. An exemplar-based approach is suitable for domains with inadequate theories but raises two additional problems: determining similarity and indexing exemplars. Our approach extends the exemplar-based approach with solutions to these problems. An implementation of our approach, called Protos, has been applied to the domain of clinical audiology. After reasonable training, Protos achieved a competence level equaling that of human experts and far surpassing that of other machine learning programs. Additionally, an "ablation study" has identified the aspects of Protos that are primarily responsible for its success. 1 Introduction This paper describes a successful approach to the task of concept learning for heuristic clas...
Incremental Reduced Error Pruning
- 1994
Cited by 101 (22 self)
This paper outlines some problems that may occur with Reduced Error Pruning in Inductive Logic Programming, most notably efficiency. Thereafter a new method, Incremental Reduced Error Pruning, is proposed that attempts to address all of these problems. Experiments show that in many noisy domains this method is much more efficient than alternative algorithms, along with a slight gain in accuracy. However, the experiments show as well that the use of this algorithm cannot be recommended for domains with a very specific concept description. OEFAI-TR-94-09 1 Introduction Being able to deal with noisy data is a must for algorithms that are meant to learn concepts in real-world domains. Significant effort has gone into investigating the effect of noisy data on decision tree learning algorithms (see e.g. [Quinlan, 1993, Breiman et al., 1984]). Not surprisingly, noise handling methods have also entered the emerging field of Inductive Logic Programming (ILP) [Muggleton, 1992]. Linus [Lavr...

