Results 1–10 of 13
Minimum Message Length and Kolmogorov Complexity
Computer Journal, 1999
Abstract

Cited by 104 (25 self)
… this paper is to describe some of the relationships among the different streams and to try to clarify some of the important differences in their assumptions and development. Other studies mentioning the relationships appear in [1, Section IV, pp. 1038–1039], [2, Sections 5.2, 5.5] and [3, p. 465]
Bottom-Up Induction of Oblivious Read-Once Decision Graphs
1994
Abstract

Cited by 45 (8 self)
We investigate the use of oblivious, read-once decision graphs as structures for representing concepts over discrete domains, and present a bottom-up, hill-climbing algorithm for inferring these structures from labelled instances. The algorithm is robust with respect to irrelevant attributes, and experimental results show that it performs well on problems considered difficult for symbolic induction methods, such as the Monk's problems and parity.

1 Introduction. Top-down induction of decision trees [25, 24, 20] has been one of the principal induction methods for symbolic, supervised learning. The tree structure, which is used for representing the hypothesized target concept, suffers from some well-known problems, most notably the replication problem and the fragmentation problem [23]. The replication problem forces duplication of subtrees in disjunctive concepts, such as (A ∧ B) ∨ (C ∧ D); the fragmentation problem causes partitioning of the data into fragments when a high-arity attrib...
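The replication problem mentioned in this abstract can be illustrated with a small hypothetical sketch (not from the paper): a decision tree for the disjunctive concept (A ∧ B) ∨ (C ∧ D) has to test C and D in two separate branches, duplicating that subtree.

```python
from itertools import product

def concept(a, b, c, d):
    """Target concept: (A and B) or (C and D)."""
    return (a and b) or (c and d)

def tree(a, b, c, d):
    """One decision tree for the same concept; the C/D subtree
    appears twice because a tree cannot share subtrees."""
    if a:
        if b:
            return True
        return c and d   # duplicated C/D subtree (copy 1)
    return c and d       # duplicated C/D subtree (copy 2)

# The tree computes the concept on all 16 boolean instances.
assert all(concept(*x) == tree(*x) for x in product([False, True], repeat=4))
```

A decision graph, by contrast, can route both branches to a single shared C/D subgraph, which is the motivation for the structures studied here.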
On Pruning and Averaging Decision Trees
In Proceedings of the Twelfth International Conference on Machine Learning, 1995
Abstract

Cited by 37 (0 self)
Pruning a decision tree is considered by some researchers to be the most important part of tree building in noisy domains. While there are many approaches to pruning, an alternative approach of averaging over decision trees has not received as much attention. We perform an empirical comparison of pruning with the approach of averaging over decision trees. For this comparison we use a computationally efficient method of averaging, namely averaging over the extended fanned set of a tree. Since there is a wide range of approaches to pruning, we compare tree averaging with a traditional pruning approach, along with an optimal pruning approach.
Decision Graphs: An Extension of Decision Trees
1993
Abstract

Cited by 35 (1 self)
In this paper, we examine Decision Graphs, a generalization of decision trees. We present an inference scheme to construct decision graphs using the Minimum Message Length Principle. Empirical tests demonstrate that this scheme compares favourably with other decision tree inference schemes. This work provides a metric for comparing the relative merit of the decision tree and decision graph formalisms for a particular domain.

1 Introduction. In this paper, we examine the problem of inferring a decision procedure from a set of examples. We examine the decision graph [5, 1, 16, 15, 14], a generalization of the decision tree [3, 18], and propose a method to construct decision graphs based upon Wallace's Minimum Message Length Principle (MMLP) [24, 10, 25]. The MMLP is related to Rissanen's Minimum Description Length Principle (MDLP) [21, 22, 20]. For the reader unfamiliar with minimum encoding methods (MML and MDL), a good introduction to the area is given by Georgeff [10]. We formalize ...
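The two-part message-length idea behind MML can be sketched minimally (the numbers and coding scheme below are invented for illustration, not the paper's actual scheme): prefer the hypothesis minimising length(hypothesis) + length(data given hypothesis), measured in bits.

```python
import math

def data_bits(counts, probs):
    """Optimal code length, in bits, of the observed class counts
    under a hypothesised class distribution."""
    return -sum(n * math.log2(p) for n, p in zip(counts, probs) if n > 0)

def message_length(model_bits, counts, probs):
    """Two-part message: cost of stating the model plus cost of the
    data encoded using the model's predictions."""
    return model_bits + data_bits(counts, probs)

# Example: 90/10 class counts at a leaf. The skewed model costs more
# bits to state but saves many more bits when encoding the data.
counts = [90, 10]
uniform_total = message_length(5.0, counts, [0.5, 0.5])   # 5 + 100 bits
skewed_total = message_length(20.0, counts, [0.9, 0.1])
```

Under this trade-off the skewed hypothesis wins (its total message is shorter), which is the sense in which MML penalises both underfitting and needless model complexity.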
Constructing X-of-N Attributes for Decision Tree Learning
Machine Learning, 1998
Abstract

Cited by 19 (0 self)
While many constructive induction algorithms focus on generating new binary attributes, this paper explores novel methods of constructing nominal and numeric attributes. We propose a new constructive operator, X-of-N. An X-of-N representation is a set containing one or more attribute-value pairs. For a given instance, the value of an X-of-N representation corresponds to the number of its attribute-value pairs that are true of the instance. A single X-of-N representation can directly and simply represent any concept that can be represented by a single conjunctive, a single disjunctive, or a single M-of-N representation commonly used for constructive induction, and the reverse is not true. In this paper, we describe a constructive decision tree learning algorithm, called XofN. When building decision trees, this algorithm creates one X-of-N representation, either as a nominal attribute or as a numeric attribute, at each decision node. The construction of X-of-N representations is carrie...
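Evaluating an X-of-N attribute as defined in this abstract admits a small sketch (the function and variable names are my own): for a given instance, the attribute's value is the number of the set's attribute-value pairs that are true of that instance.

```python
def x_of_n_value(pairs, instance):
    """pairs: iterable of (attribute, value) pairs; instance: mapping
    attribute -> value. Returns how many pairs the instance satisfies."""
    return sum(1 for attr, val in pairs if instance.get(attr) == val)

# X-of-N = {colour = red, shape = round, size = small}
xofn = [("colour", "red"), ("shape", "round"), ("size", "small")]
instance = {"colour": "red", "shape": "square", "size": "small"}
value = x_of_n_value(xofn, instance)  # 2 of the 3 pairs hold
```

With N pairs the attribute takes values in 0..N, which is why it can be treated either as a nominal attribute or as a numeric one at a decision node.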
Constructing Nominal X-of-N Attributes
Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, 1995
Abstract

Cited by 14 (6 self)
Most constructive induction researchers focus only on new boolean attributes. This paper reports a new constructive induction algorithm, called XofN, that constructs new nominal attributes in the form of X-of-N representations. An X-of-N is a set containing one or more attribute-value pairs. For a given instance, its value corresponds to the number of its attribute-value pairs that are true. The promising preliminary experimental results, on both artificial and real-world domains, show that constructing new nominal attributes in the form of X-of-N representations can significantly improve the performance of selective induction in terms of both higher prediction accuracy and lower theory complexity.

1 Introduction. A well-known elementary limitation of selective induction algorithms is that when task-supplied attributes are not adequate for describing hypotheses, their performance in terms of prediction accuracy and/or theory complexity is poor. To overcome this limitation, constructiv...
Circular Clustering Of Protein Dihedral Angles By Minimum Message Length
In Proceedings of the 1st Pacific Symposium on Biocomputing (PSB-1), 1996
Abstract

Cited by 14 (11 self)
… this paper is given in [DADH95] and is available from ftp://www.cs.monash.edu.au/www/publications/1995/TR237.ps.Z.) Section 2 introduces the MML principle and how it can be used for this circular clustering problem. The remaining sections give the results of the secondary structure groups [KaSa83] that resulted from applying Snob to cluster our dihedral angle data.
Constructing New Attributes for Decision Tree Learning
1996
Abstract

Cited by 7 (3 self)
A well-known fundamental limitation of selective induction algorithms is that when task-supplied attributes are not adequate for, or directly relevant to, describing hypotheses, their performance in terms of prediction accuracy and/or theory complexity is poor. One solution to this problem is constructive induction. It constructs, by using task-supplied attributes, new attributes that are expected to be more appropriate than the task-supplied attributes for describing the target concepts. This thesis focuses on constructive induction with decision trees as the theory description language. It explores: (1) novel approaches to constructing new binary attributes using existing constructive operators, and (2) novel methods of constructing new nominal and new continuous-valued attributes based on a newly proposed constructive operator. The thesis investigates a fixed rule-based approach to constructing new binary attributes for decision tree learning. It generates conjunctions from producti...
Learning Monotonic Linear Functions
Proceedings of the 17th Annual Conference on Learning Theory, 2004
Abstract

Cited by 6 (3 self)
Learning probabilities (p-concepts [13]) and other real-valued concepts (regression) is an important role of machine learning. For example, a doctor may need to predict the probability of getting a disease P[y|x], which depends on a number of risk factors. Generalized additive models [9] are a well-studied nonparametric model in the statistics literature, usually with monotonic link functions. However, no known efficient algorithms exist for learning such a general class. We show that regression graphs efficiently learn such real-valued concepts, while regression trees inefficiently learn them. One corollary is that any function E[y|x] = u(w · x) for u monotonic can be learned to arbitrarily small squared error ε in time polynomial in 1/ε, ‖w‖1, and the Lipschitz constant of u (analogous to a margin). The model includes, as special cases, linear and logistic regression, as well as learning a noisy halfspace with a margin [5, 4]. Kearns, Mansour, and McAllester [12, 15] analyzed decision trees and decision graphs as boosting algorithms for classification accuracy. We extend their analysis and the boosting analogy to the case of real-valued predictors, where a small positive correlation coefficient can be boosted to arbitrary accuracy. Viewed as a noisy boosting algorithm [3, 10], the algorithm learns both the target function and the asymmetric noise.
Continuous-valued X-of-N Attributes Versus Nominal X-of-N Attributes for Constructive Induction: A Case Study
for Young Computer Scientists, Peking University, 1995
Abstract

Cited by 5 (4 self)
An X-of-N is a set containing one or more attribute-value pairs. For a given instance, its value corresponds to the number of its attribute-value pairs that are true. In this paper, we explore the characteristics and performance of continuous-valued X-of-N attributes versus nominal X-of-N attributes for constructive induction. Nominal X-of-Ns are more representationally powerful than continuous-valued X-of-Ns, but the former suffer the "fragmentation" problem, although some mechanisms such as subsetting can help to solve the problem. Two approaches to constructive induction using continuous-valued X-of-Ns are described. Continuous-valued X-of-Ns perform better than nominal ones on domains that need X-of-Ns with only one cut point. On domains that need X-of-N representations with more than one cut point, nominal X-of-Ns perform better than continuous-valued ones. Experimental results on a set of artificial and real-world domains support these statements.

1. Introduction. A wide variet...