Decision Graphs - An Extension of Decision Trees (1993)

by Jonathan J. Oliver

Results 1 - 10 of 23

Operations for Learning with Graphical Models

by Wray L. Buntine - Journal of Artificial Intelligence Research, 1994
Abstract - Cited by 214 (13 self)
This paper is a multidisciplinary review of empirical, statistical learning from a graphical model perspective. Well-known examples of graphical models include Bayesian networks, directed graphs representing a Markov chain, and undirected networks representing a Markov field. These graphical models are extended to model data analysis and empirical learning using the notation of plates. Graphical operations for simplifying and manipulating a problem are provided including decomposition, differentiation, and the manipulation of probability models from the exponential family. Two standard algorithm schemas for learning are reviewed in a graphical framework: Gibbs sampling and the expectation maximization algorithm. Using these operations and schemas, some popular algorithms can be synthesized from their graphical specification. This includes versions of linear regression, techniques for feed-forward networks, and learning Gaussian and discrete Bayesian networks from data. The paper conclu...

Minimum Message Length and Kolmogorov Complexity

by C. S. Wallace, D. L. Dowe - Computer Journal, 1999
Abstract - Cited by 86 (20 self)
this paper is to describe some of the relationships among the different streams and to try to clarify some of the important differences in their assumptions and development. Other studies mentioning the relationships appear in [1, Section IV, pp. 1038--1039], [2, sections 5.2, 5.5] and [3, p. 465]

Bottom-Up Induction of Oblivious Read-Once Decision Graphs

by Ron Kohavi, 1994
Abstract - Cited by 42 (8 self)
We investigate the use of oblivious, read-once decision graphs as structures for representing concepts over discrete domains, and present a bottom-up, hill-climbing algorithm for inferring these structures from labelled instances. The algorithm is robust with respect to irrelevant attributes, and experimental results show that it performs well on problems considered difficult for symbolic induction methods, such as the Monk's problems and parity.
1 Introduction
Top-down induction of decision trees [25, 24, 20] has been one of the principal induction methods for symbolic, supervised learning. The tree structure, which is used for representing the hypothesized target concept, suffers from some well-known problems, most notably the replication problem and the fragmentation problem [23]. The replication problem forces duplication of subtrees in disjunctive concepts, such as (A ∧ B) ∨ (C ∧ D); the fragmentation problem causes partitioning of the data into fragments, when a high-arity attrib...

The sk-strings method for inferring PFSA

by Anand Raman, Jon Patrick (Palmerston North) - In Proceedings of the ..., 1997
Abstract - Cited by 30 (2 self)
We describe a simple, fast and easy-to-implement recursive algorithm with four alternate intuitive heuristics for inferring Probabilistic Finite State Automata. The algorithm is an extension for stochastic machines of the k-tails method introduced in 1972 by Biermann and Feldman for non-stochastic machines. Experiments comparing the two are done and benchmark results are also presented. It is also shown that sk-strings performs better than k-tails, at least in inferring small automata.
Introduction
When given a finite number of examples of the behaviour of a probabilistic state-determined machine, it is possible to imagine methods by which we can infer its structure. Ideally, we would like to identify the exact automaton which generated the strings. But it is impossible to do this from the behaviour of the machine because more than one non-minimal machine may generate the same language. This paper is concerned not with identifying the generating machine, which is demonstrably impossib...

Introduction to Minimum Encoding Inference

by Jonathan J. Oliver, David Hand - DEPT. OF STATISTICS, OPEN UNIVERSITY, WALTON HALL, MILTON , 1994
Abstract - Cited by 21 (4 self)
This paper examines the minimum encoding approaches to inference, Minimum Message Length (MML) and Minimum Description Length (MDL). This paper was written with the objective of providing an introduction to this area for statisticians. We describe coding techniques for data, and examine how these techniques can be applied to perform inference and model selection.

Image Recognition CAPTCHAs

by Monica Chew, J. D. Tygar (UC Berkeley) - In Proceedings of the 7th Information Security Conference (ISC ’04), Springer Lecture Notes in Computer Science, 2004
Abstract - Cited by 20 (2 self)
CAPTCHAs are tests that distinguish humans from software robots in an online environment [3, 14, 7]. We propose and implement three CAPTCHAs based on naming images, distinguishing images, and identifying an anomalous image out of a set. Novel contributions include proposals for two new CAPTCHAs, the first user study on image recognition CAPTCHAs, and a new metric for evaluating CAPTCHAs.

Transforming Rules and Trees into Comprehensible Knowledge Structures

by Brian R. Gaines, 1996
Abstract - Cited by 18 (3 self)
The problem of transforming the knowledge bases of expert systems using induced rules or decision trees into comprehensible knowledge structures is addressed. A knowledge structure is developed that generalizes and subsumes production rules, decision trees, and rules with exceptions. It gives rise to a natural complexity measure that allows them to be understood, analyzed and compared on a uniform basis. The structure is a directed acyclic graph with the semantics that nodes are premises, some of which have attached conclusions, and the arcs are inheritance links with disjunctive multiple inheritance. A detailed example is given of the generation of a range of such structures of equivalent performance for a simple problem, and the complexity measure of a particular structure is shown to relate to its perceived complexity. The simplest structures are generated by an algorithm that factors common sub-premises from the premises of rules. A more complex example of a chess dataset is used t...

Inferring Reduced Ordered Decision Graphs of Minimal Description Length

by Arlindo L. Oliveira, Alberto Sangiovanni-Vincentelli - Proceedings of the Twelfth International Conference on Machine Learning, 1994
Abstract - Cited by 10 (1 self)
This work describes an approach for the inference of reduced ordered decision graphs from training sets. Reduced ordered decision graphs (RODGs) are graphs where the variables can only be tested in accordance with a pre-specified order and no redundant nodes exist. RODGs have several interesting properties that have made them the representation of choice for the manipulation of Boolean functions in the logic synthesis community. We derive a RODG representation of the function implemented by a decision tree. This decision tree can be obtained from a training set using any one of the different algorithms proposed to date. This RODG is then used as the starting point for an algorithm that derives another RODG of minimal description length. The reduction in complexity is obtained by performing incremental changes in the RODG. By using ordered decision diagrams, the task of identifying common subgraphs is made much simpler than the identification of common sub-trees in a decision tree. Ordered decision graphs require that a variable ordering be specified in advance. The algorithm that derives such an ordering is based on a commonly used reordering algorithm that finds a locally optimal ordering by swapping the order of two adjacent variables. These algorithms are tested on a set of examples that are known to be hard to solve using decision trees. The results show that when an effective reduction of the description length is obtained, significant gains in generalization accuracy can be achieved. In all cases the generalization accuracy of the final RODG was better than the generalization accuracy of the decision tree that was used as the starting point.

Using the Minimum Description Length Principle to Infer Reduced Ordered Decision Graphs

by Arlindo Oliveira, Alberto Sangiovanni-Vincentelli, Jude Shavlik - Machine Learning , 1996
Abstract - Cited by 8 (1 self)
We propose an algorithm for the inference of decision graphs from a set of labeled instances. In particular, we propose to infer decision graphs where the variables can only be tested in accordance with a given order and no redundant nodes exist. Graphs of this type, reduced ordered decision graphs, can be used as canonical representations of Boolean functions and can be manipulated using algorithms developed for that purpose. This work proposes a local optimization algorithm that generates compact decision graphs by performing local changes in an existing graph until a minimum is reached. The algorithm uses Rissanen's minimum description length principle to control the tradeoff between accuracy on the training set and complexity of the description. Techniques for the selection of the initial decision graph and for the selection of an appropriate ordering of the variables are also presented. Experimental results obtained using this algorithm in two sets of examples are presented and ...

Learning Monotonic Linear Functions

by Adam Kalai - Proceedings of the 17th Annual Conference on Learning Theory, 2004
Abstract - Cited by 6 (3 self)
Learning probabilities (p-concepts [13]) and other real-valued concepts (regression) is an important role of machine learning. For example, a doctor may need to predict the probability of getting a disease P[y|x], which depends on a number of risk factors. Generalized additive models [9] are a well-studied nonparametric model in the statistics literature, usually with monotonic link functions. However, no known efficient algorithms exist for learning such a general class. We show that regression graphs efficiently learn such real-valued concepts, while regression trees inefficiently learn them. One corollary is that any function E[y|x] = u(w · x) for u monotonic can be learned to arbitrarily small squared error ε in time polynomial in 1/ε, |w|_1, and the Lipschitz constant of u (analogous to a margin). The model includes, as special cases, linear and logistic regression, as well as learning a noisy half-space with a margin [5, 4]. Kearns, Mansour, and McAllester [12, 15] analyzed decision trees and decision graphs as boosting algorithms for classification accuracy. We extend their analysis and the boosting analogy to the case of real-valued predictors, where a small positive correlation coefficient can be boosted to arbitrary accuracy. Viewed as a noisy boosting algorithm [3, 10], the algorithm learns both the target function and the asymmetric noise.