AN INTRODUCTION TO VARIATIONAL METHODS FOR GRAPHICAL MODELS
 TO APPEAR: M. I. JORDAN (ED.), LEARNING IN GRAPHICAL MODELS
Bucket Elimination: A Unifying Framework for Reasoning
Abstract

Cited by 274 (60 self)
Bucket elimination is an algorithmic framework that generalizes dynamic programming to accommodate many problem-solving and reasoning tasks. Algorithms such as directional resolution for propositional satisfiability, adaptive consistency for constraint satisfaction, Fourier and Gaussian elimination for solving linear equalities and inequalities, and dynamic programming for combinatorial optimization can all be accommodated within the bucket-elimination framework. Many probabilistic inference tasks can likewise be expressed as bucket-elimination algorithms. These include belief updating, finding the most probable explanation, and expected utility maximization. These algorithms share the same performance guarantees; all are time and space exponential in the induced width of the problem's interaction graph. While elimination strategies place extensive demands on memory, a contrasting class of algorithms called "conditioning search" requires only linear space. Algorithms in this class split a problem into subproblems by instantiating a subset of variables, called a conditioning set or cutset. Typical examples of conditioning search algorithms are backtracking (in constraint satisfaction) and branch and bound (for combinatorial optimization). The paper presents the bucket-elimination framework as a unifying theme across probabilistic and deterministic reasoning tasks and shows how conditioning search can be augmented to systematically trade space for time.
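The elimination step the abstract describes — multiply the factors in a variable's bucket, then sum that variable out — can be sketched for belief updating over binary variables. This is an illustrative sketch under assumed conventions (factors as dicts keyed by assignment tuples over a sorted variable scope), not the paper's own implementation.

```python
from itertools import product

def multiply(f1, scope1, f2, scope2):
    """Pointwise product of two factors; returns (table, combined scope)."""
    scope = sorted(set(scope1) | set(scope2))
    table = {}
    for assign in product([0, 1], repeat=len(scope)):
        a = dict(zip(scope, assign))
        v1 = f1[tuple(a[x] for x in scope1)]
        v2 = f2[tuple(a[x] for x in scope2)]
        table[assign] = v1 * v2
    return table, scope

def sum_out(f, scope, var):
    """Eliminate `var` from factor f by summation (one bucket-elimination step)."""
    i = scope.index(var)
    table = {}
    for assign, v in f.items():
        key = assign[:i] + assign[i + 1:]
        table[key] = table.get(key, 0.0) + v
    return table, [x for x in scope if x != var]

# Example: chain A -> B with P(A) and P(B|A); eliminating A yields P(B).
pA = {(0,): 0.6, (1,): 0.4}
pB_given_A = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}
joint, scope = multiply(pA, ['A'], pB_given_A, ['A', 'B'])
pB, _ = sum_out(joint, scope, 'A')   # pB[(0,)] == 0.62, pB[(1,)] == 0.38
```

The intermediate factor produced by each elimination is exactly what makes the method exponential in the induced width: its scope is the union of the scopes of all factors in the bucket.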
Automatic Construction of Decision Trees from Data: A MultiDisciplinary Survey
 Data Mining and Knowledge Discovery
, 1997
Abstract

Cited by 164 (1 self)
Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial neural networks. Researchers in these disciplines, sometimes working on quite different problems, identified similar issues and heuristics for decision tree construction. This paper surveys existing work on decision tree construction, attempting to identify the important issues involved, directions the work has taken and the current state of the art. Keywords: classification, tree-structured classifiers, data compaction. 1. Introduction. Advances in data collection methods, storage and processing technology are providing a unique challenge and opportunity for automated data exploration techniques. Enormous amounts of data are being collected daily from major scientific projects, e.g., the Human Genome...
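One heuristic shared across the disciplines this survey covers is greedy split selection by impurity reduction. A minimal sketch of entropy-based split selection (information gain, as in ID3-style tree construction; the data and names here are illustrative, not from the survey):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a label list."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature index, information gain) of the best single split."""
    base = entropy(labels)
    best = (None, 0.0)
    for j in range(len(rows[0])):
        remainder = 0.0
        for v in set(r[j] for r in rows):
            subset = [y for r, y in zip(rows, labels) if r[j] == v]
            remainder += len(subset) / len(labels) * entropy(subset)
        gain = base - remainder
        if gain > best[1]:
            best = (j, gain)
    return best

# Feature 1 separates the labels perfectly, so it is chosen with gain 1.0 bit.
rows = [('sunny', 'hot'), ('sunny', 'cool'), ('rain', 'hot'), ('rain', 'cool')]
labels = ['no', 'yes', 'no', 'yes']
j, gain = best_split(rows, labels)   # j == 1, gain == 1.0
```

Recursing on each subset until labels are pure yields the basic top-down induction scheme around which most of the surveyed variations (pruning, alternative impurity measures, multivariate splits) are built.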
Exploiting Tractable Substructures in Intractable Networks
 Advances in Neural Information Processing Systems 8
, 1995
Abstract

Cited by 104 (11 self)
We develop a refined mean field approximation for inference and learning in probabilistic neural networks. Our mean field theory, unlike most, does not assume that the units behave as independent degrees of freedom; instead, it exploits in a principled way the existence of large substructures that are computationally tractable. To illustrate the advantages of this framework, we show how to incorporate weak higher-order interactions into a first-order hidden Markov model, treating the corrections (but not the first-order structure) within mean field theory. 1 INTRODUCTION. Learning the parameters in a probabilistic neural network may be viewed as a problem in statistical estimation. In networks with sparse connectivity (e.g. trees and chains), there exist efficient algorithms for the exact probabilistic calculations that support inference and learning. In general, however, these calculations are intractable, and approximations are required. Mean field theory provides a framework for app...
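The fully factorized baseline that this paper refines can be stated concretely. For a network of ±1 units with symmetric weights, naive mean field replaces exact marginals with magnetizations m_i satisfying the fixed-point equations m_i = tanh(Σ_j W_ij m_j + b_i). A minimal sketch, with illustrative weights (the paper's contribution is to keep large substructures exact instead of factorizing everything this way):

```python
import math

def mean_field(W, b, iters=200):
    """Iterate the naive mean field equations m_i <- tanh(sum_j W[i][j]*m_j + b_i)."""
    n = len(b)
    m = [0.0] * n
    for _ in range(iters):
        m = [math.tanh(sum(W[i][j] * m[j] for j in range(n)) + b[i])
             for i in range(n)]
    return m

# Two symmetrically coupled units converge to equal magnetizations.
W = [[0.0, 0.5], [0.5, 0.0]]
b = [0.1, 0.1]
m = mean_field(W, b)
```

At convergence each m[i] satisfies its own fixed-point equation exactly, which is what makes the approximation self-consistent even though it ignores correlations between units.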
Efficient learning in Boltzmann Machines using linear response theory
 Neural Computation
, 1997
Abstract

Cited by 44 (5 self)
The learning process in Boltzmann Machines is computationally very expensive. The computational complexity of the exact algorithm is exponential in the number of neurons. We present a new approximate learning algorithm for Boltzmann Machines, which is based on mean field theory and the linear response theorem. The computational complexity of the algorithm is cubic in the number of neurons. In the absence of hidden units, we show how the weights can be directly computed from the fixed-point equation of the learning rules. Thus, in this case we do not need to use a gradient descent procedure for the learning process. We show that the solutions of this method are close to the optimal solutions and give a significant improvement when correlations play a significant role. Finally, we apply the method to a pattern completion task and show good performance for networks up to 100 neurons. 1 Introduction. Boltzmann Machines (BMs) (Ackley et al., 1985) are networks of binary neurons with a stoc...
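The linear response construction the abstract refers to can be summarized in its standard form (stated here as the usual textbook relations, as an assumption about the method's details rather than a quotation of the paper). The mean field magnetizations solve the fixed-point equations, and correlations are then estimated from the susceptibility matrix χ obtained by differentiating them:

```latex
\begin{aligned}
m_i &= \tanh\Bigl(\sum_j w_{ij} m_j + \theta_i\Bigr), \\
C_{ij} \;=\; \langle s_i s_j \rangle - m_i m_j \;&\approx\; \chi_{ij},
\qquad
\bigl(\chi^{-1}\bigr)_{ij} \;=\; \frac{\delta_{ij}}{1 - m_i^{2}} \;-\; w_{ij},
\end{aligned}
```

where χ_ij = ∂m_i/∂θ_j. Because the clamped statistics fix m and C from data, the second relation can be inverted to read off w directly, which is the sense in which the paper avoids gradient descent when there are no hidden units; the matrix inversion accounts for the cubic complexity.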
Learning Predictive Compositional Hierarchies
, 2000
Abstract

Cited by 10 (2 self)
This paper explores the vital but overlooked problem of learning compositional hierarchies in predictive models and presents a new sequential learning paradigm in which to study such models. Hierarchical compositional structure, like taxonomic structure, is a critical representation tool for Artificial Intelligence. Prominent existing work with hand-built systems demonstrates the potential of predictive models based on compositional hierarchies for making inferences that smoothly integrate bottom-up and top-down influences and for enabling the processing of representations spanning multiple levels of spatial or temporal resolution. Additionally, like taxonomic hierarchies, compositional hierarchies can be learned purely from primitive data in a general, unsupervised fashion and subsequently used to make predictions about unseen data. However, unlike taxonomies, for which numerous foundational learning algorithms exist, there has not been analogous foundational work on learning predictive compositional hierarchies. The core aim of learning such models is to identify in a bottom-up fashion frequently occurring repeated patterns, enabling the future discovery of even larger patterns. This process holds the potential to scale up automatically from fine-grained, low-level data to coarser, high-level representations, bridging a gap that has proved to be one of the biggest stumbling blocks on the way to creating significantly more complex and intelligent autonomous agents.
An Introduction to Variational Methods for Graphical Models
 Machine Learning
, 1998
Abstract

Cited by 9 (0 self)
This paper presents a tutorial introduction to the use of variational methods for inference and learning in graphical models (Bayesian networks and Markov random fields). We present a number of examples of graphical models, including the QMR-DT database, the sigmoid belief network, the Boltzmann machine, and several variants of hidden Markov models, in which it is infeasible to run exact inference algorithms. We then introduce variational methods, which exploit laws of large numbers to transform the original graphical model into a simplified graphical model in which inference is efficient. Inference in the simplified model provides bounds on probabilities of interest in the original model. We describe a general framework for generating variational transformations based on convex duality. Finally, we return to the examples and demonstrate how variational algorithms can be formulated in each case.
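The convex-duality recipe mentioned in the abstract has a classic one-line instance that shows how a nonlinearity is traded for a bound with a variational parameter. Because log is concave, it equals the lower envelope of its tangent lines, giving

```latex
\log x \;=\; \min_{\lambda > 0}\,\bigl\{\lambda x - \log\lambda - 1\bigr\}
\quad\Longrightarrow\quad
\log x \;\le\; \lambda x - \log\lambda - 1 \quad \text{for all } \lambda > 0,
```

with equality at λ = 1/x. Substituting such bounds for intractable terms yields a simplified model that is linear in x for fixed λ; optimizing over λ then tightens the bound on the probability of interest.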
Bucket Elimination: a Unifying Framework for Structure-driven Inference
 Artificial Intelligence
, 1998
Abstract

Cited by 6 (0 self)
Bucket elimination is an algorithmic framework that generalizes dynamic programming to accommodate many complex problem-solving and reasoning tasks. Algorithms such as directional resolution for propositional satisfiability, adaptive consistency for constraint satisfaction, Fourier and Gaussian elimination for solving linear equalities and inequalities, and dynamic programming for combinatorial optimization can all be accommodated within the bucket-elimination framework. Many probabilistic inference tasks can likewise be expressed as bucket-elimination algorithms. These include belief updating, finding the most probable explanation, and expected utility maximization. All these algorithms share the same performance guarantees; all are time and space exponential in the induced width of the problem's interaction graph. While elimination strategies place extensive demands on memory, pure "conditioning" algorithms require only linear space. Conditioning is a generic name for a...
Efficient learning in sparsely connected Boltzmann machines
, 1996
Abstract

Cited by 3 (0 self)
We present a heuristic procedure for efficient estimation of the partition function in the Boltzmann distribution. The resulting speedup is of immediate relevance for speeding up Boltzmann Machine learning rules, especially for networks with sparse connectivity.
Pruning Boltzmann Networks And Hidden Markov Models
 in Record of the Thirtieth Asilomar Conference on Signals, Systems and Computers
, 1996
Abstract

Cited by 2 (0 self)
We present sensitivity-based pruning algorithms for general Boltzmann networks. Central to our methods is the efficient calculation of a second-order approximation to the true weight saliencies in a cross-entropy error. Building upon recent work which shows a formal correspondence between linear Boltzmann chains and Hidden Markov Models (HMMs), we argue that our method can be applied to HMMs as well. We illustrate pruning on Boltzmann zippers, which are equivalent to two HMMs with cross-connection links. We verify that our second-order approximation preserves the rank ordering of weight saliencies and thus the proper weight is pruned at each pruning step. In all our experiments on small problems, pruning reduces the generalization error; in most cases the pruned networks facilitate interpretation as well. 1. INTRODUCTION. There is an enormous body of simulation work demonstrating the value of architecture optimization for networks for pattern classification, and this has properly led ...
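The second-order saliency the abstract refers to is, in the usual sensitivity-pruning form (stated here as the standard Optimal-Brain-Damage-style approximation, an assumption about the details rather than the paper's exact expression), the estimated increase in error E when a trained weight w_i is set to zero. Expanding E to second order around the trained weights, where the gradient term vanishes at a minimum:

```latex
\delta E_i \;\approx\; \frac{1}{2}\,\frac{\partial^2 E}{\partial w_i^2}\, w_i^2 ,
```

so at each pruning step the weight with the smallest saliency δE_i is removed and the network is retrained. Preserving the rank ordering of these saliencies, as the paper verifies, is exactly what guarantees the approximation prunes the same weight the exact sensitivity would.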