Results 11  20
of
156
Structure and Strength in Causal Induction
"... We present a framework for the rational analysis of elemental causal induction – learning about the existence of a relationship between a single cause and effect – based upon causal graphical models. This framework makes precise the distinction between causal structure and causal strength: the diffe ..."
Abstract

Cited by 89 (30 self)
 Add to MetaCart
We present a framework for the rational analysis of elemental causal induction – learning about the existence of a relationship between a single cause and effect – based upon causal graphical models. This framework makes precise the distinction between causal structure and causal strength: the difference between asking whether a causal relationship exists and asking how strong that causal relationship might be. We show that two leading rational models of elemental causal induction, ∆P and causal power, both estimate causal strength, and introduce a new rational model, causal support, that assesses causal structure. Causal support predicts several key phenomena of causal induction that cannot be accounted for by other rational models, which we explore through a series of experiments. These phenomena include the complex interaction between ∆P and the baserate probability of the effect in the absence of the cause, sample size effects, inferences from incomplete contingency tables, and causal learning from rates. Causal support also provides a better account of a number of existing datasets than either ∆P or causal power.
Spiking Boltzmann machines
 In Advances in Neural Information Processing Systems
, 1998
"... A Boltzmann Machine is a network of symmetrically connected, neuronlike units that make stochastic decisions about whether to be on or off. Boltzmann machines have a simple learning algorithm that allows them to discover interesting features in datasets composed of binary vectors. The learning algor ..."
Abstract

Cited by 85 (14 self)
 Add to MetaCart
A Boltzmann Machine is a network of symmetrically connected, neuronlike units that make stochastic decisions about whether to be on or off. Boltzmann machines have a simple learning algorithm that allows them to discover interesting features in datasets composed of binary vectors. The learning algorithm is very slow in networks with many layers of feature detectors, but it can be made much faster by learning one layer of feature detectors at a time. Boltzmann machines are used to solve two quite different computational problems. For a search problem, the weights on the connections are fixed and are used to represent the cost function of an optimization problem. The stochastic dynamics of a Boltzmann machine then allow it to sample binary state vectors that represent good solutions to the optimization problem. For a learning problem, the Boltzmann machine is shown a set of binary data vectors and it must find weights on the connections so that the data vectors are good solutions to the optimization problem defined by those weights. To solve a learning problem, Boltzmann machines make many small updates to their weights, and each update requires them to solve many different search problems. The stochastic dynamics of a Boltzmann machine When unit i is given the opportunity to update its binary state, it first computes its total input, zi, which is the sum of its own bias, bi, and the weights on connections coming from other active units: zi = bi + �
Markovian Models for Sequential Data
, 1996
"... Hidden Markov Models (HMMs) are statistical models of sequential data that have been used successfully in many machine learning applications, especially for speech recognition. Furthermore, in the last few years, many new and promising probabilistic models related to HMMs have been proposed. We firs ..."
Abstract

Cited by 84 (2 self)
 Add to MetaCart
Hidden Markov Models (HMMs) are statistical models of sequential data that have been used successfully in many machine learning applications, especially for speech recognition. Furthermore, in the last few years, many new and promising probabilistic models related to HMMs have been proposed. We first summarize the basics of HMMs, and then review several recent related learning algorithms and extensions of HMMs, including in particular hybrids of HMMs with artificial neural networks, InputOutput HMMs (which are conditional HMMs using neural networks to compute probabilities), weighted transducers, variablelength Markov models and Markov switching statespace models. Finally, we discuss some of the challenges of future research in this very active area. 1 Introduction Hidden Markov Models (HMMs) are statistical models of sequential data that have been used successfully in many applications in artificial intelligence, pattern recognition, speech recognition, and modeling of biological ...
Training restricted Boltzmann machines using approximations to the likelihood gradient
 Proceedings of the 25th international conference on Machine learning
, 2008
"... A new algorithm for training Restricted Boltzmann Machines is introduced. The algorithm, named Persistent Contrastive Divergence, is different from the standard Contrastive Divergence algorithms in that it aims to draw samples from almost exactly the model distribution. It is compared to some standa ..."
Abstract

Cited by 84 (2 self)
 Add to MetaCart
A new algorithm for training Restricted Boltzmann Machines is introduced. The algorithm, named Persistent Contrastive Divergence, is different from the standard Contrastive Divergence algorithms in that it aims to draw samples from almost exactly the model distribution. It is compared to some standard Contrastive Divergence and PseudoLikelihood algorithms on the tasks of modeling and classifying various types of data. The Persistent Contrastive Divergence algorithm outperforms the other algorithms, and is equally fast and simple.
Local Learning in Probabilistic Networks With Hidden Variables
, 1995
"... Probabilistic networks, which provide compact descriptions of complex stochastic relationships among several random variables, are rapidly becoming the tool of choice for uncertain reasoning in artificial intelligence. We show that networks with fixed structure containing hidden variables can be lea ..."
Abstract

Cited by 77 (4 self)
 Add to MetaCart
Probabilistic networks, which provide compact descriptions of complex stochastic relationships among several random variables, are rapidly becoming the tool of choice for uncertain reasoning in artificial intelligence. We show that networks with fixed structure containing hidden variables can be learned automatically from data using a gradientdescent mechanism similar to that used in neural networks. We also extend the method to networks with intensionally represented distributions, including networks with continuous variables and dynamic probabilistic networks. Because probabilistic networks provide explicit representations of causal structure, human experts can easily contribute prior knowledge to the training process, thereby significantly improving the learning rate. Adaptive probabilistic networks (APNs) may soon compete directly with neural networks as models in computational neuroscience as well as in industrial and financial applications. 1 Introduction Intelligent systems, ...
Tutorial on Variational Approximation Methods
 In Advanced Mean Field Methods: Theory and Practice
, 2000
"... We provide an introduction to the theory and use of variational methods for inference and estimation in the context of graphical models. Variational methods become useful as ecient approximate methods when the structure of the graph model no longer admits feasible exact probabilistic calculations. T ..."
Abstract

Cited by 73 (1 self)
 Add to MetaCart
We provide an introduction to the theory and use of variational methods for inference and estimation in the context of graphical models. Variational methods become useful as ecient approximate methods when the structure of the graph model no longer admits feasible exact probabilistic calculations. The emphasis of this tutorial is on illustrating how inference and estimation problems can be transformed into variational form along with describing the resulting approximation algorithms and their properties insofar as these are currently known. 1 Introduction The term variational methods refers to a large collection of optimization techniques. The classical context for these methods involves nding the extremum of an integral depending on an unknown function and its derivatives. This classical de nition, however, and the accompanying calculus of variation no longer adequately characterizes modern variational methods. Modern variational approaches have become indispensable tools in...
A Joint Model of Text and Aspect Ratings for Sentiment Summarization
 PROC. ACL08: HLT
, 2008
"... Online reviews are often accompanied with numerical ratings provided by users for a set of service or product aspects. We propose a statistical model which is able to discover corresponding topics in text and extract textual evidence from reviews supporting each of these aspect ratings – a fundament ..."
Abstract

Cited by 67 (0 self)
 Add to MetaCart
Online reviews are often accompanied with numerical ratings provided by users for a set of service or product aspects. We propose a statistical model which is able to discover corresponding topics in text and extract textual evidence from reviews supporting each of these aspect ratings – a fundamental problem in aspectbased sentiment summarization (Hu and Liu, 2004a). Our model achieves high accuracy, without any explicitly labeled data except the user provided opinion ratings. The proposed approach is general and can be used for segmentation in other applications where sequential data is accompanied with correlated signals.
Causal independence for probability assessment and inference using Bayesian networks
 IEEE Trans. on Systems, Man and Cybernetics
, 1994
"... ABayesian network is a probabilistic representation for uncertain relationships, which has proven to be useful for modeling realworld problems. When there are many potential causes of a given e ect, however, both probability assessment and inference using a Bayesian network can be di cult. In this ..."
Abstract

Cited by 65 (2 self)
 Add to MetaCart
ABayesian network is a probabilistic representation for uncertain relationships, which has proven to be useful for modeling realworld problems. When there are many potential causes of a given e ect, however, both probability assessment and inference using a Bayesian network can be di cult. In this paper, we describe causal independence, a collection of conditional independence assertions and functional relationships that are often appropriate to apply to the representation of the uncertain interactions between causes and e ect. We show how the use of causal independence in a Bayesian network can greatly simplify probability assessment aswell as probabilistic inference. 1
Discretizing Continuous Attributes While Learning Bayesian Networks
 In Proc. ICML
, 1996
"... We introduce a method for learning Bayesian networks that handles the discretization of continuous variables as an integral part of the learning process. The main ingredient in this method is a new metric based on the Minimal Description Length principle for choosing the threshold values for the dis ..."
Abstract

Cited by 64 (4 self)
 Add to MetaCart
We introduce a method for learning Bayesian networks that handles the discretization of continuous variables as an integral part of the learning process. The main ingredient in this method is a new metric based on the Minimal Description Length principle for choosing the threshold values for the discretization while learning the Bayesian network structure. This score balances the complexity of the learned discretization and the learned network structure against how well they model the training data. This ensures that the discretization of each variable introduces just enough intervals to capture its interaction with adjacent variables in the network. We formally derive the new metric, study its main properties, and propose an iterative algorithm for learning a discretization policy. Finally, we illustrate its behavior in applications to supervised learning. 1 INTRODUCTION Bayesian networks provide efficient and effective representation of the joint probability distribution over a set ...
Acoustic modeling using deep belief networks
 IEEE Trans. Audio, Speech, Lang. Process
, 2012
"... Abstract—Gaussian mixture models are currently the dominant technique for modeling the emission distribution of hidden Markov models for speech recognition. We show that better phone recognition on the TIMIT dataset can be achieved by replacing Gaussian mixture models by deep neural networks that co ..."
Abstract

Cited by 64 (13 self)
 Add to MetaCart
Abstract—Gaussian mixture models are currently the dominant technique for modeling the emission distribution of hidden Markov models for speech recognition. We show that better phone recognition on the TIMIT dataset can be achieved by replacing Gaussian mixture models by deep neural networks that contain many layers of features and a very large number of parameters. These networks are first pretrained as a multilayer generative model of a window of spectral feature vectors without making use of any discriminative information. Once the generative pretraining has designed the features, we perform discriminative finetuning using backpropagation to adjust the features slightly to make them better at predicting a probability distribution over the states of monophone hidden Markov models. Index Terms—Acoustic modeling, deep belief networks (DBNs), neural networks, phone recognition. I.