Results 21 - 30
of
122
Utilities as random variables: Density estimation and structure discovery
- In UAI
, 2000
"... Decision theory does not traditionally include uncertainty over utility functions. We argue that the a person’s utility value for a given outcome can be treated as we treat other domain attributes: as a random variable with a density function over its possible values. We show that we can apply stati ..."
Abstract
-
Cited by 27 (1 self)
- Add to MetaCart
Decision theory does not traditionally include uncertainty over utility functions. We argue that the a person’s utility value for a given outcome can be treated as we treat other domain attributes: as a random variable with a density function over its possible values. We show that we can apply statistical density estimation techniques to learn such a density function from a database of partially elicited utility functions. In particular, we define a Bayesian learning framework for this problem, assuming the distribution over utilities is a mixture of Gaussians, where the mixture components represent statistically coherent subpopulations. We can also extend our techniques to the problem of discovering generalized additivity structure in the utility functions in the population. We define a Bayesian model selection criterion for utility function structure and a search procedure over structures. The factorization of the utilities in the learned model, and the generalization obtained from density estimation, allows us to provide robust estimates of utilities using a significantly smaller number of utility elicitation questions. We experiment with our technique on synthetic utility data and on a real database of utility functions in the domain of prenatal diagnosis. 1
Variational bayesian learning of directed graphical models with hidden variables, Bayesian Analysis 1
, 2006
"... Abstract. A key problem in statistics and machine learning is inferring suitable structure of a model given some observed data. A Bayesian approach to model comparison makes use of the marginal likelihood of each candidate model to form a posterior distribution over models; unfortunately for most mo ..."
Abstract
-
Cited by 27 (2 self)
- Add to MetaCart
Abstract. A key problem in statistics and machine learning is inferring suitable structure of a model given some observed data. A Bayesian approach to model comparison makes use of the marginal likelihood of each candidate model to form a posterior distribution over models; unfortunately for most models of interest, notably those containing hidden or latent variables, the marginal likelihood is intractable to compute. We present the variational Bayesian (VB) algorithm for directed graphical models, which optimises a lower bound approximation to the marginal likelihood in a procedure similar to the standard EM algorithm. We show that for a large class of models, which we call conjugate exponential, the VB algorithm is a straightforward generalisation of the EM algorithm that incorporates uncertainty over model parameters. In a thorough case study using a small class of bipartite DAGs containing hidden variables, we compare the accuracy of the VB approximation to existing asymptotic-data approximations such as the Bayesian Information Criterion (BIC) and the Cheeseman-Stutz (CS) criterion, and also to a sampling based gold standard, Annealed Importance Sampling (AIS). We find that the VB algorithm is empirically superior to CS and BIC, and much faster than AIS. Moreover, we prove that a VB approximation can always be constructed in such a way that guarantees it to be more accurate than the CS approximation.
Variable Selection for Model-Based Clustering
- Journal of the American Statistical Association
, 2006
"... We consider the problem of variable or feature selection for model-based clustering. We recast the problem of comparing two nested subsets of variables as a model comparison problem, and address it using approximate Bayes factors. We develop a greedy search algorithm for finding a local optimum in m ..."
Abstract
-
Cited by 27 (4 self)
- Add to MetaCart
We consider the problem of variable or feature selection for model-based clustering. We recast the problem of comparing two nested subsets of variables as a model comparison problem, and address it using approximate Bayes factors. We develop a greedy search algorithm for finding a local optimum in model space. The resulting method selects variables (or features), the number of clusters, and the clustering model simultaneously. We applied the method to several simulated and real examples, and found that removing irrelevant variables often improved performance. Compared to methods based on all the variables, our variable selection method consistently yielded more accurate estimates of the number of clusters, and lower classification error rates, as well as more parsimonious clustering models and easier visualization of results.
Computational methods for dynamic graphs
- Journal of Computational and Graphical Statistics
, 2003
"... ..."
Learning mixtures of DAG models
, 1997
"... We describe computationally efficient methods for learning mixtures in which each component is a directed acyclic graphical model (mixtures of DAGs or MDAGs). We argue that simple search-and-score algorithms are infeasible for a variety of problems, and introduce a feasible approach in which paramet ..."
Abstract
-
Cited by 24 (2 self)
- Add to MetaCart
We describe computationally efficient methods for learning mixtures in which each component is a directed acyclic graphical model (mixtures of DAGs or MDAGs). We argue that simple search-and-score algorithms are infeasible for a variety of problems, and introduce a feasible approach in which parameter and structure search is interleaved and expected data is treated as real data. Our approach can be viewed as a combination of (1) the Cheeseman–Stutz asymptotic approximation for model posterior probability and (2) the Expectation–Maximization algorithm. We evaluate our procedure for selecting among MDAGs on synthetic and real examples. 1
Learning the dimensionality of hidden variables
- In UAI ’01
, 2001
"... A serious problem in learning probabilistic models is the presence of hidden variables. These variables are not observed, yet interact with several of the observed variables. Detecting hidden variables poses two problems: determining the relations to other variables in the model and determining the ..."
Abstract
-
Cited by 23 (3 self)
- Add to MetaCart
A serious problem in learning probabilistic models is the presence of hidden variables. These variables are not observed, yet interact with several of the observed variables. Detecting hidden variables poses two problems: determining the relations to other variables in the model and determining the number of states of the hidden variable. In this paper, we address the latter problem in the context of Bayesian networks. We describe an approach that utilizes a score-based agglomerative state-clustering. As we show, this approach allows us to efficiently evaluate models with a range of cardinalities for the hidden variable. We show how to extend this procedure to deal with multiple interacting hidden variables. We demonstrate the effectiveness of this approach by evaluating it on synthetic and real-life data. We show that our approach learns models with hidden variables that generalize better and have better structure than previous approaches. 1
A simple constraint-based algorithm for efficiently mining observational databases for causal relationships
- Data Mining and Knowledge Discovery
, 1997
"... Abstract. This paper presents a simple, efficient computer-based method for discovering causal relationships from databases that contain observational data. Observational data is passively observed, as contrasted with experimental data. Most of the databases available for data mining are observation ..."
Abstract
-
Cited by 21 (1 self)
- Add to MetaCart
Abstract. This paper presents a simple, efficient computer-based method for discovering causal relationships from databases that contain observational data. Observational data is passively observed, as contrasted with experimental data. Most of the databases available for data mining are observational. There is great potential for mining such databases to discover causal relationships. We illustrate how observational data can constrain the causal relationships among measured variables, sometimes to the point that we can conclude that one variable is causing another variable. The presentation here is based on a constraint-based approach to causal discovery. A primary purpose of this paper is to present the constraint-based causal discovery method in the simplest possible fashion in order to (1) readily convey the basic ideas that underlie more complex constraint-based causal discovery techniques, and (2) permit interested readers to rapidly program and apply the method to their own databases, as a start toward using more elaborate causal discovery algorithms.
Learning hidden variable networks: The information bottleneck approach
- Journal of Machine Learning Research
, 2005
"... A central challenge in learning probabilistic graphical models is dealing with domains that involve hidden variables. The common approach for learning model parameters in such domains is the expectation maximization (EM) algorithm. This algorithm, however, can easily get trapped in suboptimal local ..."
Abstract
-
Cited by 19 (0 self)
- Add to MetaCart
A central challenge in learning probabilistic graphical models is dealing with domains that involve hidden variables. The common approach for learning model parameters in such domains is the expectation maximization (EM) algorithm. This algorithm, however, can easily get trapped in suboptimal local maxima. Learning the model structure is even more challenging. The structural EM algorithm can adapt the structure in the presence of hidden variables, but usually performs poorly without prior knowledge about the cardinality and location of the hidden variables. In this work, we present a general approach for learning Bayesian networks with hidden variables that overcomes these problems. The approach builds on the information bottleneck framework of Tishby et al. (1999). We start by proving formal correspondence between the information bottleneck objective and the standard parametric EM functional. We then use this correspondence to construct a learning algorithm that combines an information-theoretic smoothing term with a continuation procedure. Intuitively, the algorithm bypasses local maxima and achieves superior solutions by following a continuous path from a solution of, an easy and smooth, target function, to a solution of the desired likelihood function. As we show, our algorithmic framework allows learning of the parameters as well as the structure of a network. In addition, it also allows us to introduce new hidden variables during model selection and learn their cardinality. We demonstrate the performance of our procedure on several challenging real-life data sets.
Generalized Model Selection For Unsupervised Learning In High Dimensions
- Proceedings of Neural Information Processing Systems
, 1999
"... In this paper we describe an approach to model selection in unsupervised learning. This approach determines both the feature set and the number of clusters. To this end we first derive an objective function that explicitly incorporates this generalization. We then evaluate two schemes for model sele ..."
Abstract
-
Cited by 14 (2 self)
- Add to MetaCart
In this paper we describe an approach to model selection in unsupervised learning. This approach determines both the feature set and the number of clusters. To this end we first derive an objective function that explicitly incorporates this generalization. We then evaluate two schemes for model selection - one using this objective function (a Bayesian estimation scheme that selects the best model structure using the marginal or integrated likelihood) and the second based on a technique using a cross-validated likelihood criterion. In the first scheme, for a particular application in document clustering, we derive a closed-form solution of the integrated likelihood by assuming an appropriate form of the likelihood function and prior. Extensive experiments are carried out to ascertain the validity of both approaches and all results are verified by comparison against ground truth. In our experiments the Bayesian scheme using our objective function gave better results tha n cross-validatio...

