Multiresolution markov models for signal and image processing
- Proceedings of the IEEE
, 2002
- Cited by 83 (11 self)
This paper reviews a significant component of the rich field of statistical multiresolution (MR) modeling and processing. These MR methods have found application and permeated the literature of a widely scattered set of disciplines, and one of our principal objectives is to present a single, coherent picture of this framework. A second goal is to describe how this topic fits into the even larger field of MR methods and concepts, in particular making ties to topics such as wavelets and multigrid methods. A third is to provide several alternate viewpoints for this body of work, as the methods and concepts we describe intersect with a number of other fields. The principal focus of our presentation is the class of MR Markov processes defined on pyramidally organized trees. The attractiveness of these models stems from both the very efficient algorithms they admit and their expressive power and broad applicability. We show how a variety of methods and models relate to this framework, including models for self-similar and 1/f processes. We also illustrate how these methods have been used in practice. We discuss the construction of MR models on trees and show how questions that arise in this context make contact with wavelets, state space modeling of time series, system and parameter identification, and hidden
On the complexity of non-projective data-driven dependency parsing
- In Proc. IWPT
, 2007
- Cited by 22 (0 self)
In this paper we investigate several non-projective parsing algorithms for dependency parsing, providing novel polynomial-time solutions under the assumption that each dependency decision is independent of all the others, called here the edge-factored model. We also investigate algorithms for non-projective parsing that account for nonlocal information, and present several hardness results. This suggests that it is unlikely that exact non-projective dependency parsing is tractable for any model richer than the edge-factored model.
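The edge-factored assumption in the abstract above can be illustrated with a small sketch. It is not any of the paper's algorithms: it is an exhaustive search, exponential in sentence length, that only shows what "each dependency decision is independent" means for scoring. The names `SCORES`, `tree_score`, `is_tree` and `best_parse`, and the score values, are hypothetical.

```python
from itertools import product

# Hypothetical arc scores: SCORES[h][d] is the score of head h -> dependent d.
# Index 0 is an artificial root; words are 1..3.
SCORES = [
    [0, 1, 10, 1],
    [0, 0, 1, 10],
    [0, 10, 0, 1],
    [0, 1, 1, 0],
]

def tree_score(heads, scores):
    """Edge-factored score: a plain sum of independent per-arc scores."""
    return sum(scores[h][d] for d, h in heads.items())

def is_tree(heads):
    """True if every word reaches the root (0) without revisiting a node."""
    for start in heads:
        seen = set()
        node = start
        while node != 0:
            if node in seen:
                return False  # cycle (this also catches a word heading itself)
            seen.add(node)
            node = heads[node]
    return True

def best_parse(scores):
    """Try every head assignment; keep the best-scoring dependency tree."""
    n = len(scores) - 1
    best_score, best_heads = float("-inf"), None
    for choice in product(range(n + 1), repeat=n):
        heads = dict(zip(range(1, n + 1), choice))
        if not is_tree(heads):
            continue
        score = tree_score(heads, scores)
        if score > best_score:
            best_score, best_heads = score, heads
    return best_score, best_heads

score, heads = best_parse(SCORES)
print(score, heads)
```

With these scores the best tree attaches word 2 to the root, word 1 to word 2, and word 3 to word 1. The arc from word 1 to word 3 spans word 2, which word 1 does not dominate, so the result is non-projective.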
Discriminative, Generative and Imitative Learning
, 2002
- Cited by 21 (1 self)
I propose a common framework that combines three different paradigms in machine learning: generative, discriminative and imitative learning. A generative probabilistic distribution is a principled way to model many machine learning and machine perception problems. Therein, one provides domain-specific knowledge in terms of structure and parameter priors over the joint space of variables. Bayesian networks and Bayesian statistics provide a rich and flexible language for specifying this knowledge and subsequently refining it with data and observations. The final result is a distribution that is a good generator of novel exemplars.
Model averaging for prediction with discrete Bayesian networks
- Journal of Machine Learning Research
, 2004
- Cited by 13 (0 self)
In this paper we consider the problem of performing Bayesian model-averaging over a class of discrete Bayesian network structures consistent with a partial ordering and with bounded in-degree k. We show that for N nodes this class contains in the worst case at least $\Omega\left(\binom{N/2}{k}^{N/2}\right)$ distinct network structures, and yet model averaging over these structures can be performed using $O\left(\binom{N}{k} \cdot N\right)$ operations. Furthermore we show that there exists a single Bayesian network that defines a joint distribution over the variables that is equivalent to model averaging over these structures. Although constructing this network is computationally prohibitive, we show that it can be approximated by a tractable network, allowing approximate model-averaged probability calculations to be performed in O(N) time. Our result also leads to an exact and linear-time solution to the problem of averaging over the $2^N$ possible feature sets in a naïve Bayes model, providing an exact Bayesian solution to the troublesome feature-selection problem for naïve Bayes classifiers. We demonstrate the utility of these techniques in the context of supervised classification, showing empirically that model averaging consistently beats other generative Bayesian-network-based models, even when the generating model is not guaranteed to be a member of the class being averaged over. We characterize the performance over several parameters on simulated and real-world data.
Learning with tree-averaged densities and distributions
- NIPS
- Cited by 7 (0 self)
We utilize the ensemble of trees framework, a tractable mixture over a superexponential number of tree-structured distributions [1], to develop a new model for multivariate density estimation. The model is based on a construction of tree-structured copulas: multivariate distributions with uniform marginals on [0, 1]. By averaging over all possible tree structures, the new model can approximate distributions with complex variable dependencies. We propose an EM algorithm to estimate the parameters for these tree-averaged models for both the real-valued and the categorical case. Based on the tree-averaged framework, we propose a new model for joint precipitation amounts data on networks of rain stations.
Exact model averaging with naive Bayesian classifiers
- Proceedings of the Nineteenth International Conference on Machine Learning (ICML 2002)
, 2002
- Cited by 6 (1 self)
The naive classifier is a well-established mathematical model whose simplicity, speed and accuracy have made it a popular choice for classification in AI and engineering. In this paper we show that, given N features of interest, it is possible to perform tractable exact model averaging (MA) over all $2^N$ possible feature-set models. In fact, we show that it is possible to calculate parameters for a single naive classifier $C^*$ such that $C^*$ produces predictions equivalent to those obtained by the full model-averaging, and we show that $C^*$ can be constructed using the same time and space complexity required to construct a single naive classifier with MAP parameters. We present experimental results which show that on average the MA classifier typically outperforms the MAP classifier on simulated data, and we characterize how the relative performance varies with number of variables, number of training records, and complexity of the generating distribution. Finally, we examine the performance of the MA naive model on the real-world ALARM and HEPAR networks and show MA improved classification here as well.
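One way to see why averaging over exponentially many feature sets can be tractable is the distributive law: if each feature contributes one term when included and another when excluded, the sum over all subsets of the product of terms equals a product of per-feature two-term sums. The sketch below checks that identity numerically. It assumes, hypothetically, that prior and likelihood both factorize per feature; it does not reproduce the paper's construction of the single classifier, and the names and numbers are illustrative.

```python
from itertools import product
from math import isclose, prod

def brute_force_average(with_feature, without_feature):
    """Sum, over all 2^N include/exclude patterns, of the product of terms."""
    n = len(with_feature)
    total = 0.0
    for mask in product([False, True], repeat=n):
        total += prod(
            with_feature[i] if mask[i] else without_feature[i]
            for i in range(n)
        )
    return total

def factored_average(with_feature, without_feature):
    """The same quantity in O(N): a product of per-feature sums."""
    return prod(a + b for a, b in zip(with_feature, without_feature))

# Hypothetical per-feature terms (e.g. prior-weighted likelihood contributions).
included = [0.2, 0.5, 0.1]
excluded = [0.3, 0.4, 0.6]

print(brute_force_average(included, excluded))
print(factored_average(included, excluded))
```

The brute-force version visits 2^N patterns, while the factored version touches each feature once, which is the kind of saving the abstract describes.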
Bayesian structure learning using dynamic programming and MCMC
- In UAI, 2007
- Cited by 5 (1 self)
We show how to significantly speed up MCMC sampling of DAG structures by using a powerful non-local proposal based on Koivisto’s dynamic programming (DP) algorithm (11; 10), which computes the exact marginal posterior edge probabilities by analytically summing over orders. Furthermore, we show how sampling in DAG space can avoid subtle biases that are introduced by approaches that work only with orders, such as Koivisto’s DP algorithm and MCMC order samplers (6; 5).
Tree Dependent Identically Distributed Learning
- Cited by 3 (2 self)
We view a dataset of points or samples as having an underlying, yet unspecified, tree structure and exploit this assumption in learning problems. Such a tree structure assumption is equivalent to treating a dataset as being tree dependent identically distributed or tdid and preserves exchangeability. This extends traditional iid assumptions on data since each datum can be sampled sequentially after being conditioned on a parent. Instead of hypothesizing a single best tree structure, we infer a richer Bayesian posterior distribution over tree structures from a given dataset. We compute this posterior over (directed or undirected) trees via the Laplacian of conditional distributions between pairs of input data points. This posterior distribution is efficiently normalized by the Laplacian’s determinant and also facilitates novel maximum likelihood estimators, efficient expectations and other useful inference computations. In a classification setting, tdid assumptions yield a criterion that maximizes the determinant of a matrix of conditional distributions between pairs of input and output points. This leads to a novel classification algorithm we call the Maximum Determinant Machine. Unsupervised and supervised experiments are shown.
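Normalizing a distribution over trees with a Laplacian determinant rests on the matrix-tree theorem: for symmetric edge weights, the sum over all spanning trees of the product of edge weights equals the determinant of the graph Laplacian with one row and column removed. The sketch below checks this for the undirected case only, against brute-force enumeration. The weight matrix `W` is a hypothetical stand-in for the pairwise terms in the abstract, and all function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

def determinant(matrix):
    """Determinant by Gaussian elimination over exact fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def laplacian_partition(weights):
    """Sum over spanning trees of edge-weight products, via one determinant."""
    n = len(weights)
    lap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                lap[i][j] = -weights[i][j]
                lap[i][i] += weights[i][j]
    reduced = [row[1:] for row in lap[1:]]  # drop row 0 and column 0
    return determinant(reduced)

def is_spanning_tree(n, edges):
    """n - 1 edges with no cycle form a spanning tree (union-find check)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True

def brute_force_partition(weights):
    """Enumerate every set of n - 1 edges and add up the spanning trees."""
    n = len(weights)
    total = Fraction(0)
    for subset in combinations(combinations(range(n), 2), n - 1):
        if is_spanning_tree(n, subset):
            term = Fraction(1)
            for u, v in subset:
                term *= weights[u][v]
            total += term
    return total

# Hypothetical symmetric pairwise weights on four points.
W = [[0, 1, 2, 3], [1, 0, 4, 5], [2, 4, 0, 6], [3, 5, 6, 0]]
print(laplacian_partition(W), brute_force_partition(W))
```

The determinant costs cubic time in the number of points, whereas the number of spanning trees on n labelled points grows as n^(n-2), so enumeration is feasible only for very small examples like this one.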

