Results 1–10 of 39
Multiresolution Markov models for signal and image processing
 Proceedings of the IEEE
, 2002
Cited by 142 (18 self)
Abstract
This paper reviews a significant component of the rich field of statistical multiresolution (MR) modeling and processing. These MR methods have found application and permeated the literature of a widely scattered set of disciplines, and one of our principal objectives is to present a single, coherent picture of this framework. A second goal is to describe how this topic fits into the even larger field of MR methods and concepts, in particular making ties to topics such as wavelets and multigrid methods. A third is to provide several alternate viewpoints for this body of work, as the methods and concepts we describe intersect with a number of other fields. The principal focus of our presentation is the class of MR Markov processes defined on pyramidally organized trees. The attractiveness of these models stems from both the very efficient algorithms they admit and their expressive power and broad applicability. We show how a variety of methods and models relate to this framework, including models for self-similar and 1/f processes. We also illustrate how these methods have been used in practice. We discuss the construction of MR models on trees and show how questions that arise in this context make contact with wavelets, state space modeling of time series, system and parameter identification, and hidden Markov models.
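The efficiency of these tree-structured MR Markov models comes from exact message-passing sweeps over the tree. A minimal sketch (binary states, invented prior, transition matrix, and leaf evidence, none of it from the paper): one bottom-up sweep of likelihood messages yields the root marginal, checked here against brute-force enumeration.

```python
import itertools

# Toy pyramidal tree: root 0 -> children 1, 2; node 1 -> child 3.
# All numerical quantities below are invented for illustration.
children = {0: [1, 2], 1: [3], 2: [], 3: []}
nodes = [0, 1, 2, 3]
root_prior = [0.6, 0.4]                      # P(x_root)
A = [[0.8, 0.2], [0.3, 0.7]]                 # A[parent][child] transition
evidence = {2: [0.2, 0.8], 3: [0.9, 0.1]}    # local likelihoods at leaves

def upward(v):
    """Likelihood of all evidence in v's subtree, as a function of x_v."""
    msg = list(evidence.get(v, [1.0, 1.0]))
    for c in children[v]:
        child_msg = upward(c)
        for s in (0, 1):
            msg[s] *= sum(A[s][t] * child_msg[t] for t in (0, 1))
    return msg

def root_marginal():
    up = upward(0)
    unnorm = [root_prior[s] * up[s] for s in (0, 1)]
    z = sum(unnorm)
    return [p / z for p in unnorm]

def brute_force_root_marginal():
    # Enumerate all joint configurations; cost grows exponentially in the
    # number of nodes, unlike the linear-in-nodes sweep above.
    totals = [0.0, 0.0]
    for x in itertools.product((0, 1), repeat=len(nodes)):
        p = root_prior[x[0]]
        for v in nodes:
            for c in children[v]:
                p *= A[x[v]][x[c]]
        for v, lik in evidence.items():
            p *= lik[x[v]]
        totals[x[0]] += p
    z = sum(totals)
    return [t / z for t in totals]
```

A full smoothing pass would add a downward sweep; the point here is only that all evidence is absorbed in a single bottom-up pass over the tree.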
Discriminative, Generative and Imitative Learning
, 2002
Cited by 42 (1 self)
Abstract
I propose a common framework that combines three different paradigms in machine learning: generative, discriminative and imitative learning. A generative probabilistic distribution is a principled way to model many machine learning and machine perception problems. Therein, one provides domain-specific knowledge in terms of structure and parameter priors over the joint space of variables. Bayesian networks and Bayesian statistics provide a rich and flexible language for specifying this knowledge and subsequently refining it with data and observations. The final result is a distribution that is a good generator of novel exemplars.
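In that spirit, a minimal generative sketch (toy Beta-Bernoulli model and invented data, far simpler than a full Bayesian network): a parameter prior is refined by observations into a posterior, and the posterior predictive then generates novel exemplars.

```python
import random

# Hypothetical prior knowledge: Beta(2, 2) over the bias of a binary feature.
a, b = 2.0, 2.0
data = [1, 1, 0, 1, 1, 0, 1, 1]              # observed exemplars (invented)

# Conjugate refinement: posterior is Beta(a + #ones, b + #zeros).
a_post = a + sum(data)
b_post = b + len(data) - sum(data)

def novel_exemplar(rng):
    """Sample parameters from the posterior, then sample data from them."""
    theta = rng.betavariate(a_post, b_post)
    return 1 if rng.random() < theta else 0

rng = random.Random(0)
samples = [novel_exemplar(rng) for _ in range(20000)]
freq = sum(samples) / len(samples)   # near the posterior mean a_post/(a_post+b_post)
```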
On the complexity of non-projective data-driven dependency parsing
 In Proc. IWPT
, 2007
Cited by 33 (1 self)
Abstract
In this paper we investigate several non-projective parsing algorithms for dependency parsing, providing novel polynomial-time solutions under the assumption that each dependency decision is independent of all the others, called here the edge-factored model. We also investigate algorithms for non-projective parsing that account for non-local information, and present several hardness results. This suggests that it is unlikely that exact non-projective dependency parsing is tractable for any model richer than the edge-factored model.
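Under the edge-factored assumption, a parse's score is just the sum of its arc scores, so exact non-projective parsing reduces to finding a maximum spanning arborescence over the words. A sketch with invented arc scores and a tiny exhaustive search (real parsers use the Chu-Liu-Edmonds algorithm rather than enumeration):

```python
import itertools

# Node 0 is the artificial root; words are 1..3. Arc scores are made up.
n = 4
score = {(0, 1): 5, (0, 2): 1, (0, 3): 1,
         (1, 2): 4, (1, 3): 2,
         (2, 1): 2, (2, 3): 4,
         (3, 1): 1, (3, 2): 1}

def is_tree(heads):
    """heads[d] = head of word d; valid iff every word reaches the root."""
    for d in range(1, n):
        seen, cur = set(), d
        while cur != 0:
            if cur in seen:
                return False
            seen.add(cur)
            cur = heads[cur]
    return True

def best_parse():
    best_score, best_heads = float("-inf"), None
    # Enumerate every head assignment; non-projective trees are included
    # automatically, since no ordering constraint is imposed on arcs.
    for assignment in itertools.product(range(n), repeat=n - 1):
        heads = {d: assignment[d - 1] for d in range(1, n)}
        if any(h == d for d, h in heads.items()) or not is_tree(heads):
            continue
        s = sum(score[(h, d)] for d, h in heads.items())
        if s > best_score:
            best_score, best_heads = s, heads
    return best_heads, best_score
```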
Bayesian structure learning using dynamic programming and MCMC
 In UAI, 2007b
Cited by 24 (1 self)
Abstract
We show how to significantly speed up MCMC sampling of DAG structures by using a powerful non-local proposal based on Koivisto’s dynamic programming (DP) algorithm (11; 10), which computes the exact marginal posterior edge probabilities by analytically summing over orders. Furthermore, we show how sampling in DAG space can avoid subtle biases that are introduced by approaches that work only with orders, such as Koivisto’s DP algorithm and MCMC order samplers (6; 5).
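A miniature of structure MCMC in DAG space (toy edge-weight score, three nodes, and naive single-edge-toggle proposals rather than the paper's DP-based proposal): sample DAGs by Metropolis-Hastings and compare an estimated posterior edge probability against exact enumeration, which is feasible at this scale.

```python
import itertools
import math
import random

N = 3
pairs = [(i, j) for i in range(N) for j in range(N) if i != j]
weight = {(0, 1): 1.0, (1, 0): -1.0, (1, 2): 0.5,
          (2, 1): 0.0, (0, 2): -0.5, (2, 0): 0.2}   # invented edge scores

def acyclic(edges):
    # Kahn-style topological check.
    indeg = {v: 0 for v in range(N)}
    for _, j in edges:
        indeg[j] += 1
    frontier = [v for v in range(N) if indeg[v] == 0]
    removed = 0
    while frontier:
        v = frontier.pop()
        removed += 1
        for a, b in edges:
            if a == v:
                indeg[b] -= 1
                if indeg[b] == 0:
                    frontier.append(b)
    return removed == N

def log_score(edges):
    return sum(weight[e] for e in edges)

# Exact posterior by enumerating all 25 DAGs on 3 labeled nodes.
dags = [frozenset(s) for r in range(len(pairs) + 1)
        for s in itertools.combinations(pairs, r) if acyclic(s)]
z = sum(math.exp(log_score(d)) for d in dags)
exact_p01 = sum(math.exp(log_score(d)) for d in dags if (0, 1) in d) / z

# MCMC: toggle a uniformly chosen ordered pair (a symmetric proposal);
# proposals that would create a cycle are rejected.
rng = random.Random(1)
state = frozenset()
hits, iters = 0, 60000
for _ in range(iters):
    e = rng.choice(pairs)
    prop = state - {e} if e in state else state | {e}
    delta = log_score(prop) - log_score(state)
    if acyclic(prop) and rng.random() < math.exp(min(0.0, delta)):
        state = prop
    if (0, 1) in state:
        hits += 1
mcmc_p01 = hits / iters
```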
Learning with tree-averaged densities and distributions
 NIPS
Cited by 22 (0 self)
Abstract
We utilize the ensemble-of-trees framework, a tractable mixture over a super-exponential number of tree-structured distributions [1], to develop a new model for multivariate density estimation. The model is based on a construction of tree-structured copulas, multivariate distributions with uniform marginals on [0, 1]. By averaging over all possible tree structures, the new model can approximate distributions with complex variable dependencies. We propose an EM algorithm to estimate the parameters of these tree-averaged models for both the real-valued and the categorical case. Based on the tree-averaged framework, we propose a new model for joint precipitation-amounts data on networks of rain stations.
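The tractability of averaging over all tree structures rests on the matrix-tree theorem: the sum over every spanning tree of the product of its edge weights equals a single determinant. A sketch with invented symmetric weights on four variables, checked against brute-force enumeration of all 16 spanning trees of the complete graph:

```python
import itertools

n = 4
# Invented symmetric edge weights, keyed by unordered pair.
w = {frozenset(p): v for p, v in [((0, 1), 2.0), ((0, 2), 1.0), ((0, 3), 3.0),
                                  ((1, 2), 1.5), ((1, 3), 0.5), ((2, 3), 2.5)]}

def laplacian_minor():
    # Weighted Laplacian with row and column 0 deleted.
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                L[i][j] = -w[frozenset((i, j))]
                L[i][i] += w[frozenset((i, j))]
    return [row[1:] for row in L[1:]]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def brute_force():
    total = 0.0
    for subset in itertools.combinations(list(w), n - 1):
        # n-1 edges form a spanning tree iff they connect all n nodes.
        reach, frontier = {0}, [0]
        while frontier:
            v = frontier.pop()
            for e in subset:
                if v in e:
                    for u in e:
                        if u not in reach:
                            reach.add(u)
                            frontier.append(u)
        if len(reach) == n:
            p = 1.0
            for e in subset:
                p *= w[e]
            total += p
    return total
```

The determinant costs O(n^3) even though the number of spanning trees grows as n^(n-2), which is exactly what makes tree-averaged mixtures workable.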
Model averaging for prediction with discrete Bayesian networks
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2004
Cited by 19 (1 self)
Abstract
In this paper we consider the problem of performing Bayesian model averaging over a class of discrete Bayesian network structures consistent with a partial ordering and with bounded indegree k. We show that for N nodes this class contains in the worst case at least Ω(⌊N/2⌋^((N/2)·k)) distinct network structures, and yet model averaging over these structures can be performed using O((N choose k) · N) operations. Furthermore we show that there exists a single Bayesian network that defines a joint distribution over the variables that is equivalent to model averaging over these structures. Although constructing this network is computationally prohibitive, we show that it can be approximated by a tractable network, allowing approximate model-averaged probability calculations to be performed in O(N) time. Our result also leads to an exact and linear-time solution to the problem of averaging over the 2^N possible feature sets in a naïve Bayes model, providing an exact Bayesian solution to the troublesome feature-selection problem for naïve Bayes classifiers. We demonstrate the utility of these techniques in the context of supervised classification, showing empirically that model averaging consistently beats other generative Bayesian-network-based models, even when the generating model is not guaranteed to be a member of the class being averaged over. We characterize the performance over several parameters on simulated and real-world data.
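The gap between exponentially many structures and polynomially many operations comes from one identity: with the order fixed, a sum over structures of a product of per-node terms factorizes into a product of per-node sums over parent sets. A toy check with invented local weights (N = 3, indegree bound k = 1):

```python
import itertools

# Fixed order 0 < 1 < 2; node i may pick at most one parent among predecessors.
parent_choices = {0: [()], 1: [(), (0,)], 2: [(), (0,), (1,)]}
# Invented per-node, per-parent-set weights (standing in for local
# marginal-likelihood terms).
q = {(0, ()): 1.0,
     (1, ()): 0.4, (1, (0,)): 0.7,
     (2, ()): 0.2, (2, (0,)): 0.5, (2, (1,)): 0.9}

# Sum over every structure: one parent-set choice per node.
brute = sum(q[(0, p0)] * q[(1, p1)] * q[(2, p2)]
            for p0, p1, p2 in itertools.product(*parent_choices.values()))

# Factorized form: product over nodes of a sum over parent sets.
fact = 1.0
for node, choices in parent_choices.items():
    fact *= sum(q[(node, ps)] for ps in choices)
```

The brute-force sum touches every structure (6 here, exponentially many in general), while the factorized form touches each (node, parent set) pair once, which is the source of the O((N choose k) · N) bound.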
Exact model averaging with naive Bayesian classifiers
 Proceedings of the Nineteenth International Conference on Machine Learning (ICML 2002
, 2002
Cited by 12 (4 self)
Abstract
The naive classifier is a well-established mathematical model whose simplicity, speed and accuracy have made it a popular choice for classification in AI and engineering. In this paper we show that, given N features of interest, it is possible to perform tractable exact model averaging (MA) over all 2^N possible feature-set models. In fact, we show that it is possible to calculate parameters for a single naive classifier C* such that C* produces predictions equivalent to those obtained by the full model averaging, and we show that C* can be constructed using the same time and space complexity required to construct a single naive classifier with MAP parameters. We present experimental results which show that on average the MA classifier typically outperforms the MAP classifier on simulated data, and we characterize how the relative performance varies with the number of variables, number of training records, and complexity of the generating distribution. Finally, we examine the performance of the MA naive model on the real-world ALARM and HEPAR networks and show that MA improved classification here as well.
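The collapse of 2^N feature-set models into one classifier can be seen in a simplified variant (fixed parameters and independent feature-inclusion priors are assumptions of this sketch, not the paper's full Bayesian treatment): by distributivity, the subset-averaged class score equals a per-feature product, so the averaged classifier is itself naive-Bayes-shaped.

```python
import itertools

# Two classes, three binary features; all numbers invented.
prior = {0: 0.6, 1: 0.4}
like = {  # like[(i, c)] = P(x_i = 1 | c)
    (0, 0): 0.8, (0, 1): 0.3,
    (1, 0): 0.5, (1, 1): 0.9,
    (2, 0): 0.2, (2, 1): 0.6,
}
pi = [0.7, 0.5, 0.4]        # prior inclusion probability of each feature
x = [1, 0, 1]               # observation to classify

def p_feat(i, c, xi):
    p1 = like[(i, c)]
    return p1 if xi == 1 else 1.0 - p1

def p_marg(i, xi):
    # Class-marginal P(x_i), used for features excluded from a model.
    return sum(prior[c] * p_feat(i, c, xi) for c in prior)

def averaged_score_bruteforce(c):
    # Weighted sum over all 2^3 feature subsets.
    total = 0.0
    for include in itertools.product((0, 1), repeat=3):
        w = 1.0
        for i, inc in enumerate(include):
            w *= pi[i] if inc else 1.0 - pi[i]
        s = prior[c]
        for i, inc in enumerate(include):
            s *= p_feat(i, c, x[i]) if inc else p_marg(i, x[i])
        total += w * s
    return total

def averaged_score_single_model(c):
    # The equivalent single classifier: blend each feature's class-conditional
    # with its class-marginal, weighted by its inclusion probability.
    s = prior[c]
    for i in range(3):
        s *= pi[i] * p_feat(i, c, x[i]) + (1.0 - pi[i]) * p_marg(i, x[i])
    return s
```

Constructing the blended parameters costs the same as building one naive classifier, yet the predictions match the full subset average exactly.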
Tractable Bayesian Inference of Time-Series Dependence Structure
Cited by 4 (1 self)
Abstract
We consider the problem of Bayesian inference of graphical structure describing the interactions among multiple vector time series. A directed temporal interaction model is presented which assumes a fixed dependence structure among the time series. Using a conjugate prior over this model’s structure and parameters, we focus our attention on characterizing the exact posterior uncertainty in the structure given data. The model is extended via the introduction of a dynamically evolving latent variable which indexes dependence structures over time. Performing inference using this model yields promising results when analyzing the interaction of multiple tracked moving objects.
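A miniature of structure posteriors for time series (two simulated series, BIC-based approximate scores rather than the paper's exact conjugate marginal likelihoods, all data and coefficients invented): compare "x drives y" against "y is autonomous" and normalize the scores into a posterior over the two structures.

```python
import math
import random

# Simulate two AR(1)-style series where x genuinely drives y.
rng = random.Random(0)
T = 300
x, y = [0.0], [0.0]
for _ in range(T):
    x_new = 0.5 * x[-1] + rng.gauss(0.0, 0.1)
    y_new = 0.3 * y[-1] + 0.8 * x[-1] + rng.gauss(0.0, 0.1)
    x.append(x_new)
    y.append(y_new)

def rss_autonomous():
    # Least-squares fit of y_t = a * y_{t-1}.
    f, t = y[:-1], y[1:]
    a = sum(fi * ti for fi, ti in zip(f, t)) / sum(fi * fi for fi in f)
    return sum((ti - a * fi) ** 2 for fi, ti in zip(f, t))

def rss_coupled():
    # Least-squares fit of y_t = a * y_{t-1} + b * x_{t-1} (2x2 normal eqs).
    f1, f2, t = y[:-1], x[:-1], y[1:]
    s11 = sum(v * v for v in f1)
    s22 = sum(v * v for v in f2)
    s12 = sum(u * v for u, v in zip(f1, f2))
    r1 = sum(u * v for u, v in zip(f1, t))
    r2 = sum(u * v for u, v in zip(f2, t))
    det = s11 * s22 - s12 * s12
    a = (s22 * r1 - s12 * r2) / det
    b = (s11 * r2 - s12 * r1) / det
    return sum((c - a * u - b * v) ** 2 for u, v, c in zip(f1, f2, t))

# BIC-style structure scores; posterior over the two structures by softmax.
n = T
bic_auto = n * math.log(rss_autonomous() / n) + 1 * math.log(n)
bic_coup = n * math.log(rss_coupled() / n) + 2 * math.log(n)
d = max(-700.0, min(700.0, (bic_auto - bic_coup) / 2.0))
p_coupled = 1.0 / (1.0 + math.exp(-d))
```

With more series, the same scoring is applied per candidate parent set; the paper's conjugate prior makes the analogous per-structure marginal likelihoods exact rather than BIC-approximate.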