Results 1 - 10
of
85
Learning in graphical models
, 2004
"... Statistical applications in fields such as bioinformatics, information retrieval, speech processing, image processing and communications often involve large-scale models in which thousands or millions of random variables are linked in complex ways. Graphical models provide a general methodology for ..."
Abstract
-
Cited by 469 (8 self)
- Add to MetaCart
Statistical applications in fields such as bioinformatics, information retrieval, speech processing, image processing and communications often involve large-scale models in which thousands or millions of random variables are linked in complex ways. Graphical models provide a general methodology for approaching these problems, and indeed many of the models developed by researchers in these applied fields are instances of the general graphical model formalism. We review some of the basic ideas underlying graphical models, including the algorithmic ideas that allow graphical models to be deployed in large-scale data analysis problems. We also present examples of graphical models in bioinformatics, error-control coding and language processing. Key words and phrases: Probabilistic graphical models, junction tree algorithm, sum-product algorithm, Markov chain Monte Carlo, variational inference, bioinformatics, error-control coding.
Inducing Features of Random Fields
- IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 1997
"... We present a technique for constructing random fields from a set of training samples. The learning paradigm builds increasingly complex fields by allowing potential functions, or features, that are supported by increasingly large subgraphs. Each feature has a weight that is trained by minimizing the ..."
Abstract
-
Cited by 464 (14 self)
- Add to MetaCart
We present a technique for constructing random fields from a set of training samples. The learning paradigm builds increasingly complex fields by allowing potential functions, or features, that are supported by increasingly large subgraphs. Each feature has a weight that is trained by minimizing the Kullback-Leibler divergence between the model and the empirical distribution of the training data. A greedy algorithm determines how features are incrementally added to the field and an iterative scaling algorithm is used to estimate the optimal values of the weights. The random field models and techniques introduced in this paper differ from those common to much of the computer vision literature in that the underlying random fields are non-Markovian and have a large number of parameters that must be estimated. Relations to other learning approaches, including decision trees, are given. As a demonstration of the method, we describe its application to the problem of automatic word classifica...
Minimax Entropy Principle and Its Application to Texture Modeling
, 1997
"... This article proposes a general theory and methodology, called the minimax entropy principle, for building statistical models for images (or signals) in a variety of applications. This principle consists of two parts. The first is the maximum entropy principle for feature binding (or fusion): for a ..."
Abstract
-
Cited by 165 (33 self)
- Add to MetaCart
This article proposes a general theory and methodology, called the minimax entropy principle, for building statistical models for images (or signals) in a variety of applications. This principle consists of two parts. The first is the maximum entropy principle for feature binding (or fusion): for a certain set of feature statistics, a distribution can be built to bind these feature statistics together by maximizing the entropy over all distributions that reproduce these feature statistics. The second part is the minimum entropy principle for feature selection: among all plausible sets of feature statistics, we choose the set whose maximum entropy distribution has the minimum entropy. Computational and inferential issues in both parts are addressed, in particular, a feature pursuit procedure is proposed for approximately selecting the optimal set of features. The model complexity is restricted because of the sample variation in the observed feature statistics. The minimax entropy principle is applied to texture modeling, where a novel Markov random field (MRF) model, called FRAME (Filter, Random field, And Minimax Entropy), is derived, and encouraging results are obtained in experiments on a variety of texture images. Relationship between our theory and the mechanisms of neural computation is also discussed.
Variational inference for Dirichlet process mixtures
- Bayesian Analysis
, 2005
"... Abstract. Dirichlet process (DP) mixture models are the cornerstone of nonparametric Bayesian statistics, and the development of Monte-Carlo Markov chain (MCMC) sampling methods for DP mixtures has enabled the application of nonparametric Bayesian methods to a variety of practical data analysis prob ..."
Abstract
-
Cited by 90 (12 self)
- Add to MetaCart
Abstract. Dirichlet process (DP) mixture models are the cornerstone of nonparametric Bayesian statistics, and the development of Monte-Carlo Markov chain (MCMC) sampling methods for DP mixtures has enabled the application of nonparametric Bayesian methods to a variety of practical data analysis problems. However, MCMC sampling can be prohibitively slow, and it is important to explore alternatives. One class of alternatives is provided by variational methods, a class of deterministic algorithms that convert inference problems into optimization problems (Opper and Saad 2001; Wainwright and Jordan 2003). Thus far, variational methods have mainly been explored in the parametric setting, in particular within the formalism of the exponential family (Attias 2000; Ghahramani and Beal 2001; Blei et al. 2003). In this paper, we present a variational inference algorithm for DP mixtures. We present experiments that compare the algorithm to Gibbs sampling algorithms for DP mixtures of Gaussians and present an application to a large-scale image analysis problem.
Convexity, Classification, and Risk Bounds
- JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
, 2003
"... Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex surrogate of the 0-1 loss function. The convexity makes these algorithms computationally efficien ..."
Abstract
-
Cited by 73 (11 self)
- Add to MetaCart
Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex surrogate of the 0-1 loss function. The convexity makes these algorithms computationally efficient. The use of a surrogate, however, has statistical consequences that must be balanced against the computational virtues of convexity. To study these issues, we provide a general quantitative relationship between the risk as assessed using the 0-1 loss and the risk as assessed using any nonnegative surrogate loss function. We show that this relationship gives nontrivial upper bounds on excess risk under the weakest possible condition on the loss function: that it satisfy a pointwise form of Fisher consistency for classification. The relationship is based on a simple variational transformation of the loss function that is easy to compute in many applications. We also present a refined version of this result in the case of low noise. Finally, we
A principled approach to detecting surprising events in video
- in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR
, 2005
"... Primates demonstrate unparalleled ability at rapidly orienting towards important events in complex dynamic environments. During rapid guidance of attention and gaze towards potential objects of interest or threats, often there is no time for detailed visual analysis. Thus, heuristic computations are ..."
Abstract
-
Cited by 60 (5 self)
- Add to MetaCart
Primates demonstrate unparalleled ability at rapidly orienting towards important events in complex dynamic environments. During rapid guidance of attention and gaze towards potential objects of interest or threats, often there is no time for detailed visual analysis. Thus, heuristic computations are necessary to locate the most interesting events in quasi real-time. We present a new theory of sensory surprise, which provides a principled and computable shortcut to important information. We develop a model that computes instantaneous low-level surprise at every location in video streams. The algorithm significantly correlates with eye movements of two humans watching complex video clips, including television programs (17,936 frames, 2,152 saccadic gaze shifts). The system allows more sophisticated and time-consuming image analysis to be efficiently focused onto the most surprising subsets of the incoming data. 1.
Additive Models, Boosting, and Inference for Generalized Divergences
- In Proc. 12th Annu. Conf. on Comput. Learning Theory
, 1999
"... We present a framework for designing incremental learning algorithms derived from generalized entropy functionals. Our approach is based on the use of Bregman divergences together with the associated class of additive models constructed using the Legendre transform. A particular one-parameter family ..."
Abstract
-
Cited by 36 (3 self)
- Add to MetaCart
We present a framework for designing incremental learning algorithms derived from generalized entropy functionals. Our approach is based on the use of Bregman divergences together with the associated class of additive models constructed using the Legendre transform. A particular one-parameter family of Bregman divergences is shown to yield a family of loss functions that includes the log-likelihood criterion of logistic regression as a special case, and that closely approximates the exponential loss criterion used in the AdaBoost algorithms of Schapire et al., as the natural parameter of the family varies. We also show how the quadratic approximation of the gain in Bregman divergence results in a weighted least-squares criterion. This leads to a family of incremental learning algorithms that builds upon and extends the recent interpretation of boosting in terms of additive models proposed by Friedman, Hastie, and Tibshirani. 1 Introduction Logistic regression is a widely used statisti...
On the toric algebra of graphical models
, 2006
"... We formulate necessary and sufficient conditions for an arbitrary discrete probability distribution to factor according to an undirected graphical model, or a log-linear model, or other more general exponential models. For decomposable graphical models these conditions are equivalent to a set of con ..."
Abstract
-
Cited by 28 (5 self)
- Add to MetaCart
We formulate necessary and sufficient conditions for an arbitrary discrete probability distribution to factor according to an undirected graphical model, or a log-linear model, or other more general exponential models. For decomposable graphical models these conditions are equivalent to a set of conditional independence statements similar to the Hammersley–Clifford theorem; however, we show that for nondecomposable graphical models they are not. We also show that nondecomposable models can have nonrational maximum likelihood estimates. These results are used to give several novel characterizations of decomposable graphical models.

