Results 1  10
of
702
Learning in graphical models
 STATISTICAL SCIENCE
, 2004
"... Statistical applications in fields such as bioinformatics, information retrieval, speech processing, image processing and communications often involve largescale models in which thousands or millions of random variables are linked in complex ways. Graphical models provide a general methodology for ..."
Abstract

Cited by 655 (10 self)
 Add to MetaCart
(Show Context)
Statistical applications in fields such as bioinformatics, information retrieval, speech processing, image processing and communications often involve largescale models in which thousands or millions of random variables are linked in complex ways. Graphical models provide a general methodology for approaching these problems, and indeed many of the models developed by researchers in these applied fields are instances of the general graphical model formalism. We review some of the basic ideas underlying graphical models, including the algorithmic ideas that allow graphical models to be deployed in largescale data analysis problems. We also present examples of graphical models in bioinformatics, errorcontrol coding and language processing.
Dynamic Bayesian Networks: Representation, Inference and Learning
, 2002
"... Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have bee ..."
Abstract

Cited by 598 (3 self)
 Add to MetaCart
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs
and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linearGaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.
In particular, the main novel technical contributions of this thesis are as follows: a way of representing
Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T 3), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of
applying RaoBlackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization
and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.
Maxmargin Markov networks
, 2003
"... In typical classification tasks, we seek a function which assigns a label to a single object. Kernelbased approaches, such as support vector machines (SVMs), which maximize the margin of confidence of the classifier, are the method of choice for many such tasks. Their popularity stems both from the ..."
Abstract

Cited by 461 (12 self)
 Add to MetaCart
In typical classification tasks, we seek a function which assigns a label to a single object. Kernelbased approaches, such as support vector machines (SVMs), which maximize the margin of confidence of the classifier, are the method of choice for many such tasks. Their popularity stems both from the ability to use highdimensional feature spaces, and from their strong theoretical guarantees. However, many realworld tasks involve sequential, spatial, or structured data, where multiple labels must be assigned. Existing kernelbased methods ignore structure in the problem, assigning labels independently to each object, losing much useful information. Conversely, probabilistic graphical models, such as Markov networks, can represent correlations between labels, by exploiting problem structure, but cannot handle highdimensional feature spaces, and lack strong theoretical generalization guarantees. In this paper, we present a new framework that combines the advantages of both approaches: Maximum margin Markov (M 3) networks incorporate both kernels, which efficiently deal with highdimensional features, and the ability to capture correlations in structured data. We present an efficient algorithm for learning M 3 networks based on a compact quadratic program formulation. We provide a new theoretical bound for generalization in structured domains. Experiments on the task of handwritten character recognition and collective hypertext classification demonstrate very significant gains over previous approaches. 1
Incorporating nonlocal information into information extraction systems by gibbs sampling
 In ACL
, 2005
"... Most current statistical natural language processing models use only local features so as to permit dynamic programming in inference, but this makes them unable to fully account for the long distance structure that is prevalent in language use. We show how to solve this dilemma with Gibbs sampling, ..."
Abstract

Cited by 425 (20 self)
 Add to MetaCart
Most current statistical natural language processing models use only local features so as to permit dynamic programming in inference, but this makes them unable to fully account for the long distance structure that is prevalent in language use. We show how to solve this dilemma with Gibbs sampling, a simple Monte Carlo method used to perform approximate inference in factored probabilistic models. By using simulated annealing in place of Viterbi decoding in sequence models such as HMMs, CMMs, and CRFs, it is possible to incorporate nonlocal structure while preserving tractable inference. We use this technique to augment an existing CRFbased information extraction system with longdistance dependency models, enforcing label consistency and extraction template consistency constraints. This technique results in an error reduction of up to 9 % over stateoftheart systems on two established information extraction tasks. 1
FeatureRich PartofSpeech Tagging with a Cyclic Dependency Network
 IN PROCEEDINGS OF HLTNAACL
, 2003
"... We present a new partofspeech tagger that demonstrates the following ideas: (i) explicit use of both preceding and following tag contexts via a dependency network representation, (ii) broad use of lexical features, including jointly conditioning on multiple consecutive words, (iii) effective ..."
Abstract

Cited by 378 (19 self)
 Add to MetaCart
We present a new partofspeech tagger that demonstrates the following ideas: (i) explicit use of both preceding and following tag contexts via a dependency network representation, (ii) broad use of lexical features, including jointly conditioning on multiple consecutive words, (iii) effective use of priors in conditional loglinear models, and (iv) finegrained modeling of unknown word features. Using these ideas together, the resulting tagger gives a 97.24% accuracy on the Penn Treebank WSJ, an error reduction of 4.4% on the best previous single automatically learned tagging result.
ModelDriven Data Acquisition in Sensor Networks
 IN VLDB
, 2004
"... Declarative queries are proving to be an attractive paradigm for interacting with networks of wireless sensors. The metaphor that "the sensornet is a database" is problematic, however, because sensors do not exhaustively represent the data in the real world. In order to map the raw sensor ..."
Abstract

Cited by 364 (32 self)
 Add to MetaCart
(Show Context)
Declarative queries are proving to be an attractive paradigm for interacting with networks of wireless sensors. The metaphor that "the sensornet is a database" is problematic, however, because sensors do not exhaustively represent the data in the real world. In order to map the raw sensor readings onto physical reality, a model of that reality is required to complement the readings. In this paper, we enrich interactive sensor querying with statistical modeling techniques. We demonstrate that such models can help provide answers that are both more meaningful, and, by introducing approximations with probabilistic confidences, significantly more efficient to compute in both time and energy. Utilizing the combination of a model and live data acquisition raises the challenging optimization problem of selecting the best sensor readings to acquire, balancing the increase in the confidence of our answer against the communication and data acquisition costs in the network. We describe an exponential time algorithm for finding the optimal solution to this optimization problem, and a polynomialtime heuristic for identifying solutions that perform well in practice. We evaluate our approach on several realworld sensornetwork data sets, taking into account the real measured data and communication quality, demonstrating that our modelbased approach provides a highfidelity representation of the real phenomena and leads to significant performance gains versus traditional data acquisition techniques.
Convergent Treereweighted Message Passing for Energy Minimization
 ACCEPTED TO IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (PAMI), 2006. ABSTRACTACCEPTED TO IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (PAMI)
, 2006
"... Algorithms for discrete energy minimization are of fundamental importance in computer vision. In this paper we focus on the recent technique proposed by Wainwright et al. [33] treereweighted maxproduct message passing (TRW). It was inspired by the problem of maximizing a lower bound on the energy ..."
Abstract

Cited by 324 (10 self)
 Add to MetaCart
(Show Context)
Algorithms for discrete energy minimization are of fundamental importance in computer vision. In this paper we focus on the recent technique proposed by Wainwright et al. [33] treereweighted maxproduct message passing (TRW). It was inspired by the problem of maximizing a lower bound on the energy. However, the algorithm is not guaranteed to increase this bound it may actually go down. In addition, TRW does not always converge. We develop a modification of this algorithm which we call sequential treereweighted message passing. Its main property is that the bound is guaranteed not to decrease. We also give a weak tree agreement condition which characterizes local maxima of the bound with respect to TRW algorithms. We prove that our algorithm has a limit point that achieves weak tree agreement. Finally, we show that our algorithm requires half as much memory as traditional message passing approaches. Experimental results demonstrate that on certain synthetic and real problems our algorithm outperforms both the ordinary belief propagation and treereweighted algorithm in [33]. In addition, on stereo problems with Potts interactions we obtain a lower energy than graph cuts.
The Bayes Net Toolbox for MATLAB
 Computing Science and Statistics
, 2001
"... The Bayes Net Toolbox (BNT) is an opensource Matlab package for directed graphical models. BNT supports many kinds of nodes (probability distributions), exact and approximate inference, parameter and structure learning, and static and dynamic models. BNT is widely used in teaching and research: the ..."
Abstract

Cited by 186 (1 self)
 Add to MetaCart
(Show Context)
The Bayes Net Toolbox (BNT) is an opensource Matlab package for directed graphical models. BNT supports many kinds of nodes (probability distributions), exact and approximate inference, parameter and structure learning, and static and dynamic models. BNT is widely used in teaching and research: the web page has received over 28,000 hits since May 2000. In this paper, we discuss a broad spectrum of issues related to graphical models (directed and undirected), and describe, at a highlevel, how BNT was designed to cope with them all. We also compare BNT to other software packages for graphical models, and to the nascent OpenBayes effort.
A New Class of Upper Bounds on the Log Partition Function
 In Uncertainty in Artificial Intelligence
, 2002
"... Bounds on the log partition function are important in a variety of contexts, including approximate inference, model fitting, decision theory, and large deviations analysis [11, 5, 4]. We introduce a new class of upper bounds on the log partition function, based on convex combinations of distribution ..."
Abstract

Cited by 163 (28 self)
 Add to MetaCart
Bounds on the log partition function are important in a variety of contexts, including approximate inference, model fitting, decision theory, and large deviations analysis [11, 5, 4]. We introduce a new class of upper bounds on the log partition function, based on convex combinations of distributions in the exponential domain, that is applicable to an arbitrary undirected graphical model. In the special case of convex combinations of treestructured distributions, we obtain a family of variational problems, similar to the Bethe free energy, but distinguished by the following desirable properties: (i) they are convex, and have a unique global minimum; and (ii) the global minimum gives an upper bound on the log partition function. The global minimum is defined by stationary conditions very similar to those defining xed points of belief propagation (BP) or treebased reparameterization [see 13, 14]. As with BP fixed points, the elements of the minimizing argument can be used as approximations to the marginals of the original model. The analysis described here can be extended to structures of higher treewidth (e.g., hypertrees), thereby making connections with more advanced approximations (e.g., Kikuchi and variants [15, 10]).
MAP estimation via agreement on trees: Messagepassing and linear programming
, 2002
"... We develop and analyze methods for computing provably optimal maximum a posteriori (MAP) configurations for a subclass of Markov random fields defined on graphs with cycles. By decomposing the original distribution into a convex combination of treestructured distributions, we obtain an upper bound ..."
Abstract

Cited by 140 (7 self)
 Add to MetaCart
(Show Context)
We develop and analyze methods for computing provably optimal maximum a posteriori (MAP) configurations for a subclass of Markov random fields defined on graphs with cycles. By decomposing the original distribution into a convex combination of treestructured distributions, we obtain an upper bound on the optimal value of the original problem (i.e., the log probability of the MAP assignment) in terms of the combined optimal values of the tree problems. We prove that this upper bound is tight if and only if all the tree distributions share an optimal configuration in common. An important implication is that any such shared configuration must also be a MAP configuration for the original distribution. Next we develop two approaches to attempting to obtain tight upper bounds: (a) a treerelaxed linear program (LP), which is derived from the Lagrangian dual of the upper bounds; and (b) a treereweighted maxproduct messagepassing algorithm that is related to but distinct from the maxproduct algorithm. In this way, we establish a connection between a certain LP relaxation of the modefinding problem, and a reweighted form of the maxproduct (minsum) messagepassing algorithm.