A New Class of Upper Bounds on the Log Partition Function
 In Uncertainty in Artificial Intelligence
, 2002
Cited by 220 (33 self)
Bounds on the log partition function are important in a variety of contexts, including approximate inference, model fitting, decision theory, and large deviations analysis [11, 5, 4]. We introduce a new class of upper bounds on the log partition function, based on convex combinations of distributions in the exponential domain, that is applicable to an arbitrary undirected graphical model. In the special case of convex combinations of tree-structured distributions, we obtain a family of variational problems, similar to the Bethe free energy, but distinguished by the following desirable properties: (i) they are convex, and have a unique global minimum; and (ii) the global minimum gives an upper bound on the log partition function. The global minimum is defined by stationary conditions very similar to those defining fixed points of belief propagation (BP) or tree-based reparameterization [see 13, 14]. As with BP fixed points, the elements of the minimizing argument can be used as approximations to the marginals of the original model. The analysis described here can be extended to structures of higher treewidth (e.g., hypertrees), thereby making connections with more advanced approximations (e.g., Kikuchi and variants [15, 10]).
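The bound described in this abstract can be illustrated with a short derivation. The notation below (potential vector \phi, parameter \theta, spanning trees T with weights \mu(T)) is assumed here for illustration and is not taken from the listing; the only fact used is that the log partition function is convex in the exponential parameters.

```latex
% Log partition function of an exponential-family model (notation assumed):
A(\theta) = \log \sum_{x} \exp\langle \theta, \phi(x) \rangle .

% Write \theta as a convex combination of tree-structured parameters \theta(T):
\theta = \sum_{T} \mu(T)\, \theta(T), \qquad \mu(T) \ge 0, \quad \sum_{T} \mu(T) = 1 .

% Convexity of A (Jensen's inequality) then gives the upper bound:
A(\theta) = A\Big( \sum_{T} \mu(T)\, \theta(T) \Big) \;\le\; \sum_{T} \mu(T)\, A\big(\theta(T)\big) .
```

Each term A(\theta(T)) is the log partition function of a tree-structured model, so it can be computed exactly; minimizing the right-hand side over the admissible \theta(T) tightens the bound.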
Machine recognition of human activities: A survey
, 2008
Cited by 213 (0 self)
The past decade has witnessed a rapid proliferation of video cameras in all walks of life and has resulted in a tremendous explosion of video content. Several applications such as content-based video annotation and retrieval, highlight extraction and video summarization require recognition of the activities occurring in the video. The analysis of human activities in videos is an area with increasingly important consequences from security and surveillance to entertainment and personal archiving. Several challenges at various levels of processing (robustness against errors in low-level processing, view- and rate-invariant representations at mid-level processing, and semantic representation of human activities at higher-level processing) make this problem hard to solve. In this review paper, we present a comprehensive survey of efforts in the past couple of decades to address the problems of representation, recognition, and learning of human activities from video and related applications. We discuss the problem at two major levels of complexity: 1) “actions” and 2) “activities.” “Actions” are characterized by simple motion patterns typically executed by a single human. “Activities” are more complex and involve coordinated actions among a small number of humans. We will discuss several approaches and classify them according to their ability to handle varying degrees of complexity as interpreted above. We begin with a discussion of approaches to model the simplest of action classes known as atomic or primitive actions that do not require sophisticated dynamical modeling. Then, methods to model actions with more complex dynamics are discussed. The discussion then leads naturally to methods for higher-level representation of complex activities.
Pictorial structures revisited: People detection and articulated pose estimation
 In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009)
, 2009
Cited by 210 (17 self)
Non-rigid object detection and articulated pose estimation are two related and challenging problems in computer vision. Numerous models have been proposed over the years and often address different special cases, such as pedestrian detection or upper body pose estimation in TV footage. This paper shows that such specialization may not be necessary, and proposes a generic approach based on the pictorial structures framework. We show that the right selection of components for both appearance and spatial modeling is crucial for general applicability and overall performance of the model. The appearance of body parts is modeled using densely sampled shape context descriptors and discriminatively trained AdaBoost classifiers. Furthermore, we interpret the normalized margin of each classifier as likelihood in a generative model. Non-Gaussian relationships between parts are represented as Gaussians in the coordinate system of the joint between parts. The marginal posterior of each part is inferred using belief propagation. We demonstrate that such a model is equally suitable for both detection and pose estimation tasks, outperforming the state of the art on three recently proposed datasets.
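As a rough illustration of how belief propagation yields per-part marginals on a tree-structured model, here is a minimal sum-product sketch in Python. It assumes discrete states and hand-picked potentials; the function name `tree_marginals`, the part names, and the potentials are hypothetical and are not the paper's model, which uses image-based likelihoods and Gaussian spatial terms.

```python
def tree_marginals(n_states, unary, edges, pairwise):
    """Exact node marginals on a tree via sum-product message passing.

    unary:    dict node -> list of nonnegative weights, one per state
    edges:    list of (u, v) pairs forming a tree
    pairwise: dict (u, v) -> function(state_u, state_v) -> nonnegative weight
    """
    neighbors = {v: [] for v in unary}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    def pot(u, xu, v, xv):
        # Look up the edge potential regardless of the stored orientation.
        if (u, v) in pairwise:
            return pairwise[(u, v)](xu, xv)
        return pairwise[(v, u)](xv, xu)

    cache = {}

    def message(src, dst):
        # Message src -> dst: sum over src states of the unary weight, the
        # edge potential, and all messages arriving at src except from dst.
        if (src, dst) not in cache:
            incoming = [message(w, src) for w in neighbors[src] if w != dst]
            msg = []
            for xd in range(n_states):
                total = 0.0
                for xs in range(n_states):
                    term = unary[src][xs] * pot(src, xs, dst, xd)
                    for m in incoming:
                        term *= m[xs]
                    total += term
                msg.append(total)
            cache[(src, dst)] = msg
        return cache[(src, dst)]

    marginals = {}
    for v in unary:
        belief = list(unary[v])
        for w in neighbors[v]:
            m = message(w, v)
            belief = [b * mx for b, mx in zip(belief, m)]
        z = sum(belief)
        marginals[v] = [b / z for b in belief]
    return marginals


# Hypothetical three-part model: a torso connected to a head and an arm.
unary = {"torso": [1.0, 2.0], "head": [3.0, 1.0], "arm": [1.0, 1.0]}
edges = [("torso", "head"), ("torso", "arm")]
agree = lambda a, b: 2.0 if a == b else 1.0
print(tree_marginals(2, unary, edges, {e: agree for e in edges}))
```

On a tree the messages are exact, so the returned beliefs match the marginals obtained by brute-force enumeration.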
An Introduction to Factor Graphs
 IEEE SIGNAL PROCESSING MAG., JAN. 2004
, 2004
Cited by 197 (36 self)
A large variety of algorithms in coding, signal processing, and artificial intelligence may be viewed as instances of the summary-product algorithm (or belief/probability
Regular and Irregular Progressive Edge-Growth Tanner Graphs
 IEEE TRANS. INFORM. THEORY
, 2003
Cited by 192 (0 self)
We propose a general method for constructing Tanner graphs having a large girth by progressively establishing edges or connections between symbol and check nodes in an edge-by-edge manner, called progressive edge-growth (PEG) construction. Lower bounds on the girth of PEG Tanner graphs and on the minimum distance of the resulting low-density parity-check (LDPC) codes are derived in terms of parameters of the graphs. The PEG construction attains essentially the same girth as Gallager's explicit construction for regular graphs, both of which meet or exceed the Erdős-Sachs bound. Asymptotic analysis of a relaxed version of the PEG construction is presented. We describe an empirical approach using a variant of the "downhill simplex" search algorithm to design irregular PEG graphs for short codes with fewer than a thousand bits, complementing the design approach of "density evolution" for larger codes. Encoding of LDPC codes based on the PEG construction is also investigated. We show how to exploit the PEG principle to obtain LDPC codes that allow linear time encoding. We also investigate regular and irregular LDPC codes using PEG Tanner graphs but allowing the symbol nodes to take values over GF(q), q > 2. Analysis and simulation demonstrate that one can obtain better performance with increasing field size, which contrasts with previous observations.
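The edge-by-edge idea can be sketched in a few lines of Python. This is a simplified sketch for a regular symbol degree, not the authors' implementation: the function name `peg_tanner_graph`, the breadth-first distance computation, and the tie-breaking rule (lowest check degree, then lowest index) are assumptions made for illustration.

```python
from collections import deque


def peg_tanner_graph(n_symbols, n_checks, symbol_degree):
    """Greedy progressive edge-growth sketch.

    Returns a list where entry s is the set of check nodes joined to symbol s.
    Each new edge goes to a check node that is unreachable from the symbol,
    or else as far away as possible, which keeps new cycles long.
    """
    if symbol_degree > n_checks:
        raise ValueError("symbol_degree cannot exceed n_checks")
    sym_adj = [set() for _ in range(n_symbols)]
    chk_adj = [set() for _ in range(n_checks)]

    def check_depths(root):
        # Breadth-first search from a symbol node; maps each reachable
        # check node to the depth at which it is first encountered.
        depth = {}
        seen = {root}
        frontier = deque([(root, 0)])
        while frontier:
            s, d = frontier.popleft()
            for c in sym_adj[s]:
                if c not in depth:
                    depth[c] = d
                    for s2 in chk_adj[c]:
                        if s2 not in seen:
                            seen.add(s2)
                            frontier.append((s2, d + 1))
        return depth

    for s in range(n_symbols):
        for _ in range(symbol_degree):
            depth = check_depths(s)
            unreachable = [c for c in range(n_checks) if c not in depth]
            if unreachable:
                candidates = unreachable
            else:
                far = max(depth.values())
                candidates = [c for c, d in depth.items()
                              if d == far and c not in sym_adj[s]]
            # Assumed tie-break: lowest current check degree, then index.
            c = min(candidates, key=lambda k: (len(chk_adj[k]), k))
            sym_adj[s].add(c)
            chk_adj[c].add(s)
    return sym_adj


graph = peg_tanner_graph(n_symbols=6, n_checks=4, symbol_degree=2)
for s, checks in enumerate(graph):
    print(s, sorted(checks))
```

Two symbol nodes sharing two check nodes would form a cycle of length 4, so counting shared checks between symbol pairs is a quick way to inspect the short cycles of the result.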
Combining top-down and bottom-up segmentation
 In Proceedings IEEE workshop on Perceptual Organization in Computer Vision, CVPR
, 2004
Cited by 190 (2 self)
In this work we show how to combine bottom-up and top-down approaches into a single figure-ground segmentation process. This process provides accurate delineation of object boundaries that cannot be achieved by either the top-down or bottom-up approach alone. The top-down approach uses object representation learned from examples to detect an object in a given input image and provide an approximation to its figure-ground segmentation. The bottom-up approach uses image-based criteria to define coherent groups of pixels that are likely to belong together to either the figure or the background part. The combination provides a final segmentation that draws on the relative merits of both approaches: the result is as close as possible to the top-down approximation, but is also constrained by the bottom-up process to be consistent with significant image discontinuities. We construct a global cost function that represents these top-down and bottom-up requirements. We then show how the global minimum of this function can be efficiently found by applying the sum-product algorithm. This algorithm also provides a confidence map that can be used to identify image regions where additional top-down or bottom-up information may further improve the segmentation. Our experiments show that the results derived from the algorithm are superior to results given by a pure top-down or pure bottom-up approach. The scheme has broad applicability, enabling the combined use of a range of existing bottom-up and top-down segmentations.
Low-density parity-check codes based on finite geometries: A rediscovery and new results
 IEEE Trans. Inform. Theory
, 2001
Cited by 182 (7 self)
This paper presents a geometric approach to the construction of low-density parity-check (LDPC) codes. Four classes of LDPC codes are constructed based on the lines and points of Euclidean and projective geometries over finite fields. Codes of these four classes have good minimum distances and their Tanner graphs have girth 6. Finite-geometry LDPC codes can be decoded in various ways, ranging from low to high decoding complexity and from reasonably good to very good performance. They perform very well with iterative decoding. Furthermore, they can be put in either cyclic or quasi-cyclic form. Consequently, their encoding can be achieved in linear time and implemented with simple feedback shift registers. This advantage is not shared by other LDPC codes in general and is important in practice. Finite-geometry LDPC codes can be extended and shortened in various ways to obtain other good LDPC codes. Several techniques of extension and shortening are presented. Long extended finite-geometry LDPC codes have been constructed and they achieve a performance only a few tenths of a decibel away from the Shannon theoretical limit with iterative decoding.
A Scalable Method for Multiagent Constraint Optimization
Cited by 179 (18 self)
We present in this paper a new, complete method for distributed constraint optimization, based on dynamic programming. It is a utility propagation method, inspired by the sum-product algorithm, which is correct only for tree-shaped constraint networks. In this paper, we show how to extend that algorithm to arbitrary topologies using a pseudotree arrangement of the problem graph. Our algorithm requires a linear number of messages, whose maximal size depends on the induced width along the particular pseudotree chosen. We compare our algorithm with backtracking algorithms, and present experimental results. For some problem types we report orders of magnitude fewer messages, and the ability to deal with arbitrarily large problems. Our algorithm is formulated for optimization problems, but can be easily applied to satisfaction problems as well.
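A minimal sketch of utility propagation for the tree-shaped case only is given below in Python. It is not the paper's pseudotree algorithm: it runs centrally rather than as a distributed protocol, and the function name `solve_tree_dcop`, the data layout, and the example utilities are assumptions made for illustration. Each node sends one utility message up to its parent and receives one value message down, which matches the linear message count mentioned above.

```python
def solve_tree_dcop(domains, parent, utility):
    """Maximize total utility on a tree-shaped constraint network.

    domains: dict var -> list of values
    parent:  dict var -> parent var, or None for the root
    utility: dict (child, parent) -> function(child_value, parent_value)
    Returns (best_total_utility, assignment).
    """
    children = {v: [] for v in domains}
    root = None
    for v, p in parent.items():
        if p is None:
            root = v
        else:
            children[p].append(v)

    best_value = {}  # best_value[v][parent_value] -> optimal value of v

    def util_message(v):
        # Bottom-up phase: for each parent value, the best utility that the
        # subtree rooted at v can achieve, with v projected out.
        msgs = [util_message(c) for c in children[v]]
        p = parent[v]
        parent_values = domains[p] if p is not None else [None]
        message = {}
        best_value[v] = {}
        for pv in parent_values:
            best = None
            for val in domains[v]:
                total = sum(m[val] for m in msgs)
                if p is not None:
                    total += utility[(v, p)](val, pv)
                if best is None or total > best[0]:
                    best = (total, val)
            message[pv] = best[0]
            best_value[v][pv] = best[1]
        return message

    root_message = util_message(root)

    assignment = {}

    def value_phase(v, pv):
        # Top-down phase: each node picks its optimal value given its parent.
        assignment[v] = best_value[v][pv]
        for c in children[v]:
            value_phase(c, assignment[v])

    value_phase(root, None)
    return root_message[None], assignment


# Hypothetical chain a - b - c: b prefers to differ from a, c to match b.
domains = {"a": [0, 1], "b": [0, 1], "c": [0, 1]}
parent = {"a": None, "b": "a", "c": "b"}
utility = {
    ("b", "a"): lambda b, a: 1 if b != a else 0,
    ("c", "b"): lambda c, b: 2 if c == b else 0,
}
print(solve_tree_dcop(domains, parent, utility))
```

The size of each utility message here is one entry per parent value; on a pseudotree the messages would instead be indexed by several ancestor variables, which is where the dependence on induced width comes from.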
Collective classification in network data
, 2008
Cited by 174 (33 self)
Numerous real-world applications produce networked data such as web data (hypertext documents connected via hyperlinks) and communication networks (people connected via communication links). A recent focus in machine learning research has been to extend traditional machine learning classification techniques to classify nodes in such data. In this report, we attempt to provide a brief introduction to this area of research and how it has progressed during the past decade. We introduce four of the most widely used inference algorithms for classifying networked data and empirically compare them on both synthetic and real-world data.
Square Root SAM: Simultaneous localization and mapping via square root information smoothing
 International Journal of Robotics Research
, 2006
Cited by 144 (39 self)
Solving the SLAM problem is one way to enable a robot to explore, map, and navigate in a previously unknown environment. We investigate smoothing approaches as a viable alternative to extended Kalman filter-based solutions to the problem. In particular, we look at approaches that factorize either the associated information matrix or the measurement Jacobian into square root form. Such techniques have several significant advantages over the EKF: they are faster yet exact, they can be used in either batch or incremental mode, are better equipped to deal with nonlinear process and measurement models, and yield the entire robot trajectory, at lower cost for a large class of SLAM problems. In addition, in an indirect but dramatic way, column ordering heuristics automatically exploit the locality inherent in the geographic nature of the SLAM problem. In this paper we present the theory underlying these methods, along with an interpretation of factorization in terms of the graphical model associated with the SLAM problem. We present both simulation results and actual SLAM experiments in large-scale environments that underscore the potential of these methods as an alternative to EKF-based approaches.
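The two factorizations mentioned in this abstract can be written out for a linearized least-squares problem. The symbols below (measurement Jacobian A, right-hand side b, state update \delta, upper-triangular factor R) are assumed notation for illustration; the identities themselves are standard linear algebra.

```latex
% Linearized smoothing problem over the whole trajectory and map:
\delta^{\ast} = \arg\min_{\delta} \; \lVert A\delta - b \rVert_2^2 .

% Option 1: QR factorization of the measurement Jacobian.
A = Q \begin{bmatrix} R \\ 0 \end{bmatrix}, \qquad
\begin{bmatrix} d \\ e \end{bmatrix} = Q^{\top} b, \qquad
\lVert A\delta - b \rVert_2^2 = \lVert R\delta - d \rVert_2^2 + \lVert e \rVert_2^2 ,
\quad\text{so}\quad R\,\delta^{\ast} = d .

% Option 2: Cholesky factorization of the information matrix.
A^{\top} A = R^{\top} R, \qquad
R^{\top} y = A^{\top} b, \qquad
R\,\delta^{\ast} = y .
```

In both cases R is a square root of the information matrix and the solution follows by triangular substitution; reordering the columns of A changes how sparse R is, which is the role of the column ordering heuristics mentioned above.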