Results 1–10 of 31
Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification
Abstract

Cited by 82 (1 self)
Robust low-level image features have been proven to be effective representations for a variety of visual recognition tasks such as object recognition and scene classification; but pixels, or even local image patches, carry little semantic meaning. For high-level visual tasks, such low-level image representations are potentially not enough. In this paper, we propose a high-level image representation, called the Object Bank, where an image is represented as a scale-invariant response map of a large number of pre-trained generic object detectors, blind to the testing dataset or visual task. Leveraging the Object Bank representation, superior performance on high-level visual recognition tasks can be achieved with simple off-the-shelf classifiers such as logistic regression and linear SVM. Sparsity algorithms make our representation more efficient and scalable for large scene datasets, and reveal semantically meaningful feature patterns.
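As a sketch of the idea, each detector's response map can be pooled over scales into a single semantic feature per object. Everything below is hypothetical: `fake_detector_response` is a stand-in for real pretrained detectors, and only the shape of the representation is illustrated.

```python
def fake_detector_response(image, obj, scale):
    # Hypothetical stand-in for a pretrained object detector's maximum
    # response on `image` at one scale (a real system would run
    # sliding-window detectors and keep their response maps).
    return ((sum(map(ord, image + obj)) * scale) % 100) / 100.0

def object_bank_features(image, objects, scales):
    # One semantic feature per object: the strongest response over all
    # scales. The resulting vector feeds an off-the-shelf linear classifier.
    return [max(fake_detector_response(image, o, s) for s in scales)
            for o in objects]

feats = object_bank_features("beach.jpg", ["sky", "water", "sailboat"], [1, 2, 4])
```

The point of the design is that `feats` has one interpretable dimension per object, unlike a bag of low-level patch descriptors.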
Bottom-Up Learning of Markov Network Structure
Abstract

Cited by 10 (5 self)
The structure of a Markov network is typically learned using top-down search. At each step, the search specializes a feature by conjoining it to the variable or feature that most improves the score. This is inefficient, testing many feature variations with no support in the data, and it is highly prone to local optima. We propose bottom-up search as an alternative, inspired by the analogous approach in the field of rule induction. Our BLM algorithm starts with each complete training example as a long feature, and repeatedly generalizes a feature to match its k nearest examples by dropping variables. An extensive empirical evaluation demonstrates that BLM is both faster and more accurate than the standard top-down approach, and also outperforms other state-of-the-art methods.
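A minimal sketch of the generalization step described above, assuming binary examples stored as tuples and `None` marking a dropped variable (the encoding and function names are mine, not the paper's):

```python
def distance(feature, example):
    # Number of non-dropped variables on which the feature disagrees.
    return sum(f is not None and f != e for f, e in zip(feature, example))

def matches(feature, example):
    # A feature matches an example if every kept variable agrees.
    return all(f is None or f == e for f, e in zip(feature, example))

def generalize(feature, examples, k):
    # BLM-style step: find the feature's k nearest examples and drop
    # every variable that disagrees with any of them, so the
    # generalized feature matches all k.
    nearest = sorted(examples, key=lambda e: distance(feature, e))[:k]
    return tuple(f if all(f == e[i] for e in nearest) else None
                 for i, f in enumerate(feature))

data = [(1, 1, 0), (1, 0, 0), (0, 1, 1)]
feat = generalize((1, 1, 0), data, k=2)  # drops the disputed middle variable
```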
Learning Markov Network Structure with Decision Trees
Abstract

Cited by 8 (5 self)
Traditional Markov network structure learning algorithms perform a search for globally useful features. However, these algorithms are often slow and prone to finding local optima due to the large space of possible structures. Ravikumar et al. [1] recently proposed the alternative idea of applying L1 logistic regression to learn a set of pairwise features for each variable, which are then combined into a global model. This paper presents the DTSL algorithm, which uses probabilistic decision trees as the local model. Our approach has two significant advantages: it is more efficient, and it is able to discover features that capture more complex interactions among the variables. Our approach can also be seen as a method for converting a dependency network into a consistent probabilistic model. In an extensive empirical evaluation on 13 datasets, our algorithm obtains comparable accuracy to three standard structure learning algorithms while running 1–4 orders of magnitude faster.

Keywords: Markov networks; structure learning; decision trees; probabilistic methods
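A heavily simplified sketch of the local-model-then-merge pattern: a depth-1 "tree" per variable whose root-to-leaf paths become conjunctive features, unioned into a global candidate set. The agreement-count split score is a crude stand-in for the paper's probabilistic decision-tree learning, and all names here are illustrative.

```python
def depth1_features(target, data):
    # Local model for one binary variable: split on the variable whose
    # values agree with the target most often (crude proxy for a real
    # tree-learning score); each root-to-leaf path yields a feature,
    # encoded as a frozenset of (variable, value) tests.
    n_vars = len(data[0])
    split = max((v for v in range(n_vars) if v != target),
                key=lambda v: sum(row[v] == row[target] for row in data))
    return {frozenset({(split, a), (target, b)}) for a in (0, 1) for b in (0, 1)}

def dtsl_features(data):
    # DTSL-style merge: union each variable's local features into one
    # global candidate set for the Markov network.
    feats = set()
    for target in range(len(data[0])):
        feats |= depth1_features(target, data)
    return feats

feats = dtsl_features([(0, 0, 1), (1, 1, 0), (0, 0, 1), (1, 1, 1)])
```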
Learning Efficient Markov Networks
Abstract

Cited by 7 (2 self)
We present an algorithm for learning high-treewidth Markov networks where inference is still tractable. This is made possible by exploiting context-specific independence and determinism in the domain. The class of models our algorithm can learn has the same desirable properties as thin junction trees: polynomial inference, closed-form weight learning, etc., but is much broader. Our algorithm searches for a feature that divides the state space into subspaces where the remaining variables decompose into independent subsets (conditioned on the feature and its negation) and recurses on each subspace/subset of variables until no useful new features can be found. We provide probabilistic performance guarantees for our algorithm under the assumption that the maximum feature length is bounded by a constant k (the treewidth can be much larger) and dependences are of bounded strength. We also propose a greedy version of the algorithm that, while forgoing these guarantees, is much more efficient. Experiments on a variety of domains show that our approach outperforms many state-of-the-art Markov network structure learners.
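The recursion above hinges on the remaining variables splitting into independent subsets once a feature is conditioned on; that split is just the connected components of the conditional dependency graph. A sketch of that one step, with illustrative variable names and a hypothetical `deps` edge list:

```python
def independent_subsets(variables, deps):
    # Connected components via union-find: each component is a subset
    # of variables that can be handled independently once the splitting
    # feature has been conditioned on (the recursion step sketched above).
    parent = {v: v for v in variables}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path compression
            v = parent[v]
        return v
    for a, b in deps:
        parent[find(a)] = find(b)
    comps = {}
    for v in variables:
        comps.setdefault(find(v), set()).add(v)
    return list(comps.values())

# Conditioned on some feature, suppose only these dependences remain:
subsets = independent_subsets({"A", "B", "C", "D"}, [("A", "B"), ("C", "D")])
# two independent subsets remain; the algorithm would recurse on each
```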
Which graphical models are difficult to learn?
Abstract

Cited by 7 (0 self)
We consider the problem of learning the structure of Ising models (pairwise binary Markov random fields) from i.i.d. samples. While several methods have been proposed to accomplish this task, their relative merits and limitations remain somewhat obscure. By analyzing a number of concrete examples, we show that low-complexity algorithms systematically fail when the Markov random field develops long-range correlations. More precisely, this phenomenon appears to be related to the Ising model phase transition (although it does not coincide with it). Given a graph G = (V = [p], E) and a positive parameter θ > 0, the ferromagnetic Ising model on G is the pairwise Markov random field μ_{G,θ}(x) = (1/Z_{G,θ}) ∏_{(i,j)∈E} exp(θ x_i x_j).
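For a concrete feel, the model in the last sentence can be evaluated exactly on a tiny graph by brute-force enumeration of the partition function (spins in {-1, +1}):

```python
from itertools import product
from math import exp

def ising_prob(x, edges, theta, p):
    # mu_{G,theta}(x) = (1/Z) * prod over edges (i,j) of exp(theta*x_i*x_j),
    # with the partition function Z computed by enumerating all 2^p spins
    # (only feasible for tiny p; illustrative, not a learning algorithm).
    def weight(cfg):
        w = 1.0
        for i, j in edges:
            w *= exp(theta * cfg[i] * cfg[j])
        return w
    Z = sum(weight(cfg) for cfg in product((-1, 1), repeat=p))
    return weight(x) / Z

# 3-node chain: aligned spins are more probable when theta > 0
edges = [(0, 1), (1, 2)]
p_aligned = ising_prob((1, 1, 1), edges, theta=0.5, p=3)
p_mixed = ising_prob((1, -1, 1), edges, theta=0.5, p=3)
```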
Learning Exponential Families in High-Dimensions:
Abstract

Cited by 5 (0 self)
The versatility of exponential families, along with their attendant convexity properties, makes them a popular and effective statistical model. A central issue is learning these models in high dimensions when the optimal parameter vector is sparse. This work characterizes a certain strong convexity property of general exponential families, which allows their generalization ability to be quantified. In particular, we show how this property can be used to analyze generic exponential families under L1 regularization.
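The practical workhorse behind L1 regularization is the soft-thresholding (proximal) operator, which zeroes small coordinates and is where the sparsity discussed above comes from. A generic sketch, not code from the paper:

```python
def soft_threshold(v, lam):
    # Proximal operator of lam * ||.||_1: shrink each coordinate toward
    # zero by lam, clipping at zero. Coordinates with |x| <= lam vanish,
    # producing a sparse parameter vector.
    return [(abs(x) - lam) * (1 if x > 0 else -1) if abs(x) > lam else 0.0
            for x in v]

w = soft_threshold([3.0, -0.5, 1.2], lam=1.0)  # middle coordinate is zeroed
```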
Learning Higher-Order Graph Structure with Features by Structure Penalty
Abstract

Cited by 4 (2 self)
In discrete undirected graphical models, the conditional independence of node labels Y is specified by the graph structure. We study the case where there is another input random vector X (e.g. observed features) such that the distribution P(Y | X) is determined by functions of X that characterize the (higher-order) interactions among the Y's. The main contribution of this paper is to learn the graph structure and the functions conditioned on X at the same time. We prove that discrete undirected graphical models with feature X are equivalent to multivariate discrete models. The reparameterization of the potential functions in graphical models by conditional log odds ratios of the latter offers advantages in representation of the conditional independence structure. The functional spaces can be flexibly determined by kernels. Additionally, we impose a Structure Lasso (SLasso) penalty on groups of functions to learn the graph structure. These groups with overlaps are designed to enforce hierarchical function selection. In this way, we are able to shrink higher-order interactions to obtain a sparse graph structure.
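A toy version of an overlapping group penalty of this kind, to make the hierarchy idea concrete. The coefficients and groups are made up; the nesting of the pairwise group inside the higher-order group is what enforces that a triple interaction can only enter the model if its pair does.

```python
from math import sqrt

def slasso_penalty(coeffs, groups, lam=1.0):
    # Group-lasso-style sum over (possibly overlapping) groups of
    # coefficients: zeroing a whole group removes the corresponding
    # interaction from the graph structure.
    return lam * sum(sqrt(sum(coeffs[i] ** 2 for i in g)) for g in groups)

# Hypothetical groups: the pairwise group {0, 1} is nested inside the
# higher-order group {0, 1, 2}.
pen = slasso_penalty([3.0, 4.0, 0.0], groups=[[0, 1], [0, 1, 2]])
```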
Distributed Parameter Estimation via Pseudolikelihood
Abstract

Cited by 4 (0 self)
Estimating statistical models within sensor networks requires distributed algorithms, in which both data and computation are distributed across the nodes of the network. We propose a general approach for distributed learning based on combining local estimators defined by pseudolikelihood components, encompassing a number of combination methods, and provide both theoretical and experimental analysis. We show that simple linear combination or max-voting methods, when combined with second-order information, are statistically competitive with more advanced and costly joint optimization. Our algorithms have many attractive properties, including low communication and computational cost and "anytime" behavior.
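A toy version of the linear-combination idea for a scalar parameter: each node's local estimate is weighted by its precision (inverse variance), a simple stand-in for the full second-order weighting; the numbers are made up.

```python
def combine(estimates, precisions):
    # Precision-weighted linear combination of per-node local estimates.
    # Nodes whose local data constrain the parameter more tightly
    # (higher precision) get proportionally more weight.
    total = sum(precisions)
    return sum(e * w for e, w in zip(estimates, precisions)) / total

theta_hat = combine([1.2, 0.8, 1.0], [10.0, 2.0, 4.0])
```

Only the local estimates and their precisions need to be communicated, which is why this family of methods has such low communication cost.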
Learning Markov Networks With Arithmetic Circuits
Abstract

Cited by 3 (1 self)
Markov networks are an effective way to represent complex probability distributions. However, learning their structure and parameters, or using them to answer queries, is typically intractable. One approach to making learning and inference tractable is to use approximations, such as pseudolikelihood or approximate inference. An alternative approach is to use a restricted class of models in which exact inference is always efficient. Previous work has explored low-treewidth models, models with tree-structured features, and latent variable models. In this paper, we introduce ACMN, the first method for learning efficient Markov networks with arbitrary conjunctive features. The secret to ACMN's greater flexibility is its use of arithmetic circuits, a linear-time inference representation that can handle many high-treewidth models by exploiting local structure. ACMN uses the size of the corresponding arithmetic circuit as a learning bias, allowing it to trade off accuracy and inference complexity. In experiments on 12 standard datasets, the tractable models learned by ACMN are more accurate than both tractable models learned by other algorithms and approximate inference in intractable models.
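To make "linear-time inference representation" concrete, here is a toy arithmetic circuit (sum/product nodes over indicator and parameter leaves) evaluated bottom-up. The tuple encoding and the one-variable circuit are illustrative, not ACMN's actual data structures.

```python
import math

def eval_circuit(node, evidence):
    # Bottom-up evaluation, linear in circuit size. Indicators for
    # unobserved variables evaluate to 1, which sums them out.
    kind = node[0]
    if kind == "param":                # ("param", theta)
        return node[1]
    if kind == "ind":                  # ("ind", var, val)
        _, var, val = node
        return 1.0 if evidence.get(var, val) == val else 0.0
    vals = [eval_circuit(child, evidence) for child in node[1:]]
    return sum(vals) if kind == "+" else math.prod(vals)

# Circuit for a single binary variable X with potentials 2 and 6:
ac = ("+",
      ("*", ("ind", "X", 0), ("param", 2.0)),
      ("*", ("ind", "X", 1), ("param", 6.0)))
Z = eval_circuit(ac, {})               # partition function
pX1 = eval_circuit(ac, {"X": 1}) / Z   # P(X = 1)
```

The same two traversals (with and without evidence) answer any marginal query, which is why circuit size is a sensible learning bias.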
Learning Mixtures of Tree Graphical Models
Abstract

Cited by 3 (0 self)
We consider unsupervised estimation of mixtures of discrete graphical models, where the class variable is hidden and each mixture component can have a potentially different Markov graph structure and parameters over the observed variables. We propose a novel method for estimating the mixture components with provable guarantees. Our output is a tree-mixture model which serves as a good approximation to the underlying graphical model mixture. The sample and computational requirements for our method scale as poly(p, r) for an r-component mixture of p-variate graphical models, for a wide class of models which includes tree mixtures and mixtures over bounded-degree graphs.

Keywords: Graphical models, mixture models, spectral methods, tree approximation.
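The tree-approximation piece of such a pipeline is classical: a Chow-Liu tree is the maximum-weight spanning tree under empirical mutual information. A compact sketch for binary data (the spectral estimation of the mixture components is not shown):

```python
from collections import Counter
from math import log

def mutual_info(data, i, j):
    # Empirical mutual information between columns i and j.
    n = len(data)
    pij = Counter((row[i], row[j]) for row in data)
    pi = Counter(row[i] for row in data)
    pj = Counter(row[j] for row in data)
    return sum((c / n) * log((c / n) / ((pi[a] / n) * (pj[b] / n)))
               for (a, b), c in pij.items())

def chow_liu_tree(data):
    # Kruskal's algorithm with union-find, using mutual information as
    # the edge weight: the classic Chow-Liu tree approximation.
    p = len(data[0])
    edges = sorted(((mutual_info(data, i, j), i, j)
                    for i in range(p) for j in range(i + 1, p)), reverse=True)
    parent = list(range(p))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Columns 0 and 1 are perfectly correlated, column 2 is independent:
tree = chow_liu_tree([(0, 0, 0), (1, 1, 0), (0, 0, 1), (1, 1, 1)])
```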