Results 1-10 of 24
Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification
Abstract

Cited by 69 (1 self)
Robust low-level image features have been proven to be effective representations for a variety of visual recognition tasks such as object recognition and scene classification; but pixels, or even local image patches, carry little semantic meaning. For high-level visual tasks, such low-level image representations are potentially not enough. In this paper, we propose a high-level image representation, called the Object Bank, where an image is represented as a scale-invariant response map of a large number of pre-trained generic object detectors, blind to the testing dataset or visual task. Leveraging the Object Bank representation, superior performance on high-level visual recognition tasks can be achieved with simple off-the-shelf classifiers such as logistic regression and linear SVM. Sparsity algorithms make our representation more efficient and scalable for large scene datasets, and reveal semantically meaningful feature patterns.
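The encoding described above can be sketched as follows. The random 3×3 templates stand in for Object Bank's pre-trained object detectors, and the strided downsampling and 2×2 max-pooling grid are simplifications assumed for illustration:

```python
import numpy as np

def correlate2d_valid(img, template):
    """Plain 'valid' cross-correlation, standing in for a detector's scoring."""
    th, tw = template.shape
    h, w = img.shape
    out = np.empty((h - th + 1, w - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + th, j:j + tw] * template).sum()
    return out

def object_bank_features(image, templates, scales=(1, 2, 4), grid=2):
    """Toy Object Bank encoding: for each 'detector' and scale, compute a
    response map, max-pool it over a coarse spatial grid, and stack all
    pooled responses into one feature vector. The real Object Bank uses
    ~200 pre-trained object detectors; random templates are stand-ins."""
    feats = []
    for t in templates:
        for s in scales:
            resp = correlate2d_valid(image[::s, ::s], t)  # crude scale pyramid
            h, w = resp.shape
            gh, gw = max(h // grid, 1), max(w // grid, 1)
            for i in range(grid):
                for j in range(grid):
                    cell = resp[i * gh:(i + 1) * gh, j * gw:(j + 1) * gw]
                    feats.append(cell.max() if cell.size else 0.0)
    return np.array(feats)

rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))
templates = [rng.standard_normal((3, 3)) for _ in range(4)]
x = object_bank_features(image, templates)  # 4 detectors * 3 scales * 4 cells
```

Any off-the-shelf linear classifier can then be trained directly on vectors like `x`.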
Bottom-Up Learning of Markov Network Structure
Abstract

Cited by 9 (5 self)
The structure of a Markov network is typically learned using top-down search. At each step, the search specializes a feature by conjoining it to the variable or feature that most improves the score. This is inefficient, testing many feature variations with no support in the data, and highly prone to local optima. We propose bottom-up search as an alternative, inspired by the analogous approach in the field of rule induction. Our BLM algorithm starts with each complete training example as a long feature, and repeatedly generalizes a feature to match its k nearest examples by dropping variables. An extensive empirical evaluation demonstrates that BLM is both faster and more accurate than the standard top-down approach, and also outperforms other state-of-the-art methods.
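The generalization step at the core of BLM can be sketched roughly as follows; the dictionary encoding of features and examples, and tie-breaking by list order, are assumptions for illustration:

```python
def generalize_feature(feature, examples, k=2):
    """One BLM-style generalization step (a sketch, not the full algorithm):
    drop from `feature` every variable on which it disagrees with any of
    its k nearest training examples (Hamming distance on its variables)."""
    def dist(ex):
        return sum(ex[v] != val for v, val in feature.items())
    nearest = sorted(examples, key=dist)[:k]
    return {v: val for v, val in feature.items()
            if all(ex[v] == val for ex in nearest)}

examples = [
    {"A": 1, "B": 1, "C": 0},
    {"A": 1, "B": 0, "C": 0},
    {"A": 1, "B": 1, "C": 1},
]
# Start from the first example as a fully specific feature ...
f = dict(examples[0])
# ... and generalize it to cover its 2 nearest examples: B disagrees
# between them, so B is dropped, leaving the conjunction A=1, C=0.
g = generalize_feature(f, examples, k=2)
```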
Learning Markov Network Structure with Decision Trees
Abstract

Cited by 7 (5 self)
Traditional Markov network structure learning algorithms perform a search for globally useful features. However, these algorithms are often slow and prone to finding local optima due to the large space of possible structures. Ravikumar et al. [1] recently proposed the alternative idea of applying L1-regularized logistic regression to learn a set of pairwise features for each variable, which are then combined into a global model. This paper presents the DTSL algorithm, which uses probabilistic decision trees as the local model. Our approach has two significant advantages: it is more efficient, and it is able to discover features that capture more complex interactions among the variables. Our approach can also be seen as a method for converting a dependency network into a consistent probabilistic model. In an extensive empirical evaluation on 13 datasets, our algorithm obtains comparable accuracy to three standard structure learning algorithms while running 1-4 orders of magnitude faster. Keywords: Markov networks; structure learning; decision trees; probabilistic methods
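A sketch of how a per-variable probabilistic decision tree could be flattened into conjunctive features, the conversion step a DTSL-style method relies on; the tuple encoding of trees is invented here for illustration:

```python
def tree_to_features(tree, path=()):
    """Flatten one variable's probabilistic decision tree into the
    conjunctive features that get merged into the global model.
    Tree nodes are ('split', var, left, right); leaves are ('leaf', p),
    where p is P(target = 1 | path conditions). Toy encoding."""
    if tree[0] == "leaf":
        return [(path, tree[1])]
    _, var, left, right = tree
    return (tree_to_features(left, path + ((var, 0),)) +
            tree_to_features(right, path + ((var, 1),)))

# Local model for a target variable X predicted from Y and Z:
tree = ("split", "Y",
        ("leaf", 0.9),
        ("split", "Z", ("leaf", 0.4), ("leaf", 0.1)))
# Each feature pairs a conjunction over parent variables with P(X=1 | path).
feats = tree_to_features(tree)
```

Deeper splits yield longer conjunctions, which is how this local model captures higher-order interactions that pairwise logistic regression cannot.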
Which graphical models are difficult to learn?
Abstract

Cited by 7 (0 self)
We consider the problem of learning the structure of Ising models (pairwise binary Markov random fields) from i.i.d. samples. While several methods have been proposed to accomplish this task, their relative merits and limitations remain somewhat obscure. By analyzing a number of concrete examples, we show that low-complexity algorithms systematically fail when the Markov random field develops long-range correlations. More precisely, this phenomenon appears to be related to the Ising model phase transition (although it does not coincide with it).
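The setting is the ferromagnetic Ising model µ_{G,θ}(x) ∝ ∏_{(i,j)∈E} exp(θ x_i x_j) with x_i ∈ {−1, +1}. A brute-force sketch on a 4-cycle shows the long-range correlations the abstract refers to growing with θ (exact enumeration, so only tiny graphs are feasible):

```python
from itertools import product
import math

def ising_prob(edges, n, theta):
    """Exact ferromagnetic Ising distribution on n spins by enumeration:
    mu(x) is proportional to prod over edges (i, j) of exp(theta*x_i*x_j)."""
    states = list(product([-1, 1], repeat=n))
    weights = [math.exp(theta * sum(x[i] * x[j] for i, j in edges))
               for x in states]
    Z = sum(weights)  # partition function
    return states, [w / Z for w in weights]

# A 4-cycle: nodes 0 and 2 are NOT adjacent, yet their correlation
# E[x0 * x2] grows with theta -- the long-range effect that trips up
# low-complexity structure learners.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

def corr(theta):
    states, probs = ising_prob(edges, 4, theta)
    return sum(p * x[0] * x[2] for x, p in zip(states, probs))
```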
Learning Efficient Markov Networks
Abstract

Cited by 6 (2 self)
We present an algorithm for learning high-treewidth Markov networks where inference is still tractable. This is made possible by exploiting context-specific independence and determinism in the domain. The class of models our algorithm can learn has the same desirable properties as thin junction trees: polynomial inference, closed-form weight learning, etc., but is much broader. Our algorithm searches for a feature that divides the state space into subspaces where the remaining variables decompose into independent subsets (conditioned on the feature and its negation) and recurses on each subspace/subset of variables until no useful new features can be found. We provide probabilistic performance guarantees for our algorithm under the assumption that the maximum feature length is bounded by a constant k (the treewidth can be much larger) and dependences are of bounded strength. We also propose a greedy version of the algorithm that, while forgoing these guarantees, is much more efficient. Experiments on a variety of domains show that our approach outperforms many state-of-the-art Markov network structure learners.
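The decomposition step can be illustrated with a toy check: conditioned on a feature, some pairwise dependencies become inactive, and the independent subsets are then just the connected components of the residual interaction graph. The variable names and active-edge lists below are invented for illustration:

```python
def components(variables, active_edges):
    """Connected components of the interaction graph that remains active
    in a given context; each component is a conditionally independent
    subset of variables that can be recursed on separately."""
    adj = {v: set() for v in variables}
    for a, b in active_edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for v in variables:
        if v in seen:
            continue
        stack, comp = [v], set()
        while stack:                      # depth-first traversal
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            comp.add(u)
            stack.extend(adj[u] - seen)
        comps.append(comp)
    return comps

vars_ = ["A", "B", "C", "D"]
# Suppose that conditioned on some feature F the (B, C) interaction is
# inactive: the graph splits and {A, B}, {C, D} decouple.
active_given_F = [("A", "B"), ("C", "D")]
comps = components(vars_, active_given_F)
```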
Learning Exponential Families in High-Dimensions:
Abstract

Cited by 6 (0 self)
The versatility of exponential families, along with their attendant convexity properties, makes them a popular and effective statistical model. A central issue is learning these models in high dimensions when the optimal parameter vector is sparse. This work characterizes a certain strong convexity property of general exponential families, which allows their generalization ability to be quantified. In particular, we show how this property can be used to analyze generic exponential families under L1 regularization.
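In generic form (our notation, not necessarily the paper's), the L1-regularized estimator for an exponential family is

```latex
p_{\theta}(x) = h(x)\,\exp\!\big(\langle \theta, \phi(x)\rangle - A(\theta)\big),
\qquad
\hat{\theta} \in \arg\min_{\theta}\;
  -\frac{1}{n}\sum_{i=1}^{n}\log p_{\theta}(x_i) \;+\; \lambda_n\,\|\theta\|_1 ,
```

where φ is the sufficient statistic and A the log-partition function. A strong convexity property here amounts to a lower bound on the curvature of A(θ) near the true parameter, which is what allows the generalization error of the sparse estimate to be quantified.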
Learning Higher-Order Graph Structure with Features by Structure Penalty
Abstract

Cited by 4 (2 self)
In discrete undirected graphical models, the conditional independence of node labels Y is specified by the graph structure. We study the case where there is another input random vector X (e.g. observed features) such that the distribution P(Y | X) is determined by functions of X that characterize the (higher-order) interactions among the Y's. The main contribution of this paper is to learn the graph structure and the functions conditioned on X at the same time. We prove that discrete undirected graphical models with feature X are equivalent to multivariate discrete models. The reparameterization of the potential functions in graphical models by conditional log odds ratios of the latter offers advantages in representation of the conditional independence structure. The functional spaces can be flexibly determined by kernels. Additionally, we impose a Structure Lasso (SLasso) penalty on groups of functions to learn the graph structure. These groups with overlaps are designed to enforce hierarchical function selection. In this way, we are able to shrink higher-order interactions to obtain a sparse graph structure.
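Schematically (our notation, assumed for illustration), an overlapping-group penalty of the kind described takes the form

```latex
\min_{f}\; -\ell(f \mid X, Y)\;+\;
\lambda \sum_{V} \Big\| \big( f_{U} \big)_{U \,\supseteq\, V} \Big\|_{2},
```

where f_U is the function of X attached to the interaction among the labels indexed by U. Because the group for V contains every f_U with U ⊇ V, zeroing that group removes V's interaction together with all higher-order interactions containing it; this overlap is what enforces hierarchical selection and shrinks higher-order terms toward a sparse graph.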
Learning Networks of Stochastic Differential Equations
Abstract

Cited by 3 (2 self)
We consider linear models for stochastic dynamics. To any such model can be associated a network (namely a directed graph) describing which degrees of freedom interact under the dynamics. We tackle the problem of learning such a network from observation of the system trajectory over a time interval T. We analyze the ℓ1-regularized least squares algorithm and, in the setting in which the underlying network is sparse, we prove performance guarantees that are uniform in the sampling rate as long as this is sufficiently high. This result substantiates the notion of a well-defined 'time complexity' for the network inference problem.
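A minimal sketch of the setting: simulate a toy linear SDE dx = Ax dt + dW with a sparse, invented drift matrix A, Euler-discretize it, and run ℓ1-regularized least squares of one node's increments on the states. A generic ISTA solver stands in for whatever solver is assumed in the paper:

```python
import numpy as np

def lasso_ista(X, y, lam, steps=500):
    """Generic ℓ1-regularized least squares via proximal gradient (ISTA):
    min_a (1/2n)||y - X a||^2 + lam ||a||_1."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n   # Lipschitz constant of the gradient
    a = np.zeros(p)
    for _ in range(steps):
        a -= X.T @ (X @ a - y) / (n * L)                    # gradient step
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0) # soft-threshold
    return a

# Toy network: row i of A lists which nodes drive node i.
rng = np.random.default_rng(0)
A = np.array([[-1.0,  0.5,  0.0],
              [ 0.0, -1.0,  0.0],
              [ 0.0,  0.0, -1.0]])
dt, T = 0.01, 20000
x = np.zeros(3)
states, incs = [], []
for _ in range(T):                      # Euler-Maruyama discretization
    dx = A @ x * dt + np.sqrt(dt) * rng.standard_normal(3)
    states.append(x.copy())
    incs.append(dx)
    x = x + dx
X = np.array(states)
y = np.array(incs)[:, 0] / dt           # node 0's rescaled increments
a0 = lasso_ista(X, y, lam=0.02)         # sparse estimate of A[0, :]
```

Repeating the regression for each node recovers the full drift matrix row by row; the guarantees in the paper concern how long a trajectory (and how fine a sampling rate) this needs.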
Learning Scale-Free Networks by Reweighted ℓ1 Regularization
Abstract

Cited by 3 (0 self)
Methods for ℓ1-type regularization have been widely used in Gaussian graphical model selection tasks to encourage sparse structures. However, often we would like to include more structural information than mere sparsity. In this work, we focus on learning so-called “scale-free” models, a common feature of many real-world networks. We replace the ℓ1 regularization with a power law regularization and optimize the objective function by a sequence of iteratively reweighted ℓ1 regularization problems, where the regularization coefficients of nodes with high degree are reduced, encouraging the appearance of hubs with high degree. Our method can be easily adapted to improve any existing ℓ1-based methods, such as graphical lasso, neighborhood selection, and JSRM when the underlying networks are believed to be scale free or have dominating hubs. We demonstrate in simulation that our method significantly outperforms a baseline ℓ1 method at learning scale-free networks and hub networks, and also illustrate its behavior on gene expression data.
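The reweighting step can be sketched in a few lines. A log-type penalty Σ_i log(‖θ_i‖₁ + ε), the usual surrogate for a power-law prior, is majorized by a weighted ℓ1 norm whose per-node weights shrink as a node's current edge mass grows; the toy numbers below are invented:

```python
def reweight(theta, eps=0.1):
    """One reweighting step: the new ℓ1 weight of node i is
    1 / (||theta_i||_1 + eps), so nodes that already carry many strong
    edges (emerging hubs) are penalized less on the next iteration.
    theta: dict mapping node -> list of its current edge weights."""
    return {i: 1.0 / (sum(abs(w) for w in ws) + eps)
            for i, ws in theta.items()}

# A node with heavy edges vs. a near-isolated one:
theta = {"hub": [0.8, 0.6, 0.5], "leaf": [0.1]}
w = reweight(theta)
# The hub gets a smaller ℓ1 weight than the leaf, so its degree can keep
# growing -- the mechanism that encourages scale-free structure.
```

Alternating this update with any weighted-ℓ1 solver (graphical lasso, neighborhood selection, etc.) gives the iteratively reweighted scheme the abstract describes.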
Multivariate Bernoulli Distribution Models
, 2012
Abstract

Cited by 2 (2 self)
First and most importantly, I would like to express my deepest gratitude toward my advisor, Professor Grace Wahba. Her guidance and encouragement throughout my PhD study of various statistical machine learning methods was the key factor in the success of this dissertation. Grace is a brilliant and passionate statistician, and her insightful ideas in both statistical theory and applications inspire me. It is a great honor and privilege to have had the opportunity to work closely with and learn from her. This work is also the product of collaboration with a number of researchers. In particular, I would like to thank Professor Stephen Wright from the Department of Computer Science for his guidance in computation. Without him, the proposed models in this thesis could not have been solved with efficient optimization techniques. In addition, I am grateful to the other professors on my thesis committee. I benefited from Professor Sündüz Keles' expertise in biostatistics and her valuable ideas in the Thursday group. Professors Peter Qian and Sijian Wang raised questions with deep perception and helped greatly improve the thesis. I am also grateful to Professors Karl Rohe and Xinwei Deng for their suggestions for improving this work. I want to thank Xiwen Ma and Shilin Ding for their effort on our collaborative