Results 1–10 of 25
Greedy sparsity-constrained optimization
 in Conference Record of the Forty-Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), IEEE, 2011
Cited by 18 (4 self)
Abstract—Finding optimal sparse solutions to estimation problems, particularly in underdetermined regimes, has recently gained much attention. Most existing literature studies linear models in which the squared error is used as the measure of discrepancy to be minimized. However, in many applications discrepancy is measured in more general forms such as the log-likelihood. Regularization by the ℓ1-norm has been shown to induce sparse solutions, but their sparsity level can be merely suboptimal. In this paper we present a greedy algorithm, dubbed Gradient Support Pursuit (GraSP), for sparsity-constrained optimization. Quantifiable guarantees are provided for GraSP when cost functions have the "Stable Hessian Property".
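As a rough illustration of this style of greedy sparsity-constrained solver, the sketch below specializes a GraSP-like iteration (merge the largest gradient coordinates with the current support, minimize over the merged set, then prune) to the least-squares cost. The iteration count and all details are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def hard_threshold(x, s):
    """Keep the s largest-magnitude entries of x, zero the rest."""
    z = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-s:]
    z[keep] = x[keep]
    return z

def grasp_ls(A, b, s, iters=50):
    """GraSP-style iteration specialized to f(x) = 0.5 * ||Ax - b||^2;
    a sketch of the general scheme, not the paper's algorithm."""
    n = A.shape[1]
    x = np.zeros(n)
    for _ in range(iters):
        g = A.T @ (A @ x - b)                       # gradient of the cost
        Z = np.argsort(np.abs(g))[-2 * s:]          # 2s largest gradient coordinates
        T = np.union1d(Z, np.nonzero(x)[0])         # merge with current support
        coef, *_ = np.linalg.lstsq(A[:, T], b, rcond=None)  # minimize over T
        x = np.zeros(n)
        x[T] = coef
        x = hard_threshold(x, s)                    # prune back to s entries
    return x
```

On an easy noiseless instance with Gaussian measurements, this loop typically recovers the planted sparse vector exactly.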
Learning Mixtures of Tree Graphical Models
Cited by 13 (1 self)
We consider unsupervised estimation of mixtures of discrete graphical models, where the class variable is hidden and each mixture component can have a potentially different Markov graph structure and parameters over the observed variables. We propose a novel method for estimating the mixture components with provable guarantees. Our output is a tree-mixture model which serves as a good approximation to the underlying graphical model mixture. The sample and computational requirements for our method scale as poly(p, r) for an r-component mixture of p-variate graphical models, for a wide class of models which includes tree mixtures and mixtures over bounded-degree graphs. Keywords: Graphical models, mixture models, spectral methods, tree approximation.
High-dimensional Sparse Inverse Covariance Estimation using Greedy Methods
, 2012
Greedy Learning of Graphical Models with Small Girth
Cited by 7 (3 self)
Abstract — This paper develops two new greedy algorithms for learning the Markov graph of discrete probability distributions from samples thereof. For finding the neighborhood of a node (i.e. variable), the simple, naive greedy algorithm iteratively adds the new node that gives the biggest improvement in prediction performance over the existing set. While fast to implement, this can yield incorrect graphs when there are many short cycles, as the single node that gives the best prediction can then be outside the neighborhood. Our new algorithms get around this in two different ways. The forward-backward greedy algorithm includes a deletion step, which goes back and prunes incorrect nodes that may have initially been added. The recursive greedy algorithm uses forward steps in a two-level process, running greedy iterations in an inner loop but only including the final node. We show, both analytically and empirically, that these algorithms can learn graphs with small girth which other algorithms, both greedy and those based on convex optimization, cannot.
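A minimal version of the naive greedy rule described above can be sketched as follows for binary variables, using empirical conditional entropy as one concrete (hypothetical) reading of "biggest improvement in prediction"; the function names and the entropy criterion are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def cond_entropy(x, cond):
    """Empirical conditional entropy H(x | cond), all variables binary (0/1).
    cond is an (n, k) array with k >= 1 conditioning variables."""
    keys = cond @ (1 << np.arange(cond.shape[1]))   # encode each row as an integer
    h = 0.0
    for key in np.unique(keys):
        m = keys == key
        w = m.mean()                                # weight of this configuration
        p1 = x[m].mean()
        for p in (p1, 1.0 - p1):
            if p > 0:
                h -= w * p * np.log2(p)
    return h

def greedy_neighborhood(data, i, size):
    """Naive greedy neighborhood selection: repeatedly add the variable that
    most reduces the conditional entropy of node i; 'size' caps the
    neighborhood. A sketch, not the paper's algorithms."""
    _, p = data.shape
    S = []
    for _ in range(size):
        best_j, best_h = None, None
        for j in range(p):
            if j == i or j in S:
                continue
            h = cond_entropy(data[:, i], data[:, S + [j]])
            if best_h is None or h < best_h:
                best_j, best_h = j, h
        S.append(best_j)
    return sorted(S)
```

On data where node 0 copies one of two parent variables at random, this rule finds both parents; as the abstract notes, it can fail on graphs with many short cycles, which is what the forward-backward and recursive variants address.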
Gradient Hard Thresholding Pursuit for Sparsity-Constrained Optimization
Cited by 3 (0 self)
Hard Thresholding Pursuit (HTP) is an iterative greedy selection procedure for finding sparse solutions of underdetermined linear systems. This method has been shown to have strong theoretical guarantees and impressive numerical performance. In this paper, we generalize HTP from compressed sensing to a generic problem setup of sparsity-constrained convex optimization. The proposed algorithm iterates between a standard gradient descent step and a hard truncation step, with or without debiasing. We prove that our method enjoys strong guarantees analogous to HTP in terms of rate of convergence and parameter estimation accuracy. Numerical evidence shows that our method is superior to state-of-the-art greedy selection methods when applied to learning tasks of sparse logistic regression and sparse support vector machines.
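The iteration described above, a gradient step followed by hard truncation (here without the optional debiasing step), can be sketched for sparse logistic regression; the step size and iteration count are arbitrary illustrative choices, not tuned values from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def ghtp_logreg(X, y, s, step=0.1, iters=300):
    """Gradient hard-thresholding sketch for sparse logistic regression with
    labels y in {0, 1}; no debiasing step. Illustrative, not the paper's
    implementation."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        g = X.T @ (sigmoid(X @ w) - y) / n          # average logistic-loss gradient
        w = w - step * g                            # standard gradient descent step
        keep = np.argsort(np.abs(w))[-s:]           # hard truncation: keep s largest
        trunc = np.zeros(d)
        trunc[keep] = w[keep]
        w = trunc
    return w
```

With enough samples and well-separated coefficients, the truncation step locks onto the true support after a few iterations.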
Forward-Backward Greedy Algorithms for General Convex Smooth Functions over a Cardinality Constraint
Cited by 3 (1 self)
We consider forward-backward greedy algorithms for solving sparse feature selection problems with general convex smooth functions. A state-of-the-art greedy method, the Forward-Backward greedy algorithm (FoBa-obj), requires solving a large number of optimization problems, so it is not scalable to large problems. The FoBa-gdt algorithm, which uses gradient information for feature selection at each forward iteration, significantly improves the efficiency of FoBa-obj. In this paper, we systematically analyze the theoretical properties of both algorithms. Our main contributions are: 1) we derive better theoretical bounds than existing analyses of FoBa-obj for general smooth convex functions; 2) we show that FoBa-gdt achieves the same theoretical performance as FoBa-obj under the same condition, the restricted strong convexity condition. Our new bounds are consistent with the bounds of a special case (least squares) and fill a previously existing theoretical gap for general convex smooth functions; 3) we show that the restricted strong convexity condition is satisfied if the number of independent samples is more than k̄ log d, where k̄ is the sparsity level and d is the dimension of the variable; 4) we apply FoBa-gdt (with the conditional random field objective) to the sensor selection problem for human indoor activity recognition, and our results show that FoBa-gdt outperforms other methods based on forward greedy selection.
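A toy FoBa-gdt-style loop for the least-squares objective might look as follows: forward steps use the gradient to pick a coordinate, and a backward step drops a feature when doing so barely raises the loss. The backward threshold and the iteration cap are illustrative assumptions, not the paper's conditions.

```python
import numpy as np

def foba_gdt_ls(A, b, s, backward_tol=0.5):
    """Sketch of a forward-backward greedy loop for f(x) = 0.5 * ||Ax - b||^2,
    in the FoBa-gdt spirit; illustrative only."""
    d = A.shape[1]

    def fit(S):
        """Least-squares fit restricted to support S; returns (x, loss)."""
        if not S:
            return np.zeros(d), 0.5 * (b @ b)
        idx = sorted(S)
        coef, *_ = np.linalg.lstsq(A[:, idx], b, rcond=None)
        x = np.zeros(d)
        x[idx] = coef
        r = A @ x - b
        return x, 0.5 * (r @ r)

    S = set()
    x, loss = fit(S)
    for _ in range(4 * s):                          # iteration cap keeps the sketch safe
        if len(S) >= s:
            break
        g = A.T @ (A @ x - b)
        g[sorted(S)] = 0.0                          # ignore coordinates already chosen
        j = int(np.argmax(np.abs(g)))               # forward: best gradient coordinate
        S.add(j)
        x, new_loss = fit(S)
        gain = loss - new_loss
        loss = new_loss
        if len(S) > 1:                              # backward: try the least useful feature
            k_best = min(S, key=lambda k: fit(S - {k})[1])
            loss_k = fit(S - {k_best})[1]
            if loss_k - loss < backward_tol * gain:
                S.remove(k_best)
                x, loss = fit(S)
    return x
```

The gradient-based forward rule is what distinguishes the FoBa-gdt flavor from FoBa-obj, which would instead refit the objective once per candidate feature at every forward step.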
Structure learning of antiferromagnetic Ising models
 In NIPS
, 2014
Cited by 2 (2 self)
In this paper we investigate the computational complexity of learning the graph structure underlying a discrete undirected graphical model from i.i.d. samples. Our first result is an unconditional computational lower bound of p^{d/2} for learning general graphical models on p nodes of maximum degree d, for the class of so-called statistical algorithms recently introduced by Feldman et al. [1]. The construction is related to the notoriously difficult learning-parities-with-noise problem in computational learning theory. Our lower bound suggests that the Õ(p^{d+2}) runtime required by Bresler, Mossel, and Sly's [2] exhaustive-search algorithm cannot be significantly improved without restricting the class of models. Aside from structural assumptions on the graph such as it being a tree, hypertree, tree-like, etc., many recent papers on structure learning assume that the model has the correlation decay property. Indeed, focusing on ferromagnetic Ising models, Bento and Montanari [3] showed that all known low-complexity algorithms fail to learn simple graphs when the interaction strength exceeds a number related to the correlation decay threshold. Our second set of results gives a class of repelling (antiferromagnetic) models that have the opposite behavior: very strong interaction allows efficient learning in time Õ(p^2). We provide an algorithm whose performance interpolates between Õ(p^2) and Õ(p^{d+2}) depending on the strength of the repulsion.
Fully-Automatic Bayesian Piecewise Sparse Linear Models
Cited by 1 (1 self)
Piecewise linear models (PLMs) have been widely used in many enterprise machine learning problems; they assign linear experts to individual partitions of the feature space and express the whole model as patches of local experts. This paper addresses the simultaneous model selection issues of PLMs: partition structure determination and feature selection for individual experts. Our contributions are mainly threefold. First, we extend factorized asymptotic Bayesian (FAB) inference to hierarchical mixtures of experts (probabilistic PLMs). FAB inference offers penalty terms w.r.t. partition and expert complexities, and enables us to resolve the model selection issue. Second, we propose posterior optimization, which significantly improves predictive accuracy. Roughly speaking, our new posterior optimization mitigates accuracy degradation due to a gap between marginal log-likelihood maximization and predictive accuracy. Third, we present an application to energy demand forecasting as well as benchmark comparisons. The experiments show our capability of acquiring compact and highly accurate models.
A junction tree framework for undirected graphical model selection
 Journal of Machine Learning Research
, 2014
Learning Structure of Power-Law Markov Networks
Abstract—We consider the problem of learning the underlying graph structure of discrete Markov networks based on power-law graphs generated using the configuration model. We translate the learning problem into an equivalent channel coding problem and obtain necessary conditions for solvability in terms of the problem parameters. In particular, we relate the exponent of the power-law graph to the hardness of the learning problem, and show that more samples are required for exact recovery of discrete power-law Markov graphs with small exponent values. We develop an efficient learning algorithm for accurate reconstruction of the graph structure of the Ising model on power-law graphs. Finally, we show that an order-wise optimal number of samples suffices for recovering the exact graph under certain constraints on the Ising model parameters and scalings of node degrees. Index Terms—Markov network, power-law graph, Ising model