Results 1–10 of 20
Discriminative Structure and Parameter Learning for Markov Logic Networks
"... Markov logic networks (MLNs) are an expressive representation for statistical relational learning that generalizes both firstorder logic and graphical models. Existing methods for learning the logical structure of an MLN are not discriminative; however, many relational learning problems involve spe ..."
Abstract

Cited by 36 (5 self)
 Add to MetaCart
Markov logic networks (MLNs) are an expressive representation for statistical relational learning that generalizes both first-order logic and graphical models. Existing methods for learning the logical structure of an MLN are not discriminative; however, many relational learning problems involve specific target predicates that must be inferred from given background information. We found that existing MLN methods perform very poorly on several such ILP benchmark problems, and we present improved discriminative methods for learning MLN clauses and weights that outperform existing MLN and traditional ILP methods.
Boosting with Structural Sparsity
"... We derive generalizations of AdaBoost and related gradientbased coordinate descent methods that incorporate sparsitypromoting penalties for the norm of the predictor that is being learned. The end result is a family of coordinate descent algorithms that integrate forward feature induction and back ..."
Abstract

Cited by 13 (1 self)
 Add to MetaCart
We derive generalizations of AdaBoost and related gradient-based coordinate descent methods that incorporate sparsity-promoting penalties for the norm of the predictor that is being learned. The end result is a family of coordinate descent algorithms that integrate forward feature induction and back-pruning through regularization and give an automatic stopping criterion for feature induction. We study penalties based on the ℓ1, ℓ2, and ℓ∞ norms of the predictor and introduce mixed-norm penalties that build upon the initial penalties. The mixed-norm regularizers facilitate structural sparsity in parameter space, which is a useful property in multiclass prediction and other related tasks. We report empirical results that demonstrate the power of our approach in building accurate and structurally sparse models.
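The forward-induction and back-pruning behaviour the abstract describes can be illustrated with a minimal sketch: cyclic coordinate descent on a squared loss with an ℓ1 penalty, where the soft-threshold step both shrinks weights and prunes features whose correlation with the residual falls below the regularization level. The squared loss and the function names here are illustrative assumptions, not the paper's algorithm, which works with AdaBoost-style losses and mixed norms.

```python
import numpy as np

def soft_threshold(z, lam):
    # Proximal step for the l1 penalty: shrink toward zero, clip at zero.
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def l1_coordinate_descent(X, y, lam, n_iters=200):
    # Cyclic coordinate descent for 0.5*||y - X w||^2 + lam*||w||_1.
    # A feature is "induced" only when its correlation with the current
    # residual exceeds lam; otherwise the soft-threshold keeps it pruned,
    # which mimics the automatic stopping criterion described above.
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iters):
        for j in range(d):
            r = y - X @ w + X[:, j] * w[j]  # residual with feature j removed
            w[j] = soft_threshold(X[:, j] @ r, lam) / col_sq[j]
    return w
```

On data generated from a single relevant feature, the irrelevant coordinates never clear the threshold and stay exactly at zero, so the returned predictor is sparse without a separate pruning pass.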
Maximum Entropy Discrimination Markov Networks
, 2008
"... Standard maxmargin structured prediction methods concentrate directly on the inputoutput mapping, and the lack of an elegant probabilistic interpretation causes limitations. In this paper, we present a novel framework called Maximum Entropy Discrimination Markov Networks (MaxEntNet) to do Bayesian ..."
Abstract

Cited by 11 (6 self)
 Add to MetaCart
Standard max-margin structured prediction methods concentrate directly on the input-output mapping, and the lack of an elegant probabilistic interpretation causes limitations. In this paper, we present a novel framework called Maximum Entropy Discrimination Markov Networks (MaxEntNet) to do Bayesian max-margin structured learning by using expected margin constraints to define a feasible distribution subspace and applying the maximum entropy principle to choose the best distribution from this subspace. We show that MaxEntNet subsumes the standard max-margin Markov networks (M3N) as a special case where the predictive model is assumed to be linear and the parameter prior is a standard normal. Based on this understanding, we propose the Laplace max-margin Markov networks (LapM3N), which use the Laplace prior instead of the standard normal. We show that the adoption of a Laplace prior for the parameters makes LapM3N enjoy the properties expected of a sparsified M3N. Unlike L1-regularized maximum likelihood estimation, which sets small weights to zero to achieve sparsity, LapM3N weights the parameters a posteriori, and features with smaller weights are shrunk more. This posterior weighting effect makes LapM3N more stable with respect to the magnitudes of the regularization coefficients and more generalizable.
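Schematically, the Bayesian max-margin learning described above replaces the point estimate of M3N with a distribution p(w). The notation below is assumed for illustration rather than quoted from the paper:

```latex
\min_{p(w),\,\xi \ge 0}\;
  \mathrm{KL}\!\left(p(w)\,\|\,p_0(w)\right) + C\sum_i \xi_i
\quad \text{s.t.} \quad
\mathbb{E}_{p(w)}\!\left[w^\top \Delta f_i(y)\right]
  \;\ge\; \Delta\ell_i(y) - \xi_i
\;\;\; \forall i,\ \forall y \ne y_i
```

where Δf_i(y) = f(x_i, y_i) − f(x_i, y) and Δℓ_i(y) is the label loss. In this reading, a Gaussian prior p_0 recovers standard M3N, while the Laplace prior yields LapM3N and its posterior shrinkage behaviour.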
PAC-Bayesian Analysis of Co-clustering and Beyond
"... We derive PACBayesian generalization bounds for supervised and unsupervised learning models based on clustering, such as coclustering, matrix trifactorization, graphical models, graph clustering, and pairwise clustering. 1 We begin with the analysis of coclustering, which is a widely used approa ..."
Abstract

Cited by 10 (5 self)
 Add to MetaCart
We derive PAC-Bayesian generalization bounds for supervised and unsupervised learning models based on clustering, such as co-clustering, matrix tri-factorization, graphical models, graph clustering, and pairwise clustering. We begin with the analysis of co-clustering, which is a widely used approach to the analysis of data matrices. We distinguish between two tasks in matrix data analysis: discriminative prediction of the missing entries in data matrices and estimation of the joint probability distribution of row and column variables in co-occurrence matrices. We derive PAC-Bayesian generalization bounds for the expected out-of-sample performance of co-clustering-based solutions for these two tasks. The analysis yields regularization terms that were absent in previous formulations of co-clustering. The bounds suggest that the expected performance of co-clustering is governed by a trade-off between its empirical performance and the mutual information preserved by the cluster variables on row and column IDs. We derive an iterative projection algorithm for finding a local optimum of this trade-off for discriminative prediction tasks. This algorithm achieved state-of-the-art performance in the MovieLens collaborative filtering task. Our co-clustering model can also be seen as matrix tri-factorization, and the results provide generalization bounds, regularization …
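For orientation, the trade-off described above has the shape of a standard PAC-Bayesian bound (McAllester/Maurer form; the paper's co-clustering bounds specialize and refine this, so treat the display as background rather than a quotation). For a prior π fixed before seeing the data and any posterior ρ over hypotheses, with probability at least 1 − δ over an i.i.d. sample of size n:

```latex
L(\rho) \;\le\; \hat{L}(\rho)
  \;+\; \sqrt{\frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}
```

In the co-clustering specialization, the KL term is what gives rise to the mutual-information regularizers on the row and column cluster variables.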
Partially Observed Maximum Entropy Discrimination Markov Networks
"... Learning graphical models with hidden variables can offer semantic insights to complex data and lead to salient structured predictors without relying on expensive, sometime unattainable fully annotated training data. While likelihoodbased methods have been extensively explored, to our knowledge, le ..."
Abstract

Cited by 9 (4 self)
 Add to MetaCart
Learning graphical models with hidden variables can offer semantic insights into complex data and lead to salient structured predictors without relying on expensive, sometimes unattainable, fully annotated training data. While likelihood-based methods have been extensively explored, to our knowledge, learning structured prediction models with latent variables based on the max-margin principle remains largely an open problem. In this paper, we present a partially observed Maximum Entropy Discrimination Markov Network (PoMEN) model that attempts to combine the advantages of the Bayesian and margin-based paradigms for learning Markov networks from partially labeled data. PoMEN leads to an averaging prediction rule that resembles a Bayes predictor and is more robust to overfitting, while also retaining the desirable discriminative properties of M3N. We develop an EM-style algorithm utilizing existing convex optimization algorithms for M3N as a subroutine. We demonstrate competent performance of PoMEN over existing methods on a real-world web data extraction task.
Lifted coordinate descent for learning with trace-norm regularization
AISTATS
, 2012
"... We consider the minimization of a smooth loss with tracenorm regularization, which is a natural objective in multiclass and multitask learning. Even though the problem is convex, existing approaches rely on optimizing a nonconvex variational bound, which is not guaranteed to converge, or repeated ..."
Abstract

Cited by 8 (1 self)
 Add to MetaCart
We consider the minimization of a smooth loss with trace-norm regularization, which is a natural objective in multiclass and multitask learning. Even though the problem is convex, existing approaches either rely on optimizing a non-convex variational bound, which is not guaranteed to converge, or repeatedly perform singular-value decomposition, which prevents scaling beyond moderate matrix sizes. We lift the non-smooth convex problem into an infinite-dimensional smooth problem and apply coordinate descent to solve it. We prove that our approach converges to the optimum and is competitive with, or outperforms, the state of the art.
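The repeated singular-value decompositions that the abstract identifies as the bottleneck are easiest to see in the standard proximal-gradient baseline, sketched below. This is the approach the lifted method is designed to avoid, not the lifted method itself; the function names are ours.

```python
import numpy as np

def trace_norm_prox(W, lam):
    # Proximal operator of lam*||W||_*: soft-threshold the singular values.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)
    return U @ np.diag(s_shrunk) @ Vt

def proximal_gradient(grad_f, W0, lam, step, n_iters=100):
    # Proximal gradient for min_W f(W) + lam*||W||_*.
    # Every iteration needs a full SVD of a matrix the size of W, which
    # is exactly what prevents scaling beyond moderate matrix sizes.
    W = W0
    for _ in range(n_iters):
        W = trace_norm_prox(W - step * grad_f(W), step * lam)
    return W
```

Soft-thresholding the spectrum kills small singular values outright, so the iterates are low-rank; the per-iteration SVD cost is the price paid for that exact shrinkage.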
Learning from the Wisdom of Crowds by Minimax Entropy
"... An important way to make large training sets is to gather noisy labels from crowds of nonexperts. We propose a minimax entropy principle to improve the quality of these labels. Our method assumes that labels are generated by a probability distribution over workers, items, and labels. By maximizing t ..."
Abstract

Cited by 4 (0 self)
 Add to MetaCart
An important way to build large training sets is to gather noisy labels from crowds of non-experts. We propose a minimax entropy principle to improve the quality of these labels. Our method assumes that labels are generated by a probability distribution over workers, items, and labels. By maximizing the entropy of this distribution, the method naturally infers item confusability and worker expertise. We infer the ground truth by minimizing the entropy of this distribution, which we show minimizes the Kullback-Leibler (KL) divergence between the probability distribution and the unknown truth. We show that a simple coordinate descent scheme can optimize minimax entropy. Empirically, our results are substantially better than previously published methods for the same problem.
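The two nested optimizations in the abstract can be written schematically; the notation here is assumed for illustration and simplified relative to the paper. With y the unknown true labels and 𝒫(y) the set of label distributions whose per-worker and per-item statistics match the observed labels:

```latex
\hat{y} \;=\; \arg\min_{y}\; \max_{p \,\in\, \mathcal{P}(y)} H(p)
```

The inner maximum-entropy step fits the worker/item model given a candidate truth; the outer minimization selects the ground-truth estimate, which, per the abstract's KL result, is the one closest to the unknown truth in KL divergence.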
Markov Networks
, 2008
"... * * To whom correspondence should be addressed. Keywords: Maximum entropy discrimination Markov networks, Bayesian maxmargin ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
Keywords: Maximum entropy discrimination Markov networks, Bayesian max-margin …
Learning Symmetric Relational Markov Random Fields
, 2007
"... Relational Markov Random Fields (rMRF’s) are a general and flexible framework for reasoning about the joint distribution over attributes of a large number of interacting entities, such as graphs, social networks or gene networks. When modeling such a network using an rMRF one of the major problems i ..."
Abstract
 Add to MetaCart
Relational Markov Random Fields (rMRFs) are a general and flexible framework for reasoning about the joint distribution over attributes of a large number of interacting entities, such as graphs, social networks, or gene networks. When modeling such a network using an rMRF, one of the major problems is choosing the set of features to include in the model and setting their weights. The main computational difficulty in learning such models from evidence is that the estimation of each set of features requires the use of a parameter estimation procedure. Even when dealing with complete data, where one can summarize a large domain by sufficient statistics, parameter estimation requires one to compute the expectation of the sufficient statistics given different parameter choices. This means running inference in the network for each step of the iterative algorithm used for parameter estimation. Since exact inference is usually intractable, the typical solution is to resort to approximate inference procedures, such as loopy belief propagation. Although these procedures are quite efficient, they still require computation on the order of the number of interactions (or features) in the model. When learning a large relational model over a complex domain, even such approximations require unrealistic running time. In this work we show that for a particular class of rMRFs, which have inherent symmetry, we can perform the inference needed for learning using lifted, template-level belief propagation. This procedure's running time is proportional to the size of the relational model rather than the size of the domain. Moreover, we show that this computational procedure is equivalent to synchronous loopy belief propagation, yielding a dramatic speedup in inference time. We use this speedup to learn such symmetric rMRFs from evidence in an efficient way. This enables us to explore problem domains which were impossible to handle with existing methods.
Discriminative Learning with Markov Logic Networks
"... Statistical relational learning (SRL) is an emerging area of research that addresses the problem of learning from noisy structured/relational data. Markov logic networks (MLNs), sets of weighted clauses, are a simple but powerful SRL formalism that combines the expressivity of firstorder logic with ..."
Abstract
 Add to MetaCart
Statistical relational learning (SRL) is an emerging area of research that addresses the problem of learning from noisy structured/relational data. Markov logic networks (MLNs), sets of weighted clauses, are a simple but powerful SRL formalism that combines the expressivity of first-order logic with the flexibility of probabilistic reasoning. Most existing learning algorithms for MLNs are in the generative setting: they try to learn a model that maximizes the likelihood of the training data. However, most learning problems in relational data are discriminative. So to utilize the power of MLNs, we need discriminative learning methods that match these discriminative tasks well. In this proposal, we present two new discriminative learning algorithms for MLNs. The first one is a discriminative structure and weight learner for MLNs with non-recursive clauses. We use a variant of ALEPH, an off-the-shelf Inductive Logic Programming (ILP) system, to learn a large set of Horn clauses from the training data; then we apply an L1-regularized weight learner to select a small set of nonzero-weight clauses that maximizes the conditional log-likelihood (CLL) of the training data. The experimental results show that our proposed algorithm outperforms existing learning methods for MLNs and traditional ILP systems in terms of predictive accuracy, and its performance is comparable to state-of-the-art …
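The second stage described above, L1-regularized selection of clause weights under a conditional likelihood, can be sketched on a hypothetical feature matrix: F[i, j] counts the true groundings of candidate clause j in example i, and y[i] is the target predicate's truth value. The interface and the plain proximal-gradient optimizer are assumptions for illustration; the actual learner works with MLN inference, not a flat logistic model.

```python
import numpy as np

def l1_logistic_clause_selection(F, y, lam, step=0.1, n_iters=500):
    # Proximal-gradient ascent on the L1-penalized mean conditional
    # log-likelihood of a logistic model. The soft-threshold step zeroes
    # out the weights of clauses that do not help predict the target,
    # leaving a small set of nonzero-weight clauses.
    n, d = F.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-F @ w))   # predicted P(y = 1)
        g = F.T @ (y - p) / n              # gradient of the mean CLL
        w = w + step * g
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)
    return w
```

With one genuinely predictive clause feature among several noise features, only the predictive clause receives a substantial weight; the rest are shrunk to (near) zero, which is the clause-selection effect the proposal relies on.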