Results 1  10
of
43
Discriminative Structure and Parameter Learning for Markov Logic Networks
"... Markov logic networks (MLNs) are an expressive representation for statistical relational learning that generalizes both firstorder logic and graphical models. Existing methods for learning the logical structure of an MLN are not discriminative; however, many relational learning problems involve spe ..."
Abstract

Cited by 56 (5 self)
 Add to MetaCart
(Show Context)
Markov logic networks (MLNs) are an expressive representation for statistical relational learning that generalizes both firstorder logic and graphical models. Existing methods for learning the logical structure of an MLN are not discriminative; however, many relational learning problems involve specific target predicates that must be inferred from given background information. We found that existing MLN methods perform very poorly on several such ILP benchmark problems, and we present improved discriminative methods for learning MLN clauses and weights that outperform existing MLN and traditional ILP methods. 1.
Learning from the Wisdom of Crowds by Minimax Entropy
"... An important way to make large training sets is to gather noisy labels from crowds of nonexperts. We propose a minimax entropy principle to improve the quality of these labels. Our method assumes that labels are generated by a probability distribution over workers, items, and labels. By maximizing t ..."
Abstract

Cited by 29 (3 self)
 Add to MetaCart
(Show Context)
An important way to make large training sets is to gather noisy labels from crowds of nonexperts. We propose a minimax entropy principle to improve the quality of these labels. Our method assumes that labels are generated by a probability distribution over workers, items, and labels. By maximizing the entropy of this distribution, the method naturally infers item confusability and worker expertise. We infer the ground truth by minimizing the entropy of this distribution, which we show minimizes the KullbackLeibler (KL) divergence between the probability distribution and the unknown truth. We show that a simple coordinate descent scheme can optimize minimax entropy. Empirically, our results are substantially better than previously published methods for the same problem. 1
Lifted coordinate descent for learning with tracenorm regularization
 AISTATS
, 2012
"... We consider the minimization of a smooth loss with tracenorm regularization, which is a natural objective in multiclass and multitask learning. Even though the problem is convex, existing approaches rely on optimizing a nonconvex variational bound, which is not guaranteed to converge, or repeated ..."
Abstract

Cited by 29 (5 self)
 Add to MetaCart
(Show Context)
We consider the minimization of a smooth loss with tracenorm regularization, which is a natural objective in multiclass and multitask learning. Even though the problem is convex, existing approaches rely on optimizing a nonconvex variational bound, which is not guaranteed to converge, or repeatedly perform singularvalue decomposition, which prevents scaling beyond moderate matrix sizes. We lift the nonsmooth convex problem into an infinitely dimensional smooth problem and apply coordinate descent to solve it. We prove that our approach converges to the optimum, and is competitive or outperforms state of the art. 1
Boosting with Structural Sparsity
"... We derive generalizations of AdaBoost and related gradientbased coordinate descent methods that incorporate sparsitypromoting penalties for the norm of the predictor that is being learned. The end result is a family of coordinate descent algorithms that integrate forward feature induction and back ..."
Abstract

Cited by 25 (2 self)
 Add to MetaCart
We derive generalizations of AdaBoost and related gradientbased coordinate descent methods that incorporate sparsitypromoting penalties for the norm of the predictor that is being learned. The end result is a family of coordinate descent algorithms that integrate forward feature induction and backpruning through regularization and give an automatic stopping criterion for feature induction. We study penalties based on the ℓ1, ℓ2, and ℓ ∞ norms of the predictor and introduce mixednorm penalties that build upon the initial penalties. The mixednorm regularizers facilitate structural sparsity in parameter space, which is a useful property in multiclass prediction and other related tasks. We report empirical results that demonstrate the power of our approach in building accurate and structurally sparse models. 1. Introduction and
Maximum Entropy Discrimination Markov Networks
, 2008
"... Standard maxmargin structured prediction methods concentrate directly on the inputoutput mapping, and the lack of an elegant probabilistic interpretation causes limitations. In this paper, we present a novel framework called Maximum Entropy Discrimination Markov Networks (MaxEntNet) to do Bayesian ..."
Abstract

Cited by 15 (8 self)
 Add to MetaCart
(Show Context)
Standard maxmargin structured prediction methods concentrate directly on the inputoutput mapping, and the lack of an elegant probabilistic interpretation causes limitations. In this paper, we present a novel framework called Maximum Entropy Discrimination Markov Networks (MaxEntNet) to do Bayesian maxmargin structured learning by using expected margin constraints to define a feasible distribution subspace and applying the maximum entropy principle to choose the best distribution from this subspace. We show that MaxEntNet subsumes the standard maxmargin Markov networks (M 3 N) as a spacial case where the predictive model is assumed to be linear and the parameter prior is a standard normal. Based on this understanding, we propose the Laplace maxmargin Markov networks (LapM 3 N) which use the Laplace prior instead of the standard normal. We show that the adoption of a Laplace prior of the parameter makes LapM 3 N enjoy properties expected from a sparsified M 3 N. Unlike the L1regularized maximum likelihood estimation which sets small weights to zeros to achieve sparsity, LapM 3 N posteriorly weights the parameters and features with smaller weights are shrunk more. This posterior weighting effect makes LapM 3 N more stable with respect to the magnitudes of the regularization coefficients and more generalizable. To
Bayesian inference with posterior regularization and applications to infinite latent svms
 In arXiv:1210.1766v2
, 2013
"... Existing Bayesian models, especially nonparametric Bayesian methods, rely on specially conceived priors to incorporate domain knowledge for discovering improved latent representations. While priors affect posterior distributions through Bayes ’ rule, imposing posterior regularization is arguably mo ..."
Abstract

Cited by 14 (9 self)
 Add to MetaCart
Existing Bayesian models, especially nonparametric Bayesian methods, rely on specially conceived priors to incorporate domain knowledge for discovering improved latent representations. While priors affect posterior distributions through Bayes ’ rule, imposing posterior regularization is arguably more direct and in some cases more natural and general. In this paper, we present regularized Bayesian inference (RegBayes), a novel computational framework that performs posterior inference with a regularization term on the desired postdata posterior distribution under an information theoretical formulation. RegBayes is more flexible than the procedure that elicits expert knowledge via priors, and it covers both directed Bayesian networks and undirected Markov networks. When the regularization is induced from a linear operator on the posterior distributions, such as the expectation operator, we present a general convexanalysis theorem to characterize the solution of RegBayes. Furthermore, we present two concrete examples of RegBayes, infinite latent support vector machines (iLSVM) and multitask infinite latent support vector machines (MTiLSVM), which explore the largemargin idea in combination with a nonparametric Bayesian model for dis
PACBayesian Analysis of Coclustering and Beyond
"... We derive PACBayesian generalization bounds for supervised and unsupervised learning models based on clustering, such as coclustering, matrix trifactorization, graphical models, graph clustering, and pairwise clustering. 1 We begin with the analysis of coclustering, which is a widely used approa ..."
Abstract

Cited by 14 (7 self)
 Add to MetaCart
We derive PACBayesian generalization bounds for supervised and unsupervised learning models based on clustering, such as coclustering, matrix trifactorization, graphical models, graph clustering, and pairwise clustering. 1 We begin with the analysis of coclustering, which is a widely used approach to the analysis of data matrices. We distinguish among two tasks in matrix data analysis: discriminative prediction of the missing entries in data matrices and estimation of the joint probability distribution of row and column variables in cooccurrence matrices. We derive PACBayesian generalization bounds for the expected outofsample performance of coclusteringbased solutions for these two tasks. The analysis yields regularization terms that were absent in the previous formulations of coclustering. The bounds suggest that the expected performance of coclustering is governed by a tradeoff between its empirical performance and the mutual information preserved by the cluster variables on row and column IDs. We derive an iterative projection algorithm for finding a local optimum of this tradeoff for discriminative prediction tasks. This algorithm achieved stateoftheart performance in the MovieLens collaborative filtering task. Our coclustering model can also be seen as matrix trifactorization and the results provide generalization bounds, regularization
Computational Rationalization: The Inverse Equilibrium Problem
"... Modeling the purposeful behavior of imperfect agents from a small number of observations is a challenging task. When restricted to the singleagent decisiontheoretic setting, inverse optimal control techniques assume that observed behavior is an approximately optimal solution to an unknown decision ..."
Abstract

Cited by 13 (2 self)
 Add to MetaCart
Modeling the purposeful behavior of imperfect agents from a small number of observations is a challenging task. When restricted to the singleagent decisiontheoretic setting, inverse optimal control techniques assume that observed behavior is an approximately optimal solution to an unknown decision problem. These techniques learn a utility function that explains the example behavior and can then be used to accurately predict or imitate future behavior in similar observed or unobserved situations. In this work, we consider similar tasks in competitive and cooperative multiagent domains. Here, unlike singleagent settings, a player cannot myopically maximize its reward — it must speculate on how the other agents may act to influence the game’s outcome. Employing the gametheoretic notion of regret and the principle of maximum entropy, we introduce a technique for predicting and generalizing behavior, as well as recovering a reward function in these domains. 1.
Partially Observed Maximum Entropy Discrimination Markov Networks
"... Learning graphical models with hidden variables can offer semantic insights to complex data and lead to salient structured predictors without relying on expensive, sometime unattainable fully annotated training data. While likelihoodbased methods have been extensively explored, to our knowledge, le ..."
Abstract

Cited by 11 (5 self)
 Add to MetaCart
(Show Context)
Learning graphical models with hidden variables can offer semantic insights to complex data and lead to salient structured predictors without relying on expensive, sometime unattainable fully annotated training data. While likelihoodbased methods have been extensively explored, to our knowledge, learning structured prediction models with latent variables based on the maxmargin principle remains largely an open problem. In this paper, we present a partially observed Maximum Entropy Discrimination Markov Network (PoMEN) model that attempts to combine the advantages of Bayesian and margin based paradigms for learning Markov networks from partially labeled data. PoMEN leads to an averaging prediction rule that resembles a Bayes predictor that is more robust to overfitting, but is also built on the desirable discriminative laws resemble those of the M 3 N. We develop an EMstyle algorithm utilizing existing convex optimization algorithms for M 3 N as a subroutine. We demonstrate competent performance of PoMEN over existing methods on a realworld web data extraction task. 1
A tractable combinatorial market maker using constraint generation.
 In ACM Conference on Electronic Commerce,
, 2012
"... ..."