Markov Logic Networks
 Machine Learning
, 2006
Cited by 569 (34 self)
Abstract. We propose a simple approach to combining first-order logic and probabilistic graphical models in a single representation. A Markov logic network (MLN) is a first-order knowledge base with a weight attached to each formula (or clause). Together with a set of constants representing objects in the domain, it specifies a ground Markov network containing one feature for each possible grounding of a first-order formula in the KB, with the corresponding weight. Inference in MLNs is performed by MCMC over the minimal subset of the ground network required for answering the query. Weights are efficiently learned from relational databases by iteratively optimizing a pseudolikelihood measure. Optionally, additional clauses are learned using inductive logic programming techniques. Experiments with a real-world database and knowledge base in a university domain illustrate the promise of this approach.
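To make the semantics concrete, here is a minimal Python sketch of the standard MLN distribution over possible worlds, $P(X = x) = \frac{1}{Z}\exp\big(\sum_i w_i n_i(x)\big)$, where $n_i(x)$ counts the true groundings of formula $i$ in world $x$. The formula Smokes(x) => Cancer(x), its weight 1.5 and the constants A and B are hypothetical. The query is answered by exact enumeration of all 16 worlds rather than by the MCMC procedure the abstract describes, which is only feasible for toy domains.

```python
import itertools
import math

# Hypothetical toy MLN: one first-order formula, Smokes(x) => Cancer(x),
# with an illustrative weight, grounded over the constants A and B.
CONSTANTS = ["A", "B"]
WEIGHT = 1.5
ATOMS = [f"{pred}({c})" for pred in ("Smokes", "Cancer") for c in CONSTANTS]


def n_true_groundings(world):
    """Count groundings of Smokes(x) => Cancer(x) that are true in `world`."""
    return sum(
        1 for c in CONSTANTS if not world[f"Smokes({c})"] or world[f"Cancer({c})"]
    )


def world_distribution():
    """Enumerate every world; return {world: P(world)} with P proportional to exp(w * n)."""
    unnormalized = {}
    for values in itertools.product([False, True], repeat=len(ATOMS)):
        world = dict(zip(ATOMS, values))
        unnormalized[values] = math.exp(WEIGHT * n_true_groundings(world))
    z = sum(unnormalized.values())  # partition function Z
    return {values: score / z for values, score in unnormalized.items()}


def conditional(query, evidence, dist):
    """P(query atom is true | evidence), summing over the enumerated worlds."""
    numerator = denominator = 0.0
    for values, p in dist.items():
        world = dict(zip(ATOMS, values))
        if all(world[atom] == value for atom, value in evidence.items()):
            denominator += p
            if world[query]:
                numerator += p
    return numerator / denominator


DIST = world_distribution()
p = conditional("Cancer(A)", {"Smokes(A)": True}, DIST)  # about 0.818
```

Because the two groundings involve disjoint atoms, conditioning on Smokes(A) leaves Cancer(A) true with probability $e^{1.5}/(e^{1.5}+1)$, a closed form that is a handy check on the enumeration.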
SNOPT: An SQP Algorithm for Large-Scale Constrained Optimization
, 1997
Cited by 328 (18 self)
Sequential quadratic programming (SQP) methods have proved highly effective for solving constrained optimization problems with smooth nonlinear functions in the objective and constraints. Here we consider problems with general inequality constraints (linear and nonlinear). We assume that first derivatives are available, and that the constraint gradients are sparse.
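For orientation, the quadratic subproblem solved at each iteration of a generic SQP method for $\min f(x)$ subject to $c(x) \ge 0$ is shown below. Here $J(x_k)$ is the (here assumed sparse) constraint Jacobian and $H_k$ approximates the Hessian of the Lagrangian; since only first derivatives are assumed available, $H_k$ would typically be a quasi-Newton approximation. This is the textbook form, and SNOPT's actual subproblem (treatment of bounds and linear constraints, choice of $H_k$, merit function) may differ in detail.

$$
\begin{aligned}
d_k = \arg\min_{d}\;& \nabla f(x_k)^{T} d + \tfrac{1}{2}\, d^{T} H_k\, d \\
\text{subject to}\;& c(x_k) + J(x_k)\, d \ge 0,
\end{aligned}
\qquad x_{k+1} = x_k + \alpha_k d_k,
$$

where the step length $\alpha_k$ is chosen by a line search on a merit function.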
CUTE: Constrained and unconstrained testing environment
, 1993
Cited by 152 (3 self)
The purpose of this paper is to discuss the scope and functionality of a versatile environment for testing small- and large-scale nonlinear optimization algorithms. Although many of these facilities were originally produced by the authors in conjunction with the software package LANCELOT, we believe that they will be useful in their own right and should be available to researchers for their development of optimization software. The tools are available by anonymous ftp from a number of sources and may, in many cases, be installed automatically. The scope of a major collection of test problems written in the standard input format (SIF) used by the LANCELOT software package is described. Recognising that most software was not written with the SIF in mind, we provide tools to assist in building an interface between this input format and other optimization packages. These tools already provide a link between the SIF and a number of existing packages, including MINOS and OSL. In ad...
Representations of Quasi-Newton Matrices and Their Use in Limited Memory Methods
, 1994
Cited by 103 (8 self)
We derive compact representations of BFGS and symmetric rank-one matrices for optimization. These representations allow us to efficiently implement limited memory methods for large constrained optimization problems. In particular, we discuss how to compute projections of limited memory matrices onto subspaces. We also present a compact representation of the matrices generated by Broyden's update for solving systems of nonlinear equations. Key words: quasi-Newton method, constrained optimization, limited memory method, large-scale optimization. Abbreviated title: Representation of quasi-Newton matrices. 1. Introduction. Limited memory quasi-Newton methods are known to be effective techniques for solving certain classes of large-scale unconstrained optimization problems (Buckley and Le Nir (1983), Liu and Nocedal (1989), Gilbert and Lemaréchal (1989)). They make simple approximations of Hessian matrices, which are often good enough to provide a fast rate of linear convergence, and re...
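As a reference point, the compact form of the BFGS matrix is commonly stated as follows (reproduced as it is usually quoted, not from the truncated text above). Given $B_0$ and correction pairs $s_i = x_{i+1} - x_i$, $y_i = \nabla f(x_{i+1}) - \nabla f(x_i)$ with $s_i^{T} y_i > 0$, collected as $S_k = [s_0, \dots, s_{k-1}]$ and $Y_k = [y_0, \dots, y_{k-1}]$:

$$
B_k = B_0 - \begin{bmatrix} B_0 S_k & Y_k \end{bmatrix}
\begin{bmatrix} S_k^{T} B_0 S_k & L_k \\ L_k^{T} & -D_k \end{bmatrix}^{-1}
\begin{bmatrix} S_k^{T} B_0 \\ Y_k^{T} \end{bmatrix},
\qquad
(L_k)_{ij} = \begin{cases} s_{i-1}^{T} y_{j-1} & i > j \\ 0 & \text{otherwise} \end{cases},
\qquad
D_k = \operatorname{diag}\big(s_0^{T} y_0, \dots, s_{k-1}^{T} y_{k-1}\big).
$$

For $k = 1$ the middle matrix is $\operatorname{diag}(s_0^{T} B_0 s_0,\, -s_0^{T} y_0)$ and the formula reduces to the familiar update $B_1 = B_0 - \frac{B_0 s_0 s_0^{T} B_0}{s_0^{T} B_0 s_0} + \frac{y_0 y_0^{T}}{y_0^{T} s_0}$. With $m$ stored pairs and $B_0 = \delta I$, the matrix to invert is only $2m \times 2m$, which is what keeps limited memory implementations cheap.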
Predicting clicks: Estimating the click-through rate for new ads
 In Proceedings of the 16th International World Wide Web Conference (WWW-07)
, 2007
Cited by 100 (1 self)
Search engine advertising has become a significant element of the Web browsing experience. Choosing the right ads for the query and the order in which they are displayed greatly affects the probability that a user will see and click on each ad. This ranking has a strong impact on the revenue the search engine receives from the ads. Further, showing the user an ad that they prefer to click on improves user satisfaction. For these reasons, it is important to be able to accurately estimate the click-through rate of ads in the system. For ads that have been displayed repeatedly, this is empirically measurable, but for new ads, other means must be used. We show that we can use features of ads, terms, and advertisers to learn a model that accurately predicts the click-through rate for new ads. We also show that using our model improves the convergence and performance of an advertising system. As a result, our model increases both revenue and user satisfaction.
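The abstract does not name the model class, so the following is only a hedged illustration of learning CTR from features. It fits a logistic regression to aggregated (views, clicks) counts by batch gradient descent on the binomial log-loss, then scores an ad from its features alone. The two binary features and all counts in `ADS` are invented for the example.

```python
import math

# Hypothetical training data: (features, views, clicks) per ad.
# Feature 0: ad title contains the query term; feature 1: advertiser is new.
ADS = [
    ([1, 0], 1000, 50),
    ([0, 0], 1000, 20),
    ([1, 1], 500, 15),
    ([0, 1], 500, 5),
]


def predict_ctr(weights, bias, features):
    """Logistic model: CTR = sigmoid(bias + w . x)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train(ads, n_features, lr=1.0, epochs=5000):
    """Batch gradient descent on the per-view binomial log-loss."""
    weights, bias = [0.0] * n_features, 0.0
    total_views = sum(views for _, views, _ in ads)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for features, views, clicks in ads:
            # d(loss)/dz summed over this ad's impressions: views * p - clicks
            residual = views * predict_ctr(weights, bias, features) - clicks
            grad_b += residual
            for j, x in enumerate(features):
                grad_w[j] += residual * x
        bias -= lr * grad_b / total_views
        weights = [w - lr * g / total_views for w, g in zip(weights, grad_w)]
    return weights, bias


weights, bias = train(ADS, n_features=2)
new_ad_ctr = predict_ctr(weights, bias, [1, 1])  # estimate for an ad with no history
```

At prediction time a new ad needs no display history: only its features enter `predict_ctr`, which is the situation the abstract targets.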
Line Search Algorithms With Guaranteed Sufficient Decrease
 ACM Trans. Math. Software
, 1992
Cited by 86 (0 self)
The problem of finding a point that satisfies the sufficient decrease and curvature conditions is formulated in terms of finding a point in a set $T(\mu)$. We describe a search algorithm for this problem that produces a sequence of iterates that converges to a point in $T(\mu)$ and that, except for pathological cases, terminates in a finite number of steps. Numerical results for an implementation of the search algorithm on a set of test functions show that the algorithm terminates within a small number of iterations. Authors: Jorge J. Moré and David J. Thuente. 1. Introduction. Given a continuously differentiable function $\phi : \mathbb{R} \to \mathbb{R}$ defined on $[0, \infty)$ with $\phi'(0) < 0$, and constants $\mu$ and $\eta$ in $(0, 1)$, we are interested in finding an $\alpha > 0$ such that
$$\phi(\alpha) \le \phi(0) + \mu\, \phi'(0)\, \alpha \qquad (1.1)$$
and
$$|\phi'(\alpha)| \le \eta\, |\phi'(0)|. \qquad (1.2)$$
The development of a search procedure that satisfies these conditions is a crucial ingredient in a line search meth...
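Conditions (1.1) and (1.2) can be met by any bracketing line search. The sketch below is a deliberately simpler substitute for the Moré-Thuente algorithm: it doubles the trial step until an acceptable step is bracketed and then bisects, following the common textbook "zoom" scheme, whereas Moré and Thuente use safeguarded polynomial interpolation. The parameters `mu` and `eta` mirror $\mu$ and $\eta$; their defaults and the other parameters (`alpha`, `alpha_max`, `max_iter`) are illustrative assumptions.

```python
def strong_wolfe_search(phi, dphi, mu=1e-4, eta=0.9, alpha=1.0,
                        alpha_max=100.0, max_iter=100):
    """Return alpha > 0 satisfying (1.1) and (1.2) via step expansion + bisection."""
    phi0, dphi0 = phi(0.0), dphi(0.0)
    if dphi0 >= 0:
        raise ValueError("phi'(0) must be negative (descent direction)")

    def sufficient_decrease(a):  # condition (1.1)
        return phi(a) <= phi0 + mu * dphi0 * a

    def curvature(a):  # condition (1.2)
        return abs(dphi(a)) <= eta * abs(dphi0)

    def zoom(lo, hi):
        # Invariant: lo satisfies (1.1) with the lowest phi seen so far,
        # and the interval between lo and hi contains an acceptable step.
        for _ in range(max_iter):
            a = 0.5 * (lo + hi)  # bisection; Moré-Thuente interpolates instead
            if not sufficient_decrease(a) or phi(a) >= phi(lo):
                hi = a
            else:
                if curvature(a):
                    return a
                if dphi(a) * (hi - lo) >= 0:
                    hi = lo
                lo = a
        raise RuntimeError("zoom did not converge")

    prev = 0.0
    for i in range(max_iter):
        if not sufficient_decrease(alpha) or (i > 0 and phi(alpha) >= phi(prev)):
            return zoom(prev, alpha)
        if curvature(alpha):
            return alpha
        if dphi(alpha) >= 0:
            return zoom(alpha, prev)
        prev, alpha = alpha, min(2.0 * alpha, alpha_max)  # expand the trial step
    raise RuntimeError("no acceptable step found")


def phi(a):
    return (a - 0.1) ** 2


def dphi(a):
    return 2.0 * (a - 0.1)


step = strong_wolfe_search(phi, dphi)  # 0.125: alpha = 1 fails (1.1), bisection fixes it
```

Bisection keeps the bracketing logic easy to follow but typically needs more function evaluations than interpolation, which is one reason the paper's safeguarded approach matters in practice.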
Theory of Algorithms for Unconstrained Optimization
, 1992
Cited by 84 (1 self)
In this article I will attempt to review the most recent advances in the theory of unconstrained optimization, and will also describe some important open questions. Before doing so, I should point out that the value of the theory of optimization is not limited to its capacity for explaining the behavior of the most widely used techniques. The question
Large-Scale Optimization of Eigenvalues
 SIAM J. Optimization
, 1991
Cited by 83 (4 self)
Optimization problems involving eigenvalues arise in many applications. Let x be a vector of real parameters and let A(x) be a continuously differentiable symmetric matrix function of x. We consider a particular problem which occurs frequently: the minimization of the maximum eigenvalue of A(x), subject to linear constraints and bounds on x. The eigenvalues of A(x) are not differentiable at points x where they coalesce, so the optimization problem is said to be nonsmooth. Furthermore, it is typically the case that the optimization objective tends to make eigenvalues coalesce at a solution point. There are three main purposes of the paper. The first is to present a clear and self-contained derivation of the Clarke generalized gradient of the max eigenvalue function in terms of a "dual matrix". The second purpose is to describe a new algorithm, based on the ideas of a previous paper by the author (SIAM J. Matrix Anal. Appl. 9 (1988) 256-268), which is suitable for solving l...
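A two-parameter toy family makes the nonsmoothness visible. For the hypothetical matrix function $A(x) = \begin{bmatrix} x_1 & x_2 \\ x_2 & 1 - x_1 \end{bmatrix}$ the eigenvalues are $\tfrac12 \pm \sqrt{(x_1 - \tfrac12)^2 + x_2^2}$, so $\lambda_{\max}$ is a cone whose minimizer $(\tfrac12, 0)$ is exactly where the two eigenvalues coalesce. The sketch only illustrates this behavior with a grid search and finite differences; it is not the algorithm proposed in the paper.

```python
import math


def eig_sym_2x2(a, b, c):
    """Eigenvalues (ascending) of the symmetric matrix [[a, b], [b, c]]."""
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    return mean - radius, mean + radius


def lambda_max(x):
    """Largest eigenvalue of the hypothetical family A(x) = [[x1, x2], [x2, 1 - x1]]."""
    x1, x2 = x
    return eig_sym_2x2(x1, x2, 1.0 - x1)[1]


def directional_slope(f, x, d, h=1e-6):
    """Forward-difference estimate of the one-sided directional derivative f'(x; d)."""
    return (f((x[0] + h * d[0], x[1] + h * d[1])) - f(x)) / h


# Crude grid search over [0, 1] x [-0.5, 0.5] for the minimizer of lambda_max.
grid = [(i / 100, j / 100 - 0.5) for i in range(101) for j in range(101)]
x_star = min(grid, key=lambda_max)

# At x_star the eigenvalues coalesce: the slope is +1 along d AND along -d,
# so no gradient exists there (a differentiable f would give opposite signs).
slope_plus = directional_slope(lambda_max, x_star, (1.0, 0.0))
slope_minus = directional_slope(lambda_max, x_star, (-1.0, 0.0))
```

The grid search lands on $(0.5, 0.0)$ with $\lambda_{\max} = 0.5$, and both one-sided slopes there are approximately $+1$. The kink therefore sits exactly at the solution, as the abstract anticipates.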
Prototype-driven learning for sequence models
 In Proceedings of HLT-NAACL
, 2006
Cited by 74 (5 self)
We investigate prototype-driven learning for primarily unsupervised sequence modeling. Prior knowledge is specified declaratively, by providing a few canonical examples of each target annotation label. This sparse prototype information is then propagated across a corpus using distributional similarity features in a log-linear generative model. On part-of-speech induction in English and Chinese, as well as an information extraction task, prototype features provide substantial error rate reductions over competitive baselines and outperform previous work. For example, we can achieve an English part-of-speech tagging accuracy of 80.5% using only three examples of each tag and no dictionary constraints. We also compare to semi-supervised learning and discuss the system's error trends.
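The propagation step can be sketched as follows. Each word type gets a context vector and is compared with the prototype words by cosine similarity. It then receives a `PROTO=` feature for its most similar prototype, which a log-linear model would use alongside its other features. This is a simplification under stated assumptions: raw left/right neighbour counts stand in for the paper's distributional similarity representation (no dimensionality reduction), and the corpus and prototype lists are toy examples.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences):
    """Count left/right neighbour words for every word type."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]
        for i in range(1, len(padded) - 1):
            vectors[padded[i]]["L=" + padded[i - 1]] += 1
            vectors[padded[i]]["R=" + padded[i + 1]] += 1
    return vectors


def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def prototype_features(sentences, prototypes, k=1):
    """Map each word type to PROTO= features for its k most similar prototype words.

    prototypes: {label: [prototype words]}, a few canonical examples per label.
    """
    vectors = context_vectors(sentences)
    proto_words = [w for words in prototypes.values() for w in words if w in vectors]
    features = {}
    for word, vec in vectors.items():
        if word in proto_words:
            features[word] = ["PROTO=" + word]
            continue
        ranked = sorted(proto_words, key=lambda p: cosine(vec, vectors[p]), reverse=True)
        features[word] = ["PROTO=" + p for p in ranked[:k]]
    return features


SENTENCES = [
    ["the", "dog", "runs"],
    ["the", "cat", "sleeps"],
    ["a", "dog", "sleeps"],
    ["a", "cat", "runs"],
    ["the", "bird", "sings"],
]
PROTOTYPES = {"NOUN": ["dog"], "VERB": ["runs"], "DET": ["the"]}
feats = prototype_features(SENTENCES, PROTOTYPES)
# feats["cat"] == ["PROTO=dog"], feats["sleeps"] == ["PROTO=runs"], feats["a"] == ["PROTO=the"]
```

The features only link unlabeled words to prototypes; the labels themselves are still inferred by the model during unsupervised training.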
cdec: A decoder, alignment, and learning framework for finite-state and context-free translation models
 In Proceedings of ACL System Demonstrations
, 2010
Cited by 65 (29 self)
We present cdec, an open-source framework for decoding, aligning with, and training a number of statistical machine translation models, including word-based models, phrase-based models, and models based on synchronous context-free grammars. Using a single unified internal representation for translation forests, the decoder strictly separates model-specific translation logic from general rescoring, pruning, and inference algorithms. From this unified representation, the decoder can extract not only the 1-best or k-best translations, but also alignments to a reference, or the quantities necessary to drive discriminative training using gradient-based or gradient-free optimization techniques. Its efficient C++ implementation means that its memory use and runtime performance are significantly better than those of comparable decoders.
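The 1-best extraction mentioned above amounts to a Viterbi pass over an acyclic hypergraph, the translation forest. The sketch below is a generic Python illustration, not cdec's C++ API: the `Hyperedge` class, its fields, the toy forest and its log-domain scores are all hypothetical. Nodes are visited children-first, each node keeps its best-scoring incoming hyperedge, and the translation is read off by following those back-pointers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hyperedge:
    head: str          # node this edge derives
    tails: List[str]   # child nodes, in target-side order
    score: float       # hypothetical log-domain rule score
    template: str      # target string with {0}, {1}, ... slots for the tails


def viterbi(nodes_children_first, edges, goal):
    """Return (best log score, best target string) for `goal` in an acyclic hypergraph."""
    incoming = {node: [] for node in nodes_children_first}
    for edge in edges:
        incoming[edge.head].append(edge)
    best, backpointer = {}, {}
    for node in nodes_children_first:
        for edge in incoming[node]:
            if not all(t in best for t in edge.tails):
                continue  # a child is underivable, so this edge is unusable
            score = edge.score + sum(best[t] for t in edge.tails)
            if node not in best or score > best[node]:
                best[node], backpointer[node] = score, edge

    def read_off(node):
        edge = backpointer[node]
        return edge.template.format(*(read_off(t) for t in edge.tails))

    return best[goal], read_off(goal)


# Toy forest: two lexical choices for X1, one for X2, and a monotone
# versus a reordering rule for the goal node S.
EDGES = [
    Hyperedge("X1", [], -0.1, "house"),
    Hyperedge("X1", [], -1.2, "home"),
    Hyperedge("X2", [], -0.2, "white"),
    Hyperedge("S", ["X1", "X2"], -1.0, "the {0} {1}"),
    Hyperedge("S", ["X2", "X1"], -0.3, "the {0} {1}"),
]
score, translation = viterbi(["X1", "X2", "S"], EDGES, "S")
# translation == "the white house"; score is approximately -0.6
```

Because the forest representation is shared, the same traversal order also serves other passes over it. K-best extraction and the expectations needed for training require more machinery than this 1-best sketch.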