Results 1 – 10 of 31
Markov Logic Networks
Machine Learning, 2006
Cited by 569 (34 self)
Abstract: We propose a simple approach to combining first-order logic and probabilistic graphical models in a single representation. A Markov logic network (MLN) is a first-order knowledge base with a weight attached to each formula (or clause). Together with a set of constants representing objects in the domain, it specifies a ground Markov network containing one feature for each possible grounding of a first-order formula in the KB, with the corresponding weight. Inference in MLNs is performed by MCMC over the minimal subset of the ground network required for answering the query. Weights are efficiently learned from relational databases by iteratively optimizing a pseudo-likelihood measure. Optionally, additional clauses are learned using inductive logic programming techniques. Experiments with a real-world database and knowledge base in a university domain illustrate the promise of this approach.
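The grounding-and-weighting scheme this abstract describes can be sketched at toy scale. The clause, constants, and weight below are invented for illustration; a real MLN system performs MCMC over a subnetwork, whereas this sketch enumerates all worlds exactly, which is feasible only for a handful of ground atoms.

```python
import itertools
import math

# Hypothetical toy MLN: one weighted clause, Smokes(x) => Cancer(x),
# grounded over a small set of constants (a sketch, not the authors' system).
constants = ["Anna", "Bob"]
weight = 1.5

def clause_satisfied(world, person):
    # Smokes(p) => Cancer(p) is false only when Smokes(p) holds and Cancer(p) does not.
    return not (world[("Smokes", person)] and not world[("Cancer", person)])

def world_weight(world):
    # exp(weight * number of satisfied ground formulas), as in a ground Markov network
    n_true = sum(clause_satisfied(world, p) for p in constants)
    return math.exp(weight * n_true)

# Enumerate all 2^4 worlds over the ground atoms to normalize exactly (toys only).
atoms = [(pred, p) for pred in ("Smokes", "Cancer") for p in constants]
worlds = [dict(zip(atoms, vals))
          for vals in itertools.product([False, True], repeat=len(atoms))]
Z = sum(world_weight(w) for w in worlds)

w = {("Smokes", "Anna"): True, ("Cancer", "Anna"): True,
     ("Smokes", "Bob"): False, ("Cancer", "Bob"): False}
prob = world_weight(w) / Z
```

Worlds that violate more groundings of the clause receive exponentially smaller weight, which is the effect the formula weights encode.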
Learning the structure of Markov logic networks
In Proceedings of the 22nd International Conference on Machine Learning, 2005
Cited by 88 (17 self)
Abstract: Markov logic networks (MLNs) combine logic and probability by attaching weights to first-order clauses, and viewing these as templates for features of Markov networks. In this paper we develop an algorithm for learning the structure of MLNs from relational databases, combining ideas from inductive logic programming (ILP) and feature induction in Markov networks. The algorithm performs a beam or shortest-first search of the space of clauses, guided by a weighted pseudo-likelihood measure. This requires computing the optimal weights for each candidate structure, but we show how this can be done efficiently. The algorithm can be used to learn an MLN from scratch, or to refine an existing knowledge base. We have applied it in two real-world domains, and found that it outperforms using off-the-shelf ILP systems to learn the MLN structure, as well as pure ILP, purely probabilistic and purely knowledge-based approaches.
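The clause search described above can be illustrated with a generic beam search skeleton. The refinement operator and scoring function below are toy stand-ins for the paper's clause refinements and weighted pseudo-likelihood gain; only the search shape is the point.

```python
# Schematic beam search over clause refinements, guided by a stand-in score
# (illustrative names and scoring, not the paper's actual measure).
def beam_search(initial, refine, score, beam_width=4, depth=3):
    beam = [(score(initial), initial)]
    best = beam[0]
    for _ in range(depth):
        candidates = []
        for _, clause in beam:
            for child in refine(clause):
                candidates.append((score(child), child))
        if not candidates:
            break
        candidates.sort(key=lambda t: t[0], reverse=True)
        beam = candidates[:beam_width]
        if beam[0][0] > best[0]:
            best = beam[0]
    return best

# Toy usage: "clauses" are tuples of literals; the score rewards two target
# literals and penalizes clause length.
target = {"Smokes(x)", "Friends(x,y)"}
literals = ["Smokes(x)", "Cancer(x)", "Friends(x,y)"]
refine = lambda c: [c + (lit,) for lit in literals if lit not in c]
score = lambda c: len(set(c) & target) - 0.1 * len(c)
best_score, best_clause = beam_search((), refine, score)
```

In the paper, scoring each candidate requires re-optimizing the weights; the efficiency trick is doing that incrementally rather than from scratch.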
Parameter Estimation in Stochastic Logic Programs
Machine Learning, 2000
Cited by 69 (4 self)
Abstract: Stochastic logic programs (SLPs) are logic programs with labelled clauses which define a log-linear distribution over refutations of goals. The log-linear distribution provides, by marginalisation, a distribution over variable bindings, allowing SLPs to compactly represent quite complex distributions. We analyse the fundamental statistical properties of SLPs, addressing issues concerning infinite derivations, 'unnormalised' SLPs and impure SLPs. After detailing existing approaches to parameter estimation for log-linear models and their application to SLPs, we present a new algorithm called failure-adjusted maximisation (FAM). FAM is an instance of the EM algorithm that applies specifically to normalised SLPs and provides a closed form for computing parameter updates within an iterative maximisation approach. We empirically show that FAM works on some small examples and discuss methods for applying it to bigger problems.
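The closed-form flavour of the M-step summarised above amounts to renormalising expected clause-usage counts. The sketch below shows only that normalisation with invented counts; it omits the E-step over derivations (including the failure adjustment that gives FAM its name), so it is a fragment, not the algorithm.

```python
# Illustrative closed-form M-step in the spirit of failure-adjusted maximisation:
# each clause label is re-estimated as its expected usage count, normalised
# among the clauses defining the same predicate. Counts here are invented.
def m_step(expected_counts):
    total = sum(expected_counts.values())
    return {clause: c / total for clause, c in expected_counts.items()}

# Two hypothetical clauses for the same predicate s/1, with expected counts
# as a full E-step (over successful and failed derivations) might produce.
counts = {"s(X) :- p(X)": 3.2, "s(X) :- q(X)": 0.8}
params = m_step(counts)
```

Each EM iteration would recompute the expected counts under the current labels and then reapply this normalisation.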
Improving Accuracy in Word-class Tagging through Combination of Machine Learning Systems
Computational Linguistics, 2000
Cited by 45 (5 self)
Abstract: In this paper, we combine different systems employing known representations. The observation that suggests this approach is that systems that are designed differently, either because they use a different formalism or because they contain different knowledge, will typically produce different errors. We hope to make use of this fact and reduce the number of errors with very little additional effort by exploiting the disagreement between different language models. Although the approach is applicable to any type of language model, we focus on the case of statistical disambiguators that are trained on annotated corpora. The examples of the task that are present in the corpus and its annotation are fed into a learning algorithm, which induces a model of the desired input-output mapping in the form of a classifier. We use a number of different learning algorithms simultaneously on the same training corpus. Each type of learning method brings its own 'inductive bias' to the task and will produce a classifier with slightly different characteristics, so that different methods will tend to produce different errors.
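The simplest instance of the combination idea sketched above is majority voting over the taggers' outputs. The tag sequences below are invented, and the paper explores richer combination strategies; this is the baseline case.

```python
from collections import Counter

# Minimal sketch of system combination by per-token majority voting, falling
# back to the first tagger's output when no tag wins a majority.
def combine_by_voting(predictions):
    # predictions: list of tag sequences, one per tagger, all the same length
    combined = []
    for tags in zip(*predictions):
        tag, votes = Counter(tags).most_common(1)[0]
        combined.append(tag if votes > 1 else tags[0])
    return combined

tagger_a = ["DT", "NN", "VB"]
tagger_b = ["DT", "NN", "NN"]
tagger_c = ["DT", "JJ", "VB"]
merged = combine_by_voting([tagger_a, tagger_b, tagger_c])
# merged == ["DT", "NN", "VB"]
```

The combination only helps where the component taggers' errors are at least partly uncorrelated, which is exactly the disagreement the abstract sets out to exploit.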
Statistical Relational Learning for Document Mining
, 2003
Cited by 36 (5 self)
Abstract: A major obstacle to fully integrated deployment of statistical learners is the assumption that data sits in a single table, even though most real-world databases have complex relational structures. In this paper, we introduce an integrated approach to building regression models from data stored in relational databases. Potential features are generated by structured search of the space of queries to the database, and then tested for inclusion in a logistic regression. We present experimental results for the task of predicting where scientific papers will be published based on relational data taken from CiteSeer. This data includes word counts in the document, frequently cited authors or papers, co-citations, publication venues of cited papers, word co-occurrences, and word counts in cited or citing documents. Our approach results in classification accuracies superior to those achieved when using classical "flat" features. Our classification task also serves as a "where to publish?" conference/journal recommendation task.
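The pipeline described above, relational queries generating features that feed a logistic model, can be miniaturised as follows. The tables, feature names, and weights are toy stand-ins, not the CiteSeer data or the paper's query-search procedure.

```python
import math

# Schematic version of the approach: derive candidate features by querying a
# relational store, then score with a logistic model. All data and weights
# below are invented for illustration.
citations = {"paper1": ["paper2", "paper3"], "paper2": ["paper3"]}
venues = {"paper2": "ICML", "paper3": "ICML"}

def relational_features(paper):
    # Features that cross tables: citation counts and venue counts of cited work.
    cited = citations.get(paper, [])
    return {
        "n_citations": len(cited),
        "n_cited_icml": sum(venues.get(p) == "ICML" for p in cited),
    }

def predict_proba(features, weights, bias=0.0):
    # Standard logistic regression score over the derived features.
    z = bias + sum(weights.get(k, 0.0) * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))

weights = {"n_cited_icml": 1.2, "n_citations": -0.1}
p = predict_proba(relational_features("paper1"), weights)
```

In the paper, the feature-generating queries are themselves searched and each candidate is tested for inclusion in the regression, rather than fixed by hand as here.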
Discriminative Structure and Parameter Learning for Markov Logic Networks
Cited by 36 (5 self)
Abstract: Markov logic networks (MLNs) are an expressive representation for statistical relational learning that generalizes both first-order logic and graphical models. Existing methods for learning the logical structure of an MLN are not discriminative; however, many relational learning problems involve specific target predicates that must be inferred from given background information. We found that existing MLN methods perform very poorly on several such ILP benchmark problems, and we present improved discriminative methods for learning MLN clauses and weights that outperform existing MLN and traditional ILP methods.
Log-linear Models for First-Order Probabilistic Reasoning
In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, 1999
Cited by 35 (5 self)
Abstract: Recent work on log-linear models in probabilistic constraint logic programming is applied to first-order probabilistic reasoning. Probabilities are defined directly on the proofs of atomic formulae, and by marginalisation on the atomic formulae themselves. We use Stochastic Logic Programs (SLPs) composed of labelled and unlabelled definite clauses to define the proof probabilities. We have a conservative extension of first-order reasoning, so that, for example, there is a one-to-one mapping between logical and random variables. We show how, in this framework, Inductive Logic Programming (ILP) can be used to induce the features of a log-linear model from data. We also compare the presented framework with other approaches to first-order probabilistic reasoning. Keywords: log-linear models, constraint logic programming, inductive logic programming.
Integrating probabilistic extraction models and data mining to discover relations and patterns in text
In Proceedings of HLT-NAACL 2006, 2006
Cited by 35 (0 self)
Abstract: In order for relation extraction systems to obtain human-level performance, they must be able to incorporate relational patterns inherent in the data (for example, that one's sister is likely one's mother's daughter, or that children are likely to attend the same college as their parents). Hand-coding such knowledge can be time-consuming and inadequate. Additionally, there may exist many interesting, unknown relational patterns that both improve extraction performance and provide insight into text. We describe a probabilistic extraction model that provides mutual ...
Chunking with Maximum Entropy Models
, 2000
Cited by 30 (1 self)
Abstract: In this paper, I discuss a first attempt to create a text chunker using a Maximum Entropy model. The first experiments, implementing classifiers that tag every word in a sentence with a phrase tag using very local lexical information, part-of-speech tags and phrase tags of surrounding words, give encouraging results.
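The "very local" information the abstract mentions can be made concrete with a small feature extractor. The window size, feature names, and example sentence below are illustrative; a real system would feed such feature dictionaries per word into a maximum entropy classifier.

```python
# Sketch of the per-word feature vector a chunk tagger of this kind consumes:
# the current word, its POS tag, and the POS tags of neighbouring words
# (a window of one on each side here; feature names are invented).
def local_features(words, pos_tags, i):
    return {
        "word": words[i],
        "pos": pos_tags[i],
        "pos_prev": pos_tags[i - 1] if i > 0 else "<s>",
        "pos_next": pos_tags[i + 1] if i + 1 < len(pos_tags) else "</s>",
    }

words = ["He", "reckons", "the", "deficit"]
pos = ["PRP", "VBZ", "DT", "NN"]
feats = local_features(words, pos, 2)
# feats["pos_prev"] == "VBZ", feats["pos_next"] == "NN"
```

The classifier then predicts a phrase tag (e.g. begin-NP, inside-NP, outside) for each word from such a dictionary, and chunks are read off the tag sequence.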
nFOIL: Integrating Naïve Bayes and FOIL
, 2005
Cited by 24 (3 self)
Abstract: We present the system nFOIL. It tightly integrates the naïve Bayes learning scheme with the inductive logic programming rule-learner FOIL. In contrast to previous combinations, which have employed naïve Bayes only for post-processing the rule sets, nFOIL employs the naïve Bayes criterion to directly guide its search. Experimental evidence shows that nFOIL performs better than both its baseline algorithm FOIL and the post-processing approach, and is at the same time competitive with more sophisticated approaches.
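A rough way to picture the guiding criterion above: treat each learned rule as a boolean feature of an example (does the rule cover it?), and compare candidate rule sets by the training-set log-likelihood of a naive Bayes model over those features. Everything below (data, rules, Laplace smoothing) is an invented toy, not the paper's algorithm.

```python
import math

# Illustrative naive Bayes scoring of rule sets: feature_vectors[i][j] is True
# iff rule j covers example i; rule sets are compared by training-set
# log-likelihood under a smoothed naive Bayes model (a sketch, not nFOIL).
def nb_log_likelihood(feature_vectors, labels):
    n = len(labels)
    n_classes = len(set(labels))
    ll = 0.0
    for x, y in zip(feature_vectors, labels):
        n_y = labels.count(y)
        log_p = math.log((n_y + 1) / (n + n_classes))  # smoothed class prior
        for j, xj in enumerate(x):
            match = sum(1 for x2, y2 in zip(feature_vectors, labels)
                        if y2 == y and x2[j] == xj)
            log_p += math.log((match + 1) / (n_y + 2))  # smoothed likelihood
        ll += log_p
    return ll

# Rule set A's coverage separates the classes; rule set B's does not,
# so the criterion should prefer A.
labels = [1, 1, 0, 0]
rules_a = [(True,), (True,), (False,), (False,)]
rules_b = [(True,), (False,), (True,), (False,)]
ll_a = nb_log_likelihood(rules_a, labels)
ll_b = nb_log_likelihood(rules_b, labels)
```

Guiding the rule search with this score, rather than applying naive Bayes after the rules are fixed, is the integration the abstract contrasts with post-processing approaches.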