Entity Resolution with Markov Logic
In ICDM, 2006
Cited by 76 (9 self)
Entity resolution is the problem of determining which records in a database refer to the same entities, and is a crucial and expensive step in the data mining process. Interest in it has grown rapidly in recent years, and many approaches have been proposed. However, they tend to address only isolated aspects of the problem, and are often ad hoc. This paper proposes a well-founded, integrated solution to the entity resolution problem based on Markov logic. Markov logic combines first-order logic and probabilistic graphical models by attaching weights to first-order formulas, and viewing them as templates for features of Markov networks. We show how a number of previous approaches can be formulated and seamlessly combined in Markov logic, and how the resulting learning and inference problems can be solved efficiently. Experiments on two citation databases show the utility of this approach, and evaluate the contribution of the different components.
First-order probabilistic models for coreference resolution
In HLT/NAACL, 2007
Cited by 57 (17 self)
Traditional noun phrase coreference resolution systems represent features only of pairs of noun phrases. In this paper, we propose a machine learning method that enables features over sets of noun phrases, resulting in a first-order probabilistic model for coreference. We outline a set of approximations that make this approach practical, and apply our method to the ACE coreference dataset, achieving a 45% error reduction over a comparable method that only considers features of pairs of noun phrases. This result demonstrates an example of how a first-order logic representation can be incorporated into a probabilistic model and scaled efficiently.
Learning overhypotheses with hierarchical Bayesian models
Cited by 54 (22 self)
Inductive learning is impossible without overhypotheses, or constraints on the hypotheses considered by the learner. Some of these overhypotheses must be innate, but we suggest that hierarchical Bayesian models help explain how the rest can be acquired. To illustrate this claim, we develop models that acquire two kinds of overhypotheses — overhypotheses about feature variability (e.g. the shape bias in word learning) and overhypotheses about the grouping of categories into ontological kinds like objects and substances.
Church: A language for generative models
In UAI, 2008
Cited by 54 (11 self)
Formal languages for probabilistic modeling enable reuse, modularity, and descriptive clarity, and can foster generic inference techniques. We introduce Church, a universal language for describing stochastic generative processes. Church is based on the Lisp model of lambda calculus, containing a pure Lisp as its deterministic subset. The semantics of Church is defined in terms of evaluation histories and conditional distributions on such histories. Church also includes a novel language construct, the stochastic memoizer, which enables simple description of many complex nonparametric models. We illustrate language features through several examples, including: a generalized Bayes net in which parameters cluster over trials, infinite PCFGs, planning by inference, and various nonparametric clustering models. Finally, we show how to implement query on any Church program, exactly and approximately, using Monte Carlo techniques.
MEBN: A Language for First-Order Bayesian Knowledge Bases
Cited by 45 (18 self)
Although classical first-order logic is the de facto standard logical foundation for artificial intelligence, the lack of a built-in, semantically grounded capability for reasoning under uncertainty renders it inadequate for many important classes of problems. Probability is the best-understood and most widely applied formalism for computational scientific reasoning under uncertainty. Increasingly expressive languages are emerging for which the fundamental logical basis is probability. This paper presents Multi-Entity Bayesian Networks (MEBN), a first-order language for specifying probabilistic knowledge bases as parameterized fragments of Bayesian networks. MEBN fragments (MFrags) can be instantiated and combined to form arbitrarily complex graphical probability models. An MFrag represents probabilistic relationships among a conceptually meaningful group of uncertain hypotheses. Thus, MEBN facilitates representation of knowledge at a natural level of granularity. The semantics of MEBN assigns a probability distribution over interpretations of an associated classical first-order theory on a finite or countably infinite domain. Bayesian inference provides both a proof theory for combining prior knowledge with observations, and a learning theory for refining a representation as evidence accrues. A proof is given that MEBN can represent a probability distribution on interpretations of any finitely axiomatizable first-order theory.
Describing Visual Scenes Using Transformed Objects and Parts
INT J COMPUT VIS, 2005
Cited by 43 (6 self)
We develop hierarchical, probabilistic models for objects, the parts composing them, and the visual scenes surrounding them. Our approach couples topic models originally developed for text analysis with spatial transformations, and thus consistently accounts for geometric constraints. By building integrated scene models, we may discover contextual relationships, and better exploit partially labeled training images. We first consider images of isolated objects, and show that sharing parts among object categories improves detection accuracy when learning from few examples. Turning to multiple object scenes, we propose nonparametric models which use Dirichlet processes to automatically learn the number of parts underlying each object category, and objects composing each scene. The resulting transformed Dirichlet process (TDP) leads to Monte Carlo algorithms which simultaneously segment and recognize objects in street and office scenes.
Factorie: Probabilistic programming via imperatively defined factor graphs
In Advances in Neural Information Processing Systems 22, 2009
Cited by 39 (7 self)
Discriminatively trained undirected graphical models have had wide empirical success, and there has been increasing interest in toolkits that ease their application to complex relational data. The power in relational models is in their repeated structure and tied parameters; at issue is how to define these structures in a powerful and flexible way. Rather than using a declarative language, such as SQL or first-order logic, we advocate using an imperative language to express various aspects of model structure, inference, and learning. By combining the traditional, declarative, statistical semantics of factor graphs with imperative definitions of their construction and operation, we allow the user to mix declarative and procedural domain knowledge, and also gain significant efficiencies. We have implemented such imperatively defined factor graphs in a system we call FACTORIE, a software library for an object-oriented, strongly-typed, functional language. In experimental comparisons to Markov Logic Networks on joint segmentation and coreference, we find our approach to be 3-15 times faster while reducing error by 20-25%, achieving a new state of the art.
Probabilistic Theorem Proving
Cited by 25 (7 self)
Many representation schemes combining first-order logic and probability have been proposed in recent years. Progress in unifying logical and probabilistic inference has been slower. Existing methods are mainly variants of lifted variable elimination and belief propagation, neither of which takes logical structure into account. We propose the first method that has the full power of both graphical model inference and first-order theorem proving (in finite domains with Herbrand interpretations). We first define probabilistic theorem proving, their generalization, as the problem of computing the probability of a logical formula given the probabilities or weights of a set of formulas. We then show how this can be reduced to the problem of lifted weighted model counting, and develop an efficient algorithm for the latter. We prove the correctness of this algorithm, investigate its properties, and show how it generalizes previous approaches. Experiments show that it greatly outperforms lifted variable elimination when logical structure is present. Finally, we propose an algorithm for approximate probabilistic theorem proving, and show that it can greatly outperform lifted belief propagation.
General-purpose MCMC inference over relational structures
In Proceedings of the Twenty-Second Annual Conference on Uncertainty in Artificial Intelligence (UAI-06)
Cited by 22 (6 self)
Tasks such as record linkage and multitarget tracking, which involve reconstructing the set of objects that underlie some observed data, are particularly challenging for probabilistic inference. Recent work has achieved efficient and accurate inference on such problems using Markov chain Monte Carlo (MCMC) techniques with customized proposal distributions. Currently, implementing such a system requires coding MCMC state representations and acceptance probability calculations that are specific to a particular application. An alternative approach, which we pursue in this paper, is to use a general-purpose probabilistic modeling language (such as BLOG) and a generic Metropolis-Hastings MCMC algorithm that supports user-supplied proposal distributions. Our algorithm gains flexibility by using MCMC states that are only partial descriptions of possible worlds; we provide conditions under which MCMC over partial worlds yields correct answers to queries. We also show how to use a context-specific Bayes net to identify the factors in the acceptance probability that need to be computed for a given proposed move. Experimental results on a citation matching task show that our general-purpose MCMC engine compares favorably with an application-specific system.
MEBN: A Logic for Open-World Probabilistic Reasoning
Research Paper, 2004
Cited by 20 (8 self)
Uncertainty is a fundamental and irreducible aspect of our knowledge about the world. Probability is the most well-understood and widely applied logic for computational scientific reasoning under uncertainty. As theory and practice advance, general-purpose languages are beginning to emerge for which the fundamental logical basis is probability. However, such languages have lacked a logical foundation that fully integrates classical first-order logic with probability theory. This paper presents such an integrated logical foundation. A formal specification is presented for multi-entity Bayesian networks (MEBN), a knowledge representation language based on directed graphical probability models. A proof is given that a probability distribution over interpretations of any consistent, finitely axiomatizable first-order theory can be defined using MEBN. A semantics based on random variables provides a logically coherent foundation for open world reasoning and a means of analyzing tradeoffs between accuracy and computation cost. Furthermore, the underlying Bayesian logic is inherently open, having the ability to absorb new facts about the world, incorporate them into existing theories, and/or modify theories in the light of evidence. Bayesian inference provides both a proof theory for combining prior knowledge with observations, and a learning theory for refining a representation as evidence accrues. The results of this paper provide a logical foundation for the rapidly evolving literature on first-order Bayesian knowledge representation, and point the way toward Bayesian languages suitable for general-purpose knowledge representation and computing. Because first-order Bayesian logic contains classical first-order logic as a deterministic subset, it is a natural candidate as a universal representation for integrating domain ontologies expressed in languages based on classical first-order logic or subsets thereof.