Results 1-10 of 68
Link Mining: A Survey
SigKDD Explorations Special Issue on Link Mining, 2005
Cited by 47 (0 self)
Many datasets of interest today are best described as a linked collection of interrelated objects. These may represent homogeneous networks, in which there is a single object type and link type, or richer, heterogeneous networks, in which there may be multiple object and link types (and possibly other semantic information). Examples of homogeneous networks include single-mode social networks, such as people connected by friendship links, or the WWW, a collection of linked web pages. Examples of heterogeneous networks include those in medical domains describing patients, diseases, treatments and contacts, or in bibliographic domains describing publications, authors, and venues. Link mining refers to data mining techniques that explicitly consider these links when building predictive or descriptive models of the linked data. Commonly addressed link mining tasks include object ranking, group detection, collective classification, link prediction and subgraph discovery. While network analysis has been studied in depth in particular areas such as social network analysis, hypertext mining, and web analysis, only recently has there been a cross-fertilization of ideas among these different communities. This is an exciting, rapidly expanding area. In this article, we review some of the common emerging themes.
Large-scale deduplication with constraints using Dedupalog
In: Proceedings of the 25th International Conference on Data Engineering (ICDE)
Cited by 28 (2 self)
We present a declarative framework for collective deduplication of entity references in the presence of constraints. Constraints occur naturally in many data cleaning domains and can improve the quality of deduplication. An example of a constraint is “each paper has a unique publication venue”; if two paper references are duplicates, then their associated conference references must be duplicates as well. Our framework supports collective deduplication, meaning that we can dedupe both paper references and conference references collectively in the example above. Our framework is based on a simple declarative Datalog-style language with precise semantics. Most previous work on deduplication either ignores constraints or uses them in an ad hoc, domain-specific manner. We also present efficient algorithms to support the framework. Our algorithms have precise theoretical guarantees for a large subclass of our framework. We show, using a prototype implementation, that our algorithms scale to very large datasets. We provide thorough experimental results over real-world data demonstrating the utility of our framework for high-quality and scalable deduplication.
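The venue constraint in this abstract can be illustrated with a minimal Python sketch. It is not Dedupalog and not the paper's algorithm: it is a plain union-find illustration, and the names `UnionFind`, `merge_papers`, and the sample references are hypothetical. It shows only the propagation step, in which merging two paper references also merges their venue references.

```python
class UnionFind:
    """Disjoint sets over hashable items, with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen items as their own singleton cluster.
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def same(self, a, b):
        return self.find(a) == self.find(b)


def merge_papers(papers, venues, venue_of, p1, p2):
    """Merge two paper references and propagate the constraint
    'each paper has a unique publication venue' to their venues."""
    papers.union(p1, p2)
    venues.union(venue_of[p1], venue_of[p2])


# Hypothetical sample data: three paper references and their venue strings.
papers, venues = UnionFind(), UnionFind()
venue_of = {
    "p1": "ICDE",
    "p2": "Intl. Conf. on Data Engineering",
    "p3": "VLDB",
}
merge_papers(papers, venues, venue_of, "p1", "p2")
```

After the call, the two spellings of the ICDE venue fall into one cluster, while the unrelated venue stays separate. A declarative system would derive such merges from rules rather than from explicit calls.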
Learning first-order probabilistic models with combining rules
In Proceedings of the International Conference on Machine Learning, 2005
Cited by 25 (11 self)
Many real-world domains exhibit rich relational structure and stochasticity and motivate the development of models that combine predicate logic with probabilities. These models describe probabilistic influences between attributes of objects that are related to each other through known domain relationships. To keep these models succinct, each such influence is considered independent of others, which is called the assumption of “independence of causal influences” (ICI). In this paper, we describe a language that consists of quantified conditional influence statements and captures most relational probabilistic models based on directed graphs. The influences due to different statements are combined using a set of combining rules such as Noisy-OR. We motivate and introduce multi-level combining rules, where the lower level rules combine the influences due to different ground instances of the same statement, and the upper level rules combine the influences due to different statements. We present algorithms and empirical results for parameter learning in the presence of such combining rules. Specifically, we derive and implement algorithms based on gradient descent and expectation maximization for different combining rules and evaluate them on synthetic data and on a real-world task. The results demonstrate that the algorithms are able to learn both the conditional probability distributions of the influence statements and the parameters of the combining rules.
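As a rough illustration of the two-level idea described in this abstract, the Python sketch below combines per-instance probabilities with a lower-level rule and then combines the per-statement results with an upper-level rule. The choice of mean at the lower level and Noisy-OR at the upper level is an assumption made for illustration, and the function names are hypothetical; the sketch covers only the combination step, not parameter learning.

```python
from math import prod


def noisy_or(probs):
    """Noisy-OR: the effect is absent only if every cause independently fails."""
    return 1.0 - prod(1.0 - p for p in probs)


def mean(probs):
    """Average of the individual influence probabilities."""
    return sum(probs) / len(probs)


def two_level(groups, lower=mean, upper=noisy_or):
    """Combine influences in two stages.

    groups: one list per influence statement, holding the probabilities
            contributed by that statement's ground instances.
    lower:  rule applied within each statement (assumed: mean).
    upper:  rule applied across statements (assumed: Noisy-OR).
    """
    return upper([lower(g) for g in groups])


# Two statements: the first has two ground instances, the second has one.
combined = two_level([[0.2, 0.4], [0.5]])
```

Here the first statement contributes the mean of its two instances (0.3), and Noisy-OR over 0.3 and 0.5 gives 1 - 0.7 * 0.5 = 0.65.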
Nonparametric Bayesian logic
In UAI, 2005
Cited by 20 (1 self)
The Bayesian Logic (BLOG) language was recently developed for defining first-order probability models over worlds with unknown numbers of objects. It handles important problems in AI, including data association and population estimation. This paper extends BLOG by adopting generative processes over function spaces, known as nonparametrics in the Bayesian literature. We introduce syntax for reasoning about arbitrary collections of objects, and their properties, in an intuitive manner. By exploiting exchangeability, distributions over unknown objects and their attributes are cast as Dirichlet processes, which resolve difficulties in model selection and inference caused by varying numbers of objects. We demonstrate these concepts with application to citation matching.
Embedded probabilistic programming
In Working conf. on domain specific lang, 2009
Cited by 18 (3 self)
Two general techniques for implementing a domain-specific language (DSL) with less overhead are the finally-tagless embedding of object programs and the direct-style representation of side effects. We use these techniques to build a DSL for probabilistic programming, for expressing countable probabilistic models and performing exact inference and importance sampling on them. Our language is embedded as an ordinary OCaml library and represents probability distributions as ordinary OCaml programs. We use delimited continuations to reify probabilistic programs as lazy search trees, which inference algorithms may traverse without imposing any interpretive overhead on deterministic parts of a model. We thus take advantage of the existing OCaml implementation to achieve competitive performance and ease of use. Inference algorithms can easily be embedded in probabilistic programs themselves.
Information, Divergence and Risk for Binary Experiments
Journal of Machine Learning Research, 2009
Cited by 17 (6 self)
We unify f-divergences, Bregman divergences, surrogate regret bounds, proper scoring rules, cost curves, ROC curves and statistical information. We do this by systematically studying integral and variational representations of these various objects and in so doing identify their primitives, all of which are related to cost-sensitive binary classification. As well as developing relationships between generative and discriminative views of learning, the new machinery leads to tight and more general surrogate regret bounds and generalised Pinsker inequalities relating f-divergences to variational divergence. The new viewpoint also illuminates existing algorithms: it provides a new derivation of Support Vector Machines in terms of divergences and relates Maximum Mean Discrepancy to Fisher Linear Discriminants.
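For readers unfamiliar with the objects named in this abstract, the following Python sketch computes two f-divergences between discrete distributions and checks the classical Pinsker inequality, KL(P||Q) >= V(P,Q)^2 / 2 (in nats), on one example. It assumes the convention D_f(P||Q) = sum_x q(x) f(p(x)/q(x)), which may differ from the paper's notation; the function names and the sample distributions are hypothetical, and the generalised inequalities of the paper are not reproduced here.

```python
from math import log


def f_divergence(p, q, f):
    """Compute sum_x q(x) * f(p(x) / q(x)) for discrete p, q with q(x) > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def kl_generator(t):
    """f(t) = t log t, which yields the KL divergence KL(P || Q)."""
    return t * log(t) if t > 0 else 0.0


def variational_generator(t):
    """f(t) = |t - 1|, which yields the variational divergence sum_x |p - q|."""
    return abs(t - 1.0)


# Hypothetical example distributions over three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]

kl = f_divergence(p, q, kl_generator)
v = f_divergence(p, q, variational_generator)

# Classical Pinsker inequality for this pair.
pinsker_holds = kl >= v ** 2 / 2
```

Both divergences come from the same template and differ only in the convex generator f, which is the sense in which the f-divergence family unifies them.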
Genic interaction extraction with semantic and syntactic chains
In Proceedings of the Fourth Workshop on Learning Language in Logic, 2005
Cited by 13 (0 self)
This paper describes the system that we submitted to the “Learning Language in Logic” Challenge of extracting directed genic interactions from sentences in Medline abstracts. The system uses Markov Logic, a framework that combines log-linear models and First Order Logic, to create a set of weighted clauses which can classify pairs of gene named entities as genic interactions. These clauses are based on chains of syntactic and semantic relations in the parse or Discourse Representation Structure (DRS) of a sentence, respectively. Our submitted results achieved 52.6% F-measure on the dataset without and 54.3% on the dataset with coreferences. After adding explicit clauses which model non-interaction we were able to improve these numbers to 68.4% and 64.7%, respectively.
Cold Start Link Prediction
Cited by 10 (0 self)
In the traditional link prediction problem, a snapshot of a social network is used as a starting point to predict, by means of graph-theoretic measures, the links that are likely to appear in the future. In this paper, we introduce cold start link prediction as the problem of predicting the structure of a social network when the network itself is totally missing while some other information regarding the nodes is available. We propose a two-phase method based on the bootstrap probabilistic graph. The first phase generates an implicit social network under the form of a probabilistic graph. The second phase applies probabilistic graph-based measures to produce the final prediction. We assess our method empirically over a large data collection obtained from Flickr, using interest groups as the initial information. The experiments confirm the effectiveness of our approach.
A Probabilistic Abduction Engine for Media Interpretation Based on Ontologies
In Pascal Hitzler and Thomas Lukasiewicz, editors, RR, volume 6333 of Lecture Notes in Computer Science
Cited by 9 (6 self)
For multimedia interpretation, and in particular for the combined interpretation of information coming from different modalities, a semantically well-founded formalization is required in the context of an agent-based scenario. Low-level percepts, which are represented symbolically, define the observations of an agent, and interpretations of content are defined as explanations for the observations. We propose an abduction-based formalism that uses description logics for the ontology and Horn rules for defining the space of hypotheses for explanations (i.e., the space of possible interpretations of media content), and we use Markov logic to define the motivation for the agent to generate explanations on the one hand, and
Extending Markov logic to model probability distributions in relational domains
In KI, 2007
Cited by 8 (4 self)
Markov logic, as a highly expressive representation formalism that essentially combines the semantics of probabilistic graphical models with the full power of first-order logic, is one of the most intriguing representations in the field of probabilistic logical modelling. However, as we will show, models in Markov logic often fail to generalize because the parameters they contain are highly domain-specific. We take the perspective of generative stochastic processes in order to describe probability distributions in relational domains and illustrate the problem in this context by means of simple examples. We propose an extension of the language that involves the specification of a priori independent attributes and that furthermore introduces a dynamic parameter adjustment whenever a model in Markov logic is instantiated for a certain domain (set of objects). Our extension removes the corresponding restrictions on processes for which models can be learned using standard methods and thus enables Markov logic networks to be practically applied to a far greater class of generative stochastic processes.