Results 1–10 of 24
Learning Probabilistic Models of Link Structure
Journal of Machine Learning Research, 2002
Abstract
Cited by 110 (13 self)
Most real-world data is heterogeneous and richly interconnected. Examples include the Web, hypertext, bibliometric data and social networks. In contrast, most statistical learning methods work with "flat" data representations, forcing us to convert our data into a form that loses much of the link structure. The recently introduced framework of probabilistic relational models (PRMs) embraces the object-relational nature of structured data by capturing probabilistic interactions between attributes of related entities. In this paper, we extend this framework by modeling interactions between the attributes and the link structure itself. An advantage of our approach is a unified generative model for both content and relational structure. We propose two mechanisms for representing a probability distribution over link structures: reference uncertainty and existence uncertainty. We describe the appropriate conditions for using each model and present learning algorithms for each. We present experimental results showing that the learned models can be used to predict link structure and, moreover, that the observed link structure can be used to provide better predictions for the attributes in the model.
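Existence uncertainty, as described in the abstract above, treats each potential link as a binary random variable whose probability depends on attributes of the two endpoint entities. A minimal sketch of that idea (the single shared-topic attribute, the weights, and the logistic form are illustrative assumptions, not the paper's learned CPDs):

```python
import math

def link_prob(topic_a, topic_b, w_same=2.0, w_bias=-3.0):
    """P(link exists) as a logistic function of endpoint attributes:
    here a single indicator for whether the two papers share a topic.
    The weights are made-up constants, not learned parameters."""
    z = w_bias + w_same * (1.0 if topic_a == topic_b else 0.0)
    return 1.0 / (1.0 + math.exp(-z))

# Links come out more probable between same-topic papers.
p_same = link_prob("ML", "ML")   # sigmoid(-1), about 0.27
p_diff = link_prob("ML", "DB")   # sigmoid(-3), about 0.05
```

In the full model the same machinery also runs in reverse: observing which links exist sharpens the posterior over the endpoint attributes.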
Statistical relational learning for link prediction
In Proceedings of the Workshop on Learning Statistical Models from Relational Data at IJCAI-2003
Abstract
Cited by 68 (5 self)
Link prediction is a complex, inherently relational task. Be it in the domain of scientific citations, social networks or hypertext links, the underlying data are extremely noisy, and the characteristics useful for prediction are not readily available in a "flat" file format but rather involve complex relationships among objects. In this paper, we propose the application of our methodology for Statistical Relational Learning to building link prediction models. We propose an integrated approach to building regression models from data stored in relational databases, in which potential predictors are generated by structured search of the space of queries to the database and then tested for inclusion in a logistic regression. We present experimental results for the task of predicting citations made in the scientific literature using relational data taken from CiteSeer. This data includes the citation graph, authorship and publication venues of papers, as well as their word content.
CrossMine: Efficient Classification Across Multiple Database Relations
In Proc. 2004 Int. Conf. on Data Engineering (ICDE'04), Boston, MA
Abstract
Cited by 54 (12 self)
Most of today's structured data is stored in relational databases. Such a database consists of multiple relations that are linked together conceptually via entity-relationship links in the design of relational database schemas. Multi-relational classification can be widely used in many disciplines, such as financial decision making, medical research, and geographical applications. However, most classification approaches work only on single "flat" data relations. It is usually difficult to convert multiple relations into a single flat relation without either introducing a huge, undesirable "universal relation" or losing essential information. Previous work using Inductive Logic Programming approaches (recently also known as Relational Mining) has proven effective, with high accuracy, in multi-relational classification. Unfortunately, these approaches suffer from poor scalability with respect to the number of relations and the number of attributes in databases.
Learning Statistical Models from Relational Data
2001
Abstract
Cited by 41 (6 self)
This workshop is the second in a series of workshops held in conjunction with AAAI and IJCAI. The first workshop was held in July 2000 at AAAI. Notes from that workshop are available at
Link Mining: A New Data Mining Challenge
SIGKDD Explorations, 2003
Abstract
Cited by 39 (0 self)
A key challenge for data mining is tackling the problem of mining richly structured datasets, where the objects are linked in some way. Links among the objects may exhibit certain patterns, which can be helpful for many data mining tasks and are usually hard to capture with traditional statistical models. Recently there has been a surge of interest in this area, fueled largely by interest in web and hypertext mining, but also in mining social networks, security and law enforcement data, bibliographic citations and epidemiological records.
Statistical Relational Learning for Document Mining
2003
Abstract
Cited by 39 (5 self)
A major obstacle to fully integrated deployment of statistical learners is the assumption that data sits in a single table, even though most real-world databases have complex relational structures. In this paper, we introduce an integrated approach to building regression models from data stored in relational databases. Potential features are generated by structured search of the space of queries to the database and then tested for inclusion in a logistic regression. We present experimental results for the task of predicting where scientific papers will be published, based on relational data taken from CiteSeer. This data includes word counts in the document, frequently cited authors or papers, co-citations, publication venues of cited papers, word co-occurrences, and word counts in cited or citing documents. Our approach results in classification accuracies superior to those achieved when using classical "flat" features. Our classification task also serves as a "where to publish?" conference/journal recommendation task.
Distribution-based aggregation for relational learning with identifier attributes
Machine Learning, 2004
Abstract
Cited by 35 (10 self)
Feature construction through aggregation plays an essential role in modeling relational domains with one-to-many relationships between tables. One-to-many relationships lead to bags (multisets) of related entities, from which predictive information must be captured. This paper focuses on aggregation from categorical attributes that can take many values (e.g., object identifiers). We present a novel aggregation method, as part of the relational learning system ACORA, that combines the use of vector distance and meta-data about the class-conditional distributions of attribute values. We provide a theoretical foundation for this approach by deriving a "relational fixed-effect" model within a Bayesian framework, and we discuss the implications of identifier aggregation for the expressive power of the induced model. One advantage of using identifier attributes is the circumvention of limitations caused either by missing/unobserved object properties or by independence assumptions. Finally, we show empirically that the novel aggregators can generalize in the presence of identifier (and other high-dimensional) attributes, and we also explore the limits of the methods' applicability.
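The aggregation idea above can be sketched as follows: summarize each class by the distribution of attribute values seen across its bags, then turn a new bag into one numeric feature per class via a vector distance. This is a simplified stand-in for ACORA's aggregators; the cosine-similarity choice and the toy author-citation data are assumptions for illustration:

```python
from collections import Counter
import math

def class_conditional_dists(bags, labels):
    """For each class, estimate the distribution of attribute
    values seen across all bags belonging to that class."""
    counts = {}
    for bag, y in zip(bags, labels):
        counts.setdefault(y, Counter()).update(bag)
    dists = {}
    for y, c in counts.items():
        total = sum(c.values())
        dists[y] = {v: n / total for v, n in c.items()}
    return dists

def cosine(p, q):
    """Cosine similarity between two sparse distributions."""
    keys = set(p) | set(q)
    dot = sum(p.get(k, 0.0) * q.get(k, 0.0) for k in keys)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def aggregate(bag, dists):
    """Turn a bag of identifiers into one feature per class: its
    similarity to that class's value distribution."""
    total = len(bag)
    p = {v: n / total for v, n in Counter(bag).items()}
    return {y: cosine(p, q) for y, q in dists.items()}

# Toy data: identifiers of authors cited by positive vs. negative papers.
bags = [["a1", "a2"], ["a1", "a3"], ["b1", "b2"], ["b1", "b3"]]
labels = [1, 1, 0, 0]
dists = class_conditional_dists(bags, labels)
feats = aggregate(["a1", "a2"], dists)  # resembles class 1, not class 0
```

Note how identifiers carry signal here even though no other properties of the cited authors are observed, which is the limitation-circumvention point made in the abstract.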
Structural Logistic Regression for Link Analysis
2003
Abstract
Cited by 27 (5 self)
We present Structural Logistic Regression, an extension of logistic regression to modeling relational data. It is an integrated approach to building regression models from data stored in relational databases, in which potential predictors, both boolean and real-valued, are generated by structured search in the space of queries to the database and then tested with statistical information criteria for inclusion in a logistic regression. Combining statistics with a relational representation allows modeling in noisy domains with complex structure. Link prediction is a task of high interest with exactly these characteristics. Be it in the domain of scientific citations, social networks or hypertext, the underlying data are extremely noisy, and the features useful for prediction are not readily available in a "flat" file format. We propose the application of Structural Logistic Regression to building link prediction models, and we present experimental results for the task of predicting citations made in the scientific literature using relational data taken from the CiteSeer search engine. This data includes the citation graph, authorship and publication venues of papers, as well as their word content.
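A toy sketch of the feature-generation step described above: candidate predictors are counts computed by small queries over a citation graph, and a filter decides which ones enter the model. The data, the two candidate queries, and the crude mean-difference filter are all illustrative assumptions; the paper instead tests candidates with statistical information criteria inside a logistic regression:

```python
# Toy relational data: papers cite papers; papers have authors.
cites = {"p1": {"p2", "p3"}, "p2": {"p3"}, "p4": set()}
authors = {"p1": {"smith"}, "p2": {"smith"},
           "p3": {"jones"}, "p4": {"jones"}}

def common_citations(a, b):
    """Query-derived predictor: papers cited by both a and b."""
    return len(cites.get(a, set()) & cites.get(b, set()))

def shared_authors(a, b):
    """Query-derived predictor: authors the two papers share."""
    return len(authors.get(a, set()) & authors.get(b, set()))

CANDIDATES = [common_citations, shared_authors]

def featurize(pair):
    """Evaluate every candidate query on one (paper, paper) pair."""
    a, b = pair
    return [f(a, b) for f in CANDIDATES]

def select(pairs, labels):
    """Crude stand-in for model selection: keep a candidate whose
    mean value differs between the two classes."""
    kept = []
    for f in CANDIDATES:
        pos = [f(*p) for p, y in zip(pairs, labels) if y == 1]
        neg = [f(*p) for p, y in zip(pairs, labels) if y == 0]
        if pos and neg and sum(pos) / len(pos) != sum(neg) / len(neg):
            kept.append(f.__name__)
    return kept

pairs = [("p1", "p2"), ("p1", "p4")]   # a citing pair and a non-citing pair
labels = [1, 0]
```

The selected features would then become columns of the design matrix for an ordinary logistic regression over candidate links.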
Avoiding bias when aggregating relational data with degree disparity
In Proceedings of the 20th International Conference on Machine Learning, 2003
Abstract
Cited by 25 (15 self)
A common characteristic of relational data sets, degree disparity, can lead relational learning algorithms to discover misleading correlations. Degree disparity occurs when the frequency of a relation is correlated with the values of the target variable. In such cases, aggregation functions used by many relational learning algorithms will produce misleading correlations and added complexity in models. We examine this problem through a combination of simulations and experiments, and we show how two novel hypothesis-testing procedures can adjust for the effects of using aggregation functions in the presence of degree disparity.
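The degree-disparity effect is easy to reproduce synthetically: if bag size depends on the class while the linked values are pure noise, a SUM aggregate inherits the degree signal and looks predictive, whereas AVG does not. This is a made-up simulation in the spirit of the paper's, not its actual experiments:

```python
import random

random.seed(0)

def make_dataset(n=200):
    """Bag size (degree) depends on the class; the linked values
    themselves are pure noise, uniform on [0, 1]."""
    data = []
    for _ in range(n):
        y = random.randint(0, 1)
        degree = 10 if y == 1 else 3          # the degree disparity
        bag = [random.random() for _ in range(degree)]
        data.append((bag, y))
    return data

def mean(xs):
    return sum(xs) / len(xs)

data = make_dataset()
sums_1 = [sum(b) for b, y in data if y == 1]
sums_0 = [sum(b) for b, y in data if y == 0]
avgs_1 = [mean(b) for b, y in data if y == 1]
avgs_0 = [mean(b) for b, y in data if y == 0]
# SUM separates the classes (roughly 5.0 vs 1.5) even though the
# values carry no class signal; AVG stays near 0.5 for both.
```

Any degree-sensitive aggregate (SUM, COUNT, MAX over large bags) shows the same artifact, which is why the paper's hypothesis tests must condition on degree.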
Pseudo-likelihood EM for Within-Network Relational Learning
Abstract
Cited by 9 (2 self)
In this work, we study the problem of within-network relational learning and inference, where models are learned on a partially labeled relational dataset and then applied to predict the classes of unlabeled instances in the same graph. We categorize recent work in statistical relational learning into three alternative approaches for this setting: disjoint learning with disjoint inference, disjoint learning with collective inference, and collective learning with collective inference. Models from each of these categories have been employed previously in different settings, but to our knowledge there has been no systematic comparison of models from all three categories. In this paper, we develop a novel pseudo-likelihood EM method that facilitates more general collective learning and collective inference on partially labeled relational networks. We then compare this method to competing methods from the other categories on both synthetic and real-world data. We show that there is a regime, when a moderate number of examples are labeled, in which the pseudo-likelihood EM approach achieves significantly higher accuracy.
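A caricature of the within-network setting described above: the classes of unlabeled nodes are treated as latent and iteratively re-estimated from their neighbors' current labels. Real pseudo-likelihood EM uses soft expectations and refits a conditional model at each M-step; this hard-assignment sketch, with a made-up graph and seed labels, only illustrates the problem setup:

```python
def em_infer(adj, seeds, iters=10):
    """Hard-assignment sketch: unlabeled nodes start at class 0 and
    are repeatedly relabeled with the majority class among their
    neighbors' current labels; labeled seed nodes stay fixed."""
    state = dict(seeds)
    unlabeled = [v for v in adj if v not in seeds]
    for v in unlabeled:
        state[v] = 0  # arbitrary initialization
    for _ in range(iters):
        for v in unlabeled:
            votes = [state[u] for u in adj[v] if u in state]
            if votes:
                state[v] = max(set(votes), key=votes.count)
    return state

# Two small chains: a-b-c seeded with class 1, x-y-z with class 0;
# b and y are the unlabeled within-network nodes to infer.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"],
       "x": ["y"], "y": ["x", "z"], "z": ["y"]}
seeds = {"a": 1, "c": 1, "x": 0, "z": 0}
pred = em_infer(adj, seeds)
```

Because the unlabeled nodes' estimates feed into each other, learning and inference are collective, which is exactly what distinguishes this setting from disjoint learning on fully labeled training graphs.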