Abstract:
Link prediction is a complex, inherently relational task. Whether the domain is scientific citations, social networks, or hypertext links, the underlying data are extremely noisy, and the characteristics useful for prediction are not readily available in a "flat" file format but instead involve complex relationships among objects. In this paper, we apply our Statistical Relational Learning methodology to building link prediction models. We propose an integrated approach to building regression models from data stored in relational databases: potential predictors are generated by a structured search of the space of queries to the database and are then tested for inclusion in a logistic regression. We present experimental results for the task of predicting citations made in scientific literature, using relational data taken from CiteSeer. These data include the citation graph, the authorship and publication venues of papers, and their word content.
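The generate-then-test idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the two-table schema, the toy data, the three candidate queries, the plain gradient-ascent fit, and the use of BIC as the inclusion test are all assumptions made for illustration. Each candidate predictor is an aggregate SQL query evaluated for a (citing, cited) pair, and a predictor is kept only if it improves the criterion of the logistic regression.

```python
import math
import sqlite3

def build_db():
    """Create a tiny in-memory citation database (toy data, hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("CREATE TABLE cites(src TEXT, dst TEXT);"
                       "CREATE TABLE author(paper TEXT, name TEXT);")
    conn.executemany("INSERT INTO cites VALUES (?, ?)", [
        ("p1", "p3"), ("p1", "p4"), ("p2", "p3"),
        ("p2", "p4"), ("p5", "p3"), ("p5", "p6")])
    conn.executemany("INSERT INTO author VALUES (?, ?)", [
        ("p1", "alice"), ("p2", "alice"), ("p2", "bob"), ("p3", "carol"),
        ("p4", "bob"), ("p5", "dave"), ("p6", "dave")])
    return conn

# Hypothetical candidate predictors: aggregate queries over a pair (:d1, :d2).
CANDIDATE_QUERIES = {
    # Papers other than d1 that cite d2 (the target link itself is excluded).
    "indegree_dst": "SELECT COUNT(*) FROM cites WHERE dst = :d2 AND src != :d1",
    # Authors that d1 and d2 have in common.
    "shared_authors": "SELECT COUNT(*) FROM author a1 JOIN author a2 "
                      "ON a1.name = a2.name "
                      "WHERE a1.paper = :d1 AND a2.paper = :d2",
    # Papers cited by both d1 and d2 (bibliographic coupling).
    "coupling": "SELECT COUNT(*) FROM cites c1 JOIN cites c2 "
                "ON c1.dst = c2.dst WHERE c1.src = :d1 AND c2.src = :d2",
}

def feature_value(conn, sql, d1, d2):
    """Evaluate one candidate query for the pair (d1, d2)."""
    return conn.execute(sql, {"d1": d1, "d2": d2}).fetchone()[0]

def training_pairs(conn):
    """All ordered pairs of distinct papers, labeled 1 if d1 cites d2."""
    papers = sorted(r[0] for r in conn.execute("SELECT DISTINCT paper FROM author"))
    links = set(conn.execute("SELECT src, dst FROM cites"))
    pairs = [(a, b) for a in papers for b in papers if a != b]
    return pairs, [1 if p in links else 0 for p in pairs]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, steps=2000, lr=0.1):
    """Fit by batch gradient ascent; return (weights, log-likelihood)."""
    n, k = len(X), len(X[0])
    w = [0.0] * k
    for _ in range(steps):
        grad = [0.0] * k
        for xi, yi in zip(X, y):
            err = yi - sigmoid(sum(a * b for a, b in zip(w, xi)))
            for j in range(k):
                grad[j] += err * xi[j]
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    ll = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(a * b for a, b in zip(w, xi)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return w, ll

def select_features(conn, pairs, labels):
    """Greedy forward selection: add a query feature only if BIC decreases."""
    n = len(pairs)
    cols = {name: [float(feature_value(conn, sql, a, b)) for a, b in pairs]
            for name, sql in CANDIDATE_QUERIES.items()}

    def score(names):
        X = [[1.0] + [cols[nm][i] for nm in names] for i in range(n)]
        _, ll = fit_logistic(X, labels)
        return -2.0 * ll + (len(names) + 1) * math.log(n)  # BIC

    selected = []
    base = best = score([])
    while len(selected) < len(cols):
        trials = {nm: score(selected + [nm]) for nm in cols if nm not in selected}
        name = min(trials, key=trials.get)
        if trials[name] >= best:
            break  # no remaining candidate improves the criterion
        selected.append(name)
        best = trials[name]
    return selected, base, best

if __name__ == "__main__":
    conn = build_db()
    pairs, labels = training_pairs(conn)
    print(select_features(conn, pairs, labels))
```

In this sketch the candidate set is a fixed dictionary. A structured search would instead enumerate queries systematically, for example by refining joins and aggregates step by step, and would pass each resulting column through the same inclusion test.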