Statistical Relational Learning for Document Mining (2003) [28 citations — 5 self]

by Alexandrin Popescul, Lyle H. Ungar, Steve Lawrence, David Pennock

Abstract:

A major obstacle to fully integrated deployment of many data mining algorithms is the assumption that data sits in a single table, even though most real-world databases have complex relational structures. We propose an integrated approach to statistical modeling from relational databases. We structure the search space based on "refinement graphs", which are widely used in inductive logic programming for learning logic descriptions. The use of statistics allows us to extend the search space to include a richer set of features, including many that are not boolean. Search and model selection are integrated into a single process, allowing information criteria native to the statistical model (for example, logistic regression) to make feature selection decisions in a step-wise manner. We present experimental results for the task of predicting where scientific papers will be published, based on relational data taken from CiteSeer. Our approach achieves higher classification accuracy than classical "flat" features do. The resulting classifier can be used to recommend where to publish articles.
