An exploration of entity models, collective classification and relation description
- In Proceedings of KDD Workshop on Link Analysis and Group Detection
, 2004
Cited by 13 (1 self)
Traditional information retrieval typically represents data using a bag of words; data mining typically uses a highly structured database representation. This paper explores the middle ground using a representation which we term entity models, in which questions about structured data may be posed and answered, but the complexities and task-specific restrictions of ontologies are avoided. An entity model is a language model or word distribution associated with an entity, such as a person, place or organization. Using these per-entity language models, entities may be clustered, links may be detected or described with a short summary, entities may be collectively classified, and question answering may be performed. On a corpus of entities extracted from newswire and the Web, we group entities by profession with 90% accuracy, improve accuracy further on the task of classifying politicians as liberal or conservative using collective classification and conditional random fields, and answer questions about “who a person is” with mean reciprocal rank (MRR) of 0.52.
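As a rough illustration of the idea in this abstract, an entity model can be sketched as a smoothed unigram distribution built from the text snippets that mention an entity. The sketch below is not the authors' implementation: the add-alpha smoothing, the Jensen-Shannon comparison, the function names, and the toy snippets are all assumptions made for illustration. The MRR helper follows the standard definition of mean reciprocal rank.

```python
from collections import Counter
import math


def entity_model(snippets, vocab, alpha=1.0):
    """Build a smoothed unigram distribution over `vocab` for one entity.

    Add-alpha smoothing is an assumption of this sketch, not taken from the paper.
    """
    counts = Counter(w for s in snippets for w in s.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions on the same vocab."""
    m = {w: 0.5 * (p[w] + q[w]) for w in p}

    def kl(a, b):
        return sum(a[w] * math.log2(a[w] / b[w]) for w in a if a[w] > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def mean_reciprocal_rank(first_correct_ranks):
    """Average of 1/rank of the first correct answer; None means no correct answer."""
    hits = (1.0 / r for r in first_correct_ranks if r is not None)
    return sum(hits) / len(first_correct_ranks)


# Hypothetical toy data: two politicians and one athlete.
texts = {
    "senator_a": ["senator votes on budget bill", "senator campaign"],
    "governor_b": ["governor signs budget bill", "governor campaign rally"],
    "striker_c": ["striker scores goal", "striker injured before match"],
}
vocab = sorted({w for ss in texts.values() for s in ss for w in s.split()})
models = {name: entity_model(ss, vocab) for name, ss in texts.items()}

# Entities with similar word distributions have a smaller divergence.
print(js_divergence(models["senator_a"], models["governor_b"]))
print(js_divergence(models["senator_a"], models["striker_c"]))
```

A clustering or classification step would then operate on these per-entity distributions, for example by grouping entities whose pairwise divergence is small.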
Online learning and exploiting relational models in reinforcement learning
- In M. Veloso (Ed.), Proceedings of the 20th International Joint Conference on Artificial Intelligence (p
, 2007
Cited by 13 (2 self)
In recent years, there has been a growing interest in using rich representations such as relational languages for reinforcement learning. However, while expressive languages have many advantages in terms of generalization and reasoning, extending existing approaches to such a relational setting is a non-trivial problem. In this paper, we present a first step towards the online learning and exploitation of relational models. We propose a representation for the transition and reward function that can be learned online and present a method that exploits these models by augmenting Relational Reinforcement Learning algorithms with planning techniques. The benefits and robustness of our approach are evaluated experimentally.
Temporal-relational classifiers for prediction in evolving domains
- In Proceedings of the IEEE International Conference on Data Mining
, 2008
Cited by 12 (3 self)
Many relational domains contain temporal information and dynamics that are important to model (e.g., social networks, protein networks). However, past work in relational learning has focused primarily on modeling static “snapshots” of the data and has largely ignored the temporal dimension of these data. In this work, we extend relational techniques to temporally-evolving domains and outline a representational framework that is capable of modeling both temporal and relational dependencies in the data. We develop efficient learning and inference techniques within the framework by considering a restricted set of temporal-relational dependencies and using parameter-tying methods to generalize across relationships and entities. More specifically, we model dynamic relational data with a two-phase process, first summarizing the temporal-relational information with kernel smoothing, and then moderating attribute dependencies with the summarized relational information. We develop a number of novel temporal-relational models using the framework and then show that the current approaches to modeling static relational data are special cases within the framework. We compare the new models to the competing static relational methods on three real-world datasets and show that the temporal-relational models consistently outperform the relational models that ignore temporal information, achieving significant reductions in error ranging from 15% to 70%.
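The two-phase process described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's models: the exponential kernel, the parameter `theta`, the function names, and the weighted neighbor vote used in the second phase are all hypothetical choices made here.

```python
from collections import defaultdict


def summarize_links(timed_edges, t_now, theta=0.5):
    """Phase 1: collapse timestamped edges (u, v, t) into one weighted graph.

    Each edge contributes theta * (1 - theta) ** (t_now - t), so recent links
    count more. The exponential kernel is an assumption of this sketch.
    """
    weights = defaultdict(float)
    for u, v, t in timed_edges:
        w = theta * (1.0 - theta) ** (t_now - t)
        weights[(u, v)] += w
        weights[(v, u)] += w  # treat links as undirected
    return weights


def weighted_label_distribution(node, weights, labels):
    """Phase 2: let the summarized link weights moderate neighbor-label influence."""
    totals = defaultdict(float)
    for (u, v), w in weights.items():
        if u == node and v in labels:
            totals[labels[v]] += w
    z = sum(totals.values())
    return {lab: s / z for lab, s in totals.items()} if z else {}


# Hypothetical example: node "a" linked to "b" recently and to "c" two steps ago.
edges = [("a", "b", 3), ("a", "c", 1)]
labels = {"b": "x", "c": "y"}
summary = summarize_links(edges, t_now=3, theta=0.5)
print(weighted_label_distribution("a", summary, labels))
```

Setting every kernel weight to the same constant would recover an unweighted static snapshot, which is one way to read the abstract's remark that static relational models are special cases.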
Identifying Predictive Structures in Relational Data Using Multiple Instance Learning
- In International Conference on Machine Learning. AAAI Press, Menlo
, 2003
Cited by 12 (4 self)
This paper introduces an approach for identifying predictive structures in relational data using the multiple-instance framework. By a predictive structure, we mean a structure that can explain a given labeling of the data and can predict labels of unseen data. Multiple-instance learning has previously only been applied to flat, or propositional, data and we present a modification to the framework that allows multiple-instance techniques to be used on relational data. We present experimental results using a relational modification of the diverse density method (Maron, 1998; Maron & Lozano-Pérez, 1998) and of a method based on the chi-squared statistic (McGovern & Jensen, 2003). We demonstrate that multiple-instance learning can be used to identify predictive structures on both a small illustrative data set and the Internet Movie Database. We compare the classification results to a k-nearest neighbor approach.
View learning for statistical relational learning: With an application to mammography
- Proceeding of the 19th International Joint Conference on Artificial Intelligence
, 2005
Cited by 11 (6 self)
Statistical relational learning (SRL) constructs probabilistic models from relational databases. A key capability of SRL is the learning of arcs (in the Bayes net sense) connecting entries in different rows of a relational table, or in different tables. Nevertheless, SRL approaches currently are constrained to use the existing database schema. For many database applications, users find it profitable to define alternative “views” of the database, in effect defining new fields or tables. Such new fields or tables can also be highly useful in learning. We provide SRL with the capability of learning new views.
Schemas and Models
- IN PROCEEDINGS OF THE SIGKDD-2002 WORKSHOP ON MULTI-RELATIONAL LEARNING
, 2002
Cited by 10 (1 self)
We propose the Schema-Model Framework, which characterizes algorithms that learn probabilistic models from relational data as having two parts: a schema that identifies sets of related data items and groups them into relevant categories; and a model that allows probabilistic inference about those data items. The framework
First-order probabilistic languages: Into the unknown
- PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON INDUCTIVE LOGIC PROGRAMMING. (2007
, 2007
Cited by 10 (0 self)
This paper surveys first-order probabilistic languages (FOPLs), which combine the expressive power of first-order logic with a probabilistic treatment of uncertainty. We provide a taxonomy that helps make sense of the profusion of FOPLs that have been proposed over the past fifteen years. We also emphasize the importance of representing uncertainty not just about the attributes and relations of a fixed set of objects, but also about what objects exist. This leads us to Bayesian logic, or BLOG, a new language for defining probabilistic models with unknown objects. We give a brief overview of BLOG syntax and semantics, and emphasize some of the design decisions that distinguish it from other languages. Finally, we consider the challenge of constructing FOPL models automatically from data.
A Comparison of Approaches for Learning Probability Trees
- In Proceedings of 16th European Conference on Machine Learning
, 2005
Cited by 9 (7 self)
Probability trees (or Probability Estimation Trees, PETs) are decision trees with probability distributions in the leaves. Several alternative approaches for learning probability trees have been proposed but no thorough comparison of these approaches exists.
Detecting outliers using transduction and statistical testing
- In Proceedings of the 12th Annual SIGKDD International Conference on Knowledge Discovery and Data Mining
, 2006
Cited by 9 (0 self)
Outlier detection can uncover malicious behavior in fields like intrusion detection and fraud analysis. Although there has been a significant amount of work in outlier detection, most of the algorithms proposed in the literature are based on a particular definition of outliers (e.g., density-based), and use ad-hoc thresholds to detect them. In this paper we present a novel technique to detect outliers with respect to an existing clustering model. However, the test can also be successfully utilized to recognize outliers when the clustering information is not available. Our method is based on Transductive Confidence Machines, which have been previously proposed as a mechanism to provide individual confidence measures on classification decisions. The test uses hypothesis testing to prove or disprove whether a point is fit to be in each of the clusters of the model. We experimentally demonstrate that the test is highly robust, and produces very few misdiagnosed points, even when no clustering information is available. Furthermore, our experiments demonstrate the robustness of our method under the circumstances of data contaminated by outliers. We finally show that our technique can be successfully applied to identify outliers in a noisy data set for which no information is available (e.g., ground truth, clustering structure, etc.). As such our proposed methodology is capable of bootstrapping from a noisy data set a clean one that can be used to identify future outliers.
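The hypothesis test described in this abstract can be illustrated with a simplified conformal-style p-value. The sketch below is an approximation under stated assumptions rather than the paper's procedure: the strangeness measure (sum of distances to the k nearest cluster members), the function names, and the shortcut of not recomputing the cluster's strangeness values after adding the candidate are all choices made here for brevity.

```python
import math


def knn_strangeness(point, others, k=2):
    """Strangeness of `point`: sum of distances to its k nearest neighbours in `others`."""
    dists = sorted(math.dist(point, o) for o in others)
    return sum(dists[:k])


def cluster_p_value(candidate, cluster, k=2):
    """Fraction of points (candidate included) at least as strange as the candidate.

    Cluster strangeness values are computed leave-one-out and are not updated
    for the candidate; this is a simplification assumed by this sketch.
    """
    alphas = [
        knn_strangeness(p, cluster[:i] + cluster[i + 1:], k)
        for i, p in enumerate(cluster)
    ]
    alpha_new = knn_strangeness(candidate, cluster, k)
    at_least = sum(1 for a in alphas if a >= alpha_new)
    return (at_least + 1) / (len(cluster) + 1)


def is_outlier(candidate, clusters, significance=0.05, k=2):
    """Flag the candidate if membership is rejected for every cluster."""
    return all(cluster_p_value(candidate, c, k) < significance for c in clusters)


# Hypothetical data: one cluster laid out on a 5 x 5 grid near the origin.
grid = [(i * 0.1, j * 0.1) for i in range(5) for j in range(5)]
print(is_outlier((10.0, 10.0), [grid]))   # far from the cluster
print(is_outlier((0.25, 0.25), [grid]))   # inside the cluster
```

Note that with n cluster points the smallest achievable p-value is 1 / (n + 1), so the significance level only has an effect when clusters are large enough.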
STATISTICAL MODELS AND ANALYSIS TECHNIQUES FOR LEARNING IN RELATIONAL DATA
, 2006
Cited by 9 (0 self)
Many data sets routinely captured by organizations are relational in nature, from marketing and sales transactions to scientific observations and medical records. Relational data record characteristics of heterogeneous objects and persistent relationships among those objects (e.g., citation graphs, the World Wide Web, genomic structures). These data offer unique opportunities to improve model accuracy, and thereby decision-making, if machine learning techniques can effectively exploit the relational information.
This work focuses on how to learn accurate statistical models of complex, relational data sets and develops two novel probabilistic models to represent, learn, and reason about statistical dependencies in these data. Relational dependency networks are the first relational model capable of learning general autocorrelation dependencies, an important class of statistical dependencies that are ubiquitous in relational data. Latent group models are the first relational model to generalize about the properties of underlying group structures to improve inference accuracy and efficiency. Not only do these two models offer performance gains over current relational models, but they also offer efficiency gains which will make relational modeling feasible for large, relational datasets where current methods are computationally intensive, if not intractable.
We also formulate a novel analysis framework to analyze relational model performance and ascribe errors to model learning and inference procedures. Within this framework, we explore the effects of data characteristics and representation choices on inference accuracy and investigate the mechanisms behind model performance. In particular, we show that the inference process in relational models can be a significant source of error and that relative model performance varies significantly across different types of relational data.

