Learning Probabilistic Models of Link Structure
Journal of Machine Learning Research, 2002
Cited by 89 (11 self)
Most real-world data is heterogeneous and richly interconnected. Examples include the Web, hypertext, bibliometric data and social networks. In contrast, most statistical learning methods work with "flat" data representations, forcing us to convert our data into a form that loses much of the link structure. The recently introduced framework of probabilistic relational models (PRMs) embraces the object-relational nature of structured data by capturing probabilistic interactions between attributes of related entities. In this paper, we extend this framework by modeling interactions between the attributes and the link structure itself. An advantage of our approach is a unified generative model for both content and relational structure. We propose two mechanisms for representing a probabilistic distribution over link structures: reference uncertainty and existence uncertainty. We describe the appropriate conditions for using each model and present learning algorithms for each. We present experimental results showing that the learned models can be used to predict link structure and, moreover, the observed link structure can be used to provide better predictions for the attributes in the model.
Using Probabilistic Models for Data Management in Acquisitional Environments
2005
Cited by 35 (3 self)
Traditional database systems, particularly those focused on capturing and managing data from the real world, are poorly equipped to deal with the noise, loss, and uncertainty in data. We discuss a suite of techniques based on probabilistic models that are designed to allow databases to tolerate noise and loss. These techniques are based on exploiting correlations to predict missing values and identify outliers. Interestingly, correlations also provide a way to give approximate answers to users at a significantly lower cost and enable a range of new types of queries over the correlation structure itself. We illustrate a host of applications for our new techniques and queries, ranging from sensor networks to network monitoring to data stream management. We also present a unified architecture for integrating such models into database systems, focusing in particular on acquisitional systems where the cost of capturing data (e.g., from sensors) is itself a significant part of the query processing cost.
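The idea of exploiting correlations to predict missing values can be illustrated with a minimal sketch. It assumes two hypothetical sensor series and a plain least-squares line in place of the richer probabilistic models the abstract refers to; the function names and readings are illustrative and not taken from the paper.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def predict_missing(xs, ys):
    """Fill None entries of ys using a line fitted on the observed pairs."""
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    slope, intercept = fit_line(
        [x for x, _ in observed], [y for _, y in observed]
    )
    return [
        y if y is not None else slope * x + intercept
        for x, y in zip(xs, ys)
    ]


# Hypothetical readings: sensor B tracks sensor A and has one lost value.
sensor_a = [20.0, 21.0, 22.0, 23.0]
sensor_b = [40.0, 42.0, None, 46.0]
filled = predict_missing(sensor_a, sensor_b)
```

Here the lost reading of sensor B is estimated from the correlated sensor A without acquiring it again, which is the kind of saving an acquisitional system would aim for.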
Probabilistic Logic Learning
ACM-SIGKDD Explorations: Special issue on Multi-Relational Data Mining, 2004
Cited by 31 (8 self)
The past few years have witnessed a significant interest in probabilistic logic learning, i.e. in research lying at the intersection of probabilistic reasoning, logical representations, and machine learning. A rich variety of different formalisms and learning techniques have been developed. This paper provides an introductory survey and overview of the state-of-the-art in probabilistic logic learning through the identification of a number of important probabilistic, logical and learning concepts.
Learning probabilistic relational planning rules
Proceedings of the Fourteenth International Conference on Automated Planning and Scheduling, 2004
Cited by 30 (4 self)
To learn to behave in highly complex domains, agents must represent and learn compact models of the world dynamics. In this paper, we present an algorithm for learning probabilistic STRIPS-like planning operators from examples. We demonstrate the effective learning of rule-based operators for a wide range of traditional planning domains.
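As a rough illustration of what a probabilistic STRIPS-like operator might look like as a data structure, the sketch below pairs a set of preconditions with a distribution over add/delete outcomes. The class names, the `pickup` rule and its probabilities are hypothetical, and the learning algorithm itself is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    probability: float
    add: frozenset      # literals made true by this outcome
    delete: frozenset   # literals made false by this outcome


@dataclass(frozen=True)
class Rule:
    preconditions: frozenset
    outcomes: tuple     # Outcome objects whose probabilities sum to 1


def successor_distribution(state, rule):
    """Map each possible next state to its probability under the rule."""
    if not rule.preconditions <= state:
        return {state: 1.0}  # rule does not apply; state is unchanged
    dist = {}
    for outcome in rule.outcomes:
        nxt = frozenset((state - outcome.delete) | outcome.add)
        dist[nxt] = dist.get(nxt, 0.0) + outcome.probability
    return dist


# Hypothetical blocks-world rule: picking up block a succeeds 80% of the time.
pickup = Rule(
    preconditions=frozenset({"clear(a)", "handempty", "on(a,table)"}),
    outcomes=(
        Outcome(
            0.8,
            add=frozenset({"holding(a)"}),
            delete=frozenset({"clear(a)", "handempty", "on(a,table)"}),
        ),
        Outcome(0.2, add=frozenset(), delete=frozenset()),  # nothing changes
    ),
)
state = frozenset({"clear(a)", "handempty", "on(a,table)"})
dist = successor_distribution(state, pickup)
```

Keeping states and literal sets as `frozenset` values makes them hashable, so next states can serve directly as dictionary keys.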
Learning symbolic models of stochastic domains
Journal of Artificial Intelligence Research, 2005
Cited by 26 (1 self)
In this article, we work towards the goal of developing agents that can learn to act in complex worlds. We develop a new probabilistic planning rule representation to compactly model noisy, nondeterministic action effects and show how these rules can be effectively learned. Through experiments in simple planning domains and a 3D simulated blocks world with realistic physics, we demonstrate that this learning algorithm allows agents to effectively model world dynamics.
Multi-relational data mining using probabilistic relational models: research summary
In Proceedings of the First Workshop in Multi-relational Data Mining, 2001
Cited by 14 (0 self)
We are often faced with the challenge of mining data represented in relational form. Unfortunately, most statistical learning methods work only with “flat” data representations. Thus, to apply these methods, we are forced to convert the data into a flat form, thereby not only losing its compact representation and structure but also potentially introducing statistical skew. These drawbacks severely limit the ability of current statistical methods to mine relational databases. Probabilistic models, in particular probabilistic relational models, allow us to represent a statistical model over a relational domain. These models can represent correlations between attributes within a single table, and between attributes in multiple tables, when these tables are related via foreign key joins. In previous work [4, 6, 8], we have developed algorithms for automatically constructing a probabilistic relational model directly from a relational database. We survey the results here and describe how the methods can be used to discover interesting dependencies in the data. We show how this class of models and our construction algorithm are ideally suited to mining multi-relational data.
The relational vector-space model and industry classification
IJCAI 2003 Workshop on Learning Statistical Models from Relational Data (SRL-2003), 2003
Cited by 11 (4 self)
This paper addresses the classification of linked entities. We introduce a relational vector-space (VS) model (in analogy to the VS model used in information retrieval) that abstracts the linked structure, representing entities by vectors of weights. Given labeled data as background knowledge/training data, classification procedures can be defined for this model, including a straightforward, “direct” model using weighted adjacency vectors. Using a large set of tasks from the domain of company affiliation identification, we demonstrate that such classification procedures can be effective. We then examine the method in more detail, showing that as expected the classification performance correlates with the relational autocorrelation of the data set. We then turn the tables and use the relational VS scores as a way to analyze/visualize the relational autocorrelation present in a complex linked structure. The main contribution of the paper is to introduce the relational VS model as a potentially useful addition to the toolkit for relational data mining. It could provide useful constructed features for domains with low to moderate relational autocorrelation; it may be effective by itself for domains with high levels of relational autocorrelation, and it provides a useful abstraction for analyzing the properties of linked data.
Keywords: relational data mining, vector-space models, industry classification, homophily, relational autocorrelation, relational-neighbor classifier
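One simple reading of classification with weighted adjacency vectors is a weighted vote among labeled neighbors. The sketch below assumes that interpretation; it is not the paper's exact procedure, and the company names, link weights and the `classify` function are hypothetical.

```python
from collections import defaultdict


def classify(entity, links, labels):
    """Pick the class with the largest total link weight to labeled neighbors.

    links maps an entity to its weighted adjacency vector {neighbor: weight};
    labels maps known entities to their class. Returns None when the entity
    has no labeled neighbors.
    """
    scores = defaultdict(float)
    for neighbor, weight in links.get(entity, {}).items():
        if neighbor in labels:
            scores[labels[neighbor]] += weight
    if not scores:
        return None
    return max(scores, key=scores.get)


# Hypothetical data: link weights might count co-occurrences in news stories.
links = {"acme": {"bolt_co": 3.0, "nut_inc": 1.0, "bank_x": 2.0}}
labels = {
    "bolt_co": "manufacturing",
    "nut_inc": "manufacturing",
    "bank_x": "finance",
}
predicted = classify("acme", links, labels)
```

A vote of this kind relies on linked entities tending to share a class, which is consistent with the abstract's observation that performance correlates with relational autocorrelation.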
Mr-SBC: a Multi-Relational Naive Bayes Classifier
Todorovski & H. Blockeel (Eds.), Knowledge Discovery in Databases PKDD 2003, Lecture Notes in Artificial Intelligence, 2003
Cited by 10 (4 self)
In this paper we propose an extension of the naïve Bayes classification method to the multi-relational setting. In this setting, training data are stored in several tables related by foreign key constraints and each example is represented by a set of related tuples rather than a single row as in the classical data mining setting. This work is characterized by three aspects. First, an integrated approach in the computation of the posterior probabilities for each class that makes use of first order classification rules. Second, the applicability to both discrete and continuous attributes by means of a supervised discretization. Third, the consideration of knowledge on the data model embedded in the database schema during the generation of classification rules. The proposed method has been implemented in the new system Mr-SBC, which is tightly integrated with a relational DBMS. Testing has been performed on two datasets and four benchmark tasks. Results on predictive accuracy and efficiency are in favour of Mr-SBC for the most complex tasks.
Information awareness: A prospective technical assessment
In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003
Cited by 6 (1 self)
Recent proposals to apply data mining systems to problems in law enforcement, national security, and fraud detection have attracted both media attention and technical critiques of their expected accuracy and impact on privacy. Unfortunately, the majority of technical critiques have been based on simplistic assumptions about data, classifiers, inference procedures, and the overall architecture of such systems. We consider these critiques in detail, and we construct a simulation model that more closely matches realistic systems. We show how both the accuracy and privacy impact of a hypothetical system could be substantially improved, and we discuss the necessary and sufficient conditions for this improvement to be achieved. This analysis is neither a defense nor a critique of any particular system concept. Rather, our model suggests alternative technical designs that could mitigate some concerns, but also raises more specific conditions that must be met for such systems to be both accurate and socially desirable.
A Visual Query Language for Relational Knowledge Discovery
2001
Cited by 6 (3 self)
QGRAPH is a visual query language for knowledge discovery in relational data. Using QGRAPH, a user can query and update relational data in ways that support data exploration, data transformation, and sampling. When combined with modeling algorithms, such as those developed in inductive logic programming and relational learning, the language assists analysis of relational data, such as data drawn from the Web, chemical structure-activity relationships, and social networks. Several features distinguish QGRAPH from other query languages such as SQL and Datalog. It is a visual language, so its queries are annotated graphs that reflect potential structures within a database. QGRAPH treats objects, links, and attributes as first-class entities, so its queries can dynamically alter a data schema by adding and deleting those entities. Finally, the language provides grouping and counting constructs that facilitate calculation of attributes that can capture features of local graph structure. We describe the language in detail, discuss key aspects of the underlying data model and implementation, and discuss several uses of QGRAPH for knowledge discovery.

