A Visual Query Language for Relational Knowledge Discovery
, 2001
Cited by 10 (5 self)
QGRAPH is a visual query language for knowledge discovery in relational data. Using QGRAPH, a user can query and update relational data in ways that support data exploration, data transformation, and sampling. When combined with modeling algorithms, such as those developed in inductive logic programming and relational learning, the language assists analysis of relational data, such as data drawn from the Web, chemical structure-activity relationships, and social networks. Several features distinguish QGRAPH from other query languages such as SQL and Datalog. It is a visual language, so its queries are annotated graphs that reflect potential structures within a database. QGRAPH treats objects, links, and attributes as first-class entities, so its queries can dynamically alter a data schema by adding and deleting those entities. Finally, the language provides grouping and counting constructs that facilitate calculation of attributes that can capture features of local graph structure. We describe the language in detail, discuss key aspects of the underlying data model and implementation, and discuss several uses of QGRAPH for knowledge discovery.
Hierarchical probabilistic relational models for collaborative filtering
 In Proc. Workshop on Statistical Relational Learning, 21st International Conference on Machine Learning
, 2004
Cited by 3 (0 self)
This paper applies Probabilistic Relational Models (PRMs) to the Collaborative Filtering task, focussing on the EachMovie data set. We first learn a standard PRM, and show that its performance is competitive with the best known techniques. We then define hierarchical PRMs, which extend standard PRMs by dynamically refining classes into hierarchies. This representation is more expressive than standard PRMs, and allows greater context sensitivity. Finally, we show that hierarchical PRMs achieve state-of-the-art results on this dataset.
Reasoning with Recursive Loops under the PLP Framework
Cited by 2 (0 self)
Recursive loops in a logic program present a challenging problem to the PLP (Probabilistic Logic Programming) framework. On the one hand, they loop forever so that the PLP backward-chaining inferences would never stop. On the other hand, they may generate cyclic influences, which are disallowed in Bayesian networks. Therefore, in existing PLP approaches logic programs with recursive loops are considered to be problematic and thus are excluded. In this paper, we propose a novel solution to this problem by making use of recursive loops to build a stationary dynamic Bayesian network. We introduce a new PLP formalism, called a Bayesian knowledge base. It allows recursive loops and contains logic clauses of the form A ← A1, ..., Al, true, Context, Types, which naturally formulates the knowledge that the Ai's have direct influences on A in the context Context under the type constraints Types. We use the well-founded model of a logic program to define the direct influence relation and apply SLG-resolution to compute the space of random variables together with their parental connections. This establishes a clear declarative semantics for a Bayesian knowledge base. We view a logic program with recursive loops as a special temporal model, where backward-chaining cycles of the form A ← ... A ← ... are interpreted as feedbacks. This extends existing PLP approaches, which mainly aim at (non-temporal) relational models.
Sharing-Aware Horizontal Partitioning for Exploiting Correlations During Query Processing
Cited by 2 (1 self)
Optimization of join queries based on average selectivities is suboptimal in highly correlated databases. In such databases, relations are naturally divided into partitions, each partition having substantially different statistical characteristics. It is very compelling to discover such data partitions during query optimization and create multiple plans for a given query, one plan being optimal for a particular combination of data partitions. This scenario calls for the sharing of state among plans, so that common intermediate results are not recomputed. We study this problem in a setting with a routing-based query execution engine based on eddies [1]. Eddies naturally encapsulate horizontal partitioning and maximal state sharing across multiple plans. We define the notion of a conditional join plan, a novel representation of the search space that enables us to address the problem in a principled way. We present a low-overhead greedy algorithm that uses statistical summaries based on graphical models. Experimental results suggest an order of magnitude faster execution time over traditional optimization for high correlations, while maintaining the same performance for low correlations.
Multi-relational data mining in Microsoft SQL Server 2005
Cited by 1 (0 self)
Most real-life data are relational by nature. Database mining integration is an essential goal to be achieved. Microsoft SQL Server (MSSQL) seems to provide an interesting and promising environment to develop aggregated multi-relational data mining algorithms by using nested tables and the plug-in algorithm approach. However, it is currently unclear how these nested tables can best be used by data mining algorithms. In this paper we look at how Microsoft Decision Trees (MSDT) handles multi-relational data, and we compare it with the multi-relational decision tree learner TILDE. In the experiments we perform, MSDT matches TILDE in predictive accuracy, but the trees it gives either ignore the relational information, or use it in a way that yields non-interpretable trees. As such, one could say that its explanatory power is reduced when compared to a multi-relational decision tree learner. We conclude that it may be worthwhile to integrate a multi-relational decision tree learner in MSSQL.
Toward Optimal Ordering of Prediction Tasks
Cited by 1 (1 self)
Many applications involve a set of prediction tasks that must be accomplished sequentially through user interaction. If the tasks are interdependent, the order in which they are performed may have a significant impact on the overall performance of the prediction systems. However, manual specification of an optimal order may be difficult when the interdependencies are complex, especially if the number of tasks is large, making exhaustive search intractable. This paper presents the first attempt at solving the optimal task ordering problem using an approximate formulation in terms of pairwise task order preferences, reducing the problem to the well-known Linear Ordering Problem. We propose two approaches for inducing the pairwise task order preferences: 1) a classifier-agnostic approach based on conditional entropy that determines the prediction tasks whose correct labels lead to the least uncertainty for the remaining predictions, and 2) a classifier-dependent approach that empirically determines which tasks are favored before others for better predictive performance. We apply the proposed solutions to two practical applications that involve computer-assisted trouble report generation and document annotation, respectively. In both applications, the user fills in a series of fields and at each step, the system is expected to provide useful suggestions, which comprise the prediction (i.e. classification and ranking) tasks. Our experiments show encouraging improvements in predictive performance, as compared to approaches that do not take task dependencies into account.
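As an illustration of the Linear Ordering Problem mentioned in this abstract, the following minimal sketch picks the task order that maximizes the total pairwise preference over all pairs where one task precedes another. The preference matrix and the function name `best_order` are hypothetical, and the brute-force search is only feasible for a handful of tasks; it is not the method used in the paper.

```python
from itertools import permutations

def best_order(pref):
    """Return the order maximizing the sum of pref[a][b] over all pairs where a precedes b."""
    n = len(pref)

    def score(order):
        # Add up the preference for every (earlier, later) pair in this order.
        return sum(pref[order[i]][order[j]]
                   for i in range(n) for j in range(i + 1, n))

    # Exhaustive search over all n! orders (assumption: n is very small).
    return max(permutations(range(n)), key=score)

# Hypothetical matrix: pref[a][b] is the estimated benefit of doing task a before task b.
pref = [
    [0.0, 0.8, 0.6],
    [0.2, 0.0, 0.7],
    [0.4, 0.3, 0.0],
]

order = best_order(pref)
print(order)
```

For larger task sets the factorial search space is exactly what makes exhaustive search intractable, which is why approximate solvers for the Linear Ordering Problem are needed.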
Modelling Relational Statistics With Bayes Nets
Cited by 1 (0 self)
Class-level models capture relational statistics over object attributes and their connecting links, answering questions such as “what is the percentage of friendship pairs where both friends are women?” Class-level relationships are important in themselves, and they support applications like policy making, strategic planning, and query optimization. We represent class statistics using Parametrized Bayes Nets (PBNs), a first-order logic extension of Bayes nets. Queries about classes require a new semantics for PBNs, as the standard grounding semantics is only appropriate for answering queries about specific ground facts. We propose a novel random selection semantics for PBNs, which does not make reference to a ground model, and supports class-level queries. The parameters for this semantics can be learned using the recent pseudo-likelihood measure [1] as the objective function. This objective function is maximized by taking the empirical frequencies in the relational data as the parameter settings. We render the computation of these empirical frequencies tractable in the presence of negated relations by the inverse Möbius transform. Evaluation of our method on four benchmark datasets shows that maximum pseudo-likelihood provides fast and accurate estimates at different sample sizes.
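To make the notion of a class-level statistic concrete, here is a small sketch on a hypothetical toy database (the names, the `friend` set, and the choice to exclude self-pairs are all assumptions). It computes the empirical frequency from the friendship example above, then obtains a count involving the negated relation by subtraction instead of enumerating non-friend pairs. That subtraction is only the one-relation special case of the idea; it is not the paper's algorithm.

```python
from itertools import permutations

# Hypothetical toy database: one attribute (gender) and one link type (friend).
gender = {"ann": "W", "bea": "W", "eve": "W", "cal": "M"}
friend = {("ann", "bea"), ("bea", "ann"), ("ann", "cal"), ("cal", "ann")}

def both_women(x, y):
    return gender[x] == "W" and gender[y] == "W"

# Class-level statistic: fraction of friendship pairs in which both friends are women.
n_friend_ww = sum(both_women(x, y) for x, y in friend)
p_ww_given_friend = n_friend_ww / len(friend)

# Ordered pairs of distinct women, computed from attribute counts alone.
n_women = sum(g == "W" for g in gender.values())
n_ww = n_women * (n_women - 1)

# Negated relation by subtraction: both women AND not friends.
n_notfriend_ww = n_ww - n_friend_ww

# Direct enumeration over all ordered pairs, shown only as a cross-check.
direct = sum(both_women(x, y) and (x, y) not in friend
             for x, y in permutations(gender, 2))

print(p_ww_given_friend, n_notfriend_ww, direct)
```

The point of the subtraction is that the negated count never requires touching the (typically much larger) set of pairs that are not linked.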
Probabilistic Abductive Logic Programming using Possible Worlds
Cited by 1 (1 self)
Reasoning in very complex contexts often requires purely deductive reasoning to be supported by a variety of techniques that can cope with incomplete data. Abductive inference allows one to guess information that has not been explicitly observed. Since there are many explanations for such guesses, there is a need to assign a probability to each one. This work exploits logical abduction to produce multiple explanations consistent with a given background knowledge and defines a strategy to prioritize them using their chance of being true. Another novelty is the introduction of probabilistic integrity constraints rather than hard ones. Then we propose a strategy that learns model and parameters from data and exploits our Probabilistic Abductive Proof Procedure to classify never-seen instances. This approach has been tested on some standard datasets, showing that it improves accuracy in the presence of corruptions and missing data.
ALGORITHMS FOR NONPARAMETRIC CLASSIFIERS IN MULTIRELATIONAL DATA MINING
, 2006
Over the last decades, due to the advances in information technologies, both the industrial and scientific communities have acquired large volumes of data in digital form. Most of these data sets are stored using relational databases consisting of multiple tables and associations. Moreover, data in bioinformatics and computational biology, as well as HTML and XML documents, are relational in nature. However, most of the existing approaches to knowledge discovery in databases assume that the data are stored in a single table. Therefore, new algorithms are needed in order to exploit the relational information provided in these data sets. This thesis proposes two novel solutions to the task of supervised classification in relational domains, based on traditional nonparametric classifiers and built upon relational algebra. The first approach is based on Kernel Density Estimation, and the second technique is based on Gaussian Mixture Models. Both techniques are evaluated using three real world relational data sets, drawn from the fields of organic chemistry, medicine and genetics.
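For readers unfamiliar with Kernel Density Estimation as a classifier, the sketch below shows the traditional single-table, single-attribute version that the thesis takes as its starting point. It is not the relational extension the thesis proposes; the data, the bandwidth `h`, and the function names are assumptions chosen for illustration.

```python
import math

def gaussian_kde(samples, h):
    """Return a density estimate built from Gaussian kernels of bandwidth h."""
    norm = len(samples) * h * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) / norm

    return density

def kde_classify(x, train, h=1.0):
    """Pick the class with the largest prior times class-conditional density at x."""
    n = sum(len(values) for values in train.values())
    return max(
        train,
        key=lambda c: (len(train[c]) / n) * gaussian_kde(train[c], h)(x),
    )

# Hypothetical one-attribute training data, keyed by class label.
train = {"low": [1.0, 1.5, 2.0], "high": [6.0, 6.5, 7.0]}

print(kde_classify(1.2, train))
print(kde_classify(6.8, train))
```

The classifier is nonparametric in the sense that it keeps the training samples themselves and makes no assumption about the shape of each class distribution beyond the kernel and bandwidth.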
Situation Assessments Using Object Oriented Probabilistic Relational
 2005 7th International Conference on Information Fusion (FUSION)
"... Abstract This paper presents Oriented Probabilistic Models ..."