Results 11–20 of 44
The relational vectorspace model and industry classification
IJCAI 2003 Workshop on Learning Statistical Models from Relational Data (SRL-2003), 2003
Abstract

Cited by 12 (5 self)
This paper addresses the classification of linked entities. We introduce a relational vectorspace (VS) model (in analogy to the VS model used in information retrieval) that abstracts the linked structure, representing entities by vectors of weights. Given labeled data as background knowledge/training data, classification procedures can be defined for this model, including a straightforward, “direct” model using weighted adjacency vectors. Using a large set of tasks from the domain of company affiliation identification, we demonstrate that such classification procedures can be effective. We then examine the method in more detail, showing that, as expected, the classification performance correlates with the relational autocorrelation of the data set. We then turn the tables and use the relational VS scores as a way to analyze/visualize the relational autocorrelation present in a complex linked structure. The main contribution of the paper is to introduce the relational VS model as a potentially useful addition to the toolkit for relational data mining. It could provide useful constructed features for domains with low to moderate relational autocorrelation; it may be effective by itself for domains with high levels of relational autocorrelation; and it provides a useful abstraction for analyzing the properties of linked data.
Keywords: relational data mining, vectorspace models, industry classification, homophily, relational autocorrelation, relational-neighbor classifier
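The "direct" model the abstract mentions can be pictured as a weighted-vote relational-neighbor classifier: an entity is represented by its weighted adjacency vector, and an unlabeled entity takes the class whose labeled neighbors carry the most link weight. A minimal sketch, with hypothetical entity names and weights:

```python
def rn_classify(adjacency, labels, entity):
    """Weighted-vote relational-neighbor classification.
    adjacency: {entity: {neighbor: link_weight}}; labels: known classes."""
    scores = {}
    for nbr, w in adjacency[entity].items():
        if nbr in labels:                      # only labeled neighbors vote
            scores[labels[nbr]] = scores.get(labels[nbr], 0.0) + w
    return max(scores, key=scores.get) if scores else None

# Hypothetical company-affiliation toy data:
adjacency = {"AcmeCo": {"BankA": 2.0, "BankB": 1.0, "SteelCo": 0.5}}
labels = {"BankA": "finance", "BankB": "finance", "SteelCo": "industrial"}
# finance gathers weight 3.0 vs. industrial's 0.5 -> "finance"
```

The effectiveness of such a vote is exactly what ties the method's performance to relational autocorrelation: the more neighbors share a class, the more informative the weighted vote.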
A Visual Query Language for Relational Knowledge Discovery
, 2001
Abstract

Cited by 10 (5 self)
QGRAPH is a visual query language for knowledge discovery in relational data. Using QGRAPH, a user can query and update relational data in ways that support data exploration, data transformation, and sampling. When combined with modeling algorithms, such as those developed in inductive logic programming and relational learning, the language assists analysis of relational data, such as data drawn from the Web, chemical structure-activity relationships, and social networks. Several features distinguish QGRAPH from other query languages such as SQL and Datalog. It is a visual language, so its queries are annotated graphs that reflect potential structures within a database. QGRAPH treats objects, links, and attributes as first-class entities, so its queries can dynamically alter a data schema by adding and deleting those entities. Finally, the language provides grouping and counting constructs that facilitate calculation of attributes that can capture features of local graph structure. We describe the language in detail, discuss key aspects of the underlying data model and implementation, and discuss several uses of QGRAPH for knowledge discovery.
Hierarchical probabilistic relational models for collaborative filtering
 In Proc. Workshop on Statistical Relational Learning, 21st International Conference on Machine Learning
, 2004
Abstract

Cited by 3 (0 self)
This paper applies Probabilistic Relational Models (PRMs) to the collaborative filtering task, focusing on the EachMovie data set. We first learn a standard PRM, and show that its performance is competitive with the best known techniques. We then define hierarchical PRMs, which extend standard PRMs by dynamically refining classes into hierarchies. This representation is more expressive than standard PRMs, and allows greater context sensitivity. Finally, we show that hierarchical PRMs achieve state-of-the-art results on this dataset.
Sharing-Aware Horizontal Partitioning for Exploiting Correlations During Query Processing
Abstract

Cited by 2 (1 self)
Optimization of join queries based on average selectivities is suboptimal in highly correlated databases. In such databases, relations are naturally divided into partitions, each partition having substantially different statistical characteristics. It is very compelling to discover such data partitions during query optimization and create multiple plans for a given query, one plan being optimal for a particular combination of data partitions. This scenario calls for the sharing of state among plans, so that common intermediate results are not recomputed. We study this problem in a setting with a routing-based query execution engine based on eddies [1]. Eddies naturally encapsulate horizontal partitioning and maximal state sharing across multiple plans. We define the notion of a conditional join plan, a novel representation of the search space that enables us to address the problem in a principled way. We present a low-overhead greedy algorithm that uses statistical summaries based on graphical models. Experimental results suggest an order of magnitude faster execution time over traditional optimization for high correlations, while maintaining the same performance for low correlations.
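The core intuition behind a conditional join plan can be sketched very simply: when per-partition selectivities diverge sharply, each partition gets its own join order. The partition names and selectivity estimates below are hypothetical, and a real eddies-based engine routes tuples adaptively at run time rather than sorting estimates up front:

```python
def choose_order(selectivities):
    """Order joins by ascending estimated selectivity within one
    partition, so the most filtering join runs first there."""
    return sorted(selectivities, key=selectivities.get)

# Hypothetical per-partition selectivity estimates for two join predicates:
stats = {
    "domestic": {"join_orders": 0.01, "join_suppliers": 0.60},
    "export":   {"join_orders": 0.70, "join_suppliers": 0.02},
}
plans = {part: choose_order(sel) for part, sel in stats.items()}
# -> domestic runs join_orders first; export runs join_suppliers first
```

A single average-selectivity plan would pick one of these orders for both partitions and pay the full cost of the unselective join on the other.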
Reasoning with Recursive Loops under the PLP Framework
Abstract

Cited by 2 (0 self)
Recursive loops in a logic program present a challenging problem to the PLP (Probabilistic Logic Programming) framework. On the one hand, they loop forever, so that the PLP backward-chaining inferences would never stop. On the other hand, they may generate cyclic influences, which are disallowed in Bayesian networks. Therefore, in existing PLP approaches logic programs with recursive loops are considered to be problematic and thus are excluded. In this paper, we propose a novel solution to this problem by making use of recursive loops to build a stationary dynamic Bayesian network. We introduce a new PLP formalism, called a Bayesian knowledge base. It allows recursive loops and contains logic clauses of the form A ← A1, ..., Al, true, Context, Types, which naturally formulates the knowledge that the Ai's have direct influences on A in the context Context under the type constraints Types. We use the well-founded model of a logic program to define the direct influence relation, and apply SLG-resolution to compute the space of random variables together with their parental connections. This establishes a clear declarative semantics for a Bayesian knowledge base. We view a logic program with recursive loops as a special temporal model, where backward-chaining cycles of the form A ← ... A ← ... are interpreted as feedbacks. This extends existing PLP approaches, which mainly aim at (non-temporal) relational models.
Modelling Relational Statistics With Bayes Nets
Abstract

Cited by 1 (0 self)
Class-level models capture relational statistics over object attributes and their connecting links, answering questions such as “what is the percentage of friendship pairs where both friends are women?” Class-level relationships are important in themselves, and they support applications like policy making, strategic planning, and query optimization. We represent class statistics using Parametrized Bayes Nets (PBNs), a first-order logic extension of Bayes nets. Queries about classes require a new semantics for PBNs, as the standard grounding semantics is only appropriate for answering queries about specific ground facts. We propose a novel random selection semantics for PBNs, which does not make reference to a ground model, and supports class-level queries. The parameters for this semantics can be learned using the recent pseudo-likelihood measure [1] as the objective function. This objective function is maximized by taking the empirical frequencies in the relational data as the parameter settings. We render the computation of these empirical frequencies tractable in the presence of negated relations by the inverse Möbius transform. Evaluation of our method on four benchmark datasets shows that maximum pseudo-likelihood provides fast and accurate estimates at different sample sizes.
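The inverse Möbius transform mentioned here is, in essence, inclusion-exclusion over the subset lattice of relations: from counts of groundings where a given set of relations holds (others unconstrained), one recovers counts where exactly that set holds, i.e. the rest are negated, without ever enumerating the negated groundings. A minimal sketch with hypothetical relation names and counts:

```python
from itertools import combinations

def inverse_moebius(z, relations):
    """z[S] = #groundings where every relation in frozenset S holds
    (others unconstrained). Returns e[S] = #groundings where exactly
    the relations in S hold, via inclusion-exclusion."""
    subsets = [frozenset(c) for r in range(len(relations) + 1)
               for c in combinations(relations, r)]
    return {s: sum((-1) ** len(t - s) * z[t] for t in subsets if t >= s)
            for s in subsets}

# Toy example over 10 grounding pairs: Friend holds for 6,
# Coworker for 5, both for 3.
z = {frozenset(): 10,
     frozenset({"Friend"}): 6,
     frozenset({"Coworker"}): 5,
     frozenset({"Friend", "Coworker"}): 3}
e = inverse_moebius(z, ["Friend", "Coworker"])
# e[{Friend}] = 6 - 3 = 3 pairs that are friends but not coworkers
```

The positive-conjunction counts z can be obtained directly from database queries, which is what makes the negated-relation frequencies tractable.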
Multi-relational data mining in Microsoft SQL Server 2005
Abstract

Cited by 1 (0 self)
Most real-life data are relational by nature, so integrating data mining with databases is an essential goal. Microsoft SQL Server (MSSQL) seems to provide an interesting and promising environment to develop aggregated multi-relational data mining algorithms by using nested tables and the plug-in algorithm approach. However, it is currently unclear how these nested tables can best be used by data mining algorithms. In this paper we look at how Microsoft Decision Trees (MSDT) handles multi-relational data, and we compare it with the multi-relational decision tree learner TILDE. In the experiments we perform, MSDT has equally good predictive accuracy as TILDE, but the trees it produces either ignore the relational information or use it in a way that yields non-interpretable trees. As such, one could say that its explanatory power is reduced compared to that of a multi-relational decision tree learner. We conclude that it may be worthwhile to integrate a multi-relational decision tree learner into MSSQL.
Stacked graphical learning
, 2007
Abstract

Cited by 1 (0 self)
In reality there are many relational datasets in which both the features of instances and the relationships among the instances are recorded, such as hyperlinked web pages, scientific literature with citations, and social networks. Collective classification has been widely used to classify a group of related instances simultaneously. Recently there have been several studies on statistical relational learning for collective classification, including relational dependency networks, relational Markov networks, and Markov logic networks. In statistical relational learning models, collective classification is usually formulated as an inference problem over graphical models. Hence the existing collective classification methods are expensive, due to the iterative inference procedure required for general graphical models. Procedures that learn collective classifiers are also expensive, especially if they are based on iterative optimization of an expensive iterative inference procedure. Our goal is to develop an efficient model for collective classification.
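Stacked graphical learning avoids iterative inference by a stacking trick: a base classifier is trained on local features, its predictions for linked neighbors are aggregated into extra features, and a second-stage classifier is trained on the augmented representation. The sketch below uses a trivial nearest-mean classifier as a stand-in for any off-the-shelf base learner, on hypothetical toy data:

```python
def nearest_mean_fit(X, y):
    """Return per-class feature means as the 'model'."""
    means = {}
    for label in set(y):
        rows = [x for x, l in zip(X, y) if l == label]
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return means

def nearest_mean_predict(model, X):
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(model, key=lambda l: dist(x, model[l])) for x in X]

def stack_features(X, preds, neighbors):
    """Augment each instance with the mean predicted label of its
    neighbors -- the aggregation step of stacked learning."""
    out = []
    for i, x in enumerate(X):
        nbr = neighbors[i]
        agg = sum(preds[j] for j in nbr) / len(nbr) if nbr else 0.0
        out.append(list(x) + [agg])
    return out

# Toy linked dataset: 4 instances, one local feature each, paired links.
X = [[0.1], [0.2], [0.9], [1.0]]
y = [0, 0, 1, 1]
neighbors = {0: [1], 1: [0], 2: [3], 3: [2]}

base = nearest_mean_fit(X, y)
preds = nearest_mean_predict(base, X)      # stage-1 (local) predictions
X2 = stack_features(X, preds, neighbors)   # add relational features
stacked = nearest_mean_fit(X2, y)          # stage-2 model
```

Both stages are ordinary supervised training runs, which is where the efficiency over iterative graphical-model inference comes from.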
Toward Optimal Ordering of Prediction Tasks
Abstract

Cited by 1 (1 self)
Many applications involve a set of prediction tasks that must be accomplished sequentially through user interaction. If the tasks are interdependent, the order in which they are performed may have a significant impact on the overall performance of the prediction systems. However, manual specification of an optimal order may be difficult when the interdependencies are complex, especially if the number of tasks is large, making exhaustive search intractable. This paper presents the first attempt at solving the optimal task ordering problem using an approximate formulation in terms of pairwise task order preferences, reducing the problem to the well-known Linear Ordering Problem. We propose two approaches for inducing the pairwise task order preferences: 1) a classifier-agnostic approach based on conditional entropy that determines the prediction tasks whose correct labels lead to the least uncertainty for the remaining predictions, and 2) a classifier-dependent approach that empirically determines which tasks are favored before others for better predictive performance. We apply the proposed solutions to two practical applications that involve computer-assisted trouble report generation and document annotation, respectively. In both applications, the user fills in a series of fields and at each step, the system is expected to provide useful suggestions, which comprise the prediction (i.e. classification and ranking) tasks. Our experiments show encouraging improvements in predictive performance, as compared to approaches that do not take task dependencies into account.
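The classifier-agnostic preference rule can be sketched directly: estimate the conditional entropy between two tasks' label sequences from training data, and prefer ordering first the task whose known label leaves less uncertainty about the other. The label data below is hypothetical:

```python
import math
from collections import Counter

def cond_entropy(a, b):
    """H(B | A) in bits, estimated from paired label lists a, b."""
    n = len(a)
    joint = Counter(zip(a, b))
    marg = Counter(a)
    h = 0.0
    for (x, _), c in joint.items():
        h -= (c / n) * math.log2(c / marg[x])
    return h

def pairwise_preference(labels_a, labels_b):
    """Prefer task A first if knowing A's label leaves less
    uncertainty about B than the reverse."""
    h_b_given_a = cond_entropy(labels_a, labels_b)
    h_a_given_b = cond_entropy(labels_b, labels_a)
    return "A first" if h_b_given_a < h_a_given_b else "B first"

# Hypothetical labels: task A's label determines task B's, not vice versa.
a = [0, 0, 1, 1, 2, 2]
b = [0, 0, 1, 1, 1, 1]
# -> "A first", since H(B | A) = 0 while H(A | B) > 0
```

Aggregating such pairwise preferences over all task pairs yields the weight matrix of the Linear Ordering Problem instance.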
Probabilistic Abductive Logic Programming using Possible Worlds
Abstract

Cited by 1 (1 self)
Reasoning in very complex contexts often requires purely deductive reasoning to be supported by a variety of techniques that can cope with incomplete data. Abductive inference allows one to guess information that has not been explicitly observed. Since there are many explanations for such guesses, there is a need to assign a probability to each one. This work exploits logical abduction to produce multiple explanations consistent with a given background knowledge and defines a strategy to prioritize them using their chance of being true. Another novelty is the introduction of probabilistic integrity constraints rather than hard ones. We then propose a strategy that learns the model and parameters from data and exploits our Probabilistic Abductive Proof Procedure to classify never-seen instances. This approach has been tested on some standard datasets, showing that it improves accuracy in the presence of corruption and missing data.