Naive Bayesian Classification of Structured Data
, 2003
Cited by 18 (0 self)
In this paper we present 1BC and 1BC2, two systems that perform naive Bayesian classification of structured individuals. The approach of 1BC is to project the individuals along first-order features. These features are built from the individual using structural predicates referring to related objects (e.g. atoms within molecules), and properties applying to the individual or one or several of its related objects (e.g. a bond between two atoms). We describe an individual in terms of elementary features consisting of zero or more structural predicates and one property; these features are treated as conditionally independent in the spirit of the naive Bayes assumption. 1BC2 represents an alternative first-order upgrade to the naive Bayesian classifier by considering probability distributions over structured objects (e.g., a molecule as a set of atoms), and estimating those distributions from the probabilities of its elements (which are assumed to be independent). We present a unifying view on both systems in which 1BC works in language space, and 1BC2 works in individual space. We also present a new, efficient recursive algorithm improving upon the original propositionalisation approach of 1BC. Both systems have been implemented in the context of the first-order descriptive learner Tertius, and we investigate the differences between the two systems both in computational terms and on artificially generated data. Finally, we describe a range of experiments on ILP benchmark data sets demonstrating the viability of our approach.
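The idea of estimating the probability of a structured object from the probabilities of its independent elements can be illustrated with a minimal sketch. This is not the 1BC2 implementation: it assumes each individual is simply a flat list of element labels (hypothetical atom types here), uses Laplace smoothing, and all function names, labels, and data are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train(individuals, labels):
    """Count classes and, per class, the elements seen in its individuals."""
    class_counts = Counter(labels)
    element_counts = defaultdict(Counter)
    vocabulary = set()
    for elements, label in zip(individuals, labels):
        element_counts[label].update(elements)
        vocabulary.update(elements)
    return class_counts, element_counts, vocabulary

def log_score(elements, label, model):
    """log P(class) + sum of log P(element | class), elements assumed independent."""
    class_counts, element_counts, vocabulary = model
    score = math.log(class_counts[label] / sum(class_counts.values()))
    # Laplace smoothing: add one pseudo-count per known element type.
    denom = sum(element_counts[label].values()) + len(vocabulary)
    for element in elements:
        score += math.log((element_counts[label][element] + 1) / denom)
    return score

def classify(elements, model):
    """Return the class with the highest smoothed log score."""
    return max(model[0], key=lambda label: log_score(elements, label, model))

# Hypothetical toy data: molecules described only by their atom types.
molecules = [["C", "C", "O"], ["C", "O", "O"], ["N", "N", "C"], ["N", "C", "N"]]
labels = ["active", "active", "inactive", "inactive"]
model = train(molecules, labels)
print(classify(["O", "C"], model))
```

The sketch treats a molecule as a bag of atoms; the systems described in the abstract handle richer structure (related objects and their properties), which this simplification deliberately leaves out.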
Spatial Associative Classification: Propositional vs. Structural approach
- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS
, 2006
Cited by 9 (5 self)
Spatial associative classification takes advantage of employing association rules for spatial classification purposes. In this work, we investigate spatial associative classification in a multi-relational data mining setting to deal with spatial objects having different properties, which are modeled by as many data tables (relations) as the number of spatial object types (layers). Spatial classification is based on two alternative approaches: a propositional approach and a structural approach. The propositional approach uses spatial association rules to construct an attribute-value representation (propositionalisation) of spatial data and performs spatial classification according to well-known propositional classification methods. Since the attribute-value representation should capture relational properties of spatial data, multi-relational association rules are used in the propositionalisation step. The structural approach resorts to an extension of naïve Bayes classifiers to multi-relational data where the classification is driven by multi-relational association rules modelling regularities in spatial data. In both cases the spatial associative classification is performed at different levels of granularity and takes advantage of domain knowledge expressed in the form of hierarchies and rules. Experiments on real-world geo-referenced census data analysis show the advantage of the structural approach over the propositional one.
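The propositionalisation step can be pictured with a minimal sketch: each relational pattern becomes one boolean attribute of a single attribute-value table. The sketch assumes the patterns are already given as plain predicates rather than mined association rules, and the object layout, pattern names, and data are hypothetical.

```python
def propositionalise(objects, patterns):
    """Build one 0/1 row per object: 1 where the named pattern holds for it."""
    names = sorted(patterns)
    rows = [[int(patterns[name](obj)) for name in names] for obj in objects]
    return names, rows

# Hypothetical spatial objects with their relations to objects in other layers.
districts = [
    {"id": "d1", "crossed_by": {"road"}, "adjacent_to": {"river", "park"}},
    {"id": "d2", "crossed_by": {"road", "rail"}, "adjacent_to": {"park"}},
]

# Hypothetical relational patterns, each turned into one attribute.
patterns = {
    "crossed_by_road": lambda d: "road" in d["crossed_by"],
    "adjacent_to_river": lambda d: "river" in d["adjacent_to"],
}

names, rows = propositionalise(districts, patterns)
for district, row in zip(districts, rows):
    print(district["id"], dict(zip(names, row)))
```

The resulting rows could then be passed to any propositional classifier, which is the contrast the abstract draws with the structural approach that works on the relational data directly.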
An efficient multi-relational naive Bayesian classifier based on semantic relationship graph
- In Proceedings of the 4th International Workshop on Multi-Relational Mining (MRDM ’05)
Cited by 3 (0 self)
Classification is one of the most popular data mining tasks with a wide range of applications, and many algorithms have been proposed to build accurate and scalable classifiers. Most of these algorithms only take a single table as input, whereas in the real world most data are stored in multiple tables and managed by relational database systems. As transferring data from multiple tables into a single one usually causes many problems, the development of multi-relational classification algorithms becomes important and attracts many researchers' interest. Existing works on extending Naïve Bayes to deal with multi-relational data either have to transform data stored in tables into main-memory Prolog facts, or limit the search space to only a small subset of real-world applications. In this work, we aim at solving these problems and building an efficient, accurate Naïve Bayesian classifier to deal with data in multiple tables directly. We propose an algorithm named Graph-NB, which upgrades the Naïve Bayesian classifier to deal with multiple tables directly. In order to take advantage of linkage relationships among tables, and to treat different tables linked to the target table differently, a semantic relationship graph is developed to describe the relationships and to avoid unnecessary joins. Furthermore, to improve accuracy, a pruning strategy is given to simplify the graph and avoid examining too many weakly linked tables. An experimental study on both real-world and synthetic databases shows its high efficiency and good accuracy.
Transductive Learning from Relational Data
- In: P. Perner (Ed.), Machine Learning and Data Mining in Pattern Recognition, LNAI 4571
, 2007
Cited by 2 (2 self)
Transduction is an inference mechanism “from particular to particular”. Its application to classification tasks implies the use of both labeled (training) data and unlabeled (working) data to build a classifier whose main goal is that of classifying (only) unlabeled data as accurately as possible. Unlike the classical inductive setting, no general rule valid for all possible instances is generated. Transductive learning is most suited for those applications where the examples for which a prediction is needed are already known when training the classifier. Several approaches have been proposed in the literature on building transductive classifiers from data stored in a single table of a relational database. Nonetheless, no attention has been paid to the application of the transduction principle in a (multi-)relational setting, where data are stored in multiple tables of a relational database. In this paper we propose a new transductive classifier, named TRANSC, which is based on a probabilistic approach to making transductive inferences from relational data. This new method works in a transductive setting and employs a principled probabilistic classification in multi-relational data mining to face the challenges posed by some spatial data mining problems. Probabilistic inference allows us to compute the class probability and return, in addition to the result of transductive classification, the confidence in the classification. The predictive accuracy of TRANSC has been compared to that of its inductive counterpart in an empirical study involving both a benchmark relational dataset and two spatial datasets. The results obtained are generally in favor of TRANSC, although the improvements are small.
A relational approach to probabilistic classification in a . . .
- ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
, 2009
Multi-relational data mining in Microsoft SQL Server 2005
Cited by 1 (0 self)
Most real-life data are relational by nature. Database mining integration is an essential goal to be achieved. Microsoft SQL Server (MSSQL) seems to provide an interesting and promising environment to develop aggregated multi-relational data mining algorithms by using nested tables and the plug-in algorithm approach. However, it is currently unclear how these nested tables can best be used by data mining algorithms. In this paper we look at how Microsoft Decision Trees (MSDT) handles multi-relational data, and we compare it with the multi-relational decision tree learner TILDE. In the experiments we perform, MSDT has predictive accuracy as good as that of TILDE, but the trees it gives either ignore the relational information, or use it in a way that yields non-interpretable trees. As such, one could say that its explanatory power is reduced when compared to a multi-relational decision tree learner. We conclude that it may be worthwhile to integrate a multi-relational decision tree learner in MSSQL.
A Probabilistic Graphical Model Framework for Higher-Order Term-Based Representations
, 2005
This thesis introduces Higher-Order Bayesian networks (HOBNs), a probabilistic graphical model framework for inference and learning over structured data. HOBNs extend the expressive power of standard Bayesian networks with random variables ranging over domains of certain families of higher-order terms. The formalism allows the expression of conditional independence assumptions on the domain, which are exploited in order to give an efficient method of defining probability distributions over the higher-order types. Methods for probabilistic inference and model construction from data observations are discussed, and experimental results on real-world domains are presented.
Transductive Learning for Spatial Data Classification
Learning classifiers of spatial data presents several issues, such as the heterogeneity of spatial objects, the implicit definition of spatial relationships among objects, spatial autocorrelation, and the abundance of unlabelled data which potentially convey a large amount of information. The first three issues are due to the inherent structure of spatial units of analysis, which can be easily accommodated if a (multi-)relational data mining approach is considered. The fourth issue demands the adoption of a transductive setting, which aims to make predictions for a given set of unlabelled data. Transduction is also motivated by the contiguity of the concept of positive autocorrelation, which typically affects spatial phenomena, with the smoothness assumption which characterizes the transductive setting. In this work, we investigate a relational approach to spatial classification in a transductive setting. Computational solutions to the main difficulties met in this approach are presented. In particular, a relational upgrade of the naïve Bayes classifier is proposed as a discriminative model, an iterative algorithm is designed for the transductive classification of unlabelled data, and a distance measure between relational descriptions of spatial objects is defined in order to determine the k-nearest neighbors of each example in the dataset. The computational solutions have been tested on two real-world spatial datasets. The transformation of spatial data into a multi-relational representation and the experimental results are reported and discussed.
Analysis and Comparative Study of Classifiers for Relational Data Mining
As an important task in relational databases, relational classification can directly classify data that involve multiple relations and has advantages over propositional data mining approaches. The information age has provided us with huge data repositories which can no longer be analyzed manually. Most existing data mining algorithms look for patterns in a single relation. Classifying data from a relational database therefore calls for multi-relational classification, which analyzes the relational database and automatically predicts behavior and unknown patterns in application areas that include business data, bioinformatics, pharmacology, web mining, credit card fraud detection, disease diagnosis systems, computational biology, and online retailers. In this paper, we present several kinds of multi-relational classification methods, including Inductive Logic Programming (ILP) based, associative, Emerging Patterns based, and relational database based approaches, and discuss their characteristics, comparisons, and challenging issues in detail.

