Results 1  10
of
15
Naive Bayesian Classification of Structured Data
, 2003
"... In this paper we present 1BC and 1BC2, two systems that perform naive Bayesian classification of structured individuals. The approach of 1BC is to project the individuals along firstorder features. These features are built from the individual using structural predicates referring to related objects ..."
Abstract

Cited by 33 (0 self)
 Add to MetaCart
In this paper we present 1BC and 1BC2, two systems that perform naive Bayesian classification of structured individuals. The approach of 1BC is to project the individuals along firstorder features. These features are built from the individual using structural predicates referring to related objects (e.g. atoms within molecules), and properties applying to the individual or one or several of its related objects (e.g. a bond between two atoms). We describe an individual in terms of elementary features consisting of zero or more structural predicates and one property; these features are treated as conditionally independent in the spirit of the naive Bayes assumption. 1BC2 represents an alternative firstorder upgrade to the naive Bayesian classifier by considering probability distributions over structured objects (e.g., a molecule as a set of atoms), and estimating those distributions from the probabilities of its elements (which are assumed to be independent). We present a unifying view on both systems in which 1BC works in language space, and 1BC2 works in individual space. We also present a new, efficient recursive algorithm improving upon the original propositionalisation approach of 1BC. Both systems have been implemented in the context of the firstorder descriptive learner Tertius, and we investigate the differences between the two systems both in computational terms and on artificially generated data. Finally, we describe a range of experiments on ILP benchmark data sets demonstrating the viability of our approach.
Spatial Associative Classification: Propositional vs. Structural approach
 JOURNAL OF INTELLIGENT INFORMATION SYSTEMS
, 2006
"... Spatial associative classification takes advantage of employing association rules for spatial classification purposes. In this work, we investigate spatial associative classification in multirelational data mining setting to deal with spatial objects having different properties, which are modeled ..."
Abstract

Cited by 18 (10 self)
 Add to MetaCart
(Show Context)
Spatial associative classification takes advantage of employing association rules for spatial classification purposes. In this work, we investigate spatial associative classification in multirelational data mining setting to deal with spatial objects having different properties, which are modeled by as many data tables (relations) as the number of spatial object types (layers). Spatial classification is based on two alternative approaches: a propositional approach and a structural approach. The propositional approach uses spatial association rules to construct an attributevalue representation (propositionalisation) of spatial data and performs spatial classification according to wellknown propositional classification methods. Since the attributevalue representation should capture relational properties of spatial data, multirelational association rules are used in propositionalisation step. The structural approach resorts to an extension of naïve Bayes classifiers to multirelational data where the classification is driven by multirelational association rules modelling regularities in spatial data. In both cases the spatial associative classification is performed at different levels of granularity and takes advantage from domain knowledge expressed in form of hierarchies and rules. Experiments on realworld georeferenced census data analysis show the advantage of the structural approach over the propositional one.
An efficient multirelational naive Bayesian classifier based on semantic relationship graph
 In Proceedings of the 4th international workshop on Multirelational mining (MRDM ’05
"... Classification is one of the most popular data mining tasks with a wide range of applications, and lots of algorithms have been proposed to build accurate and scalable classifiers. Most of these algorithms only take a single table as input, whereas in the real world most data are stored in multiple ..."
Abstract

Cited by 7 (0 self)
 Add to MetaCart
(Show Context)
Classification is one of the most popular data mining tasks with a wide range of applications, and lots of algorithms have been proposed to build accurate and scalable classifiers. Most of these algorithms only take a single table as input, whereas in the real world most data are stored in multiple tables and managed by relational database systems. As transferring data from multiple tables into a single one usually causes many problems, development of multirelational classification algorithms becomes important and attracts many researchers ’ interests. Existing works about extending Naïve Bayes to deal with multirelational data either have to transform data stored in tables to mainmemory Prolog facts, or limit the search space to only a small subset of real world applications. In this work, we aim at solving these problems and building an efficient, accurate Naïve Bayesian classifier to deal with data in multiple tables directly. We propose an algorithm named GraphNB, which upgrades Naïve Bayesian classifier to deal with multiple tables directly. In order to take advantage of linkage relationships among tables, and treat different tables linked to the target table differently, a semantic relationship graph is developed to describe the relationship and to avoid unnecessary joins. Furthermore, to improve accuracy, a pruning strategy is given to simplify the graph to avoid examining too many weakly linked tables. Experimental study on both realworld and synthetic databases shows its high efficiency and good accuracy.
Transductive Learning for Spatial Data Classification
"... Abstract. Learning classifiers of spatial data presents several issues, such as the heterogeneity of spatial objects, the implicit definition of spatial relationships among objects, the spatial autocorrelation and the abundance of unlabelled data which potentially convey a large amount of informatio ..."
Abstract

Cited by 3 (3 self)
 Add to MetaCart
(Show Context)
Abstract. Learning classifiers of spatial data presents several issues, such as the heterogeneity of spatial objects, the implicit definition of spatial relationships among objects, the spatial autocorrelation and the abundance of unlabelled data which potentially convey a large amount of information. The first three issues are due to the inherent structure of spatial units of analysis, which can be easily accommodated if a (multi)relational data mining approach is considered. The fourth issue demands for the adoption of a transductive setting, which aims to make predictions for a given set of unlabelled data. Transduction is also motivated by the contiguity of the concept of positive autocorrelation, which typically affect spatial phenomena, with the smoothness assumption which characterize the transductive setting. In this work, we investigate a relational approach to spatial classification in a transductive setting. Computational solutions to the main difficulties met in this approach are presented. In particular, a relational upgrade of the naïve Bayes classifier is proposed as discriminative model, an iterative algorithm is designed for the transductive classification of unlabelled data, and a distance measure between relational descriptions of spatial objects is defined in order to determine the knearest neighbors of each example in the dataset. Computational solutions have been tested on two realworld spatial datasets. The transformation of spatial data into a multirelational representation and experimental results are reported and commented. 1
A relational approach to probabilistic classification in a . . .
 ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
, 2009
"... ..."
Transductive Learning from Relational Data
 In: P. Perner (Ed.), Machine Learning and Data Mining in Pattern Recognition, LNAI 4571
, 2007
"... Abstract. Transduction is an inference mechanism “from particular to particular”. Its application to classification tasks implies the use of both labeled (training) data and unlabeled (working) data to build a classifier whose main goal is that of classifying (only) unlabeled data as accurately as p ..."
Abstract

Cited by 2 (2 self)
 Add to MetaCart
(Show Context)
Abstract. Transduction is an inference mechanism “from particular to particular”. Its application to classification tasks implies the use of both labeled (training) data and unlabeled (working) data to build a classifier whose main goal is that of classifying (only) unlabeled data as accurately as possible. Unlike the classical inductive setting, no general rule valid for all possible instances is generated. Transductive learning is most suited for those applications where the examples for which a prediction is needed are already known when training the classifier. Several approaches have been proposed in the literature on building transductive classifiers from data stored in a single table of a relational database. Nonetheless, no attention has been paid to the application of the transduction principle in a (multi)relational setting, where data are stored in multiple tables of a relational database. In this paper we propose a new transductive classifier, named TRANSC, which is based on a probabilistic approach to making transductive inferences from relational data. This new method works in a transductive setting and employs a principled probabilistic classification in multirelational data mining to face the challenges posed by some spatial data mining problems. Probabilistic inference allows us to compute the class probability and return, in addition to result of transductive classification, the confidence in the classification. The predictive accuracy of TRANSC has been compared to that of its inductive counterpart in an empirical study involving both a benchmark relational dataset and two spatial datasets. The results obtained are generally in favor of TRANSC, although improvements are small by a narrow margin. 1
Simple Decision Forests for MultiRelational Classification
"... An important task in multirelational data mining is linkbased classification which takes advantage of attributes of links and linked entities, to predict the class label. The relational naive Bayes classifier exploits independence assumptions to achieve scalability. We introduce a weaker independe ..."
Abstract

Cited by 2 (2 self)
 Add to MetaCart
(Show Context)
An important task in multirelational data mining is linkbased classification which takes advantage of attributes of links and linked entities, to predict the class label. The relational naive Bayes classifier exploits independence assumptions to achieve scalability. We introduce a weaker independence assumption to the e↵ect that information from di↵erent data tables is independent given the class label. The independence assumption entails a closedform formula for combining probabilistic predictions based on decision trees learned on di↵erent database tables. Logistic regression learns di↵erent weights for information from di↵erent tables and prunes irrelevant tables. In experiments, learning was very fast with competitive accuracy.
Multirelational data mining in Microsoft SQL Server 2005
"... Most real life data are relational by nature. Database mining integration is an essential goal to be achieved. Microsoft SQL Server (MSSQL) seems to provide an interesting and promising environment to develop aggregated multirelational data mining algorithms by using nested tables and the plugin a ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
(Show Context)
Most real life data are relational by nature. Database mining integration is an essential goal to be achieved. Microsoft SQL Server (MSSQL) seems to provide an interesting and promising environment to develop aggregated multirelational data mining algorithms by using nested tables and the plugin algorithm approach. However, it is currently unclear how these nested tables can best be used by data mining algorithms. In this paper we look at how the Microsoft Decision Trees (MSDT) handles multirelational data, and we compare it with the multirelational decision tree learner TILDE. In the experiments we perform, MSDT has equally good predictive accuracy as TILDE, but the trees it gives either ignore the relational information, or use it in a way that yields noninterpretable trees. As such, one could say that its explanatory power is reduced, when compared to a multirelational decision tree learner. We conclude that it may be worthwhile to integrate a multirelational decision tree learner in MSSQL.
Mining geospatial data in a transductive setting
 Data Mining VIII
, 2007
"... Many organizations collects large amounts of spatially referenced data. Spatial Data Mining targets the discovery of interesting, implicit knowledge from such data. The specific classification task has been extensively investigated in the classical inductive setting, where only labeled examples are ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
(Show Context)
Many organizations collects large amounts of spatially referenced data. Spatial Data Mining targets the discovery of interesting, implicit knowledge from such data. The specific classification task has been extensively investigated in the classical inductive setting, where only labeled examples are used to generate a classifier, discarding a large amount of information potentially conveyed by the unlabeled instances to be classified. In this work spatial classification is based on transduction, an inference mechanism “from particular to particular ” which uses both labeled and unlabeled data to build a classifier whose main goal is that of classifying (only) unlabeled data as accurately as possible. The proposed method, named TRANSC, employs a principled probabilistic classification in multirelational data mining to face the challenges posed by handling spatial data. The predictive accuracy of TRANSC has been evaluated on two realworld spatial datasets. 1
A Probabilistic Graphical Model Framework for HigherOrder TermBased Representations
, 2005
"... This thesis introduces HigherOrder Bayesian networks (HOBNs), a probabilistic graphical model framework for inference and learning over structured data. HOBNs extend the expressive power of standard Bayesian networks with random variables ranging over domains of certain families of higherorder ter ..."
Abstract
 Add to MetaCart
(Show Context)
This thesis introduces HigherOrder Bayesian networks (HOBNs), a probabilistic graphical model framework for inference and learning over structured data. HOBNs extend the expressive power of standard Bayesian networks with random variables ranging over domains of certain families of higherorder terms. The formalism allows the expression of conditional independence assumptions on the domain, which are exploited in order to give an efficient method of defining probability distributions over the higherorder types. Methods for probabilistic inference and model construction from data observations are discussed, and experimental results on realworld domains are presented. Acknowledgements and Dedication