Results 1–10 of 13
Distribution-based aggregation for relational learning with identifier attributes
Machine Learning, 2004
Cited by 31 (10 self)
Abstract
Feature construction through aggregation plays an essential role in modeling relational domains with one-to-many relationships between tables. One-to-many relationships lead to bags (multisets) of related entities, from which predictive information must be captured. This paper focuses on aggregation from categorical attributes that can take many values (e.g., object identifiers). We present a novel aggregation method, as part of the relational learning system ACORA, that combines the use of vector distance and metadata about the class-conditional distributions of attribute values. We provide a theoretical foundation for this approach, deriving a “relational fixed-effect” model within a Bayesian framework, and discuss the implications of identifier aggregation on the expressive power of the induced model. One advantage of using identifier attributes is the circumvention of limitations caused either by missing/unobserved object properties or by independence assumptions. Finally, we show empirically that the novel aggregators can generalize in the presence of identifier (and other high-dimensional) attributes, and we also explore the limitations of the applicability of the methods.
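The abstract describes aggregating a bag of categorical values by its vector distance to class-conditional value distributions. A minimal sketch of that idea follows; the helper names are hypothetical, and cosine similarity stands in for whichever distance ACORA actually uses:

```python
from collections import Counter
import math

def class_reference_distributions(bags, labels):
    """Estimate one value-frequency distribution per class from training bags."""
    counts = {}
    for bag, y in zip(bags, labels):
        counts.setdefault(y, Counter()).update(bag)
    # normalise counts into probability distributions
    return {y: {v: c / sum(cnt.values()) for v, c in cnt.items()}
            for y, cnt in counts.items()}

def cosine(p, q):
    """Cosine similarity between two sparse distributions (dicts)."""
    dot = sum(p.get(k, 0.0) * q.get(k, 0.0) for k in set(p) | set(q))
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def aggregate_features(bag, refs):
    """Turn a bag of identifiers into one similarity feature per class."""
    cnt = Counter(bag)
    total = sum(cnt.values())
    dist = {v: c / total for v, c in cnt.items()}
    return {y: cosine(dist, ref) for y, ref in refs.items()}
```

A bag whose identifiers were mostly seen with the positive class then gets a high positive-class similarity feature, even when the identifiers themselves carry no observed properties.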
nFOIL: Integrating Naïve Bayes and FOIL
2005
Cited by 24 (3 self)
Abstract
We present the system nFOIL. It tightly integrates the naïve Bayes learning scheme with the inductive logic programming rule-learner FOIL. In contrast to previous combinations, which have employed naïve Bayes only for post-processing the rule sets, nFOIL employs the naïve Bayes criterion to directly guide its search. Experimental evidence shows that nFOIL performs better than both its baseline algorithm FOIL and the post-processing approach, and is at the same time competitive with more sophisticated approaches.
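The key idea above, using the naive Bayes criterion itself to score candidate rules during search, can be sketched as follows. This is an illustrative scorer, not nFOIL's implementation: each candidate rule contributes a boolean coverage feature, and a rule set is scored by the conditional log-likelihood of the training labels under a naive Bayes model over those features.

```python
import math

def naive_bayes_score(rule_coverage, labels, alpha=1.0):
    """Conditional log-likelihood of the labels under a naive Bayes model
    whose features are the boolean coverage vectors of the candidate rules."""
    classes = sorted(set(labels))
    n = len(labels)
    prior = {c: labels.count(c) / n for c in classes}
    # Laplace-smoothed estimate of P(rule j fires | class c)
    cond = {}
    for j, cov in enumerate(rule_coverage):
        for c in classes:
            nc = labels.count(c)
            fired = sum(1 for f, y in zip(cov, labels) if f and y == c)
            cond[(j, c)] = (fired + alpha) / (nc + 2 * alpha)
    score = 0.0
    for i, y in enumerate(labels):
        logpost = {}
        for c in classes:
            lp = math.log(prior[c])
            for j, cov in enumerate(rule_coverage):
                p = cond[(j, c)]
                lp += math.log(p if cov[i] else 1.0 - p)
            logpost[c] = lp
        z = math.log(sum(math.exp(v) for v in logpost.values()))
        score += logpost[y] - z  # log posterior of the true label
    return score
```

A refinement step would then keep the candidate clause whose addition maximises this score, rather than FOIL's information gain.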
Spatial Associative Classification: Propositional vs. Structural approach
Journal of Intelligent Information Systems, 2006
Cited by 12 (6 self)
Abstract
Spatial associative classification takes advantage of association rules for spatial classification purposes. In this work, we investigate spatial associative classification in a multi-relational data mining setting, to deal with spatial objects having different properties, which are modeled by as many data tables (relations) as there are spatial object types (layers). Spatial classification is based on two alternative approaches: a propositional approach and a structural approach. The propositional approach uses spatial association rules to construct an attribute-value representation (propositionalisation) of spatial data and performs spatial classification with well-known propositional classification methods. Since the attribute-value representation should capture relational properties of spatial data, multi-relational association rules are used in the propositionalisation step. The structural approach resorts to an extension of naïve Bayes classifiers to multi-relational data, where the classification is driven by multi-relational association rules modelling regularities in spatial data. In both cases the spatial associative classification is performed at different levels of granularity and takes advantage of domain knowledge expressed in the form of hierarchies and rules. Experiments on real-world geo-referenced census data analysis show the advantage of the structural approach over the propositional one.
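The propositional approach above flattens relational patterns into boolean features. A toy sketch of that step; the objects and rule bodies are invented for illustration, whereas real multi-relational rules would be mined, not hand-written:

```python
def propositionalise(objects, rules):
    """Propositionalisation sketch: each rule body becomes one boolean
    feature, so relational data is flattened into an attribute-value table
    usable by any propositional classifier."""
    return [[bool(rule(obj)) for rule in rules] for obj in objects]

# Hypothetical spatial objects and rule bodies (illustrative only).
towns = [{'near_river': True, 'population': 90000},
         {'near_river': False, 'population': 4000}]
rules = [lambda o: o['near_river'],           # "town intersects a river"
         lambda o: o['population'] > 50000]   # "town is densely populated"
table = propositionalise(towns, rules)
```

Any attribute-value learner can then be trained on `table`, which is the sense in which the propositional approach reuses well-known classification methods.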
Bridging the gap between distance and generalisation: Symbolic learning in metric spaces
2008
Cited by 7 (4 self)
Abstract
Distance-based and generalisation-based methods are two families of artificial intelligence techniques that have been successfully used over a wide range of real-world problems. In the first case, general algorithms can be applied to any data representation by just changing the distance. The metric space sets the search and learning space, which is generally instance-oriented. In the second case, models can be obtained for a given pattern language, and these models can be comprehensible. The generality-ordered space sets the search and learning space, which is generally model-oriented. However, the concepts of distance and generalisation clash in many different ways, especially when knowledge representation is complex (e.g. structured data). This work establishes a framework where these two fields can be integrated in a consistent way. We introduce the concept of distance-based generalisation, which connects all the generalised examples in such a way that all of them are reachable inside the generalisation by using straight paths in the metric space. This makes the metric space and the generality-ordered space coherent (or even dual). Additionally, we also introduce a definition of minimal distance-based generalisation that can be seen as the first formulation of the Minimum Description Length (MDL)/Minimum Message Length (MML) principle in terms of a distance function. We instantiate and develop the framework for the most common data representations and distances, where we show that consistent instances can be found for numerical data, nominal data, sets, lists, tuples, graphs, first-order atoms and clauses. As a result, general learning methods that integrate the best from distance-based and generalisation-based methods can be defined and adapted to any specific problem by appropriately choosing the distance, the pattern language and the generalisation operator.
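For numerical data with the absolute-difference metric, a distance-based generalisation in the sense sketched above can be illustrated by the interval spanned by the examples. This is only a toy instance of the framework, with hypothetical function names:

```python
def interval_generalisation(examples):
    """Minimal generalisation of real numbers under the absolute-difference
    metric: the interval [min, max]. It is closed under straight paths:
    every point on the segment between two covered points is also covered."""
    return (min(examples), max(examples))

def covers(generalisation, x):
    """Membership test: does the generalisation cover the point x?"""
    lo, hi = generalisation
    return lo <= x <= hi
```

The interval is minimal in the sense that shrinking it would exclude an example, which is the intuition behind the distance-based MDL/MML formulation for this representation.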
Social Network Classification Incorporating Link Type Values
Cited by 6 (0 self)
Abstract
Classification of nodes in a social network and its applications to security informatics have been extensively studied in the past. However, previous work generally does not consider the types of links (e.g., whether a person is a friend or a close friend) that connect social network members for classification purposes. Here, we propose modified Naive Bayes classification schemes to make use of the link type information in classification tasks. Basically, we suggest two new Bayesian classification methods that extend a traditional relational Naive Bayes classifier, namely, the Link Type relational Bayes Classifier and the Weighted Link Type Bayes Classifier. We then show the efficacy of our proposed techniques by conducting experiments on data obtained from the Internet Movie Database.
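A rough sketch of the link-type idea, not the paper's exact formulation: labelled neighbours vote for their class, and each vote is weighted according to the type of the connecting link. All function and parameter names are hypothetical:

```python
from collections import defaultdict

def link_type_vote(node, edges, known_labels, type_weights, classes, alpha=1.0):
    """Score each class for `node` by a weighted vote over labelled
    neighbours, where each link type contributes its own weight.
    `edges` is a list of (u, v, link_type) triples; `alpha` is a
    Laplace-style smoothing term so unseen classes keep nonzero mass."""
    votes = defaultdict(float)
    for u, v, t in edges:
        if u == node and v in known_labels:
            votes[known_labels[v]] += type_weights.get(t, 1.0)
        elif v == node and u in known_labels:
            votes[known_labels[u]] += type_weights.get(t, 1.0)
    total = sum(votes.values()) + alpha * len(classes)
    return {c: (votes[c] + alpha) / total for c in classes}
```

With `type_weights = {'close': 2.0, 'friend': 1.0}`, a close friend's label counts twice as much as an ordinary friend's, which is the kind of distinction the abstract argues plain relational Naive Bayes ignores.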
Transductive Learning from Relational Data
In: P. Perner (Ed.), Machine Learning and Data Mining in Pattern Recognition, LNAI 4571, 2007
Cited by 2 (2 self)
Abstract
Transduction is an inference mechanism “from particular to particular”. Its application to classification tasks implies the use of both labeled (training) data and unlabeled (working) data to build a classifier whose main goal is to classify (only) the unlabeled data as accurately as possible. Unlike the classical inductive setting, no general rule valid for all possible instances is generated. Transductive learning is best suited for applications where the examples for which a prediction is needed are already known when training the classifier. Several approaches have been proposed in the literature for building transductive classifiers from data stored in a single table of a relational database. Nonetheless, no attention has been paid to the application of the transduction principle in a (multi-)relational setting, where data are stored in multiple tables of a relational database. In this paper we propose a new transductive classifier, named TRANSC, which is based on a probabilistic approach to making transductive inferences from relational data. This new method works in a transductive setting and employs principled probabilistic classification in multi-relational data mining to face the challenges posed by some spatial data mining problems. Probabilistic inference allows us to compute the class probability and to return, in addition to the result of the transductive classification, the confidence in the classification. The predictive accuracy of TRANSC has been compared to that of its inductive counterpart in an empirical study involving both a benchmark relational dataset and two spatial datasets. The results obtained are generally in favor of TRANSC, although the improvements are by a narrow margin.
Simple Decision Forests for Multi-Relational Classification
Cited by 1 (1 self)
Abstract
An important task in multi-relational data mining is link-based classification, which takes advantage of attributes of links and linked entities to predict the class label. The relational naive Bayes classifier exploits independence assumptions to achieve scalability. We introduce a weaker independence assumption, to the effect that information from different data tables is independent given the class label. This independence assumption entails a closed-form formula for combining probabilistic predictions based on decision trees learned on different database tables. Logistic regression learns different weights for information from different tables and prunes irrelevant tables. In experiments, learning was very fast, with competitive accuracy.
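The closed-form combination mentioned above follows the naive-Bayes pattern P(c | x) ∝ P(c) Π_t P(c | x_t) / P(c), where the product runs over tables. A sketch, assuming each table's decision tree already yields a posterior per class (names hypothetical):

```python
import math

def combine_table_predictions(per_table_probs, prior):
    """Combine per-table posteriors P(c | x_t) under the assumption that
    tables are independent given the class:
        P(c | x) proportional to P(c) * prod_t [ P(c | x_t) / P(c) ]
    `per_table_probs` is a list of dicts mapping class -> posterior;
    probabilities are assumed strictly positive (smoothed upstream)."""
    log_score = {}
    for c in prior:
        s = math.log(prior[c])
        for probs in per_table_probs:
            s += math.log(probs[c]) - math.log(prior[c])
        log_score[c] = s
    z = sum(math.exp(v) for v in log_score.values())  # normalise
    return {c: math.exp(v) / z for c, v in log_score.items()}
```

The logistic-regression variant in the abstract would replace the unit weight on each table's log-ratio with a learned coefficient, letting irrelevant tables be pruned by driving their weight toward zero.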
Classification in networked data: A Toolkit and a ...
2007
Abstract
This paper is about classifying entities that are interlinked with entities for which the class is known. After surveying prior work, we present NetKit, a modular toolkit for classification in networked data, and a case study of its application to networked data used in prior machine learning research. NetKit is based on a node-centric framework in which classifiers comprise a local classifier, a relational classifier, and a collective inference procedure. Various existing node-centric relational learning algorithms can be instantiated with appropriate choices for these components, and new combinations of components realize new algorithms. The case study focuses on univariate network classification, for which the only information used is the structure of class linkage in the network (i.e., only links and some class labels). To our knowledge, no previous work has systematically evaluated the power of class linkage alone for classification in machine learning benchmark data sets. The results demonstrate that very simple network-classification models perform quite well, well enough that they should be used regularly as baseline classifiers for studies of learning with networked data. The simplest method (which performs remarkably well) highlights the close correspondence between several existing methods introduced for different purposes, namely Gaussian-field classifiers, Hopfield networks, and relational-neighbor classifiers. The case study also shows that there are two sets of techniques that are preferable in different situations, namely when few versus many labels are known initially. We also demonstrate that link selection plays a role similar to that of traditional feature selection.
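The relational-neighbor classifier named above is simple enough to sketch directly. This follows the common weighted-vote formulation rather than NetKit's actual code; all names are hypothetical:

```python
def relational_neighbor(node, adj, known_labels, classes):
    """Weighted-vote relational-neighbour classifier: a node's score for a
    class is the normalised sum of edge weights to neighbours labelled with
    that class. `adj` maps node -> list of (neighbour, weight) pairs."""
    scores = {c: 0.0 for c in classes}
    for nbr, weight in adj.get(node, []):
        if nbr in known_labels:
            scores[known_labels[nbr]] += weight
    total = sum(scores.values())
    if total == 0.0:
        # no labelled neighbours: fall back to a uniform distribution
        return {c: 1.0 / len(classes) for c in classes}
    return {c: s / total for c, s in scores.items()}
```

In a collective-inference setting, estimated label distributions of unlabelled neighbours would be fed back in and the scores iterated to convergence; the single pass above is the univariate building block.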
A Probabilistic Graphical Model Framework for Higher-Order Term-Based Representations
2005
Abstract
This thesis introduces Higher-Order Bayesian networks (HOBNs), a probabilistic graphical model framework for inference and learning over structured data. HOBNs extend the expressive power of standard Bayesian networks with random variables ranging over domains of certain families of higher-order terms. The formalism allows the expression of conditional independence assumptions on the domain, which are exploited in order to give an efficient method of defining probability distributions over the higher-order types. Methods for probabilistic inference and model construction from data observations are discussed, and experimental results on real-world domains are presented.
Integrating Naïve Bayes and FOIL
Abstract
A novel relational learning approach that tightly integrates the naïve Bayes learning scheme with the inductive logic programming rule-learner FOIL is presented. In contrast to previous combinations, which have employed naïve Bayes only for post-processing the rule sets, the presented approach employs the naïve Bayes criterion to guide its search directly. The proposed technique is implemented in the nFOIL and tFOIL systems, which employ standard naïve Bayes and tree-augmented naïve Bayes models, respectively. We show that these integrated approaches to probabilistic model and rule learning outperform post-processing approaches. They also yield significantly more accurate models than simple rule learning and are competitive with more sophisticated ILP systems.