Learning relational probability trees
In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003
Cited by 96 (24 self)
Classification trees are widely used in the machine learning and data mining communities for modeling propositional data. Recent work has extended this basic paradigm to probability estimation trees. Traditional tree learning algorithms assume that instances in the training data are homogeneous and independently distributed. Relational probability trees (RPTs) extend standard probability estimation trees to a relational setting in which data instances are heterogeneous and interdependent. Our algorithm for learning the structure and parameters of an RPT searches over a space of relational features that use aggregation functions (e.g., AVERAGE, MODE, COUNT) to dynamically propositionalize relational data and create binary splits within the RPT. Previous work has identified a number of statistical biases due to characteristics of relational data such as autocorrelation and degree disparity. The RPT algorithm uses a novel form of randomization test to adjust for these biases. On a variety of relational learning tasks, RPTs built using randomization tests are significantly smaller than other models and achieve equivalent, or better, performance.
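The aggregation step described in this abstract can be illustrated with a minimal Python sketch. It assumes a hypothetical toy layout in which each target instance is linked to a variable-size list of numeric values from related objects. The names `related_ages`, `aggregate`, and `binary_split`, and the threshold of 30, are illustrative assumptions and not part of the published RPT algorithm; the randomization test is not shown.

```python
from statistics import mean, mode

# Hypothetical toy data: each target instance (a movie) is linked to a
# variable-size set of related values (ages of its actors).
related_ages = {
    "movie_a": [25, 31, 31, 40],
    "movie_b": [52],
    "movie_c": [19, 22],
}

# The aggregation functions named in the abstract.
AGGREGATORS = {
    "COUNT": len,
    "AVERAGE": mean,
    "MODE": mode,
}

def aggregate(values, how):
    """Collapse a set of related values into a single number."""
    return AGGREGATORS[how](values)

def binary_split(data, how, threshold):
    """Partition instances by whether the aggregated value exceeds threshold."""
    left, right = [], []
    for key, values in data.items():
        (right if aggregate(values, how) > threshold else left).append(key)
    return left, right

# Candidate split: "AVERAGE(actor age) > 30".
left, right = binary_split(related_ages, "AVERAGE", 30)
```

Each (aggregation function, threshold) pair yields one candidate binary feature, so a tree learner of this kind would score many such pairs and keep the best one at each node.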
Experiments with MRDTL – a multi-relational decision tree learning algorithm
University of Alberta, 2002
Cited by 3 (0 self)
www.cs.iastate.edu/~honavar/aigroup.html
We describe experiments with an implementation of the multi-relational decision tree learning (MRDTL) algorithm for inducing decision trees from relational databases, using an approach proposed by Knobbe et al. [1999a]. Our results show that the performance of MRDTL is competitive with that of other algorithms for learning classifiers from multiple relations, including Progol [Muggleton, 1995], FOIL [Quinlan, 1993], and Tilde [Blockeel, 1998]. Preliminary results indicate that MRDTL, when augmented with principled methods for handling missing attribute values, could be competitive with state-of-the-art algorithms for learning classifiers from multiple relations on real-world data sets such as those used in the KDD Cup 2001 data mining competition [Cheng et al., 2002].
RELATIONAL DATA MINING AND ILP FOR DOCUMENT IMAGE UNDERSTANDING
Cited by 2 (1 self)
Document image understanding denotes the recognition of semantically relevant components in the layout extracted from a document image. This recognition process is based on domain-specific knowledge that can be acquired automatically by applying data mining techniques. The spatial dimension of page layout makes classification methods developed in inductive logic programming (ILP) and multi-relational data mining (MRDM) the most suitable candidates for this specific task. In this paper, both approaches are considered and empirically compared on three different data sets consisting of multi-page articles published in an international journal and historical documents. The ILP method is able to learn recursive logical theories that express dependencies between logical components, while the MRDM method extends the naïve Bayesian classifier to data stored in multiple tables of a relational database. Experimental results confirm the importance of the spatial dimension for this application and show that the ILP method tends to be conservative, with a high percentage of omission errors and a low percentage of commission errors, while the probabilistic nature of the MRDM method allows us to trade off between the two types of error. Document image analysis is the subfield of digital image processing that aims to convert document images to symbolic form for modification, storage,
A decision tree for multi-layered spatial data
In: Joint International Symposium on Geospatial Theory, Processing and Applications
Cited by 1 (1 self)
Spatial data mining fulfils real needs of many geomatic applications. It takes advantage of the growing availability of geographically referenced data and their potential richness. Spatial data mining is now a clearly identified field of data mining. This article deals with spatial data classification using a decision tree. We propose a new method called SCART. This method differs from conventional decision trees by considering the specificities of geographical data, namely their organization in thematic layers and their spatial relationships. SCART extends the CART method in two directions. On the one hand, the algorithm considers several thematic layers, as in the so-called relational data mining area; on the other hand, it extends discriminating criteria to criteria on the neighborhood. For this purpose, it determines which combination of attribute value and spatial relationship of neighboring objects provides the best criterion.
Spatial data mining implementation. Alternatives and performances
Presented at GeoInfo2004, Brazilian Symposium on GeoInformatics, 22/11-23/11 2004
Cited by 1 (0 self)
Spatial data mining requires the analysis of interactions in space. These interactions can be materialized using distance tables, reducing spatial data mining to multi-table analysis. However, conventional data mining algorithms consider only one input table, where each row is an observation to analyze. Simple relational joins between these tables do not resolve the problem and distort the results because observations are counted multiple times. We propose three alternatives for multi-table data mining in the context of spatial data mining. The first substantially modifies the conventional algorithm so that it considers those tables. The second is an optimization of the first approach: it pre-computes all join operations and adapts the conventional algorithm. The third reorganizes the data into a single table by completing, not joining, the target table using the existing data in the other tables, and then applies any standard data mining algorithm without modification. This article presents these three alternatives, describes their implementation for classification algorithms, and compares their performance. Key words: spatial data mining, spatial relationship, spatial database, spatial decision tree.
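The multiple-counting problem and the "completing, not joining" idea from this abstract can be sketched in Python as follows. The tables, the column names, and the choice of minimum distance as the aggregate are all hypothetical assumptions made for illustration; the abstract does not say how the target table is actually completed.

```python
from collections import defaultdict

# Hypothetical target table: one row per observation to classify.
targets = [
    {"id": 1, "label": "urban"},
    {"id": 2, "label": "rural"},
]

# Hypothetical distance table materializing spatial interactions:
# (target id, neighbor type, distance in metres).
distances = [
    (1, "school", 120.0),
    (1, "school", 300.0),
    (1, "road", 15.0),
    (2, "road", 800.0),
]

def naive_join(targets, distances):
    """Plain relational join: a target appears once per matching neighbor."""
    return [
        {**t, "neighbor": kind, "dist": d}
        for t in targets
        for (tid, kind, d) in distances
        if tid == t["id"]
    ]

def complete_target_table(targets, distances):
    """Add one aggregated column per neighbor type, keeping one row per target."""
    nearest = defaultdict(dict)
    for tid, kind, d in distances:
        nearest[tid][kind] = min(d, nearest[tid].get(kind, float("inf")))
    kinds = sorted({kind for _, kind, _ in distances})
    return [
        {**t, **{f"min_dist_{k}": nearest[t["id"]].get(k) for k in kinds}}
        for t in targets
    ]

joined = naive_join(targets, distances)                   # target 1 appears 3 times
completed = complete_target_table(targets, distances)     # one row per target
```

In the joined table, target 1 would contribute three rows to any class-frequency count, whereas the completed table keeps the original number of observations and can be passed to a single-table learner unchanged.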
Multi-relational data mining in Microsoft SQL Server 2005
Cited by 1 (0 self)
Most real-life data are relational by nature. Database mining integration is an essential goal to be achieved. Microsoft SQL Server (MSSQL) seems to provide an interesting and promising environment to develop aggregated multi-relational data mining algorithms by using nested tables and the plug-in algorithm approach. However, it is currently unclear how these nested tables can best be used by data mining algorithms. In this paper we look at how Microsoft Decision Trees (MSDT) handles multi-relational data, and we compare it with the multi-relational decision tree learner TILDE. In the experiments we perform, MSDT achieves predictive accuracy as good as TILDE's, but the trees it produces either ignore the relational information or use it in a way that yields noninterpretable trees. As such, one could say that its explanatory power is reduced when compared to a multi-relational decision tree learner. We conclude that it may be worthwhile to integrate a multi-relational decision tree learner in MSSQL.
ReMauve: A Relational Model Tree Learner
Cited by 1 (0 self)
Model trees are a special case of regression trees in which linear regression models are predicted in the leaves. Little attention has been paid to model trees in relational learning, mainly because the task of learning linear regression equations in this context involves dealing with non-determinacy of predictive attributes. Whereas existing approaches handle this non-determinacy issue either by selecting a single value or by aggregating over all values, in this paper we present a model tree learning system that tries to combine both.
Type Extension Trees: a Unified Framework for Relational Feature Construction
Cited by 1 (1 self)
We introduce type extension trees as a formal representation language for complex combinatorial features of relational data. Based on a very simple syntax, this language provides a unified framework for expressing features as diverse as embedded subgraphs on the one hand, and marginal counts of attribute values on the other. We show by various examples how many existing relational data mining techniques can be expressed as the problem of constructing a type extension tree and a discriminant function.
Feature Discovery with Type Extension Trees
Cited by 1 (1 self)
We are interested in learning complex combinatorial features from relational data. We rely on an expressive and general representation language whose semantics allows us to express many features that have been used in different statistical relational learning settings. To avoid expensive exhaustive search over the space of relational features, we introduce a heuristic search algorithm guided by a generalized relational notion of information gain and a discriminant function. The algorithm successfully finds interesting and interpretable features on artificial and real-world relational learning problems.
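For orientation, the standard propositional information gain that such a search heuristic generalizes can be sketched in Python as below. This is only the classical entropy-based definition over a single discrete feature; the generalized relational notion used in the paper is not specified in the abstract and is not reproduced here. The function names are illustrative.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(labels, feature_values):
    """Reduction in label entropy after partitioning by a discrete feature."""
    total = len(labels)
    groups = {}
    for label, value in zip(labels, feature_values):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

A feature that separates the classes perfectly attains the full label entropy as its gain, while a feature that is independent of the class attains zero, which is what makes the score usable for ranking candidate features during heuristic search.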
ALGORITHMS FOR NON-PARAMETRIC CLASSIFIERS IN MULTI-RELATIONAL DATA MINING
2006
Over the last decades, due to advances in information technologies, both the industrial and scientific communities have acquired large volumes of data in digital form. Most of these data sets are stored in relational databases consisting of multiple tables and associations. Moreover, data from bioinformatics and computational biology, as well as HTML and XML documents, are relational in nature. However, most existing approaches to knowledge discovery in databases assume that the data are stored in a single table. Therefore, new algorithms are needed in order to exploit the relational information provided in these data sets. This thesis proposes two novel solutions to the task of supervised classification in relational domains, based on traditional non-parametric classifiers and built upon relational algebra. The first approach is based on Kernel Density Estimation, and the second technique is based on Gaussian Mixture Models. Both techniques are evaluated using three real-world relational data sets, drawn from the fields of organic chemistry, medicine, and genetics.
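As background for the first approach, a kernel density estimation classifier in its ordinary single-table form can be sketched in Python as follows. This is a minimal one-dimensional illustration with a Gaussian kernel and a fixed bandwidth, both of which are assumptions; the relational-algebra formulation proposed in the thesis is not described in the abstract and is not shown.

```python
from math import exp, pi, sqrt

def gaussian_kde(samples, bandwidth):
    """Return a density estimate built from Gaussian kernels centred on samples."""
    norm = len(samples) * bandwidth * sqrt(2 * pi)

    def density(x):
        return sum(exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

    return density

def fit(train, bandwidth=1.0):
    """train maps each class label to a list of 1-D feature values."""
    total = sum(len(values) for values in train.values())
    return {
        label: (len(values) / total, gaussian_kde(values, bandwidth))
        for label, values in train.items()
    }

def predict(model, x):
    """Pick the class maximising prior times estimated class-conditional density."""
    return max(model, key=lambda label: model[label][0] * model[label][1](x))
```

For example, after `model = fit({"active": [1.0, 1.5, 2.0], "inactive": [6.0, 6.5, 7.0]})`, a call such as `predict(model, 1.2)` selects the class whose training values lie closest to the query point.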

