Bridging the gap between distance and generalisation: Symbolic learning in metric spaces
, 2008
Cited by 10 (4 self)
Abstract
Distance-based and generalisation-based methods are two families of artificial intelligence techniques that have been used successfully on a wide range of real-world problems. In the first case, general algorithms can be applied to any data representation simply by changing the distance. The metric space sets the search and learning space, which is generally instance-oriented. In the second case, models, which can be comprehensible, are obtained for a given pattern language. The generality-ordered space sets the search and learning space, which is generally model-oriented. However, the concepts of distance and generalisation clash in many ways, especially when the knowledge representation is complex (e.g. structured data). This work establishes a framework in which these two fields can be integrated consistently. We introduce the concept of distance-based generalisation, which connects all the generalised examples in such a way that all of them are reachable within the generalisation along straight paths in the metric space. This makes the metric space and the generality-ordered space coherent (or even dual). Additionally, we introduce a definition of minimal distance-based generalisation that can be seen as the first formulation of the Minimum Description Length (MDL)/Minimum Message Length (MML) principle in terms of a distance function. We instantiate and develop the framework for the most common data representations and distances, and show that consistent instances can be found for numerical data, nominal data, sets, lists, tuples, graphs, first-order atoms and clauses. As a result, general learning methods that integrate the best of distance-based and generalisation-based methods can be defined and adapted to any specific problem by appropriately choosing the distance, the pattern language and the generalisation operator.
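As a minimal sketch of the straight-path condition for the simplest case, numerical data, assume the distance is the absolute difference d(x, y) = |x - y| and that a generalisation is a union of closed intervals (a hypothetical pattern language chosen for illustration). A point z lies on a straight path between x and y when d(x, z) + d(z, y) = d(x, y), so a generalisation passes the check if every such z between two covered examples is also covered. This is not the paper's general framework, the check only inspects a finite set of probe points, and all function names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical pattern language: a union of closed intervals [(lo, hi), ...].
Generalisation = List[Tuple[float, float]]


def dist(x: float, y: float) -> float:
    """Assumed metric for numerical data: absolute difference."""
    return abs(x - y)


def is_between(x: float, z: float, y: float, tol: float = 1e-9) -> bool:
    """z lies on a straight path from x to y iff d(x, z) + d(z, y) == d(x, y)."""
    return abs(dist(x, z) + dist(z, y) - dist(x, y)) <= tol


def covers(gen: Generalisation, z: float) -> bool:
    return any(lo <= z <= hi for lo, hi in gen)


def tightest_interval(examples: Sequence[float]) -> Generalisation:
    """Smallest single interval covering all examples."""
    if not examples:
        raise ValueError("need at least one example")
    return [(min(examples), max(examples))]


def is_distance_based(gen: Generalisation, examples: Sequence[float],
                      probes: Iterable[float]) -> bool:
    """All examples must be covered, and every probe lying on a straight path
    between two examples must be covered too (checked on the probes only)."""
    if not all(covers(gen, e) for e in examples):
        return False
    probes = list(probes)
    return all(
        covers(gen, z)
        for x in examples
        for y in examples
        for z in probes
        if is_between(x, z, y)
    )


print(tightest_interval([1.0, 4.0, 2.5]))  # [(1.0, 4.0)]
# Two disjoint intervals cover the examples 1 and 4 but leave a gap on the path between them.
print(is_distance_based([(1.0, 2.0), (3.0, 4.0)], [1.0, 4.0], [2.5]))  # False
```

For real numbers the straight-path points between x and y are exactly the closed segment between them, which is why a single interval passes and a gapped union fails.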
An Efficiently Computable Graph-based Metric for the Classification of Small Molecules
, 2008
Cited by 7 (3 self)
Abstract
In machine learning, there has been increasing interest in metrics on structured data. The application we focus on is drug discovery. Although graphs have become very popular for representing molecules, many operations on graphs are NP-complete. By representing molecules as outerplanar graphs, a subclass of general graphs, and using block-and-bridge-preserving subgraph isomorphism, we define a metric and present an algorithm that computes it in polynomial time. We evaluate this metric, and more generally the block-and-bridge-preserving matching operator, on a large dataset of molecules, obtaining favorable results.
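One well-known way to turn the size of a maximum common subgraph (MCS) into a metric is the normalised distance of Bunke and Shearer, d(G, H) = 1 - |mcs(G, H)| / max(|G|, |H|). The abstract does not state the paper's exact formula, so treat this as an assumption about the general shape of such a metric. The sketch takes the MCS size as given, since computing it under block-and-bridge-preserving isomorphism is the hard part the paper addresses; the function names are hypothetical and "size" may count vertices or edges.

```python
from typing import List, Tuple


def mcs_distance(size_g: int, size_h: int, size_mcs: int) -> float:
    """Normalised MCS distance d(G, H) = 1 - |mcs(G, H)| / max(|G|, |H|).

    size_mcs is assumed to be precomputed by some maximum-common-subgraph routine.
    """
    if min(size_g, size_h) < 0:
        raise ValueError("graph sizes must be non-negative")
    if not 0 <= size_mcs <= min(size_g, size_h):
        raise ValueError("an MCS cannot be larger than either graph")
    largest = max(size_g, size_h)
    if largest == 0:
        return 0.0  # two empty graphs are identical
    return 1.0 - size_mcs / largest


def nearest_neighbour_label(query_size: int,
                            candidates: List[Tuple[str, int, int]]) -> str:
    """Hypothetical 1-NN classifier; candidates are (label, size, mcs_size_with_query)."""
    label, _, _ = min(candidates, key=lambda c: mcs_distance(query_size, c[1], c[2]))
    return label


print(mcs_distance(6, 4, 3))  # 0.5
print(nearest_neighbour_label(6, [("active", 6, 5), ("inactive", 4, 2)]))  # active
```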
Effective feature construction by maximum common subgraph sampling
Machine Learning
, 2011
Cited by 6 (1 self)
Abstract
The standard approach to feature construction and predictive learning in molecular datasets is to employ computationally expensive graph mining techniques and to bias the exploration of the feature space using frequency or correlation measures. These features are then typically used in predictive models constructed with, for example, SVMs or decision trees. We take a different approach: rather than mining for all optimal local patterns, we extract features from the set of pairwise maximum common subgraphs. The maximum common subgraphs are computed in polynomial time from the outerplanar examples under block-and-bridge-preserving subgraph isomorphism. We empirically observe a significant increase in predictive performance when using maximum common subgraph features instead of correlated local patterns on 60 benchmark datasets from NCI. Moreover, we show that when we randomly sample the pairs of graphs from which the maximum common subgraphs are extracted, we obtain a smaller set of features that still yields the same predictive performance as methods that exhaustively enumerate all possible patterns. The sampling strategy turns out to be a very good compromise between a slight decrease in predictive performance (which nonetheless remains comparable with state-of-the-art methods) and a significant reduction in runtime (two orders of magnitude on a popular medium-sized chemoinformatics dataset). This suggests that maximum common subgraphs are interesting and meaningful features.
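The sampling pipeline described in this abstract (sample pairs of graphs, extract what each pair shares, use the results as binary features) can be sketched as follows. As a loud simplification, each molecule is a set of labelled bonds and the "common part" of two molecules is the intersection of those sets; this is only a stand-in for a maximum common subgraph under block-and-bridge-preserving isomorphism, and it ignores connectivity and bond multiplicity. All names are hypothetical.

```python
import itertools
import random
from typing import FrozenSet, List, Sequence, Tuple

Bond = Tuple[str, str]
Molecule = FrozenSet[Bond]


def molecule(*bonds: Bond) -> Molecule:
    """Hypothetical encoding: a molecule as an order-free set of atom-label pairs."""
    return frozenset(tuple(sorted(b)) for b in bonds)


def common_part(g: Molecule, h: Molecule) -> Molecule:
    """Stand-in for a maximum common subgraph: the shared labelled bonds only."""
    return g & h


def sample_features(mols: Sequence[Molecule], n_pairs: int, seed: int = 0) -> List[Molecule]:
    """Randomly sample pairs of molecules and keep each distinct non-empty common part."""
    pairs = list(itertools.combinations(range(len(mols)), 2))
    rng = random.Random(seed)
    features: List[Molecule] = []
    for i, j in rng.sample(pairs, min(n_pairs, len(pairs))):
        shared = common_part(mols[i], mols[j])
        if shared and shared not in features:
            features.append(shared)
    return features


def featurize(mol: Molecule, features: Sequence[Molecule]) -> List[int]:
    """Binary vector: 1 if the feature occurs in the molecule (here a subset test)."""
    return [int(f <= mol) for f in features]


mols = [
    molecule(("C", "C"), ("C", "O")),
    molecule(("C", "C"), ("C", "N")),
    molecule(("C", "O"), ("O", "H")),
]
feats = sample_features(mols, n_pairs=3)
print([featurize(m, feats) for m in mols])
```

Lowering `n_pairs` below the number of possible pairs is what trades feature coverage for runtime; the resulting vectors could then be passed to any classifier, such as the SVMs or decision trees mentioned above.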
Efficient frequent connected subgraph mining in graphs of bounded treewidth
 In Proc. ECML/PKDD
, 2008
Cited by 4 (0 self)
Abstract
The frequent connected subgraph mining problem, i.e., the problem of listing all connected graphs that are subgraph isomorphic to at least a certain number of transaction graphs of a database, cannot be solved in output-polynomial time in the general case. If, however, the transaction graphs are restricted to forests, the problem becomes tractable. In this paper we generalize the positive result on forests to graphs of bounded treewidth. In particular, we show that for this class of transaction graphs, frequent connected subgraphs can be listed in incremental polynomial time. Since subgraph isomorphism remains NP-complete for bounded treewidth graphs, the positive complexity result of this paper shows that efficient frequent pattern mining is possible even for computationally hard pattern matching operators.
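To make the problem statement concrete, the sketch below implements the definition of support: a pattern is frequent if it is subgraph isomorphic to at least a threshold number of transaction graphs. The subgraph isomorphism test is a brute-force search over injective, label-preserving vertex mappings (non-induced), which is exponential in general; it illustrates the matching operator only and is not the paper's bounded-treewidth listing algorithm. The graph encoding and function names are assumptions.

```python
from itertools import permutations
from typing import Dict, FrozenSet, Iterable, Sequence, Tuple

# Hypothetical encoding: (vertex -> label, set of undirected edges).
Graph = Tuple[Dict[int, str], FrozenSet[FrozenSet[int]]]


def make_graph(labels: Dict[int, str], edges: Iterable[Tuple[int, int]]) -> Graph:
    return labels, frozenset(frozenset(e) for e in edges)


def is_subgraph_isomorphic(pattern: Graph, graph: Graph) -> bool:
    """Try every injective vertex mapping; accept if labels match and every
    pattern edge maps onto an edge of the graph (non-induced matching)."""
    p_labels, p_edges = pattern
    g_labels, g_edges = graph
    p_nodes = list(p_labels)
    for image in permutations(g_labels, len(p_nodes)):
        m = dict(zip(p_nodes, image))
        if all(p_labels[v] == g_labels[m[v]] for v in p_nodes) and all(
            frozenset(m[u] for u in e) in g_edges for e in p_edges
        ):
            return True
    return False


def support(pattern: Graph, database: Sequence[Graph]) -> int:
    """Number of transaction graphs the pattern is subgraph isomorphic to."""
    return sum(is_subgraph_isomorphic(pattern, g) for g in database)


def is_frequent(pattern: Graph, database: Sequence[Graph], threshold: int) -> bool:
    return support(pattern, database) >= threshold


db = [
    make_graph({0: "C", 1: "O", 2: "C"}, [(0, 1), (1, 2)]),
    make_graph({0: "C", 1: "C"}, [(0, 1)]),
    make_graph({0: "C", 1: "O", 2: "C"}, [(0, 1), (1, 2), (0, 2)]),
]
c_o = make_graph({0: "C", 1: "O"}, [(0, 1)])
print(support(c_o, db))  # 2
```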
Signatures of Combinatorial Maps
, 2009
Cited by 4 (1 self)
Abstract
In this paper, we address the problem of computing a canonical representation of an n-dimensional combinatorial map. To that end, we define two combinatorial map signatures: the first has quadratic space complexity and can be used to decide isomorphism with a new map in linear time, whereas the second has linear space complexity and can be used to decide isomorphism in quadratic time. Experimental results show that these signatures can be used to recognize images very efficiently.
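A minimal sketch of traversal-based signatures, under assumptions: the map is connected, darts are numbered 0..n-1, and `betas[i][d]` gives the image of dart d under the (i+1)-th beta function. A breadth-first traversal from a start dart renumbers the darts in visiting order and records the renumbered beta images as a word; two connected maps of the same dimension are isomorphic exactly when some start dart in each yields the same word. Storing the words for all n start darts takes quadratic space, but testing a new map then needs only one linear-time traversal of it; keeping only the lexicographically smallest word takes linear space but requires all n traversals of the new map. This mirrors the trade-off stated in the abstract, though the paper's exact encoding may differ, and the function names are hypothetical.

```python
from typing import FrozenSet, Sequence, Tuple

# betas[i][d] is the image of dart d under beta_(i+1); darts are 0..n-1.
Betas = Sequence[Sequence[int]]
Word = Tuple[int, ...]


def traversal_word(betas: Betas, start: int) -> Word:
    """BFS from `start`: renumber darts in visiting order, then list for each dart
    (in that order) the new numbers of its beta images."""
    n = len(betas[0])
    number = [-1] * n
    number[start] = 0
    order = [start]
    k = 0
    while k < len(order):
        x = order[k]
        k += 1
        for beta in betas:
            y = beta[x]
            if number[y] == -1:
                number[y] = len(order)
                order.append(y)
    if len(order) != n:
        raise ValueError("this sketch assumes a connected map")
    return tuple(number[beta[x]] for x in order for beta in betas)


def full_signature(betas: Betas) -> FrozenSet[Word]:
    """All n words, one per start dart: quadratic space."""
    return frozenset(traversal_word(betas, d) for d in range(len(betas[0])))


def min_signature(betas: Betas) -> Word:
    """Lexicographically smallest word: linear space, n traversals to compute."""
    return min(traversal_word(betas, d) for d in range(len(betas[0])))


def isomorphic_to(sig: FrozenSet[Word], new_betas: Betas) -> bool:
    """With all words of the stored map kept, one traversal of the new map suffices."""
    return traversal_word(new_betas, 0) in sig


# Usage: a triangle face (darts 0-2) sewn along beta2 to an outer face (darts 3-5).
m = [[1, 2, 0, 5, 3, 4], [3, 4, 5, 0, 1, 2]]
p = [4, 0, 5, 2, 1, 3]  # relabel dart d as p[d]
m2 = [[0] * 6 for _ in range(2)]
for i in range(2):
    for d in range(6):
        m2[i][p[d]] = p[m[i][d]]
print(min_signature(m) == min_signature(m2))  # True
print(isomorphic_to(full_signature(m), m2))  # True
```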
Graph Mining: An Overview
Cited by 4 (0 self)
Abstract
In the early years of data mining and knowledge discovery in databases, method development focused on rigidly and plainly structured data. Most often, efforts were even confined to data that can be represented as a simple table, which describes a set of sample cases by attribute-value pairs. Recent years, however, have seen a constantly growing interest in
Enumerating rooted biconnected planar graphs . . .
Cited by 4 (4 self)
Abstract
A graph is called a triangulated planar graph if it admits an embedding in the plane such that all inner faces are triangles. In a rooted triangulated planar graph, a vertex and two edges incident to it are designated as the outer vertex and outer edges, respectively. Two plane embeddings of rooted triangulated planar graphs are defined to be equivalent if they admit an isomorphism under which the designated vertices correspond to each other. Given a positive integer n, we give an algorithm that enumerates all plane embeddings of rooted, biconnected, triangulated planar graphs with at most n vertices without outputting two equivalent embeddings. The algorithm runs in constant time per embedding by outputting only the difference from the previous output.
GPM: A Graph Pattern Matching Kernel with Diffusion for Accurate Graph Classification
, 2008
Cited by 3 (0 self)
Abstract
Graph data mining is an active research area. Graphs are general modeling tools for organizing information from heterogeneous sources and have been applied in many scientific, engineering, and business fields. With the fast accumulation of graph data, building highly accurate predictive models for graph data has emerged as a new challenge that has not been fully explored in the data mining community. In this paper, we present a novel technique called the Graph Pattern Matching kernel (GPM). Our idea is to leverage existing frequent pattern discovery methods and to apply kernel classifiers (e.g. support vector machines) to build highly accurate graph classifiers. In our method, we first identify all frequent patterns in a graph database. We then map these subgraphs to the graphs in the database and use a process we call “pattern diffusion” to label the nodes of the graphs. Finally, we design a novel graph matching algorithm to compute a graph kernel. We have comprehensively tested our algorithm on 16 chemical structure data sets and compared our method with all the major graph kernel functions we know of. The experimental results demonstrate that our method outperforms state-of-the-art graph kernel methods by a large margin.
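The abstract does not detail pattern diffusion or the matching step, so the following is only a generic sketch of the pipeline's shape under stated assumptions, not GPM itself. Each node starts with a vector marking which frequent patterns cover it; diffusion repeatedly mixes a node's initial vector with the average of its neighbours' current vectors (the damping factor `alpha` is hypothetical); and a simple all-pairs kernel sums the dot products of node vectors across two graphs. Summing over all node pairs equals the dot product of the per-graph summed vectors, which keeps this kernel positive semi-definite.

```python
from typing import Dict, List, Sequence

Vec = List[float]


def diffuse(adj: Dict[int, Sequence[int]], init: Dict[int, Vec],
            alpha: float = 0.5, steps: int = 2) -> Dict[int, Vec]:
    """Hypothetical diffusion: each step mixes a node's initial pattern vector
    with the mean of its neighbours' current vectors."""
    x = {v: list(vec) for v, vec in init.items()}
    for _ in range(steps):
        new: Dict[int, Vec] = {}
        for v, vec in x.items():
            nbrs = adj.get(v, [])
            if nbrs:
                avg = [sum(x[u][k] for u in nbrs) / len(nbrs) for k in range(len(vec))]
            else:
                avg = vec  # isolated node: keep its current vector
            new[v] = [(1 - alpha) * init[v][k] + alpha * avg[k] for k in range(len(vec))]
        x = new
    return x


def all_pairs_kernel(xg: Dict[int, Vec], xh: Dict[int, Vec]) -> float:
    """Sum of dot products over all node pairs (u in G, v in H), computed as the
    dot product of the two graphs' summed node vectors."""
    sg = [sum(col) for col in zip(*xg.values())]
    sh = [sum(col) for col in zip(*xh.values())]
    return sum(a * b for a, b in zip(sg, sh))


# Toy graph: a single edge 0-1; node 0 is covered by pattern 0, node 1 by pattern 1.
adj = {0: [1], 1: [0]}
init = {0: [1.0, 0.0], 1: [0.0, 1.0]}
xs = diffuse(adj, init, alpha=0.5, steps=1)
print(xs)  # {0: [0.5, 0.5], 1: [0.5, 0.5]}
print(all_pairs_kernel(xs, xs))  # 2.0
```

The kernel values could then be handed to a kernel classifier such as an SVM, which is the role the abstract assigns to the graph kernel.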
Polynomial-delay enumeration of monotonic graph classes
 Journal of Machine Learning Research
Cited by 3 (2 self)
Abstract
Algorithms that list graphs such that no two listed graphs are isomorphic are important building blocks of systems for mining and learning in graphs. Algorithms are already known that solve this problem efficiently for many classes of graphs of restricted topology, such as trees. In this article we introduce the concept of a dense augmentation schema and present an algorithm that can enumerate any class of graphs with polynomial delay, as long as the class can be described using a monotonic predicate operating on a dense augmentation schema. In practice, this means that it is the first enumeration algorithm that can be applied in a theoretically efficient way in any frequent subgraph mining algorithm, and that it generalizes to situations beyond the standard frequent subgraph mining setting.
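The underlying problem (listing every graph of a class exactly once up to isomorphism) can be illustrated with a naive sketch for a fixed number of vertices: start from the empty graph, add one edge at a time, prune any graph that fails the predicate, and skip graphs whose canonical form has already been seen. Pruning is safe when the predicate is monotonic in the sense assumed here (closed under edge removal): no member of the class extends a failing graph, and every member can be built by adding edges while staying inside the class. The brute-force canonical form tries all n! relabellings, so this is not polynomial delay and is not the paper's dense augmentation schema; the names are hypothetical.

```python
from itertools import combinations, permutations
from typing import Callable, FrozenSet, Iterator, Tuple

Edge = Tuple[int, int]
EdgeSet = FrozenSet[Edge]
Predicate = Callable[[EdgeSet, int], bool]


def canonical(edges: EdgeSet, n: int) -> Tuple[Tuple[int, ...], ...]:
    """Brute-force canonical form: smallest sorted edge list over all n! relabellings."""
    return min(
        tuple(sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges))
        for perm in permutations(range(n))
    )


def enumerate_class(n: int, predicate: Predicate) -> Iterator[EdgeSet]:
    """Yield one representative per isomorphism class of n-vertex graphs satisfying
    `predicate`, assumed closed under edge removal and true for the empty graph."""
    all_edges = list(combinations(range(n), 2))
    empty: EdgeSet = frozenset()
    seen = {canonical(empty, n)}
    stack = [empty]
    while stack:
        g = stack.pop()
        yield g
        for e in all_edges:
            if e in g:
                continue
            h = g | {e}
            if not predicate(h, n):
                continue  # safe to prune: every supergraph of h also fails
            c = canonical(h, n)
            if c not in seen:
                seen.add(c)
                stack.append(h)


def is_forest(edges: EdgeSet, n: int) -> bool:
    """Acyclicity test with union-find; forests are closed under edge removal."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True


print(sum(1 for _ in enumerate_class(4, is_forest)))  # 6
print(sum(1 for _ in enumerate_class(4, lambda e, n: True)))  # 11
```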
Efficient Search of Combinatorial Maps using Signatures
, 2010
Cited by 3 (2 self)
Abstract
In this paper, we address the problem of computing canonical representations of n-dimensional combinatorial maps and of using them to search efficiently for a map in a database. We define two combinatorial map signatures: the first has quadratic space complexity and can be used to decide isomorphism with a new map in linear time, whereas the second has linear space complexity and can be used to decide isomorphism in quadratic time. We show that these signatures can be used to search efficiently for a map in a database.