Shortest-Path Kernels on Graphs
In Proceedings of the 2005 International Conference on Data Mining, 2005
Cited by 61 (5 self)
Data mining algorithms are facing the challenge to deal with an increasing number of complex objects. For graph data, a whole toolbox of data mining algorithms becomes available by defining a kernel function on instances of graphs. Graph kernels based on walks, subtrees and cycles in graphs have been proposed so far. As a general problem, these kernels are either computationally expensive or limited in their expressiveness. We try to overcome this problem by defining expressive graph kernels which are based on paths. As the computation of all paths and longest paths in a graph is NP-hard, we propose graph kernels based on shortest paths. These kernels are computable in polynomial time, retain expressivity and are still positive definite. In experiments on classification of graph models of proteins, our shortest-path kernels show significantly higher classification accuracy than walk-based kernels.
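The idea of a kernel built on shortest paths can be illustrated with a minimal sketch. It assumes unlabeled, undirected, unweighted graphs given as (node count, edge list), and it compares two paths with a simple "same length" test; the function names and that choice of comparison are assumptions made for illustration, not the authors' exact definition.

```python
from collections import Counter

INF = float("inf")

def floyd_warshall(n, edges):
    """All-pairs shortest path lengths for an undirected, unweighted graph."""
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v in edges:
        dist[u][v] = dist[v][u] = 1
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

def sp_histogram(n, edges):
    """Histogram of finite shortest-path lengths over unordered node pairs."""
    dist = floyd_warshall(n, edges)
    return Counter(
        dist[i][j]
        for i in range(n)
        for j in range(i + 1, n)
        if dist[i][j] < INF
    )

def shortest_path_kernel(g1, g2):
    """Count pairs of shortest paths (one per graph) that have equal length.

    This is a dot product of the two length histograms, so the resulting
    kernel is positive semidefinite. The equal-length comparison is an
    illustrative assumption.
    """
    h1, h2 = sp_histogram(*g1), sp_histogram(*g2)
    return sum(count * h2[length] for length, count in h1.items())

# Two toy graphs: a path on three nodes and a triangle.
path3 = (3, [(0, 1), (1, 2)])
triangle = (3, [(0, 1), (1, 2), (0, 2)])
```

Floyd-Warshall runs in cubic time in the number of nodes, which is consistent with the abstract's claim that such kernels are computable in polynomial time.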
Graph Kernels for Chemical Informatics
2005
Cited by 58 (7 self)
Increased availability of large repositories of chemical compounds is creating new challenges and opportunities for the application of machine learning methods to problems in computational chemistry and chemical informatics. Because chemical compounds are often represented by the graph of their covalent bonds, machine learning methods in this domain must be capable of processing graphical structures with variable size. Here we first briefly review the literature on graph kernels and then introduce three new kernels (Tanimoto, MinMax, Hybrid) based on the idea of molecular fingerprints and counting labeled paths of depth up to d using depth-first search from each possible vertex. The kernels are applied to three classification problems to predict mutagenicity, toxicity, and anti-cancer activity on three publicly available data sets. The kernels achieve performances at least comparable, and most often superior, to those previously reported in the literature, reaching accuracies of 91.5% on the Mutag dataset, 65-67% on the PTC (Predictive Toxicology Challenge) dataset, and 72% on the NCI (National Cancer Institute) dataset. Properties and tradeoffs of these kernels, as well as other proposed kernels that leverage 1D or 3D representations of molecules, are briefly discussed.
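A rough sketch of the path-counting idea follows. It assumes molecules are given as a list of atom labels plus an adjacency list, ignores bond labels, and restricts the depth-first search to paths that never revisit an atom; these simplifications, the function names, and the omission of the Hybrid kernel are all assumptions for illustration. The Tanimoto form used here is the usual set-overlap ratio on path presence, and MinMax is the ratio of summed minimum to summed maximum path counts.

```python
from collections import Counter

def path_counts(labels, adj, d):
    """Count labeled paths with at most d edges, by DFS from every vertex."""
    counts = Counter()

    def dfs(v, visited, path):
        counts[tuple(path)] += 1
        if len(path) - 1 == d:  # path already has d edges
            return
        for w in adj[v]:
            if w not in visited:
                dfs(w, visited | {w}, path + [labels[w]])

    for v in range(len(labels)):
        dfs(v, {v}, [labels[v]])
    return counts

def tanimoto(c1, c2):
    """Shared distinct paths divided by all distinct paths."""
    a, b = set(c1), set(c2)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def minmax(c1, c2):
    """Sum of per-path minimum counts divided by sum of maximum counts."""
    keys = set(c1) | set(c2)
    denom = sum(max(c1[k], c2[k]) for k in keys)
    return sum(min(c1[k], c2[k]) for k in keys) / denom if denom else 0.0

# Toy "molecules": a C-O pair and a C-C pair, paths of depth up to 1.
co = path_counts(["C", "O"], [[1], [0]], d=1)
cc = path_counts(["C", "C"], [[1], [0]], d=1)
```

Tanimoto only looks at whether a path occurs, while MinMax also uses how often it occurs, which is why the two scores differ on the toy pair above.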
Characterizing structural relationships in scenes using graph kernels
In ACM TOG, 2011
Cited by 51 (5 self)
Modeling virtual environments is a time-consuming and expensive task that is becoming increasingly popular for both professional and casual artists. The model density and complexity of the scenes representing these virtual environments is rising rapidly. This trend suggests that data-mining a 3D scene corpus to facilitate collaborative content creation could be a very powerful tool enabling more efficient scene design. In this paper, we show how to represent scenes as graphs that encode models and their semantic relationships. We then define a kernel between these relationship graphs that compares common virtual substructures in two graphs and captures the similarity between their corresponding scenes. We apply this framework to several scene modeling problems, such as finding similar scenes, relevance feedback, and context-based model search. We show that incorporating structural relationships allows our method to provide a more relevant set of results when compared against previous approaches to model context search.
Image classification with segmentation graph kernels
In Proc. CVPR, 2007
Cited by 47 (12 self)
We propose a family of kernels between images, defined as kernels between their respective segmentation graphs. The kernels are based on soft matching of subtree-patterns of the respective graphs, leveraging the natural structure of images while remaining robust to the associated segmentation process uncertainty. Indeed, output from morphological segmentation is often represented by a labelled graph, each vertex corresponding to a segmented region, with edges joining neighboring regions. However, such image representations have mostly remained underused for learning tasks, partly because of the observed instability of the segmentation process and the inherent hardness of inexact graph matching with uncertain graphs. Our kernels count common virtual substructures amongst images, which enables efficient supervised classification of natural images with a support vector machine. Moreover, the kernel machinery allows us to take advantage of recent advances in kernel-based learning: i) semi-supervised learning reduces the required number of labelled images, while ii) multiple kernel learning algorithms efficiently select the most relevant similarity measures between images within our family.
Graph kernels for molecular structure-activity relationship analysis with support vector machines
Journal of Chemical Information and Modeling
Cited by 42 (5 self)
The support vector machine algorithm together with graph kernel functions has recently been introduced to model structure-activity relationships (SAR) of molecules from their 2D structure, without the need for explicit molecular descriptor computation. We propose two extensions to this approach with the double goal to reduce the computational burden associated with the model and to enhance its predictive accuracy: description of the molecules by a Morgan index process and definition of a second-order Markov model for random walks on 2D structures. Experiments on two mutagenicity data sets validate the proposed extensions, making this approach a possible complementary alternative to other modeling strategies.
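The Morgan index process mentioned above can be sketched as follows, assuming the commonly used formulation in which every atom starts with index 1 and each iteration replaces an atom's index with the sum of its neighbors' indices. The adjacency-list input and the function name are assumptions for illustration; how the indices are then attached to atom labels in the kernel is not shown.

```python
def morgan_indices(adj, iterations):
    """Iterated Morgan indices for a molecular graph given as adjacency lists.

    Assumed formulation: start at 1 for every atom, then repeatedly set each
    atom's index to the sum of its neighbors' current indices.
    """
    index = [1] * len(adj)
    for _ in range(iterations):
        index = [sum(index[w] for w in adj[v]) for v in range(len(adj))]
    return index

# Toy chain of three atoms: 0 - 1 - 2
chain = [[1], [0, 2], [1]]
```

After one iteration the index equals the atom's degree, so atoms with the same label but different neighborhoods receive different indices, which makes fewer pairs of walks match when the indices are used as extra labels.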
Fast subtree kernels on graphs
Cited by 28 (2 self)
In this article, we propose fast subtree kernels on graphs. On graphs with n nodes and m edges and maximum degree d, these kernels comparing subtrees of height h can be computed in O(mh), whereas the classic subtree kernel by Ramon & Gärtner scales as O(n^2 4^d h). Key to this efficiency is the observation that the Weisfeiler-Lehman test of isomorphism from graph theory elegantly computes a subtree kernel as a byproduct. Our fast subtree kernels can deal with labeled graphs, scale up easily to large graphs and outperform state-of-the-art graph kernels on several classification benchmark datasets in terms of accuracy and runtime.
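The connection between Weisfeiler-Lehman refinement and a subtree kernel can be illustrated with a small sketch. It assumes graphs are given as a list of node labels plus adjacency lists, and it keeps refined labels as nested tuples rather than compressing them to short identifiers, so it does not reach the O(mh) bound quoted above. The function names are hypothetical.

```python
from collections import Counter

def wl_features(labels, adj, h):
    """Counts of node labels seen over h rounds of Weisfeiler-Lehman refinement."""
    current = list(labels)
    feats = Counter(current)  # round 0: the original labels
    for _ in range(h):
        # New label = own label plus the sorted multiset of neighbor labels.
        current = [
            (current[v], tuple(sorted(current[w] for w in adj[v])))
            for v in range(len(current))
        ]
        feats.update(current)
    return feats

def wl_subtree_kernel(g1, g2, h):
    """Dot product of the two label-count vectors."""
    f1, f2 = wl_features(*g1, h), wl_features(*g2, h)
    return sum(count * f2[label] for label, count in f1.items())

# Toy graphs with uniform labels: a path on three nodes and a triangle.
path3 = (["A", "A", "A"], [[1], [0, 2], [1]])
triangle = (["A", "A", "A"], [[1, 2], [0, 2], [0, 1]])
```

Each refined label encodes a rooted subtree pattern of the corresponding height, so matching labels across two graphs amounts to counting shared subtree patterns. A practical implementation would map each refined label to a compact integer shared across all graphs to keep labels short.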
Direct mining of discriminative and essential frequent patterns via model-based search tree
In KDD, 2008
Cited by 26 (7 self)
Frequent patterns provide solutions to datasets that do not have well-structured feature vectors. However, frequent pattern mining is non-trivial since the number of unique patterns is exponential but many are non-discriminative and correlated. Currently, frequent pattern mining is performed in two sequential steps: enumerating a set of frequent patterns, followed by feature selection. Although many methods have been proposed in the past few years on how to perform each separate step efficiently, there is still limited success in eventually finding highly compact and discriminative patterns. The culprit is the inherent nature of this widely adopted two-step approach. This paper discusses these problems and proposes a new and different method. It builds a decision tree that partitions the data onto different
Near-optimal supervised feature selection among frequent subgraphs
In SIAM Int'l Conf. on Data Mining, 2009
Cited by 24 (10 self)
Graph classification is an increasingly important step in numerous application domains, such as function prediction of molecules and proteins, computerised scene analysis, and anomaly detection in program flows. Among the various approaches proposed in the literature, graph classification based on frequent subgraphs is a popular branch: Graphs are represented as (usually binary) vectors, with components indicating whether a graph contains a particular subgraph that is frequent across the dataset. On large graphs, however, one faces the enormous problem that the number of these frequent subgraphs may grow exponentially with the size of the graphs, but only few of them possess enough discriminative power to make them
Weighted decomposition kernels
In: ICML '05: Proceedings of the 22nd International Conference on Machine Learning, 2005
Cited by 23 (5 self)
We introduce a family of kernels on discrete data structures within the general class of decomposition kernels. A weighted decomposition kernel (WDK) is computed by dividing objects into substructures indexed by a selector. Two substructures are then matched if their selectors satisfy an equality predicate, while the importance of the match is determined by a probability kernel on local distributions fitted on the substructures. Under reasonable assumptions, a WDK can be computed efficiently and can avoid combinatorial explosion of the feature space. We report experimental evidence that the proposed kernel is highly competitive with respect to more complex state-of-the-art methods on a set of problems in bioinformatics.
Graph kernels for disease outcome prediction from protein-protein interaction networks
Proceedings of the Pacific Symposium on Biocomputing 2007, Maui, Hawaii, 2007
Cited by 19 (2 self)
It is widely believed that comparing discrepancies in the protein-protein interaction (PPI) networks of individuals will become an important tool in understanding and preventing diseases. Currently PPI networks for individuals are not available, but gene expression data is becoming easier to obtain and allows us to represent individuals by a co-integrated gene expression/protein interaction network. Two major problems hamper the application of graph kernels – state-of-the-art methods for whole-graph comparison – to compare PPI networks. First, these methods do not scale to graphs of the size of a PPI network. Second, missing edges in these interaction networks are biologically relevant for detecting discrepancies, yet these methods do not take this into account. In this article we present graph kernels for biological network comparison that are fast to compute and take into account missing interactions. We evaluate their practical performance on two datasets of co-integrated gene expression/PPI networks.