## Multi-Instance Kernels (2002)

Venue: Proc. 19th International Conference on Machine Learning (ICML 2002)

Citations: 112 (3 self)

### BibTeX

```bibtex
@inproceedings{Gaertner02MultiInstanceKernels,
  author    = {Thomas G{\"a}rtner and Peter A. Flach and Adam Kowalczyk and Alex J. Smola},
  title     = {Multi-Instance Kernels},
  booktitle = {Proceedings of the 19th International Conference on Machine Learning},
  year      = {2002},
  pages     = {179--186},
  publisher = {Morgan Kaufmann}
}
```

### Abstract

Learning from structured data is becoming increasingly important. However, most prior work on kernel methods has focused on learning from attribute-value data; only recently has research begun to investigate kernels for structured data. This paper considers kernels for multi-instance problems -- a class of concepts on individuals represented by sets. The main result is a kernel on multi-instance data that can be shown to separate positive and negative sets under natural assumptions. This kernel compares favorably with state-of-the-art multi-instance learning algorithms in an empirical study. Finally, we give some concluding remarks and propose future work that might further improve the results.

### Citations

2036 | Online learning with kernels - Kivinen, Smola, et al. - 2001

Citation Context: ...we give some concluding remarks and propose future work that might further improve the results. 1. Introduction Support vector machines (SVM) and other kernel methods (Boser et al., 1992; Schölkopf & Smola, 2002) have successfully been applied to various tasks in attribute-value learning. Most ‘real-world’ data, however, has no natural representation as a tuple of constants. Defining kernels on individuals t...

1295 | A training algorithm for optimal margin classifiers - Boser, Guyon, et al. - 1992

322 | New support vector algorithms - Schölkopf, Smola, et al. - 2000

Citation Context: ...mean and unit variance on a per-coordinate basis. In the ray-kernel case, we simply used a polynomial kernel on s(X). In order to avoid adjusting too many parameters, we chose the ν-parameterization (Schölkopf et al., 2000), with ν set to 0.075. The latter corresponds to an error level comparable to the ones in the published literature. Normalization proved to be critical: the un-normalized sets, and the sets normalize...
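The ν-parameterization mentioned in this excerpt replaces the penalty parameter C with a bound ν on the fraction of margin errors. A minimal sketch of how a precomputed set-kernel Gram matrix could be plugged into a ν-SVM, assuming scikit-learn's `NuSVC`; the helper `k_mi_gram`, the toy bags, and γ = 1 are illustrative choices, not values from the paper:

```python
import numpy as np
from sklearn.svm import NuSVC

def k_mi_gram(bags_a, bags_b, gamma=1.0):
    """Gram matrix of a simple (unnormalized) set kernel between bags:
    sum of a Gaussian RBF instance kernel over all cross pairs."""
    G = np.zeros((len(bags_a), len(bags_b)))
    for i, X in enumerate(bags_a):
        for j, Y in enumerate(bags_b):
            G[i, j] = sum(
                np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(y)) ** 2))
                for x in X for y in Y
            )
    return G

# Toy bags (made up): positive bags cluster near 0, negative bags near 5.
bags = [[[0.0], [0.2]], [[0.1]], [[5.0], [5.1]], [[4.9]]]
labels = [1, 1, -1, -1]

clf = NuSVC(nu=0.075, kernel="precomputed")
clf.fit(k_mi_gram(bags, bags), labels)           # square train x train Gram
pred = clf.predict(k_mi_gram([[[0.05]], [[5.05]]], bags))  # test x train Gram
```

With `kernel="precomputed"`, `fit` expects the square train-by-train Gram matrix and `predict` the rectangular test-by-train matrix, which is what allows a bag-level kernel to drive an off-the-shelf SVM.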

183 | Solving the multiple instance problem with axis-parallel rectangles - Dietterich, Lathrop, et al. - 1997

178 | A framework for multiple-instance learning - Maron, Lozano-Perez - 1998

152 | Theories for mutagenicity: a study in first-order and feature-based induction - Srinivasan, Muggleton, et al. - 1996

Citation Context: ...element in the bag into account would significantly improve generalization. We consider now another drug activity prediction problem - predicting the mutagenicity of molecules. The original dataset (Srinivasan et al., 1996), described by a set of Prolog predicates, has widely been used in the inductive logic programming community. The only time this dataset has been used with MI learning algorithms is described in (Che...

126 | Top-down Induction of First Order Logical Decision Trees - Blockeel - 1998

100 | Learning from data - Cherkassky, Mulier - 1998

78 | Learning kernel classifiers: theory and algorithms - Herbrich - 2002

Citation Context: ...to feature-space normalization, sample normalization, variance rescaling, etc. Feature-space normalization: We simply set f_norm(X) := √(k_set(X, X)). (4) Thus we recover the normalization proposed by (Herbrich, 2002), which proves to be very useful in MI learning. Averaging: In this case we need to compute the generalized cardinality of the set X, which can be achieved via f_norm(X) := Σ_{x∈X} X(x). (5) For simple ...
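The feature-space normalizer in Eq. (4) of this excerpt can be made concrete: dividing the set kernel by √(k_set(X, X)) · √(k_set(Y, Y)) forces every bag to have unit self-similarity. A small sketch, assuming a Gaussian RBF instance kernel and bags given as lists of feature vectors; the function names are mine, not the paper's:

```python
import numpy as np

def k_set(X, Y, gamma=1.0):
    """Unnormalized set kernel: sum of a Gaussian RBF instance kernel
    over all cross pairs of instances from the two bags."""
    total = 0.0
    for x in X:
        for y in Y:
            d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
            total += np.exp(-gamma * np.dot(d, d))
    return total

def f_norm(X, gamma=1.0):
    """Feature-space normalizer of Eq. (4): f_norm(X) = sqrt(k_set(X, X))."""
    return np.sqrt(k_set(X, X, gamma))

def k_set_normalized(X, Y, gamma=1.0):
    """Set kernel divided by the feature-space norms of both bags,
    so that k(X, X) = 1 for every bag."""
    return k_set(X, Y, gamma) / (f_norm(X, gamma) * f_norm(Y, gamma))
```

The averaging normalizer of Eq. (5) would instead divide by a generalized cardinality of each bag; for crisp sets that cardinality is just |X|, so k_set(X, Y) / (|X| · |Y|) averages the instance kernel over all pairs.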

70 | Solving the multiple-instance problem: A lazy learning approach - Wang, Zucker - 2000

55 | On Learning from Multi-Instance Examples: Empirical Evaluation of a Theoretical Approach - Auer - 1997

Citation Context: ...hms that have specifically been designed for MI problems are: the axis-parallel rectangles (APR) algorithm and variants (Dietterich et al., 1997), an algorithm based on simple statistics of the bags (Auer, 1997), and algorithms based on the diverse density approach (Maron & Lozano-Pérez, 1998; Zhang & Goldman, 2002). Algorithms that have been upgraded until now are: the lazy learning algorithms Bayesian-kNN...

46 | Integrated segmentation and recognition of hand-printed numerals - Keeler, Rumelhart, et al. - 1990

46 | Transformation-based learning using multirelational aggregation - Krogel, Wrobel - 2001

42 | Multiple Instance Regression - Ray, Page - 2001

33 | Learning single and multiple instance decision tree for computer security applications - Ruffo - 2000

Citation Context: ...t have been upgraded until now are: the lazy learning algorithms Bayesian-kNN and Citation-kNN (Wang & Zucker, 2000), the neural network MINN (Ramon & De Raedt, 2000), the decision tree learner RELIC (Ruffo, 2001), and the rule learner (Naive)RipperMI (Chevaleyre & Zucker, 2001). Inductive logic programming algorithms have also been used, for instance, the first-order decision tree learner TILDE (Blockeel & D...

28 | A framework for learning rules from multiple instance data - Chevaleyre, Zucker - 2001

24 | Multiple-Instance Learning of real-valued Data - Dooly, Zhang, et al. - 2002

Citation Context: ...ich other kernel methods can now be extended to MI problems. For example, by simply plugging our kernel into SVM regression, the only very recently formulated problem of MI regression can be tackled (Amar et al., 2001; Ray & Page, 2001). Clustering and feature extraction tasks have to the best of our knowledge not yet been investigated for MI data. Support vector clustering and kernel principal component analysis ...

13 | Convolution kernels on discrete structures (Tech. Rep.) - Haussler - 1999

Citation Context: ...max_i |X_i| ≤ |X| = 4), and p = 1 > 0. It follows that: k_MI(X, Y) = Σ_{x∈X, y∈Y} k_δ(x, y) = |X ∩ Y| = k_set(X, Y). It follows directly from the lemma above and from the definition of convolution kernels (Haussler, 1999) that (for finite example sets) MI concepts can be separated with convolved Gaussian RBF kernels if the underlying concept can be separated with Gaussian RBF kernels, since in this case k_I^p is a Ga...
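The identity in this excerpt -- the MI kernel with the matching kernel k_δ reduces to the intersection size |X ∩ Y| -- is easy to check numerically. A sketch, assuming hashable instances and p = 1; the function names are mine:

```python
def k_mi(X, Y, k_inst, p=1):
    """MI kernel: sum over all cross pairs of the two bags of the
    p-th power of the instance kernel k_inst."""
    return sum(k_inst(x, y) ** p for x in X for y in Y)

def k_delta(x, y):
    """Matching kernel k_delta: 1 if the instances are identical, else 0."""
    return 1.0 if x == y else 0.0
```

For bags that are sets (no repeated instances), `k_mi(X, Y, k_delta)` counts the common elements, i.e. equals |X ∩ Y|; with a Gaussian RBF instance kernel and p ≥ 1, the power k_I^p is again a Gaussian RBF kernel with a rescaled width, which is the separation argument the context sketches.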

3 | Kernel-Based Feature Space Transformation in Inductive Logic Programming. MSc Dissertation - Gärtner - 2000

1 | Multi instance neural networks. In: Attribute-Value and Relational Learning: Crossing the Boundaries - Ramon - 2000