Results 1-10 of 153
Support vector machines for multiple-instance learning
Advances in Neural Information Processing Systems 15, 2003
Cited by 309 (2 self)
This paper presents two new formulations of multiple-instance learning as a maximum margin problem. The proposed extensions of the Support Vector Machine (SVM) learning approach lead to mixed integer quadratic programs that can be solved heuristically. Our generalization of SVMs makes a state-of-the-art classification technique, including nonlinear classification via kernels, available to an area that up to now has been largely dominated by special purpose methods. We present experimental results on a pharmaceutical data set and on applications in automated image indexing and document categorization.
A Survey of Kernels for Structured Data
2003
Cited by 146 (2 self)
Kernel methods in general and support vector machines in particular have been successful in various learning tasks on data represented in a single table. Much ‘real-world’ data, however, is structured – it has no natural representation in a single table. Usually, to apply kernel methods to ‘real-world’ data, extensive preprocessing is performed to embed the data into a real vector space and thus in a single table. This survey describes several approaches of defining positive definite kernels on structured instances directly.
A Hilbert space embedding for distributions
In Algorithmic Learning Theory: 18th International Conference, 2007
Cited by 110 (45 self)
We describe a technique for comparing distributions without the need for density estimation as an intermediate step. Our approach relies on mapping the distributions into a reproducing kernel Hilbert space. Applications of this technique can be found in two-sample tests, which are used for determining whether two sets of observations arise from the same distribution, covariate shift correction, local learning, measures of independence, and density estimation. Kernel methods are widely used in supervised learning [1, 2, 3, 4], however they are much less established in the areas of testing, estimation, and analysis of probability distributions, where information theoretic approaches [5, 6] have long been dominant. Recent examples include [7] in the context of construction of graphical models, [8] in the context of feature extraction, and [9] in the context of independent component analysis. These methods have by and large a common issue: to compute quantities such as the mutual information, entropy, or Kullback-Leibler divergence, we require sophisticated space partitioning and/or
Learning to extract relations from the web using minimal supervision
In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL’07), 2007
Cited by 79 (2 self)
We present a new approach to relation extraction that requires only a handful of training examples. Given a few pairs of named entities known to exhibit or not exhibit a particular relation, bags of sentences containing the pairs are extracted from the web. We extend an existing relation extraction method to handle this weaker form of supervision, and present experimental results demonstrating that our approach can reliably extract relations from web documents.
A kernel method for the two sample problem
Advances in Neural Information Processing Systems 19, 2007
Cited by 72 (19 self)
We propose a framework for analyzing and comparing distributions, allowing us to design statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS). We present two tests based on large deviation bounds for the test statistic, while a third is based on the asymptotic distribution of this statistic. The test statistic can be computed in quadratic time, although efficient linear time approximations are available. Several classical metrics on distributions are recovered when the function space used to compute the difference in expectations is allowed to be more general (e.g., a Banach space). We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
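The quadratic-time statistic mentioned in the abstract above can be illustrated with a small sketch. When the function class is the unit ball of an RKHS, the squared statistic has a closed form in terms of kernel evaluations, and a biased empirical estimate averages the kernel over all pairs of points. The Gaussian kernel, the bandwidth `sigma`, and the function names below are assumptions made for illustration; this is not the authors' code, and it computes only the statistic, not the test thresholds the paper derives.

```python
import math

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel between two equal-length numeric sequences.

    The kernel choice and the bandwidth sigma are illustrative assumptions.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mmd_squared_biased(xs, ys, kernel=rbf_kernel):
    """Biased estimate of the squared maximum mean discrepancy.

    Averages the kernel over all pairs within xs, within ys, and across
    the two samples, so the cost is quadratic in the sample sizes.
    """
    m, n = len(xs), len(ys)
    k_xx = sum(kernel(a, b) for a in xs for b in xs) / (m * m)
    k_yy = sum(kernel(a, b) for a in ys for b in ys) / (n * n)
    k_xy = sum(kernel(a, b) for a in xs for b in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy
```

A value near zero suggests the two samples look alike under the chosen kernel; deciding whether a given value is significant requires a threshold, which this sketch does not compute.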
Kernels and Distances for Structured Data
Machine Learning, 2004
Cited by 65 (3 self)
This paper brings together two strands of machine learning of increasing importance: kernel methods and highly structured data. We propose a general method for constructing a kernel following the syntactic structure of the data, as defined by its type signature in a higher-order logic. Our main theoretical result is the positive definiteness of any kernel thus defined. We report encouraging experimental results on a range of real-world datasets. By converting our kernel to a distance pseudometric for 1-nearest neighbour, we were able to improve the best accuracy from the literature on the Diterpene dataset by more than 10%.
Supervised versus multiple instance learning: An empirical comparison
Proceedings of 22nd International Conference on Machine Learning (ICML-2005), 2005
Cited by 64 (3 self)
We empirically study the relationship between supervised and multiple instance (MI) learning. Algorithms to learn various concepts have been adapted to the MI representation. However, it is also known that concepts that are PAC-learnable with one-sided noise can be learned from MI data. A relevant question then is: how well do supervised learners do on MI data? We attempt to answer this question by looking at a cross section of MI data sets from various domains coupled with a number of learning algorithms including Diverse Density, Logistic Regression, nonlinear Support Vector Machines and FOIL. We consider a supervised and an MI version of each learner. Several interesting conclusions emerge from our work: (1) no MI algorithm is superior across all tested domains, (2) some MI algorithms are consistently superior to their supervised counterparts, (3) using high false-positive costs can improve a supervised learner’s performance in MI domains, and (4) in several domains, a supervised algorithm is superior to any MI algorithm we tested.
Kernels for Semi-Structured Data
In Proc. of ICML, 2002
Cited by 61 (7 self)
Semi-structured data such as XML and HTML is attracting considerable attention. It is important to develop various kinds of data mining techniques that can handle semi-structured data. In this paper, we discuss applications of kernel methods for semi-structured data. We model semi-structured data by labeled ordered trees, and present kernels for classifying labeled ordered trees based on their tag structures by generalizing the convolution kernel for parse trees introduced by Collins and Duffy (2001). We give algorithms to efficiently compute the kernels for labeled ordered trees. We also apply our kernels to node marking problems that are special cases of information extraction from trees. Preliminary experiments using artificial data and real HTML documents show encouraging results.
Logistic Regression and Boosting for Labeled Bags of Instances
Proc. of the Pacific-Asia Conf. on Knowledge Discovery and Data Mining, 2004
Cited by 55 (3 self)
In this paper we upgrade linear logistic regression and boosting to multi-instance data, where each example consists of a labeled bag of instances. This is done by connecting predictions for individual instances to a bag-level probability estimate by simple averaging and maximizing the likelihood at the bag level; in other words, by assuming that all instances contribute equally and independently to a bag's label. We present empirical results for artificial data generated according to the underlying generative model that we assume, and also show that the two algorithms produce competitive results on the Musk benchmark datasets.
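As a rough illustration of the averaging idea in the abstract above, the sketch below scores each instance with a linear logistic model, takes the arithmetic mean of the instance probabilities as the bag-level estimate, and evaluates the log-likelihood over labeled bags. Reading "simple averaging" as an arithmetic mean of probabilities is an assumption, as are all function and parameter names; the paper may define the combination differently, and no fitting procedure is shown here.

```python
import math

def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def instance_probability(weights, bias, instance):
    """Probability that a single instance is positive under a linear model."""
    score = bias + sum(w * x for w, x in zip(weights, instance))
    return sigmoid(score)

def bag_probability(weights, bias, bag):
    """Bag-level estimate: arithmetic mean of instance probabilities.

    Assumption: every instance contributes equally to the bag's label.
    """
    probs = [instance_probability(weights, bias, inst) for inst in bag]
    return sum(probs) / len(probs)

def bag_log_likelihood(weights, bias, bags, labels):
    """Log-likelihood of 0/1 bag labels; the quantity one would maximize."""
    total = 0.0
    for bag, label in zip(bags, labels):
        p = bag_probability(weights, bias, bag)
        total += math.log(p) if label == 1 else math.log(1.0 - p)
    return total
```

Any general-purpose optimizer could be applied to `bag_log_likelihood` to fit `weights` and `bias`; that choice is left open here.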
A two-level learning method for generalized multi-instance problems
In Proceedings of the Fourteenth European Conference on Machine Learning, 2003
Cited by 48 (5 self)
In traditional multi-instance (MI) learning, a single positive instance in a bag produces a positive class label. Hence, the learner knows how the bag’s class label depends on the labels of the instances in the bag and can explicitly use this information to solve the learning task. In this paper we investigate a generalized view of the MI problem where this simple assumption no longer holds. We assume that an “interaction” between instances in a bag determines the class label. Our two-level learning method for this type of problem transforms an MI bag into a single meta-instance that can be learned by a standard propositional method. The meta-instance indicates which regions in the instance space are covered by instances of the bag. Results on both artificial and real-world data show that this two-level classification approach is well suited for generalized MI problems.