Results 1-10 of 92
Support vector machines for multiple-instance learning
Advances in Neural Information Processing Systems 15, 2003
Cited by 184 (2 self)

Abstract:
This paper presents two new formulations of multiple-instance learning as a maximum margin problem. The proposed extensions of the Support Vector Machine (SVM) learning approach lead to mixed integer quadratic programs that can be solved heuristically. Our generalization of SVMs makes a state-of-the-art classification technique, including nonlinear classification via kernels, available to an area that up to now has been largely dominated by special purpose methods. We present experimental results on a pharmaceutical data set and on applications in automated image indexing and document categorization.
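The mixed integer programs the abstract mentions are typically attacked by alternating between imputing instance labels and solving a standard SVM. A minimal sketch of the relabeling step under the standard multiple-instance assumption (at least one positive instance per positive bag); the linear scorer and 2-D toy bags are illustrative assumptions, not the paper's actual QP solution:

```python
# Hypothetical sketch of one relabeling step in an mi-SVM style heuristic:
# instances in positive bags are labeled by the sign of the current
# classifier's score, but the top-scoring instance of each positive bag
# is forced positive so the bag-level constraint holds. Instances in
# negative bags are always negative.

def impute_labels(bags, score):
    """bags: list of (instances, bag_label in {-1, +1})."""
    imputed = []
    for instances, bag_label in bags:
        if bag_label == -1:
            # Every instance in a negative bag is negative.
            imputed.append([-1] * len(instances))
        else:
            scores = [score(x) for x in instances]
            labels = [1 if s > 0 else -1 for s in scores]
            labels[scores.index(max(scores))] = 1  # enforce the constraint
            imputed.append(labels)
    return imputed

# Toy linear scorer on 2-D instances (an assumption for illustration).
score = lambda x: x[0] + x[1] - 1.0
bags = [([(3.0, 3.0), (-2.0, -2.0)], 1),   # positive bag
        ([(2.0, 2.0), (-1.0, -1.0)], -1)]  # negative bag
print(impute_labels(bags, score))  # [[1, -1], [-1, -1]]
```

In the full heuristic, an SVM is refit on the imputed labels and the two steps repeat until the labels stabilise.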
A Survey of Kernels for Structured Data
Cited by 113 (3 self)

Abstract:
Kernel methods in general and support vector machines in particular have been successful in various learning tasks on data represented in a single table. Much 'real-world' data, however, is structured: it has no natural representation in a single table. Usually, to apply kernel methods to 'real-world' data, extensive preprocessing is performed to embed the data into a real vector space and thus in a single table. This survey describes several approaches of defining positive definite kernels on structured instances directly.
Learning to extract relations from the web using minimal supervision
In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL’07), 2007
Cited by 55 (2 self)

Abstract:
We present a new approach to relation extraction that requires only a handful of training examples. Given a few pairs of named entities known to exhibit or not exhibit a particular relation, bags of sentences containing the pairs are extracted from the web. We extend an existing relation extraction method to handle this weaker form of supervision, and present experimental results demonstrating that our approach can reliably extract relations from web documents.
A Hilbert space embedding for distributions
In Algorithmic Learning Theory: 18th International Conference, 2007
Cited by 52 (25 self)

Abstract:
We describe a technique for comparing distributions without the need for density estimation as an intermediate step. Our approach relies on mapping the distributions into a reproducing kernel Hilbert space. Applications of this technique can be found in two-sample tests, which are used for determining whether two sets of observations arise from the same distribution, covariate shift correction, local learning, measures of independence, and density estimation. Kernel methods are widely used in supervised learning [1, 2, 3, 4]; however, they are much less established in the areas of testing, estimation, and analysis of probability distributions, where information theoretic approaches [5, 6] have long been dominant. Recent examples include [7] in the context of construction of graphical models, [8] in the context of feature extraction, and [9] in the context of independent component analysis. These methods have by and large a common issue: to compute quantities such as the mutual information, entropy, or Kullback-Leibler divergence, we require sophisticated space partitioning and/or …
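The mean-embedding idea behind this comparison technique can be sketched concretely: each sample is mapped to the empirical mean of its kernel features, and two distributions are compared by the RKHS distance between those means, computable from kernel evaluations alone (no density estimation). The Gaussian RBF kernel, bandwidth, and sample values below are illustrative assumptions:

```python
# Minimal sketch of comparing distributions via kernel mean embeddings.
import math

def rbf(x, y, sigma=1.0):
    # Gaussian RBF kernel on scalars (an illustrative choice).
    return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))

def mean_map_distance_sq(X, Y, k=rbf):
    # ||mu_X - mu_Y||^2 = E k(x,x') - 2 E k(x,y) + E k(y,y'),
    # estimated with empirical averages over the two samples.
    kxx = sum(k(a, b) for a in X for b in X) / (len(X) ** 2)
    kyy = sum(k(a, b) for a in Y for b in Y) / (len(Y) ** 2)
    kxy = sum(k(a, b) for a in X for b in Y) / (len(X) * len(Y))
    return kxx - 2 * kxy + kyy

same  = mean_map_distance_sq([0.0, 0.1, -0.1], [0.05, -0.05, 0.0])
apart = mean_map_distance_sq([0.0, 0.1, -0.1], [5.0, 5.1, 4.9])
# Embeddings of similar samples are close; shifted samples are far apart.
```

For a characteristic kernel such as the Gaussian, this distance is zero exactly when the two distributions coincide, which is what makes the embedding usable for testing.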
Kernels for Semi-Structured Data
In Proc. of ICML, 2002
Cited by 46 (5 self)

Abstract:
Semi-structured data such as XML and HTML is attracting considerable attention. It is important to develop various kinds of data mining techniques that can handle semi-structured data. In this paper, we discuss applications of kernel methods for semi-structured data. We model semi-structured data by labeled ordered trees, and present kernels for classifying labeled ordered trees based on their tag structures by generalizing the convolution kernel for parse trees introduced by Collins and Duffy (2001). We give algorithms to efficiently compute the kernels for labeled ordered trees. We also apply our kernels to node marking problems that are special cases of information extraction from trees. Preliminary experiments using artificial data and real HTML documents show encouraging results.
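The convolution-kernel idea being generalized here can be illustrated with a simplified variant: count the subtree fragments two labeled ordered trees share, via a recursive score over node pairs. The "same label, same number of children" matching rule below is an illustrative simplification of the paper's tag-structure kernels, not its exact definition:

```python
# Simplified convolution tree kernel in the spirit of Collins and Duffy
# (2001), over labeled ordered trees given as (label, children) tuples.

def nodes(t):
    # Yield every node of the tree in preorder.
    yield t
    for c in t[1]:
        yield from nodes(c)

def C(n1, n2):
    # Number of common subtree fragments rooted at n1 and n2.
    if n1[0] != n2[0] or len(n1[1]) != len(n2[1]):
        return 0
    if not n1[1]:           # both are leaves with the same label
        return 1
    prod = 1
    for c1, c2 in zip(n1[1], n2[1]):
        prod *= 1 + C(c1, c2)
    return prod

def tree_kernel(t1, t2):
    # Sum the rooted counts over all pairs of nodes.
    return sum(C(a, b) for a in nodes(t1) for b in nodes(t2))

# Two small HTML-like tag trees; t1 shares fewer fragments with t2
# than with itself, so tree_kernel(t1, t2) < tree_kernel(t1, t1).
t1 = ("html", [("body", [("p", []), ("p", [])])])
t2 = ("html", [("body", [("p", []), ("div", [])])])
```

The naive double loop here is quadratic in the number of node pairs; the paper's contribution includes algorithms that compute such kernels efficiently.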
Supervised versus multiple instance learning: An empirical comparison
Proceedings of the 22nd International Conference on Machine Learning (ICML 2005), 2005
Cited by 45 (2 self)

Abstract:
We empirically study the relationship between supervised and multiple instance (MI) learning. Algorithms to learn various concepts have been adapted to the MI representation. However, it is also known that concepts that are PAC-learnable with one-sided noise can be learned from MI data. A relevant question then is: how well do supervised learners do on MI data? We attempt to answer this question by looking at a cross section of MI data sets from various domains coupled with a number of learning algorithms including Diverse Density, Logistic Regression, nonlinear Support Vector Machines and FOIL. We consider a supervised and MI version of each learner. Several interesting conclusions emerge from our work: (1) no MI algorithm is superior across all tested domains, (2) some MI algorithms are consistently superior to their supervised counterparts, (3) using high false-positive costs can improve a supervised learner’s performance in MI domains, and (4) in several domains, a supervised algorithm is superior to any MI algorithm we tested.
A kernel method for the two-sample problem
Advances in Neural Information Processing Systems 19, 2007
Cited by 38 (13 self)

Abstract:
We propose a framework for analyzing and comparing distributions, allowing us to design statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS). We present two tests based on large deviation bounds for the test statistic, while a third is based on the asymptotic distribution of this statistic. The test statistic can be computed in quadratic time, although efficient linear time approximations are available. Several classical metrics on distributions are recovered when the function space used to compute the difference in expectations is allowed to be more general (e.g. a Banach space). We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
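The quadratic-time statistic the abstract refers to is the squared maximum mean discrepancy (MMD); a hedged sketch of its standard unbiased estimator under a Gaussian RBF kernel follows. Data and bandwidth are illustrative, and the paper's actual tests add acceptance thresholds from large-deviation bounds or the asymptotic null distribution, which are not shown here:

```python
# Unbiased quadratic-time MMD^2 estimator: within-sample sums use only
# off-diagonal kernel terms, so the estimate can dip slightly below
# zero when the two samples come from the same distribution.
import math

def rbf(x, y, sigma=1.0):
    return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))

def mmd2_unbiased(X, Y, k=rbf):
    m, n = len(X), len(Y)
    xx = sum(k(X[i], X[j]) for i in range(m) for j in range(m) if i != j)
    yy = sum(k(Y[i], Y[j]) for i in range(n) for j in range(n) if i != j)
    xy = sum(k(x, y) for x in X for y in Y)
    return xx / (m * (m - 1)) + yy / (n * (n - 1)) - 2 * xy / (m * n)

X = [0.0, 0.2, -0.1, 0.1]
Y = [3.0, 3.2, 2.9, 3.1]       # clearly shifted sample
Z = [0.05, -0.05, 0.15, 0.0]   # roughly the same distribution as X
# mmd2_unbiased(X, Y) comes out much larger than mmd2_unbiased(X, Z).
```

A test then rejects the null hypothesis (same distribution) when the statistic exceeds a threshold calibrated to the desired significance level.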
Logistic Regression and Boosting for Labeled Bags of Instances
Proc. of the Pacific-Asia Conf. on Knowledge Discovery and Data Mining, 2004
Cited by 34 (2 self)

Abstract:
In this paper we upgrade linear logistic regression and boosting to multi-instance data, where each example consists of a labeled bag of instances. This is done by connecting predictions for individual instances to a bag-level probability estimate by simple averaging and maximizing the likelihood at the bag level; in other words, by assuming that all instances contribute equally and independently to a bag's label. We present empirical results for artificial data generated according to the underlying generative model that we assume, and also show that the two algorithms produce competitive results on the Musk benchmark datasets.
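The bag-level probability model described above can be sketched in a few lines: the bag's positive probability is the average of per-instance logistic outputs, and the negative log-likelihood is taken at the bag level. The scalar instances, weights, and bag data below are illustrative assumptions, and no optimizer is shown:

```python
# Minimal sketch of bag-level logistic regression for labeled bags.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bag_prob(w, b, instances):
    # Pr(bag is positive) = mean of the instance-level probabilities,
    # i.e. all instances contribute equally and independently.
    return sum(sigmoid(w * x + b) for x in instances) / len(instances)

def neg_log_likelihood(w, b, bags):
    # bags: list of (instances, label in {0, 1}); likelihood is
    # evaluated at the bag level, not per instance.
    nll = 0.0
    for instances, y in bags:
        p = bag_prob(w, b, instances)
        nll -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return nll

bags = [([2.0, 3.0], 1), ([-2.0, -3.0], 0)]
# A weight pointing the right way scores a lower NLL than its opposite.
good = neg_log_likelihood(1.0, 0.0, bags)
bad  = neg_log_likelihood(-1.0, 0.0, bags)
```

Training then amounts to minimizing this bag-level NLL over (w, b) with any standard gradient-based method.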
A two-level learning method for generalized multi-instance problems
In Proceedings of the Fourteenth European Conference on Machine Learning, 2003
Cited by 34 (4 self)

Abstract:
In traditional multi-instance (MI) learning, a single positive instance in a bag produces a positive class label. Hence, the learner knows how the bag’s class label depends on the labels of the instances in the bag and can explicitly use this information to solve the learning task. In this paper we investigate a generalized view of the MI problem where this simple assumption no longer holds. We assume that an “interaction” between instances in a bag determines the class label. Our two-level learning method for this type of problem transforms an MI bag into a single meta-instance that can be learned by a standard propositional method. The meta-instance indicates which regions in the instance space are covered by instances of the bag. Results on both artificial and real-world data show that this two-level classification approach is well suited for generalized MI problems.
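The bag-to-meta-instance transformation can be sketched directly: each bag becomes one binary vector recording which regions of instance space contain at least one of its instances, after which any standard propositional learner applies. The fixed 1-D grid below is an illustrative stand-in for the paper's regions:

```python
# Hedged sketch of the two-level idea: turn a bag of instances into a
# single meta-instance of region-coverage indicators.

def meta_instance(bag, edges):
    """bag: list of 1-D instances; edges: sorted region boundaries."""
    regions = list(zip(edges[:-1], edges[1:]))
    # 1 if any instance of the bag falls in the region, else 0.
    return [int(any(lo <= x < hi for x in bag)) for lo, hi in regions]

edges = [0.0, 1.0, 2.0, 3.0]           # three regions: [0,1), [1,2), [2,3)
bag_a = [0.2, 0.4, 2.5]                 # covers regions 0 and 2
bag_b = [1.1, 1.9]                      # covers region 1 only
print(meta_instance(bag_a, edges))      # [1, 0, 1]
print(meta_instance(bag_b, edges))      # [0, 1, 0]
```

A propositional classifier trained on these vectors can then express interactions between regions (e.g. "positive iff regions 0 and 2 are both covered"), which the traditional single-positive-instance assumption cannot.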
Multiple Instance Learning for Sparse Positive Bags
In ICML, 2007
Cited by 29 (1 self)

Abstract:
We present a new approach to multiple instance learning (MIL) that is particularly effective when the positive bags are sparse (i.e. contain few positive instances). Unlike other SVM-based MIL methods, our approach more directly enforces the desired constraint that at least one of the instances in a positive bag is positive. Using both artificial and real-world data, we experimentally demonstrate that our approach achieves greater accuracy than state-of-the-art MIL methods when positive bags are sparse, and performs competitively when they are not. In particular, our approach is the best performing method for image region classification.