Support vector machines for multiple-instance learning
Advances in Neural Information Processing Systems 15, 2003
Cited by 124 (2 self)
This paper presents two new formulations of multiple-instance learning as a maximum margin problem. The proposed extensions of the Support Vector Machine (SVM) learning approach lead to mixed integer quadratic programs that can be solved heuristically. Our generalization of SVMs makes a state-of-the-art classification technique, including non-linear classification via kernels, available to an area that up to now has been largely dominated by special purpose methods. We present experimental results on a pharmaceutical data set and on applications in automated image indexing and document categorization.
A Survey of Kernels for Structured Data
Cited by 84 (3 self)
Kernel methods in general and support vector machines in particular have been successful in various learning tasks on data represented in a single table. Much 'real-world' data, however, is structured: it has no natural representation in a single table. Usually, to apply kernel methods to 'real-world' data, extensive pre-processing is performed to embed the data into a real vector space and thus in a single table. This survey describes several approaches of defining positive definite kernels on structured instances directly.
Learning to extract relations from the web using minimal supervision
In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL’07), 2007
Cited by 34 (1 self)
We present a new approach to relation extraction that requires only a handful of training examples. Given a few pairs of named entities known to exhibit or not exhibit a particular relation, bags of sentences containing the pairs are extracted from the web. We extend an existing relation extraction method to handle this weaker form of supervision, and present experimental results demonstrating that our approach can reliably extract relations from web documents.
Supervised versus multiple instance learning: An empirical comparison
Proceedings of the 22nd International Conference on Machine Learning (ICML-2005), 2005
Cited by 33 (2 self)
We empirically study the relationship between supervised and multiple instance (MI) learning. Algorithms to learn various concepts have been adapted to the MI representation. However, it is also known that concepts that are PAC-learnable with one-sided noise can be learned from MI data. A relevant question then is: how well do supervised learners do on MI data? We attempt to answer this question by looking at a cross section of MI data sets from various domains coupled with a number of learning algorithms including Diverse Density, Logistic Regression, nonlinear Support Vector Machines and FOIL. We consider a supervised and MI version of each learner. Several interesting conclusions emerge from our work: (1) no MI algorithm is superior across all tested domains, (2) some MI algorithms are consistently superior to their supervised counterparts, (3) using high false-positive costs can improve a supervised learner’s performance in MI domains, and (4) in several domains, a supervised algorithm is superior to any MI algorithm we tested.
Kernels for Semi-Structured Data
In Proc. of ICML, 2002
Cited by 33 (4 self)
Semi-structured data such as XML and HTML is attracting considerable attention. It is important to develop various kinds of data mining techniques that can handle semi-structured data. In this paper, we discuss applications of kernel methods for semi-structured data. We model semi-structured data by labeled ordered trees, and present kernels for classifying labeled ordered trees based on their tag structures by generalizing the convolution kernel for parse trees introduced by Collins and Duffy (2001). We give algorithms to efficiently compute the kernels for labeled ordered trees. We also apply our kernels to node marking problems that are special cases of information extraction from trees. Preliminary experiments using artificial data and real HTML documents show encouraging results.
A two-level learning method for generalized multi-instance problems
In Proceedings of the Fourteenth European Conference on Machine Learning, 2003
Cited by 28 (3 self)
In traditional multi-instance (MI) learning, a single positive instance in a bag produces a positive class label. Hence, the learner knows how the bag’s class label depends on the labels of the instances in the bag and can explicitly use this information to solve the learning task. In this paper we investigate a generalized view of the MI problem where this simple assumption no longer holds. We assume that an “interaction” between instances in a bag determines the class label. Our two-level learning method for this type of problem transforms an MI bag into a single meta-instance that can be learned by a standard propositional method. The meta-instance indicates which regions in the instance space are covered by instances of the bag. Results on both artificial and real-world data show that this two-level classification approach is well suited for generalized MI problems.
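As a rough illustration of the two ideas in this abstract, the sketch below contrasts the traditional MI labelling rule with a bag-to-meta-instance transformation. It is not the authors' method: the function names, the one-dimensional instance space, and the fixed interval regions are all assumptions made here for illustration, and the abstract does not say how the regions are obtained.

```python
def standard_mi_label(instance_labels):
    """Traditional MI assumption: a bag is positive if any instance is positive."""
    return any(instance_labels)


def meta_instance(bag, regions):
    """Turn a bag into one fixed-length 0/1 vector.

    Entry j is 1 if at least one instance of the bag falls inside region j.
    `regions` is a list of half-open (low, high) intervals on a 1-D
    instance space (hypothetical regions, chosen by hand here).
    """
    return [int(any(low <= x < high for x in bag)) for (low, high) in regions]


# Toy example (hypothetical data)
regions = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
bag = [0.2, 0.7, 2.5]
vector = meta_instance(bag, regions)
```

The resulting vector has the same length for every bag, so it can be passed to any standard single-table learner, which is the point of the second level.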
A Hilbert space embedding for distributions
In Algorithmic Learning Theory: 18th International Conference, 2007
Cited by 27 (15 self)
We describe a technique for comparing distributions without the need for density estimation as an intermediate step. Our approach relies on mapping the distributions into a reproducing kernel Hilbert space. Applications of this technique can be found in two-sample tests, which are used for determining whether two sets of observations arise from the same distribution, covariate shift correction, local learning, measures of independence, and density estimation. Kernel methods are widely used in supervised learning [1, 2, 3, 4]; however, they are much less established in the areas of testing, estimation, and analysis of probability distributions, where information theoretic approaches [5, 6] have long been dominant. Recent examples include [7] in the context of construction of graphical models, [8] in the context of feature extraction, and [9] in the context of independent component analysis. These methods have by and large a common issue: to compute quantities such as the mutual information, entropy, or Kullback-Leibler divergence, we require sophisticated space partitioning and/or
Logistic Regression and Boosting for Labeled Bags of Instances
Proc. of the Pacific-Asia Conf. on Knowledge Discovery and Data Mining, 2004
Cited by 25 (2 self)
In this paper we upgrade linear logistic regression and boosting to multi-instance data, where each example consists of a labeled bag of instances. This is done by connecting predictions for individual instances to a bag-level probability estimate by simple averaging and maximizing the likelihood at the bag level; in other words, by assuming that all instances contribute equally and independently to a bag's label. We present empirical results for artificial data generated according to the underlying generative model that we assume, and also show that the two algorithms produce competitive results on the Musk benchmark datasets.
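The averaging step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the average is taken over instance-level probabilities, and the function names, weights, and toy bags are hypothetical.

```python
import math


def instance_prob(w, b, x):
    """Logistic probability that a single instance x is positive."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def bag_prob(w, b, bag):
    """Bag-level probability: the simple average of instance-level probabilities."""
    return sum(instance_prob(w, b, x) for x in bag) / len(bag)


def bag_log_likelihood(w, b, bags, labels):
    """Log-likelihood evaluated at the bag level (labels are 0 or 1)."""
    total = 0.0
    for bag, y in zip(bags, labels):
        p = bag_prob(w, b, bag)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


# Toy data (hypothetical): one positive bag, one negative bag
bags = [
    [(2.0, 1.0), (1.5, 0.5)],
    [(-1.0, -2.0), (-0.5, -1.5), (-2.0, 0.0)],
]
labels = [1, 0]
w, b = (1.0, 1.0), 0.0
```

Training would then amount to choosing `w` and `b` to maximize `bag_log_likelihood`, for example with a generic numerical optimizer; that step is left out of the sketch.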
A kernel method for the two sample problem
Advances in Neural Information Processing Systems 19, 2007
Cited by 20 (9 self)
We propose a framework for analyzing and comparing distributions, allowing us to design statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS). We present two tests based on large deviation bounds for the test statistic, while a third is based on the asymptotic distribution of this statistic. The test statistic can be computed in quadratic time, although efficient linear time approximations are available. Several classical metrics on distributions are recovered when the function space used to compute the difference in expectations is allowed to be more general (e.g., a Banach space). We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
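For readers unfamiliar with this statistic, the sketch below computes the standard biased, quadratic-time empirical estimate of its square, using the fact that the largest difference in expectations over the RKHS unit ball equals the distance between the two kernel mean embeddings. The Gaussian kernel, its bandwidth `sigma`, the function names, and the toy samples are assumptions made here; the thresholds that turn the statistic into a test are not shown.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two points given as tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mean_kernel(xs, ys, kernel):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    return sum(kernel(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def mmd_squared_biased(xs, ys, kernel=gaussian_kernel):
    """Biased estimate of the squared statistic.

    Needs a kernel evaluation for every pair of points, hence quadratic time.
    """
    return (mean_kernel(xs, xs, kernel)
            + mean_kernel(ys, ys, kernel)
            - 2.0 * mean_kernel(xs, ys, kernel))


# Toy samples (hypothetical)
xs = [(0.0,), (0.1,), (-0.1,)]
ys_near = [(0.05,), (0.15,), (-0.05,)]
ys_far = [(5.0,), (5.1,), (4.9,)]
near = mmd_squared_biased(xs, ys_near)
far = mmd_squared_biased(xs, ys_far)
```

Samples drawn from well-separated distributions give a larger value than samples from nearby ones, which is what a test based on this statistic exploits.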
Multiple Instance Learning for Sparse Positive Bags
In ICML, 2007
Cited by 17 (0 self)
We present a new approach to multiple instance learning (MIL) that is particularly effective when the positive bags are sparse (i.e., contain few positive instances). Unlike other SVM-based MIL methods, our approach more directly enforces the desired constraint that at least one of the instances in a positive bag is positive. Using both artificial and real-world data, we experimentally demonstrate that our approach achieves greater accuracy than state-of-the-art MIL methods when positive bags are sparse, and performs competitively when they are not. In particular, our approach is the best performing method for image region classification.

