Results 1 - 10 of 19
Online Bayes Point Machines
Cited by 55 (2 self)
We present a new and simple algorithm for learning large margin classifiers that works in a truly online manner. The algorithm generates a linear classifier by averaging the weights associated with several perceptron-like algorithms run in parallel in order to approximate the Bayes point. A random subsample of the incoming data stream is used to ensure diversity in the perceptron solutions. We experimentally study the algorithm's performance in online and batch learning settings.
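A minimal sketch of the averaging idea, assuming that each of several perceptrons sees each incoming example with a fixed probability (the hypothetical `sample_prob`) and that the final classifier averages the unit-normalized weight vectors. The class and parameter names are illustrative; this is a plain perceptron ensemble written for exposition, not the authors' exact algorithm.

```python
import math
import random


class OnlineBPMSketch:
    """Hypothetical sketch: several perceptrons trained on random subsamples of a
    stream, then averaged to approximate the Bayes point."""

    def __init__(self, dim, n_perceptrons=10, sample_prob=0.5, seed=0):
        self.rng = random.Random(seed)
        self.sample_prob = sample_prob  # assumed chance a perceptron uses an example
        self.weights = [[0.0] * dim for _ in range(n_perceptrons)]

    @staticmethod
    def _dot(w, x):
        return sum(wi * xi for wi, xi in zip(w, x))

    def update(self, x, y):
        """Process one labelled example (y in {-1, +1}) from the stream."""
        for w in self.weights:
            # Each perceptron independently decides whether to use this example,
            # which keeps the individual solutions diverse.
            if self.rng.random() < self.sample_prob and y * self._dot(w, x) <= 0:
                for i, xi in enumerate(x):
                    w[i] += y * xi  # standard perceptron update on a mistake

    def averaged_weights(self):
        """Average the unit-normalized weight vectors (normalization is an assumption)."""
        dim = len(self.weights[0])
        avg = [0.0] * dim
        for w in self.weights:
            norm = math.sqrt(self._dot(w, w)) or 1.0  # untouched perceptrons add zero
            for i in range(dim):
                avg[i] += w[i] / norm
        return [a / len(self.weights) for a in avg]

    def predict(self, x):
        return 1 if self._dot(self.averaged_weights(), x) >= 0 else -1
```

Since every perceptron that separates the data has a positive inner product with each correctly labelled example, a positively weighted average of such separators still separates the data; the averaging is what moves the result toward the centre of the version space.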
Algorithmic luckiness
Journal of Machine Learning Research, 2002
Cited by 19 (3 self)
Classical statistical learning theory studies the generalisation performance of machine learning algorithms rather indirectly. One of the main detours is that algorithms are studied in terms of the hypothesis class that they draw their hypotheses from. In this paper, motivated by the luckiness framework of Shawe-Taylor et al. (1998), we study learning algorithms more directly and in a way that allows us to exploit the serendipity of the training sample. The main difference from previous approaches lies in the complexity measure: rather than covering all hypotheses in a given hypothesis space, it is only necessary to cover the functions which could have been learned using the fixed learning algorithm. We show how the resulting framework relates to the VC, luckiness and compression frameworks. Finally, we present an application of this framework to the maximum margin algorithm for linear classifiers, which results in a bound that exploits the margin, the sparsity of the resultant weight vector, and the degree of clustering of the training data in feature space.
Generalization Error Bounds for Bayesian Mixture Algorithms
Journal of Machine Learning Research, 2003
Cited by 17 (2 self)
Bayesian approaches to learning and estimation have played a significant role in the statistics literature for many years. While they are often provably optimal in a frequentist setting and lead to excellent performance in practical applications, there have not been many precise characterizations of their performance for finite sample sizes under general conditions. In this paper we consider the class of Bayesian mixture algorithms, where an estimator is formed by constructing a data-dependent mixture over some hypothesis space. Consistent with what is observed in practice, our results demonstrate that mixture approaches are particularly robust and allow for the construction of highly complex estimators while avoiding undesirable overfitting effects. Our results, while data-dependent in nature, are insensitive to the underlying model assumptions and apply whether or not these hold. At a technical level, the approach applies to unbounded functions, constrained only by certain moment conditions. Finally, the bounds derived can be directly applied to non-Bayesian mixture approaches such as Boosting and Bagging.
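A toy sketch of a data-dependent mixture over a finite hypothesis space. As an assumption made only for illustration, each hypothesis is weighted proportionally to its prior times exp(-eta × squared training loss), a generalized-Bayes form; the function names and `eta` are hypothetical, and none of the paper's bounds are computed here.

```python
import math


def mixture_weights(hypotheses, prior, data, eta=1.0):
    """Posterior-style weights: prior(h) * exp(-eta * squared training loss of h).

    The exponential-of-loss form and eta are illustrative assumptions."""
    scores = []
    for h, p in zip(hypotheses, prior):
        loss = sum((h(x) - y) ** 2 for x, y in data)
        scores.append(p * math.exp(-eta * loss))
    total = sum(scores)
    return [s / total for s in scores]


def mixture_predict(hypotheses, weights, x):
    """The mixture predictor: a weighted average of every hypothesis' output."""
    return sum(w * h(x) for h, w in zip(hypotheses, weights))


# Hypothesis space: lines through the origin, h_c(x) = c * x.
slopes = [0.0, 0.5, 1.0, 1.5, 2.0]
hypotheses = [lambda x, c=c: c * x for c in slopes]  # c=c captures each slope
prior = [1.0 / len(slopes)] * len(slopes)
```

The mixture never commits to a single hypothesis; a poorly fitting hypothesis simply receives a small weight, which is one intuition for the robustness the abstract describes.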
Classification in a Normalized Feature Space Using Support Vector Machines
2003
Cited by 9 (3 self)
This paper discusses classification using support vector machines in a normalized feature space. We consider normalization both in input space and in feature space. Exploiting the fact that in this setting all points lie on the surface of a unit hypersphere, we replace the optimal separating hyperplane by one that is symmetric in its angles, leading to an improved estimator. We evaluate these ideas in numerical experiments on two real-world datasets, and then investigate the noise stability and the optimality of this offset correction.
Textual Query of Personal Photos Facilitated by Large-scale Web Data
Cited by 8 (6 self)
The rapid popularization of digital cameras and mobile phone cameras has led to an explosive growth of personal photo collections by consumers. In this paper, we present a real-time, textual-query-based personal photo retrieval system that leverages millions of web images and their associated rich textual descriptions (captions, categories, etc.). After a user provides a textual query (e.g., “water”), our system uses an inverted file to automatically find the positive web images that are related to the query “water” as well as the negative web images that are irrelevant to it. Based on these automatically retrieved relevant and irrelevant web images, we employ three simple but effective classification methods, k-nearest neighbor (kNN), decision stumps, and linear SVM, to rank personal photos. To further improve retrieval performance, we propose two relevance feedback methods via cross-domain learning that effectively utilize both the web images and the personal photos. In particular, our proposed cross-domain learning methods can learn robust classifiers in real time from only a very limited number of labeled personal photos by leveraging the pre-learned linear SVM classifiers. We further propose an incremental cross-domain learning method to significantly accelerate the relevance feedback process on large consumer photo databases. Extensive experiments on two consumer photo datasets demonstrate the effectiveness and efficiency of our system, which is inherently not limited by any predefined lexicon.
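A minimal sketch of how a kNN ranker could use the automatically retrieved positive and negative web images, assuming each image is already a numeric feature vector and that a photo's score is the fraction of positives among its k nearest web images under Euclidean distance. The scoring rule, the distance, and the function names are assumptions for illustration, not the paper's exact method.

```python
import math


def knn_relevance_scores(photos, pos_web, neg_web, k=3):
    """Score each personal photo by the fraction of positive web images among
    its k nearest labelled web images (Euclidean distance is an assumption)."""
    labelled = [(f, 1) for f in pos_web] + [(f, 0) for f in neg_web]
    scores = []
    for p in photos:
        nearest = sorted(labelled, key=lambda item: math.dist(p, item[0]))[:k]
        scores.append(sum(label for _, label in nearest) / len(nearest))
    return scores


def rank_photos(photos, pos_web, neg_web, k=3):
    """Return photo indices ordered from most to least relevant."""
    scores = knn_relevance_scores(photos, pos_web, neg_web, k)
    return sorted(range(len(photos)), key=lambda i: scores[i], reverse=True)
```

No classifier is trained here: the web images act directly as a labelled reference set, which is why kNN is cheap to apply to each new textual query.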
Data-Dependent Bounds for Bayesian Mixture Methods
2003
Cited by 5 (4 self)
We consider Bayesian mixture approaches, where a predictor is constructed by forming a weighted average of hypotheses from some space of functions. While such procedures are known to lead to optimal predictors in several cases where sufficiently accurate prior information is available, it has not been clear how they perform when some of the prior assumptions are violated. In this paper we establish data-dependent bounds for such procedures, extending previous randomized approaches such as the Gibbs algorithm to a fully Bayesian setting. The finite-sample guarantees established in this work enable the use of Bayesian mixture approaches in agnostic settings, where the usual assumptions of the Bayesian paradigm are hard to justify. Moreover, the bounds derived can be directly applied to non-Bayesian mixture approaches such as Bagging and Boosting.
Normalization in support vector machines
In Proc. DAGM 2001 Pattern Recognition, 2001
Cited by 5 (3 self)
This article deals with various aspects of normalization in the context of Support Vector Machines. We first consider normalization of the vectors in the input space and point out its inherent limitations. Kernel function normalization then provides a natural extension to the feature space. We subsequently introduce a correction of the position of the Optimal Separating Hyperplane to better suit these normalized kernels. Finally, numerical experiments evaluate the different approaches.
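Kernel normalization is commonly written as k_norm(x, y) = k(x, y) / sqrt(k(x, x) * k(y, y)), which corresponds to replacing each feature vector phi(x) by phi(x) / ||phi(x)||, so every point lands on the unit hypersphere. A minimal sketch, with a polynomial kernel chosen only as an example; the function names are hypothetical, and the article's hyperplane offset correction is not shown.

```python
import math


def normalize_kernel(kernel):
    """Wrap a kernel so that it computes k(x, y) / sqrt(k(x, x) * k(y, y)).

    Assumes k(x, x) > 0 for every input, otherwise the division is undefined."""
    def normalized(x, y):
        return kernel(x, y) / math.sqrt(kernel(x, x) * kernel(y, y))
    return normalized


def polynomial_kernel(x, y, degree=2, c=1.0):
    """Inhomogeneous polynomial kernel (x . y + c) ** degree, used as an example."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** degree


nk = normalize_kernel(polynomial_kernel)
```

After normalization every point has unit norm in feature space (nk(x, x) == 1), and by the Cauchy-Schwarz inequality all kernel values lie in [-1, 1], so they can be read as cosines of angles between feature vectors.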
Margin maximizing discriminant analysis
In Proceedings of the 15th European Conference on Machine Learning, 2004
Cited by 5 (2 self)
We propose a new feature extraction method called Margin Maximizing Discriminant Analysis (MMDA), which seeks to extract features suitable for classification tasks. MMDA is based on the principle that an ideal feature should convey the maximum information about the class labels, and that it should depend only on the geometry of the optimal decision boundary and not on those parts of the input distribution that do not participate in shaping this boundary. Further, distinct feature components should convey unrelated information about the data. We propose two feature extraction methods for calculating the parameters of such a projection and show that they yield equivalent results. The kernel mapping idea is used to derive non-linear versions. Experiments with several real-world, publicly available data sets demonstrate that the new method yields competitive results.
On Generalization Bounds, Projection Profile, and Margin Distribution
2002
Cited by 5 (1 self)
We study generalization properties of linear learning algorithms and develop a data-dependent approach that is used to derive generalization bounds that depend on the margin distribution. Our method makes use of random projection techniques to allow the use of existing VC dimension bounds in the lower, effective dimension of the data. Comparisons with existing...
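A small sketch of the random projection step, assuming a Gaussian projection matrix with i.i.d. N(0, 1/k) entries, one standard Johnson-Lindenstrauss construction. With this scaling the expected squared norm of a projected vector equals the original squared norm, so lengths and margins are approximately preserved in the lower dimension. The function names are hypothetical, and this is not necessarily the projection used in the paper.

```python
import math
import random


def gaussian_projection_matrix(d, k, seed=0):
    """k x d matrix with i.i.d. N(0, 1/k) entries."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, x):
    """Map a d-dimensional vector to k dimensions."""
    return [sum(r_i * x_i for r_i, x_i in zip(row, x)) for row in matrix]


def norm(v):
    return math.sqrt(sum(t * t for t in v))
```

The distortion shrinks roughly like 1/sqrt(k), which is the trade-off such bounds exploit: a smaller k gives a smaller VC dimension in the projected space but a less faithfully preserved margin.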
The typicalness framework: a comparison with the Bayesian approach
Department of Computer Science, Royal Holloway, University of London, 2001
Cited by 4 (2 self)
When correct priors are known, Bayesian algorithms give optimal decisions, and accurate confidence values for predictions can be obtained. If the prior is incorrect, however, these confidence values have no theoretical basis, even though the algorithms' predictive performance may be good. There also exist many successful learning algorithms that depend only on the iid assumption, but they often produce no confidence values for their predictions. Bayesian frameworks are often applied to these algorithms to obtain such values; however, they can rely on unjustified priors. In this paper we outline the typicalness framework, which can be used in conjunction with many other machine learning algorithms. The framework provides confidence information based only on the standard iid assumption and is therefore much more robust to different underlying data distributions. We show how the framework can be applied to existing algorithms. We also present experimental results showing that the typicalness approach performs close to Bayes when the prior is known to be correct. Unlike Bayes, however, the method still gives accurate confidence values even when different data distributions are considered.
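A small sketch of how typicalness (p-value) confidence can be computed, assuming a nearest-neighbour strangeness measure; this measure and the function names are illustrative choices, not necessarily those used in the paper. For each candidate label, the new example is appended to the training set with that label, every example receives a strangeness score, and the p-value is the fraction of examples at least as strange as the new one. The label with the largest p-value is predicted, with confidence equal to one minus the second-largest p-value.

```python
import math


def nn_strangeness(i, points, labels):
    """Distance to the nearest same-label neighbour divided by the distance to
    the nearest different-label neighbour (an assumed strangeness measure)."""
    same = min((math.dist(points[i], points[j]) for j in range(len(points))
                if j != i and labels[j] == labels[i]), default=math.inf)
    other = min((math.dist(points[i], points[j]) for j in range(len(points))
                 if labels[j] != labels[i]), default=math.inf)
    if math.isinf(other):
        return 0.0  # no competing label at all: treat as not strange
    return same / other if other > 0 else math.inf


def typicalness(train_x, train_y, x_new, y_candidate):
    """p-value of labelling x_new with y_candidate."""
    points = list(train_x) + [x_new]
    labels = list(train_y) + [y_candidate]
    alphas = [nn_strangeness(i, points, labels) for i in range(len(points))]
    return sum(a >= alphas[-1] for a in alphas) / len(alphas)


def predict_with_confidence(train_x, train_y, x_new, candidate_labels):
    """Predict the most typical label; confidence = 1 - second-largest p-value."""
    p = {y: typicalness(train_x, train_y, x_new, y) for y in candidate_labels}
    ranked = sorted(p, key=p.get, reverse=True)
    confidence = 1.0 - (p[ranked[1]] if len(ranked) > 1 else 0.0)
    return ranked[0], confidence, p
```

The p-values rely only on the examples being exchangeable (as under the iid assumption), not on a prior, which is the property the abstract contrasts with Bayesian confidence values.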

