Results 11–20 of 59
Query by committee made real
 In Advances in Neural Information Processing Systems 18
, 2005
Abstract

Cited by 21 (1 self)
Training a learning algorithm is a costly task. A major goal of active learning is to reduce this cost. In this paper we introduce a new algorithm, KQBC, which is capable of actively learning large-scale problems by using selective sampling. The algorithm overcomes the costly sampling step of the well-known Query By Committee (QBC) algorithm by projecting onto a low-dimensional space. KQBC also enables the use of kernels, providing a simple way of extending QBC to the nonlinear scenario. Sampling the low-dimensional space is done using the hit-and-run random walk. We demonstrate the success of this novel algorithm by applying it to both artificial and real-world problems.
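The sampling step mentioned in this abstract relies on the hit-and-run random walk, a standard method for sampling from a convex body given only a membership oracle. The sketch below is illustrative only: it uses plain Python, a unit-ball oracle as a stand-in for the projected version space, and function names of our own choosing, not the paper's KQBC implementation.

```python
import math
import random

def hit_and_run(x, n_steps, inside, rng=random):
    """Hit-and-run random walk inside a convex body.

    x       -- starting point (list of floats); must satisfy inside(x)
    n_steps -- number of walk steps
    inside  -- membership oracle for the convex body
    """
    d = len(x)
    for _ in range(n_steps):
        # Pick a uniformly random direction on the unit sphere.
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in u))
        u = [c / norm for c in u]
        # Find the chord {x + t*u} contained in the body (both ends).
        lo = _extent(x, u, inside, -1.0)
        hi = _extent(x, u, inside, +1.0)
        # Jump to a uniformly random point on that chord.
        t = rng.uniform(lo, hi)
        x = [xi + t * ui for xi, ui in zip(x, u)]
    return x

def _extent(x, u, inside, sign, tol=1e-6):
    """Signed step sign*t (t >= 0) reaching the body's boundary along u."""
    t = 1.0
    while inside([xi + sign * t * ui for xi, ui in zip(x, u)]):
        t *= 2.0
    lo, hi = 0.0, t
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if inside([xi + sign * mid * ui for xi, ui in zip(x, u)]):
            lo = mid
        else:
            hi = mid
    return sign * lo
```

For example, `hit_and_run([0.0, 0.0, 0.0], 200, lambda p: sum(c * c for c in p) <= 1.0)` walks inside the 3-dimensional unit ball; after enough steps the iterates are approximately uniform over the body.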
The impact of parse quality on syntactically-informed statistical machine translation
, 2006
Abstract

Cited by 20 (0 self)
We investigate the impact of parse quality on a syntactically-informed statistical machine translation system applied to technical text. We vary parse quality by varying the amount of data used to train the parser. As the amount of data increases, parse quality improves, leading to improvements in machine translation output and results that significantly outperform a state-of-the-art phrasal baseline.
Optimal distributed online prediction using mini-batches
, 2010
Abstract

Cited by 16 (2 self)
Online prediction methods are typically presented as serial algorithms running on a single processor. However, in the age of web-scale prediction problems, it is increasingly common to encounter situations where a single processor cannot keep up with the high rate at which inputs arrive. In this work, we present the distributed mini-batch algorithm, a method of converting many serial gradient-based online prediction algorithms into distributed algorithms. We prove a regret bound for this method that is asymptotically optimal for smooth convex loss functions and stochastic inputs. Moreover, our analysis explicitly takes into account communication latencies between nodes in the distributed environment. We show how our method can be used to solve the closely related distributed stochastic optimization problem, achieving an asymptotically linear speedup over multiple processors. Finally, we demonstrate the merits of our approach on a web-scale online prediction problem.
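The core conversion this abstract describes, in which workers accumulate per-example gradients over a mini-batch and a single averaged update is applied per round, can be sketched as follows. This is a minimal serial simulation of the distributed scheme; the function names and the learning-rate handling are our assumptions, not taken from the paper.

```python
def minibatch_step(w, batches, grad, lr):
    """One distributed mini-batch update.

    w       -- current weight vector (list of floats)
    batches -- one list of (x, y) examples per simulated worker
    grad    -- grad(w, x, y) -> per-example gradient of the loss
    lr      -- learning rate for this round
    Each "worker" sums gradients over its local inputs; the sums are
    then averaged and a single serial-style gradient step is applied.
    """
    d = len(w)
    total = [0.0] * d
    n = 0
    for local in batches:  # in a real system this loop runs in parallel
        for x, y in local:
            g = grad(w, x, y)
            total = [t + gi for t, gi in zip(total, g)]
            n += 1
    avg = [t / n for t in total]
    return [wi - lr * gi for wi, gi in zip(w, avg)]
```

Because the update uses the gradient averaged over the whole mini-batch, the trajectory matches what a serial algorithm would produce on the same batch, which is the property the regret analysis builds on.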
The EM-EP Algorithm for Gaussian Process Classification, Hyun-Chul Kim and Zoubin Ghahramani
, 2003
Abstract

Cited by 15 (3 self)
Gaussian process classifiers (GPCs) are fully statistical kernel classification models derived from Gaussian processes for regression. In GPCs, the probability of belonging to a certain class at an input location is monotonically related to the value of some latent function at that location. Starting from a prior over this latent function, the data are used to infer both the posterior over the latent function and the values of hyperparameters determining various aspects of the function. GPCs can also be viewed as graphical models with latent variables. Based on the work of [1, 2], we present an approximate EM algorithm, the EM-EP algorithm, for learning both the latent function and the hyperparameters of a GPC. The algorithm alternates the following steps until convergence. In the E-step, given the hyperparameters, a density for the latent variables defining the latent function is computed via the Expectation-Propagation (EP) algorithm [1, 2]. In the M-step, given the density for the latent values, the hyperparameters are selected to maximize a variational lower bound on the marginal likelihood (i.e. the model evidence). This algorithm is found to converge in practice and provides an efficient Bayesian framework for learning hyperparameters of the kernel. We examine the role of various hyperparameters which model labeling errors, the length-scales (i.e. relevances) of different features, and the sharpness of the decision boundary. The added flexibility these provide results in significantly improved performance. Experimental results on synthetic and real data sets show that the EM-EP algorithm works well, with GPCs giving equal or better performance than support vector machines (SVMs) on all data sets tested.
Ellipsoidal kernel machines
 In Proceedings of the Conference on Artificial Intelligence and Statistics
, 2007
Abstract

Cited by 12 (3 self)
A novel technique is proposed for improving the standard Vapnik-Chervonenkis (VC) dimension estimate for the Support Vector Machine (SVM) framework. The improved VC estimates are based on geometric arguments. By considering bounding ellipsoids instead of the usual bounding hyperspheres and assuming gap-tolerant classifiers, a linear classifier with a given margin is shown to shatter fewer points than previously estimated. This improved VC estimation method directly motivates a different estimator for the parameters of a linear classifier. Surprisingly, only VC-based arguments are needed to justify this modification to the SVM. The resulting technique is implemented using Semidefinite Programming (SDP) and is solvable in polynomial time. The new linear classifier also ensures certain invariances to affine transformations on the data which a standard SVM does not provide. We demonstrate that the technique can be kernelized via extensions to Hilbert spaces. Promising experimental results are shown on several standardized datasets.
Relevance feedback for image retrieval: a short survey
 In State of the Art in Audiovisual Content-Based Retrieval, Information Universal Access and Interaction, Including Datamodels and Languages, Report of the DELOS2 European Network of Excellence (6th Framework Programme)
, 2004
Abstract

Cited by 11 (1 self)
The difficulty and cost of providing rich and reliable textual annotations for images in large databases, as well as the “linguistic gap” associated with these annotations, explain why the retrieval of images based directly on their visual content (content-based image retrieval, CBIR) is of high interest.
Retrieval of Difficult Image Classes Using SVM-Based Relevance Feedback
, 2004
Abstract

Cited by 10 (4 self)
User-defined classes in large generalist image databases are often composed of several groups of images and span very different scales in the space of low-level visual descriptors. The interactive retrieval of such image classes is then very difficult. To address this challenge, we propose and evaluate here two general improvements of SVM-based relevance feedback methods. First, to optimize the transfer of information between the user and the system, we focus on the criterion employed by the system for selecting the images presented to the user at every feedback round. We put forward a new active learning selection criterion that minimizes redundancy between the candidate images shown to the user. Second, for image classes having very different scales, we find that a high sensitivity of the SVM to the scale of the data brings about a low retrieval performance. We then argue that insensitivity to scale is desirable in this context and we show how to obtain it by the use of specific kernel functions. The experimental evaluation of both ranking and classification performance on several image databases confirms the effectiveness of our selection criterion and of the use of kernels that reduce the sensitivity of SVMs to the scale of the data.
Feature selection using support vector machines
 In Proc. of the 3rd Int. Conf. on Data Mining Methods and Databases for Engineering, Finance, and Other Fields
, 2002
Abstract

Cited by 8 (3 self)
Text categorization is the task of classifying natural language documents into a set of predefined categories. Documents are typically represented by sparse vectors under the vector space model, where each word in the vocabulary is mapped to one coordinate axis and its occurrence in the document gives rise to one nonzero component in the vector representing that document. When training classifiers on large collections of documents, both the time and memory requirements connected with processing these vectors may be prohibitive. This calls for using a feature selection method, not only to reduce the number of features but also to increase the sparsity of document vectors. We propose a feature selection method based on linear Support Vector Machines (SVMs). First, we train the linear SVM on a subset of training data and retain only those features that correspond to highly weighted components (in the absolute-value sense) of the normal to the resulting hyperplane that separates positive and negative examples. This reduced feature space is then used to train a classifier over a larger training set because more documents now fit into the same amount of memory. In our experiments we compare the effectiveness of the SVM-based feature selection with that of more traditional feature selection methods, such as odds ratio and information gain, in achieving the desired tradeoff between vector sparsity and classification performance. Experimental results indicate that, at the same level of vector sparsity, feature selection based on SVM normals yields better classification performance than odds-ratio- or information-gain-based feature selection when linear SVM classifiers are used.
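The selection step this abstract describes, keeping the coordinates of the SVM normal with the largest absolute weights and then sparsifying document vectors accordingly, is simple to state in code. A minimal sketch follows, with sparse documents represented as Python dicts; training of the linear SVM itself is omitted, and the helper names are ours.

```python
def select_features(w, k):
    """Indices of the k features with largest |w_i| in a linear SVM normal w."""
    order = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    return sorted(order[:k])

def project(doc, keep):
    """Restrict a sparse document vector {feature_index: value} to kept features."""
    keep = set(keep)
    return {i: v for i, v in doc.items() if i in keep}
```

For example, with a normal `w = [0.9, -0.05, 0.0, -1.2]`, `select_features(w, 2)` keeps features 0 and 3, and `project` then drops all other components from each document, increasing sparsity before retraining on the larger set.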
Gaussian margin machines
 In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS)
, 2009
Abstract

Cited by 7 (3 self)
We introduce Gaussian Margin Machines (GMMs), which maintain a Gaussian distribution over weight vectors for binary classification. The learning algorithm for these machines seeks the least informative distribution that will classify the training data correctly with high probability. One formulation can be expressed as a convex constrained optimization problem whose solution can be represented linearly in terms of training instances and their inner and outer products, supporting kernelization. The algorithm admits a natural PAC-Bayesian justification and is shown to minimize a quantity directly related to a PAC-Bayesian generalization bound. A preliminary evaluation on handwriting recognition data shows that our algorithm improves on SVMs for the same task, achieving lower test error and lower test error variance.