Learning methods for generic object recognition with invariance to pose and lighting
- In Proceedings of CVPR’04
, 2004
Cited by 117 (11 self)
We assess the applicability of several popular learning methods for the problem of recognizing generic visual categories with invariance to pose, lighting, and surrounding clutter. A large dataset comprising stereo image pairs of 50 uniform-colored toys under 36 angles, 9 azimuths, and 6 lighting conditions was collected (for a total of 194,400 individual images). The objects were 10 instances of 5 generic categories: four-legged animals, human figures, airplanes, trucks, and cars. Five instances of each category were used for training, and the other five for testing. Low-resolution grayscale images of the objects with various amounts of variability and surrounding clutter were used for training and testing. Nearest Neighbor methods, Support Vector Machines, and Convolutional Networks, operating on raw pixels or on PCA-derived features, were tested. Test error rates for unseen object instances placed on uniform backgrounds were around 13% for SVM and 7% for Convolutional Nets. On a segmentation/recognition task with highly cluttered images, SVM proved impractical, while Convolutional Nets yielded 14% error. A real-time version of the system was implemented that can detect and classify objects in natural scenes at around 10 frames per second.
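The simplest of the methods compared in this abstract, a nearest-neighbor classifier on raw pixels, can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes each image has already been flattened into a vector of grayscale intensities, and the helper names `nearest_neighbor` and `error_rate` and the toy 4-pixel "images" are hypothetical.

```python
import math
from typing import Optional, Sequence


def nearest_neighbor(train_x: Sequence[Sequence[float]],
                     train_y: Sequence[str],
                     query: Sequence[float]) -> Optional[str]:
    """Return the label of the training vector closest to `query` (Euclidean distance)."""
    best_label, best_dist = None, math.inf
    for x, y in zip(train_x, train_y):
        dist = math.dist(x, query)  # distance between two flattened images
        if dist < best_dist:
            best_label, best_dist = y, dist
    return best_label


def error_rate(train_x, train_y, test_x, test_y) -> float:
    """Fraction of test vectors whose nearest-neighbor label is wrong."""
    wrong = sum(nearest_neighbor(train_x, train_y, x) != y for x, y in zip(test_x, test_y))
    return wrong / len(test_y)


# Hypothetical 2x2 "images", flattened to 4 pixel intensities each.
train_x = [[0.9, 0.9, 0.1, 0.1], [0.1, 0.1, 0.9, 0.9]]
train_y = ["car", "airplane"]
test_x = [[0.8, 0.7, 0.2, 0.0], [0.2, 0.1, 0.8, 1.0]]
test_y = ["car", "airplane"]
print(error_rate(train_x, train_y, test_x, test_y))
```

Running on PCA-derived features instead of raw pixels would only change the vectors passed in; the classifier itself stays the same.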
Improving timbre similarity: How high is the sky
- Results in Speech and Audio Sciences
Cited by 102 (12 self)
We report on experiments done in an attempt to improve the performance of a music similarity measure which we introduced earlier. The technique aims at comparing music titles on the basis of their global “timbre”, which has many applications in the field of Music Information Retrieval. Such measures of timbre similarity have seen a growing interest lately, and every contribution (including ours) is yet another instantiation of the same basic pattern recognition architecture, only with different algorithm variants and parameters. Most give encouraging results with little effort, and suggest that near-perfect results could be reached simply by fine-tuning the algorithms' parameters. However, such systematic testing over large, interdependent parameter spaces is both difficult and costly, as it requires working with a whole general meta-database architecture. This paper contributes in two ways to the current state of the art. We report on extensive tests over a large number of parameters and algorithmic variants, some already envisioned in the literature and some not. This leads to an improvement over existing algorithms of about 15% R-precision. But most importantly, we describe many variants that surprisingly do not lead to any substantial improvement. Moreover, our simulations suggest the existence of a “glass ceiling” at an R-precision of about 65%, which probably cannot be overcome by pursuing such variations on the same theme.
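R-precision, the figure of merit quoted in this abstract, is the precision of the top R results for a query that has R relevant items. A minimal sketch follows; which songs count as relevant to a given query song (for example, titles sharing its genre) is an assumption here, since the abstract does not say, and the function name `r_precision` is hypothetical.

```python
from typing import Sequence, Set


def r_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Precision within the top R results, where R = number of items relevant to the query."""
    r = len(relevant)
    if r == 0:
        raise ValueError("R-precision is undefined for a query with no relevant items")
    hits = sum(1 for item in ranked[:r] if item in relevant)  # relevant items among the top R
    return hits / r


# Hypothetical query: 3 songs are relevant; the system ranks 2 of them in its top 3.
print(r_precision(["song_a", "song_x", "song_b", "song_c"], {"song_a", "song_b", "song_c"}))
```

A system-level figure such as the 65% above would typically be this value averaged over many query songs.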
Automatic Analysis of Multimodal Group Actions in Meetings
, 2003
Cited by 90 (26 self)
This paper investigates the recognition of group actions in meetings. A framework is employed in which group actions result from the interactions of the individual participants. The group actions are modelled using different HMM-based approaches, where the observations are provided by a set of audio-visual features monitoring the actions of individuals. Experiments demonstrate the importance of taking interactions into account in modelling the group actions. It is also shown that the visual modality contains useful information, even for predominantly audio-based events, motivating a multimodal approach to meeting analysis.
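Classifying a meeting segment with HMMs commonly means training one model per group action and picking the model under which the observation sequence is most likely (assuming equal priors). The sketch below shows that decision rule using the scaled forward algorithm. It is a simplification, not the paper's system: observations are assumed to be discrete symbols rather than continuous audio-visual features, and the two actions, their parameters, and the class name `DiscreteHMM` are made up for illustration.

```python
import math
from typing import Dict, List, Sequence


class DiscreteHMM:
    """HMM over discrete observation symbols 0..M-1 (a simplifying assumption)."""

    def __init__(self, start: List[float], trans: List[List[float]], emit: List[List[float]]):
        self.start = start  # start[i]    = P(first state is i)
        self.trans = trans  # trans[j][i] = P(next state i | current state j)
        self.emit = emit    # emit[i][o]  = P(symbol o | state i)

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """log P(obs | model) via the forward algorithm, rescaled each step to avoid underflow."""
        n = len(self.start)
        alpha = [self.start[i] * self.emit[i][obs[0]] for i in range(n)]
        log_p = 0.0
        for t in range(len(obs)):
            if t > 0:
                alpha = [sum(alpha[j] * self.trans[j][i] for j in range(n)) * self.emit[i][obs[t]]
                         for i in range(n)]
            scale = sum(alpha)  # equals P(o_t | o_1..o_{t-1})
            log_p += math.log(scale)
            alpha = [a / scale for a in alpha]
        return log_p


def classify(models: Dict[str, DiscreteHMM], obs: Sequence[int]) -> str:
    """Pick the group action whose HMM assigns the sequence the highest likelihood."""
    return max(models, key=lambda name: models[name].log_likelihood(obs))


# Hypothetical symbols: 0 = one participant speaking, 1 = several participants speaking.
models = {
    "monologue": DiscreteHMM([1.0, 0.0], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.5, 0.5]]),
    "discussion": DiscreteHMM([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.2, 0.8], [0.2, 0.8]]),
}
print(classify(models, [0, 0, 0, 0]))  # monologue
print(classify(models, [1, 1, 1, 1]))  # discussion
```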
Confidence Estimation for Machine Translation
- In M. Rollins (Ed.), Mental Imagery
, 2004
A Learning Approach to Improving Sentence-Level MT Evaluation
- In Proceedings of the 10th International Conference on Theoretical and Methodological Issues in Machine Translation (TMI)
, 2004
Cited by 35 (0 self)
The problem of evaluating machine translation (MT) systems is more challenging than it may first appear, as diverse translations can often be considered equally correct. The task is even more difficult when practical circumstances require that evaluation be done automatically over short texts, for instance, during incremental system development and error analysis. While several ...
Parallel Support Vector Machines: The Cascade SVM
- In Advances in Neural Information Processing Systems
, 2005
Cited by 29 (2 self)
We describe an algorithm for support vector machines (SVM) that can be parallelized efficiently and scales to very large problems with hundreds of thousands of training vectors. Instead of analyzing the whole training set in one optimization step, the data are split into subsets and optimized separately with multiple SVMs. The partial results are combined and filtered again in a ‘Cascade’ of SVMs, until the global optimum is reached. The Cascade SVM can be spread over multiple processors with minimal communication overhead and requires far less memory, since the kernel matrices are much smaller than for a regular SVM. Convergence to the global optimum is guaranteed with multiple passes through the Cascade, but already a single pass provides good generalization. A single pass is 5x to 10x faster than a regular SVM for problems of 100,000 vectors when implemented on a single processor. Parallel implementations on a cluster of 16 processors were tested with over 1 million vectors (2-class problems), converging in a day or two, while a regular SVM never converged in over a week.
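The Cascade's data flow (train on disjoint subsets, keep only each subset's support vectors, merge pairs, retrain) can be illustrated with a toy solver. To keep the sketch short and exact, every node runs a hypothetical hard-margin SVM for separable one-dimensional data with positives on the right, whose support vectors are simply the largest negative and the smallest positive point; a real Cascade would call a full SVM solver at each node. Only a single pass is shown: the feedback of the final support vectors into the first layer, which the abstract says guarantees convergence, is omitted, and all names are illustrative.

```python
from typing import List, Tuple

Sample = Tuple[float, int]  # (1-D feature value, label in {-1, +1})


def train_1d_svm(data: List[Sample]) -> Tuple[float, float, List[Sample]]:
    """Toy stand-in solver: exact hard-margin SVM for separable 1-D data, positives on the right.

    Returns (w, b, support_vectors) for the decision function f(x) = w * x + b.
    """
    neg = [x for x, y in data if y == -1]
    pos = [x for x, y in data if y == 1]
    if not neg or not pos:
        raise ValueError("every subset must contain both classes")
    lo, hi = max(neg), min(pos)  # closest points of opposite classes
    if lo >= hi:
        raise ValueError("data are not separable with positives on the right")
    w = 2.0 / (hi - lo)  # chosen so that f(lo) = -1 and f(hi) = +1
    b = -(hi + lo) / (hi - lo)
    return w, b, [(lo, -1), (hi, 1)]


def cascade_svm(subsets: List[List[Sample]]) -> Tuple[float, float, List[Sample]]:
    """Single pass of a binary Cascade SVM over pre-split subsets."""
    # First layer: optimize each subset separately and keep only its support vectors.
    layer = [train_1d_svm(subset)[2] for subset in subsets]
    # Later layers: merge support-vector sets pairwise and re-optimize until one set remains.
    while len(layer) > 1:
        next_layer = []
        for i in range(0, len(layer), 2):
            merged = layer[i] + (layer[i + 1] if i + 1 < len(layer) else [])
            next_layer.append(train_1d_svm(merged)[2])
        layer = next_layer
    # Final node: train on the surviving support vectors.
    return train_1d_svm(layer[0])


subsets = [
    [(-3.0, -1), (-1.0, -1), (2.0, 1), (5.0, 1)],
    [(-4.0, -1), (-0.5, -1), (1.0, 1), (6.0, 1)],
    [(-2.0, -1), (3.0, 1)],
    [(-6.0, -1), (4.0, 1)],
]
print(cascade_svm(subsets))
```

Because each node sees only the previous layer's support vectors, the sets passed upward stay small, which is the source of the memory savings described in the abstract.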
Face Verification Using Adapted Generative Models
- In Proc. Int. Conf. Automatic Face and Gesture Recognition (AFGR), Seoul, Korea
, 2004
Cited by 28 (21 self)
It has been shown previously that systems based on local features and relatively complex generative models, namely 1D Hidden Markov Models (HMMs) and pseudo-2D HMMs, are suitable for face recognition (here we mean both identification and verification). Recently a simpler generative model, namely the Gaussian Mixture Model (GMM), was also shown to perform well. In this paper we first propose to increase the performance of the GMM approach (without sacrificing its simplicity) through the use of local features with embedded positional information; we show that the performance obtained is comparable to 1D HMMs. Secondly, we evaluate different training techniques for both GMM- and HMM-based systems. We show that the traditionally used Maximum Likelihood (ML) training approach has problems estimating robust model parameters when only a few training images are available; we propose to tackle this problem through the use of Maximum a Posteriori (MAP) training, which effectively circumvents the lack-of-data problem; we show that models estimated with MAP are significantly more robust and are able to generalize to adverse conditions present in the BANCA database.
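One common form of MAP training for GMMs adapts only the component means of a generic (world) model toward a client's data, weighting each component by how much data it is responsible for. Whether the paper uses exactly this relevance-factor variant is an assumption; the sketch uses 1-D features, and the relevance value of 16 and all function names are illustrative. With few training samples the adaptation weight stays small and the means remain close to the prior, which is the intuition behind MAP's robustness when data are scarce.

```python
import math
from typing import List


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Density of a 1-D normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def map_adapt_means(weights: List[float], means: List[float], variances: List[float],
                    data: List[float], relevance: float = 16.0) -> List[float]:
    """Relevance-factor MAP adaptation of 1-D GMM means; weights and variances keep prior values."""
    k = len(means)
    counts = [0.0] * k  # soft counts n_j
    sums = [0.0] * k    # responsibility-weighted sums of x
    for x in data:
        lik = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(lik)
        for j in range(k):
            gamma = lik[j] / total  # posterior responsibility of component j for x
            counts[j] += gamma
            sums[j] += gamma * x
    adapted = []
    for j in range(k):
        alpha = counts[j] / (counts[j] + relevance)  # data-dependent adaptation weight
        data_mean = sums[j] / counts[j] if counts[j] > 0 else means[j]
        adapted.append(alpha * data_mean + (1 - alpha) * means[j])
    return adapted


# Hypothetical world model with two components; the client's samples all lie near 3.0.
world_w, world_m, world_v = [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]
print(map_adapt_means(world_w, world_m, world_v, [3.0] * 16))
```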
Constraining human body tracking
- In IEEE International Conference on Computer Vision
, 2003
Cited by 26 (4 self)
Our paper addresses the problem of enforcing constraints in human body tracking. A projection technique is derived to impose kinematic constraints on independent multi-body motion: we show that for small motions the multi-body articulated motion space can be approximated by a linear manifold estimated directly from the previous body pose. We propose a learning approach to model non-linear constraints; we train a support vector classifier from motion capture data to model the boundary of the space of valid poses. Linear and non-linear body pose constraints are enforced by first projecting unconstrained motions onto the articulated motion space and then optimizing to find points on this linear manifold that lie within the non-linear constraint surface modeled by the SVM classifier.
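Projecting an unconstrained motion onto a linear manifold amounts to an orthogonal (least-squares) projection onto the subspace spanned by a set of basis vectors. The sketch below assumes those basis vectors are already given; in the paper they are estimated from the previous body pose, which is not shown. It orthonormalizes the basis with Gram-Schmidt and sums the motion's components along each direction. The names are hypothetical, and the SVM-based non-linear constraint step is left out.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormalize(basis: List[Vector]) -> List[Vector]:
    """Gram-Schmidt: orthonormal vectors spanning the same subspace (dependent vectors dropped)."""
    ortho: List[Vector] = []
    for v in basis:
        w = list(v)
        for u in ortho:
            c = dot(w, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]  # remove the component along u
        norm = dot(w, w) ** 0.5
        if norm > 1e-12:
            ortho.append([wi / norm for wi in w])
    return ortho


def project(motion: Vector, basis: List[Vector]) -> Vector:
    """Least-squares projection of an unconstrained motion onto span(basis)."""
    out = [0.0] * len(motion)
    for u in orthonormalize(basis):
        c = dot(motion, u)
        out = [oi + c * ui for oi, ui in zip(out, u)]
    return out


# Hypothetical 3-D motion; the allowed motion space is the x-y plane.
print(project([3.0, 4.0, 5.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]))
```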
Recognition of Cursive Roman Handwriting - Past, Present and Future
- In Proc. 7th Int. Conf. on Document Analysis and Recognition
, 2003
Cited by 16 (6 self)
This paper reviews the state of the art in off-line Roman cursive handwriting recognition. The input provided to an off-line handwriting recognition system is an image of a digit, a word, or, more generally, some text, and the system produces, as output, an ASCII transcription of the input. This task involves a number of processing steps, some of which are quite difficult. Typically, preprocessing, normalization, feature extraction, classification, and postprocessing operations are required. We survey the state of the art, analyze recent trends, and try to identify challenges for future research in this field.

