Results 1 - 10 of 16
Support vector machines for speech recognition
- Proceedings of the International Conference on Spoken Language Processing, 1998
Abstract - Cited by 47 (2 self)
Statistical techniques based on hidden Markov models (HMMs) with Gaussian emission densities have dominated signal processing and pattern recognition literature for the past 20 years. However, HMMs trained using maximum likelihood techniques suffer from an inability to learn discriminative information and are prone to overfitting and over-parameterization. Recent work in machine learning has focused on models, such as the support vector machine (SVM), that automatically control generalization and parameterization as part of the overall optimization process. In this paper, we show that SVMs provide a significant improvement in performance on a static pattern classification task based on the Deterding vowel data. We also describe an application of SVMs to large vocabulary speech recognition, and demonstrate an improvement in error rate on a continuous alphadigit task (OGI Alphadigits) and a large vocabulary conversational speech task (Switchboard). Issues related to the development and optimization of an SVM/HMM hybrid system are discussed.
Hybrid SVM/HMM Architectures for Speech Recognition
- in Speech Transcription Workshop, 2000
Abstract - Cited by 21 (3 self)
In this paper, we describe the use of a powerful machine learning scheme, Support Vector Machines (SVM), within the framework of hidden Markov model (HMM) based speech recognition. The hybrid SVM/HMM system has been developed based on our public domain toolkit. The hybrid system has been evaluated on the OGI Alphadigits corpus and performs at 11.6% WER, as compared to 12.7% with a triphone mixture-Gaussian HMM system, while using only a fifth of the training data used by the triphone system. Several important issues that arise out of the nature of SVM classifiers have been addressed. We are in the process of migrating this technology to large vocabulary recognition tasks like SWITCHBOARD. 1. INTRODUCTION Speech recognition can be viewed as a pattern recognition problem where we desire each unique sound to be distinguishable from all other sounds. Traditionally statistical models, such as Gaussian mixture models, have been used to "represent" th...
Extreme learning machine: RBF network case
- in Proc. 8th Int. Conf. Control, Autom., Robot., Vis. (ICARCV 2004)
Abstract - Cited by 13 (9 self)
A new learning algorithm called extreme learning machine (ELM) has recently been proposed for single-hidden layer feedforward neural networks (SLFNs) to easily achieve good generalization performance at extremely fast learning speed. ELM randomly chooses the input weights and analytically determines the output weights of SLFNs. This paper shows that ELM can be extended to the radial basis function (RBF) network case, which allows the centers and impact widths of RBF kernels to be randomly generated and the output weights to be simply analytically calculated instead of iteratively tuned. Interestingly, the experimental results show that the ELM algorithm for RBF networks can complete learning at extremely fast speed and produce generalization performance very close to that of SVM in many artificial and real benchmarking function approximation and classification problems. Since ELM does not require validation and human-intervened parameters for given network architectures, ELM can be easily used. Index terms: Radial basis function network, feedforward neural networks, SLFN, real time learning, extreme learning machine, ELM.
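The idea in this abstract (random RBF centers and widths, output weights solved in one analytic step) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the sampling ranges for centers and widths, the Gaussian kernel form, and the small ridge term are all assumptions made for illustration. The ELM literature typically describes a pseudo-inverse solution; here ridge-regularized normal equations are swapped in so the example needs only the standard library.

```python
import math
import random


def rbf_features(x, centers, widths):
    """Gaussian hidden-layer outputs for one input vector x (assumed kernel form)."""
    return [
        math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (w ** 2))
        for c, w in zip(centers, widths)
    ]


def solve_linear(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z


def train_elm_rbf(xs, ts, n_hidden, ridge=1e-3, seed=0):
    """Draw centers and widths at random, then solve for output weights once."""
    rng = random.Random(seed)
    dim = len(xs[0])
    # Hypothetical sampling ranges; they are never tuned after this point.
    centers = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_hidden)]
    widths = [rng.uniform(0.5, 1.5) for _ in range(n_hidden)]
    h = [rbf_features(x, centers, widths) for x in xs]
    # Normal equations: (H^T H + ridge * I) beta = H^T t
    hth = [
        [sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0)
         for j in range(n_hidden)]
        for i in range(n_hidden)
    ]
    htt = [sum(row[i] * t for row, t in zip(h, ts)) for i in range(n_hidden)]
    beta = solve_linear(hth, htt)
    return centers, widths, beta


def predict(model, x):
    """Weighted sum of hidden-layer outputs."""
    centers, widths, beta = model
    return sum(b * g for b, g in zip(beta, rbf_features(x, centers, widths)))
```

Calling `train_elm_rbf(xs, ts, n_hidden=8)` on a list of input vectors and targets returns a model tuple that `predict` can evaluate; no iterative tuning takes place at any stage.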
Incorporating Audio Cues into Dialog and Action Scene Extraction
- In Proceedings of SPIE Conference on Storage and Retrieval for Media Databases, 2003
Abstract - Cited by 10 (1 self)
In this paper, we present an approach to extract scenes in video. The approach is top-down and uses video editing rules and audio cues to extract simple dialog and action scenes. The underlying model is a finite state machine coupled with audio cues that are determined using an audio classifier.
Extraction, layout analysis and classification of diagrams in PDF documents
- In ICDAR 2003, 2003
Abstract - Cited by 10 (3 self)
Diagrams are a critical part of virtually all scientific and technical documents. Analyzing diagrams will be important for building comprehensive document retrieval systems. This paper focuses on the extraction and classification of diagrams from PDF documents. We study diagrams available in vector (not raster) format in online research papers. PDF files are parsed and their vector graphics components installed in a spatial index. Subdiagrams are found by analyzing white space gaps. A set of statistics is generated for each diagram, e.g., the number of horizontal lines and vertical lines. The statistics form a feature vector description of the diagram. The vectors are used in a kernel-based machine learning system (Support Vector Machine). Separating a set of bar graphs from non-bar-graphs gathered from 20,000 biology research papers gave a classification accuracy of 91.7%. The approach is directly applicable to diagrams vectorized from images.
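As a rough illustration of the "set of statistics" step described above, the sketch below counts horizontal and vertical line segments and returns them as a small feature vector. The segment representation (pairs of `(x, y)` endpoints), the function name, and the tolerance are assumptions; the paper's actual feature set is not given in the abstract.

```python
def line_statistics(segments, tolerance=1e-6):
    """Return [horizontal count, vertical count, total segments] for a diagram.

    Each segment is ((x1, y1), (x2, y2)); this layout is an assumption.
    """
    horizontal = 0
    vertical = 0
    for (x1, y1), (x2, y2) in segments:
        if abs(y1 - y2) <= tolerance and abs(x1 - x2) > tolerance:
            horizontal += 1
        elif abs(x1 - x2) <= tolerance and abs(y1 - y2) > tolerance:
            vertical += 1
    return [horizontal, vertical, len(segments)]
```

A vector like this could then be passed to any classifier; the abstract states that a Support Vector Machine was used.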
S.M.: Online Recognition of Multi-Stroke Symbols with Orthogonal Series
, 2009
Abstract - Cited by 8 (6 self)
We propose an efficient method to recognize multi-stroke handwritten symbols. The method is based on computing the truncated Legendre-Sobolev expansions of the coordinate functions of the stroke curves and classifying them using linear support vector machines. Earlier work has demonstrated the efficiency and robustness of this approach in the case of single-stroke characters. Here we show that the method can be successfully applied to multi-stroke characters by joining the strokes and including the number of strokes in the feature vector or in the class labels. Our experiments yield an error rate of 11–20%, and in 99% of cases the correct class is among the top 4. The recognition process causes virtually no delay, because computation of Legendre-Sobolev expansions and SVM classification proceed on-line, as the strokes are written.
Feature generation, feature selection, classifiers, and conceptual drift for biomedical document triage
- Retrieval Conference: TREC 2004, Gaithersburg, MD: National Institute of Standards and Technology, 2004
Abstract - Cited by 4 (1 self)
We approached the problem of classifying papers for the TREC 2004 Genomics Track triage task as a four-step process: feature generation, feature selection, classifier training, and finally, classification. Section-specific binary features that discriminated significantly between positive and negative training samples were chosen using the chi-square statistic. Three classifiers were trained on this feature set: a simple Naive Bayes classifier, the SVMLight support vector machine implementation, and a voting perceptron extended to support variable learning rates. Comparing the classifiers on the training data, we found that neither Naive Bayes nor SVMLight was able to adequately account for the factor of 20 in the utility function. The voting perceptron classifier performed much better at this. The performance on the test collection was lower for all classifiers, although consistent with the relative values of the training cross-validation. Feature subsetting showed no significant differences in precision or recall, implying that there was some redundancy among the features. We also examined how well the feature set derived from the 2002 training collection represented the papers in the 2003 test collection, and found a low level of similarity between feature sets derived from the two collections. This supports the hypothesis that important classification terms change quickly over time.
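To make the feature selection step concrete, here is a minimal sketch of the chi-square statistic for one binary feature against a binary label, computed from a 2x2 contingency table. The function and argument names are hypothetical, and the significance threshold the authors applied is not stated in the abstract, so none is assumed here.

```python
def chi_square_2x2(pos_with, pos_without, neg_with, neg_without):
    """Chi-square statistic for a binary feature versus a binary class label.

    pos_with:    positive documents containing the feature
    pos_without: positive documents lacking the feature
    neg_with:    negative documents containing the feature
    neg_without: negative documents lacking the feature
    """
    a, b, c, d = pos_with, pos_without, neg_with, neg_without
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        # A row or column is empty, so the feature carries no information.
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator
```

A feature that appears in every positive document and no negative one scores high, while a feature spread evenly across both classes scores zero.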
Video search re-ranking via multi-graph propagation
- in Proceedings of ACM Multimedia, 2007
Abstract - Cited by 4 (1 self)
This paper is concerned with the problem of multimodal fusion in video search. First, we employ an object-sensitive approach to query analysis to improve the baseline result of text-based video search. Then, we propose a PageRank-like graph-based approach to text-based search result re-ranking. To better exploit the underlying relationship between video shots, the proposed re-ranking scheme simultaneously leverages textual relevancy, semantic concept relevancy, and low-level-feature-based visual similarity. In this PageRank-like scheme, we construct a set of graphs with the video shots as vertexes, and the conceptual and visual similarity between video shots as “hyperlinks.” A modified topic-sensitive PageRank algorithm is then applied on these graphs to propagate the relevance scores through all related video shots. Experimental results verify the effectiveness of the graph-based propagation approach combined with the object-sensitive query analysis approach, which brings significant improvement to the baseline of text-based video search. Our experimental analysis also indicates that the proposed re-ranking method is highly generic and independent of different query classes, training data, and human interference.
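The propagation step can be sketched as a personalized PageRank iteration over a single shot-similarity graph. This is a generic illustration and not the paper's modified algorithm: the function name, the damping value, the fixed iteration count, and the use of one graph rather than several are all assumptions.

```python
def propagate_scores(similarity, initial, damping=0.85, iterations=50):
    """Spread text-based relevance scores over a shot-similarity graph.

    similarity[j][i] is the (non-negative) similarity from shot j to shot i.
    initial holds the baseline text-search scores, one per shot.
    """
    n = len(initial)
    # Row-normalise similarities so each shot distributes its score fully.
    transition = []
    for row in similarity:
        total = sum(row)
        transition.append([v / total for v in row] if total > 0 else [1.0 / n] * n)
    # The normalised baseline acts as the personalisation (restart) vector.
    total_initial = sum(initial)
    prior = [v / total_initial for v in initial]
    scores = prior[:]
    for _ in range(iterations):
        scores = [
            damping * sum(scores[j] * transition[j][i] for j in range(n))
            + (1.0 - damping) * prior[i]
            for i in range(n)
        ]
    return scores
```

In this sketch a shot with no text match can still rise in the ranking if it is strongly similar to a shot that the text search scored highly.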
Exploiting Confusion Matrices for Automatic Generation of Topic Hierarchies and Scaling Up Multi-Way Classifiers
, 2002
Abstract - Cited by 3 (2 self)
A common way to evaluate a multi-way classifier is a confusion matrix that plots, for each of the learned concepts, the true class of test instances against the predicted classes. Aggregate accuracy figures of the classifier are obtained by summing up the diagonal entries of the confusion matrix. However, invaluable information about the relationships amongst classes is often ignored. In this report we show various ways in which the notion of similarity amongst subsets of classes from the confusion matrix can be exploited.
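A small sketch shows both readings of a confusion matrix mentioned above: aggregate accuracy from the diagonal, and class relationships from the off-diagonal entries. The helper names and the symmetric "confusion count" used as a similarity measure are assumptions for illustration; the report's own similarity definitions are not given in the abstract.

```python
def accuracy(confusion):
    """Fraction of test instances on the diagonal (rows: true, columns: predicted)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def most_confused_pair(confusion):
    """Return ((i, j), count) for the class pair most often mistaken for each other."""
    n = len(confusion)
    best_pair = None
    best_count = -1
    for i in range(n):
        for j in range(i + 1, n):
            # Sum both directions so the measure is symmetric.
            count = confusion[i][j] + confusion[j][i]
            if count > best_count:
                best_pair, best_count = (i, j), count
    return best_pair, best_count
```

Pairs with a high count are candidates for grouping under a common parent when building a topic hierarchy.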
Learning Plan Networks in Conversational Video Games
- M.Sc. thesis in Media Arts and Sciences
Abstract - Cited by 2 (1 self)
look forward to a future where robots collaborate with humans in the home and workplace, and virtual agents collaborate with humans in games and training simulations. A representation of common ground for everyday scenarios is essential for these agents if they are to be effective collaborators and communicators. Effective collaborators can infer a partner’s goals and predict future actions. Effective communicators can infer the meaning of utterances based on semantic context. This thesis introduces a computational cognitive model of common ground called a Plan Network. A Plan Network is a statistical model that provides representations of social roles, object affordances, and expected patterns of behavior and language. I describe a methodology for unsupervised learning of a Plan Network using a multiplayer video game, visualization of this network, and evaluation of the learned model with respect to human judgment of typical behavior. Specifically, I describe learning the Restaurant Plan Network from data collected from

