Results 1 - 7 of 7
A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition
- Proceedings of the IEEE
, 1989
Cited by 3117 (0 self)
Although initially introduced and studied in the late 1960s and early 1970s, statistical methods of Markov source or hidden Markov modeling have become increasingly popular in the last several years. There are two strong reasons why this has occurred. First, the models are very rich in mathematical structure and hence can form the theoretical basis for use in a wide range of applications. Second, the models, when applied properly, work very well in practice for several important applications. In this paper we attempt to carefully and methodically review the theoretical aspects of this type of statistical modeling and show how they have been applied to selected problems in machine recognition of speech.
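A standard piece of the HMM theory such a review covers is evaluating how likely an observation sequence is under a model. The forward procedure below is a minimal sketch, not code from the paper: it assumes a discrete-observation HMM stored as plain Python lists, and the parameter names `pi`, `A`, `B` are illustrative.

```python
def forward(pi, A, B, obs):
    """Return P(obs | model) for a discrete HMM via the forward procedure.

    pi[i]   : initial probability of state i
    A[i][j] : probability of a transition from state i to state j
    B[i][k] : probability of emitting symbol k while in state i
    obs     : list of observed symbol indices
    """
    n = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = [sum_i alpha_t(i) * a_ij] * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # Termination: P(O | model) = sum_i alpha_T(i)
    return sum(alpha)


# Toy two-state, three-symbol model (hypothetical numbers).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
p = forward(pi, A, B, [0, 1, 2])  # approximately 0.03628
```

This unscaled version underflows on long sequences; rescaling `alpha` at each step, or working in log space, is the usual remedy.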
An HMM-Based Legal Amount Field OCR System for Checks
- IEEE International Conference on Systems, Man and Cybernetics, Vancouver BC
, 1995
Cited by 10 (5 self)
The system described in this paper applies Hidden Markov Model technology to the task of recognizing the handwritten legal amount on personal checks. We argue that the most significant source of error in handwriting recognition is the segmentation process. In traditional handwriting OCR systems, recognition is performed at the character level, using the output of an independent segmentation step. By modeling the image as a series of vertical slices taken at a fixed step size, the HMM system described in this paper avoids making segmentation decisions early in the recognition process. Introduction: The current generation of Optical Character Recognition (OCR) systems can be characterized as a pipeline composed of Preprocessing, Segmentation, Classification, and Identification stages. None of these stages is immune to error. Preprocessing may fail to remove existing noise, may remove portions of the image, or may add noise by some other mechanism. Segmentation may fail to establish a boundary where there sh...
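The idea of deferring segmentation can be illustrated with a toy pipeline: cut the image into fixed-step vertical slices, reduce each slice to a discrete symbol, and let Viterbi decoding pick the state sequence, so that character boundaries fall out of the decoded path instead of being fixed beforehand. This is a hedged sketch, not the system from the paper: the ink-count feature, the two-state gap/ink model, and the names `vertical_slices` and `viterbi` are illustrative assumptions.

```python
import math


def vertical_slices(image, width, step):
    """Cut a binary image (list of rows of 0/1 pixels) into vertical slices.

    Slices are `width` columns wide and start `step` columns apart; no
    character boundary is decided here. Each slice is reduced to one
    hypothetical feature: the number of ink pixels it contains.
    """
    n_cols = len(image[0])
    return [sum(row[c] for row in image for c in range(start, start + width))
            for start in range(0, n_cols - width + 1, step)]


def viterbi(pi, A, B, obs):
    """Most likely state sequence of a discrete HMM, computed in log space."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    n = len(pi)
    delta = [log(pi[i]) + log(B[i][obs[0]]) for i in range(n)]
    back = []  # back[t][j]: best predecessor of state j at step t + 1
    for o in obs[1:]:
        ptr, new_delta = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: delta[i] + log(A[i][j]))
            ptr.append(best)
            new_delta.append(delta[best] + log(A[best][j]) + log(B[j][o]))
        back.append(ptr)
        delta = new_delta
    # Trace back from the best final state.
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Toy example: state 0 = gap between characters, state 1 = ink (assumed model).
image = [[1, 1, 0, 0, 1, 1, 0, 0]] * 3
symbols = [0 if ink == 0 else 1 for ink in vertical_slices(image, 1, 1)]
pi = [0.5, 0.5]
A = [[0.7, 0.3], [0.3, 0.7]]
B = [[0.9, 0.1], [0.1, 0.9]]
path = viterbi(pi, A, B, symbols)
# State changes in `path` mark candidate character boundaries.
```

In a real recognizer the states would belong to character models chained by a lexicon or grammar, but the principle is the same: segmentation is a by-product of decoding.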
Mapping context dependent acoustic information into context independent form by LVQ
, 1994
Cited by 4 (2 self)
In the framework of phonemic speech recognition using Hidden Markov Models (HMMs) together with codebooks trained by Learning Vector Quantization (LVQ), a novel way to model context dependencies in speech is presented. We use LVQ to map acoustic contextual data into context-independent phonemic form. The acoustic data is in the form of concatenated averages of successive short-time feature vectors. This mapping eliminates the need to employ context-dependent phonemic HMMs (for example, triphone HMMs) and the difficulties associated with them. Instead, simpler context-independent discrete-observation HMMs suffice. We report excellent results for a speaker-dependent task for Finnish. Summary: We discuss a new model of context dependencies in phonemic speech recognition. Our model is based on the methods of Hidden Markov Models (HMMs) and Learning Vector Quantization (LVQ). We use LVQ to map information about the acoustic context into a context-indep...
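The input representation and the LVQ mapping can be sketched as follows. The details are assumptions rather than the paper's settings: three averaging windows of `span` frames (before, at, and after frame `t`), squared Euclidean distance, the basic LVQ1 update rule, and a linearly decreasing learning rate. All function names are hypothetical. The label of the nearest codebook vector is what a context-independent discrete-observation HMM would then receive as its symbol.

```python
def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def context_vector(frames, t, span):
    """Concatenate the averages of three windows of `span` frames:
    before frame t, starting at t, and after that (hypothetical layout)."""
    assert 1 <= span <= t and t + 2 * span <= len(frames), "window out of range"
    return (mean(frames[t - span:t]) + mean(frames[t:t + span])
            + mean(frames[t + span:t + 2 * span]))


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, x):
    """Index of the codebook entry whose vector is closest to x."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i][0], x))


def lvq1_train(codebook, samples, alpha=0.05, epochs=10):
    """Basic LVQ1. codebook: list of [vector, label] (vectors updated in place);
    samples: list of (vector, label). The winning vector moves towards a sample
    with the same label and away from a sample with a different label."""
    for epoch in range(epochs):
        rate = alpha * (1 - epoch / epochs)  # linearly decreasing learning rate
        for x, label in samples:
            w, w_label = codebook[nearest(codebook, x)]
            sign = 1.0 if w_label == label else -1.0
            for d in range(len(w)):
                w[d] += sign * rate * (x[d] - w[d])
    return codebook


def classify(codebook, x):
    """Label of the nearest codebook vector, used as a discrete HMM symbol."""
    return codebook[nearest(codebook, x)][1]


frames = [[float(i)] for i in range(6)]  # toy 1-D "feature vectors"
ctx = context_vector(frames, 2, 2)       # [0.5, 2.5, 4.5]
```

Because the context is folded into the LVQ input, the HMMs downstream only ever see context-independent labels, which is the simplification the abstract describes.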
Computations and Evaluations of an Optimal Feature-set for an HMM-based Recognizer
, 1996
Cited by 3 (1 self)
A speech recognition machine would offer many benefits, improving people's quality of life. The design of a speech recognition system can be divided into two parts, commonly known as the front-end and the back-end. The front-end converts the analog speech signal into features for classification. This thesis investigates optimal feature-sets for speech recognition. The objectives for an optimal feature-set are improved recognition performance, noise robustness, talker insensitivity and efficiency. Three problems make it difficult to find optimal features: 1) the amount of resources (time and computations) required to evaluate the performance of a feature-set, 2) the size of the feature space, and 3) the dependence of features upon some words in t...
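The front-end step of turning a sampled speech signal into per-frame features can be illustrated with a deliberately simple feature: log energy over Hamming-windowed frames. This is only a sketch of the general front-end idea, not any feature-set from the thesis; the frame length, step, and 8 kHz sampling rate are assumptions, and the function names are hypothetical.

```python
import math


def split_frames(signal, frame_len, step):
    """Overlapping frames of frame_len samples, starting every `step` samples."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, step)]


def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def log_energy_features(signal, frame_len=200, step=80, floor=1e-10):
    """One log-energy value per Hamming-windowed frame.

    Defaults assume 8 kHz sampling: 200 samples = 25 ms, 80 samples = 10 ms.
    `floor` avoids log(0) on silent frames."""
    win = hamming(frame_len)
    return [math.log(max(sum((x * w) ** 2 for x, w in zip(f, win)), floor))
            for f in split_frames(signal, frame_len, step)]
```

A realistic front-end would add spectral features per frame (for example cepstral coefficients), which is exactly the design space whose size the abstract lists as a difficulty.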
A Probabilistic Method for Tracking a Vocalist
, 1998
Cited by 2 (0 self)
When a musician gives a recital or concert, the music performed generally includes accompaniment. To render a good performance, the soloist and the accompanist must know the musical score and must follow the other musician's performance. Both performing and rehearsing are limited by constraints on the time and money available for bringing musicians together. Computer systems that automatically provide musical accompaniment offer an inexpensive, readily available alternative. Effective computer accompaniment requires software that can listen to live performers and follow along in a musical score. This work presents an implemented system and method for automatically accompanying a singer given a musical score. Specifically, I offer a method for robust, real-time detection of a singer's score position and tempo. Robust score following requires combining information obtained both from analyzing a complex signal (the singer's performance) and from processing symbolic notation (the score). Unfortunately, the mapping from the available information to score position does not define a function. Consequently, this work investigated a statistical characterization of a singer's score position and a model that combines the available musical information to produce a probabilistic position estimate. By making
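One common way to maintain a probabilistic position estimate is a discrete Bayes filter over score positions: a motion step that lets the singer stay or advance, followed by an observation step that reweights positions by how well they explain the latest audio. This is an assumed illustration, not the method from the thesis; `advance_probs` and the observation likelihoods are hypothetical inputs that a real system would have to derive from tempo modeling and signal analysis.

```python
def predict(belief, advance_probs):
    """Motion step of a discrete Bayes filter over score positions.

    advance_probs[d] is the (hypothetical) probability that the singer moves
    d positions forward between two observations. Probability mass that would
    move past the end of the score is dropped, then the belief is renormalised."""
    n = len(belief)
    moved = [0.0] * n
    for i, p in enumerate(belief):
        for d, q in enumerate(advance_probs):
            if i + d < n:
                moved[i + d] += p * q
    total = sum(moved)
    return [p / total for p in moved]


def update(belief, likelihood):
    """Observation step: weight each position by P(observation | position)."""
    posterior = [p * l for p, l in zip(belief, likelihood)]
    total = sum(posterior)
    return [p / total for p in posterior]


# Toy run: the singer starts at position 0 and either stays or advances one step.
belief = [1.0, 0.0, 0.0, 0.0]
belief = predict(belief, [0.5, 0.5])           # [0.5, 0.5, 0.0, 0.0]
belief = update(belief, [0.1, 0.9, 0.1, 0.1])  # mass shifts to position 1
estimate = max(range(len(belief)), key=belief.__getitem__)
```

Keeping a full distribution rather than a single guess is what lets such a tracker cope with the fact that the observations alone do not determine a unique score position.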
Improved Stochastic Modeling of Shapes for Content-Based Image Retrieval
- in CBA
, 1999
Cited by 2 (1 self)
Recent advances in the stochastic modeling of shapes for content-based image database retrieval are presented in this paper. These advances include an integrated approach to shape and color-based retrieval, where the cues color and shape are both utilized in a local rather than a global way, as well as a novel deformation-tolerant method based on (pseudo-) two-dimensional stochastic models. The stochastic modeling itself is based on the use of HMMs, whereas the feature extraction is a polar sampling technique also known as the shape matrix. An earlier publication demonstrated that this combination of feature extraction and HMMs is able to perform an elastic matching, which is especially needed in sketch-based image retrieval. The use of streams (sets of features that are assumed to be statistically independent) within the HMM framework allows shape- and color-derived features to be integrated into a single model, with the influence of the different streams controlled via stream weights. Furthermore, these stream weights can also be used to integrate weighting factors derived in the context of shape matrices, in order to achieve a more objective comparison between shapes. These weighting factors account for the fact that the sampling density is not constant across the polar sampling raster.
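The stream-weight mechanism amounts to scaling each stream's log-likelihood before summing: log b_j(o) = sum over streams s of w_s * log b_js(o_s). The sketch below assumes diagonal-covariance Gaussian densities per stream, which may differ from the densities used in the paper; the function names and the (mean, variance) layout are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def stream_log_emission(obs_streams, state_streams, weights):
    """Weighted multi-stream emission score of one HMM state:
    log b_j(o) = sum over streams s of weights[s] * log b_js(o_s).

    obs_streams:   one feature vector per stream (e.g. shape, color)
    state_streams: one (mean, var) pair per stream for this state
    weights:       one weight per stream; 0 ignores a stream entirely
    """
    return sum(w * log_gauss_diag(o, m, v)
               for o, (m, v), w in zip(obs_streams, state_streams, weights))


# Hypothetical state with a 2-D shape stream and a 1-D color stream.
state = [([0.0, 0.0], [1.0, 1.0]), ([0.5], [0.25])]
obs = [[0.1, -0.2], [0.4]]
shape_only = stream_log_emission(obs, state, [1.0, 0.0])
balanced = stream_log_emission(obs, state, [0.5, 0.5])
```

Setting a weight to 0 reduces the model to the remaining streams, while intermediate weights trade off shape against color without retraining the stream densities.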
Statistical Pattern Recognition Techniques for Multimodal Human Computer Interaction and Multimedia Information Processing
- in Survey Paper, Int. Workshop "Speech and Computer"
, 1999
Cited by 1 (1 self)
This paper presents an extensive overview of statistical pattern recognition methods for a variety of different tasks related to multimodal human-computer interaction and multimedia information processing. Typical tasks in the area of human-computer interaction include handwriting and gesture recognition, as well as pen-based retrieval of image databases. Multimedia information processing includes algorithms for document processing, video indexing or face recognition. The aim of the paper is to demonstrate to the speech community the usability of classical speech recognition algorithms, such as Hidden Markov Models and related statistical pattern recognition techniques, for a much larger variety of related problems in man-machine communication and the efficient processing and retrieval of multimedia information.

