A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition
Proceedings of the IEEE, 1989
Cited by 3118 (0 self)
Abstract
Although initially introduced and studied in the late 1960s and early 1970s, statistical methods of Markov source or hidden Markov modeling have become increasingly popular in the last several years. There are two strong reasons why this has occurred. First, the models are very rich in mathematical structure and hence can form the theoretical basis for use in a wide range of applications. Second, the models, when applied properly, work very well in practice for several important applications. In this paper we attempt to carefully and methodically review the theoretical aspects of this type of statistical modeling and show how they have been applied to selected problems in machine recognition of speech.
Robust Endpoint Detection and Energy Normalization for Real-Time Speech and Speaker Recognition
IEEE Transactions on Speech and Audio Processing, 2002
Cited by 23 (0 self)
Abstract
When automatic speech recognition (ASR) and speaker verification (SV) are applied in adverse acoustic environments, endpoint detection and energy normalization can be crucial to the functioning of both systems. In low signal-to-noise ratio (SNR) and nonstationary environments, conventional approaches to endpoint detection and energy normalization often fail, and ASR performance usually degrades dramatically. The purpose of this paper is to address the endpoint problem. For ASR, we propose a real-time approach that uses an optimal filter plus a three-state transition diagram for endpoint detection. The filter is designed using several criteria to ensure accuracy and robustness, and its response is almost invariant across background noise levels. The detected endpoints are then applied to energy normalization sequentially. Evaluation results show that the proposed algorithm significantly reduces string error rates in low-SNR situations; the reduction exceeds 50% on several of the evaluated databases. For SV, we propose a batch-mode approach that uses the optimal filter plus a two-mixture energy model for endpoint detection. The experiments show that the batch-mode algorithm detects endpoints as accurately as HMM forced alignment while requiring much less computation.
Index Terms: change-point detection, edge detection, endpoint detection, optimal filter, robust speech recognition, speaker verification, speech activity detection, speech detection.
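The batch-mode SV approach above pairs the optimal filter with a two-mixture energy model. As a minimal sketch of only the mixture part, assuming frame log energy as the single feature and omitting the optimal filter entirely, one can fit a two-component Gaussian mixture to the log energies with EM and treat frames more likely to belong to the higher-mean component as speech. The function names, the 256/128-sample framing and the 0.5 posterior threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np


def frame_log_energy(signal, frame_len=256, hop=128):
    """Log energy of each frame (framing parameters are illustrative)."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    energies = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        energies[i] = np.log(np.sum(frame ** 2) + 1e-10)  # floor avoids log(0)
    return energies


def fit_two_mixture(x, n_iter=50):
    """EM for a one-dimensional, two-component Gaussian mixture."""
    # Start one component near the quiet frames and one near the loud frames.
    mu = np.array([np.percentile(x, 10), np.percentile(x, 90)])
    var = np.full(2, np.var(x) + 1e-6)
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step in the log domain so that far-away frames do not underflow.
        log_lik = (np.log(w) - 0.5 * np.log(2 * np.pi * var)
                   - (x[:, None] - mu) ** 2 / (2 * var))
        log_norm = np.logaddexp(log_lik[:, 0], log_lik[:, 1])
        resp = np.exp(log_lik - log_norm[:, None])
        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0) + 1e-10
        w = nk / nk.sum()
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return w, mu, var, resp


def batch_endpoints(signal, frame_len=256, hop=128):
    """Return (first, last) speech frame indices, or None if no frame is speech."""
    e = frame_log_energy(signal, frame_len, hop)
    _, mu, _, resp = fit_two_mixture(e)
    speech = int(np.argmax(mu))          # higher-mean component = speech
    idx = np.flatnonzero(resp[:, speech] > 0.5)
    if idx.size == 0:
        return None
    return int(idx[0]), int(idx[-1])
```

On a recording that contains no speech, the mixture still splits the frames into two groups, so a practical detector would also check that the two means are well separated before trusting the result.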
A Robust, Real-Time Endpoint Detector with Energy Normalization for ASR in Adverse Environments
2001
Cited by 4 (2 self)
Abstract
When automatic speech recognition (ASR) is applied to hands-free or other adverse acoustic environments, endpoint detection and energy normalization can be crucial to the entire system. In low signal-to-noise ratio (SNR) situations, conventional approaches to endpointing and energy normalization often fail, and ASR performance usually degrades dramatically. The goal of this paper is to find a fast, accurate, and robust endpointing algorithm for real-time ASR. We propose a novel approach that uses a special filter plus a 3-state decision logic for endpoint detection. The filter has been designed under several criteria to ensure the accuracy and robustness of detection. The detected endpoints are then applied to energy normalization simultaneously. Evaluation results show that the proposed algorithm significantly reduces string error rates on 7 of the 12 tested databases, with reductions exceeding 50% on two of them. The algorithm uses only one-dimensional energy with a 24-frame lookahead; it therefore has low complexity and is suitable for real-time ASR.
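The real-time detector described above filters the one-dimensional energy sequence and runs a 3-state decision logic on the filter output. The sketch below is a hypothetical simplification: it substitutes a plain difference-of-means step-edge filter for the paper's optimally designed filter, names the states silence, in-speech and leaving-speech for illustration, and uses made-up thresholds (t_up, t_low) and a made-up hangover length (gap). With half-width W it needs W + 1 frames of lookahead; the 24-frame figure above refers to the authors' own design, not to this sketch.

```python
import numpy as np

SILENCE, IN_SPEECH, LEAVING = 0, 1, 2  # illustrative state names


def filter_output(log_energy, W=12):
    """F[t] = mean(e[t+1..t+W]) - mean(e[t-W..t-1]).

    A plain step-edge detector used as a stand-in for the paper's optimal
    filter; computing F[t] needs W future frames."""
    h = np.concatenate([-np.ones(W), [0.0], np.ones(W)]) / W
    padded = np.pad(log_energy, W, mode="edge")  # repeat end values at borders
    # 'valid' correlation gives F[t] = sum_k h[k] * padded[t + k], len == len(e)
    return np.correlate(padded, h, mode="valid")


def detect_segments(F, t_up=3.0, t_low=-3.0, gap=20):
    """Three-state endpoint logic; returns a list of (begin, end) frames."""
    state, begin, end_edge, segments = SILENCE, 0, 0, []
    for t in range(1, len(F) - 1):
        # Edges are local extrema of F beyond a threshold (1 extra frame lookahead).
        rising = F[t] > t_up and F[t] >= F[t - 1] and F[t] > F[t + 1]
        falling = F[t] < t_low and F[t] <= F[t - 1] and F[t] < F[t + 1]
        if state == SILENCE:
            if rising:
                state, begin = IN_SPEECH, t
        elif state == IN_SPEECH:
            if falling:
                state, end_edge = LEAVING, t
        else:  # LEAVING: wait 'gap' frames before confirming the end point
            if rising:
                state = IN_SPEECH          # energy rose again: same utterance
            elif falling:
                end_edge = t               # a later drop moves the end point
            elif t - end_edge >= gap:
                segments.append((begin, end_edge - 1))  # last speech frame
                state = SILENCE
    # Close a segment that is still open when the input ends.
    if state == LEAVING:
        segments.append((begin, end_edge - 1))
    elif state == IN_SPEECH:
        segments.append((begin, len(F) - 1))
    return segments
```

Detecting edges at local extrema of the filter output, rather than at the first threshold crossing, places each endpoint near the centre of the energy transition instead of several frames early or late.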
Analysis of LPC/DFT Features for an HMM-based Alphadigit Recognizer
Signal Processing Letters, 1995
Cited by 4 (2 self)
Abstract
The search for better and more robust performance of speech recognition systems is ongoing, and much of the improvement is likely to come from better acoustic feature analysis. This letter reports results from a significant experiment showing that a warped-DFT analysis significantly outperforms an LPC-cepstral analysis, supporting results obtained by other researchers on different recognition tasks. An analysis of nasal-letter performance is used to show the development of the warped-DFT feature analysis. Keywords: cepstral features, ANN, HMM.
I. Introduction: Different types of hidden Markov model (HMM)-based algorithms have been used successfully in speech recognition systems, along with artificial neural networks (ANN), dynamic time warping (DTW), and template matching (TM) algorithms. In all these systems, the properties of the feature set play a crucial role. In this letter, an HMM-based explicit-duration, talker-independent, connected-alphadigit recognizer is use...
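For reference, the LPC-cepstral baseline that the letter compares against can be sketched as autocorrelation analysis of a windowed frame, the Levinson-Durbin recursion for the predictor coefficients, and the standard recursion that converts them to cepstral coefficients. The Hamming window, the order p = 12, the number of cepstra and the sign convention (x[n] ≈ Σ a_k x[n−k]) are assumptions chosen for illustration; the warped-DFT analysis itself is not shown.

```python
import numpy as np


def autocorr(frame, p):
    """Autocorrelation r[0..p] of a Hamming-windowed frame."""
    x = frame * np.hamming(len(frame))
    return np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(p + 1)])


def levinson_durbin(r, p):
    """Predictor coefficients a[0..p-1] (= a_1..a_p) and final prediction error."""
    a = np.zeros(p)
    err = r[0]
    for i in range(1, p + 1):
        # Reflection coefficient k_i = (r_i - sum_j a_j r_{i-j}) / E_{i-1}
        k = (r[i] - np.dot(a[: i - 1], r[i - 1 : 0 : -1])) / err
        prev = a[: i - 1].copy()
        a[: i - 1] = prev - k * prev[::-1]   # a_j <- a_j - k_i a_{i-j}
        a[i - 1] = k
        err *= 1.0 - k * k
    return a, err


def lpc_to_cepstrum(a, n_ceps):
    """Cepstra c_1..c_n of the all-pole model 1 / (1 - sum_k a_k z^-k)."""
    p = len(a)
    c = np.zeros(n_ceps + 1)   # c[0] (gain term) is left out of the result
    for m in range(1, n_ceps + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k] * a[m - k - 1]
        c[m] = acc
    return c[1:]


def lpc_cepstral_features(frame, p=12, n_ceps=12):
    """One LPC-cepstral feature vector for one frame (zeros for a silent frame)."""
    r = autocorr(frame, p)
    if r[0] <= 0.0:
        return np.zeros(n_ceps)
    a, _ = levinson_durbin(r, p)
    return lpc_to_cepstrum(a, n_ceps)
```

In a front end, this would run on each overlapping analysis frame to produce one cepstral vector per frame.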
Computations and Evaluations of an Optimal Feature-set for an HMM-based Recognizer
1996
Cited by 3 (1 self)
Abstract
A speech recognition machine would offer many benefits, improving people's quality of life. The design of a speech recognition system can be divided into two parts, commonly known as the front end and the back end. The front end converts the analog speech signal into features for classification. This thesis investigates optimal feature sets for speech recognition. The objectives for an optimal feature set are improved recognition performance, noise robustness, talker insensitivity, and efficiency. Three problems make it difficult to find optimal features: 1) the resources (time and computation) required to evaluate the performance of a feature set, 2) the size of the feature space, and 3) the dependence of features upon some words in t...
Speaker Independent Isolated Digit Voice Recognition Using Discrete Hidden Markov Model
2000
Abstract
Many practical interactive voice response systems require speaker-independent speech recognition. Achieving speaker independence is difficult because we have no direct way to prepare speaker-independent reference patterns for sub-units of speech and to compare a given sub-unit of speech against them. Hidden Markov models (HMMs) provide a better means than other methods of achieving speaker independence, given training speech from a sufficiently large number of speakers, and they inherently model variations in speaking rate. We developed an interactive voice response system based on discrete HMMs. The system uses a word detector and a linear-prediction-based signal processing front end, both also developed in this work. To achieve speaker independence, we recorded telephone-quality speech through a modem interface and prepared a training database of digits spoken by 160 speakers. We also present different fine-tuning methods to improve recognition performance, a word rejection criterion to improve recognition confidence, and an interactive voice response system built with the technology developed in this thesis.
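The core of a discrete-HMM isolated-digit recognizer like the one described above is scoring a vector-quantized observation sequence against one trained HMM per digit and picking the best. The sketch below assumes the models are already trained (Baum-Welch is not shown) and that the LPC front end and a codebook have mapped each frame to a symbol index. The rejection rule, which refuses to decide when the per-frame log-likelihood margin between the two best words falls below min_margin, is a hypothetical stand-in, not the thesis's criterion.

```python
import numpy as np


def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    obs: non-empty sequence of codebook indices; pi: (N,) initial probs;
    A: (N, N) transition probs; B: (N, M) symbol emission probs."""
    log_p = 0.0
    alpha = pi * B[:, obs[0]]
    for t in range(len(obs)):
        if t > 0:
            # alpha_t(j) = sum_i alpha_{t-1}(i) A[i, j] * B[j, o_t]
            alpha = (alpha @ A) * B[:, obs[t]]
        scale = alpha.sum()
        if scale == 0.0:
            return -np.inf                 # sequence impossible under this model
        alpha = alpha / scale              # rescale to avoid underflow
        log_p += np.log(scale)
    return log_p


def recognize(obs, models, min_margin=0.1):
    """Pick the word whose HMM scores obs highest; return (word or None, scores).

    models maps word -> (pi, A, B). The margin test is a hypothetical rejection rule."""
    scores = {word: log_likelihood(obs, *m) for word, m in models.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    best = ranked[0]
    if not np.isfinite(scores[best]):
        return None, scores                # no model can produce obs
    if len(ranked) > 1:
        margin = (scores[best] - scores[ranked[1]]) / len(obs)
        if margin < min_margin:
            return None, scores            # too close to call: reject
    return best, scores
```

Rescaling alpha at every frame keeps the recursion within floating-point range for long utterances, and summing the logs of the scale factors recovers log P(O | λ).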

