Maximum A Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains
- IEEE Transactions on Speech and Audio Processing, 1994
Cited by 372 (36 self)
In this paper a framework for maximum a posteriori (MAP) estimation of hidden Markov models (HMMs) is presented. Three key issues of MAP estimation, namely the choice of prior distribution family, the specification of the parameters of prior densities, and the evaluation of the MAP estimates, are addressed. Using HMMs with Gaussian mixture state observation densities as an example, it is assumed that the prior densities for the HMM parameters can be adequately represented as a product of Dirichlet and normal-Wishart densities. The classical maximum likelihood estimation algorithms, namely the forward-backward algorithm and the segmental k-means algorithm, are expanded and MAP estimation formulas are developed. Prior density estimation issues are discussed for two classes of applications, parameter smoothing and model adaptation, and some experimental results are given illustrating the practical interest of this approach. Because of its adaptive nature, Bayesian learning is shown to serve as a unified approach for a wide range of speech recognition applications.
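The two conjugate-prior updates named in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes scalar observations, a Dirichlet prior with hyperparameters `nu` (each greater than 1) on a discrete distribution such as mixture weights, and a normal prior with mean `mu0` and pseudo-count `tau` on a Gaussian mean. The function and parameter names are hypothetical.

```python
def map_dirichlet_weights(counts, nu):
    """MAP estimate of discrete probabilities under a Dirichlet(nu) prior.

    counts: expected (soft) counts, e.g. accumulated by forward-backward.
    nu: Dirichlet hyperparameters, assumed > 1 so the posterior mode is interior.
    """
    if len(counts) != len(nu):
        raise ValueError("counts and nu must have the same length")
    # Posterior is Dirichlet(nu + counts); its mode is proportional to nu - 1 + counts.
    numer = [n - 1.0 + c for n, c in zip(nu, counts)]
    total = sum(numer)
    return [x / total for x in numer]


def map_gaussian_mean(xs, gammas, mu0, tau):
    """MAP estimate of a scalar Gaussian mean under a normal prior.

    xs: observations; gammas: their occupation weights (1.0 for hard counts).
    mu0: prior mean; tau: prior pseudo-count (tau = 0 recovers the ML mean).
    """
    weighted_sum = sum(g * x for g, x in zip(gammas, xs))
    occupancy = sum(gammas)
    # Interpolates between the prior mean and the weighted sample mean.
    return (tau * mu0 + weighted_sum) / (tau + occupancy)
```

With little data the estimates stay close to the prior, and with more data they approach the maximum likelihood values, which is the behavior that makes this kind of estimate useful for parameter smoothing and model adaptation.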
Blind Separation of Synchronous Co-Channel Digital Signals Using an Antenna Array. Part I. Algorithms
- IEEE Transactions on Signal Processing, 1995
Cited by 46 (6 self)
We propose a maximum-likelihood approach for separating and estimating multiple synchronous digital signals arriving at an antenna array. The spatial response of the array is assumed to be known imprecisely or unknown. We exploit the finite alphabet (FA) property of digital signals to simultaneously determine the array response and the symbol sequence for each signal. Uniqueness of the estimates is established for signals with linear modulation formats. We introduce a signal detection technique based on the FA property which is different from a standard linear combiner. Computationally efficient algorithms for both block and recursive estimation of the signals are presented. This new approach is applicable to an unknown array geometry and propagation environment, which is particularly useful in wireless communication systems. Simulation results demonstrate its promising performance.
Lexical Modeling in a Speaker Independent Speech Understanding System
- 1993
Cited by 39 (8 self)
Over the past 40 years, significant progress has been made in the fields of speech recognition and speech understanding. Current state-of-the-art speech recognition systems are capable of achieving word-level accuracies of 90% to 95% on continuous speech recognition tasks using 5000 words. Even larger systems, capable of recognizing 20,000 words, are just now being developed. Speech understanding systems have recently been developed that perform fairly well within a restricted domain. While the size and performance of modern speech recognition and understanding systems are impressive, it is evident to anyone who has used these systems that the technology is primitive compared to our own human ability to understand speech. Some of the difficulties hampering progress in the fields of speech recognition and understanding stem from the many sources of variation that occur during human communication. One such source of variation is the different ways that words can be pronounced. There are many causes of pronunciation variation, such as the phonetic environment in which the word occurs, the dialect of the speaker, ...
Smooth On-Line Learning Algorithms for Hidden Markov Models
- 1994
Cited by 36 (5 self)
... the modeling and analysis of DNA and protein sequences in biology (Baldi et al. (1992) and (1993), Cardon and Stormo (1992), Haussler et al. (1992), Krogh et al. (1993), and references therein) and optical character recognition (Levin and Pieraccini (1993)). A first-order HMM is characterized by a set of states, an alphabet of symbols, a probability transition matrix T = (t_ij), and a probability emission matrix E = (e_ij). The parameter t_ij (resp. e_ij) represents the probability of transition from state i to state j (resp. of emission of symbol j from state i). HMMs can be viewed as adaptive systems: given a training sequence of symbols O, the parameters of an HMM can be iteratively adjusted in order to optimize the fit between the model and the data, as measured ...
Bayesian Adaptive Learning of the Parameters of Hidden Markov Model for Speech Recognition
Cited by 21 (3 self)
In this paper a theoretical framework is presented for Bayesian adaptive learning of discrete HMMs and of semi-continuous HMMs with Gaussian mixture state observation densities. Corresponding to the well-known Baum-Welch and segmental k-means algorithms for HMM training, formulations of MAP (maximum a posteriori) and segmental MAP estimation of HMM parameters, respectively, are developed. Furthermore, a computationally efficient method of segmental quasi-Bayes estimation for semi-continuous HMMs is also presented. The important issue of prior density estimation is discussed and a simplified method-of-moments estimate is given. The method proposed in this paper is applicable to problems in HMM training for speech recognition such as sequential or batch training, model adaptation, and parameter smoothing.
Using Self-Organizing Maps and Learning Vector Quantization for Mixture Density Hidden Markov Models
- 1997
Cited by 19 (8 self)
This work presents experiments to recognize pattern sequences using hidden Markov models (HMMs). The pattern sequences in the experiments are computed from speech signals and the recognition task is to decode the corresponding phoneme sequences. The training of the HMMs of the phonemes using the collected speech samples is a difficult task because of the natural variation in the speech. Two neural computing paradigms, the Self-Organizing Map (SOM) and Learning Vector Quantization (LVQ), are used in the experiments to improve the recognition performance of the models. An HMM consists of sequential states which are trained to model the feature changes in the signal produced during the modeled process. The output densities applied in this work are mixtures of Gaussian density functions. SOMs are applied to initialize and train the mixtures to give a smooth and faithful presentation of the feature vector space defined by the corresponding training samples. The SOM maps similar feature vect...
A Tutorial On Hidden Markov Models
- Signal Processing and Artificial Neural Networks Laboratory, Department of Electrical Engineering, Indian Institute of Technology, Bombay, Powai, Bombay 400 076, India, 1996
Cited by 19 (0 self)
In this tutorial we present an overview of (i) what HMMs are, (ii) the different problems associated with HMMs, (iii) the Viterbi algorithm for determining the optimal state sequence, (iv) algorithms associated with training HMMs, and (v) the distance between HMMs.
1 Introduction [1]
Suppose a person has, say, three coins and is sitting inside a room tossing them in some sequence. The room is closed, and what you are shown (on a display outside the room) is only the outcomes of his tosses, TTHTHHTT..., which will be called the observation sequence. You do not know the sequence in which he is tossing the different coins, nor do you know the bias of the various coins. To appreciate how much the outcome depends on the individual biases and the order of tossing the coins, suppose you are given that the third coin is highly biased to produce heads and all coins are tossed with equal probability. Then we naturally expect there to be a far greater number of heads than tails in the o...
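The Viterbi algorithm mentioned in item (iii) can be sketched for the coin-tossing setting, where the hidden states are the coins and the observed symbols are heads and tails. This is a minimal illustration, not the tutorial's own code: the model values, the initial distribution `pi`, and the encoding H = 0, T = 1 are assumptions made for the example.

```python
import math


def viterbi(pi, T, E, obs):
    """Return (most likely state path, its log probability) for a discrete HMM.

    pi[i]: initial probability of state i; T[i][j]: transition probability;
    E[i][k]: probability that state i emits symbol k; obs: non-empty symbol list.
    """
    def lg(p):
        # log with log(0) mapped to -inf so impossible paths are never chosen
        return math.log(p) if p > 0 else float("-inf")

    n = len(pi)
    delta = [lg(pi[i]) + lg(E[i][obs[0]]) for i in range(n)]
    backpointers = []
    for symbol in obs[1:]:
        prev = delta
        # best predecessor of each state j at this time step
        ptr = [max(range(n), key=lambda i: prev[i] + lg(T[i][j])) for j in range(n)]
        delta = [prev[ptr[j]] + lg(T[ptr[j]][j]) + lg(E[j][symbol]) for j in range(n)]
        backpointers.append(ptr)
    # trace the best path backwards from the best final state
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, max(delta)


# Hypothetical three-coin model: coins 0 and 1 are fair, coin 2 favors heads,
# and every coin is equally likely to be tossed next.
H, TAILS = 0, 1
pi = [1 / 3, 1 / 3, 1 / 3]
trans = [[1 / 3] * 3 for _ in range(3)]
emit = [[0.5, 0.5], [0.5, 0.5], [0.9, 0.1]]
best_path, best_logp = viterbi(pi, trans, emit, [TAILS, TAILS, H, TAILS, H, H])
print(best_path, best_logp)
```

Working with log probabilities avoids numerical underflow on long observation sequences, and ties between equally likely states are broken in favor of the lowest state index.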
Robust Entropy-based Endpoint Detection for Speech Recognition in Noisy Environments
- 1998
Cited by 18 (0 self)
This paper presents an entropy-based algorithm for accurate and robust endpoint detection for speech recognition in noisy environments. Instead of using the conventional energy-based features, the spectral entropy is developed to identify the speech segments accurately. Experimental results show that this algorithm outperforms the energy-based algorithms in both detection accuracy and recognition performance in noisy environments, with an average error rate reduction of more than 16%.
1. INTRODUCTION
Endpoint detection and verification of speech segments become relatively difficult in noisy environments, but are definitely important for robust speech recognition. The short-time energy or spectral energy has conventionally been used as the major feature parameter to distinguish the speech segments from other waveforms [1-4]. However, these features become less reliable and robust in noisy environments, especially in the presence of non-stationary noise and sound artifacts such ...
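The excerpt does not give the paper's exact definition of spectral entropy, so the sketch below uses a common one as an assumption: normalize the power spectrum of a frame so that it sums to one and take its Shannon entropy. A naive DFT is used to keep the example dependency-free; the function name is hypothetical.

```python
import cmath
import math


def spectral_entropy(frame):
    """Shannon entropy (in nats) of the normalized power spectrum of one frame.

    Uses a naive O(n^2) DFT over bins 0..n//2, which is enough for real input.
    Returns 0.0 for an all-zero (silent) frame.
    """
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)
        )
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0:
        return 0.0
    # Treat the normalized spectrum as a probability mass function.
    probs = [p / total for p in power]
    return -sum(p * math.log(p) for p in probs if p > 0)
```

Under this definition a frame whose energy sits in a few frequency bins has low entropy, while a frame with a flat spectrum has high entropy, independent of overall signal level. That independence from level is what distinguishes it from the energy-based features the abstract contrasts it with.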
Hidden Markov modelling of simultaneously recorded cells in the associative cortex of behaving monkeys
- Network: Computation in Neural Systems, 1997
"Blind" Speech Segmentation: Automatic Segmentation of Speech without Linguistic Knowledge
Cited by 9 (1 self)
A new automatic speech segmentation procedure, called "Blind" speech segmentation, is presented. This procedure allows a speech sample to be segmented into sub-word units without knowledge of any linguistic information (such as orthographic or phonetic transcription). Hence, this procedure involves finding the optimal number of sub-word segments in the given speech sample, before locating the ...

