Hidden Semi-Markov Model Based Speech Synthesis
in Proc. of ICSLP, 2004
Cited by 12 (5 self)
In the present paper, a hidden semi-Markov model (HSMM) based speech synthesis system is proposed. In a hidden Markov model (HMM) based speech synthesis system which we have proposed, rhythm and tempo are controlled by state duration probability distributions modeled by single Gaussian distributions. To synthesize speech, it constructs a sentence HMM corresponding to an arbitrarily given text and determines state durations maximizing their probabilities; a speech parameter vector sequence is then generated for the given state sequence. However, there is an inconsistency: although the speech is synthesized from HMMs with explicit state duration probability distributions, the HMMs are trained without them. In the present paper, we introduce an HSMM, which is an HMM with explicit state duration probability distributions, into the HMM-based speech synthesis system. Experimental results show that the use of HSMM training improves the naturalness of the synthesized speech.
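As a rough illustration of the duration step this abstract describes, the sketch below picks, for each state, the integer duration that maximizes a single-Gaussian duration density. With no constraint on the total utterance length, that maximizer is simply the mean rounded to the nearest frame count, clamped to at least one frame. The function name and the example means are hypothetical and are not taken from the paper.

```python
import math

def most_likely_durations(means):
    """Return, per state, the integer duration (in frames) that maximizes a
    Gaussian duration density with the given mean.

    A Gaussian is symmetric and peaks at its mean, so the best integer is the
    nearest one; the variance does not change the argmax when no total-length
    constraint is applied. Durations are clamped to at least one frame.
    """
    return [max(1, math.floor(m + 0.5)) for m in means]

# Hypothetical per-state duration means, in frames.
durations = most_likely_durations([3.2, 7.6, 0.3])
```

The variances would matter only if the durations had to add up to a prescribed total length, which this sketch does not attempt.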
Mapping eye movements to cognitive processes
1999
Cited by 6 (0 self)
policies, either expressed or implied, of the NSF or the U.S. government.
Durational Modelling For Improved Connected Digit Recognition
Cited by 6 (0 self)
A durational modelling technique is proposed for CDHMM-based connected digit recognition. This reduces the insertion error rate, which is typically the most frequent recognition error observed when no grammar constraint is applied. Insertion errors can be attributed in part to the acknowledged weakness of the acoustic models for accurate temporal modeling of speech signals. Two forms of durational model are investigated: an expanded-state model and an explicit model. Both forms of model significantly reduce the number of insertion errors and hence the digit string error rate. A modification to the explicit model which also accounts for speaking rate is described.
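The temporal weakness mentioned in this abstract is commonly explained by the fact that a single HMM state with self-loop probability a implies a geometric duration distribution, whose most likely duration is always one frame. The sketch below compares that with one possible expanded-state variant. The abstract does not give the paper's topology, so the chain of identical sub-states used here is an assumption, as are the function and parameter names.

```python
from math import comb

def geometric_duration_pmf(self_loop, d):
    """P(occupying one HMM state for exactly d frames), d >= 1,
    given its self-loop probability."""
    return self_loop ** (d - 1) * (1.0 - self_loop)

def expanded_state_duration_pmf(self_loop, n_substates, d):
    """P(total duration == d frames) for n identical sub-states in series.

    The sum of n geometric durations follows a negative binomial law, which
    can peak away from d = 1, unlike a single state. This topology is an
    assumption, not the paper's stated model.
    """
    if d < n_substates:
        return 0.0  # every sub-state must be occupied for at least one frame
    stay = self_loop ** (d - n_substates)
    leave = (1.0 - self_loop) ** n_substates
    return comb(d - 1, n_substates - 1) * stay * leave
```

With a self-loop probability of 0.5 and two sub-states, durations of two and three frames are equally likely and a one-frame duration is impossible, whereas the single-state model always favours one frame.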
Context-dependent word duration modelling for robust speech recognition
in Proc. Interspeech, 2005
Cited by 4 (3 self)
Conventional hidden Markov models (HMMs) have weak duration constraints. This may cause the decoder to produce word matches with unrealistic durations in noisy situations. This paper describes techniques for modelling context-dependent word duration cues and incorporating them directly in a multi-stack decoding algorithm. The proposed model is capable of penalising duration constraints of a word depending on its context. Experiments on connected digit recognition show that the new system can significantly improve recognition performance at different noise levels.
Using Phone Durations in Finnish Large Vocabulary Continuous Speech Recognition
2004
Cited by 3 (3 self)
Finnish is one of the languages where phone durations discriminate between words and thus play a significant role in the proper recognition of speech. Modern large vocabulary continuous speech recognizers do not offer reasonable means to model these durations, which would be necessary in order to deal seamlessly with such a language. Therefore, some explicit actions have to be taken to distinguish certain words from each other, as the only cues for doing so might be prosodic ones, namely the durations. In this work, an extension of an existing speech recognition system to include models for discriminatively important phone durations is studied. The explicit duration model applied resulted in a 5% relative reduction in the letter error rate of the recognition task.
Expanded Hidden Markov Models: Allowing Symbol Emissions in State Changes
Cited by 3 (3 self)
In this paper we formally expand hidden Markov models (HMM) by symbol emissions in state changes. These expanded hidden Markov models (eHMM) can contain more information than original HMM with the same number of states. This is a necessary step towards the definition of hidden non-Markovian models on the basis of discrete stochastic models. These are most of the time event driven, which makes it necessary to attach information to the state changes that represent the events. The paper shows that the extended paradigm is to some extent equivalent to original HMM, and gives an example of the new possibilities using hidden non-Markovian models.
Optimal Filtering and Smoothing for Speech Recognition Using a Stochastic Target Model
1996
Cited by 3 (1 self)
This paper presents a stochastic target model of speech production, where articulator motion in the vocal tract is represented by the state of a Markov-modulated linear dynamical system, driven by a piecewise-deterministic control trajectory, and observed through a non-linear function representing the articulatory-acoustic mapping. Optimal filtering and smoothing algorithms for estimating the hidden states of the model from acoustic measurements are derived using a measure-change technique, and require solution of recursive integral equations. A sub-optimal approximation is developed, and illustrated using examples taken from real speech.
Gaze-Contingent Automatic Speech Recognition
2006
Cited by 3 (0 self)
This study investigated recognition systems that combine loosely coupled modalities, integrating eye movements in an Automatic Speech Recognition (ASR) system as an exemplar. A probabilistic framework for combining modalities was formalised and applied to the specific case of integrating eye movement and speech. A corpus of matched eye movement and related spontaneous conversational British English speech for a visual-based, goal-driven task was collected. This corpus enabled the relationship between the modalities to be verified. Robust extraction of visual attention from eye movement data was investigated using Hidden Markov Models and Hidden Semi-Markov Models. Gaze-contingent ASR systems were developed from a research-grade baseline ASR system by redistributing language model probability mass according to the visual attention. The best performing systems maintained the Word Error Rates but showed an increase in the Figure of Merit, a measure of the keyword spotting accuracy and integration success. The core values of this work may be useful for developing robust multimodal decoding system functions.

