Coarticulation modeling by embedding a target-directed hidden trajectory model into HMM – Model and training
Cited by 8 (3 self)
The Hidden Dynamic Model (HDM) has been an attractive acoustic modeling approach because it provides a computational model for coarticulation and the dynamics of human speech. However, the lack of a direct decoding algorithm has been a barrier to research progress on HDM. We have developed a new HDM-based acoustic model, the Hidden-Trajectory HMM (HTHMM), which combines the state/mixture topology of a traditional monophone HMM with a target-directed hidden-trajectory model (a special form of HDM) for coarticulation modeling. Because the classical Viterbi algorithm is not admissible, we have developed a novel MAP decoding algorithm for HTHMM that correctly takes the hidden continuous trajectory into account. This paper introduces our new HTHMM decoder, which for the first time allows an HDM-type model to be evaluated by direct decoding instead of N-best rescoring. Using direct decoding, we demonstrate that the coarticulatory mechanism of our HTHMM matches traditional context-dependent modeling (enumeration of model parameters): the context-independent HTHMM has slightly better accuracy than a crossword-triphone HMM on the Aurora2 task. The decoder also enables us to include state-boundary optimization in the HDM/HTHMM training procedure. This paper presents the detailed decoding algorithm and evaluation results, while in [1] we present the HTHMM model itself and parameter training.
Extensions To The Word Graph Method For Large Vocabulary Continuous Speech Recognition
- Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing
, 1997
Cited by 3 (3 self)
This paper describes two methods for constructing word graphs for large vocabulary continuous speech recognition. Both word graph methods are based on a time-synchronous, left-to-right beam search strategy in connection with a tree-organized pronunciation lexicon. The first method is based on the so-called word pair approximation and fits directly into a word-conditioned search organization. To avoid the assumptions made in the word pair approximation, we design a second word graph method, which is based on a time-conditioned factoring of the search space. For the case of a trigram language model, we give a detailed comparison of both word graph methods with an integrated search method. The experiments were carried out on the North American Business (NAB'94) 20,000-word task.
Vocabulary-independent search in spontaneous speech
- In Proceedings of ICASSP
, 2004
Cited by 3 (0 self)
For efficient organization of speech recordings – meetings, interviews,
SEARCHING THE AUDIO NOTEBOOK: KEYWORD
MIT’s Audio Notebook added great value to the note-taking process by retaining audio recordings, e.g. during lectures or interviews. The key was to provide users with ways to quickly and easily access portions of interest in a recording. Several non-speech-recognition-based techniques were employed. In this paper we present a system that searches the audio recordings directly by key phrases. We have identified the user requirements as accurate ranking of phrase matches, domain independence, and reasonable response time. We address these requirements with a hybrid word/phoneme search in lattices and a supporting indexing scheme. We introduce the ranking criterion, a unified hybrid posterior-lattice representation, and the indexing algorithm for hybrid lattices. We present results for five different recording sets, including meetings, telephone conversations, and interviews. Our results show an average search accuracy of 84%, which is dramatically better than a direct search in speech recognition transcripts (less than 40% search accuracy).
Large vocabulary continuous
, 2002
Automatic speech recognition of real-live broadcast news (BN) data (Hub-4) has become a challenging research topic in recent years. This paper summarizes our key efforts to build a large vocabulary continuous speech recognition system for the heterogeneous BN task without inducing undesired complexity and computational resources. These key efforts include ...
Developing HMM-based Recognizers . . .
, 1999
ESMERALDA is an integrated environment for the development of speech recognition systems. It provides a powerful selection of methods for building statistical models together with an efficient incremental recognizer. This paper describes the approaches adopted for estimating mixture densities, Hidden Markov Models, and n-gram language models, as well as the algorithms applied during recognition. Evaluation results on a speaker-independent spontaneous speech recognition task demonstrate the capabilities of ESMERALDA.
The Time-Conditioned Approach in Dynamic Programming Search for LVCSR
This paper presents the time-conditioned approach in dynamic programming search for large-vocabulary continuous speech recognition. The following topics are presented: the baseline algorithm, a time-synchronous beam search version, a comparison with the word-conditioned approach, and a comparison with stack decoding. The approach has been successfully tested on the NAB task using a vocabulary of 64 000 words. Index Terms: beam search, dynamic programming, large vocabulary speech recognition, one-pass DP search, search organization, time-conditioned DP search.

