State Tying For Context Dependent Phoneme Models (0)

by K. Beulen, E. Bransch, H. Ney

Results 1 - 9 of 9

Automatic Question Generation For Decision Tree Based State Tying

by K. Beulen, H. Ney - Proceedings of the IEEE Conference on Acoustics, Speech and Signal Processing , 1998
Cited by 13 (3 self)
Decision tree based state tying uses so-called phonetic questions to assign triphone states to reasonable acoustic models. These phonetic questions are in fact phonetic categories such as vowels, plosives or fricatives. The assumption behind this is that context phonemes which belong to the same phonetic class have a similar influence on the pronunciation of a phoneme. For a new phoneme set, which has to be used e.g. when switching to a different corpus, a phonetic expert is needed to define proper phonetic questions. In this paper a new method is presented which automatically defines good phonetic questions for a phoneme set. This method uses the intermediate clusters from a phoneme clustering algorithm, which are afterwards reduced to an appropriate number. Recognition results on the Wall Street Journal data for within-word and across-word phoneme models show competitive performance of the automatically generated questions with our best handcrafted question set.

Context-Dependent Acoustic Modeling Using Graphemes For Large Vocabulary Speech Recognition

by S. Kanthak, H. Ney - in Proceedings of the ICASSP , 2002
Cited by 12 (2 self)
In this paper we propose to use a decision tree based on graphemic acoustic sub-word units together with phonetic questions. We also show that automatic question generation can be used to completely eliminate any manual effort.

Pronunciation Modelling In The RWTH Large Vocabulary Speech Recognizer

by K. Beulen, A. Eiden, S. Martin, L. Welling, J. Overmann, H. Ney , 1998
Cited by 8 (0 self)
In this paper we describe the application of pronunciation variants for our large vocabulary continuous speech recognizer. We will explain how the pronunciation variants were used in training and recognition and give some recognition results on three different corpora. The recognition tests were performed on the Wall Street Journal (WSJ) November 92 development and evaluation corpora (5 000 words), the North American Business (NAB) H1 development corpus (20 000 words) and on the Verbmobil 1996 evaluation corpus (5 000 words). For the WSJ and NAB corpora, a slight improvement in recognition accuracy can be observed, while for the Verbmobil corpus the error rate remains unchanged.

Discriminative Training For Large Vocabulary Telephone-Based Name Recognition

by Erik McDermott, Alain Biem, Seiichi Tenpaku, Shigeru Katagiri , 2000
Cited by 7 (4 self)
This paper describes progress on a commercial application of the MECS recognition system to the task of recognizing Japanese family names spoken by customers into the answering machines of a large marketing/human resource company. The task is thus speaker-independent, open vocabulary, and is characterized by large variation in caller speaking styles, telephone types and acoustic environments. Our results show that context-independent hidden Markov models trained discriminatively with the Minimum Classification Error criterion are a practical alternative to context-dependent models based on phonetic decision trees, yielding better performance with a much smaller number of parameters. On this difficult task we have obtained 59% correct family name recognition. A phoneme-based confidence measure enables us to obtain 85% correct name recognition for accepted utterances, at an overall utterance acceptance rate of 15%.

Multilingual Acoustic Modeling Using Graphemes

by S. Kanthak, H. Ney - in Proceedings of the European Conference on Speech Communication and Technology , 2003
Cited by 5 (0 self)
In this paper we combine grapheme-based sub-word units with multilingual acoustic modeling. We show that a global decision tree together with automatically generated grapheme questions eliminates manual effort completely. We also investigate the effects of additional language questions. We present ...

Noise Level Normalization And Reference Adaptation For Robust Speech Recognition

by Florian Hilger, Hermann Ney (Lehrstuhl für Informatik VI) - in ASR2000 -- International Workshop on Automatic Speech Recognition , 2000
Cited by 4 (3 self)
This paper describes an approach to normalize the noise level of a speech signal at the outputs of the Mel scaled filter-bank used in MFCC feature extraction. An adaptive normalizing function that distinguishes between speech and silence parts of the signal is used to normalize the noise level, without altering the speech parts of the signal. This technique is combined with an adaptation of the reference vectors, depending on the average norm of the incoming feature vectors. On a database with training data recorded in office environment and testing data recorded in driving cars, the word error rate could be reduced from 35.5% to 14.7% for the city traffic testing set and from 78.0% to 24.1% for the highway testing set.
1. INTRODUCTION
Noise level normalization (NLN) is based on the observation that a combination of spectral subtraction (SS) and signal-to-noise-ratio normalization (SNRN) gives better recognition results when the subtraction and normalization are only applied to the...

Dynamic Programming Search Techniques For Across-Word Modelling In Speech Recognition

by Klaus Beulen, Stefan Ortmanns, Christian Elting - in Speech Recognition, Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing , 1999
Cited by 4 (0 self)
We describe the integration of across-word models in the RWTH large vocabulary continuous speech recognition system, where our main focus is on the realization of the acoustic recognition process. This paper presents a study of two search methods based on the principle of dynamic programming. For both methods we discuss the implementation details and give experimental results on the Verbmobil and on the Wall Street Journal data. In addition, we introduce a score interpolation of within-word and across-word models for both search methods. In combination with across-word models this interpolation technique gives an improvement of the recognition accuracy by 14% relative to our standard system.
1. INTRODUCTION
This paper describes the integration of across-word modelling into the RWTH large vocabulary continuous speech recognition system [5]. In particular, we consider two search methods, namely the n-best and one-pass approach, for handling across-word models. Both methods are based o...

Refining Tree-Based State Clustering by Means of Formal Concept Analysis, Balanced Decision Trees and Automatically Generated Model-Sets

by Daniel Willett, Christoph Neukirchen, Jörg Rottland, Gerhard Rigoll - In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP99 , 1999
Cited by 2 (0 self)
Decision tree-based state clustering has emerged in recent years as the most popular approach for clustering the states of context-dependent hidden Markov model based speech recognizers. Restricting the possible clusters to sets of phones, mostly phonetically motivated, yields reasonably good modeling of unseen phones while still allowing specific phones to be modeled very precisely whenever this is necessary and enough training data is available. Formal Concept Analysis, a young mathematical discipline, provides means for the treatment of sets and sets of sets that are well suited for further improving tree-based state clustering. The possible refinements are outlined and evaluated in this paper. The major merit is the proposal of procedures for the adaptation of the number of sets used for clustering to the amount of available training data, and of a method that generates suitable sets automatically without the incorporation of additional knowledge.

The ATR HIP Laboratories Minimum Error Classification System (MECS) for Speech Recognition

by Alain Biem, Shigeru Katagiri, Erik McDermott, Eric Woudenberg , 2001
This manual describes the Minimum Error Classification System (MECS) developed at ATR...