Shared-Distribution Hidden Markov Models for Speech Recognition
, 1991
Cited by 227 (5 self)
Parameter sharing plays an important role in statistical modeling since training data are usually limited. On the one hand, we would like to use models that are as detailed as possible. On the other hand, with models too detailed, we can no longer reliably estimate the parameters. Triphone generalization may force two models to be merged together when only parts of the model output distributions are similar, while the rest of the output distributions are different. This problem can be avoided if clustering is carried out at the distribution level. In this paper, a shared-distribution model is proposed to replace generalized triphone models for speaker-independent continuous speech recognition. Here, output distributions in the hidden Markov model are shared with each other if they exhibit acoustic similarity. In addition to detailed representation, it also gives us the freedom to use a large number of states for each phonetic model. Although an increase in the number of states will inc...
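As a rough illustration of clustering at the distribution level, the sketch below greedily merges discrete output distributions (stored as count vectors) whose pooling costs the least. The count-weighted entropy increase used as the merge cost is an assumption made for this example; the abstract only says that distributions are shared when they are acoustically similar. The names `entropy`, `merge_cost`, and `cluster_distributions` are hypothetical.

```python
import math

def entropy(counts):
    """Entropy (in nats) of a discrete distribution given as counts."""
    total = sum(counts)
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)

def merge_cost(a, b):
    """Count-weighted entropy increase caused by pooling a and b (assumed measure)."""
    merged = [x + y for x, y in zip(a, b)]
    return (sum(merged) * entropy(merged)
            - sum(a) * entropy(a)
            - sum(b) * entropy(b))

def cluster_distributions(dists, n_clusters):
    """Greedily merge the cheapest pair until n_clusters remain.

    Returns a list of clusters, each a sorted list of input indices.
    Assumes every count vector has a nonzero total.
    """
    clusters = [(list(d), [i]) for i, d in enumerate(dists)]
    while len(clusters) > n_clusters:
        _, i, j = min(
            (merge_cost(clusters[i][0], clusters[j][0]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        pooled = [x + y for x, y in zip(clusters[i][0], clusters[j][0])]
        members = clusters[i][1] + clusters[j][1]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((pooled, members))
    return [sorted(members) for _, members in clusters]

# Hypothetical counts for four HMM state output distributions.
example = [[9, 1, 0], [8, 2, 0], [0, 1, 9], [0, 2, 8]]
shared = cluster_distributions(example, 2)
```

With these made-up counts, states 0 and 1 end up sharing one distribution and states 2 and 3 another, regardless of which model each state came from.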
Improved On-Line Handwriting Recognition Using Context Dependent Hidden Markov Models
- In Proc. Int. Conference on Document Analysis and Recognition (ICDAR)
, 1997
Cited by 8 (3 self)
This paper presents the introduction of context-dependent Hidden Markov Models for cursive, unconstrained handwriting recognition with large vocabularies. Since context-dependent models were successfully introduced to speech recognition ([1], [2], [3]), it seems obvious that the use of trigraphs could also lead to improved on-line handwriting recognition systems [4]. In analogy to triphones in speech recognition, trigraphs are context-dependent sub-word units representing a single written character in its left and right context. The tests were conducted on a writer-dependent system with three different writers and two different vocabulary sizes (1000 words and 30000 words). The results we obtained with the trigraph-based system compared to the monograph system are very encouraging: a mean relative error reduction of 46% for the 1000-word handwriting recognition system and a mean relative error reduction of 37% for the same system with the 30000-word vocabulary. We believe that this r...
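The two ideas in this abstract, expanding a word into trigraphs and reporting a relative error reduction, can be sketched in a few lines. The `left-centre+right` label format is borrowed from common triphone notation and is an assumption, as are the boundary symbol `#` and the function names; the error rates in the example are invented only to show the arithmetic.

```python
def to_trigraphs(word, boundary="#"):
    """Expand a word into one context-dependent unit per character.

    Each unit is written left-centre+right; the boundary symbol pads
    the first and last character (notation assumed for illustration).
    """
    padded = boundary + word + boundary
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]

def relative_error_reduction(baseline_error, new_error):
    """Fraction of the baseline error removed by the new system."""
    return (baseline_error - new_error) / baseline_error

units = to_trigraphs("cat")
# Hypothetical error rates: 10.0% for monographs, 5.4% for trigraphs.
reduction = relative_error_reduction(0.100, 0.054)
```

Under this definition, a drop from a 10.0% to a 5.4% error rate would count as a relative reduction of about 46%, even though the absolute reduction is only 4.6 percentage points.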
Context-Dependent Modeling in a Segment-Based Speech Recognition System
- S.M. thesis, MIT
, 1997
Cited by 5 (0 self)
The goal of this thesis is to explore various strategies for incorporating contextual information into a segment-based speech recognition system, while maintaining computational costs at a level acceptable for implementation in a real-time system. The latter is achieved by using context-independent models in the search, while context-dependent models are reserved for re-scoring the hypotheses proposed by the context-independent system. Within this framework, several types of context-dependent sub-word units were evaluated, including word-dependent, biphone, and triphone units. In each case, deleted interpolation was used to compensate for the lack of training data for the models. Other types of context-dependent modeling, such as context-dependent boundary modeling and "offset" modeling, were also used successfully in the re-scoring pass. The evaluation of the system was performed using the Resource Management task. Context-dependent segment models were able to reduce the error rate of t...
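Deleted interpolation smooths a sparsely trained context-dependent estimate with a more robust context-independent one, using a weight tuned on data withheld from training. The sketch below is a simplified version under stated assumptions: the caller supplies the likelihood of each held-out sample under both models, a single weight is estimated by EM, and the rotation over several deleted blocks is omitted. The function names are hypothetical and the thesis's actual procedure may differ.

```python
def estimate_weight(p_detailed, p_general, iterations=50, weight=0.5):
    """EM estimate of the interpolation weight on held-out samples.

    p_detailed[i] and p_general[i] are the likelihoods of held-out
    sample i under the context-dependent and context-independent
    models. Assumes the two are never both zero for the same sample.
    """
    for _ in range(iterations):
        # E-step: posterior that each sample came from the detailed model.
        posteriors = [
            weight * d / (weight * d + (1.0 - weight) * g)
            for d, g in zip(p_detailed, p_general)
        ]
        # M-step: the new weight is the average posterior.
        weight = sum(posteriors) / len(posteriors)
    return weight

def interpolate(p_detailed, p_general, weight):
    """Smoothed probability used at recognition time."""
    return weight * p_detailed + (1.0 - weight) * p_general

# Hypothetical held-out likelihoods where the detailed model fits better.
lam = estimate_weight([0.8, 0.9], [0.1, 0.2])
smoothed = interpolate(0.2, 0.1, 0.5)
```

When the detailed model explains the held-out data well, the weight moves toward 1; when it was trained on too little data and generalizes poorly, the weight falls back toward the context-independent model.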
Making an Effective Use of . . .
, 2007
Automatic recognition of continuous speech has been acknowledged as one of the most challenging problems today. The performance of a continuous speech recognition system depends heavily on the availability of sufficient speech data and transcripts of good quality. In most cases, however, carefully prepared in-domain data is not easy to obtain because collecting a large amount of transcribed speech data is normally a time-consuming and expensive process. An acoustic model trained without sufficient training data is less capable of handling the complexity and variability of human speech, and thus performs poorly in real-world applications. This raises questions such as how to effectively exploit the given training data to improve the performance of recognition systems, and how to explore error-prone but informative data sources and incorporate them into acoustic model training. This thesis summarizes our efforts in investigating solutions to address the above issues. The work can be divided into two parts. We first investigate the Boosting algorithm, an ensemble-based supervised training approach, which iteratively creates multiple acoustic models with complementary error patterns by manipulating the distribution of training data. While a great deal of research has been conducted on Boosting-style acoustic model training, the techniques
Application of Triphone Clustering in Acoustic Modeling for Continuous Speech Recognition in Bengali
The performance of the acoustic models strongly affects the overall performance of any continuous speech recognition system. Hence generation of an accurate and robust acoustic model holds the key to satisfactory recognition performance. As phones are found to vary according to the position of occurrence within a particular word, context information is of prime importance in acoustic modeling of phonetic signals. In this paper we look at the effect of triphone-based acoustic modeling over monophone-based acoustic models in the context of continuous speech recognition in Bengali. Keeping in mind the lack of training resources for triphone-based acoustic modeling in Bengali, we also describe the method of generating triphone clusters using decision-tree-based techniques. These triphone clusters have then been used to generate tied-state triphone-based acoustic models to be used in a continuous speech recognizer.
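One step of decision-tree triphone clustering can be sketched as choosing the phonetic question that best splits a pool of triphones. The example below scores each question by the drop in count-weighted entropy of discrete output counts; this criterion is an assumption chosen for brevity (practical systems commonly use a Gaussian likelihood gain), and the phone symbols, question names, and function names are hypothetical rather than taken from the paper.

```python
import math

def weighted_entropy(counts):
    """Total count times entropy, in nats; 0.0 for an empty pool."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c * math.log(c / total) for c in counts if c > 0)

def pool(triphones, stats):
    """Sum the count vectors of the given triphones."""
    size = len(next(iter(stats.values())))
    pooled = [0] * size
    for t in triphones:
        pooled = [a + b for a, b in zip(pooled, stats[t])]
    return pooled

def best_question(stats, questions):
    """Pick the question with the largest entropy reduction.

    stats maps (left, centre, right) to output counts; questions maps
    a name to (side, phone_set) with side "left" or "right".
    Returns (gain, name, yes_triphones, no_triphones) or None.
    """
    triphones = list(stats)
    parent = weighted_entropy(pool(triphones, stats))
    best = None
    for name, (side, phones) in questions.items():
        idx = 0 if side == "left" else 2
        yes = [t for t in triphones if t[idx] in phones]
        no = [t for t in triphones if t[idx] not in phones]
        if not yes or not no:
            continue  # the question does not split this pool
        gain = (parent
                - weighted_entropy(pool(yes, stats))
                - weighted_entropy(pool(no, stats)))
        if best is None or gain > best[0]:
            best = (gain, name, yes, no)
    return best

# Hypothetical statistics for four triphones of the centre phone "a".
stats = {("m", "a", "t"): [9, 1], ("n", "a", "t"): [8, 2],
         ("p", "a", "t"): [1, 9], ("b", "a", "t"): [2, 8]}
questions = {"L_nasal": ("left", {"m", "n"}),
             "L_labial": ("left", {"m", "p", "b"})}
choice = best_question(stats, questions)
```

Applying the same selection recursively to the "yes" and "no" pools, and stopping when the gain or the amount of data falls below a threshold, yields the leaves whose members share a tied state.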

