A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition
- Proceedings of the IEEE
, 1989
Cited by 3117 (0 self)
Although initially introduced and studied in the late 1960s and early 1970s, statistical methods of Markov source or hidden Markov modeling have become increasingly popular in the last several years. There are two strong reasons why this has occurred. First, the models are very rich in mathematical structure and hence can form the theoretical basis for use in a wide range of applications. Second, the models, when applied properly, work very well in practice for several important applications. In this paper we attempt to carefully and methodically review the theoretical aspects of this type of statistical modeling and show how they have been applied to selected problems in machine recognition of speech.
Modeling Out-Of-Vocabulary Words For Robust Speech Recognition
, 2000
Cited by 43 (5 self)
This thesis concerns the problem of unknown or out-of-vocabulary (OOV) words in continuous speech recognition. Most of today's state-of-the-art speech recognition systems can recognize only words that belong to some predefined finite word vocabulary. When encountering an OOV word, a speech recognizer erroneously substitutes the OOV word with a similar-sounding word from its vocabulary. Furthermore, a recognition error due to an OOV word tends to spread errors into neighboring words, dramatically degrading overall recognition performance.
Lexical Modeling in a Speaker Independent Speech Understanding System
, 1993
Cited by 39 (8 self)
Over the past 40 years, significant progress has been made in the fields of speech recognition and speech understanding. Current state-of-the-art speech recognition systems are capable of achieving word-level accuracies of 90% to 95% on continuous speech recognition tasks using 5000 words. Even larger systems, capable of recognizing 20,000 words, are just now being developed. Speech understanding systems have recently been developed that perform fairly well within a restricted domain. While the size and performance of modern speech recognition and understanding systems are impressive, it is evident to anyone who has used these systems that the technology is primitive compared to our own human ability to understand speech. Some of the difficulties hampering progress in the fields of speech recognition and understanding stem from the many sources of variation that occur during human communication. One of the sources of variation that occurs in human communication is the different ways that words can be pronounced. There are many causes of pronunciation variation, such as: the phonetic environment in which the word occurs, the dialect of the speaker,
On-Line Cursive Handwriting Recognition Using Speech Recognition Methods
, 1994
Cited by 35 (5 self)
A hidden Markov model (HMM) based continuous speech recognition system is applied to on-line cursive handwriting recognition. The base system is unmodified except for using handwriting feature vectors instead of speech. Due to inherent properties of HMMs, segmentation of the handwritten script sentences is unnecessary. A 1.1% word error rate is achieved for a 3050 word lexicon, 52 character, writer-dependent task and 3%-5% word error rates are obtained for six different writers in a 25,595 word lexicon, 86 character, writer-dependent task. Similarities and differences between the continuous speech and on-line cursive handwriting recognition tasks are explored; the handwriting database collected over the past year is described; and specific implementation details of the handwriting system are discussed. 1. INTRODUCTION Traditionally, the first step in handwriting recognition is the segmentation of words into component characters [1]. However, in modern continuous speech recognition ef...
Modeling Linguistic Features in Speech Recognition
, 2003
Cited by 7 (2 self)
This paper explores a new approach to speech recognition in which sub-word units are modeled in terms of linguistic features. Specifically, we have adopted a scheme of modeling separately the manner and place of articulation for these units. A novelty of our work is the use of a generalized definition of place of articulation that enables us to map both vowels and consonants into a common linguistic space. Modeling manner and place separately also allows us to explore a multi-stage recognition architecture in which the search space is successively reduced as more detailed models are brought in. In the 8,000 word PhoneBook isolated word telephone speech recognition task, we show that such an approach can achieve a recognition WER that is 10% better than that achieved in the best results reported in the literature. This performance gain comes with improvements in search space and computation time as well.
A Robust Loose Coupling for Speech Recognition and Natural Language Understanding
- IEEE, Bob O'Hara and Al
, 1995
Cited by 4 (0 self)
The focus of this thesis proposal is to improve the ability of a computational system to understand spoken utterances in a dialogue with a human. Available computational methods for word recognition do not perform as well on spontaneous speech as we would hope. Even a state-of-the-art recognizer achieves slightly worse than 70% word accuracy on (nearly) spontaneous speech in a conversation about a specific problem. To address this problem, I will explore novel methods for post-processing the output of a speech recognizer in order to correct errors. I adopt statistical techniques for modeling the noisy channel from the speaker to the listener in order to correct some of the errors introduced there. The statistical model accounts for frequent errors such as simple word/word confusions and short phrasal problems (one-to-many word substitutions and many-to-one word concatenations). To use the model, a search algorithm is required to find the most likely correction of a given word sequence ...
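The noisy-channel idea in this abstract can be sketched roughly as follows. This is a toy illustration only, not the thesis's method: the probability tables `CHANNEL` and `UNIGRAM`, the function name `correct`, and the exhaustive search are all assumptions made up for the example, and only word-for-word confusions are handled (not the one-to-many or many-to-one cases the abstract mentions).

```python
import math
from itertools import product
from typing import Dict, List, Sequence

# Hypothetical toy tables, invented for illustration only.
# CHANNEL[intended][recognized] = P(recognized word | intended word)
CHANNEL: Dict[str, Dict[str, float]] = {
    "show": {"show": 1.0},
    "me": {"me": 1.0},
    "flights": {"flights": 0.8, "lights": 0.2},
    "lights": {"lights": 0.9, "flights": 0.1},
}
# UNIGRAM[word] = P(word) under a (very crude) language model.
UNIGRAM: Dict[str, float] = {"show": 0.3, "me": 0.3, "flights": 0.38, "lights": 0.02}


def correct(observed: Sequence[str]) -> List[str]:
    """Return the intended word sequence maximizing P(intended) * P(observed | intended)."""
    # For each recognized word, collect the intended words that could have produced it.
    # Words unknown to the channel table are kept unchanged.
    options = [
        [w for w, outs in CHANNEL.items() if outs.get(o, 0.0) > 0.0] or [o]
        for o in observed
    ]

    def log_score(intended: Sequence[str]) -> float:
        total = 0.0
        for w, o in zip(intended, observed):
            total += math.log(UNIGRAM.get(w, 1e-6))            # language model term
            total += math.log(CHANNEL.get(w, {}).get(o, 1.0))  # channel term
        return total

    # Exhaustive search over all combinations; fine for a toy, too slow for real use.
    return list(max(product(*options), key=log_score))


corrected = correct(["show", "me", "lights"])
```

With these made-up numbers, the recognized word "lights" is replaced by "flights" because the language-model prior outweighs the channel penalty. A realistic system would need a stronger language model and a more efficient search.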
Phonetic And Prosodic Analysis Of Speech
- Modern Modes of Man-Machine Communication
, 1994
Cited by 1 (0 self)
In order to cope with the problems of spontaneous speech (including, for example, hesitations and non-words) it is necessary to extract from the speech signal all information it contains. Modeling of words by segmental units should be supported by suprasegmental units since valuable information is represented in the prosody of an utterance. We present an approach to flexible and efficient modeling of speech by segmental units and describe extraction and use of suprasegmental information. Keywords: speech recognition, hidden Markov models, prosody. INTRODUCTION: This paper presents an approach towards statistical modeling and use of segmental and suprasegmental information in a speech signal. We treat the aspects of word recognition and improvement of linguistic analysis by suprasegmental information. Sect. 1 gives an account of acoustic-phonetic analysis in the ISADORA system for word recognition. It will be demonstrated that it is general enough to also include prosodic informati...
Towards a Unified Framework for Sub-lexical and Supra-lexical Linguistic Modeling
, 2002
Conversational interfaces have received much attention as a promising natural communication channel between humans and computers. A typical conversational interface consists of three major systems: speech understanding, dialog management and spoken language generation. In such a conversational interface, speech recognition as the front-end of speech understanding remains one of the fundamental challenges for establishing robust and effective human/computer communications. On the one hand, the speech recognition component in a conversational interface lives in a rich system environment. Diverse sources of knowledge are available and can potentially be beneficial to its robustness and accuracy. For example, the natural language understanding component can provide linguistic knowledge in syntax and semantics that helps constrain the recognition search space. On the other hand, the speech recognition component also faces the challenge of spontaneous speech, and it is important to address the casualness of speech using the knowledge sources available. For example, sub-lexical linguistic information would be very useful in providing linguistic support for previously unseen words, and dynamic reliability modeling may help improve recognition robustness for poorly articulated speech.
Integration of Diverse Recognition Methodologies Through Reevaluation of N-Best Sentence Hypotheses
This paper describes a general formalism for integrating two or more speech recognition technologies, which could be developed at different research sites using different recognition strategies. In this formalism, one system uses the N-best search strategy to generate a list of candidate sentences; the list is rescored by other systems; and the different scores are combined to optimize performance. Specifically, we report on combining the BU system based on stochastic segment models and the BBN system based on hidden Markov models. In addition to facilitating integration of different systems, the N-best approach results in a large reduction in computation for word recognition using the stochastic segment model.
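The N-best rescoring scheme described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `rescore_nbest`, the use of a fixed weighted sum to combine scores, and the example sentences and log-scores are all hypothetical. The abstract says only that the scores "are combined to optimize performance", so the linear combination here is an assumed choice.

```python
from typing import Dict, List, Sequence, Tuple


def rescore_nbest(
    nbest: Sequence[Tuple[str, float]],
    other_scores: Dict[str, float],
    weights: Tuple[float, float] = (0.5, 0.5),
) -> List[Tuple[str, float]]:
    """Combine first-pass scores with a second system's scores and re-rank.

    nbest: (sentence, score) pairs from the system that generated the list.
    other_scores: score assigned to each listed sentence by the rescoring system.
    weights: assumed linear combination weights (first system, second system).
    """
    w_first, w_second = weights
    combined = [
        (sentence, w_first * first_score + w_second * other_scores[sentence])
        for sentence, first_score in nbest
    ]
    # Higher combined score is better (scores here are log-likelihood-like values).
    return sorted(combined, key=lambda item: item[1], reverse=True)


# Made-up example: three candidate sentences with invented log-scores.
nbest = [
    ("show me lights", -11.5),
    ("show me flights", -12.0),
    ("so me flights", -14.0),
]
second_system = {
    "show me lights": -13.0,
    "show me flights": -10.0,
    "so me flights": -15.0,
}
ranked = rescore_nbest(nbest, second_system)
best_sentence = ranked[0][0]
```

In this invented example the first system prefers "show me lights", but the combined score ranks "show me flights" first. Because the second system scores only the listed candidates instead of searching the full hypothesis space, its cost scales with the list length, which is consistent with the computational saving the abstract reports.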
Human-Machine Collaboration for Rapid Speech Transcription
, 2007
Inexpensive storage and sensor technologies are yielding a new generation of massive multimedia datasets. The exponential growth in storage and processing power makes it possible to collect more data than ever before, yet without appropriate content annotation for search and analysis such corpora are of little use. While advances in data mining and machine learning have helped to automate some types of analysis, the need for human annotation still exists and remains expensive. The Human Speechome Project is a heavily data-driven longitudinal study of language acquisition. More than 100,000 hours of audio and video recordings have been collected over a two-year period to trace one child’s language development at home. A critical first step in analyzing this corpus is to obtain high-quality transcripts of all speech heard and produced by the child. Unfortunately, automatic speech transcription has proven to be inadequate for these recordings, and manual transcription with existing tools is extremely labor-intensive and therefore expensive. A new human-machine collaborative system for rapid speech transcription has been developed

