Results 1 - 10
of
11
Learning String Edit Distance
, 1997
"... In many applications, it is necessary to determine the similarity of two strings. A widely-used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one string into the other. In this report, we provide a stochastic mo ..."
Abstract
-
Cited by 158 (2 self)
- Add to MetaCart
In many applications, it is necessary to determine the similarity of two strings. A widely-used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one string into the other. In this report, we provide a stochastic model for string edit distance. Our stochastic model allows us to learn a string edit distance function from a corpus of examples. We illustrate the utility of our approach by applying it to the difficult problem of learning the pronunciation of words in conversational speech. In this application, we learn a string edit distance with nearly one fifth the error rate of the untrained Levenshtein distance. Our approach is applicable to any string classification problem that may be solved using a similarity function against a database of labeled prototypes.
The Use of Context in Large Vocabulary Speech Recognition
, 1995
"... decide which contexts are similar and can share parameters. A key feature of this approach is that it allows the construction of models which are dependent upon contextual effects occurring across word boundaries. The use of cross word context dependent models presents problems for conventional dec ..."
Abstract
-
Cited by 93 (0 self)
- Add to MetaCart
decide which contexts are similar and can share parameters. A key feature of this approach is that it allows the construction of models which are dependent upon contextual effects occurring across word boundaries. The use of cross word context dependent models presents problems for conventional decoders. The second part of the thesis therefore presents a new decoder design which is capable of using these models efficiently. The decoder is suitable for use with very large vocabularies and long span language models. It is also capable of generating a lattice of word hypotheses with little computational overhead. These lattices can be used to constrain further decoding, allowing efficient use of complex acoustic and language models. The effectiveness of these techniques has been assessed on a variety of large vocabulary continuous speech recognition tasks and results are presented which analyse performance in terms of computational complexity and recognition accuracy. The experiments dem
Error-responsive feedback mechanisms for speech recognizers
, 1997
"... This thesis is about modeling, analyzing, and predicting errorful behavior in large vocabulary continuous speech recognition systems. Because today's state-of-the-art recognizers are not designed to be situated naturally in an error feedback loop, they are ill-positioned for inclusion in multi-modal ..."
Abstract
-
Cited by 37 (4 self)
- Add to MetaCart
This thesis is about modeling, analyzing, and predicting errorful behavior in large vocabulary continuous speech recognition systems. Because today's state-of-the-art recognizers are not designed to be situated naturally in an error feedback loop, they are ill-positioned for inclusion in multi-modal interfaces, multi-media databases, and other interesting applications. I make improvements to the current approach to predicting and analyzing error behaviors, which is currently based only on the measurement ofword error rate. The speech recognizer's functionality is extended to include con dence annotations, which are \meta-level " markings that indicate how certain the recognizer is that it has decoded its input correctly. This is accomplished by feeding externally de ned error conditions back to the recognizer. Error feedback enables the construction of statistical models that map measurements of the recognizer's internal states and behaviors to externally de ned error conditions.
Automatic Generation Of Detailed Pronunciation Lexicons
, 1995
"... We explore different ways of "spelling" a word in a speech recognizer's lexicon and how to obtain those spellings. In particular, we compare using as the source of sub-words units for which we build acoustic models (1) a coarse phonemic representation, (2) a single, fine phonetic realization, and (3 ..."
Abstract
-
Cited by 36 (3 self)
- Add to MetaCart
We explore different ways of "spelling" a word in a speech recognizer's lexicon and how to obtain those spellings. In particular, we compare using as the source of sub-words units for which we build acoustic models (1) a coarse phonemic representation, (2) a single, fine phonetic realization, and (3) multiple phonetic realizations with associated likelihoods. We describe how we obtain these different pronunciations from text-to-speech systems and from procedures that build decision trees trained on phonetically-labeled corpora. We evaluate these methods applied to speech recognition with the DARPA Resource Management (RM) and the North American Business News (NAB) tasks. For the RM task (with perplexity 60 grammar), we obtain 93.4% word accuracy using phonemic pronunciations, 94.1% using a single phonetic pronunciation per word, and 96.3% using multiple phonetic pronunciations per word with associated likelihoods. For the NAB task (with 60K vocabulary and 34M 1-5 grams), we obtain 87.3% word accuracy with phonemic pronunciations and 90.0% using multiple phonetic pronunciations
Dynamic Programming Search for Continuous Speech Recognition
, 1999
"... . Initially introduced in the late 1960s and early 1970s, dynamic programming algorithms have become increasingly popular in automatic speech recognition. There are two reasons why this has occurred: First, the dynamic programming strategy can be combined with avery e#cient and practical pruning str ..."
Abstract
-
Cited by 30 (0 self)
- Add to MetaCart
. Initially introduced in the late 1960s and early 1970s, dynamic programming algorithms have become increasingly popular in automatic speech recognition. There are two reasons why this has occurred: First, the dynamic programming strategy can be combined with avery e#cient and practical pruning strategy so that very large search spaces can be handled. Second, the dynamic programming strategy has turned out to be extremely #exible in adapting to new requirements. Examples of such requirements are the lexical tree organization of the pronunciation lexicon and the generation of a word graph instead of the single best sentence. In this paper, we attempt to systematically review the use of dynamic programming search strategies for small#vocabulary and large#vocabulary continuous speech recognition. The following methods are described in detail: search using a linear lexicon, search using a lexical tree, language-model look-ahead and word graph generation. 1 Introduction Search strategie...
Towards Multi-Domain Speech Understanding with Flexible and Dynamic Vocabulary
, 2001
"... In developing telephone-based conversational systems, we foresee future systems capable of supporting multiple domains and flexible vocabulary. Users can pursue several topics of interest within a single telephone call, and the system is able to switch transparently among domains within a single dia ..."
Abstract
-
Cited by 14 (3 self)
- Add to MetaCart
In developing telephone-based conversational systems, we foresee future systems capable of supporting multiple domains and flexible vocabulary. Users can pursue several topics of interest within a single telephone call, and the system is able to switch transparently among domains within a single dialog. This system is able to detect the presence of any out-of-vocabulary (OOV) words, and automatically hypothesizes each of their pronunciation, spelling and meaning. These can be confirmed with the user and the new words are subsequently incorporated into the recognizer lexicon for future use. This thesis
On the Determinization of Weighted Finite Automata
- SIAM J. Comput
, 1998
"... . We study determinization of weighted finite-state automata (WFAs), which has important applications in automatic speech recognition (ASR). We provide the first polynomial-time algorithm to test for the twins property, which determines if a WFA admits a deterministic equivalent. We also provide ..."
Abstract
-
Cited by 13 (0 self)
- Add to MetaCart
. We study determinization of weighted finite-state automata (WFAs), which has important applications in automatic speech recognition (ASR). We provide the first polynomial-time algorithm to test for the twins property, which determines if a WFA admits a deterministic equivalent. We also provide a rigorous analysis of a determinization algorithm of Mohri, with tight bounds for acyclic WFAs. Given that WFAs can expand exponentially when determinized, we explore why those used in ASR tend to shrink. The folklore explanation is that ASR WFAs have an acyclic, multi-partite structure. We show, however, that there exist such WFAs that always incur exponential expansion when determinized. We then introduce a class of WFAs, also with this structure, whose expansion depends on the weights: some weightings cause them to shrink, while others, including random weightings, cause them to expand exponentially. We provide experimental evidence that ASR WFAs exhibit this weight dependence. ...
Large Vocabulary Continuous Speech Recognition: from Laboratory Systems towards Real-World Applications
, 1996
"... This paper provides an overview of the state-of-the-art in laboratory speaker-independent, large vocabulary continuous speech recognition (LVCSR) systems with a view towards adapting such technology to the requirements of real-world applications. While in speech recognition the principal concern is ..."
Abstract
-
Cited by 6 (4 self)
- Add to MetaCart
This paper provides an overview of the state-of-the-art in laboratory speaker-independent, large vocabulary continuous speech recognition (LVCSR) systems with a view towards adapting such technology to the requirements of real-world applications. While in speech recognition the principal concern is to transcribe the speech signal as a sequence of words, the same core technology can be applied to domains other than dictation. The main topics addressed are acoustic-phonetic modeling, lexical representation, language modeling, decoding and model adaptation. After a brief summary of experimental results some directions towards usable systems are given. In moving from laboratory systems towards real-world applications, different constraints arise which influence the system design. The application imposes limitations on computational resources, constraints on signal capture, requirements for noise and channel compensation, and rejection capability. The difficulties and costs of adapting existing technology to new languages and application need to be assessed. Near term applications for LVCSR technology are likely to grow in somewhat limited domains such as spoken language systems for information retrieval, and limited domain dictation. Perspectives on some unresolved problems are given, indicating areas for future research
A Surficial Pronunciation Model
- In: Proc. of the ESCA Workshop ‘Modeling Pronunciation Variation for Automatic Speech Recognition’ (see [87
, 1998
"... We argue for a surficial pronunciation model: a model without underlying forms. The surficial model outperforms a traditional generative model by a significant margin on conversational speech (Switchboard) as well as on read speech (TIMIT). Our results suggest that the true mapping from underlying f ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
We argue for a surficial pronunciation model: a model without underlying forms. The surficial model outperforms a traditional generative model by a significant margin on conversational speech (Switchboard) as well as on read speech (TIMIT). Our results suggest that the true mapping from underlying forms to surface forms is too complex to be accurately modeled using current techniques, and that we would be best served to model the surface forms directly.
Shrinking Language Models by Robust Approximation
- in Proc. IEEE Int'l. Conf. on Acoustics, Speech, and Signal Processing '98
, 1998
"... We study the problem of reducing the size of a language model while preserving recognition performance (accuracy and speed). A successful approach has been to represent language models by weighted finite-state automata (WFAs). Analogues of classical automata determinization and minimization algorith ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
We study the problem of reducing the size of a language model while preserving recognition performance (accuracy and speed). A successful approach has been to represent language models by weighted finite-state automata (WFAs). Analogues of classical automata determinization and minimization algorithms then provide a general method to produce smaller but equivalent WFAs. We extend this approach by introducing the notion of approximate determinization. We provide an algorithm that, when applied to language models for the North American Business task, achieves 25--35% size reduction compared to previous techniques, with negligible effects on recognition time and accuracy. 1. INTRODUCTION An important goal of language model engineering is to produce small language models that guarantee fast and accurate automatic speech recognition (ASR). In practice we see tradeoffs: e.g., in size vs. accuracy and in accuracy vs. speed. There has been recent progress, however, on automatic methods for r...

