Results 1 - 10 of 15
COMBINING KNOWLEDGE SOURCES TO REORDER N-BEST SPEECH HYPOTHESIS LISTS
, 1994
Cited by 40 (13 self)
A simple and general method is described that can combine different knowledge sources to reorder N-best lists of hypotheses produced by a speech recognizer. The method is automatically trainable, acquiring information from both positive and negative examples. In experiments, the method was tested on a 1000-utterance sample of unseen ATIS data.
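As a rough illustration of what reordering an N-best list with several knowledge sources can look like, here is a minimal sketch. It assumes a weighted linear sum of per-source scores; the abstract does not state the combination function or the training procedure, so `rerank`, the source names (`acoustic`, `grammar`), the weights, and the example scores are all hypothetical.

```python
from typing import Dict, List, Tuple

# A hypothesis is (text, {knowledge-source name: score}); higher scores are better.
Hypothesis = Tuple[str, Dict[str, float]]


def rerank(nbest: List[Hypothesis], weights: Dict[str, float]) -> List[str]:
    """Return hypothesis texts sorted by weighted sum of source scores, best first.

    Assumption: a linear combination; sources without a weight contribute 0.
    """
    def combined(hyp: Hypothesis) -> float:
        _, scores = hyp
        return sum(weights.get(name, 0.0) * value for name, value in scores.items())

    return [text for text, _ in sorted(nbest, key=combined, reverse=True)]


# Hypothetical two-item N-best list with made-up log-scores.
nbest = [
    ("show me flights to boston", {"acoustic": -10.0, "grammar": 0.0}),
    ("show me flights to austin", {"acoustic": -9.5, "grammar": -3.0}),
]

# Using the acoustic score alone prefers the second hypothesis;
# adding the (hypothetical) grammar source promotes the first.
acoustic_only = rerank(nbest, {"acoustic": 1.0})
reranked = rerank(nbest, {"acoustic": 1.0, "grammar": 0.5})
```

In a trainable setup the weights would be fitted from examples of correct and incorrect hypotheses rather than set by hand as they are here.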
Genones: Generalized Mixture Tying in Continuous Hidden Markov Model-Based Speech Recognizers
- IEEE Transactions on Speech and Audio Processing
, 1996
Cited by 36 (7 self)
An algorithm is proposed that achieves a good trade-off between modeling resolution and robustness by using a new, general scheme for tying of mixture components in continuous mixture-density hidden Markov model (HMM)-based speech recognizers. The sets of HMM states that share the same mixture components are determined automatically using agglomerative clustering techniques. Experimental results on ARPA's Wall Street Journal corpus show that this scheme reduces errors by 25% over typical tied-mixture systems. New fast algorithms for computing Gaussian likelihoods--the most time-consuming aspect of continuous-density HMM systems--are also presented. These new algorithms significantly reduce the number of Gaussian densities that are evaluated with little or no impact on speech recognition accuracy.
Corresponding Author: Vassilios Digalakis. Address: Electronic and Computer Engineering Department, Technical University of Crete, Kounoupidiana, Chania, 73100 GREECE. Phone: +30-821...
Hierarchical search for large vocabulary conversational speech recognition
- IEEE Signal Processing Magazine
, 1999
Cited by 15 (5 self)
Speaker-independent speech recognition technology has made significant progress from the days of isolated word recognition. Today, state-of-the-art systems are capable of performing large vocabulary continuous speech recognition (LVCSR) on audio streams derived from complex information sources such as broadcast news and two-way telephone dialogs. A significant contribution to this advancement in technology is the development of search techniques that find suboptimal but accurate solutions in problems involving large search spaces and extremely complex statistical models. Moreover, these search strategies are capable of dynamically integrating information from a number of diverse knowledge sources to determine the correct word hypothesis, and limit the scope of the search by using a hierarchical search strategy. We refer to this problem as the decoding or search problem. This paper describes the complexity associated with decoding using hierarchical representations for linguistic and acoustic knowledge sources. An extensible object-oriented decoder, available in the public domain, that leverages current state-of-the-art technology is described to illustrate these concepts. This decoder supports efficient handling of acoustic models for cross-word context-dependent phones, multiple pronunciations of words using lexical trees, and rescoring of word graphs based on N-gram language models in a single pass. It employs a state-of-the-art Viterbi-style dynamic programming algorithm, and is equipped with several heuristic pruning criteria to minimize the consumption of computational resources while maintaining good accuracy.
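To make the idea of Viterbi-style dynamic programming with heuristic pruning concrete, the following is a toy sketch, not the decoder the abstract describes. It assumes a plain discrete HMM and a simple beam criterion (drop any state whose log score falls more than `beam` below the current best); the function name `viterbi_beam`, the two-state model, and all probabilities are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def viterbi_beam(obs: List[str], states: List[str],
                 log_init: Dict[str, float],
                 log_trans: Dict[str, Dict[str, float]],
                 log_emit: Dict[str, Dict[str, float]],
                 beam: float = 10.0) -> List[str]:
    """Return the best state path found by Viterbi search with beam pruning."""
    # active maps each surviving state to (log score, best path ending there).
    active: Dict[str, Tuple[float, List[str]]] = {
        s: (log_init[s] + log_emit[s][obs[0]], [s]) for s in states
    }
    for o in obs[1:]:
        # Beam pruning: keep only states within `beam` of the best score.
        best = max(score for score, _ in active.values())
        active = {s: v for s, v in active.items() if v[0] >= best - beam}
        extended: Dict[str, Tuple[float, List[str]]] = {}
        for s in states:
            # Pick the best surviving predecessor for state s.
            score, path = max(
                ((prev_score + log_trans[p][s] + log_emit[s][o], prev_path)
                 for p, (prev_score, prev_path) in active.items()),
                key=lambda cand: cand[0],
            )
            extended[s] = (score, path + [s])
        active = extended
    return max(active.values(), key=lambda v: v[0])[1]


def logs(table: Dict[str, float]) -> Dict[str, float]:
    """Convert a probability table to log probabilities."""
    return {k: math.log(v) for k, v in table.items()}


# Hypothetical two-state model, for illustration only.
states = ["Healthy", "Fever"]
log_init = logs({"Healthy": 0.6, "Fever": 0.4})
log_trans = {"Healthy": logs({"Healthy": 0.7, "Fever": 0.3}),
             "Fever": logs({"Healthy": 0.4, "Fever": 0.6})}
log_emit = {"Healthy": logs({"normal": 0.5, "cold": 0.4, "dizzy": 0.1}),
            "Fever": logs({"normal": 0.1, "cold": 0.3, "dizzy": 0.6})}

wide_path = viterbi_beam(["normal", "cold", "dizzy"], states,
                         log_init, log_trans, log_emit, beam=50.0)
narrow_path = viterbi_beam(["normal", "cold", "dizzy"], states,
                           log_init, log_trans, log_emit, beam=0.1)
```

A narrow beam does less work per frame but can discard the state that lies on the globally best path, which is why such search is described as finding suboptimal but accurate solutions. In this small example both beam widths happen to return the same path.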
Improving Language Models by Clustering Training Sentences
, 1994
Cited by 13 (0 self)
Many of the kinds of language model used in speech understanding suffer from imperfect modeling of intra-sentential contextual influences. I argue that this problem can be addressed by clustering the sentences in a training corpus automatically into subcorpora on the criterion of entropy reduction, and calculating separate language model parameters for each cluster. This kind of clustering offers a way to represent important contextual effects and can therefore significantly improve the performance of a model. It also offers a reasonably automatic means to gather evidence on whether a more complex, context-sensitive model using the same general kind of linguistic information is likely to reward the effort that would be required to develop it: if clustering improves the performance of a model, this proves the existence of further context dependencies, not exploited by the unclustered model. As evidence for these claims, I present results showing that clustering improves some models but not others for the ATIS domain. These results are consistent with other findings for such models, suggesting that the existence or otherwise of an improvement brought about by clustering is indeed a good pointer to whether it is worth developing further the unclustered model.
Combining Linguistic with Statistical Methods in Automatic Speech Understanding
, 1994
Cited by 11 (0 self)
this paper will argue, combining knowledge and techniques from the two communities can yield results that neither community alone could achieve
Language-Processing Strategies and Mixed-Initiative Dialogues
, 1999
Cited by 11 (2 self)
We describe an implemented spoken-language dialogue system for a travel-planning domain, which accesses a commercially available travel-information web-server and supports a flexible mixed-initiative dialogue strategy. We argue, based on data from initial Wizard-of-Oz experiments, that mixed-initiative strategies are appropriate for many types of user, but require more sophisticated architectures for processing of language and dialogue; we then use these observations to motivate an architecture which combines parallel deep and shallow natural language analysis engines and an agenda-driven dialogue manager. We outline the top-level processing strategy used by the dialogue manager, and also a novel formalism, which we call Flat Utterance Description, that allows us to reduce the output of the deep and shallow language-processing engines to a common representation.
Spoken Language Translation With Mid-90's Technology: A Case Study
, 1993
Cited by 10 (6 self)
We describe the architecture of the Spoken Language Translator (SLT), a prototype speech translation system which can translate queries from spoken English to spoken Swedish in the domain of air travel information systems. Though the performance given the level of effort so far has been extremely encouraging, more work is needed to provide a technology that will support widespread applications. With this goal, we have developed techniques for rapid development and for evaluation. These techniques allow us to estimate the level of effort required to achieve higher levels of performance.
Language Processing For Spoken Dialogue Systems: Is Shallow Parsing Enough?
- IN ESCA ETRW WORKSHOP ON ACCESSING INFORMATION IN SPOKEN AUDIO
, 1999
Cited by 9 (1 self)
With maturing speech technology, spoken dialogue systems are increasingly moving from research prototypes to fielded systems. The fielded systems however generally employ much simpler linguistic and dialogue processing strategies than the research prototypes. We describe an implemented spoken-language dialogue system for a travel planning domain which supports a mixed initiative dialogue strategy. The system accesses a commercially available travel information web-server. The system architecture combines both shallow and deep linguistic processors, partly so that a robust if shallow analysis is always available to the dialogue manager, and partly so that we can begin to examine where significant gains can be made by employing more advanced linguistic processing. We present the results of a preliminary investigation using data from a Wizard of Oz experiment. The results lend limited support to our original hypothesis that deep linguistic processing will prove useful at points where the ...
Efficient Scalable Encoding for Distributed Speech Recognition
- IEEE Transactions on Speech and Audio Processing, Submitted
, 2003
Cited by 6 (3 self)
In this paper the remote speech recognition problem is addressed. Speech features are extracted at a client and transmitted to a remote recognizer. This enables a low complexity client, which does not have the computational and memory resources to host a complex speech recognizer, to make use of distributed resources to provide speech recognition services to the user. The novelties of the proposed work are (i) the extracted features are compressed using scalable encoding techniques providing a multi-resolution bitstream, and (ii) a complete scalable distributed speech recognition (DSR) system is presented wherein the proposed scalable encoding technique is combined with a scalable recognition system. The scalable DSR system provides successive approximation in terms of recognition performance (i.e., as additional bits are transmitted, the recognition can be refined to improve the performance) and achieves both bandwidth and complexity (latency) reductions. The proposed encoding schemes are well suited to be implemented on light-weight mobile devices where varying ambient conditions and limited computational capabilities pose a severe constraint in achieving good recognition performance. The scalable DSR system is capable of adapting to the varying network, system and user constraints by operating at the "right" trade-off point between transmission rate, recognition performance and complexity to provide good quality of service (QoS) to the user. The system was tested using two case studies. In the first, the scalable encoder along with a dynamic time warping-hidden Markov model (DTW-HMM) system reduced the recognition complexity by 25% compared to a system using only an HMM, with no degradation in word error rate (WER). In the second study, a distributed two-...
Techniques for modelling Phonological Processes in Automatic Speech Recognition
, 2001
Cited by 6 (0 self)
Declaration: This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration, except where stated. It has not been submitted in whole or part for a degree at any other university. The length of this thesis including footnotes and appendices does not exceed 29,500 words and includes no more than 40 figures.
Systems which automatically transcribe carefully dictated speech are now commercially available, but their performance degrades dramatically when the speaking style of users becomes more relaxed or conversational. This dissertation focuses on techniques that aim to improve the robustness of statistical speech transcription systems to conversational speaking styles. The dissertation shows first that the performance degradation occurring as speech becomes more conversational is severe and is partially attributable to differences in the acoustic realizations of sentences. Hypothesizing that the quantifiably wider range of

