Unlimited vocabulary speech recognition based on morphs discovered in an unsupervised manner
- in Proc. Eurospeech, 2003
- Cited by 43 (20 self)
We study continuous speech recognition based on sub-word units found in an unsupervised fashion. For agglutinative languages like Finnish, traditional word-based n-gram language modeling does not work well due to the huge number of different word forms. We use a method based on the Minimum Description Length principle to split words statistically into subword units, allowing efficient language modeling and unlimited vocabulary. The perplexity and speech recognition experiments on Finnish speech data show that the resulting model outperforms both word- and syllable-based trigram models. Compared to the word trigram model, the out-of-vocabulary rate is reduced from 20% to 0% and the word error rate from 56% to 32%.
Uncertainty decoding for noise robust speech recognition
- in Proc. Interspeech, 2004
- Cited by 26 (8 self)
This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration. It has not been submitted in whole or in part for a degree at any other university. Some of the work has been published previously in conference proceedings
Incremental Language Models For Speech Recognition Using Finite-State Transducers
- in Proc. IEEE Automatic Speech Recognition and Understanding Workshop, Madonna di Campiglio, 2001
- Cited by 10 (2 self)
to speech recognition, we investigate a novel decoding strategy to deal with very large n-gram language models often used in large-vocabulary systems. In particular, we present an alternative to full, static expansion and optimization of the finite-state transducer network. This alternative is useful when the individual knowledge sources, modeled as transducers, are too large to be composed and optimized. While the recognition decoder perceives a single, weighted finite-state transducer, we apply a divide-and-conquer technique to split the language model into two parts which add up exactly to the original language model. We investigate the merits of these 'incremental language models' and present some initial results.
Statistical Modelling in Continuous Speech Recognition (CSR)
- in Conference on Uncertainty in Artificial Intelligence, 2001
- Cited by 7 (1 self)
Automatic continuous speech recognition (CSR) is sufficiently
A Comparison Of Two LVR Search Optimization Techniques
- in Proc. Int. Conf. Spoken Language Processing, 2002
- Cited by 5 (0 self)
This paper presents a detailed comparison between two search optimization techniques for large vocabulary speech recognition -- one based on word-conditioned tree search (WCTS) and one based on weighted finite-state transducers (WFSTs). Existing North American Business News systems from RWTH and AT&T, representing each of the two approaches, were modified to remove variations in model data and acoustic likelihood computation. An experimental comparison showed that the WFST-based system explored fewer search states and had less runtime overhead than the WCTS-based system for a given word error rate. This is attributed to differences in the pre-compilation, degree of non-determinism, and path weight distribution in the respective search graphs.
Towards Formal Structural Representation of Spoken Language: An Evolving Transformation System (ETS) Approach
- 2005
- Cited by 5 (0 self)
Speech recognition has been a very active area of research over the past twenty years. Despite evident progress, it is generally agreed by the practitioners of the field that the performance of current speech recognition systems is rather suboptimal and new approaches are needed. The motivation behind the undertaken research is an observation that the notion of representation of objects and concepts, which was once considered central in the early days of pattern recognition, has been largely marginalised by the advent of statistical approaches. As a consequence of a predominantly statistical approach to the speech recognition problem, due to the numeric, feature vector-based nature of representation, the classes inductively discovered from real data using decision-theoretic techniques have little meaning outside the statistical framework. This is because decision surfaces or probability distributions are difficult to analyse linguistically. Because of the latter limitation, it is doubtful that the gap between speech recognition and linguistic research can be bridged by the numeric representations. This thesis investigates an alternative, structural, approach to spoken language representation and categorisa-
An efficient one-pass decoder for Finnish large vocabulary continuous speech recognition
- in Proceedings of Second Baltic Conference on Human Language Technologies, 2005
- Cited by 5 (1 self)
This paper describes the design of a one-pass large vocabulary continuous speech recognition decoder aimed at Finnish. The decoder is based on the popular time-synchronous beam search approach, extended to handle some language-dependent issues. For the construction of the static part of the search network a new algorithm is presented, which enables efficient use of shared-state HMMs. The search network also includes statically expanded cross-word triphone contexts, which increase the memory requirements only modestly. The beam search is enhanced with a bigram language model look-ahead technique, implemented using simple table lookups and an efficient caching scheme. Compared to our previous decoder, the new design achieves a 24% relative reduction in phoneme error rate, with near real-time performance.
Juicer: A Weighted Finite-State Transducer speech decoder
- Cited by 3 (2 self)
A major component in the development of any speech recognition system is the decoder. As task complexities and, consequently, system complexities have continued to increase, the decoding problem has become an increasingly significant component in the overall speech recognition system development effort, with efficient decoder design contributing significantly to improving the trade-off between decoding time and search errors. In this paper we present the "Juicer" (from transducer) large vocabulary continuous speech recognition (LVCSR) decoder based on weighted finite-state transducers (WFSTs). We begin with a discussion of the need for open source, state-of-the-art decoding software in LVCSR research and how this led to the development of Juicer, followed by a brief overview of decoding techniques and major issues in decoder design. We present Juicer and its major features, emphasising its potential not only as a critical component in the development of LVCSR systems, but also as an important research tool in itself, being based around the flexible WFST paradigm. We also provide results of benchmarking tests that have been carried out to date, demonstrating that in many respects Juicer, while still in its early development, is already achieving state-of-the-art performance. These benchmarking tests not only demonstrate the utility of Juicer in its present state, but are also being used to guide future development; hence, we conclude with a brief discussion of some of the extensions that are currently under way or being considered for Juicer.
Decoder Issues in Unlimited Finnish Speech Recognition
- 2004
- Cited by 2 (1 self)
In contrast to continuous speech recognition systems which utilize a fixed vocabulary to limit the search, practically unlimited vocabulary recognition can be achieved by constructing the recognition result from sub-word units. This paper discusses some important points to consider in subword-based decoders, especially when recognizing languages with heavy use of inflections and compound words. A decoder design implemented to achieve unlimited vocabulary Finnish speech recognition is also described.
unknown title
- 2006
lattice search technique for a long-contextual-span hidden trajectory model of speech

