OpenFst: a general and efficient weighted finite-state transducer library
- in Proceedings of the Ninth International Conference on Implementation and Application of Automata (CIAA 2007), 2007
Cited by 36 (4 self)
Abstract. We describe OpenFst, an open-source library for weighted finite-state transducers (WFSTs). OpenFst consists of a C++ template library with efficient WFST representations and over twenty-five operations for constructing, combining, optimizing, and searching them. At the shell-command level, there are corresponding transducer file representations and programs that operate on them. OpenFst is designed to be both very efficient in time and space and to scale to very large problems. This library has key applications in speech, image, and natural language processing, pattern and string matching, and machine learning. We give an overview of the library, examples of its use, and details of its design that allow customizing the labels, states, and weights and the lazy evaluation of many of its operations. Further information and a download of the OpenFst library can be obtained from
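As a rough illustration of what "combining" WFSTs means in the abstract above, the following is a minimal sketch of composition for two epsilon-free transducers with tropical-semiring weights (path weights add). It is not OpenFst's API or implementation; the tuple-based arc format, the `compose` function, and the toy transducers are assumptions made for this example only.

```python
from collections import defaultdict

# Hypothetical arc format for this sketch (not OpenFst's):
#   (source, input_label, output_label, weight, destination)
# Weights are tropical: combining two arcs along a path adds their weights.

def compose(arcs_a, start_a, finals_a, arcs_b, start_b, finals_b):
    """Compose epsilon-free transducers A and B.

    finals_a / finals_b map each final state to its final weight.
    Returns (arcs, start, finals) over paired states (state_a, state_b).
    """
    by_src_a = defaultdict(list)
    for arc in arcs_a:
        by_src_a[arc[0]].append(arc)
    by_src_b = defaultdict(list)
    for arc in arcs_b:
        by_src_b[arc[0]].append(arc)

    start = (start_a, start_b)
    arcs, finals = [], {}
    stack, seen = [start], {start}
    while stack:
        qa, qb = state = stack.pop()
        # A paired state is final only if both components are final.
        if qa in finals_a and qb in finals_b:
            finals[state] = finals_a[qa] + finals_b[qb]
        for (_, in_label, mid_a, w_a, next_a) in by_src_a[qa]:
            for (_, mid_b, out_label, w_b, next_b) in by_src_b[qb]:
                # A's output label must match B's input label.
                if mid_a != mid_b:
                    continue
                nxt = (next_a, next_b)
                arcs.append((state, in_label, out_label, w_a + w_b, nxt))
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return arcs, start, finals

# Toy example: A rewrites "a" to "b", B rewrites "b" to "c".
a_arcs = [(0, "a", "b", 0.5, 1)]
b_arcs = [(0, "b", "c", 1.0, 1)]
arcs, start, finals = compose(a_arcs, 0, {1: 0.0}, b_arcs, 0, {1: 0.25})
```

In this toy case the composed machine has a single arc mapping "a" to "c" with weight 1.5. Only state pairs reachable from the start pair are built, which loosely mirrors the on-demand construction that lazy evaluation relies on; a real library also has to handle epsilon labels, which this sketch ignores.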
On the integration of speech recognition and statistical machine translation
- Proc. European Conf. on Speech Communication and Technology, 2005
Cited by 9 (0 self)
This paper focuses on the interface between speech recognition and machine translation in a speech translation system. Based on a thorough theoretical framework, we exploit word lattices of automatic speech recognition hypotheses as input to our translation system, which is based on weighted finite-state transducers. We show that acoustic recognition scores of the recognized words in the lattices positively and significantly affect the translation quality. In experiments, we have found consistent improvements on three different corpora in comparison with translations of single best recognized results. In addition, we build and evaluate a fully integrated speech translation model.
Integrated Chinese Word Segmentation in Statistical Machine Translation
Cited by 7 (1 self)
A Chinese sentence is represented as a sequence of characters, and words are not separated from each other. In statistical machine translation, the conventional approach is to segment the Chinese character sequence into words during pre-processing. The training and translation are performed afterwards. However, this method is not optimal for two reasons: (1) the segmentations may be erroneous; (2) for a given character sequence, the best segmentation depends on its context and translation. In order to minimize the translation errors, we take different segmentation alternatives instead of a single segmentation into account and integrate the segmentation process with the search for the best translation. The segmentation decision is only taken during the generation of the translation. With this method we are able to translate Chinese text at the character level. The experiments on the IWSLT 2005 task showed improvements in the translation performance using two translation systems: a phrase-based system and a finite-state-transducer-based system. For the phrase-based system, the improvement of the BLEU score is 1.5% absolute.
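To make the notion of "segmentation alternatives" in the abstract above concrete, here is a small sketch that enumerates every way to split a character sequence into words from a lexicon. It is not the paper's method, which integrates segmentation into the translation search; the `segmentations` function and the four-entry lexicon are hypothetical and chosen only to show an ambiguous input.

```python
def segmentations(chars, lexicon, start=0):
    """Return every way to split chars[start:] into words found in lexicon."""
    if start == len(chars):
        return [[]]  # exactly one way to segment the empty remainder
    results = []
    for end in range(start + 1, len(chars) + 1):
        word = chars[start:end]
        if word in lexicon:
            # Extend each segmentation of the remainder with this word.
            for rest in segmentations(chars, lexicon, end):
                results.append([word] + rest)
    return results

# Hypothetical toy lexicon; a real system would use a large word list.
lexicon = {"研究", "研究生", "生命", "命"}
alternatives = segmentations("研究生命", lexicon)
```

For this input the sketch finds two alternatives, 研究 + 生命 and 研究生 + 命. A pre-processing step would have to commit to one of them, whereas keeping both leaves the choice to a later stage.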
Modified MPE/MMI in a transducer-based framework
- in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, 2009
Cited by 4 (4 self)
In this paper we show how common training criteria such as MPE or MMI can be extended to incorporate a margin term. In addition, a transducer-based training implementation is presented, which covers a large variety of discriminative training criteria for ASR, including the standard MMI, MPE, and MCE criteria, as well as the modifications to these criteria presented here. The modified criteria are directly related to the conventional large margin formulation of SVMs. In the proposed approach, we can take advantage of the generalization guarantees of large margin classifiers while keeping the existing framework for the discriminative training, including the efficient algorithms for conventional MPE or MMI. On the conceptual side, this allows for a direct evaluation of the margin term. Finally, experimental results are presented for different large vocabulary continuous speech recognition tasks (one of which is trained on a very large amount of training data) using these modified criteria. Index Terms: training criteria, large margin, weighted finite state transducer, speech recognition
Juicer: A Weighted Finite-State Transducer speech decoder
Cited by 3 (2 self)
Abstract. A major component in the development of any speech recognition system is the decoder. As task complexities and, consequently, system complexities have continued to increase, the decoding problem has become an increasingly significant component in the overall speech recognition system development effort, with efficient decoder design contributing significantly to improving the trade-off between decoding time and search errors. In this paper we present the "Juicer" (from transducer) large vocabulary continuous speech recognition (LVCSR) decoder based on weighted finite-state transducers (WFSTs). We begin with a discussion of the need for open source, state-of-the-art decoding software in LVCSR research and how this led to the development of Juicer, followed by a brief overview of decoding techniques and major issues in decoder design. We present Juicer and its major features, emphasising its potential not only as a critical component in the development of LVCSR systems, but also as an important research tool in itself, being based around the flexible WFST paradigm. We also provide results of benchmarking tests that have been carried out to date, demonstrating that in many respects Juicer, while still in its early development, is already achieving state-of-the-art performance. These benchmarking tests not only demonstrate the utility of Juicer in its present state, but are also being used to guide future development; hence, we conclude with a brief discussion of some of the extensions that are currently under way or being considered for Juicer.
The RWTH Aachen University Open Source Speech Recognition System
Cited by 3 (1 self)
We announce the public availability of the RWTH Aachen University speech recognition toolkit. The toolkit includes state-of-the-art speech recognition technology for acoustic model training and decoding. Speaker adaptation, speaker adaptive training, unsupervised training, a finite state automata library, and an efficient tree search decoder are notable components. Comprehensive documentation, example setups for training and recognition, and a tutorial are provided to support newcomers. Index Terms: speech recognition, LVCSR, software
Automatic Sentence Structure Annotation for Spoken Language Processing
, 2008
Cited by 1 (0 self)
Increasing amounts of easily available electronic data are precipitating a need for automatic processing
that can aid humans in digesting large amounts of data. Speech and video are becoming
an increasingly significant portion of on-line information, from news and television broadcasts, to
oral histories, on-line lectures, or user generated content. Automatic processing of audio and video
sources requires automatic speech recognition (ASR) in order to provide transcripts. Typical ASR
generates only words, without punctuation, capitalization, or further structure. Many techniques
available from natural language processing therefore suffer when applied to speech recognition output,
because they assume the presence of reliable punctuation and structure. In addition, errors from
automatic transcription also degrade the performance of downstream processing such as machine
translation, name detection, or information retrieval. We develop approaches for automatically
annotating structure in speech, including sentence and sub-sentence segmentation, and then turn
towards optimizing ASR and annotation for downstream applications.
Compositions of Top-down Tree Transducers with ε-rules
Cited by 1 (1 self)
Abstract. Top-down tree transducers with ε-rules (εtdtt) are a restricted version of extended top-down tree transducers. They are implemented in the framework Tiburon and fulfill some criteria desirable in a machine translation model. However, they compute a class of transformations that is not closed under composition (not even for linear and nondeleting εtdtt). A composition construction that composes εtdtt M and N is presented. It is correct whenever (i) M has at most one output symbol in each rule, (ii) M is deterministic or N is linear, and (iii) M is total or N is nondeleting. This corresponds nicely to a classical composition result by Baker.
unknown title
, 2007
Strengths and weaknesses of finite-state technology: a case study in morphological grammar development
Signal Speech
◮ Further development by several PhD students at i6
◮ Today: standard system for all ASR research topics and projects
◮ Very flexible and extendable
◮ Framework also used for machine translation, video / image processing

