Openfst: a general and efficient weighted finite-state transducer library
- in Proceedings of the Ninth International Conference on Implementation and Application of Automata (CIAA 2007), 2007
Cited by 36 (4 self)
We describe OpenFst, an open-source library for weighted finite-state transducers (WFSTs). OpenFst consists of a C++ template library with efficient WFST representations and over twenty-five operations for constructing, combining, optimizing, and searching them. At the shell-command level, there are corresponding transducer file representations and programs that operate on them. OpenFst is designed to be both very efficient in time and space and to scale to very large problems. This library has key applications in speech, image, and natural language processing, pattern and string matching, and machine learning. We give an overview of the library, examples of its use, and details of its design that allow customizing the labels, states, and weights and the lazy evaluation of many of its operations. Further information and a download of the OpenFst library can be obtained from
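To give a rough idea of what "combining" and "searching" WFSTs means, the sketch below composes two tiny epsilon-free transducers over the tropical semiring (weights add along a path, and the best path is the cheapest) and then finds the cost of the cheapest accepting path. It is plain Python and does not use the OpenFst API; the dictionary layout and the names `compose` and `shortest_distance` are hypothetical, chosen only for illustration.

```python
import heapq
from collections import defaultdict

# Toy WFST layout (an assumption for this sketch, not OpenFst's format):
#   {"start": state, "finals": {state: weight},
#    "arcs": {state: [(ilabel, olabel, weight, next_state), ...]}}

def compose(a, b):
    """Compose two epsilon-free WFSTs; tropical weights add along matched arcs."""
    start = (a["start"], b["start"])
    arcs, finals = defaultdict(list), {}
    stack, seen = [start], {start}
    while stack:
        qa, qb = stack.pop()
        if qa in a["finals"] and qb in b["finals"]:
            finals[(qa, qb)] = a["finals"][qa] + b["finals"][qb]
        for i, mid_a, w1, na in a["arcs"].get(qa, []):
            for mid_b, o, w2, nb in b["arcs"].get(qb, []):
                if mid_a != mid_b:
                    continue  # output of A must match input of B
                nxt = (na, nb)
                arcs[(qa, qb)].append((i, o, w1 + w2, nxt))
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return {"start": start, "finals": finals, "arcs": dict(arcs)}

def shortest_distance(fst):
    """Dijkstra over the WFST; assumes non-negative weights."""
    inf = float("inf")
    dist = {fst["start"]: 0.0}
    heap, tie, best = [(0.0, 0, fst["start"])], 1, inf
    while heap:
        d, _, q = heapq.heappop(heap)
        if d > dist.get(q, inf):
            continue  # stale queue entry
        if q in fst["finals"]:
            best = min(best, d + fst["finals"][q])
        for _i, _o, w, n in fst["arcs"].get(q, []):
            if d + w < dist.get(n, inf):
                dist[n] = d + w
                heapq.heappush(heap, (d + w, tie, n))
                tie += 1  # tie-breaker so states are never compared
    return best

# A rewrites "a" to "x" (cost 1.0) or "y" (cost 0.5); B rewrites "x"/"y" to "X"/"Y".
A = {"start": 0, "finals": {1: 0.0},
     "arcs": {0: [("a", "x", 1.0, 1), ("a", "y", 0.5, 1)]}}
B = {"start": 0, "finals": {1: 0.0},
     "arcs": {0: [("x", "X", 0.2, 1), ("y", "Y", 1.5, 1)]}}
C = compose(A, B)
print(shortest_distance(C))
```

In this toy case the composed machine has two paths, a:X and a:Y, and the search picks the cheaper one. A production library additionally handles epsilon transitions, other semirings, and lazy (on-demand) construction, none of which this sketch attempts.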
A multi-pass, dynamic-vocabulary approach to real-time, large-vocabulary speech recognition
- in Proc. of INTERSPEECH, 2005
Cited by 5 (0 self)
We present a multi-pass approach to real-time, large-vocabulary speech recognition in which we dynamically manipulate the vocabulary between passes. For recognition tasks where subsets of the vocabulary can be triggered by the occurrences of other words or phrases, a combination of unknown word modelling and vocabulary refinement can be utilized to attack large-vocabulary tasks with relatively small active vocabularies. We evaluate this approach within the JUPITER weather information domain by enabling recognition of all 30,000 city-state pairs within the USA. By maximally precompiling the static and dynamic portions of our search space using finite-state transducers (FSTs), we splice dynamic-vocabulary components on-demand during decoding with negligible speed impact while enforcing cross-word context-dependent constraints. We find that a dynamic-vocabulary system can compete quite favorably with a single-pass, large-vocabulary system. For even larger vocabularies (e.g., street addresses), static compilation may be infeasible, making a dynamic-vocabulary approach necessary.
Discriminative training of hierarchical acoustic models for large vocabulary continuous speech recognition
Cited by 3 (2 self)
In this paper we propose discriminative training of hierarchical acoustic models for large vocabulary continuous speech recognition tasks. After presenting our hierarchical modeling framework, we describe how the models can be generated with either Minimum Classification Error or large-margin training. Experiments on a large vocabulary lecture transcription task show that the hierarchical model can yield more than 1.0% absolute word error rate reduction over non-hierarchical models for both kinds of discriminative training. Index Terms: hierarchical acoustic modeling, discriminative training, LVCSR
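For readers unfamiliar with the metric, the sketch below shows the standard word error rate computation (word-level edit distance divided by the number of reference words) and the difference between an absolute and a relative reduction. The function name, example sentences, and the 20%/19% figures are made up for illustration and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)

# One deleted word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")

# Hypothetical systems: baseline at 20% WER, improved system at 19% WER.
baseline, improved = 0.20, 0.19
absolute_reduction = baseline - improved               # one percentage point
relative_reduction = (baseline - improved) / baseline  # five percent of the baseline
print(wer, absolute_reduction, relative_reduction)
```

An "absolute" reduction is a difference in percentage points, so the same absolute gain is a larger relative gain when the baseline error rate is lower.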
Festival Multisyn Voices for the 2007 Blizzard Challenge
Cited by 2 (1 self)
This paper describes selected aspects of the Festival Multisyn entry to the Blizzard Challenge 2007. We provide an overview of the process of building the three required voices from the speech data provided. This paper focuses on new features of Multisyn which are currently under development and which have been employed in the system used for this Blizzard Challenge. These features are the application of a more flexible phonetic lattice representation during forced alignment labelling and the use of a pitch accent target cost component. Finally, we also examine aspects of the speech data provided for this year’s Blizzard Challenge and raise certain issues for discussion concerning the aim of comparing voices made with differing subsets of the data provided.
Automatic lexical pronunciations generation and update
- in Proc. of ASRU, Kyoto, 2007
Cited by 2 (1 self)
Most automatic speech recognizers use a dictionary that maps words to one or more canonical pronunciations. Such entries are typically hand-written by lexical experts. In this research, we investigate a new approach for automatically generating lexical pronunciations using a linguistically motivated subword model, and refining the pronunciations with spoken examples. The approach is evaluated on an isolated word recognition task with a 2k lexicon of restaurant and street names. A letter-to-sound model is first used to generate seed baseforms for the lexicon. Then spoken utterances of words in the lexicon are presented to a subword recognizer and the top hypotheses are used to update the lexical baseforms. The spelling of each word is also used to constrain the subword search space and generate spelling-constrained baseforms. The results obtained are quite encouraging and indicate that our approach can be successfully used to learn valid pronunciations of new words. Index Terms: letter-to-sound model, lexical pronunciations
A Back-off Discriminative Acoustic Model for Automatic Speech Recognition
Cited by 2 (2 self)
In this paper we propose a back-off discriminative acoustic model for Automatic Speech Recognition (ASR). We use a set of broad phonetic classes to divide the classification problem originating from context-dependent modeling into a set of subproblems. By appropriately combining the scores from classifiers designed for the subproblems, we can guarantee that the back-off acoustic score for different context-dependent units will be different. The back-off model can be combined with discriminative training algorithms to further improve the performance. Experimental results on a large vocabulary lecture transcription task show that the proposed back-off discriminative acoustic model achieves more than a 2.0% absolute word error rate reduction compared to a clustering-based acoustic model. Index Terms: context-dependent acoustic modeling, back-off acoustic models, discriminative training,
Bibliographic Meta-Data Extraction Using Probabilistic Finite State Transducers
Cited by 1 (1 self)
We present the application of probabilistic finite state transducers to the task of bibliographic meta-data extraction from scientific references. By using the transducer approach, which is often applied successfully in computational linguistics, we obtain a trainable and modular framework. This results in simplicity, flexibility, and easy adaptability to changing requirements. An evaluation on the Cora dataset, which serves as a common benchmark for accuracy measurements, yields a word accuracy of 88.5%, a field accuracy of 82.6%, and an instance accuracy of 42.7%. Based on a comparison to other published results, we conclude that our system performs second best on the given dataset using a conceptually simple approach and implementation.
INTERSPEECH 2007 New Word Acquisition Using Subword Modeling
In this paper, we use subword modeling to learn the pronunciations and spellings of new words. The subwords are generated with a context-free grammar, and are intermediate units between phonemes and syllables. We first evaluate the effectiveness of the subword model in automatically generating the spelling and pronunciation of new words. Then the subword model is embedded in a multi-stage recognizer which consists of word, subword, and letter recognizers. In a preliminary set of experiments, the hybrid system outperforms a large-vocabulary isolated word recognizer. The subword model is also used to improve the performance of the letter recognizer by generating a spelling cohort which is used to train a small letter n-gram. The small letter n-gram has a reduced perplexity compared to a much larger n-gram, and can be used by the letter recognizer for the spoken spelling mode. This could translate to an improved letter error rate in future letter recognition experiments. Index Terms: subword modeling, new word acquisition
INTERSPEECH 2010 Learning New Word Pronunciations from Spoken Examples
A lexicon containing explicit mappings between words and pronunciations is an integral part of most automatic speech recognizers (ASRs). While many ASR components can be trained or adapted using data, the lexicon is one of the few that typically remains static until experts make manual changes. This work takes a step towards alleviating the need for manual intervention by integrating a popular grapheme-to-phoneme conversion technique with acoustic examples to automatically learn high-quality baseform pronunciations for unknown words. We explore two models in a Bayesian framework, and discuss their individual advantages and shortcomings. We show that both are able to generate better-than-expert pronunciations with respect to word error rate on an isolated word recognition task. Index Terms: grapheme-to-phoneme conversion, pronunciation models, lexical representation
INTERSPEECH 2011 Pronunciation Learning from Continuous Speech
This paper explores the use of continuous speech data to learn stochastic lexicons. Building on previous work in which we augmented graphones with acoustic examples of isolated words, we extend our pronunciation mixture model framework to two domains containing spontaneous speech: a weather information retrieval spoken dialogue system and the academic lectures domain. We find that our learned lexicons outperform expert, hand-crafted lexicons in each domain. Index Terms: grapheme-to-phoneme conversion, pronunciation models, lexical representation

