Results 1 - 10
of
72
Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data
- IN ICML
, 2004
"... In sequence modeling, we often wish to represent complex interaction between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when longrange dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linear-chain cond ..."
Abstract
-
Cited by 88 (10 self)
- Add to MetaCart
In sequence modeling, we often wish to represent complex interaction between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when longrange dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linear-chain conditional random fields (CRFs) in which each time slice contains a set of state variables and edges---a distributed state representation as in dynamic Bayesian networks (DBNs)---and parameters are tied across slices. Since exact
Discriminative language modeling with conditional random fields and the perceptron algorithm
- In Proc. ACL
, 2004
"... This paper describes discriminative language modeling for a large vocabulary speech recognition task. We contrast two parameter estimation methods: the perceptron algorithm, and a method based on conditional random fields (CRFs). The models are encoded as deterministic weighted finite state automata ..."
Abstract
-
Cited by 41 (4 self)
- Add to MetaCart
This paper describes discriminative language modeling for a large vocabulary speech recognition task. We contrast two parameter estimation methods: the perceptron algorithm, and a method based on conditional random fields (CRFs). The models are encoded as deterministic weighted finite state automata, and are applied by intersecting the automata with word-lattices that are the output from a baseline recognizer. The perceptron algorithm has the benefit of automatically selecting a relatively small feature set in just a couple of passes over the training data. However, using the feature set output from the perceptron algorithm (initialized with their weights), CRF training provides an additional 0.5 % reduction in word error rate, for a total 1.8 % absolute reduction from the baseline of 39.2%. 1
Lattice-based search for spoken utterance retrieval
- In Proceedings of HLT-NAACL 2004
, 2004
"... Recent work on spoken document retrieval has suggested that it is adequate to take the singlebest output of ASR, and perform text retrieval on this output. This is reasonable enough for the task of retrieving broadcast news stories, where word error rates are relatively low, and the stories are long ..."
Abstract
-
Cited by 32 (8 self)
- Add to MetaCart
Recent work on spoken document retrieval has suggested that it is adequate to take the singlebest output of ASR, and perform text retrieval on this output. This is reasonable enough for the task of retrieving broadcast news stories, where word error rates are relatively low, and the stories are long enough to contain much redundancy. But it is patently not reasonable if one’s task is to retrieve a short snippet of speech in a domain where WER’s can be as high as 50%; such would be the situation with teleconference speech, where one’s task is to find if and when a participant uttered a certain phrase. In this paper we propose an indexing procedure for spoken utterance retrieval that works on lattices rather than just single-best text. We demonstrate that this procedure can improve F scores by over five points compared to singlebest retrieval on tasks with poor WER and low redundancy. The representation is flexible so that we can represent both word lattices, as well as phone lattices, the latter being important for improving performance when searching for phrases containing OOV words. 1
A Weighted Finite State Transducer Implementation of the Alignment Template Model for Statistical Machine Translation
, 2003
"... We present a derivation of the alignment template model for statistical machine translation and an implementation of the model using weighted finite state transducers. The approach we describe allows us to implement each constituent distribution of the model as a weighted finite state transduc ..."
Abstract
-
Cited by 26 (3 self)
- Add to MetaCart
We present a derivation of the alignment template model for statistical machine translation and an implementation of the model using weighted finite state transducers. The approach we describe allows us to implement each constituent distribution of the model as a weighted finite state transducer or acceptor. We show that bitext word alignment and translation under the model can be performed with standard FSM operations involving these transducers.
Composition of Conditional Random Fields for Transfer Learning
- PROCEEDINGS OF HLT/EMNLP
, 2005
"... Many learning tasks have subtasks for which much training data exists. Therefore, we want to transfer learning from the old, generalpurpose subtask to a more specific new task, for which there is often less data. While work in transfer learning often considers how the old task should affect learning ..."
Abstract
-
Cited by 21 (1 self)
- Add to MetaCart
Many learning tasks have subtasks for which much training data exists. Therefore, we want to transfer learning from the old, generalpurpose subtask to a more specific new task, for which there is often less data. While work in transfer learning often considers how the old task should affect learning on the new task, in this paper we show that it helps to take into account how the new task affects the old. Specifically, we perform joint decoding of separately-trained sequence models, preserving uncertainty between the tasks and allowing information from the new task to affect predictions on the old task. On two standard text data sets, we show that joint decoding outperforms cascaded decoding.
Automatic Recognition of Spontaneous Speech for Access to Multilingual Oral History Archives
- IEEE Transactions on Speech and Audio Processing
, 2004
"... Abstract—Much is known about the design of automated systems to search broadcast news, but it has only recently become possible to apply similar techniques to large collections of spontaneous speech. This paper presents initial results from experiments with speech recognition, topic segmentation, to ..."
Abstract
-
Cited by 20 (6 self)
- Add to MetaCart
Abstract—Much is known about the design of automated systems to search broadcast news, but it has only recently become possible to apply similar techniques to large collections of spontaneous speech. This paper presents initial results from experiments with speech recognition, topic segmentation, topic categorization, and named entity detection using a large collection of recorded oral histories. The work leverages a massive manual annotation effort on 10 000 h of spontaneous speech to evaluate the degree to which automatic speech recognition (ASR)-based segmentation and categorization techniques can be adapted to approximate decisions made by human annotators. ASR word error rates near 40 % were achieved for both English and Czech for heavily accented, emotional and elderly spontaneous speech based on 65–84 h of transcribed speech. Topical segmentation based on shifts in the recognized English vocabulary resulted in 80 % agreement with
Segmental minimum Bayes-risk decoding for automatic speech recognition
- IEEE Transactions on Speech and Audio Processing
, 2003
"... Abstract—Minimum Bayes-Risk (MBR) speech recognizers have been shown to yield improvements over the conventional maximum a-posteriori probability (MAP) decoders through N-best list rescoring and search over word lattices. We present a Segmental Minimum Bayes-Risk decoding (SMBR) framework that simpl ..."
Abstract
-
Cited by 20 (6 self)
- Add to MetaCart
Abstract—Minimum Bayes-Risk (MBR) speech recognizers have been shown to yield improvements over the conventional maximum a-posteriori probability (MAP) decoders through N-best list rescoring and search over word lattices. We present a Segmental Minimum Bayes-Risk decoding (SMBR) framework that simplifies the implementation of MBR recognizers through the segmentation of the N-best lists or lattices over which the recognition is to be performed. This paper presents lattice cutting procedures that underly SMBR decoding. Two of these procedures are based on a risk minimization criterion while a third one is guided by word-level confidence scores. In conjunction with SMBR decoding, these lattice segmentation procedures give consistent improvements in recognition word error rate (WER) on the Switchboard corpus. We also discuss an application of risk-based lattice cutting to multiple-system SMBR decoding and show that it is related to other system combination techniques such as ROVER. This strategy combines lattices produced from multiple ASR systems and is found to give WER improvements in a Switchboard evaluation system. Index Terms—ASR system combination, extended-ROVER, lattice cutting, minimum Bayes-risk decoding, segmental minimum
Voice Signatures
- In Proc. IEEE Automatic Speech Recognition and Understanding Workshop
, 2003
"... of words from a speaker's voice. This largely ignores other useful information that can be inferred from speech such as gender, age, dialect, or emotion. These characteristics of a speaker's voice, voice signatures, whether static or dynamic, can be useful for speech mining applications or for the d ..."
Abstract
-
Cited by 18 (4 self)
- Add to MetaCart
of words from a speaker's voice. This largely ignores other useful information that can be inferred from speech such as gender, age, dialect, or emotion. These characteristics of a speaker's voice, voice signatures, whether static or dynamic, can be useful for speech mining applications or for the design of a natural spoken-dialog system. This paper explores the problem of extracting automatically and accurately voice signatures from a speaker's voice.
FSA: An Efficient and Flexible C++ Toolkit for Finite State Automata Using On-Demand Computation
- IN: ACL PROCEEDINGS. (2004
, 2004
"... In this paper we present the RWTH FSA toolkit -- an efficient implementation of algorithms for creating and manipulating weighted finite-state automata. The toolkit has been designed using the principle of on-demand computation and offers a large range of widely used algorithms. To prove the superio ..."
Abstract
-
Cited by 14 (4 self)
- Add to MetaCart
In this paper we present the RWTH FSA toolkit -- an efficient implementation of algorithms for creating and manipulating weighted finite-state automata. The toolkit has been designed using the principle of on-demand computation and offers a large range of widely used algorithms. To prove the superior efficiency of the toolkit, we compare the implementation to that of other publically available toolkits. We also show that on-demand computations help to reduce memory requirements significantly without any loss in speed. To increase its flexibility, the RWTH FSA toolkit supports high-level interfaces to the programming language Python as well as a command-line tool for interactive manipulation of FSAs. Furthermore, we show how to utilize the toolkit to rapidly build a fast and accurate statistical machine translation system. Future extensibility of the toolkit is ensured as it will be publically available as open source software.
Hierarchical phrase-based translation with weighted finite state transducers and . . .
- IN PROCEEDINGS OF HLT/NAACL
, 2010
"... In this article we describe HiFST, a lattice-based decoder for hierarchical phrase-based translation and alignment. The decoder is implemented with standard Weighted Finite-State Transducer (WFST) operations as an alternative to the well-known cube pruning procedure. We find that the use of WFSTs ra ..."
Abstract
-
Cited by 14 (7 self)
- Add to MetaCart
In this article we describe HiFST, a lattice-based decoder for hierarchical phrase-based translation and alignment. The decoder is implemented with standard Weighted Finite-State Transducer (WFST) operations as an alternative to the well-known cube pruning procedure. We find that the use of WFSTs rather than k-best lists requires less pruning in translation search, resulting in fewer search errors, better parameter optimization, and improved translation performance. The direct generation of translation lattices in the target language can improve subsequent rescoring procedures, yielding further gains when applying long-span language models and Minimum Bayes Risk decoding. We also provide insights as to how to control the size of the search space defined by hierarchical rules. We show that shallow-n grammars, low-level rule catenation, and other search constraints can help to match the power of the translation system to specific language pairs.

