Lightly Supervised and Unsupervised Acoustic Model Training
- Computer Speech and Language, 2002
Cited by 34 (2 self)
The last decade has witnessed substantial progress in speech recognition technology, with today's state-of-the-art systems being able to transcribe unrestricted broadcast news audio data with a word error rate of about 20%.
Lightly Supervised Acoustic Model Training
- Proc. ISCA ITRW ASR2000, 2000
Cited by 20 (6 self)
Although tremendous progress has been made in speech recognition technology, with the capability of today's state-of-the-art systems to transcribe unrestricted continuous speech from broadcast data, these systems rely on the availability of large amounts of manually transcribed acoustic training data. Obtaining such data is both time-consuming and expensive, requiring trained human annotators with substantial amounts of supervision. In this paper we describe some recent experiments using lightly supervised techniques for acoustic model training in order to reduce the system development cost. The strategy we investigate uses a speech recognizer to transcribe unannotated broadcast news data, and optionally combines the hypothesized transcription with associated, but unaligned, closed captions or transcripts to create labeled training data. We show that this approach can dramatically reduce the cost of building acoustic models.
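The idea of combining a recognizer hypothesis with unaligned closed captions can be illustrated with a small sketch. This is not the authors' procedure: the function name `agreeing_spans`, the `min_words` threshold, and the use of Python's `difflib` for word alignment are all assumptions made for illustration. The sketch keeps only those word runs on which the hypothesis and the captions agree, which is one plausible way to select text that could serve as training labels.

```python
from difflib import SequenceMatcher


def agreeing_spans(hypothesis, captions, min_words=3):
    """Return word runs shared by a recognizer hypothesis and closed captions.

    Hypothetical helper: both inputs are plain strings, and agreement is
    judged on lowercased, whitespace-separated words.
    """
    hyp = hypothesis.lower().split()
    cap = captions.lower().split()
    # autojunk=False disables difflib's heuristic that ignores very frequent items.
    matcher = SequenceMatcher(a=hyp, b=cap, autojunk=False)
    spans = []
    for block in matcher.get_matching_blocks():
        # The final block is a zero-length sentinel; min_words filters it out.
        if block.size >= min_words:
            spans.append(hyp[block.a:block.a + block.size])
    return spans


if __name__ == "__main__":
    hyp_text = "the president said today that taxes will rise"
    cap_text = "president said today taxes will rise sharply"
    for span in agreeing_spans(hyp_text, cap_text):
        print(" ".join(span))
```

In a real system the agreeing spans would still need time alignment against the audio before they could be used for acoustic training; that step is outside the scope of this sketch.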
Generative and Discriminative Methods using Morphological Information for Sentence Segmentation of Turkish
Cited by 1 (0 self)
This paper presents novel methods for generative, discriminative, and hybrid sequence classification for segmentation of Turkish utterances into sentences. In the literature, this task is generally solved using statistical models that take advantage of lexical information, among other features. However, Turkish has a productive morphology that generates an exponentially large vocabulary, harming language models such as the established hidden event language model (HELM). We extend this model to a factored hidden event language model (fHELM) in order to take advantage of morphologically informed features in addition to the word sequence. Our results indicate that fHELMs yield a 26% reduction in error rate for Turkish broadcast news. Combining lexical, morphological, and prosodic information using these new models and discriminative classifiers (boosting and conditional random fields) results in significant performance improvements over any of the classifiers alone.
Large Vocabulary Statistical Language Modeling for Continuous Speech
- In Proceedings of the 7th European Conference on Speech Communication and Technology, 2001
Statistical language modeling (SLM) is an essential part of any large-vocabulary continuous speech recognition (LVCSR) system. The development of the standard SLM methods has been strongly affected by the goals of LVCSR in English. The structure of Finnish is substantially different from that of English, so if the standard SLMs are applied directly, success is by no means guaranteed. In this paper we describe our first attempts at building an LVCSR system for Finnish and the new SLMs that we have tried. One of our objectives has been the indexing and recognition of broadcast news, so issues of particular interest are topic detection, word stemming, and modeling words that are poorly covered in the training data. Our new methods are based on neural computing using the self-organizing map (SOM), which has recently been shown to successfully extract and approximate latent semantic structures from massive text collections.

