Investigating text normalization and pronunciation variants for German broadcast transcription
- In ICSLP'2000, 2000
Cited by 9 (3 self)
In this paper we describe our ongoing work concerning lexical modeling in the LIMSI broadcast transcription system for German. Lexical decomposition is investigated with a twofold goal: lexical coverage optimization and improved letter-to-sound conversion. A set of about 450 decompounding rules, developed using statistics from a 300M word corpus, reduces the OOV rate from 4.5% to 4.0% on a 30k development text set. Adding partial inflection stripping, the OOV rate drops to 2.9%. For letter-to-sound conversion, decompounding reduces cross-lexeme ambiguities and thus contributes to more consistent pronunciation dictionaries. Another point of interest concerns reduced pronunciation modeling. Word error rates, measured on 1.3 hours of ARTE TV broadcast, vary between 18 and 24% depending on the show and the system configuration. Our experiments indicate that using reduced pronunciations slightly decreases word error rates. 1. INTRODUCTION The German language, more than other major western ...
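As a rough illustration of how an out-of-vocabulary (OOV) rate responds to decompounding, the sketch below measures the percentage of running words missing from a lexicon before and after splitting compounds. Everything in it is an assumption made for illustration: the tiny lexicon, the sample text, and the `split_compound` helper, which uses a naive two-part lexicon lookup rather than the roughly 450 hand-developed rules the abstract describes.

```python
def oov_rate(tokens, lexicon):
    """Percentage of running words in `tokens` that are not in `lexicon`."""
    if not tokens:
        return 0.0
    missing = sum(1 for tok in tokens if tok not in lexicon)
    return 100.0 * missing / len(tokens)


def split_compound(word, lexicon, min_part=3):
    """Hypothetical toy decompounder: split a word into two in-lexicon parts.

    Returns the first split whose two halves are both in the lexicon,
    or the word unchanged if no such split exists.
    """
    for cut in range(min_part, len(word) - min_part + 1):
        head, tail = word[:cut], word[cut:]
        if head in lexicon and tail in lexicon:
            return [head, tail]
    return [word]


# Illustrative data only, not taken from the paper.
lexicon = {"haus", "tür", "garten", "das", "ist"}
text = ["das", "ist", "haustür", "gartenhaus", "fenster"]

before = oov_rate(text, lexicon)
decompounded = [part for word in text for part in split_compound(word, lexicon)]
after = oov_rate(decompounded, lexicon)

print(f"OOV before decompounding: {before:.1f}%")
print(f"OOV after decompounding:  {after:.1f}%")
```

Note that decompounding changes the number of running words as well as the number of uncovered ones, so the two rates are computed over different token counts.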
Pronunciation Variants across System Configuration, Language and Speaking Style
- 1999
Cited by 7 (2 self)
This contribution aims at evaluating the use of pronunciation variants for different recognition system configurations, languages and speaking styles. This study is limited to the use of variants during speech alignment, given an orthographic transcription of the utterance and a phonemically represented lexicon, and is thus focused on the modeling capabilities of the acoustic word models. To measure the need for variants we have defined the variant2+ rate which is the percentage of words in the corpus not aligned with the most common phonemic transcription. This measure may be indicative of the possible need for pronunciation variants in the recognition system.
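The variant2+ rate defined above can be sketched as a short computation over alignment output. This is a minimal sketch under stated assumptions: the input is taken to be a list of (word, aligned phonemic variant) pairs, and "most common" is interpreted as the variant most frequently aligned for each word in the same corpus. The function name, the input format, and the sample transcriptions are hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict


def variant2plus_rate(aligned):
    """Percentage of word tokens not aligned with their most common variant.

    `aligned` is a list of (word, phonemic_variant) pairs, one per word
    token, as might be produced by a forced alignment (assumed format).
    """
    if not aligned:
        return 0.0
    # Count how often each variant was chosen for each word.
    counts = defaultdict(Counter)
    for word, variant in aligned:
        counts[word][variant] += 1
    # Most frequently aligned variant per word (assumed reading of "most common").
    top = {word: c.most_common(1)[0][0] for word, c in counts.items()}
    not_top = sum(1 for word, variant in aligned if variant != top[word])
    return 100.0 * not_top / len(aligned)


# Illustrative alignment output only.
aligned = [
    ("und", "Unt"),
    ("und", "Unt"),
    ("und", "Un"),
    ("haben", "ha:b@n"),
    ("haben", "ha:b@n"),
]
rate = variant2plus_rate(aligned)
print(f"variant2+ rate: {rate:.1f}%")
```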
Automatic Transcription Of Compressed Broadcast Audio
- in Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP’01), Vol. 1, 2001
Cited by 3 (0 self)
With increasing volumes of audio and video data broadcast over the web, it is of interest to assess the performance of state-of-the-art automatic transcription systems on compressed audio data for media indexation applications. In this paper the performance of the LIMSI 10x French broadcast news transcription system is measured on a two-hour audio set for a range of MP3 and RealAudio codecs at various bitrates and the GSM codec used for European cellular phone communications. The word error rates are compared with those obtained on high quality PCM recordings prior to compression. For a 6.5 kbps audio bit rate (the most commonly used on the web), word error rates under 40% can be achieved, which makes automatic media monitoring systems over the web a realistic task.
Broadcast News Transcription in Mandarin
- Proc. ICSLP'2000, 2000
Cited by 1 (1 self)
In this paper, our work in developing a Mandarin broadcast news transcription system is described. The main focus of this work is a port of the LIMSI American English broadcast news transcription system to the Chinese Mandarin language. The system consists of an audio partitioner and an HMM-based continuous speech recognizer. The acoustic models were trained on about 24 hours of data from the 1997 Hub4 Mandarin corpus available via LDC. In addition to the transcripts, the language models were trained on Mandarin Chinese News Corpus containing about 186 million characters. We investigate recognition performance as a function of lexical size, with and without tone in the lexicon, and with a topic dependent language model. The transcription character error rate on the DARPA 1997 test set is 18.1% using a lexicon with 3 tone levels and a topic-based language model. 1. INTRODUCTION It is well known that radio and television broadcast shows contain different types of speech from the acoust...
Recent Activities in Spoken Language Processing at LIMSI
- DARPA Continuous Speech Recognition Workshop, 1992
This paper summarizes recent activities at LIMSI in multilingual speech recognition and its applications. While the main goal of speech recognition is to provide a transcription of the speech signal as a sequence of words, the same basic technology serves as the first step in other application areas, such as in automatic systems for information access and for automatic indexation of audiovisual data. SPEECH RECOGNITION Speech recognition is principally concerned with the problem of transcribing the speech signal as a sequence of words. The LIMSI system, in common with most of today's state-of-the-art systems (4), makes use of statistical models of speech generation. From this point of view, message generation is represented by a language model which provides an estimate of the probability of any given word string, and the encoding of the message in the acoustic signal is represented by a probability density function (HMM). The speech decoding problem then consists of maximizing the...
Development of a Speech Recognition System for Spanish Broadcast News
This paper reports on the development process of a speech recognition system for Spanish broadcast news within the MESH FP6 project. The system uses the SONIC recognizer developed at the Center for Spoken Language Research (CSLR), University of Colorado. Acoustic and language models were trained using Hub4 broadcast news data. Experiments and evaluation results are reported.

