Results 1-7 of 7
The LIMSI Broadcast News Transcription System
- Speech Communication, 2002
Cited by 84 (5 self)
This paper reports on activities at LIMSI over the last few years directed at the transcription of broadcast news data. We describe our development work in moving from laboratory read speech data to real-world or "found" speech data in preparation for the ARPA Nov96, Nov97 and Nov98 evaluations. Two main problems needed to be addressed to deal with the continuous flow of inhomogeneous data. These concern the varied acoustic nature of the signal (signal quality, environmental and transmission noise, music) and different linguistic styles (prepared and spontaneous speech on a wide range of topics, spoken by a large variety of speakers).
Strategies for automatic segmentation of audio data
- In Proc. ICASSP, 2000
Cited by 31 (0 self)
In many applications, like indexing of broadcast news or surveillance applications, the input data consists of a continuous, unsegmented audio stream. Speech recognition technology, however, usually requires segments of relatively short length as input. For such applications, effective methods to segment continuous audio streams into homogeneous segments are required. In this paper, three different segmenting strategies (model-based, metric-based and energy-based) are compared on the same broadcast news test data. It is shown that model-based and metric-based techniques outperform the simpler energy-based algorithms. While model-based segmenters achieve a very high level of segment boundary precision, the metric-based segmenter performs better in terms of segment boundary recall (RCL). To combine the advantages of both strategies, a new hybrid algorithm is introduced. For this, the results of a preliminary metric-based segmentation are used to construct the models for the final model-based segmenter run. The new hybrid approach is shown to outperform the other segmenting strategies.
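The abstract does not say which distance the metric-based segmenter uses. As a hedged illustration only, the sketch below uses the Bayesian Information Criterion change test (Delta-BIC), a common choice for metric-based segmentation, on one-dimensional features with a single Gaussian per segment. The function name `detect_change`, the penalty weight `lam` and the `min_size` guard are assumptions for illustration, not details from the paper.

```python
import math
from typing import Optional, Sequence


def _log_var(xs: Sequence[float]) -> float:
    """Log of the maximum-likelihood variance, floored to avoid log(0)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return math.log(max(var, 1e-12))


def detect_change(xs: Sequence[float], lam: float = 1.0,
                  min_size: int = 10) -> Optional[int]:
    """Return the split index with the largest positive Delta-BIC, else None.

    Hypothetical sketch: 1-D features, one Gaussian per segment.
    Delta-BIC(i) = N/2 log s^2 - N1/2 log s1^2 - N2/2 log s2^2 - lam * P,
    where P = 0.5 * (d + d(d+1)/2) * log N, i.e. log N for d = 1.
    """
    n = len(xs)
    if n < 2 * min_size:
        return None  # too short to hold two segments of min_size frames
    penalty = lam * 0.5 * (1 + 1) * math.log(n)
    whole = 0.5 * n * _log_var(xs)
    best_i: Optional[int] = None
    best_score = 0.0  # only positive scores count as a detected change
    for i in range(min_size, n - min_size + 1):
        score = (whole
                 - 0.5 * i * _log_var(xs[:i])
                 - 0.5 * (n - i) * _log_var(xs[i:])
                 - penalty)
        if score > best_score:
            best_i, best_score = i, score
    return best_i
```

For example, with `quiet = [0.1 * ((2 * i) % 5 - 2) for i in range(50)]` and `loud = [5.0 + 0.1 * ((3 * i) % 5 - 2) for i in range(50)]`, `detect_change(quiet + loud)` returns 50, while `detect_change(quiet + quiet)` returns None. Applied recursively to each side of a detected split, such a test yields the kind of preliminary segmentation the hybrid approach uses to seed its models.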
The Spoken Language Component of the MASK Kiosk
- Human Comfort & Security of Information Systems, 1997
Cited by 12 (4 self)
The aim of the Multimodal-Multimedia Automated Service Kiosk (MASK) project is to pave the way for more advanced public service applications through user interfaces employing multimodal, multimedia input and output. The project has analyzed the technological requirements in the context of users and the tasks they perform in carrying out travel enquiries, and developed a prototype information kiosk that will be installed in the Gare St. Lazare in Paris. The kiosk will improve the effectiveness of such services by enabling interaction through the coordinated use of multimodal inputs (speech and touch) and multimedia output (sound, video, text, and graphics), and in doing so will create the opportunity for new public services. Vocal input is managed by a spoken language system, which aims to provide a natural interface between the user and the computer through the use of simple and natural dialogs. In this paper the architecture and the capabilities of the spoken language system are described, with...
The LIMSI 1998 Hub-4E Transcription System
- In Proc. of the DARPA Broadcast News Workshop, 1999
Cited by 9 (2 self)
In this paper we report on our Nov98 Hub-4E system, which is an extension of our Nov97 system [4]. The LIMSI system for the November 1998 Hub-4E evaluation is a continuous mixture density, tied-state, cross-word, context-dependent HMM system. The acoustic models were trained on the 1995, 1996 and 1997 official Hub-4E training data, containing about 150 hours of transcribed speech material. 65K-word language models were obtained by interpolation of backoff n-gram language models trained on different text data sets. Prior to word decoding, a maximum likelihood partitioning algorithm segments the data into homogeneous regions and assigns gender, bandwidth and cluster labels to the speech segments. Word decoding is carried out in three steps, integrating cluster-based MLLR acoustic model adaptation. The final decoding step uses a 4-gram language model interpolated with a category trigram model. The main differences compared to last year's system arise from the use of additional acoustic and lang...
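The abstract states only that the language models were obtained by interpolating backoff n-gram models trained on different text sets. A standard way to set linear interpolation weights, assumed here rather than taken from the paper, is to run EM on held-out text: each component k supplies P_k(w | h) for every held-out word, and each weight is re-estimated as the average posterior responsibility of its component. The helpers `estimate_weights` and `perplexity` and their inputs are hypothetical.

```python
import math
from typing import List, Sequence


def estimate_weights(probs: Sequence[Sequence[float]],
                     iters: int = 50) -> List[float]:
    """EM estimate of linear interpolation weights (hypothetical helper).

    probs[k][t] is P_k(w_t | h_t), the probability that component model k
    assigns to the t-th held-out word. Returns weights that sum to 1.
    """
    k_models = len(probs)
    n_words = len(probs[0])
    weights = [1.0 / k_models] * k_models  # start from uniform weights
    for _ in range(iters):
        counts = [0.0] * k_models
        for t in range(n_words):
            mix = sum(weights[k] * probs[k][t] for k in range(k_models))
            for k in range(k_models):
                # E-step: posterior responsibility of model k for word t
                counts[k] += weights[k] * probs[k][t] / mix
        # M-step: new weight = average responsibility over held-out words
        weights = [c / n_words for c in counts]
    return weights


def perplexity(probs: Sequence[Sequence[float]],
               weights: Sequence[float]) -> float:
    """Perplexity of the interpolated model on the same held-out words."""
    n_words = len(probs[0])
    log_sum = sum(
        math.log(sum(w * p[t] for w, p in zip(weights, probs)))
        for t in range(n_words)
    )
    return math.exp(-log_sum / n_words)
```

Because each EM iteration cannot decrease held-out likelihood, the estimated weights give a perplexity no higher than the uniform starting point. The same procedure would apply to mixing a word 4-gram with a category trigram, treating each as one component.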
Recent Advances in Transcribing Television and Radio Broadcasts
- Proc. Eurospeech '99
Cited by 8 (5 self)
Transcription of broadcast news shows (radio and television) is a major step in developing automatic tools for indexing and retrieval of the vast amounts of information generated on a daily basis. Broadcast shows are challenging to transcribe as they consist of a continuous data stream with segments of different linguistic and acoustic natures. Transcribing such data requires addressing two main problems: those related to the varied acoustic properties of the signal, and those related to the linguistic properties of the speech. Prior to word transcription, the data is partitioned into homogeneous acoustic segments. Non-speech segments are identified and rejected, and the speech segments are clustered and labeled according to bandwidth and gender. The speaker-independent, large-vocabulary continuous speech recognizer makes use of n-gram statistics for language modeling and of continuous density HMMs with Gaussian mixtures for acoustic modeling. The LIMSI system has consistently obtain...
Spoken Language Dialog System Development and Evaluation at LIMSI
- In Proceedings of the International Symposium on Spoken Dialogue, 1998
Cited by 6 (1 self)
The development of natural spoken language dialog systems requires expertise in multiple domains, including speech recognition, natural spoken language understanding and generation, dialog management and speech synthesis. In this paper I report on our experience at LIMSI in the design, development and evaluation of spoken language dialog systems for information retrieval tasks. Drawing upon our experience in this area, I attempt to highlight some aspects of the design process, such as the use of general and task-specific knowledge sources, the need for an iterative development cycle, and some of the difficulties related to evaluation of development progress.
1. INTRODUCTION
At LIMSI we have experience in developing several spoken language dialog systems for information retrieval tasks [5, 11, 16, 19, 1]. Our recent activities in this area have been mainly in the context of European projects, such as ESPRIT MASK, Language Engineering RAILTEL and ARISE, Tide HOME-AOM, Esprit LTR Concerte...
Performance of the IBM Large Vocabulary Continuous Speech Recognition System on the ARPA Wall Street Journal Task
- In Proc. ICASSP
In this paper we discuss various experimental results using our continuous speech recognition system on the Wall Street Journal task. Experiments with different feature extraction methods, varying amounts and type of training data, and different vocabulary sizes are reported.
1 INTRODUCTION
Large vocabulary continuous speech recognition is an area of great current interest, and several speech recognition systems capable of dealing with such recognition tasks have evolved [2, 4, 5, 6, 7, 9]. The ARPA-sponsored Wall Street Journal task represents a standardized database that enables the evaluation of the features specific to these different systems on a common platform. In this paper, we present the performance of the IBM continuous speech recognition system on this task. We will concentrate on the speaker-independent portion of the database. The test data used in the experiments is read speech recorded using a Sennheiser microphone. We report experimental ...

