The LIMSI Broadcast News Transcription System
Speech Communication, 2002
Cited by 84 (5 self)
This paper reports on activities at LIMSI over the last few years directed at the transcription of broadcast news data. We describe our development work in moving from laboratory read speech data to real-world or 'found' speech data in preparation for the ARPA Nov96, Nov97 and Nov98 evaluations. Two main problems needed to be addressed to deal with the continuous flow of inhomogeneous data. These concern the varied acoustic nature of the signal (signal quality, environmental and transmission noise, music) and different linguistic styles (prepared and spontaneous speech on a wide range of topics, spoken by a large variety of speakers).
The Use of Speaker Correlation Information for Automatic Speech Recognition
1998
Cited by 7 (3 self)
This dissertation addresses the independence-of-observations assumption, which is typically made by today's automatic speech recognition systems. This assumption ignores within-speaker correlations, which are known to exist. The assumption clearly damages the recognition ability of standard speaker-independent systems, as can be seen from the severe drop in performance that systems exhibit between their speaker-dependent and speaker-independent modes. The typical solution to this problem is to apply speaker adaptation to the models of the speaker-independent system. This thesis examines that approach with the explicit goal of improving the system's rapid adaptation capabilities by incorporating within-speaker correlation information into the adaptation process. This is achieved through the creation of an adaptation technique called reference speaker weighting and through the development of a speaker clustering technique called speaker cluster weighting. However, speaker adaptation is just one way to attack the independence assumption. This dissertation also introduces a novel speech recognition technique called consistency modeling. This technique uses a priori knowledge about the within-speaker correlations that exist between different phonetic events to incorporate speaker constraints into a speech recognition system without explicitly applying speaker adaptation. These new techniques are implemented within a segment-based speech recognition system, and evaluation results are reported on the DARPA Resource Management recognition task.
Large Vocabulary Continuous Speech Recognition: from Laboratory Systems towards Real-World Applications
1996
Cited by 6 (4 self)
This paper provides an overview of the state of the art in laboratory speaker-independent, large vocabulary continuous speech recognition (LVCSR) systems, with a view towards adapting such technology to the requirements of real-world applications. While the principal concern in speech recognition is to transcribe the speech signal as a sequence of words, the same core technology can be applied to domains other than dictation. The main topics addressed are acoustic-phonetic modeling, lexical representation, language modeling, decoding and model adaptation. After a brief summary of experimental results, some directions towards usable systems are given. In moving from laboratory systems towards real-world applications, different constraints arise that influence the system design. The application imposes limitations on computational resources, constraints on signal capture, requirements for noise and channel compensation, and rejection capability. The difficulties and costs of adapting existing technology to new languages and applications need to be assessed. Near-term applications for LVCSR technology are likely to grow in somewhat limited domains, such as spoken language systems for information retrieval and limited-domain dictation. Perspectives on some unresolved problems are given, indicating areas for future research.
Issues in Large Vocabulary, Multilingual Speech Recognition
Proc. Europ. Conf. on Speech Communication and Technology
Cited by 4 (0 self)
In this paper we report on our activities in multilingual, speaker-independent, large vocabulary continuous speech recognition. The multilingual aspect of this work is of particular importance in Europe, where each country has its own national language. Our existing recognizer for American English and French has been ported to British English and German. It has been assessed in the context of the LRE SQALE project, whose objective was to experiment with establishing a multilingual evaluation paradigm in Europe for the assessment of large vocabulary, continuous speech recognition systems. The recognizer uses phone-based continuous density HMMs for acoustic modeling and n-gram statistics estimated on newspaper texts for language modeling. The system has been evaluated on a dictation task with read, newspaper-based corpora: the ARPA Wall Street Journal corpus of American English, the WSJCAM0 corpus of British English, the BREF-Le Monde corpus of French and the PHONDAT-Frankfurter Runds...
Spoken language technologies applied to digital talking books
In Proceedings of Interspeech, 2006
Cited by 2 (0 self)
Digital Talking Books (DTBs) offer visually impaired users an evolution of analogue talking books that mimics the interaction possibilities of print books. This paper describes a new DTB player that aims to improve the usability and accessibility of current players by combining the possibilities offered by multimodal interaction and interface adaptability with the integration of several language processing components. Besides potentially increasing readers' enjoyment in general, these modifications also pave the way for the use of DTBs in different domains, from e-inclusion to e-learning applications. Index Terms: digital talking books, Portuguese.
Transcribing Broadcast News: The LIMSI Nov96 Hub4 System
In Proc. of DARPA Speech Recognition Workshop, 1997
Cited by 1 (0 self)
In this paper we report on the LIMSI Nov96 Hub4 system for transcription of broadcast news shows. We describe the development work in moving from laboratory read speech data to real-world speech data in order to build a system for the ARPA Nov96 evaluation. Two main problems were addressed to deal with the continuous flow of inhomogeneous data. These concern the varied acoustic nature of the signal (signal quality, environmental and transmission noise, music) and different linguistic styles (prepared and spontaneous speech on a wide range of topics, spoken by a large variety of speakers). The speech recognizer uses continuous density HMMs with Gaussian mixtures for acoustic modeling and n-gram statistics estimated on large text corpora. The base acoustic models were trained on the WSJ0/WSJ1 corpus and adapted using MAP estimation with 35 hours of transcribed task-specific training data. The 65k language models are trained on 160 million words of newspaper texts and 132 million w...
AUTOMATIC VS. MANUAL TOPIC SEGMENTATION AND INDEXATION IN BROADCAST NEWS
Cited by 1 (1 self)
This paper describes the latest progress in our work on Broadcast News for European Portuguese. The central modules of our media watch system, which matches the topic of each news story against the user preferences registered in the system, are audio pre-processing, speech recognition, and topic segmentation and indexation. The main focus of the paper is the impact of the errors made by the earlier modules on the later ones. In our opinion, this impact is an essential diagnostic tool for improving the overall pipeline system.
Aligning and Recognizing Spoken Books in Different Varieties of Portuguese
This paper aims to present digital spoken books as a useful diagnostic tool for detecting alignment and recognition problems and for studying the porting of these technologies to different varieties of the same language (Portuguese, in our case). We summarize the main differences between European and Brazilian Portuguese (EP/BP) and describe how they affect the grapheme-to-phoneme (GtoP) system. Despite the small size of our parallel spoken book corpus in the two varieties, our preliminary experiments confirmed our expectations regarding the effectiveness of an EP-trained aligner used on BP spoken books. They also confirmed the inadequacy of an EP Broadcast News recognizer tested on literary content, and the expected degradation in recognition scores caused by using that recognizer on a BP spoken book. Pronunciation adaptation was tested by adding variants derived by the BP GtoP system to our EP lexicon, resulting in only a very small improvement in recognition scores.

