From Broadcast News To Spontaneous Dialogue Transcription:
- In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing
, 2001
Cited by 9 (5 self)
This paper reports on experiments in porting the ITC-irst Italian broadcast news recognition system to two spontaneous dialogue domains. The trade-off between performance and the required amount of task-specific data was investigated. Porting was carried out by applying supervised adaptation methods to the acoustic and language models. Using two hours of manually transcribed speech, the adapted systems achieved word error rates of 26.0% and 28.4%. Two reference systems, developed on a larger training corpus, achieved word error rates of 22.6% and 21.2%, respectively.
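The word error rates quoted above are conventionally computed as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that definition, not the scoring tool used in the paper; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference length.

    Assumption: words are separated by whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For instance, a hypothesis with one substituted and one deleted word against a four-word reference gives a rate of 0.5, which would be reported as 50%.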
Multilingual Person to Person Communication at IRST
- In ICASSP
, 1997
Cited by 9 (6 self)
This paper describes a machine-mediated, person-to-person multilingual communication system. The emphasis is on robustness, that is, the ability of the system to preserve communication even in the presence of the variability and errors typical of spoken language systems. The statistical approach is adopted not only at the acoustic level but also for linguistic processing. Therefore, while the global architecture is briefly outlined, the focus is on the acoustic recognizer and the understanding module. Experimental evaluations complete the presentation.
1. INTRODUCTION
This paper describes a machine-mediated, person-to-person multilingual communication system. The scenario involves appointment negotiation between two persons speaking different languages. The focus is on two of the system modules, while the global architecture is briefly outlined. The expression multilingual communication is preferred to speech-to-speech translat...
A System for the Segmentation and Transcription of Italian Radio News
- IN PROC. OF THE RIAO CONFERENCE
, 2000
Cited by 6 (5 self)
This paper presents the development of an Italian broadcast news transcription system, to be applied to the indexing of multimedia archives. In addition, a broadcast news corpus being collected at ITC-irst is introduced. The system processes the input audio stream in four stages. The first performs audio segmentation via the Bayesian Information Criterion (BIC) and classification by Gaussian mixture modeling. The second groups spectrally homogeneous speech segments, again using the BIC method, to provide speaker clusters suitable for the subsequent adaptation module. The third adapts the acoustic models to each selected cluster and, finally, the fourth transcribes the audio data using the cluster-adapted models. The word error rate achieved on a 1h:15m test set, corresponding to 6 news programs, was 21.5%.
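The BIC-based segmentation mentioned in the first stage can be illustrated with a small sketch. It uses a common form of the criterion: a candidate boundary is accepted when modeling the two sides with separate Gaussians improves the likelihood by more than a model-complexity penalty. This is an assumption-laden simplification, not the system's implementation: it handles one-dimensional features only, and the names `delta_bic`, `best_change_point`, `margin`, and `lam` are hypothetical.

```python
import math
from typing import Optional, Sequence


def _log_var(x: Sequence[float]) -> float:
    """Log of the maximum-likelihood variance, floored to avoid log(0)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return math.log(max(var, 1e-12))


def delta_bic(x: Sequence[float], i: int, lam: float = 1.0) -> float:
    """Delta-BIC for splitting x into x[:i] and x[i:] (1-D Gaussian models).

    Positive values favor a change point at index i.
    """
    n = len(x)
    gain = 0.5 * (n * _log_var(x)
                  - i * _log_var(x[:i])
                  - (n - i) * _log_var(x[i:]))
    # Penalty 0.5 * lam * (d + d(d+1)/2) * log(n) with d = 1 (mean + variance).
    penalty = lam * math.log(n)
    return gain - penalty


def best_change_point(x: Sequence[float], margin: int = 5,
                      lam: float = 1.0) -> Optional[int]:
    """Return the index with the largest positive Delta-BIC, or None."""
    candidates = range(margin, len(x) - margin + 1)
    if not candidates:
        return None
    best = max(candidates, key=lambda i: delta_bic(x, i, lam))
    return best if delta_bic(x, best, lam) > 0 else None
```

A real segmenter would apply this test over sliding windows of multi-dimensional cepstral features with full covariance matrices; the scalar version above only shows the shape of the decision rule.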
A Comparison Of Two LVR Search Optimization Techniques
- in Proc. Int. Conf. Spoken Language Processing
, 2002
Cited by 5 (0 self)
This paper presents a detailed comparison between two search optimization techniques for large vocabulary speech recognition: one based on word-conditioned tree search (WCTS) and one based on weighted finite-state transducers (WFSTs). Existing North American Business News systems from RWTH and AT&T, each representing one of the two approaches, were modified to remove variations in model data and acoustic likelihood computation. An experimental comparison showed that, for a given word error rate, the WFST-based system explored fewer search states and had less runtime overhead than the WCTS-based system. This is attributed to differences in the pre-compilation, degree of non-determinism, and path weight distribution in the respective search graphs.
Cross-Task Portability of a Broadcast News Speech Recognition System
- Speech Communication
, 2002
Cited by 4 (0 self)
This article reports on experiments in porting the ITC-irst system for the automatic transcription of radio and TV broadcasts to two domains with spontaneous spoken dialogues. Porting was investigated using modern adaptation methods for the acoustic models and for the language model, and was evaluated in terms of the trade-off between recognition performance and the amount of annotated data required specifically for the target domain. In addition, different levels of supervision in the adaptation of the acoustic model were examined. With two hours of manually annotated adaptation data, the adapted system achieved word error rates of 26.0% ... systems tailored specifically to the domain, with word error rates of 22.6% ... were developed. Finally, a robust method is presented with which the weights for handling spontaneous speech effects can be set automatically.
A Mixed Approach To Speech Understanding
- In Proc. of ICSLP
, 1996
Cited by 4 (3 self)
This paper presents a mixed approach to spoken language understanding that tries to make best use of the advantages of both statistical and knowledge-based algorithms. Results obtained on the ATIS (Air Travel Information System) scenario transferred to the Italian language are presented and discussed.
A Baseline For The Transcription Of Italian Broadcast News
- IN PROC. OF ICASSP
Cited by 4 (3 self)
This paper presents the first achievements in the development of a broadcast news transcription system to be applied to the processing of very large audio archives. In particular, the Italian broadcast news corpus being collected is introduced, and the first implemented baseline system is outlined. The baseline system consists of an audio segmentation module and a speech recognizer featuring a recursive Viterbi beam search, a 64K-word lexicon, a tree-based trigram LM representation, and MLLR adaptation. The word error rate of the baseline was 20.9% on planned studio speech and 28.8% on the whole test set.
Stratégies de perception par vision active pour la reconstruction et l'exploration de scènes statiques
- PhD Thesis, Université de Rennes 1, IRISA
, 1996
Cited by 4 (1 self)
This paper presents an algorithm for the composition of weighted finite-state transducers which is specially tailored to speech recognition applications: it composes the lexicon with the language model while simultaneously optimizing the resulting transducer. Furthermore, it performs these computations "on-the-fly" to allow easier management of the tradeoff between offline and online computation and memory. The algorithm is exact for local knowledge integration and optimization operations such as composition and determinization. Minimization and pushing operations are approximated. Our results have confirmed the efficiency of these approximations. Index Terms: Speech recognition, weighted finite-state transducers (WFSTs).
Advances In Automatic Transcription Of Italian Broadcast News
- In Proceedings of the International Conference of Spoken Language Processing, volume II
, 2000
Cited by 3 (2 self)
This paper presents some recent improvements in the automatic transcription of Italian broadcast news obtained at ITC-irst. A preliminary activity was carried out to develop a suitable speech corpus for the Italian language. The resulting corpus, consisting of recordings covering 30 hours of radio news, was used to develop a baseline system for the transcription of broadcast news. The system operates in several stages: acoustic segmentation and classification, speaker clustering, acoustic model adaptation, and speech decoding. The major recent advances that improve performance concern speech segmentation and clustering, acoustic modeling, acoustic model adaptation, and the language model.
A Speech-to-Speech Translation based Interface for Tourism
- In Proceedings of the ENTER Conference
, 1999
Cited by 1 (1 self)
This paper presents a speech-to-speech translation system for tourism applications developed in the context of the C-STAR consortium. Potential users can communicate by speech, in their own language, with a travel agent in order to organize their travel. The system uses an interchange format representation of the semantic content of utterances, which is flexible and simplifies porting the system to new languages. A demonstration prototype, developed at ITC-irst, is now working for the Italian modules and has been integrated with the English counterpart developed at the Interactive System Laboratory at CMU.
1 Introduction
In the field of tourist information, users from every part of the world may want to access an information system to get information for organizing their travels. However, potential users are not computer scientists, nor are they keen to use artificial languages for interacting with the system. They would rather use their own language. This is why multi-lingu...

