Evaluation of Extractive Voicemail Summarization
- In Proc. ISCA Workshop on Multilingual Spoken Document Retrieval, Hong Kong, 2003
Cited by 2 (1 self)
This paper is about the evaluation of a system that generates short text summaries of voicemail messages, suitable for transmission as text messages. Our approach to summarization is based on a speech-recognized transcript of the voicemail message, from which a set of summary words is extracted. The system uses a classifier to identify the summary words, with each word being identified by a vector of lexical and prosodic features. The features are selected using Parcel, an ROC-based algorithm. Our evaluations of the system, using a slot error rate metric, have compared manual and automatic summarization, and manual and automatic recognition (using two different recognizers). We also report on two subjective evaluations using mean opinion scores of summaries, and a set of comprehension tests. The main result from these experiments was that the perceived quality of summarization was affected more by errors resulting from automatic transcription than by the automatic summarization process.
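As a rough illustration of the slot error rate metric mentioned in this abstract, the sketch below uses the commonly cited definition: substitutions, deletions and insertions summed and divided by the number of slots in the reference. The function name and the example counts are hypothetical, and the paper itself may define slots and align summaries differently.

```python
def slot_error_rate(substitutions, deletions, insertions, reference_slots):
    """Slot error rate under the common definition (S + D + I) / N.

    N is the number of slots in the reference summary. All counts are
    assumed to come from an alignment computed elsewhere; this helper
    is illustrative and is not taken from the paper.
    """
    if reference_slots <= 0:
        raise ValueError("reference_slots must be positive")
    if min(substitutions, deletions, insertions) < 0:
        raise ValueError("error counts must be non-negative")
    return (substitutions + deletions + insertions) / reference_slots


# Hypothetical counts: 2 substituted, 1 deleted and 1 inserted summary word
# against a reference summary of 10 words.
example_ser = slot_error_rate(2, 1, 1, 10)
```

Because insertions are counted in the numerator but not in the denominator, the value can exceed 1.0 when a hypothesis summary contains many spurious words.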
Robust Automatic Speech Recognition With Unreliable Data
- 1999
Cited by 2 (0 self)
Theoretical and practical issues of some of the problems in robust automatic speech recognition (ASR) and some of the techniques that address them are presented in this report. The problem of the robustness of the ASR in real-life (as opposed to laboratory) conditions is paramount to the widespread deployment of speech-enabled products. The report reviews techniques used so far for robust ASR, ranging from simple spectrum subtraction to various types of model adaptation. A possible connection of robust ASR with the computational auditory scene analysis (CASA), methods for local Signal-to-Noise Ratio (SNR) estimation and classification/scoring with on-line adapted statistical models is discussed. The main focus is on the techniques that would allow for incorporation of CASA and local SNR estimates (used as methods for speech/non-speech separation) into the present prevailing stochastic pattern matching paradigms: Hidden Markov models (HMM) and artificial neural networks (ANN). Th...
Compound decomposition in Dutch large vocabulary speech recognition
- In Proceedings of Eurospeech 2003, Geneva, 2003
Cited by 1 (0 self)
This paper addresses compound splitting for Dutch in the context of broadcast news transcription. Language models were created using original text versions and text versions that were decomposed using a data-driven compound splitting algorithm. Language model performances were compared in terms of out-of-vocabulary rates and word error rates in a real-world broadcast news transcription task. It was concluded that compound splitting does improve ASR performance. Best results were obtained when frequent compounds were not decomposed.
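The out-of-vocabulary (OOV) rate used for comparison in this abstract can be sketched as the fraction of test tokens that are missing from the recognizer vocabulary. The function name, the toy vocabulary and the hand-made split below are all hypothetical. They only illustrate why decomposing compounds can lower the OOV rate, and they do not reproduce the paper's data-driven splitting algorithm.

```python
def oov_rate(test_tokens, vocabulary):
    """Fraction of test tokens that do not occur in the vocabulary."""
    if not test_tokens:
        raise ValueError("test_tokens must not be empty")
    vocab = set(vocabulary)
    missing = sum(1 for token in test_tokens if token not in vocab)
    return missing / len(test_tokens)


# Toy vocabulary containing the parts of a compound but not the compound.
toy_vocabulary = {"het", "nieuws", "uitzending"}

# Original text keeps the compound as a single token.
original_tokens = ["het", "nieuwsuitzending"]

# Hand-made decomposition, standing in for a real splitting algorithm.
decomposed_tokens = ["het", "nieuws", "uitzending"]

rate_original = oov_rate(original_tokens, toy_vocabulary)
rate_decomposed = oov_rate(decomposed_tokens, toy_vocabulary)
```

In this toy case the compound is the only unknown token, so splitting it removes every OOV occurrence. On real data the effect depends on the vocabulary size and on which compounds are decomposed.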
Development, Evaluation and Automatic Segmentation of Slovenian Broadcast News Speech Database
The paper reviews the development of a new Slovenian broadcast news speech database. The database consists of audio, video and annotation transcripts of about 34 hours of daily television news programs captured from the public TV station RTVSLO. The paper addresses issues concerning transcription and annotation of the collected data, provides information on content analysis and basic statistics of the collected material, and compares different methods of automatic segmentation.
The THISL SDR System at TREC-8
- Proc. of the 8th Text Retrieval Conference TREC-8, Nov 1999. Martine Adda-Decker, Gilles Adda
This paper describes the participation of the THISL group in the TREC-8 Spoken Document Retrieval (SDR) track. The THISL SDR system consists of the real-time version of the ABBOT large vocabulary speech recognition system and the THISLIR text retrieval system. The TREC-8 evaluation assessed SDR performance on a corpus of 500 hours of broadcast news material collected over a five-month period. The main test condition involved retrieval of stories defined by manual segmentation of the corpus, in which non-news material, such as commercials, was excluded. An optional test condition required retrieval of the same stories from the unsegmented audio stream. The THISL SDR system participated in both test conditions. The results show that a system such as THISL can produce respectable information retrieval performance on a realistically-sized corpus of unsegmented audio material. 1. INTRODUCTION The TREC-8 test collection was obtained from the TDT2 corpus and consisted of 902 shows (...
ASR - Articulatory Speech Recognition
- 2001
We describe a speech recognition system which uses a combination of acoustic and articulatory features as input. Linear dynamic models capture the trajectories which characterise each segment type. We describe classification and recognition tasks for systems based on acoustic data in conjunction with both real and automatically recovered articulatory parameters.
On Using MLP Features in LVCSR
- Proc. ICSLP, Jeju, Korea, 2004
One of the major research thrusts in the speech group at ICSI is to use Multi-Layer Perceptron (MLP) based features in automatic speech recognition (ASR). This paper presents a study of three aspects of this effort: 1) the properties of the MLP features which make them useful, 2) incorporating MLP features together with PLP features in ASR, and 3) possible redundancy between MLP features and more conventional system refinements such as discriminative training and system combination. The paper shows that MLP transformations yield variables that have regular distributions, which can be further modified by taking the logarithm to make the distribution easier to model with a Gaussian-HMM. Two or more vectors of these features can easily be combined without increasing the feature dimension. Recognition results show that MLP features can significantly improve recognition performance in large vocabulary continuous speech recognition (LVCSR) tasks for the NIST 2001 Hub-5 evaluation set with models trained on the Switchboard Corpus, even when discriminative training and system combination are used.
The Listening Machine: Sound Source Organization for Multimedia Understanding
Identifying the individual sources present in a real-world sound recording is difficult: Almost without exception, sounds of interest are embedded in a context of competing sounds, and it is rare to be given an unobstructed view of an ideal, isolated target. Human listeners, in common with other auditorily-equipped animals, are adept at handling such mixed signals, but our best computational audition systems — for instance automatic speech recognizers — are highly vulnerable to added interference, even at levels that listeners barely notice. This program is about developing algorithms and systems for the analysis of sound mixtures in the context of automatic multimedia scene analysis. In comparison with video and image analysis, there has been little work on the general problem of organizing everyday sounds into the objects and events perceived by listeners. Unlike noise-robust speech recognition, which seeks simply to minimize the impact of the nonspeech components on the derived signal features, sound organization involves identifying and separately characterizing each significant contribution in a sound. Central to the proposed approach is the idea of sound fragment recognition: Although a sound mixture may not afford unobstructed views of an entire sound source (voice, telephone ring, musical
Speech-based information retrieval for Dutch
- 2003
In this paper, the current state-of-affairs in Dutch speech-based retrieval as addressed in a series of multimedia retrieval projects is described and possible future directions of the research in this field are discussed in brief.

