Talker Localization And Speech Recognition Using A Microphone Array And A Cross-Powerspectrum Phase Analysis
- In Int. Conf. on Spoken Language Processing (ICSLP), 1994
Cited by 20 (1 self)
Mismatch between training and testing conditions considerably reduces the performance of a speaker-independent HMM-based continuous speech recognizer. Compensating for this mismatch can avoid the complex and time-consuming retraining of the recognizer. This paper describes an acquisition system based on an array of four omnidirectional microphones that was employed to reproduce a "beamformed" version of the original acoustic messages acquired in a noisy and reverberant environment, with a talker-microphone distance of one meter. In this preliminary activity, some simple noise compensation techniques (i.e., a Mean Spectrum based Enhancement and a Cepstrum Mean Subtraction) were incorporated in this preprocessing stage to obtain an enhanced version of the given utterance. Feeding a clean-condition trained continuous speech recognizer with enhanced signals led to a significant improvement in performance compared with the use of unprocessed single-microphone signals as input. I. INTRODUCTION Perfo...
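The cepstrum mean subtraction mentioned in this abstract can be illustrated with a minimal sketch. It assumes the utterance is already available as a list of cepstral frames; the function name `cepstral_mean_subtraction` and the toy values are hypothetical and are not taken from the paper.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over time from every cepstral frame."""
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each cepstral coefficient across the whole utterance.
    means = [sum(f[i] for f in frames) / n_frames for i in range(n_coeffs)]
    return [[f[i] - means[i] for i in range(n_coeffs)] for f in frames]


# Toy cepstra (assumed values): a fixed channel adds a constant offset
# to every frame in the cepstral domain.
clean = [[0.5, 1.0], [-0.5, 3.0], [0.0, -4.0]]
offset = [2.0, -1.0]
observed = [[c + o for c, o in zip(frame, offset)] for frame in clean]

normalized = cepstral_mean_subtraction(observed)
reference = cepstral_mean_subtraction(clean)
```

Because a stationary channel shows up as a constant additive term in the cepstral domain, `normalized` and `reference` coincide in this toy case: the offset is removed together with the utterance mean.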
Language Model Representations For Beam-Search Decoding
- In Proceedings of the ICASSP'95, 1995
Cited by 17 (1 self)
This paper presents an efficient way of representing a bigram language model for a beam-search based, continuous speech, large vocabulary HMM recognizer. The tree-based topology considered takes advantage of a factorization of the bigram probability derived from the bigram interpolation scheme, and of a tree organization of all the words that can follow a given one. Moreover, an optimization algorithm is used to considerably reduce the space requirements of the language model. Experimental results are provided for two 10,000-word dictation tasks: radiological reporting (perplexity 27) and newspaper dictation (perplexity 120). In the former domain 93% word accuracy is achieved with real-time response and 23 Mb process space. In the newspaper dictation domain, 88.1% word accuracy is achieved with 1:41 real-time response and 38 Mb process space. All recognition tests were performed on an HP-735 workstation. 1. INTRODUCTION Many current ASR systems generate initial hypotheses through a b...
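The abstract does not spell out the interpolation scheme, so the following is only a sketch of one common form, linear interpolation of a bigram relative frequency with a unigram probability. The function name `train_interpolated_bigram`, the fixed weight `lam`, and the toy sentence are assumptions for illustration. The point is that the unigram term does not depend on the predecessor word, which is the kind of shared structure a factorized network representation can exploit.

```python
from collections import Counter


def train_interpolated_bigram(tokens, lam=0.3):
    """Return (prob, vocab), where prob(w, v) estimates P(w | v)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    history = Counter(tokens[:-1])  # how often each word has a successor
    total = len(tokens)

    def prob(w, v):
        p_uni = unigrams[w] / total
        if history[v] == 0:
            # Unseen history: fall back to the unigram distribution.
            return p_uni
        p_big = bigrams[(v, w)] / history[v]
        # The second term is the same for every predecessor v.
        return (1 - lam) * p_big + lam * p_uni

    return prob, set(unigrams)


# Hypothetical toy corpus.
tokens = "the scan shows the lesion the scan is normal".split()
prob, vocab = train_interpolated_bigram(tokens, lam=0.3)
p_scan_given_the = prob("scan", "the")
```

For any predecessor, the probabilities over the vocabulary sum to one, since both the bigram relative frequencies and the unigram probabilities are normalized.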
Training of HMM with Filtered Speech Material for Hands-Free Recognition
- In Proc. ICASSP, 1999
Cited by 16 (5 self)
This paper addresses the problem of hands-free speech recognition in a noisy office environment. An array of six omnidirectional microphones and a corresponding time delay compensation module are used to provide a beamformed signal as input to an HMM-based recognizer. Training of HMMs is performed using either a clean speech database or a filtered version of the same database. Filtering consists of a convolution with the acoustic impulse response between speaker and microphone, to reproduce the reverberation effect. Background noise is added to provide the desired SNR. The paper shows that the new models trained on these data perform better than the baseline ones. Furthermore, the paper investigates MLLR adaptation of the new models. It is shown that a further performance improvement is obtained, reaching a 98.7% WRR in a connected digit recognition task when the talker is at 1.5 m distance from the array. 1. INTRODUCTION Hands-free continuous speech recognition ...
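The data contamination described in this abstract (convolution with an impulse response, then noise added at a target SNR) can be sketched as follows. The sine-wave "speech", the three-tap impulse response, the Gaussian noise, and all function names are stand-ins chosen for illustration; the paper's actual signals and impulse responses are not given here.

```python
import math
import random


def convolve(signal, impulse_response):
    """Full linear convolution of two sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def power(x):
    """Mean squared value of a sequence."""
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(speech, noise, snr_db):
    """Scale the noise so that speech power / noise power matches snr_db."""
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


random.seed(0)
clean = [math.sin(2 * math.pi * 0.05 * n) for n in range(200)]  # stand-in for speech
impulse_response = [1.0, 0.0, 0.0, 0.5, 0.0, 0.25]  # toy direct path plus two echoes
reverberant = convolve(clean, impulse_response)
noise = [random.gauss(0.0, 1.0) for _ in reverberant]
noisy = add_noise_at_snr(reverberant, noise, snr_db=10.0)
```

Here the SNR is measured against the reverberant signal rather than the clean one; that choice is an assumption of the sketch.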
Improvements In Tree-Based Language Model Representation
- In Proc. of EUROSPEECH, 1995
Cited by 14 (10 self)
This paper describes an efficient way of representing a bigram language model with a finite state network used by a beam-search based, continuous speech HMM recognizer. In a previous paper [1], a compact tree-based organization of the search space was presented, which could be further reduced through an optimization algorithm. There, it was pointed out that for a 10,000-word newspaper dictation task the minimization step could take considerable time and space on a standard workstation. In this paper, a new compilation technique that takes into account the particular tree-based topology is described. Results show that, without additional time and space costs, the new technique produces networks equivalent to the tree-based ones but almost as small as the optimized one. 1 INTRODUCTION The most widely used Language Models (LMs) in speech recognition are n-gram models, due to both easy inference from the training corpus and easy integrability with the decoding algorithms commonly used...
Investigating Recognition Of Children's Speech
- In Proc. ICASSP, 2003
Cited by 13 (0 self)
In this work, recognition of children's speech was investigated by considering a phone recognition task. Two baseline systems were trained, one for children and one for adults, by exploiting two Italian speech databases. Under matching conditions, i.e., with training and recognition performed on data from the same population group, the phone recognition accuracy was 77.30% and 79.43% for children and adults, respectively. It was
Microphone Array Based Speech Recognition With Different Talker-Array Positions
- Proc. of ICASSP, 1997
Cited by 12 (4 self)
The use of a microphone array for hands-free continuous speech recognition in a noisy and reverberant environment is investigated. An array of eight omnidirectional microphones was placed at different angles and distances from the talker. A time delay compensation module was used to provide a beamformed signal as input to a Hidden Markov Model (HMM) based recognizer. A phone HMM adaptation, based on a small amount of phonetically rich sentences, further improved the recognition rate obtained by applying only beamforming. These results were confirmed both by experiments conducted in a noisy and reverberant environment and by simulations. In the latter case, different conditions were recreated by using the image method to reproduce synthetic versions of the array microphone signals. 1. INTRODUCTION In recent years, many experimental activities were devoted to investigating the use of microphone arrays for hands-free continuous speech recognition [1, 2, 3, 4, 5, 6]. The system under study...
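Time delay compensation followed by summation is often called delay-and-sum beamforming. The sketch below assumes the per-microphone delays are already known, non-negative integers (in samples); the function name `delay_and_sum` and the three-channel toy signal are hypothetical and only illustrate the alignment step, not the system described in the paper.

```python
def delay_and_sum(channels, delays):
    """Advance each channel by its delay (in samples) and average the result.

    `delays` must hold non-negative integers, one per channel.
    """
    n_out = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(n_out)
    ]


# Toy source and three microphones that receive it 0, 2 and 5 samples late.
source = [0.0, 1.0, 0.5, -0.5, -1.0, 0.0, 0.25]
delays = [0, 2, 5]
channels = [[0.0] * d + source for d in delays]

aligned = delay_and_sum(channels, delays)       # delays compensated
unaligned = delay_and_sum(channels, [0, 0, 0])  # plain averaging, no compensation
```

With the correct delays the averaged output reproduces the source, while averaging without compensation smears it. In a real room the delays would have to be estimated from the signals and are generally not integer numbers of samples.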
On Field Experiments of Continuous Digit Recognition over the Telephone Network
- In Proc. of the EUROSPEECH, 1997
Cited by 9 (5 self)
In this paper, a real-time continuous digit recognizer operating over the telephone network is described. The activity has allowed the realization of a system, installed in some Italian telephone exchanges, for providing semi-automatic collect call services. Data collection has also been performed, and a field database was built. Both a continuous digit recognition task and a confirmation task, requiring rejection, have been defined. Recognition results are presented. INTRODUCTION The activity reported in this paper led to the realization of a system, installed in some Italian telephone exchanges. It provides two semi-automatic collect call services, called "Italy Direct" and "170". These systems require the recognition of digit sequences, as well as of yes/no. In the latter case, rejection of unforeseen sentences must be used to ensure sufficient robustness with respect to user inexperience. To train and test the system, some telephone speech databases, described later, have been used. I...
Multilingual Person to Person Communication at IRST
- In ICASSP, 1997
Cited by 9 (6 self)
This paper describes a machine-mediated person-to-person multilingual communication system. Stress is put on robustness, that is, the ability of the system to preserve communication even in the presence of the variability and errors typical of spoken language systems. The statistical approach is adopted not only at the acoustic level, but also for the linguistic processing. Therefore, while an overview of the global architecture is briefly introduced, the focus is on the acoustic recognizer and the understanding module. Experimental evaluations complete the presentation. 1. INTRODUCTION This paper refers to a machine-mediated person-to-person multilingual communication system. The scenario involves appointment negotiation between two persons speaking different languages. The focus will be put on two of the system modules, while an overview of the global architecture will be briefly introduced. The expression multilingual communication is preferred to speech-to-speech translat...
RADIOLOGICAL REPORTING BY SPEECH RECOGNITION: THE A.Re.S. SYSTEM
- 1994
Cited by 8 (8 self)
Radiological reporting has already been identified as a field in which voice technologies can prove to be very useful. Recent progress in automatic speech recognition and in hardware and software technology makes it possible to build large-vocabulary, continuous speech, speaker-independent, real-time systems. In this paper a dictation system for radiology reporting, the A.Re.S. system, is presented. A.Re.S. is a "software only" system which runs in real-time on an HP 715 workstation. It relies on an asynchronous and multi-process architecture in which speech decoding is performed by processes in pipeline. System requirements and architecture will be described, together with the results of a preliminary evaluation based on three months of on-site testing. I. INTRODUCTION Recent progress in Automatic Speech Recognition (ASR) and in hardware and software technology makes it possible to build large-vocabulary, real-time, speaker-independent systems. Medical document generation presents f...
Experiments Of Speech Recognition In A Noisy And Reverberant Environment Using A Microphone Array And HMM Adaptation
- In Proc. of ICSLP, 1997
Cited by 8 (6 self)
The use of a microphone array for hands-free continuous speech recognition in a noisy and reverberant environment is investigated. An array of four omnidirectional microphones is placed at 1.5 m distance from the talker; given the array signals, a Time Delay Compensation (TDC) module provides a beamformed signal, which is shown to be effective as input to a Hidden Markov Model (HMM) based recognizer. Given a small amount of sentences collected from a new speaker in a real environment, HMM adaptation further improves recognition rate. These results are confirmed both by experiments conducted in a noisy office environment and by simulations. In the latter case, different SNR and reverberation conditions were recreated by using the image method to reproduce synthetic array microphone signals.
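A time delay compensation module needs an estimate of the delay between microphone signals. This abstract does not say how the delays are obtained; one estimator, named in the title of the first entry in this list, is the cross-power spectrum phase. The sketch below is a generic phase-only cross-correlation written with a naive DFT, not the authors' implementation; the names `dft` and `csp_delay` and the toy signals are assumptions.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (or its inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def csp_delay(reference, delayed):
    """Estimate the delay (in samples) of `delayed` relative to `reference`."""
    n = len(reference)
    spec_ref = dft(reference)
    spec_del = dft(delayed)
    cross = [b * a.conjugate() for a, b in zip(spec_ref, spec_del)]
    # Keep only the phase of the cross-power spectrum; drop near-empty bins.
    whitened = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    coherence = [v.real for v in dft(whitened, inverse=True)]
    peak = max(range(n), key=lambda i: coherence[i])
    # Indices past the midpoint correspond to negative (circular) lags.
    return peak if peak <= n // 2 else peak - n


# Toy signals: a short pulse and a copy delayed by 3 samples.
reference = [1.0, 0.5, -0.3] + [0.0] * 13
delayed = [0.0] * 3 + reference[:-3]

estimated = csp_delay(reference, delayed)
reverse = csp_delay(delayed, reference)
```

Discarding the magnitude makes the peak depend only on the relative phase of the two signals, which is why the inverse transform shows a sharp peak at the delay. A practical version would use an FFT and short analysis frames.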

