Histogram Equalization of the Speech Representation for Robust Speech Recognition
, 2001
Cited by 12 (2 self)
Noise degrades the performance of automatic speech recognition systems mainly because of the mismatch it introduces between training and recognition conditions. Noise distorts the feature space, and the distortion is usually non-linear. To reduce this mismatch, methods for robust speech recognition try to compensate for the effect of noise either by estimating the clean speech or by adapting the recognizer's acoustic models so that they model noisy speech properly. In this paper we propose a method that compensates for the effect of noise on the speech representation. The method is based on histogram equalization, a technique frequently applied in digital image processing, which has been adapted to the speech representation. For each component of the feature vectors representing the speech signal, the histogram is estimated and the transformation that converts it into a reference histogram is calculated. Such transformations tend to compensate for the distortion that noise produces in the different components of the feature vector and improve the performance of recognition systems under noisy conditions. We describe how histogram equalization can be adapted to robust speech recognition and present recognition experiments that evaluate the proposed method.
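The per-component transformation described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the reference histogram is given as a set of reference feature vectors (for example, from clean training speech) and uses empirical quantile matching. The function names `equalize_component` and `equalize_features` are hypothetical.

```python
import numpy as np

def equalize_component(x, reference):
    """Map 1-D samples x so that their empirical distribution matches `reference`."""
    x = np.asarray(x, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Empirical CDF value of every sample: (rank + 0.5) / N lies strictly in (0, 1).
    ranks = np.argsort(np.argsort(x))
    cdf = (ranks + 0.5) / x.size
    # Evaluate the inverse empirical CDF of the reference at those values.
    return np.quantile(reference, cdf)

def equalize_features(features, reference):
    """Equalize each feature-vector component (column) independently."""
    features = np.asarray(features, dtype=float)
    reference = np.asarray(reference, dtype=float)
    columns = [
        equalize_component(features[:, k], reference[:, k])
        for k in range(features.shape[1])
    ]
    return np.column_stack(columns)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=(500, 3))      # stand-in for clean features (assumption)
    noisy = 0.5 * clean ** 3 + 2.0         # monotonic non-linear distortion (assumption)
    restored = equalize_features(noisy, clean)
    print(np.mean(np.abs(noisy - clean)), np.mean(np.abs(restored - clean)))
```

Because the mapping is monotonic, it preserves the ordering of the values within each component and can undo any monotonic distortion of that component, linear or not.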
Hybrid Connectionist-Structural Acoustical Modeling In The Atros System
- In Proc. Eurospeech'99
, 1999
Cited by 2 (2 self)
In this paper, we introduce several hybrid connectionist-structural acoustic models for context-independent phone-like units in the ATROS recognition system. The structural part of the acoustic models has been modeled with Markov chains, and a multilayer perceptron (or a committee of multilayer perceptrons) is used to estimate the emission probabilities of the Markov chains. We compare the recognition performance attained by these models with the performance obtained by classical continuous density hidden Markov models on a semantically restricted task. 1 Introduction Acoustic-phonetic decoding for continuous speech recognition is an open problem in speech research, because the final performance of an automatic speech recognition system greatly depends on the quality of the acoustic modeling. Hidden Markov models (HMMs) of phone-like units are the most popular option for modeling speech sounds. Under the statistical framework [1], the problem of speech recognition is to search for a word string ^ ...
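As an illustration of how a multilayer perceptron can supply emission scores to a Markov chain, the sketch below uses a common convention in hybrid systems: the network's state posteriors are divided by the state priors to give scaled likelihoods. The abstract does not say whether ATROS does exactly this, so the conversion, the flooring constant, and the function name `scaled_log_likelihoods` are assumptions.

```python
import numpy as np

def scaled_log_likelihoods(posteriors, priors, floor=1e-8):
    """Turn MLP outputs P(q | x) into scaled log-likelihoods log P(q | x) - log P(q).

    posteriors: array of shape (frames, states), each row summing to 1.
    priors:     array of shape (states,), relative state frequencies in training.
    floor:      small constant that avoids log(0) (an arbitrary choice).
    """
    posteriors = np.maximum(np.asarray(posteriors, dtype=float), floor)
    priors = np.maximum(np.asarray(priors, dtype=float), floor)
    return np.log(posteriors) - np.log(priors)

if __name__ == "__main__":
    post = [[0.7, 0.2, 0.1]]
    prior = [0.5, 0.25, 0.25]
    print(scaled_log_likelihoods(post, prior))
```

By Bayes' rule the result differs from log p(x | q) only by a term that is the same for every state, so it can replace the emission log-probabilities during decoding.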
A Fast Version Of The ATROS System
, 1999
Cited by 1 (1 self)
ATROS is an automatic speech recognition/understanding/translation system whose knowledge sources (acoustic models, lexical models, syntactic language models, semantic models and translation models) can be learnt automatically from training data by using similar techniques. The search process in ATROS is performed through a Synchronous Beam Search technique. In this paper, a faster version of ATROS is presented and evaluated. This version supports improved acoustic and syntactical models. It also incorporates improved search algorithms to reduce the computational requirements for decoding: Fast Phoneme Look-Ahead and Histogram Pruning. The system has been tested on a Spanish task of queries to a geographical database (with a vocabulary of 1,264 words). The best result achieved (in real time) was a word error rate of 7.10%. 1 System overview Optimal speech decoding based on a search process in an integrated network of different knowledge sources is a hard computational problem [1]. ...
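The pruning step of a beam search can be sketched as follows. This is a generic illustration, not the ATROS code: beam pruning drops hypotheses whose score falls too far below the best one, and histogram pruning then caps the number of survivors. Practical decoders usually find the cut-off with a histogram of scores. This sketch uses `heapq.nlargest` instead, which keeps the same top-scoring hypotheses. The function name `prune` and its parameters are hypothetical.

```python
import heapq

def prune(hypotheses, beam_width, max_active):
    """Prune the active hypotheses of one time frame.

    hypotheses: dict mapping a search state to its log score.
    beam_width: keep only scores within this distance of the best score.
    max_active: upper bound on the number of surviving hypotheses.
    """
    if not hypotheses:
        return {}
    best = max(hypotheses.values())
    # Beam pruning: discard everything too far below the best hypothesis.
    survivors = {s: v for s, v in hypotheses.items() if v >= best - beam_width}
    # Histogram pruning: if too many remain, keep only the top max_active.
    if len(survivors) > max_active:
        survivors = dict(
            heapq.nlargest(max_active, survivors.items(), key=lambda kv: kv[1])
        )
    return survivors

if __name__ == "__main__":
    active = {"a": -1.0, "b": -2.0, "c": -3.5, "d": -20.0}
    print(prune(active, beam_width=10.0, max_active=2))
```

Capping the number of active hypotheses bounds the work per frame, which matters when decoding must run in real time.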
Acoustic And Syntactical Modeling in the ATROS System
, 1999
Cited by 1 (1 self)
Current speech technology allows us to build efficient speech recognition systems. However, learning the models of the knowledge sources in a speech recognition system is not a closed problem. In addition, low computational requirements are crucial to building real-time systems.
BILINGUAL SPEECH RECOGNITION IN TWO PHONETICALLY SIMILAR LANGUAGES
As speech recognition systems improve, they become suitable for facing new problems. Multilingual speech recognition is one such problem. In the present work, the case of the Comunitat Valenciana multilingual environment is studied. The official languages in the Comunitat Valenciana (Spanish and Valencian) share most of their acoustic units, and their vocabularies and syntax are quite similar. They have influenced each other for many years. A small corpus on an Information System task was developed for experimentation purposes. This choice will make it possible to develop a working prototype in the future, and it is simple enough to build semi-automatic language models. The design of the acoustic corpus is discussed, showing that all combinations of accents have been studied (native, non-native speakers, male, female, etc.). In addition, some experiments have been conducted with this corpus that show promising results for a Spanish-Valencian multilingual speech recognizer.
Fast Phoneme Look-Ahead in the ATROS system
- Accepted in VIII Spanish Symposium on Pattern Recognition and Image Analysis
, 1999
Current speech recognition systems require a lot of computational resources to decode an input utterance. Much effort has been devoted to reducing these requirements. One of the techniques being explored is fast phoneme look-ahead. The idea is to quickly compute approximate scores in order to prune unpromising hypotheses. These scores are computed by using simple phone-like units and analysing an acoustic look-ahead segment.
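The look-ahead idea can be sketched in a heavily simplified form. This is an assumption-laden illustration rather than the method of the paper: each phone-like unit is scored by summing cheap per-frame log scores over a short window of upcoming frames, and units scoring too far below the best one are not expanded. The function name `lookahead_prune`, the window length, and the threshold are hypothetical.

```python
import numpy as np

def lookahead_prune(frame_scores, t, window, threshold):
    """Return the indices of phone-like units worth expanding at frame t.

    frame_scores: array (frames, phones) of cheap per-frame log scores.
    window:       number of upcoming frames used for the look-ahead.
    threshold:    units scoring more than this below the best are pruned.
    """
    frame_scores = np.asarray(frame_scores, dtype=float)
    # Approximate score of every unit over the look-ahead segment.
    approx = frame_scores[t:t + window].sum(axis=0)
    return np.flatnonzero(approx >= approx.max() - threshold)

if __name__ == "__main__":
    scores = [[-1.0, -5.0, -2.0],
              [-1.0, -6.0, -2.0],
              [-1.0, -7.0, -9.0]]
    print(lookahead_prune(scores, t=0, window=2, threshold=3.0))
```

Only the units that survive this cheap test would then be evaluated with the full acoustic models.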
Bilingual speech corpus in two phonetically similar languages
As speech recognition systems improve, they become suitable for facing new problems. Multilingual speech recognition is one such problem. In the present work, the case of the Comunitat Valenciana multilingual environment is studied. The official languages in the Comunitat Valenciana (Spanish and Valencian) share most of their acoustic units, and their vocabularies and syntax are quite similar. They have influenced each other for many years. A small corpus on an Information System task was developed for experimentation purposes. This choice will make it possible to develop a working prototype in the future, and it is simple enough to build semi-automatic language models. The design of the acoustic corpus is discussed, showing that all combinations of accents have been studied (native, non-native speakers, male, female, etc.).
ITERATIVE SPEAKER ADAPTATION USING MLLR
Speech recognition systems are usually speaker-independent, but they are not as good as speaker-dependent systems for specific speakers. An initial speaker-independent system can be adapted to improve recognition accuracy by transforming it into a speaker-dependent system. In this work, a new general acoustic model adaptation technique is presented, using the MLLR algorithm iteratively in a supervised manner. Experiments have been performed on the TT2 Spanish speech corpus. The initial acoustic models were trained from the Albayzin speech database. The results, which were obtained for 10 speakers, show an improvement in speech recognition accuracy.
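A single MLLR mean-adaptation step can be sketched as below. This is not the procedure of the paper: it assumes one global transform, identity covariances, and a hard frame-to-Gaussian assignment (as a supervised alignment could provide), in which case the maximum-likelihood estimate reduces to ordinary least squares. The names `estimate_mllr_transform` and `adapt_means` are hypothetical.

```python
import numpy as np

def extend(means):
    """Prepend a constant 1 to every mean so the bias is part of the transform."""
    means = np.asarray(means, dtype=float)
    return np.hstack([np.ones((means.shape[0], 1)), means])

def estimate_mllr_transform(means, frames, assignments):
    """Estimate W (D x (D+1)) such that W @ [1, mu] fits the adaptation frames.

    means:       (G, D) speaker-independent Gaussian means.
    frames:      (T, D) adaptation feature vectors.
    assignments: (T,) index of the Gaussian aligned with each frame.
    """
    design = extend(means)[np.asarray(assignments)]   # (T, D+1)
    # Least-squares solution of frames ~= design @ W.T
    w_transposed, *_ = np.linalg.lstsq(
        design, np.asarray(frames, dtype=float), rcond=None
    )
    return w_transposed.T

def adapt_means(means, transform):
    """Apply the transform to every mean: mu_adapted = W @ [1, mu]."""
    return extend(means) @ transform.T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    means = rng.normal(size=(5, 3))
    frames = means[rng.integers(0, 5, size=50)] + 0.5   # speaker shifted by 0.5
    labels = np.argmin(
        ((frames[:, None, :] - 0.5 - means[None]) ** 2).sum(-1), axis=1
    )
    print(adapt_means(means, estimate_mllr_transform(means, frames, labels)) - means)
```

An iterative scheme would realign the adaptation data with the adapted models and re-estimate the transform. How the paper organizes these iterations is not stated in the abstract.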

