Classification using a Hierarchical Bayesian Approach
- In Proceedings of the International Conference on Pattern Recognition (ICPR’02)
, 2002
Cited by 10 (2 self)
A key problem faced by classifiers is coping with styles not represented in the training set. We present an application of hierarchical Bayesian methods to the problem of recognizing degraded printed characters in a variety of fonts. The proposed method works by using training data of various styles and classes to compute prior distributions on the parameters for the class conditional distributions. For classification, the parameters for the actual class conditional distributions are fitted using an EM algorithm. The advantage of hierarchical Bayesian methods is motivated with a theoretical example. Severalfold increases in classification performance relative to style-oblivious and style-conscious classifiers are demonstrated on a multifont OCR task.
Robust Text-Independent Speaker Identification over Telephone Channels
- IEEE Trans. on Speech and Audio Processing
, 1997
Cited by 10 (1 self)
This paper addresses the issue of closed-set text-independent speaker identification from samples of speech recorded over the telephone. It focuses on the effects of acoustic mismatches between training and testing data, and concentrates on two approaches: extracting features that are robust against channel variations, and transforming the speaker models to compensate for channel effects. First, an experimental study shows that optimizing the front end processing of the speech signal can significantly improve speaker recognition performance. A new filterbank design is introduced to improve the robustness of the speech spectrum computation in the front-end unit. Next, a new feature based on spectral slopes is described. Its ability to discriminate between speakers is shown to be superior to that of the traditional cepstrum. This feature can be used alone or combined with the cepstrum. The second part of the paper presents two model transformation methods that further reduce channel effe...
Speech Recognition System Design Based on Automatically Derived Units
, 1999
Cited by 10 (0 self)
In most speech recognition systems today, acoustic modeling and lexical modeling are viewed as separable problems. Currently the most popular approach is to manually define canonical word pronunciations in terms of phonetic units and let the acoustic models capture differences between actual spoken and canonical pronunciations implicitly with Gaussian mixture models. As a result, these models can be very broad, particularly for casual spontaneous speech. An alternative approach, explored in this thesis, is to learn a unit inventory and pronunciation dictionary from training data using a maximum likelihood objective function. In particular,
Model Transformation For Robust Speaker Recognition From Telephone Data
- in ICASSP-97
, 1997
Cited by 9 (0 self)
In the context of automatic speaker recognition, we propose a model transformation technique that renders speaker models more robust to acoustic mismatches and to data scarcity by appropriately increasing their variances. We use a stereo database containing speech recorded simultaneously under different acoustic conditions to derive a synthetic variance distribution. This distribution is then used to modify the variances of other speaker models from other telephone databases. The technique is illustrated with experiments conducted on a locally collected database and on the NIST'95 and '96 subsets of the Switchboard Corpus. 1. INTRODUCTION Many applications of speaker identification systems (speaker-ID for short) assume that the users access the system remotely. Typically, the channel involved in the communication is that of the telephone. Because the handset and the line can vary from call to call, there is often an acoustic mismatch between the data collected to train the speaker mo...
An Experimental Study Of Acoustic Adaptation Algorithms
- IEEE Int'l Conference on ASSP
, 1996
Cited by 6 (1 self)
Recently there has been much interest in the area of adaptation for improved speech recognition in the presence of mismatches between the training and testing conditions. In this paper we focus on transformation-based maximum-likelihood (ML) adaptation. Some of the important adaptation parameters include whether the adaptation is performed in the feature-space or model-space, and whether the adaptation is supervised or unsupervised. An additional parameter is the adaptation data. For example, adaptation may be performed using an independent dataset or the test data itself. The latter is referred to as transcription-mode adaptation. In this paper, we experimentally study the effect of these various parameters, and report on our findings. 1. INTRODUCTION Recently, there has been much interest in the area of transformation-based ML adaptation to reduce the recognition degradation caused by acoustic mismatches between the training and testing conditions [1, 2, 3]. It is assumed that...
Improved Modeling and Efficiency for Automatic Transcription of Broadcast News
, 2000
Cited by 5 (0 self)
Over the last few years, the DARPA-sponsored Hub4 continuous speech recognition evaluations have pushed speech recognition technology for the very interesting and difficult task of automatically transcribing broadcast news. In this paper, we report on our research and progress on this problem. We focus on individual techniques we developed, rather than on descriptions of our evaluation systems. We provide comparative experimental results showing the improvements obtained with the novel approaches we developed. 1 Introduction In recent years there has been increasing interest in developing large-vocabulary continuous speech recognition (LVCSR) systems for speech found in real sources. Broadcast news, in particular, has been the testbed for the DARPA-sponsored Hub4 continuous speech recognition (CSR) evaluations over the last few years, and represents a significant challenge to speech recognition researchers. Many interesting problems are associated with the automatic recognition of b...
Acoustic Modeling for the SRI Hub4 Partitioned Evaluation Continuous Speech Recognition System
- In Proceedings of the DARPA Speech Recognition Workshop
, 1997
Cited by 5 (4 self)
We describe the development of the SRI system evaluated in the 1996 DARPA continuous speech recognition (CSR) Hub4 partitioned evaluation (PE). The task for the Hub4 evaluation was to recognize speech from broadcast television and radio shows. Recognizing such speech by machines poses many challenges. First, the segments to be recognized could be very long. This introduces a problem in training and recognition because of the consequent increased system memory requirement. A simple segmentation technique is used to break long segments into shorter, more manageable lengths. The speech from broadcast news sources exhibits a variety of difficult acoustic conditions, such as spontaneous speech, band-limited speech, and speech in the presence of noise, music, or background speakers. Such background conditions lead to significant degradation in performance. We describe techniques, based on acoustic adaptation, that adapt recognition models to the different acoustic background conditions, so as to im...
Acoustic Clustering and Adaptation for Robust Speech Recognition
- In Proceedings of EUROSPEECH
, 1997
Cited by 3 (2 self)
We describe an algorithm based on acoustic clustering and acoustic adaptation to significantly improve speech recognition performance. The method is particularly useful when speech from multiple speakers is to be recognized and the boundary between speakers is not known. We assume that each test data segment is relatively homogeneous with respect to the acoustic background and speaker. These segments are then grouped using an agglomerative acoustic clustering algorithm. The idea is to group together all test segments that are acoustically similar. The speech recognition models are then adapted separately to each test data cluster. Finally these adapted models are used to recognize the data from that cluster. This algorithm was used in SRI's system for the 1996 DARPA Hub4 partitioned evaluation. Experimental results are presented on the 1996 H4 development data set. It was found that an improvement of 9.5% was achieved by using this algorithm. 1. INTRODUCTION Recently there has been mu...
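The grouping step described in this abstract can be illustrated with a minimal agglomerative clustering sketch. It is not the authors' implementation: the abstract does not say how segments are represented, which distance is used, or when merging stops. Here each segment is assumed to be summarized by one feature vector, clusters are compared by the Euclidean distance between their centroids, and merging stops at a hypothetical `threshold`.

```python
# Illustrative sketch only. The segment representation (one feature vector per
# segment), the Euclidean centroid distance, and the stopping threshold are
# assumptions, not details taken from the abstract.
from math import dist


def centroid(vectors):
    """Return the component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def agglomerative_cluster(segments, threshold):
    """Repeatedly merge the two closest clusters until no pair is within threshold.

    segments: list of feature vectors, one per test segment (hypothetical input).
    Returns a list of clusters, each a sorted list of segment indices.
    """
    clusters = [[i] for i in range(len(segments))]
    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = centroid([segments[i] for i in clusters[a]])
                cb = centroid([segments[i] for i in clusters[b]])
                d = dist(ca, cb)
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break  # remaining clusters are too dissimilar to merge
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


if __name__ == "__main__":
    toy_segments = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [11.0, 10.0]]
    print(agglomerative_cluster(toy_segments, threshold=2.0))
```

In the pipeline the abstract outlines, each resulting cluster would then get its own adapted copy of the recognition models, and that copy would be used to recognize the segments in the cluster. The adaptation and recognition steps are not sketched here.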
Noise-Resistant Feature Extraction And Model Training For Robust Speech Recognition
- in Proceedings of the 1996 DARPA CSR Workshop
, 1996
Cited by 3 (1 self)
In this paper we report on our recent work on noise-robust feature extraction and model training to alleviate the mismatch caused by different microphones and ambient room noise in the context of the 1995 DARPA-sponsored H3 benchmark test, which used the unlimited-vocabulary North American Business News (NABN) database. We present a novel noise-robust feature extraction algorithm that is a combination of our previously developed minimum mean square error (MMSE) log-energy estimation algorithm and the probabilistic optimum filtering (POF) algorithm. We also studied an approach based on training the automatic speech recognition (ASR) system with previously collected noisy speech. While both the above approaches gave significant improvements, it was found that combining them gave the best results. We also report on a new part-of-speech (POS) language model that makes it possible to train robust POS language models that incorporate longer contexts than is possible with word-based language ...
SRI's 1998 Broadcast News System -- Toward Faster, Better, Smaller Speech Recognition
- In Proceedings of the DARPA Broadcast News Workshop
, 1999
We describe several new research directions we investigated toward the development of our broadcast news transcription system for the 1998 DARPA H4 evaluations. Our goal was to develop significantly faster and smaller speech recognition systems without degrading the word error rate of our 1997 system. We did this through significant algorithmic research creating various new techniques. A sample of these techniques was used to put together our 1998 broadcast news system, which is conceptually much simpler, faster, and smaller, but gives the same word error rate as our 1997 system. In particular, our 1998 system is based on a simple phonetically tied mixture (PTM) model with a total of only 13,000 Gaussians, as compared to a 67,000-Gaussian state-clustered system we used in 1997. 1. Introduction One of our main goals in 1998 was to significantly increase speed and decrease model size, while maintaining or improving accuracy. These goals are difficult to achieve simultaneously because o...

