Automatic Speaker Clustering
- DARPA Speech Recognition Workshop
, 1997
Cited by 26 (5 self)
This paper presents a fully automatic speaker clustering algorithm, which consists of three components: building a distance matrix based on Gaussian models of the acoustic segments; performing hierarchical clustering on the distance matrix with the prior assumption that consecutive segments should be more likely to come from the same speaker; and selecting the best clustering solution automatically by minimizing the within-cluster dispersion with some penalty against too many clusters. We applied this automatic speaker clustering technique in the 1996 Hub4 evaluation, and the results show that it contributed significantly to the word error rate (WER) reduction in unsupervised adaptation. From our experiments, the algorithm seldom misclassifies segments from the same speaker into different clusters. We used the same clustering procedure for both partitioned evaluation (PE) and unpartitioned evaluation (UE) tests [1]. Experiments also show that this automatic speaker clustering algorithm imp...
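The three components named in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the distance measure, the linkage rule, the form of the consecutive-segment prior, or the penalty, so the symmetric KL divergence, average linkage, the multiplicative `adjacency_bias`, and the linear `penalty` per cluster are all hypothetical choices, and each segment is reduced to a list of scalar feature values.

```python
from statistics import mean, pvariance

def gaussian(segment):
    """Fit a 1-D Gaussian (mean, variance) to one segment's feature values."""
    return mean(segment), max(pvariance(segment), 1e-6)

def sym_kl(g1, g2):
    """Symmetric KL divergence between two 1-D Gaussians (assumed distance)."""
    (m1, v1), (m2, v2) = g1, g2
    return (0.5 * (v1 / v2 + v2 / v1 - 2.0)
            + 0.5 * (m1 - m2) ** 2 * (1.0 / v1 + 1.0 / v2))

def distance_matrix(segments, adjacency_bias=0.5):
    """Pairwise distances; consecutive segments are pulled closer (assumed prior)."""
    models = [gaussian(s) for s in segments]
    n = len(models)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = sym_kl(models[i], models[j])
            if j == i + 1:
                d *= adjacency_bias
            dist[i][j] = dist[j][i] = d
    return dist

def agglomerate(dist):
    """Average-linkage merging; returns the partition at every cluster count."""
    clusters = [[i] for i in range(len(dist))]
    levels = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                link = mean(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or link < best[0]:
                    best = (link, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
        levels.append([list(c) for c in clusters])
    return levels

def dispersion(partition, segments):
    """Within-cluster sum of squared deviations over all pooled values."""
    total = 0.0
    for cluster in partition:
        frames = [x for i in cluster for x in segments[i]]
        centre = mean(frames)
        total += sum((x - centre) ** 2 for x in frames)
    return total

def cluster_speakers(segments, adjacency_bias=0.5, penalty=1.0):
    """Pick the partition minimizing dispersion plus a per-cluster penalty."""
    levels = agglomerate(distance_matrix(segments, adjacency_bias))
    return min(levels, key=lambda p: dispersion(p, segments) + penalty * len(p))

# Toy data: two segments near 0 and two near 10 (illustrative values only).
toy_segments = [
    [0.0, 0.2, -0.1, 0.1],
    [0.1, -0.2, 0.0, 0.3],
    [10.0, 10.2, 9.9, 10.1],
    [9.8, 10.1, 10.3, 10.0],
]
best_partition = cluster_speakers(toy_segments)
```

On the toy data the selected partition groups segments 0 and 1 together and segments 2 and 3 together; a real system would use multivariate Gaussians over cepstral frames and a penalty tuned on held-out data.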
Time-First Search For Large Vocabulary Speech Recognition
, 1998
Cited by 21 (4 self)
This paper describes a new search technique for large vocabulary speech recognition based on a stack decoder. Considerable memory savings are achieved with the combination of a tree-based lexicon and a new search technique. The search proceeds time-first; that is, partial path hypotheses are extended into the future in the inner loop and a tree walk over the lexicon is performed as an outer loop. Partial word hypotheses are grouped based on language model state. The stack maintains information about groups of hypotheses, and whole groups are extended by one word to form new stack entries. An implementation is described of a one-pass decoder employing a 65,000 word lexicon and a disk-based trigram language model. Real time operation is achieved with a small search error, a search space of about 5 Mbyte and a total memory usage of about 35 Mbyte. 1. INTRODUCTION Search is an interesting problem in the field of large vocabulary speech recognition. Typically the acoustic vectors correspondi...
The 1996 BBN Byblos Hub-4 Transcription System
- In Proc. of DARPA Speech Recognition Workshop
, 1996
Cited by 5 (1 self)
In this paper, we describe the BBN Byblos system used for the 1996 Hub-4 Partitioned Evaluation (PE) and Unpartitioned Evaluation (UE) tests. For the PE, we chose to ignore the segment feature labels that were given to the system as side-information so that our approach would generalize trivially to the UE. Moreover, we chose not to model specific channel conditions in the training because the observed gains were too small to warrant the additional system complexity required to support them. In the end, we estimated a single set of acoustic models from only 40 hours of broadcast news data. For the UE, the data was automatically segmented with a simple dual-gender phoneme recognizer that efficiently located pauses and changes in speakers' gender. After this preliminary stage of segmentation and gender-classification, our UE and PE systems were identical. We achieved a 30.2% word error rate on the PE test and 31.8% on the UE test - only a 5% relative degradation from our PE result. 1. I...
Dynamic Programming Search Techniques For Across-Word Modelling In Speech Recognition
- in Speech Recognition, Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing
, 1999
Cited by 4 (0 self)
We describe the integration of across-word models in the RWTH large vocabulary continuous speech recognition system, where our main focus is on the realization of the acoustic recognition process. This paper presents a study of two search methods based on the principle of dynamic programming. For both methods we discuss the implementation details and give experimental results on the Verbmobil and on the Wall Street Journal data. In addition, we introduce a score interpolation of within-word and across-word models for both search methods. In combination with across-word models this interpolation technique gives an improvement of the recognition accuracy by 14% relative to our standard system. 1. INTRODUCTION This paper describes the integration of across-word modelling into the RWTH large vocabulary continuous speech recognition system [5]. In particular, we consider two search methods, namely the n-best and one-pass approach, for handling across-word models. Both methods are based o...
The BBN Byblos 2000 Conversational Mandarin LVCSR System
, 2000
This paper describes the year 2000 BBN Byblos Mandarin large vocabulary conversational speech recognition (LVCSR) system, the winning (and only) Mandarin system from the Spring 2000 Hub-5 evaluation sponsored by NIST. We first outline the training and decoding procedures used in the system, and describe the performance of the system used in the evaluation. We then describe the effect of several features that were not in the evaluation system but have been added since, including Jacobian compensated Vocal Tract Length Normalization (VTLN), system combination, a higher number of system parameters, and additional training data. Together these give an additional 5.4% relative improvement on character error rate (CER) from the evaluation system. 1. INTRODUCTION This paper describes the BBN Byblos Mandarin system that was entered in the Spring 2000 NIST LVCSR evaluation. The evaluation test consisted of twenty 5-minute conversations of fluent Mandarin taken from the CallHome database. The ...
The 1998 BBN BYBLOS Primary System applied to English and Spanish Broadcast News Transcription
- in Proceedings of the DARPA Broadcast News Workshop
, 1999
In this paper, we describe the BBN BYBLOS system used for the 1998 Hub-4E primary and Hub-4Sp evaluation benchmarks, and discuss the improvements made to the system in 1998. We focus on the techniques that were new in this year's system, including processing of the acoustic training data, test segmentation, revised cepstral normalization and Vocal Tract Length Normalization (VTLN), band-specific models, Diagonal transform Speaker Adaptive Training (DSAT), and a modified ROVER method for system combination. We show that by combining all the above techniques, we were able to improve the recognition accuracy on the 1997 Hub-4E evaluation test by 27% relative to our 1997 system (from 20.4% to 14.8%). We also present our results on the 1998 Hub-4E and Hub4Sp benchmarks, and discuss the differences between the English and Spanish transcription systems. 1. INTRODUCTION The 1997 BBN BYBLOS system [1] was focused on improving the recognition accuracy of the F0 and F1 focus conditions (high fi...
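The relative improvement quoted in this abstract can be checked with a short calculation. The helper name `relative_reduction` is hypothetical and used only for illustration; the two error rates are the ones stated in the abstract.

```python
def relative_reduction(baseline, improved):
    """Relative reduction, in percent, from a baseline error rate to an improved one."""
    return 100.0 * (baseline - improved) / baseline

# Word error rates from the abstract: 1997 system vs. the 1998 system
# on the 1997 Hub-4E evaluation test.
reduction = relative_reduction(20.4, 14.8)
print(f"{reduction:.2f}% relative reduction")
```

The absolute gain is 5.6 points of WER; dividing by the 20.4% baseline gives a figure that rounds to the 27% relative improvement reported.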
The 1997 BBN Byblos System Applied To Broadcast News Transcription
- Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop
, 1998
In this paper, we describe the BBN Byblos system used for the 1997 DARPA Hub-4 Broadcast News evaluation and discuss numerous improvements made to the system in 1997. We focused our effort entirely upon the two conditions containing studio-quality uncorrupted speech from native speakers, the so-called F0 "prepared speech" and F1 "spontaneous speech" conditions. In particular, we did not bother to create a separate acoustic model for narrow-band telephone speech. Our overall 1997 Hub-4 evaluation result was 20.4% WER, but our error rate on the F0 and F1 conditions was only 14%. We ran regression tests on development test data that show we reduced word error rate by 22-30% on the F0 and F1 conditions compared to our 1996 system. Sizable gains were achieved on all the other conditions as well, even though no extra effort was spent toward improving them. Brief summaries of three related efforts are also given covering the use of Byblos for Spanish news transcription, near real-time transcription, and ...
Recent Improvements in the BBN OCR System
We describe several improvements that we have made in the BBN BYBLOS OCR System. First, we adopted continuous density hidden Markov models (HMMs) rather than discrete density HMMs. This resulted in improved accuracy when more training data is available. It also allowed us to use unsupervised speaker adaptation algorithms (borrowed from speech recognition) for adaptation to font, style, and quality. Second, we sped up the character recognition by a factor of about 50 so that a full page of 2,000 characters requires about 30 to 40 seconds for processing. Third, we tested the system on Chinese characters. This required development of tools to create a training corpus from available sources. It also required techniques for dealing with an open set of characters, where some of the characters may have no real training data. The end result was 1.2% character error on newspaper data.

