Language Model Representations For Beam-Search Decoding
In Proceedings of ICASSP'95, 1995
Cited by 17 (1 self)
Abstract:
This paper presents an efficient way of representing a bigram language model for a beam-search-based, continuous-speech, large-vocabulary HMM recognizer. The tree-based topology considered takes advantage of a factorization of the bigram probability derived from the bigram interpolation scheme, and of a tree organization of all the words that can follow a given one. Moreover, an optimization algorithm is used to considerably reduce the space requirements of the language model. Experimental results are provided for two 10,000-word dictation tasks: radiological reporting (perplexity 27) and newspaper dictation (perplexity 120). In the former domain, 93% word accuracy is achieved with real-time response and 23 MB of process space. In the newspaper dictation domain, 88.1% word accuracy is achieved with 1:41 real-time response and 38 MB of process space. All recognition tests were performed on an HP-735 workstation.
1. INTRODUCTION Many current ASR systems generate initial hypotheses through a b...
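The factorization mentioned in this abstract can be illustrated with a toy interpolated bigram. The sketch below assumes linear interpolation P(w|v) = (1 - λ(v)) f(w|v) + λ(v) P(w) with Witten-Bell-style weights λ(v); the paper's exact interpolation scheme is not given here, and the class name `InterpolatedBigram` and method `is_factored` are hypothetical. The point it shows: for any word w never seen after v, P(w|v) collapses to λ(v) · P(w), so a decoder only needs explicit arcs for the observed successors of v (which can be organized as a tree), plus one weighted entry into a shared unigram structure.

```python
from collections import Counter
from typing import Dict, List, Set


class InterpolatedBigram:
    """Toy interpolated bigram: P(w|v) = (1 - lam(v)) * f(w|v) + lam(v) * P(w)."""

    def __init__(self, tokens: List[str]) -> None:
        self.uni = Counter(tokens)
        self.total = len(tokens)
        self.bi = Counter(zip(tokens, tokens[1:]))
        self.hist = Counter(tokens[:-1])  # c(v): times v occurs as a history
        self.succ: Dict[str, Set[str]] = {}  # observed successors of each word
        for v, w in self.bi:
            self.succ.setdefault(v, set()).add(w)
        self.vocab = sorted(self.uni)

    def p_uni(self, w: str) -> float:
        return self.uni[w] / self.total

    def lam(self, v: str) -> float:
        # Assumed Witten-Bell-style weight: T(v) / (c(v) + T(v)),
        # where T(v) is the number of distinct words seen after v.
        t = len(self.succ.get(v, ()))
        c = self.hist[v]
        return t / (c + t) if c + t > 0 else 1.0

    def prob(self, w: str, v: str) -> float:
        c = self.hist[v]
        f = self.bi[(v, w)] / c if c else 0.0  # relative frequency f(w|v)
        lam = self.lam(v)
        return (1.0 - lam) * f + lam * self.p_uni(w)

    def is_factored(self, w: str, v: str) -> bool:
        # True when P(w|v) reduces to lam(v) * P(w): no explicit bigram arc needed.
        return w not in self.succ.get(v, set())


model = InterpolatedBigram("a b a c a b".split())
```

On this toy corpus, λ("a") = 2 / (3 + 2) = 0.4, and the unseen pair ("a", "a") gets 0.4 × P("a") = 0.4 × 0.5 = 0.2. On realistic vocabularies the number of observed successor pairs is typically far smaller than |V|², which is what makes a factored network compact.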
Improvements In Tree-Based Language Model Representation
In Proc. of EUROSPEECH, 1995
Cited by 14 (10 self)
Abstract:
This paper describes an efficient way of representing a bigram language model with a finite-state network used by a beam-search-based, continuous-speech HMM recognizer. In a previous paper [1], a compact tree-based organization of the search space was presented, which could be further reduced through an optimization algorithm. There, it was pointed out that for a 10,000-word newspaper dictation task the minimization step could require considerable time and memory on a standard workstation. In this paper, a new compilation technique that takes the particular tree-based topology into account is described. Results show that, without additional time and space costs, the new technique produces networks that are equivalent to the tree-based ones but almost as small as the optimized one.
1 INTRODUCTION The most widely used Language Models (LMs) in speech recognition are n-gram models, due both to easy inference from the training corpus and to easy integration with the decoding algorithms commonly used...
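To make the size argument concrete, the sketch below contrasts a prefix tree (which shares word beginnings) with its minimized form (which additionally merges identical endings). It is a generic illustration, not the paper's compilation technique: it uses the standard bottom-up minimization of an acyclic automaton, where two nodes are merged when they agree on finality and have the same labelled edges to the same canonical children. Letters stand in for phones, and the names `build_trie`, `minimize`, `count_nodes`, and `accepts` are hypothetical.

```python
from typing import Dict, List, Tuple


class Node:
    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.final = False  # True if a word ends at this node


def build_trie(words: List[str]) -> Node:
    """Prefix tree: words sharing a prefix share the corresponding nodes."""
    root = Node()
    for word in words:
        node = root
        for sym in word:
            node = node.children.setdefault(sym, Node())
        node.final = True
    return root


def minimize(root: Node) -> Node:
    """Merge equivalent subtrees bottom-up (acyclic automaton minimization)."""
    register: Dict[Tuple, Node] = {}

    def visit(node: Node) -> Node:
        for sym, child in list(node.children.items()):
            node.children[sym] = visit(child)  # canonicalize children first
        # Equivalent nodes: same finality, same labels to the same canonical children.
        key = (node.final, tuple(sorted((s, id(c)) for s, c in node.children.items())))
        return register.setdefault(key, node)

    return visit(root)


def count_nodes(root: Node) -> int:
    """Count distinct nodes reachable from root (shared nodes counted once)."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        stack.extend(node.children.values())
    return len(seen)


def accepts(root: Node, word: str) -> bool:
    node = root
    for sym in word:
        node = node.children.get(sym)
        if node is None:
            return False
    return node.final


words = ["tap", "taps", "top", "tops"]
```

For these four words the prefix tree has 8 nodes; minimization merges the "ta"/"to" branches' shared endings and leaves 5 nodes while accepting exactly the same words. Note that `minimize` rewrites the tree in place, so build a fresh tree if the unminimized one is still needed.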
Category-Based Statistical Language Models
1997
Cited by 11 (2 self)
Abstract:
... this document. The first section, in Chapter 3, develops a model for syntactic dependencies based on word-category n-grams. The second section, in Chapter 4, extends this model by allowing short-range word relations to be captured through the incorporation of selected word n-grams.
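A word-category n-gram can be sketched in its simplest (bigram) form as P(w | v) ≈ P(w | c(w)) · P(c(w) | c(v)), where c(·) maps each word to a category. The code below is a minimal maximum-likelihood illustration of that standard class-bigram factorization, assuming a hand-made word-to-category map and no smoothing; it is not the model developed in the document, which goes further by adding selected word n-grams. The class name `ClassBigram` and the toy categories are hypothetical.

```python
from collections import Counter
from typing import Dict, List


class ClassBigram:
    """P(w | prev) = P(w | c(w)) * P(c(w) | c(prev)), maximum-likelihood, unsmoothed."""

    def __init__(self, tokens: List[str], word_class: Dict[str, str]) -> None:
        self.cls = word_class
        classes = [word_class[w] for w in tokens]
        self.class_pairs = Counter(zip(classes, classes[1:]))
        self.class_hist = Counter(classes[:-1])  # category occurrences as history
        self.word_counts = Counter(tokens)
        self.class_counts = Counter(classes)
        self.vocab = sorted(self.word_counts)

    def p_word_given_class(self, w: str) -> float:
        return self.word_counts[w] / self.class_counts[self.cls[w]]

    def p_class_transition(self, c_prev: str, c: str) -> float:
        h = self.class_hist[c_prev]
        return self.class_pairs[(c_prev, c)] / h if h else 0.0

    def prob(self, w: str, prev: str) -> float:
        return self.p_word_given_class(w) * self.p_class_transition(self.cls[prev], self.cls[w])


categories = {"the": "DET", "a": "DET", "cat": "N", "dog": "N", "sat": "V", "ran": "V"}
lm = ClassBigram("the cat sat the dog sat a cat ran".split(), categories)
```

The benefit of the category level shows up on unseen word pairs: "a dog" never occurs in the toy corpus, yet `lm.prob("dog", "a")` equals 1/3 (P(dog | N) = 1/3 times P(N | DET) = 1), the same as for the observed pair "the dog".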
Radiological Reporting by Speech Recognition: The A.Re.S. System
1994
Cited by 8 (8 self)
Abstract:
Radiological reporting has already been identified as a field in which voice technologies can prove very useful. Recent progress in automatic speech recognition and in hardware and software technology makes it possible to build large-vocabulary, continuous-speech, speaker-independent, real-time systems. In this paper, a dictation system for radiology reporting, the A.Re.S. system, is presented. A.Re.S. is a "software-only" system that runs in real time on an HP 715 workstation. It relies on an asynchronous, multi-process architecture in which speech decoding is performed by pipelined processes. System requirements and architecture are described, together with the results of a preliminary evaluation based on three months of on-site testing.
I. INTRODUCTION Recent progress in Automatic Speech Recognition (ASR) and in hardware and software technology makes it possible to build large-vocabulary, real-time, speaker-independent systems. Medical document generation presents f...
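The idea of decoding "by pipelined processes" can be sketched as stages connected by FIFO queues, each stage working on one chunk while the next stage works on the previous chunk. This is only a minimal sketch under stated assumptions: threads stand in for the separate OS processes of the real system, and the two stages (a frame-energy front end and a trivial speech/silence "decoder") are hypothetical placeholders for the actual A.Re.S. modules, which are not described here.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

_STOP = object()  # sentinel that shuts each stage down in order


def _stage(fn: Callable[[Any], Any], inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Apply fn to every item from inbox and forward the result downstream."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)
            return
        outbox.put(fn(item))


def run_pipeline(items: Iterable[Any], fns: List[Callable[[Any], Any]]) -> List[Any]:
    queues = [queue.Queue() for _ in range(len(fns) + 1)]
    workers = [
        threading.Thread(target=_stage, args=(fn, queues[i], queues[i + 1]), daemon=True)
        for i, fn in enumerate(fns)
    ]
    for w in workers:
        w.start()
    for x in items:  # feed the first stage asynchronously
        queues[0].put(x)
    queues[0].put(_STOP)
    results = []
    while True:  # one worker per stage plus FIFO queues preserve input order
        y = queues[-1].get()
        if y is _STOP:
            break
        results.append(y)
    for w in workers:
        w.join()
    return results


def frame_energy(block: List[float]) -> float:  # hypothetical front end
    return sum(s * s for s in block) / len(block)


def classify(energy: float) -> str:  # hypothetical stand-in for a decoder
    return "speech" if energy > 0.1 else "silence"
```

For example, `run_pipeline([[0.0, 0.0], [0.5, -0.5], [0.01, 0.0]], [frame_energy, classify])` yields `["silence", "speech", "silence"]`. With real processes, the same shape could be built with `multiprocessing` queues, at the cost of serializing data between stages.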
A System for the Segmentation and Transcription of Italian Radio News
In Proc. of the RIAO Conference, 2000
Cited by 6 (5 self)
Abstract:
This paper presents the development of an Italian broadcast news transcription system, intended for the indexing of multimedia archives. Moreover, a broadcast news corpus being collected at ITC-irst is introduced. The system processes the input audio stream in four stages. The first stage performs audio segmentation via the Bayesian Information Criterion (BIC) and classification by Gaussian mixture modeling. The second stage groups spectrally homogeneous speech segments, again using the BIC, to provide speaker clusters suitable for the subsequent adaptation module. The third stage adapts the acoustic models to each selected cluster, and the fourth stage transcribes the audio data using the cluster-adapted models. The word error rate, measured on a 1 h 15 min test set corresponding to six news programs, was 21.5%.
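BIC-based segmentation asks, for each candidate split point t in a window of N frames, whether two Gaussians (one per side) explain the data better than one, after paying a complexity penalty. In the common formulation, ΔBIC(t) = (N/2) ln|Σ| - (N₁/2) ln|Σ₁| - (N₂/2) ln|Σ₂| - λ · ½ (d + d(d+1)/2) ln N, and a change is hypothesized where ΔBIC is positive and maximal. The sketch below assumes one-dimensional features (d = 1, so |Σ| is a variance), a single window, penalty weight λ = 1, and a minimum segment length; the system's actual features and settings are not stated here, and the function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def _ml_variance(xs: List[float]) -> float:
    """Maximum-likelihood variance, floored to avoid log(0)."""
    m = sum(xs) / len(xs)
    return max(sum((x - m) ** 2 for x in xs) / len(xs), 1e-12)


def delta_bic(xs: List[float], t: int, penalty: float = 1.0) -> float:
    """Delta-BIC for splitting xs at index t, with 1-D Gaussian models."""
    n = len(xs)
    left, right = xs[:t], xs[t:]
    d = 1
    n_params = d + d * (d + 1) / 2  # mean + covariance parameters per Gaussian
    return (
        0.5 * n * math.log(_ml_variance(xs))
        - 0.5 * len(left) * math.log(_ml_variance(left))
        - 0.5 * len(right) * math.log(_ml_variance(right))
        - penalty * 0.5 * n_params * math.log(n)
    )


def best_change_point(
    xs: List[float], min_len: int = 5, penalty: float = 1.0
) -> Optional[Tuple[int, float]]:
    """Return (index, score) of the best split with positive Delta-BIC, else None."""
    best = None
    for t in range(min_len, len(xs) - min_len + 1):
        score = delta_bic(xs, t, penalty)
        if score > 0 and (best is None or score > best[1]):
            best = (t, score)
    return best


first = [1.0 if i % 2 == 0 else -1.0 for i in range(20)]   # mean 0, variance 1
second = [11.0 if i % 2 == 0 else 9.0 for i in range(20)]  # mean 10, variance 1
```

On `first + second`, the best split is at index 20, where the two halves meet; on `first + first` (a stationary signal) no split has positive ΔBIC and the function returns `None`. Raising `penalty` makes the detector more conservative.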
A Baseline For The Transcription Of Italian Broadcast News
In Proc. of ICASSP
Cited by 4 (3 self)
Abstract:
This paper presents the first achievements in the development of a broadcast news transcription system to be applied to the processing of large audio archives. In particular, the Italian broadcast news corpus being collected is introduced, and the first baseline system is outlined. The baseline system consists of an audio segmentation module and a speech recognizer featuring a recursive Viterbi beam search, a 64K-word lexicon, a tree-based trigram LM representation, and MLLR adaptation. The word error rate of the baseline was 20.9% on planned studio speech and 28.8% on the whole test set.
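Beam pruning, as used in the Viterbi searches these systems rely on, keeps only hypotheses whose log score is within a fixed margin (the beam) of the best score at each frame. The sketch below is a generic frame-synchronous Viterbi beam search over a toy discrete HMM, not the recursive search over a tree-based trigram network described in the abstract; the HMM parameters and the names `viterbi_beam` and `_prune` are hypothetical. For clarity it carries whole state paths instead of backpointers.

```python
import math
from typing import Dict, List, Sequence, Tuple

Hyp = Tuple[float, List[int]]  # (log score, state path)


def _prune(hyps: Dict[int, Hyp], beam: float) -> Dict[int, Hyp]:
    """Drop hypotheses scoring more than `beam` below the best one."""
    if not hyps:
        raise ValueError("all hypotheses were pruned")
    best = max(score for score, _ in hyps.values())
    return {s: h for s, h in hyps.items() if h[0] >= best - beam}


def viterbi_beam(
    obs: Sequence[str],
    start: Sequence[float],
    trans: Sequence[Sequence[float]],
    emit: Sequence[Dict[str, float]],
    beam: float,
) -> Hyp:
    """Return (best state path, its log probability) under beam pruning."""
    active: Dict[int, Hyp] = {}
    for s, p in enumerate(start):
        e = emit[s].get(obs[0], 0.0)
        if p > 0 and e > 0:
            active[s] = (math.log(p) + math.log(e), [s])
    active = _prune(active, beam)
    for o in obs[1:]:
        nxt: Dict[int, Hyp] = {}
        for s, (score, path) in active.items():  # expand surviving hypotheses only
            for s2, a in enumerate(trans[s]):
                e = emit[s2].get(o, 0.0)
                if a <= 0 or e <= 0:
                    continue
                cand = score + math.log(a) + math.log(e)
                if s2 not in nxt or cand > nxt[s2][0]:  # Viterbi max over predecessors
                    nxt[s2] = (cand, path + [s2])
        active = _prune(nxt, beam)
    best_state = max(active, key=lambda s: active[s][0])
    return active[best_state][1], active[best_state][0]


START = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}]
```

With observations `["a", "b", "b"]` and `beam=math.log(5)`, the second state is pruned after the first frame (its score is 6.75 times lower), yet the search still returns the exact Viterbi path `[0, 1, 1]` with probability 0.062208. A narrower beam saves work but can discard the eventual winner.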

