Results 11-20 of 30
Continuous Speech Recognition in the WAXHOLM Dialogue System
, 1996
Abstract

Cited by 9 (0 self)
This paper presents the status of the continuous speech recognition engine of the WAXHOLM project. The engine is a software-only system written in portable C code. The design is flexible, and different modes for phonetic pattern matching are available. In particular, artificial neural networks and standard multiple Gaussian mixtures are implemented for phone probability estimation, and for research purposes a general mode, where the input consists of a phone graph, also exists. A lexicon with multiple pronunciations for many words and a class bigram grammar is used. The lexicon and grammar constraints are represented by a lexical graph, optimised for efficient lexical decoding. The decoding is performed in a two-pass search. The first pass is a Viterbi beam search and the second is an A* stack-decoding search. Pruning strategies and memory management in the two passes are discussed in the report. Several different output formats are available. Results can be reported either on the word or phoneme level, with or without time alignment information. Multiple hypotheses can be output either as standard N-best lists or in a more compact word-graph format. Continuous speech recognition can be performed on a standard UNIX workstation in real time with a lexicon of about 1000 words.
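The first pass mentioned in this abstract, a Viterbi beam search, can be sketched in a few lines. The toy below is an illustration of the technique only, not the WAXHOLM engine (which is written in C and decodes over an optimised lexical graph): the state transition arcs, their log probabilities, and the per-frame observation log probabilities are all assumed inputs.

```python
import math

def viterbi_beam_search(obs_logprobs, transitions, beam_width):
    """Viterbi beam search over a state graph (illustrative sketch).

    obs_logprobs: list of dicts, one per frame: {state: log P(frame | state)}
    transitions:  dict, state -> list of (next_state, log_prob) arcs
    beam_width:   max log-probability distance from the best active
                  hypothesis; anything worse is pruned.
    Returns the best-scoring final state and its accumulated log score.
    """
    # Active hypotheses at the current frame: state -> best log score.
    active = {s: lp for s, lp in obs_logprobs[0].items()}
    for frame in obs_logprobs[1:]:
        successors = {}
        for state, score in active.items():
            for nxt, t_lp in transitions.get(state, []):
                if nxt not in frame:
                    continue
                cand = score + t_lp + frame[nxt]
                # Viterbi recombination: keep only the best path per state.
                if cand > successors.get(nxt, -math.inf):
                    successors[nxt] = cand
        if not successors:
            return None, -math.inf
        best = max(successors.values())
        # Beam pruning: drop hypotheses more than beam_width below the best.
        active = {s: sc for s, sc in successors.items()
                  if sc >= best - beam_width}
    best_state = max(active, key=active.get)
    return best_state, active[best_state]
```

A real decoder would additionally keep backpointers for traceback and manage the active set over a lexical graph rather than a flat state dictionary.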
English-to-Korean Transliteration using Multiple Unbounded Overlapping Phoneme Chunks
 In Proceedings of the 18th International Conference on Computational Linguistics
, 2000
Automatic Continuous Speech Recognition with Rapid Speaker Adaptation for Human/Machine Interaction
, 1997
Abstract

Cited by 8 (0 self)
This thesis presents work in three main directions of the automatic speech recognition field. The work within two of these, dynamic decoding and hybrid HMM/ANN speech recognition, has resulted in a real-time speech recognition system, currently in use in the human/machine dialogue demonstration system WAXHOLM, developed at the department. The third direction is fast unsupervised speaker adaptation, where "fast" refers to adaptation with a small amount of adaptation speech. The work in ...
Context-Dependent Modeling in a Segment-Based Speech Recognition System
 S.M. thesis, MIT
, 1997
Abstract

Cited by 6 (0 self)
in partial fulfillment of the requirements for the degree of ...
On-Line Handwriting Recognition with Constrained N-Best Decoding
 In Proc. 13th ICPR, volume C
, 1996
Abstract

Cited by 5 (3 self)
It is well known that N-best decoding for speech recognition, coupled with post-processing, can provide significant accuracy advantages. We have implemented and experimented with N-best decoding for handwriting recognition, using an N-best decoding algorithm that employs a synchronous forward pass and an asynchronous backward pass. One novel aspect of our algorithm is the use of pruning in the backward pass to constrain the search to candidates whose likelihood score is within a threshold specified using the likelihood score of the best candidate. We show that this algorithm is more efficient than traditional N-best decoding algorithms. A two-stage method is introduced in which the language model changes from a relaxed model during the N-best search to a more constrained model for rescoring in a second pass. This method reduces the computation needed for more detailed pattern matching by preselecting the N most likely candidates. 1. Introduction For most stochastic pattern r...
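The general pattern behind such two-pass N-best decoding, one exact dynamic-programming pass that scores every lattice node, followed by an A* enumeration pass whose candidates are pruned against a threshold below the best complete hypothesis, can be sketched on a toy lattice. This is a hedged illustration of the technique, not the paper's algorithm: the `arcs` format, the function names, and the DAG assumption are all introduced here for illustration.

```python
import heapq

def _topo_order(arcs, start):
    """Topological order of nodes reachable from start (DAG assumed)."""
    seen, order = set(), []
    def dfs(v):
        seen.add(v)
        for w, _ in arcs.get(v, []):
            if w not in seen:
                dfs(w)
        order.append(v)
    dfs(start)
    return order[::-1]

def nbest_paths(arcs, start, goal, n, threshold):
    """N-best A* over a DAG with threshold pruning (illustrative sketch).

    arcs: dict, node -> list of (next_node, log_prob) edges.
    Pass 1 computes h[v], the exact best log score from v to goal.
    Pass 2 enumerates paths best-first, pruning any candidate whose
    total estimate falls more than `threshold` below the best complete
    hypothesis found so far.
    """
    # Pass 1: exact heuristic h[v] = best score v -> goal.
    h = {goal: 0.0}
    for v in reversed(_topo_order(arcs, start)):
        for w, lp in arcs.get(v, []):
            if w in h:
                h[v] = max(h.get(v, float('-inf')), lp + h[w])
    # Pass 2: A* enumeration (max-heap via negated scores).
    results, best_complete = [], float('-inf')
    heap = [(-h[start], 0.0, start, (start,))]
    while heap and len(results) < n:
        neg_f, g, v, path = heapq.heappop(heap)
        f = -neg_f
        if f < best_complete - threshold:   # pruning constraint
            continue
        if v == goal:
            best_complete = max(best_complete, f)
            results.append((g, path))
            continue
        for w, lp in arcs.get(v, []):
            if w in h:   # expand only nodes that can reach the goal
                heapq.heappush(heap, (-(g + lp + h[w]), g + lp, w, path + (w,)))
    return results
```

Because the heuristic is exact, hypotheses leave the heap in score order, so the first n goal extractions are exactly the n best paths.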
Combined Optimisation Of Baseforms And Subword Models For An HMM-Based Speech Recogniser
 in Proc. The 4th Int. Symposium on Signal Processing and its Applications (ISSPA), (Gold
, 1996
Abstract

Cited by 4 (4 self)
In this paper a framework for combined optimisation of baseforms and subword models for a speech recogniser is proposed. Given a set of subword Hidden Markov Models (HMMs) and a set of utterances of a specific word, the modified tree-trellis algorithm and the Baum-Welch re-estimation procedure are used iteratively to achieve a combined optimisation of baseforms and subword models. The DARPA Resource Management (RM) database was used to evaluate the combined optimisation scheme. The proposed method resulted in a monotonic increase in the likelihood score of both test and training data. When compared to the initial lexicon derived from the DARPA RM distribution and a set of initial HMMs, a 13% reduction in word error rate is achieved at best. 1. INTRODUCTION Modern large vocabulary speech recognisers employ subwords as the basic modelling units. This implies that in order to recognise words (or sentences), a lexicon which defines the composition of the vocabulary words in terms of the b...
Finding the k Shortest Paths in Parallel
, 2000
Abstract

Cited by 4 (0 self)
A concurrent-read exclusive-write PRAM algorithm is developed to find the k shortest paths between pairs of vertices in an edge-weighted directed graph. Repetitions of vertices along the paths are allowed. The algorithm computes an implicit representation of the k shortest paths to a given destination vertex from every vertex of a graph with n vertices and m edges, using O(m + nk log^2 k) work and O(log^3 k log* k + log n (log log k + log* n)) time, assuming that a shortest path tree rooted at the destination is precomputed. The paths themselves can be extracted from the implicit representation in O(log k + log n) time and O(n log n + L) work, where L is the total length of the output. Key Words. Parallel graph algorithms, Data structures, Shortest paths. 1. Introduction. The problem of finding shortest paths in an edge-weighted graph is an important and well-studied problem in computer science. The more general problem of computing the k shortest paths between vertices of...
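Because vertex repetitions are allowed, the sequential version of this problem has a particularly simple solution: run a Dijkstra-style search but allow each vertex to be extracted from the priority queue up to k times, so the k best walks into every vertex are eventually found. The sketch below is a sequential illustration of that idea, not the paper's PRAM algorithm; the `graph` format and function name are assumed here.

```python
import heapq

def k_shortest_path_lengths(graph, src, dst, k):
    """Lengths of the k shortest src->dst walks (vertex repeats allowed).

    graph: dict mapping vertex -> list of (neighbor, weight) pairs
           with non-negative weights.
    Each vertex may be popped from the heap at most k times; the i-th
    pop of a vertex corresponds to the i-th shortest walk reaching it.
    """
    counts = {v: 0 for v in graph}   # pops so far, per vertex
    results = []
    heap = [(0, src)]
    while heap and len(results) < k:
        d, v = heapq.heappop(heap)
        if counts[v] >= k:
            continue
        counts[v] += 1
        if v == dst:
            results.append(d)
        for w, wt in graph.get(v, []):
            if counts[w] < k:
                heapq.heappush(heap, (d + wt, w))
    return results
```

Each vertex is expanded O(k) times, giving O(km log(km)) sequential time; the cited paper's contribution is doing comparable work in polylogarithmic parallel time on a PRAM.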
N-Best Breadth Search For Large Vocabulary Continuous Speech Recognition Using A Long-Span Language Model
, 1998
Abstract

Cited by 3 (3 self)
In large vocabulary continuous speech recognition, high-level linguistic knowledge can enhance performance. However, the integration of high-level linguistic knowledge and complex acoustic models under an efficient search scheme is still an open question. In this paper, we propose the n-best breadth search algorithm under the framework of a state-space search. The n-best breadth search is a combination of the best-first search and the breadth-first search, and it efficiently accommodates long-span language models and complex acoustic models. Our pilot experiment shows that the proposed algorithm decreases execution time with little effect on performance. (136th Meeting of the Acoustical Society of America) 1 INTRODUCTION In the statistical approach, speech recognition ...
A Comparison of Lexicon-Building Methods for Subword-Based Speech Recognisers
 in Proc. IEEE Region 10 Conf. on Digital Signal Proc. (TENCON
, 1996
Abstract

Cited by 3 (3 self)
A comparison of different algorithms for training of pronunciation dictionaries for use with subword-based speech recognisers is given. An extension to existing suboptimal solutions is presented, and is shown to give results close to the maximum likelihood solution. The DARPA Resource Management (RM) database was used for evaluating the lexicon-building algorithms. When compared to the initial lexicon derived from the DARPA RM distribution, improvements in recognition rates have been obtained for all lexicons trained with the different criteria. The maximum likelihood solution resulted in an 11.5% reduction in word error rate, compared to the 10.5% reduction offered by the proposed suboptimal method. 1. INTRODUCTION Modern large vocabulary speech recognisers employ subwords as the basic modelling units. This implies that in order to recognise words (or sentences), a lexicon which defines the composition of the vocabulary words in terms of the basic units must be available to the r...
Real-time word confidence scoring using local posterior probabilities on tree-trellis search
 In Proc. ICASSP, volume 1
, 2004
Abstract

Cited by 3 (2 self)
Confidence scoring based on word posterior probability is usually performed as a post-process of speech recognition decoding, and also needs a large number of word hypotheses to obtain sufficient confidence quality. We propose a simple way of computing word confidence using estimated posterior probabilities while decoding. At the word expansion step of the stack decoding search, the local sentence likelihoods, which contain heuristic scores for the unreached segment, are directly used to compute the posterior probabilities. Experimental results showed that, although the likelihoods are not optimal, they can provide slightly better confidence measures compared with N-best lists, while the computation is faster than the 100-best method because no N-best decoding is required.
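The N-best baseline this abstract compares against, word posteriors estimated from an N-best list, can be sketched as follows. This is a deliberately simplified illustration: it treats a word's posterior as the normalised probability mass of the hypotheses containing it and ignores time alignment, whereas real systems sum over time-overlapping word occurrences. The hypothesis format is assumed.

```python
import math
from collections import defaultdict

def _logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == float('-inf'):
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))

def word_posteriors(nbest):
    """Word confidence scores from an N-best list (illustrative sketch).

    nbest: list of (log_likelihood, [words]) hypotheses.
    Returns {word: posterior}, where a word's posterior is the
    probability mass of the hypotheses containing it, normalised
    over the whole list.
    """
    total = _logsumexp([ll for ll, _ in nbest])
    mass = defaultdict(lambda: float('-inf'))
    for ll, words in nbest:
        for w in set(words):          # count each word once per hypothesis
            mass[w] = _logsumexp([mass[w], ll])
    return {w: math.exp(m - total) for w, m in mass.items()}
```

The paper's point is that comparable confidence estimates can be obtained during decoding from the stack decoder's local likelihoods, avoiding the cost of producing this N-best list at all.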