On-Line Cursive Script Recognition using Time Delay Neural Networks and Hidden Markov Models
Cited by 48 (2 self)
Abstract
We present a writer-independent system for on-line handwriting recognition that can handle a variety of writing styles, including cursive script and hand-print. The input to our system contains the pen trajectory information, encoded as a time-ordered sequence of feature vectors. A Time Delay Neural Network is used to estimate a posteriori probabilities for characters in a word. A Hidden Markov Model segments the word in a way that optimizes the global word score, taking a dictionary into account. A geometrical normalization scheme and a fast and efficient dictionary search are also presented. Trained on 20k words from 59 writers and using a 25k-word dictionary, the system reached an 89% character and 80% word recognition rate on test data from a disjoint set of writers.
Keywords: Handwriting Recognition, Neural Networks, Cursive Script, Hidden Markov Models, Dictionary Search.
1 Introduction
Pen interfaces should advantageously replace both mouse and keyboard in a variety of situations. Users w...
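To make the word-level scoring concrete, here is a minimal Python sketch of how per-frame character posteriors (such as a TDNN might output) can be combined by Viterbi alignment into a word score and then searched over a dictionary. It is not the authors' system: the one-state-per-character left-to-right model, the absence of transition probabilities and prior division, the brute-force dictionary loop, and the names `word_log_score` and `recognize` are all illustrative assumptions.

```python
import math
from typing import Dict, Iterable, Sequence

# Per-frame log posteriors: frame t maps a character to log P(char | frame t).
LogPosteriors = Sequence[Dict[str, float]]


def word_log_score(word: str, log_post: LogPosteriors) -> float:
    """Best Viterbi log score for aligning `word` to all frames.

    Illustrative model (an assumption, not the paper's HMM): one left-to-right
    state per character, each character covers at least one frame, and the
    only transitions are "stay" and "advance", with no transition penalties.
    """
    n_chars, n_frames = len(word), len(log_post)
    if n_chars == 0 or n_frames < n_chars:
        return -math.inf
    # delta[j]: best score of a path that is in character j at the current frame.
    delta = [-math.inf] * n_chars
    delta[0] = log_post[0].get(word[0], -math.inf)
    for t in range(1, n_frames):
        new_delta = [-math.inf] * n_chars
        for j in range(n_chars):
            best_prev = delta[j]  # stay in character j
            if j > 0:
                best_prev = max(best_prev, delta[j - 1])  # advance from j - 1
            if best_prev > -math.inf:
                new_delta[j] = best_prev + log_post[t].get(word[j], -math.inf)
        delta = new_delta
    # The path must end in the last character at the last frame.
    return delta[-1]


def recognize(log_post: LogPosteriors, dictionary: Iterable[str]) -> str:
    """Brute-force dictionary search: return the word with the best score."""
    return max(dictionary, key=lambda w: word_log_score(w, log_post))
```

With three frames whose most probable characters are c, a and t in turn, `recognize` prefers "cat" over "act" and "at". A practical system would replace the linear scan over the dictionary with a faster search (for example a lexical tree), which this sketch does not attempt.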
Decoder Technology For Connectionist Large Vocabulary Speech Recognition
1995
Cited by 23 (3 self)
Abstract
The search problem in large vocabulary continuous speech recognition (LVCSR) is to locate the most probable string of words for a spoken utterance given the acoustic signal and a set of sentence models. Searching the space of possible utterances is difficult because of the large vocabulary size and the complexity imposed when long-span language models are used. This report describes an efficient search procedure and its software embodiment in a decoder, NOWAY, which has been incorporated in ABBOT, a hybrid connectionist/hidden Markov model (HMM) LVCSR system [15]. The search algorithm is based on stack decoding and uses both likelihood- and posterior-based pruning. Posterior-based phone deactivation pruning is well suited to hybrid connectionist/HMM systems because the connectionist acoustic model computes posterior phone probabilities directly. The single-pass decoder has been evaluated on the large vocabulary North American Business News task using a...
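Because the connectionist model already supplies a posterior for every phone at every frame, the core of posterior-based phone deactivation pruning fits in a few lines of Python. This sketch assumes a fixed per-frame threshold on the posterior; the threshold value, the function names `deactivate_phones` and `pruned_fraction`, and the toy phone set are illustrative, and NOWAY's actual pruning rules may differ.

```python
from typing import List, Sequence, Set


def deactivate_phones(
    posteriors: Sequence[Sequence[float]],
    phones: Sequence[str],
    threshold: float,
) -> List[Set[str]]:
    """Return, for each frame, the set of phones that remain active.

    Assumed rule: a phone is deactivated at a frame when its posterior falls
    below `threshold`; a decoder would then skip extending hypotheses through
    that phone's HMM states at that frame.
    """
    active = []
    for frame in posteriors:
        active.append({p for p, q in zip(phones, frame) if q >= threshold})
    return active


def pruned_fraction(active: Sequence[Set[str]], n_phones: int) -> float:
    """Fraction of (frame, phone) pairs the decoder no longer evaluates."""
    total = len(active) * n_phones
    kept = sum(len(a) for a in active)
    return 1.0 - kept / total
```

For example, with phones `["aa", "iy", "s", "sil"]`, three frames of posteriors and a threshold of 0.15, only four of the twelve (frame, phone) pairs may survive, so two thirds of the phone-level work is skipped. The threshold trades search errors against speed.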
Start-synchronous search for large vocabulary continuous speech recognition
IEEE Trans. Speech and Audio Processing
Cited by 17 (9 self)
Abstract
In this paper, we present a novel, efficient search strategy for large vocabulary continuous speech recognition. The search algorithm, based on a stack decoder framework, uses phone-level posterior probability estimates (produced by a connectionist/hidden Markov model acoustic model) as the basis for phone deactivation pruning, a highly efficient method of reducing the required computation. The single-pass algorithm is naturally factored into the time-asynchronous processing of the word sequence and the time-synchronous processing of the hidden Markov model state sequence. This factoring decouples the search from the language model while still retaining the computational benefits of time-synchronous processing. The incorporation of the language model in the search is discussed, and computationally cheap approximations to the full language model are introduced. Experiments were performed on the North American Business News task using a 60 000-word vocabulary and a trigram language model. Results indicate that the techniques discussed in the paper can reduce the computational cost of the search by more than a factor of 40 with a relative search error of less than 2%.
Index Terms: Hidden Markov model, large vocabulary continuous speech recognition, phone deactivation pruning, search, stack decoding.
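The idea of grouping word hypotheses by the frame at which they end, so that all hypotheses sharing a reference time are extended together, can be illustrated with a toy Python decoder. This is a simplified sketch under stated assumptions, not the paper's algorithm: word candidates and their acoustic log-likelihoods come from a hypothetical `word_segments` callback (standing in for the time-synchronous HMM pass), the language model is a bigram rather than a trigram, hypotheses are recombined on their last word, and each stack is truncated to a hypothetical `stack_size`.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# A hypothesis is (log score, word sequence).
Hyp = Tuple[float, Tuple[str, ...]]
# Hypothetical callback: start frame -> [(word, end_frame, acoustic log-likelihood)].
# Assumed: start < end_frame <= n_frames for every returned segment.
SegmentFn = Callable[[int], List[Tuple[str, int, float]]]


def start_sync_decode(
    n_frames: int,
    word_segments: SegmentFn,
    bigram_logp: Callable[[str, str], float],
    stack_size: int = 10,
) -> Hyp:
    # stacks[t] maps last word -> best hypothesis whose last word ends at frame t.
    stacks: Dict[int, Dict[str, Hyp]] = defaultdict(dict)
    stacks[0]["<s>"] = (0.0, ())
    # Process stacks in order of reference time; every extension ends later,
    # so stack t is complete before it is popped.
    for t in range(n_frames):
        if t not in stacks:
            continue
        ranked = sorted(stacks.pop(t).items(), key=lambda kv: kv[1][0], reverse=True)
        for last_word, (score, words) in ranked[:stack_size]:
            for word, end, ac_logp in word_segments(t):
                new_score = score + ac_logp + bigram_logp(last_word, word)
                bucket = stacks[end]
                # Recombination: under a bigram LM only the last word matters.
                if word not in bucket or new_score > bucket[word][0]:
                    bucket[word] = (new_score, words + (word,))
    final = stacks.get(n_frames, {})
    if not final:
        return (-math.inf, ())
    return max(final.values(), key=lambda h: h[0])
```

Because every hypothesis in a stack ends at the same frame, a fuller decoder could share the acoustic evaluation starting at that frame across all of them; in this toy version that shared work is hidden inside the `word_segments` callback, and no sentence-end language model term is applied.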
In memory of my brother,
1955
Abstract
This thesis addresses the application of automatic speech recognition to the task of offline closed-captioning of television programs, and describes the collection of corpora to support such research and an exploration of the issues to be addressed. The use of automatic speech recognition (ASR) for transcription of broadcast speech and as an aid to captioning is reviewed. As background to the task, the methodology for large vocabulary continuous speech recognition (LVCSR) is presented, with particular attention given to large vocabulary language modelling and to the acoustic complexity of broadcast material. A speech corpus of segmented and transcribed speech utterances from ten program episodes was developed for a typical genre of television programming (travelogues) to which offline closed captions are applied. The development of this corpus demonstrates the feasibility of using existing closed-caption sources to generate labelled acoustic data suitable for speech recognition research. The speech corpus exhibits far greater acoustic complexity and much lower signal-to-noise ratios than broadcast news data (which has been systematically evaluated in ASR research). Noise-tolerant speech recognisers were developed and effectively

