Improved Modeling and Efficiency for Automatic Transcription of Broadcast News, 2000
Cited by 5 (0 self)
Over the last few years, the DARPA-sponsored Hub4 continuous speech recognition evaluations have pushed speech recognition technology for the very interesting and difficult task of automatically transcribing broadcast news. In this paper, we report on our research and progress on this problem. We focus on individual techniques we developed, rather than on descriptions of our evaluation systems. We provide comparative experimental results showing the improvements obtained with the novel approaches we developed.
1 Introduction
In recent years there has been increasing interest in developing large-vocabulary continuous speech recognition (LVCSR) systems for speech found in real sources. Broadcast news, in particular, has been the testbed for the DARPA-sponsored Hub4 continuous speech recognition (CSR) evaluations over the last few years, and represents a significant challenge to speech recognition researchers. Many interesting problems are associated with the automatic recognition of b...
Parallel Structure in an Integrated Speech-Recognition Network
- In EuroPar'99, 1999
Cited by 3 (3 self)
Large-vocabulary continuous-speech recognition (LVCR) speaker-independent systems which integrate cross-word context-dependent acoustic models and n-gram language models are difficult to parallelize because of their interwoven structure, large dynamic data structures, and complex object-oriented software design. This paper shows how retrospective decomposition can be achieved if a quantitative analysis is made of dynamic system behaviour. A design which accommodates unforeseen effects and future modifications is presented.
1 Introduction
Two varieties of LVCR system exist: a pipelined structure in which components of acoustic matching and language modelling are separated; and an approach which integrates cross-word context-dependent acoustic models and n-gram language models into the search. The former has been thought to be more computationally tractable [1], while the latter has delivered a low mean error rate, 8.2% per word in ARPA evaluation, for a 65k vocabulary, tri-gra...
Asynchronous Integration Of Visual Information In An Automatic Speech Recognition System
- Proc. International Conference on Spoken Language Processing
Cited by 3 (0 self)
This paper deals with the integration of visual data in automatic speech recognition systems. We first describe the framework of our research: the development of advanced multi-user multi-modal interfaces. We then present audiovisual speech recognition problems in general, and those we are interested in in particular. After a very brief discussion of existing systems, the major part of the paper describes the systems we developed according to two different approaches to the problem of integrating visual data in speech recognition systems.
Large-vocabulary continuous speech recognition algorithm applied to a multi-modal telephone directory assistance system
- Speech Communication, 1995
Cited by 1 (0 self)
This paper describes an accurate and efficient algorithm for very-large-vocabulary continuous speech recognition based on an HMM-LR algorithm. The HMM-LR algorithm uses a generalized LR parser as a language model and hidden Markov models (HMMs) as phoneme models. To reduce the search space without pruning the correct candidate, we use forward and backward trellis likelihoods, an adjusting window for choosing only the probable part of the trellis for each predicted phoneme, and an algorithm for merging candidates that have the same allophonic phoneme sequences and the same context-free grammar states. Candidates are also merged at the meaning level. This algorithm is applied to a telephone directory assistance system that recognizes spontaneous speech containing the names and addresses of more than 70,000 subscribers (vocabulary size is about 80,000). The experimental results show that the system performs well in spite of the large perplexity. This algorithm was also applied to a multi-modal telephone directory assistance system, and the system was evaluated from the human-interface point of view. To cope with the problem of background noise, an HMM composition technique which combines a noise-source HMM and a clean phoneme HMM into a noise-added phoneme HMM was investigated and incorporated into the system.
Joint work with
, 1996
Text and speech processing: hard problems; theory of automata; appropriate level of abstraction
Harmonic Sinusoid Modeling of Tonal Music Events
October, 2007
This thesis presents the theory, implementation and applications of the harmonic sinusoid modeling of pitched audio events. Harmonic sinusoid modeling is a parametric model that expresses an audio signal, or part of an audio signal, as a linear combination of concurrent, slowly varying sinusoids, grouped together under harmonic frequency constraints. It extends sinusoid modeling with additional frequency constraints so that it can directly model tonal sounds. This enables applications such as object-oriented audio manipulations, polyphonic transcription, instrument/singer recognition with background music, etc. The modeling system consists of an analyzer and a synthesizer. The analyzer
Stochastic Segment Modeling for Offline Handwriting Recognition
- 2009 10th International Conference on Document Analysis and Recognition
In this paper, we present a novel approach for incorporating structural information into the hidden Markov modeling (HMM) framework for offline handwriting recognition. Traditionally, structural features have been used in recognition approaches that rely on accurate segmentation of words into smaller units (sub-words or characters). However, such segmentation-based approaches do not perform well on real-world handwritten images, because breaks and merges in glyphs typically create new connected components that are not observed in the training data. To mitigate the problem of having to derive accurate segmentation from connected components, we present a novel framework where the HMM-based recognition system trained on shorter-span features is used to generate the 2-D character images (the “Stochastic Segments”), and then another classifier that uses structural features extracted from the stochastic character segments generates a new set of scores. Finally, the scores from the HMM system and from structural matching are used in combination to generate a hypothesis that is better than the results from either the HMM or from structural matching alone. We demonstrate the efficacy of our approach by reporting experimental results on a large corpus of handwritten Arabic documents.
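The score-combination step in the abstract above can be illustrated with a small sketch. The abstract does not say how the two score sets are combined, so the weighted sum of log-scores used here is an assumption, and the names `Hypothesis`, `combine`, and `weight`, as well as the numbers, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    hmm_score: float         # log-score from the HMM recognizer (assumed)
    structural_score: float  # log-score from the structural classifier (assumed)


def combine(hyps, weight=0.5):
    """Rank hypotheses by a weighted sum of the two log-scores.

    weight=0.0 uses the HMM score alone; weight=1.0 uses the
    structural score alone. This rule is an assumption, not the
    paper's stated method.
    """
    def combined(h):
        return (1.0 - weight) * h.hmm_score + weight * h.structural_score

    return sorted(hyps, key=combined, reverse=True)


# Hypothetical n-best list with made-up scores.
nbest = [
    Hypothesis("A", hmm_score=-10.0, structural_score=-8.0),
    Hypothesis("B", hmm_score=-11.0, structural_score=-3.0),
    Hypothesis("C", hmm_score=-9.5, structural_score=-12.0),
]

best_combined = combine(nbest, weight=0.5)[0]
best_hmm_only = combine(nbest, weight=0.0)[0]
```

With these made-up scores the HMM alone prefers hypothesis C, while the combined ranking prefers B. In practice the weight would presumably be tuned on held-out data.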

