Dynamic Programming Search for Continuous Speech Recognition
, 1999
Cited by 30 (0 self)
Initially introduced in the late 1960s and early 1970s, dynamic programming algorithms have become increasingly popular in automatic speech recognition. There are two reasons why this has occurred: First, the dynamic programming strategy can be combined with a very efficient and practical pruning strategy so that very large search spaces can be handled. Second, the dynamic programming strategy has turned out to be extremely flexible in adapting to new requirements. Examples of such requirements are the lexical tree organization of the pronunciation lexicon and the generation of a word graph instead of the single best sentence. In this paper, we attempt to systematically review the use of dynamic programming search strategies for small-vocabulary and large-vocabulary continuous speech recognition. The following methods are described in detail: search using a linear lexicon, search using a lexical tree, language-model look-ahead, and word graph generation.
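As a rough illustration of how dynamic programming search combines with pruning, here is a minimal sketch of time-synchronous Viterbi search with beam pruning over a toy two-state model. It is not the paper's algorithm: the function name `beam_viterbi`, the `beam` parameter, the helper `to_log`, and the toy tables `LOG_INIT`, `LOG_TRANS`, and `LOG_EMIT` are all assumptions made for this example.

```python
import math


def to_log(table):
    """Convert a dict of probabilities into a dict of log probabilities."""
    return {key: math.log(value) for key, value in table.items()}


def beam_viterbi(log_init, log_trans, log_emit, observations, beam=10.0):
    """Time-synchronous Viterbi search with beam pruning (illustrative only).

    log_init[s]     : log probability of starting in state s
    log_trans[s][t] : log probability of moving from state s to state t
    log_emit[s][o]  : log probability that state s emits symbol o
    beam            : hypotheses scoring more than `beam` below the current
                      best are dropped before they are expanded
    Returns (best_state_sequence, best_log_score).
    """
    # Active hypotheses: one (score, path) pair per reachable state.
    active = {
        s: (lp + log_emit[s][observations[0]], [s])
        for s, lp in log_init.items()
    }
    for obs in observations[1:]:
        # Pruning step: keep only hypotheses within `beam` of the best one.
        best = max(score for score, _ in active.values())
        survivors = {s: h for s, h in active.items() if h[0] >= best - beam}
        # Dynamic programming step: keep the best predecessor per next state.
        expanded = {}
        for s, (score, path) in survivors.items():
            for t, log_p in log_trans[s].items():
                candidate = score + log_p + log_emit[t][obs]
                if t not in expanded or candidate > expanded[t][0]:
                    expanded[t] = (candidate, path + [t])
        active = expanded
    final_state = max(active, key=lambda s: active[s][0])
    score, path = active[final_state]
    return path, score


# Hypothetical toy model with two states and two output symbols.
LOG_INIT = to_log({"A": 0.6, "B": 0.4})
LOG_TRANS = {"A": to_log({"A": 0.7, "B": 0.3}),
             "B": to_log({"A": 0.4, "B": 0.6})}
LOG_EMIT = {"A": to_log({"x": 0.9, "y": 0.1}),
            "B": to_log({"x": 0.2, "y": 0.8})}

if __name__ == "__main__":
    print(beam_viterbi(LOG_INIT, LOG_TRANS, LOG_EMIT, ["x", "y", "y"], beam=1.0))
```

With an infinite beam the search is exact. A narrow beam expands fewer hypotheses per frame but can, in general, discard the path that would have turned out best, which is the usual trade-off of this kind of pruning.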
Lattice Parsing for Speech Recognition
- In Proceedings of 6me
, 1999
Cited by 13 (3 self)
A lot of work remains to be done on better integrating speech recognition and language processing systems. This paper gives an overview of several strategies for integrating linguistic models into speech understanding systems and investigates several ways of producing sets of hypotheses that include more "semantic" variability than usual language models. The main goal is to demonstrate by actual experiments that sequential coupling may be efficiently achieved by word-lattice syntactic analyzers, which efficiently parse the huge number of hypotheses (i.e. possible sentences) contained in the lattice produced by the speech recognizer. 1. Motivations The past decade has seen significant progress in speech recognition technology: word (recognition) error rates continue to drop by a factor of 2 every two years (Rabiner et al., 1996) and high performance systems are now becoming available. Several factors have contributed to this rapid progress: Generalisati...
Progress in Dynamic Programming Search for LVCSR
- Proceedings of the IEEE
, 1997
Cited by 10 (2 self)
This paper gives an overview of the recent improvements in dynamic programming search for large vocabulary continuous speech recognition: search using lexical trees, time-conditioned search and word graph construction.
The "Orthogonal Algorithm" For Optical Flow Detection Using Dynamic Programming
Cited by 5 (2 self)
This paper introduces a new and original algorithm for optical flow detection. It is based on an iterative search for a displacement field that minimizes the L1 or L2 distance between two images. Both images are sliced into parallel and overlapping strips. Corresponding strips are aligned using dynamic programming exactly as 2D representations of speech signal are with the DTW algorithm. Two passes are performed using orthogonal slicing directions. This process is iterated in a pyramidal fashion by reducing the spacing and width of the strips. This algorithm provides a very high quality matching for calibrated patterns as well as for human visual sensation. The results appear to be at least as good as those obtained with classical optical flow detection methods. 1. INTRODUCTION Optical flow detection is an essential and generic procedure that needs to be implemented in computer vision systems. It is necessary in a wide range of applications such as: image matching for stereo...
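For readers unfamiliar with DTW, the following is a minimal sketch of the classic dynamic time warping recurrence on two one-dimensional sequences, using an absolute-difference (L1) local cost. It only illustrates the alignment step the abstract refers to; it is not the Orthogonal Algorithm itself, and the function name `dtw_align` and the tie-breaking rule in the backtrace are assumptions made for this example.

```python
def dtw_align(a, b):
    """Align two non-empty numeric sequences with dynamic time warping.

    Returns (total_cost, path), where path is a list of index pairs (i, j)
    meaning that a[i] is matched with b[j].
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # L1 distance between samples
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # match both samples
                cost[i - 1][j],      # stretch b: a advances alone
                cost[i][j - 1],      # stretch a: b advances alone
            )
    # Backtrace from the end, preferring the diagonal move on ties.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        moves = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(moves, key=lambda move: move[0])
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    total, path = dtw_align([1, 2, 3], [1, 2, 2, 3])
    print(total, path)
```

The table has (n + 1) × (m + 1) cells, so time and memory both grow with the product of the two sequence lengths.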
Image Matching using Dynamic Programming: Application to Stereovision and Image Interpolation
- Image Communication
, 1996
Cited by 5 (3 self)
This paper presents an original algorithm called the "Orthogonal Algorithm" for image matching using dynamic programming and experimental results from its application to stereovision and image interpolation. The algorithm provides a dense, continuous and differentiable field of bidimensional displacements like classical optical flow detection algorithms. It is based on an iterative search for a displacement field that minimizes the L1 or L2 distance between two images. Both images are sliced into parallel and overlapping strips. Corresponding strips are aligned using dynamic programming exactly as 2D representations of speech signal are with the DTW algorithm. Two passes are performed using orthogonal slicing directions. This process is iterated in a pyramidal fashion while reducing the spacing and width of the strips. Very good results have been obtained for stereovision and image interpolation.
A P2P GRID ARCHITECTURE FOR DISTRIBUTED ARABIC OCR BASED ON THE DTW ALGORITHM
Cited by 1 (1 self)
Arabic cursive optical character recognition (OCR) based on the dynamic time warping (DTW) algorithm simultaneously provides very interesting segmentation and recognition rates. However, the computational complexity of the DTW algorithm restricts its widespread use and its consideration at a commercial scale. Reducing the DTW execution time has attracted many researchers, and several solutions have already been proposed. These solutions are commonly based on very specialized processors and hardware architectures, and as such they remain very expensive and not amenable to large-scale use. In a previous work, we found that loosely coupled architectures can indeed provide viable infrastructures to implement a distributed Arabic OCR. Our objective here is to allow the recognition of huge quantities of Arabic documents such as those of certain national libraries, which requires substantial processing power and storage capacity. In this paper, we propose and use a peer-to-peer (P2P) architecture using the scientific research Tunisian grid (SRTG). Our experiments show that the proposed architecture provides very adequate speedups of the DTW-based
CarpeDiem: Optimizing the Viterbi Algorithm and Applications to Supervised Sequential Learning
Cited by 1 (0 self)
The growth of information available to learning systems and the increasing complexity of learning tasks create the need for algorithms that scale well with respect to all learning parameters. In the context of supervised sequential learning, the Viterbi algorithm plays a fundamental role, by allowing the evaluation of the best (most probable) sequence of labels with a time complexity linear in the number of time events, and quadratic in the number of labels. In this paper we propose CarpeDiem, a novel algorithm allowing the evaluation of the best possible sequence of labels with a sub-quadratic time complexity. We provide theoretical grounding together with solid empirical results supporting two chief facts. CarpeDiem always finds the optimal solution requiring, in most cases, only a small fraction of the time taken by the Viterbi algorithm; meanwhile, CarpeDiem is never asymptotically worse than the Viterbi algorithm, thus confirming it as a sound replacement.
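To make the baseline complexity concrete, here is a minimal sketch of the standard Viterbi algorithm for sequence labelling, written with additive scores. This is the baseline the abstract compares against, not CarpeDiem. The function name `viterbi` and the `node_scores` / `edge_scores` layout are assumptions made for this example.

```python
def viterbi(node_scores, edge_scores):
    """Find the highest-scoring label sequence (illustrative baseline).

    node_scores[t][y]      : score of assigning label y at time t
    edge_scores[y_prev][y] : score of the transition y_prev -> y
    Returns (best_labels, best_score).
    """
    num_steps = len(node_scores)
    num_labels = len(node_scores[0])
    best = [list(node_scores[0])]  # best[t][y]: best score of a path ending in y
    back = []                      # back[t - 1][y]: predecessor of y at time t
    for t in range(1, num_steps):          # linear in the number of time events
        row, pointers = [], []
        for y in range(num_labels):        # for every label ...
            # ... scan every previous label: the quadratic part.
            prev = max(range(num_labels),
                       key=lambda p: best[t - 1][p] + edge_scores[p][y])
            row.append(best[t - 1][prev] + edge_scores[prev][y] + node_scores[t][y])
            pointers.append(prev)
        best.append(row)
        back.append(pointers)
    # Recover the best path by following back-pointers from the best final label.
    label = max(range(num_labels), key=lambda y: best[-1][y])
    best_score = best[-1][label]
    labels = [label]
    for pointers in reversed(back):
        label = pointers[label]
        labels.append(label)
    labels.reverse()
    return labels, best_score


if __name__ == "__main__":
    print(viterbi([[2, 0], [0, 3], [3, 0]], [[0, -1], [-1, 0]]))
```

The two nested loops over labels inside the loop over time steps give the linear-in-time, quadratic-in-labels cost that the abstract attributes to the Viterbi algorithm.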
Digital Video Library Network: System and Approach
Tremendous growth of the Internet population creates a large demand for new applications, and advances in Internet technologies make it feasible to develop exciting new applications based on video and broadband networks. One of the hottest topics nowadays is the Digital Video Library System. It has promising applications in entertainment, information, education and business. However, due to the temporal nature of video, indexing and retrieval of video content is not trivial. Therefore, this paper introduces the digital video library network together with a number of techniques for indexing and retrieving video content. Contents: Introduction; System Architecture; Video Server
Resume
We present an integrated processor specialized for performing the dynamic programming computations of speech recognition systems. The processor delivers 10 effective MIPS at 20 MHz, and its parallel, pipelined architecture allows it to perform on average more than 30 million operations per second. A system using a single dynamic comparison processor can recognize 5000 single-speaker isolated words, 500 multi-speaker isolated words, or 600 single-speaker connected words. Several of these processors can be used in parallel in the same system. The originality of this processor is that it simultaneously offers very high per-unit performance, great flexibility of use, and the ability to operate efficiently in a multiprocessor configuration. The circuit was fabricated in a 2 µm CMOS technology; it integrates 127309 transistors in 60 mm² and is available in an 84-pin PGA or PLCC package. It includes an internal program memory of 640 instructions of 24 bits and works with an external data memory of 256 Kwords of 16 bits. The development comprises a fully custom data path and clock generator, a compiled memory, and a standard-cell implementation of the control logic and the input/output pads.
S15b.10 HIDDEN MARKOV MODEL DECOMPOSITION OF SPEECH AND NOISE
This paper addresses the problem of automatic speech recognition in the presence of interfering signals and noise with statistical characteristics ranging from stationary to fast changing and impulsive. A technique of signal decomposition using hidden Markov models, [1], is described. This is a generalisation of conventional hidden Markov modelling that provides an optimal method of decomposing simultaneous processes. The technique exploits the ability of hidden Markov models to model dynamically varying signals in order to accommodate concurrent processes, including interfering signals as complex as speech. This form of signal decomposition has wide implications for signal separation in general and improved speech modelling in particular. However, this paper concentrates on the application of decomposition to the problem of recognition of speech contaminated with noise.

