Results 1 - 10 of 13
A Maximum-Likelihood Approach to Stochastic Matching for Robust Speech Recognition
- IEEE Transactions on Speech and Audio Processing, 1996
Cited by 86 (14 self)
Ananth Sankar and Chin-Hui Lee, Speech Research Department, AT&T Bell Laboratories, Murray Hill, NJ 07974. Introduction: Recently there has been much interest in the problem of improving the performance of automatic speech recognition (ASR) systems in adverse environments. When there is a mismatch between the training and testing environments, ASR systems suffer a degradation in performance. The goal of robust speech recognition is to remove the effect of this mismatch so as to bring the recognition performance as close as possible to the matched conditions. In speech recognition, the speech is usually modeled by a set of hidden Markov models (HMMs) X. During recognition the observed utterance Y is decoded using these models. Due to the mismatch between training and testing conditions, this often results in a degradation in performance compared to the matched conditions. The mismatch b...
A lexicon driven approach to handwritten word recognition for real-time applications
- IEEE Transactions on PAMI, 1997
Cited by 82 (28 self)
Abstract—A fast method of handwritten word recognition suitable for real-time applications is presented in this paper. Preprocessing, segmentation and feature extraction are implemented using a chain code representation of the word contour. Dynamic matching between the characters of a lexicon entry and the segment(s) of the input word image is used to rank the lexicon entries in order of best match. A variable duration is defined for each character and used during the matching. Experimental results show that our approach using the variable duration outperforms the method using fixed duration in terms of both accuracy and speed. The entire recognition process takes about 200 msec on a single SPARC-10 platform, and a recognition accuracy of 96.8 percent is achieved for a lexicon size of 10, on a database of postal words captured at 212 dpi. Index Terms—Handwritten word recognition, segmentation algorithm, variable duration, chain code representation, dynamic
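The dynamic matching with variable character duration described in this abstract can be illustrated roughly as follows. This is only a sketch under stated assumptions, not the authors' implementation: the names `match_word` and `rank_lexicon`, the `score` callback (cost of reading a run of segments as one character) and the `duration` callback (minimum and maximum number of segments a character may span) are all hypothetical.

```python
def match_word(word, n_segments, score, duration):
    """Lowest total cost of matching every character of `word` to the
    segments 0..n_segments-1, each character spanning a run of segments.

    score(ch, start, end): hypothetical cost of reading segments
        start..end-1 as character ch (lower is better).
    duration(ch): hypothetical (min, max) number of segments for ch.
    """
    inf = float("inf")
    # best[k][e] = lowest cost of matching the first k characters
    # to the first e segments
    best = [[inf] * (n_segments + 1) for _ in range(len(word) + 1)]
    best[0][0] = 0.0
    for k, ch in enumerate(word, start=1):
        lo, hi = duration(ch)
        for end in range(1, n_segments + 1):
            # try every allowed number of segments d for this character
            for d in range(lo, min(hi, end) + 1):
                prev = best[k - 1][end - d]
                if prev < inf:
                    cost = prev + score(ch, end - d, end)
                    best[k][end] = min(best[k][end], cost)
    return best[len(word)][n_segments]


def rank_lexicon(lexicon, n_segments, score, duration):
    """Order lexicon entries from best (lowest cost) to worst match."""
    return sorted(lexicon,
                  key=lambda w: match_word(w, n_segments, score, duration))
```

In this sketch a fixed-duration variant would correspond to `duration` returning the same range for every character; letting the range depend on the character is what narrows the inner loop.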
Towards unsupervised pattern discovery in speech
- Peter Hagedorn, Wolfgang Konrad and J. Wallaschek, The Journal of Sound and Vibration, 2005
Cited by 27 (6 self)
Abstract—We present a novel approach to speech processing based on the principle of pattern discovery. Our work represents a departure from traditional models of speech recognition, where the end goal is to classify speech into categories defined by a prespecified inventory of lexical units (i.e., phones or words). Instead, we attempt to discover such an inventory in an unsupervised manner by exploiting the structure of repeating patterns within the speech signal. We show how pattern discovery can be used to automatically acquire lexical entities directly from an untranscribed audio stream. Our approach to unsupervised word acquisition utilizes a segmental variant of a widely used dynamic programming technique, which allows us to find matching acoustic patterns between spoken utterances. By aggregating information about these matching patterns across audio streams, we demonstrate how to group similar acoustic sequences together to form clusters corresponding to lexical entities such as words and short multiword phrases. On a corpus of academic lecture material, we demonstrate that clusters found using this technique exhibit high purity and that many of the corresponding lexical identities are relevant to the underlying audio stream. Index Terms—Speech processing, unsupervised pattern discovery, word acquisition.
Online Recognition of Chinese Characters: The State-of-the-Art
- IEEE Trans. Pattern Anal. Mach. Intell., 2004
Cited by 17 (1 self)
Online handwriting recognition is gaining renewed interest owing to the increase of pen computing applications and new pen input devices. The recognition of Chinese characters is different from western handwriting recognition and poses a special challenge. To provide an overview of the technical status and inspire future research, this paper reviews the advances in online Chinese character recognition (OLCCR), with emphasis on the research works from the 1990s. Compared to the research in the 1980s, the research efforts in the 1990s aimed to further relax the constraints of handwriting, namely, the adherence to standard stroke orders and stroke numbers and the restriction of recognition to isolated characters only. The target of recognition has shifted from regular script to fluent script in order to better meet the requirements of practical applications. The research works are reviewed in terms of pattern representation, character classification, learning/adaptation, and contextual processing. We compare important results and discuss possible directions of future research.
A search engine for handwritten documents
- Proceedings of SPIE-IS&T Electronic Imaging, 2005
Cited by 10 (1 self)
The design and functionality of a versatile search engine for handwritten documents are described. Documents are indexed using global image features, e.g., stroke width, slant, word gaps, as well as local features that describe shapes of characters and words. Image indexing is done automatically using page analysis, page segmentation, line separation, word segmentation and recognition of characters and words. Several types of searches are
"Blind" Speech Segmentation: Automatic Segmentation of Speech without Linguistic Knowledge
Cited by 9 (1 self)
A new automatic speech segmentation procedure, called "Blind" speech segmentation, is presented. This procedure allows a speech sample to be segmented into sub-word units without knowledge of any linguistic information (such as an orthographic or phonetic transcription). Hence, this procedure involves finding the optimal number of sub-word segments in the given speech sample, before locating the 1.
Considering multiple options when interpreting spoken utterances
- In: Proceedings of the Fifth IJCAI Workshop on Knowledge and Reasoning in Practical Dialogue Systems, 2007
Cited by 2 (2 self)
We describe Scusi?, a spoken language interpretation mechanism designed to be part of a robot-mounted dialogue system. Scusi?’s interpretation process maps spoken utterances to text, which in turn is parsed and then converted to conceptual graphs. In order to support robust and flexible performance of the dialogue module, Scusi? maintains multiple options at each stage of the interpretation process, and uses maximum posterior probability to rank the (partial) interpretations produced at each stage. The time and space requirements of maintaining multiple options are handled by means of an anytime search algorithm. Our evaluation focuses on the impact of the speech recognizer and the search algorithm on Scusi?’s performance.
A RECOGNITION METHOD OF CONNECTED SPOKEN WORDS WITH SYNTACTICAL CONSTRAINTS BY AUGMENTED CONTINUOUS DP ALGORITHM
Dynamic time warping based on dynamic programming is a powerful technique for isolated word recognition. An augmented continuous dynamic programming algorithm is proposed for connected spoken word recognition with syntactical constraints. The algorithm is based on the same principle as two-level DP and level-building DP. Although our algorithm obtains only a near-optimal solution for the recognition principle based on pattern matching, it is computationally more efficient than the conventional methods and does not require much memory. It is therefore useful for connected word recognition with syntactical constraints in a large vocabulary. The amount of computation is almost the same as that for isolated word recognition.
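For readers unfamiliar with the baseline this abstract starts from, the following is a minimal sketch of plain dynamic time warping for isolated word recognition. It is not the augmented continuous DP algorithm the paper proposes. It assumes one-dimensional feature sequences and an absolute-difference local distance, and the names `dtw_distance` and `recognize` are hypothetical.

```python
def dtw_distance(a, b):
    """Cumulative DTW cost between two 1-D sequences, |x - y| local distance."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = lowest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three allowed predecessor cells
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the reference template with the lowest DTW cost."""
    return min(templates,
               key=lambda label: dtw_distance(utterance, templates[label]))
```

Each comparison fills a table of size len(a) x len(b), which is the per-word cost the abstract refers to when it says the proposed method needs about the same computation as isolated word recognition.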
Enriching Music with Synchronized Lyrics, Images and Colored Lights
We present a method to synchronize popular music with its lyrics at the stanza level. First we apply an algorithm to segment audio content into harmonically similar and/or contrasting progressions, i.e. the stanzas. We map the stanzas found to a sequence of labels, where stanzas with a similar progression are mapped to the same label. The lyrics are analyzed as well to compute a second sequence of labels. Using dynamic programming, an optimal match is found between the two sequences, resulting in a stanza-level synchronization of the lyrics and the audio. The synchronized lyrics can be used to compute a synchronized slide show to accompany the music, where the images are retrieved using the lyrics. To further enrich the experience, colored light effects computed from the sets of images are synchronized with the music. The song segmentation can be done reliably, while the mapping of the audio segments and lyrics gives encouraging results.
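The matching of the two label sequences by dynamic programming could look roughly like the sketch below. The abstract does not state the cost function, so unit costs for a label mismatch and for skipping a stanza on either side are assumptions, and the name `align_labels` is hypothetical.

```python
def align_labels(audio, lyrics, gap=1, mismatch=1):
    """Align two label sequences; return (total cost, paired positions).

    Each pair is (audio_index, lyrics_index). Pairs cover both equal
    labels and substituted (mismatched) labels; skipped stanzas on
    either side are left unpaired.
    """
    n, m = len(audio), len(lyrics)
    # cost[i][j] = lowest cost of aligning audio[:i] with lyrics[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap
    for j in range(1, m + 1):
        cost[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if audio[i - 1] == lyrics[j - 1] else mismatch
            cost[i][j] = min(cost[i - 1][j - 1] + sub,
                             cost[i - 1][j] + gap,
                             cost[i][j - 1] + gap)
    # trace back from the end to recover one optimal pairing
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        sub = 0 if audio[i - 1] == lyrics[j - 1] else mismatch
        if cost[i][j] == cost[i - 1][j - 1] + sub:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return cost[n][m], pairs
```

Here `audio` and `lyrics` can be strings or lists of labels, for example one letter per stanza, with repeated letters standing for stanzas that share a progression.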

