Language Models For A Spelled Letter Recognizer
- In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing
, 1995
Cited by 6 (3 self)
In some speech recognition applications, it is reasonable to constrain the search space of a speech recognizer to a large but finite set of sentences. We demonstrate the problem on a spelling task, where the recognition of continuously spelled last names is constrained to 110,000 entries (= 43,000 unique names) of a telephone book. Several techniques to address this problem are compared: recognition without any language model, bigrams, functions to map a hypothesis onto a legal string, n-best lists, and finally a newly developed method which integrates all constraints directly into the search process within reasonable memory and time bounds. The baseline result of 56% string accuracy is improved to 62, 85, 88, and 92%, respectively. To appear in: Proc. IEEE International Conf. on Acoustics, Speech, and Signal Processing, Detroit, USA, May 1995. 1. INTRODUCTION Spelled letter recognition is an essential subtask of many speech recognition systems. Applications include spelling of arbit...
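One of the compared techniques maps an unconstrained recognizer hypothesis onto a legal string from the telephone book. The abstract does not say which mapping function is used; the sketch below assumes a plain Levenshtein (edit) distance and a brute-force scan of the lexicon. The names `edit_distance`, `map_to_legal` and the sample surnames are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when letters match)
            ))
        prev = curr
    return prev[-1]


def map_to_legal(hypothesis: str, lexicon: list) -> str:
    """Return the lexicon entry closest to the hypothesis (ties: first entry wins)."""
    return min(lexicon, key=lambda name: edit_distance(hypothesis, name))


names = ["MEYER", "MAYER", "MILLER", "MUELLER"]
print(map_to_legal("MEIER", names))  # MEYER (one substitution away)
```

A linear scan costs one distance computation per lexicon entry for every hypothesis; with 43,000 unique names that is the overhead the integrated search described in the abstract is meant to avoid.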
Large Vocabulary Continuous Speech Recognition: from Laboratory Systems towards Real-World Applications
, 1996
Cited by 6 (4 self)
This paper provides an overview of the state of the art in laboratory speaker-independent, large vocabulary continuous speech recognition (LVCSR) systems with a view towards adapting such technology to the requirements of real-world applications. While in speech recognition the principal concern is to transcribe the speech signal as a sequence of words, the same core technology can be applied to domains other than dictation. The main topics addressed are acoustic-phonetic modeling, lexical representation, language modeling, decoding and model adaptation. After a brief summary of experimental results, some directions towards usable systems are given. In moving from laboratory systems towards real-world applications, different constraints arise which influence the system design. The application imposes limitations on computational resources, constraints on signal capture, requirements for noise and channel compensation, and rejection capability. The difficulties and costs of adapting existing technology to new languages and applications need to be assessed. Near-term applications for LVCSR technology are likely to grow in somewhat limited domains such as spoken language systems for information retrieval and limited-domain dictation. Perspectives on some unresolved problems are given, indicating areas for future research.
A Syntax-Directed Level Building Algorithm for Large Vocabulary Handwritten Word
- In Proc. 4th International Workshop on Document Analysis Systems
, 2000
Cited by 5 (4 self)
This paper describes a large vocabulary handwritten word recognition system based on a syntax-directed level building algorithm (SDLBA) that incorporates contextual information. The sequences of observations extracted from the input images are matched against the entries of a tree-structured lexicon where each node is represented by a 10-state character HMM. The search proceeds breadth-first and each node is decoded by the SDLBA. Contextual information about writing styles and case transitions is injected between the levels of the SDLBA.
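A tree-structured lexicon shares common prefixes among entries, so each character is decoded once per prefix rather than once per word. The sketch below builds such a tree and counts nodes per level in breadth-first order. It is illustrative only: the 10-state character HMM at each node is replaced by a bare character label (an assumption made for brevity), and `TrieNode`, `build_lexicon_tree` and `nodes_per_level` are hypothetical names.

```python
class TrieNode:
    """One lexicon-tree node; in the paper each node is a character HMM, here just a label."""

    def __init__(self, char=""):
        self.char = char      # character modeled by this node ("" at the root)
        self.children = {}    # next character -> child TrieNode
        self.word = None      # lexicon entry that ends at this node, if any


def build_lexicon_tree(words):
    """Insert every word into a prefix tree; shared prefixes reuse existing nodes."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode(ch))
        node.word = word
    return root


def nodes_per_level(root):
    """Breadth-first traversal: number of character nodes at each depth (level 1 = first letter)."""
    counts = []
    frontier = list(root.children.values())
    while frontier:
        counts.append(len(frontier))
        frontier = [child for node in frontier for child in node.children.values()]
    return counts


lexicon = ["CAT", "CAR", "CART", "DOG"]
tree = build_lexicon_tree(lexicon)
print(nodes_per_level(tree))       # [2, 2, 3, 1]
print(sum(nodes_per_level(tree)))  # 8
```

For these four sample words the tree holds 8 character nodes instead of the 13 characters in the flat word list; the saving grows with lexicon size.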
Automatic Prosodic Segmentation by F0 Clustering Using Superpositional Modeling
- IEEE ICASSP 95
, 1995
Cited by 4 (1 self)
In this paper, we propose an automatic method for detecting accent phrase boundaries in Japanese continuous speech by using F0 information. In the training phase, hand-labeled accent patterns are parameterized according to a superpositional model proposed by Fujisaki and assigned to clusters by a clustering method; accent templates are calculated as the centroid of each cluster. In the segmentation phase, automatic N-best extraction of boundaries is performed by One-Stage DP matching between the reference templates and the target F0 contour. About 90% of accent phrase boundaries were correctly detected in speaker-independent experiments with the ATR Japanese continuous speech database.
Query-by-example spoken term detection using phonetic posteriorgram templates
- in Proc. ASRU
, 2009
Cited by 4 (0 self)
This paper examines a query-by-example approach to spoken term detection in audio files. The approach is designed for low-resource situations in which limited or no in-domain training material is available and accurate word-based speech recognition capability is unavailable. Instead of using word or phone strings as search terms, the user presents the system with audio snippets of the desired search terms to act as the queries. Query and test materials are represented using phonetic posteriorgrams obtained from a phonetic recognition system. Query matches in the test data are located using a modified dynamic time warping search between query templates and test utterances. Experiments with this approach are presented on data from the Fisher corpus.
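The abstract locates matches with a modified dynamic time warping search over posteriorgrams but does not spell out the modification. As a sketch under stated assumptions, the code below uses a standard subsequence DTW (free start and end along the test utterance) with a -log inner-product frame distance, a common choice for comparing posterior vectors; it is not necessarily the paper's exact variant. All function names and the toy three-class posteriorgrams are hypothetical.

```python
import numpy as np


def local_distances(query, test, eps=1e-10):
    """-log of the inner product between every query frame and every test frame.

    query: (m, k) array, test: (n, k) array; each row is a posterior
    distribution over k phonetic classes (one posteriorgram frame).
    """
    return -np.log(np.maximum(query @ test.T, eps))


def subsequence_dtw(query, test):
    """Align the whole query against any contiguous segment of the test utterance.

    Returns (length-normalized cost, index of the last matched test frame).
    """
    d = local_distances(query, test)
    m, n = d.shape
    acc = np.full((m, n), np.inf)  # accumulated cost
    steps = np.zeros((m, n))       # path length, used for length normalization
    acc[0, :] = d[0, :]            # free start: the query may begin at any test frame
    steps[0, :] = 1
    for i in range(1, m):
        for j in range(n):
            # predecessors: vertical (i-1, j), plus horizontal and diagonal when j > 0
            candidates = [(acc[i - 1, j], steps[i - 1, j])]
            if j > 0:
                candidates.append((acc[i, j - 1], steps[i, j - 1]))
                candidates.append((acc[i - 1, j - 1], steps[i - 1, j - 1]))
            best_cost, best_len = min(candidates)
            acc[i, j] = best_cost + d[i, j]
            steps[i, j] = best_len + 1
    normalized = acc[-1, :] / steps[-1, :]  # free end: score every possible end frame
    end = int(np.argmin(normalized))
    return float(normalized[end]), end


def frames(classes, k=3, peak=0.9):
    """Hypothetical soft posteriorgram: `peak` on the given class, the rest spread evenly."""
    out = np.full((len(classes), k), (1 - peak) / (k - 1))
    out[np.arange(len(classes)), classes] = peak
    return out


query = frames([0, 1])
test = frames([2, 2, 0, 1, 2, 2])
cost, end = subsequence_dtw(query, test)
print(end)  # 3: the query's class pattern 0, 1 occupies test frames 2-3
```

Dividing by path length keeps long and short alignments comparable when ranking candidate end frames; a real detector would also threshold the cost to decide whether a match is present at all.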
EUTRANS: a Speech-to-Speech Translator Prototype.
- In Proceedings of EuroSpeech
, 2001
Cited by 3 (0 self)
The EUTRANS system is a telephone speech input translation prototype capable of translating telephone calls from one language to another. It assumes human-to-human communication, with each party speaking a different language, assisted by a system with translation capabilities. The prototype has been developed as a demonstrator for the European project of the same name. EUTRANS achieves a response time close to real time for speaker-independent, medium-complexity tasks (a few thousand words) and offers competitive accuracy. The acoustic, language and translation models are finite-state networks that are automatically learnt from training samples; this makes the system easily adaptable to new tasks. It runs on a standard PC with audio capability and a cheap modem. The system is currently available for two translation tasks: the FUB task (Italian-English) and the Traveler task (Spanish-English).
Robust Matching by Dynamic Space Warping for Accurate Face Recognition
- IEEE International Conference On Image Processing, ICIP
, 2001
Cited by 2 (1 self)
The utility of face recognition for multimedia indexing is enhanced by using accurate detection and alignment of salient invariant face features. Face recognition can be performed using template matching or a feature-based approach, but both these methods suffer from occlusion and require an a priori model for extracting information. To avoid these drawbacks, we present in this paper a complete scheme for face recognition based on salient feature extraction in challenging conditions, which is performed without an a priori or learned model. These features are used in a matching process that overcomes occlusion effects using dynamic space warping, which aligns each feature in the query image, if possible, with its corresponding feature in the gallery set. Thus, we make face recognition robust to low-frequency variations (such as occlusion) as well as to high-frequency variations (such as expression and gender). A maximum likelihood scheme is used to make the recognition process more precise, as is shown in the experiments.
Robust Face Recognition Using Dynamic Space Warping
, 2002
Cited by 2 (0 self)
The utility of face recognition for multimedia indexing is enhanced by using accurate detection and alignment of salient invariant face features. Face recognition can be performed using template matching or a feature-based approach, but both these methods suffer from occlusion and require an a priori model for extracting information. To avoid these drawbacks, we present in this paper a complete scheme for face recognition based on salient feature extraction in challenging conditions, which is performed without an a priori or learned model. These features are used in a matching process that overcomes occlusion effects and facial expressions using dynamic space warping, which aligns each feature in the query image, if possible, with its corresponding feature in the gallery set. Thus, we make face recognition robust to low-frequency variations (such as occlusion) as well as to high-frequency variations (such as expression and gender). A maximum likelihood scheme is used to make the recognition process more precise, as is shown in the experiments.
Comparison of Three Approaches to Phonetic String Generation for Large Vocabulary Speech Recognition
- ICSLP
, 1994
Cited by 2 (2 self)
We are building a large vocabulary, isolated word preselection system according to a bottom-up design strategy. It will be used in the development of a dictation machine for Spanish and is composed of three main modules: feature extraction, phonetic string build-up and lexical access. For the second module, we are considering three different technological approaches based on static modeling (SM), Hidden Markov Models (HMM) and Neural Networks (NN). This paper compares these three alternatives in terms of recognition performance, training complexity and computational load, and concludes with the results of the comparison in order to adopt the most suitable approach depending on the task. I. INTRODUCTION The study we are presenting was done to help the decision process of adopting a certain technology for future developments in very large vocabulary speech recognition systems based on the hypothesis-verification paradigm. The stage analyzed here constitutes the first step (hypoth...
Audio genre classification using percussive pattern clustering combined with timbral features
- In ICME
, 2009
Cited by 2 (1 self)
Many musical genres and styles are characterized by distinct representative rhythmic patterns. Most automatic genre classification systems use global statistical features based on timbral dynamics, such as Mel-Frequency Cepstral Coefficients (MFCC), but so far rhythmic information has not been used as effectively. To extract bar-long unit rhythmic patterns from a music collection, we propose a clustering method based on one-pass dynamic programming and k-means clustering. After the fundamental rhythmic patterns for each style/genre are extracted, a pattern occurrence histogram is calculated and used as a feature vector for supervised learning. Experimental results show that the automatically calculated rhythmic pattern information can be used to effectively classify musical genre/style and improve upon current approaches based on timbral features.
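Once representative bar-long patterns have been clustered, each recording is turned into a pattern occurrence histogram. A minimal sketch, assuming the cluster templates are already available (the one-pass DP and k-means stages are not shown), that each bar is assigned to its nearest template by Euclidean distance, and that the histogram is normalized to sum to one; `pattern_histogram` and the 4-step onset vectors are hypothetical.

```python
import numpy as np


def pattern_histogram(bars, templates):
    """Normalized histogram of nearest-template assignments.

    bars: (b, d) array of bar-long rhythmic patterns from one recording.
    templates: (k, d) array of cluster centroids (e.g. from k-means).
    """
    # (b, k) matrix of Euclidean distances via broadcasting
    dists = np.linalg.norm(bars[:, None, :] - templates[None, :, :], axis=2)
    nearest = np.argmin(dists, axis=1)  # index of the closest template per bar
    counts = np.bincount(nearest, minlength=len(templates)).astype(float)
    return counts / counts.sum()        # feature vector for a supervised classifier


# Hypothetical 4-step onset patterns: three templates and four observed bars
templates = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [1, 0, 1, 0]], dtype=float)
bars = np.array([[1, 0, 0, 0], [0.9, 0, 0.1, 0], [1, 0, 1, 0], [0, 0, 1, 0]])
print(pattern_histogram(bars, templates).tolist())  # [0.5, 0.25, 0.25]
```

Normalizing by the number of bars makes recordings of different lengths comparable; `minlength` keeps a zero bin for templates that never occur, so every feature vector has the same dimension.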

