Speech Recognition
, 1994
Contents
1 Introduction
2 The Human Speech
  2.1 Phonemes
    2.1.1 Other Speech Units
  2.2 Kinds of Phonemes
    2.2.1 Consonants
      2.2.1.1 Voicing
      2.2.1.2 Place of Articulation
      2.2.1.3 Manner of Articulation
    2.2.2 Vowels
    2.2.3 Diphthongs
  2.3 Formants
3 A Signal Processing View of the Human Speech
  3.1 Def
A Speed-Invariant Temporal Feature Detector
Temporal pattern classification normally operates upon vectors whose components are time-delayed samples, be they samples of spectral, cepstral, or LPC coefficients over time. Such a spatial representation of temporal patterns is very sensitive to speed variations, and requires computationally expensive time-alignment techniques such as Dynamic Time Warping in order to compare inputs to exemplars. This paper proposes a speed-invariant representation of temporal patterns using Taylor series expansion. A vector composed of successive time-derivative samples of the temporal signal is unique to the shape of the function in a region around the sampling point. Simple manipulations on such a "Taylor vector" yield a speed-invariant form. The degree of speed invariance can be controlled parametrically while retaining sensitivity to the direction of presentation. Simulations demonstrate that learning and recall of temporal patterns coded using this representation is accurate and do...
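The abstract does not say which "simple manipulations" make the Taylor vector speed-invariant, so the following is only a minimal sketch under one assumed normalization, not the paper's method. It relies on the fact that playing a pattern at speed a scales its k-th time derivative by a^k, so dividing the k-th derivative by the k-th power of the magnitude of the first derivative cancels the speed factor while keeping the sign that distinguishes forward from reversed presentation. The function names `taylor_vector` and `speed_invariant` and the sine test pattern are hypothetical.

```python
import math

def taylor_vector(speed, phase, order):
    """Derivatives 0..order of f(t) = sin(speed * t), sampled where speed * t == phase.

    The k-th derivative is speed**k * sin(phase + k*pi/2), so every playback
    speed sees the same point of the pattern, but with rescaled derivatives.
    """
    return [speed ** k * math.sin(phase + k * math.pi / 2) for k in range(order + 1)]

def speed_invariant(vec):
    """Divide d_k by |d_1|**k (an assumed normalization, not taken from the paper)."""
    scale = abs(vec[1])
    if scale == 0.0:
        raise ValueError("first derivative is zero; this normalization is undefined")
    return [d / scale ** k for k, d in enumerate(vec)]

# Same pattern point presented at normal speed, 2.5x speed, and in reverse.
slow = speed_invariant(taylor_vector(1.0, 0.7, 3))
fast = speed_invariant(taylor_vector(2.5, 0.7, 3))
reverse = speed_invariant(taylor_vector(-1.0, 0.7, 3))
```

Under this assumption `slow` and `fast` agree up to rounding, while `reverse` differs in the sign of its odd-order components, which is one way direction sensitivity could be retained.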
Off-Line Handwritten Word Recognition Using Hidden Markov Models
, 1999
Introduction: Today, handwriting recognition is one of the most challenging tasks and exciting areas of research in computer science. Indeed, despite the growing interest in this field, no satisfactory solution is available. The difficulties encountered are numerous and include the huge variability of handwriting such as inter-writer and intra-writer variabilities, writing environment (pen, sheet, support, etc.), the overlap between characters, and the ambiguity that makes many characters unidentifiable without referring to context. Owing to these difficulties, many researchers have integrated the lexicon as a constraint to build lexicon-driven strategies to decrease the problem complexity. For small lexicons, as in bank-check processing, most approaches are global and consider a word as an indivisible entity [1] - [5]. If the lexicon is large, as in postal applications (city name or street name recognition) [6] - [10], one cannot consider a word as one entity, because of the huge num
Edge Detection with the Parametric Filtering Method (Comparison with Canny Method)
Abstract: In this paper, a new method of image edge-detection and characterization is presented. The “Parametric Filtering method” uses a judiciously defined filter, which preserves the signal correlation structure as input in the autocorrelation of the output. This leads, showing the evolution of the image correlation structure as well as various distortion measures which quantify the deviation between two zones of the signal (the two Hamming signals) for the protection of an image edge. Keywords: Edge detection, parametrable recursive filter, autocorrelation structure, distortion measurements.
Informing Multisource Decoding in Robust Automatic Speech Recognition
, 2008
Listeners are remarkably adept at recognising speech in natural multisource environments, while most Automatic Speech Recognition (ASR) technology fails in these conditions. It has been proposed that this human ability is governed by Auditory Scene Analysis (ASA) processes, in which a sound mixture is segregated into perceptual packages, called ‘streams’, by a combination of bottom-up and top-down processing. This thesis examines a novel ASR framework based on the ASA account, Speech Fragment Decoding (SFD). A ‘fragment’ is a spectro-temporal region where energy from a single sound source dominates. SFD employs techniques developed from knowledge about the auditory system to identify fragments. A decoding process using statistical speech models is applied to the fragment representation to simultaneously identify speech evidence and recognise speech. In this study three techniques for improving SFD are investigated. Firstly, explicit duration modelling is exploited to combat the corruption of acoustic data which often causes the decoder to produce word matches with unrealistic durations. Secondly, it is argued that the top-down information in recognition models may be insufficient to mediate the speech
Speaker Dependent and Independent Isolated Hindi Word Recognizer using Hidden Markov Model (HMM)
Hindi is a very complex language with a large number of phonemes, and it is spoken with various accents in different regions of India. In this manuscript, speaker-dependent and speaker-independent isolated Hindi word recognizers using the Hidden Markov Model (HMM) are implemented for a noisy environment. For this study, a set of 10 Hindi names was chosen as the test set on which training and testing are performed. The scheme uses Mel Frequency Cepstral Coefficients (MFCC) to compute the acoustic features of the speech signal. The K-means algorithm is then used for codebook generation by clustering the obtained feature space. The Baum-Welch algorithm is used to re-estimate the parameters, and the Viterbi algorithm is used to select the recognized Hindi word as the one whose HMM has the highest likelihood. This work achieved a 98.6% recognition rate for speaker-dependent recognition over a total of 10 speakers (6 male, 4 female) and 97.5% for the speaker-independent isolated word recognizer over 10 speakers (male).
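The final decision step described above (score the quantized utterance against each word's HMM and pick the highest-scoring model) can be sketched as follows. This is a minimal illustration only: it assumes the utterance has already been converted to codebook indices (the MFCC and K-means stages are omitted) and that each word model has already been trained. The two toy models `word_a` and `word_b`, their probabilities, and the three-symbol codebook are hypothetical and do not come from the paper.

```python
import math

def _log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi_log_score(obs, start, trans, emit):
    """Log-probability of the single best state path for a discrete HMM.

    obs   : list of codebook indices (assumed output of vector quantization)
    start : start[s]     = P(first state is s)
    trans : trans[p][s]  = P(next state is s | current state is p)
    emit  : emit[s][sym] = P(symbol sym | state s)
    """
    n_states = len(start)
    delta = [_log(start[s]) + _log(emit[s][obs[0]]) for s in range(n_states)]
    for symbol in obs[1:]:
        delta = [
            max(delta[p] + _log(trans[p][s]) for p in range(n_states))
            + _log(emit[s][symbol])
            for s in range(n_states)
        ]
    return max(delta)

def recognize(obs, models):
    """Return the word whose model gives the highest Viterbi score."""
    return max(models, key=lambda word: viterbi_log_score(obs, *models[word]))

# Hypothetical two-state left-to-right models over a three-symbol codebook.
left_to_right = [[0.6, 0.4], [0.0, 1.0]]
models = {
    "word_a": ([1.0, 0.0], left_to_right, [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "word_b": ([1.0, 0.0], left_to_right, [[0.1, 0.1, 0.8], [0.8, 0.1, 0.1]]),
}

best = recognize([0, 0, 1, 1], models)
```

Working in log-probabilities avoids numerical underflow on long observation sequences, which is why the sketch sums logs rather than multiplying probabilities.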

