Smoothness Analysis For Trajectory Features
- Int. Conf. on Acoustics, Speech and Signal Processing, 1997
Cited by 5 (1 self)
Dynamic modeling of speech is potentially a major improvement on Hidden Markov Models (HMMs). In one approach, trajectory models [1] are used to model the dynamics of the spectrum and serve as a basis for classification [1, 2]. Although some improvement has been achieved in this way, one would hope for more substantial improvements given that the independence assumption is removed. One reason why this was not achieved may be that the trajectory models are based on cepstral coefficients; we show that these tracks contain spurious oscillations. This suggests that these trajectory features might have a high within-class variance. We introduce a measure for evaluating the smoothness of trajectory-based features. This measure provides a method of selecting the best of a set of similar features. Formant trajectories prove to be significantly smoother than trajectories of mel-scale cepstral coefficients (MFCCs) by this measure, but this does not translate directly to improved performance.
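The abstract does not say how the smoothness measure is defined. Purely as an illustration of the idea, the sketch below scores a feature track by its mean squared second difference divided by its variance, so that a track with spurious frame-to-frame oscillation scores higher than one that only follows a trend. The formula, the function name `roughness`, and the two toy tracks are assumptions made for this example and are not the measure used in the paper.

```python
def roughness(track):
    """Hypothetical roughness score: mean squared second difference / variance.

    Lower values mean a smoother trajectory. This is an assumed measure,
    not the one defined in the paper.
    """
    n = len(track)
    if n < 3:
        raise ValueError("need at least 3 frames")
    mean = sum(track) / n
    var = sum((x - mean) ** 2 for x in track) / n
    if var == 0.0:
        return 0.0  # a constant track is treated as perfectly smooth
    # Discrete second derivative at every interior frame.
    second_diffs = [track[i - 1] - 2 * track[i] + track[i + 1]
                    for i in range(1, n - 1)]
    return sum(d * d for d in second_diffs) / len(second_diffs) / var


# Toy tracks (made-up values): a straight line, and the same trend
# with an alternating +/-0.5 oscillation added to every frame.
smooth = [float(i) for i in range(10)]
wiggly = [i + (0.5 if i % 2 else -0.5) for i in range(10)]

print(roughness(smooth), roughness(wiggly))
```

Under this assumed score, comparing two candidate features amounts to computing `roughness` for each track and preferring the lower value.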
Efficient Estimation of Perceptual Features for Speech Recognition
Cited by 2 (0 self)
A number of studies have shown that a pair of perceptual effective formants can be defined to capture most of the phonetic information present in vowels. Various methods of computing the effective formant values have been proposed; however, many of them depend on the accuracy of conventional formant estimation. In this work, we study methods of automatically estimating perceptual effective formants without estimating the actual formant values, and we compare the results with the perceptually measured effective formant values. The preliminary results show that the method is effective in estimating the perceptual effective formants. Classification experiments using perceptual effective formants as explicit features do not demonstrate any advantages. However, using the perceptual effective second formant value as input to our formant estimation algorithm can help to correct up to 44% of the formant tracking errors. 1. INTRODUCTION Substantial improvements to speech recognition may be attained if ...
Speaker Normalization using Correlations Among Classes
Cited by 2 (0 self)
In this research, we study the relationship amongst speakers and different sounds in speech, investigate the relevant features to represent this relationship, and explore the applications in speaker normalization/adaptation. We propose a new method that incorporates correlations amongst classes. Using principal component analysis, we construct a speaker space based on a speaker covariance matrix obtained from the training data. The speaker covariance matrix is constructed in such a manner as to explicitly describe the correlations between classes. By explicitly modeling these correlations, it is possible to adapt the model or normalize the speaker's features. These hypotheses are tested in segment classification tasks where other variant conditions (such as contextual variation) are minimized. Keywords: speaker adaptation. 1 Introduction Achieving invariance over the large number of sources of variation in speech is one of the most difficult problems in speech recognition. These may eit...
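The speaker-space construction can be illustrated with a small sketch. It assumes, as one plausible reading of the abstract, that each speaker is represented by a vector that concatenates one mean feature value per phonetic class, so the covariance across speakers captures how classes vary together. The helper names (`covariance`, `top_component`, `project`), the use of power iteration, and the four made-up speakers are all assumptions for this example, not the authors' procedure.

```python
def covariance(rows):
    """Sample mean and covariance matrix of per-speaker vectors."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return mean, cov


def top_component(cov, iters=200):
    """Leading principal direction of cov, found by power iteration."""
    d = len(cov)
    v = [1.0 / d ** 0.5] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def project(vec, mean, component):
    """Coordinate of one speaker along the component."""
    return sum((x - m) * c for x, m, c in zip(vec, mean, component))


# Made-up data: one row per speaker, columns = [class A mean, class B mean].
# Class B moves twice as much as class A from speaker to speaker.
speakers = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
mean, cov = covariance(speakers)
pc = top_component(cov)
print(pc, project([4.0, 8.0], mean, pc))
```

In this toy setting a new speaker's coordinate along `pc` summarizes how far both class means shift together, which is the kind of cross-class correlation the abstract proposes to exploit for adaptation or normalization.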
Explicit N-Best Formant Features for Segment-Based Speech Recognition
- 1996
This thesis investigates the use of explicit speech knowledge in computer speech recognition. Speech knowledge is generally expressed in terms of acoustic events occurring near phonetic segment boundaries and the location, shape and dynamics of formant trajectories. This suggests the creation of a segment-based recognition framework and the use of explicit formant features in a flexible integration scheme to ultimately improve the phonetic recognition accuracy. We describe a segmentation algorithm that produces a lattice of segment hypotheses, each with an associated broad phonetic identity. We build a single phonetic segment classifier along with separate vowel/semi-vowel and consonant classifiers based on traditional cepstral features, paying attention to reducing the mismatch between training and deployment conditions. We develop a robust N-best formant tracking algorithm that generates a list of up to N consistent formant interpretations. The use of the N-best feature paradigm is based on the observation that there are generally only a handful of reasonable interpretations of the given formant information. Instead of finding the best formant interpretation through the use of a global cost function that includes energy maximization and smoothness terms, we delay the selection of the correct formant interpretation until after the segment classification and phonetic search. We use the formant interpretations to extract features for a vowel/semi-vowel segment classifier. The formant trajectories are approximated either by three line segments or by a third-order Legendre polynomial. We show that together with formant amplitude, formant bandwidth, pitch, and segment durations we can produce a classifier of comparable performance to a cepstral-based classifier. We further demonstrate the potential of the N-best classification paradigm and show that a combination of formant and cepstral features further improves the classification accuracy.
Finally, the validity of the entire approach (a segment-based framework, separate classifiers for vowels and consonants, and explicit formant features) is verified by phonetic recognition experiments.
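The third-order Legendre approximation of a formant trajectory can be sketched as a least-squares fit, with the four coefficients then serving as fixed-length features for a segment of any duration. The abstract does not say how the approximation is computed, so the least-squares formulation, the function names, and the made-up track below are assumptions for illustration only.

```python
def legendre_basis(x):
    """Legendre polynomials P0..P3 evaluated at x in [-1, 1]."""
    return [1.0, x, 0.5 * (3 * x * x - 1), 0.5 * (5 * x ** 3 - 3 * x)]


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_legendre3(track):
    """Least-squares Legendre coefficients (orders 0..3) of one trajectory."""
    n = len(track)
    if n < 4:
        raise ValueError("need at least 4 frames for a third-order fit")
    # Map frame indices onto [-1, 1] so segment duration does not matter.
    xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    basis = [legendre_basis(x) for x in xs]
    ata = [[sum(b[i] * b[j] for b in basis) for j in range(4)] for i in range(4)]
    aty = [sum(b[i] * y for b, y in zip(basis, track)) for i in range(4)]
    return solve(ata, aty)


# Made-up second-formant track in Hz: a linear rise around 1500 Hz over 9 frames.
frames = 9
track = [1500.0 + 200.0 * (-1.0 + 2.0 * i / (frames - 1)) for i in range(frames)]
coeffs = fit_legendre3(track)
print(coeffs)
```

In this representation the order-0 coefficient reflects the average formant location, the order-1 coefficient the overall slope, and the higher orders the curvature, which lines up with the "location, shape and dynamics" described in the abstract.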
Accent Recognition for Indian English using Acoustic Feature Approach
Accent is the basic pattern of acoustic features and pronunciation. It can identify a person’s social and linguistic background, and it is an important source of inter- as well as intra-speaker variability. An accent-dependent dictionary or model can be used to improve the accuracy of a speech recognition system. In this study, we present an experimental analysis of acoustic speech features for Marathi and Arabic accents in spoken English. The detailed acoustic study relates accent to formant frequency, energy, and pitch characteristics. The database consists of speech from speakers with Marathi as their mother tongue and speakers from Iraq with Arabic as their mother tongue. Both groups of speakers were asked to speak the English numbers from zero to nine. The experimental results show the fifth formant frequency to be very effective for accent recognition.

