Results 1 - 10 of 656
Tandem connectionist feature extraction for conventional HMM systems
"... Hidden Markov model speech recognition systems typically use Gaussian mixture models to estimate the distributions of decorrelated acoustic feature vectors that correspond to individual subword units. By contrast, hybrid connectionist-HMM systems use discriminatively-trained neural networks to estim ..."
Cited by 242 (24 self)
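The snippet breaks off before it describes the tandem method. In the tandem approach as it is generally described, the network's per-frame phone posteriors (or its pre-softmax outputs) are log-compressed and decorrelated, for instance with PCA (the Karhunen-Loève transform). They are then passed as ordinary features to a GMM-HMM system. The following is a minimal sketch under stated assumptions. It assumes the posteriors are already available as a NumPy array, and it estimates the PCA basis on the same frames it transforms. `tandem_features` is a hypothetical helper, not code from the paper.

```python
import numpy as np

def tandem_features(posteriors, n_components=None, eps=1e-10):
    """Log-compress and PCA-decorrelate per-frame phone posteriors.

    posteriors: array of shape (n_frames, n_phones), rows summing to 1.
    Returns shape (n_frames, n_components), or all components if None.
    Hypothetical helper for illustration only.
    """
    # Log makes the sharply peaked posteriors less skewed, which suits
    # a Gaussian-mixture back end better than raw probabilities.
    logp = np.log(np.asarray(posteriors, dtype=float) + eps)
    # Centre each dimension before estimating the covariance.
    centred = logp - logp.mean(axis=0)
    cov = np.cov(centred, rowvar=False)
    # PCA: eigenvectors of the covariance, ordered by decreasing variance.
    eigvals, eigvecs = np.linalg.eigh(cov)
    basis = eigvecs[:, np.argsort(eigvals)[::-1]]
    if n_components is not None:
        basis = basis[:, :n_components]
    # The projected dimensions are uncorrelated over these frames.
    return centred @ basis
```

In practice the mean and basis would be estimated once on training data and reused unchanged on test data. Fitting them on the same frames here only keeps the sketch short.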
Connectionist feature extraction for conventional HMM systems
Proc. of ICASSP 00, 2000
"... Hidden Markov model speech recognition systems typically use Gaussian mixture models to estimate the distributions of decorrelated acoustic feature vectors that correspond to individual subword units. By contrast, hybrid connectionist-HMM systems use discriminatively-trained neural networks to estim ..."
Cited by 26 (9 self)
TANDEM CONNECTIONIST FEATURE EXTRACTION FOR CONVENTIONAL HMM SYSTEMS
"... Hidden Markov model speech recognition systems typically use Gaussian mixture models to estimate the distributions of decorrelated acoustic feature vectors that correspond to individual subword units. By contrast, hybrid connectionist-HMM systems use discriminatively-trained neural network ..."
Mel Frequency Cepstral Coefficients for Music Modeling
In International Symposium on Music Information Retrieval, 2000
"... We examine in some detail Mel Frequency Cepstral Coefficients (MFCCs), the dominant features used for speech recognition, and investigate their applicability to modeling music. In particular, we examine two of the main assumptions of the process of forming MFCCs: the use of the Mel frequency scale to model the spectra, and the use of the Discrete Cosine Transform (DCT) to decorrelate the Mel-spectral vectors ..."
Cited by 299 (3 self)
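The two MFCC steps named in this abstract can be sketched in a few lines. The sketch assumes the widely used Mel formula m = 2595 log10(1 + f/700). Other variants exist, and the paper's exact choice is not stated here. It places filter edges at equal Mel spacing and applies an orthonormal DCT-II to log filterbank energies. The triangular filters themselves are omitted, and the function names and defaults (26 filters, 13 coefficients) are illustrative assumptions.

```python
import math

def hz_to_mel(f_hz):
    # One common Mel-scale formula (an assumption; variants exist).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    # Exact inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(f_min, f_max, n_filters):
    """Edge/centre frequencies (Hz) for n_filters triangular filters.

    Points are equally spaced in Mel, so bands widen with frequency.
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]

def dct_ii(x, n_coeffs):
    # Orthonormal DCT-II, keeping only the first n_coeffs outputs.
    n = len(x)
    out = []
    for k in range(n_coeffs):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                               for i in range(n)))
    return out

def cepstra(mel_energies, n_coeffs=13, floor=1e-10):
    # Log filterbank energies (floored to avoid log(0)), then the DCT.
    return dct_ii([math.log(max(e, floor)) for e in mel_energies], n_coeffs)
```

A flat log spectrum maps entirely onto the first coefficient. More generally, the DCT concentrates smooth, highly correlated filterbank envelopes into the low-order coefficients.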
Acoustic Modeling using Deep Belief Networks
Submitted to IEEE Trans. on Audio, Speech, and Language Processing, 2010
"... Gaussian mixture models are currently the dominant technique for modeling the emission distribution of hidden Markov models for speech recognition. We show that better phone recognition on the TIMIT dataset can be achieved by replacing Gaussian mixture models by deep neural networks that contain many layers of features and a very large number of parameters. These networks are first pretrained as a multilayer generative model of a window of spectral feature vectors without making use of any discriminative information. Once the generative pretraining has designed the features, we perform ..."
Cited by 163 (16 self)
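The abstract says the networks model "a window of spectral feature vectors", but the snippet does not say how that window is formed. This sketch assumes a symmetric window of neighbouring frames concatenated into one input vector, with the first or last frame repeated at utterance boundaries. It covers only input preparation, not the generative pretraining. `stack_context` is a hypothetical name.

```python
def stack_context(frames, left, right):
    """Concatenate each frame with its neighbours into one input vector.

    frames: list of equal-length feature vectors (lists of floats).
    Positions before the start / after the end reuse the first / last
    frame (edge padding, an assumption made for this sketch).
    """
    n = len(frames)
    stacked = []
    for t in range(n):
        window = []
        for offset in range(-left, right + 1):
            # Clamp the neighbour index into the valid range [0, n - 1].
            idx = min(max(t + offset, 0), n - 1)
            window.extend(frames[idx])
        stacked.append(window)
    return stacked
```

For example, `left = right = 5` over 39-dimensional frames gives 11 × 39 = 429 inputs per window.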
Content-Based Retrieval of Music and Audio
Multimedia Storage and Archiving Systems II, Proc. of SPIE, 1997
"... Though many systems exist for content-based retrieval of images, little work has been done on the audio portion of the multimedia stream. This paper presents a system to retrieve audio documents by acoustic similarity. The similarity measure is based on statistics derived from a supervised vector qu ..."
Cited by 169 (9 self)
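The snippet describes a similarity measure built from statistics of a supervised vector quantizer. This sketch swaps in a plain nearest-neighbour codebook, which is not the supervised quantizer the paper refers to, to show the general idea. Each document becomes a normalised histogram of codeword counts. Documents are then ranked by the cosine similarity of their histograms to the query's. The codebook, the helper names, and the choice of cosine similarity are all assumptions for illustration.

```python
import math

def quantize(vec, codebook):
    # Index of the nearest codeword under squared Euclidean distance.
    best, best_d = 0, float("inf")
    for i, c in enumerate(codebook):
        d = sum((a - b) ** 2 for a, b in zip(vec, c))
        if d < best_d:
            best, best_d = i, d
    return best

def histogram(frames, codebook):
    # Fraction of a document's frames assigned to each codeword.
    counts = [0] * len(codebook)
    for f in frames:
        counts[quantize(f, codebook)] += 1
    total = len(frames) or 1  # avoid dividing by zero for empty input
    return [c / total for c in counts]

def cosine(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

def rank(query_frames, corpus, codebook):
    """Return corpus keys sorted from most to least similar to the query."""
    q = histogram(query_frames, codebook)
    scores = {name: cosine(q, histogram(frames, codebook))
              for name, frames in corpus.items()}
    return sorted(scores, key=scores.get, reverse=True)
```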
Exploiting Acoustic Feature Correlations By Joint Neural Vector Quantizer Design In A Discrete HMM System
Proc. ICASSP'98, 1998
"... In previous work on hybrid speech recognizers with discrete HMMs, we have shown that VQs trained according to an MMI criterion are well suited for ML-estimated Bayes classifiers. This holds only for single-VQ systems. In this paper we extend the theory to speech recognizers with mult ..."
Cited by 3 (0 self)
... of recognition performance. The joint multiple VQ training decorrelates the quantizer labels and improves system performance. In addition, the new training criterion allows the feature vector to be split into multiple streams less carefully: the streams do not have to be statistically independent ...
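With several VQs, each stream of the split feature vector receives its own label, and a discrete HMM state then needs a probability for the tuple of labels. A common multi-stream formulation, sketched below, multiplies the per-stream label probabilities (sums their logs). This amounts to treating the streams as independent given the state. The abstract states that the paper's joint training lets the streams be statistically dependent, and that training is not reproduced here. Stream boundaries, codebooks, and probability tables are hypothetical inputs.

```python
import math

def nearest(vec, codebook):
    # Index of the nearest codeword under squared Euclidean distance.
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, codebook[i])))

def stream_labels(feature, splits, codebooks):
    """Split one feature vector into streams and quantize each separately.

    splits: list of (start, end) index pairs, one per stream.
    codebooks: one codebook (list of codewords) per stream.
    """
    return [nearest(feature[s:e], cb) for (s, e), cb in zip(splits, codebooks)]

def log_emission(labels, state_tables):
    """Log emission probability of one HMM state for a tuple of labels.

    state_tables[s][k] = P(label k in stream s | state). Summing the logs
    treats the streams as conditionally independent given the state.
    """
    return sum(math.log(table[k]) for table, k in zip(state_tables, labels))
```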
Analysis of Disturbed Acoustic Features
2001
"... An analysis method was developed to study the impact of training-test mismatch due to the presence of additive noise. The contributions of individual observation vector components to the emission cost are determined in the matched and mismatched condition and histograms are computed for these contri ..."
... and how in certain cases this type of information may help to increase recognition accuracy by applying acoustic backing-off to selected features only. Some limitations of the approach are also discussed.
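Suppose the emission cost is the negative log-likelihood under a Gaussian with diagonal covariance. This is an assumption, since the snippet does not specify the acoustic model. Under it, the cost splits exactly into one additive term per feature component, which is what makes a per-component analysis possible. The sketch computes those terms and bins them, so that histograms from matched and noisy test data can be compared. For a Gaussian mixture, the same split holds within any single diagonal component, such as the best-scoring one. All names are hypothetical.

```python
import math

def component_costs(x, mean, var):
    """Per-dimension terms of -log N(x; mean, diag(var)); they sum to the total."""
    return [0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            for xi, m, v in zip(x, mean, var)]

def contributions_by_dim(frames, mean, var):
    # Collect each dimension's cost terms over a set of frames.
    per_dim = [[] for _ in mean]
    for x in frames:
        for d, c in enumerate(component_costs(x, mean, var)):
            per_dim[d].append(c)
    return per_dim

def bin_counts(values, lo, hi, n_bins):
    """Equal-width histogram over [lo, hi); out-of-range values go to the edge bins."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        b = int((v - lo) // width)
        counts[min(max(b, 0), n_bins - 1)] += 1
    return counts
```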
PHONETIC FEATURES AND ACOUSTIC LANDMARKS
"... A probabilistic and statistical framework is presented for automatic speech recognition based on a phonetic feature representation of speech sounds. In this acoustic-phonetic approach, the speech recognition problem is hypothesized as a maximization of the joint posterior probability of a set of pho ..."
Speech Emotion Recognition Combining Acoustic Features and Linguistic Information in a Hybrid Support Vector Machine-Belief Network Architecture
ICASSP 2004
"... In this contribution we introduce a novel approach to combining acoustic features and language information for more robust automatic recognition of a speaker’s emotion. Seven discrete emotional states are classified throughout the work. First, a model for the recognition of emotion by ac ..."
Cited by 41 (10 self)