Results 1 - 10 of 656

Tandem connectionist feature extraction for conventional HMM systems

by Hynek Hermansky, Daniel P. W. Ellis, Sangita Sharma
"... Hidden Markov model speech recognition systems typically use Gaussian mixture models to estimate the distributions of decorrelated acoustic feature vectors that correspond to individual subword units. By contrast, hybrid connectionist-HMM systems use discriminatively-trained neural networks to estim ..."
Abstract - Cited by 242 (24 self)
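
The "tandem" approach named in this entry feeds the outputs of a discriminatively trained network into a conventional GMM-HMM as input features. Below is a minimal, hedged sketch of that general idea; the network size, feature dimensions, and the log/PCA post-processing are illustrative assumptions, not the paper's exact recipe.

```python
# Illustrative sketch of tandem connectionist feature extraction:
# phone posteriors from a discriminatively trained network are compressed
# and decorrelated, then used as features for a conventional GMM-HMM.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.decomposition import PCA

n_frames, n_dims, n_phones = 5000, 39, 40          # hypothetical sizes
X = np.random.randn(n_frames, n_dims)               # stand-in acoustic frames (e.g. PLP/MFCC)
y = np.random.randint(0, n_phones, n_frames)        # stand-in frame-level phone labels

mlp = MLPClassifier(hidden_layer_sizes=(500,), max_iter=50).fit(X, y)
posteriors = mlp.predict_proba(X)                    # discriminative phone posteriors
log_post = np.log(posteriors + 1e-10)                # compress the skewed posterior range
tandem = PCA(whiten=True).fit_transform(log_post)    # decorrelate for diagonal-covariance GMMs
# `tandem` would then replace or augment the input features of a GMM-HMM recognizer.
```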

Connectionist feature extraction for conventional HMM systems

by Hynek Hermansky, Dan Ellis, Sangita Sharma - Proc. of ICASSP 00 , 2000
"... Hidden Markov model speech recognition systems typically use Gaussian mixture models to estimate the distributions of decorrelated acoustic feature vectors that correspond to individual subword units. By contrast, hybrid connectionist-HMM systems use discriminatively-trained neural networks to estim ..."
Abstract - Cited by 26 (9 self)

TANDEM CONNECTIONIST FEATURE EXTRACTION FOR CONVENTIONAL HMM SYSTEMS

by Hynek Hermansky, Daniel P. W. Ellis, Sangita Sharma
"... ABSTRACT Hidden Markov model speech recognition systems typically use Gaussian mixture models to estimate the distributions of decorrelated acoustic feature vectors that correspond to individual subword units. By contrast, hybrid connectionist-HMM systems use discriminatively-trained neural network ..."
Abstract

Mel Frequency Cepstral Coefficients for Music Modeling

by Beth Logan - In International Symposium on Music Information Retrieval , 2000
"... We examine in some detail Mel Frequency Cepstral Coefficients (MFCCs) - the dominant features used for speech recognition - and investigate their applicability to modeling music. In particular, we examine two of the main assumptions of the process of forming MFCCs: the use of the Mel frequency scale ..."
Abstract - Cited by 299 (3 self)
scale to model the spectra; and the use of the Discrete Cosine Transform (DCT) to decorrelate the Mel-spectral vectors.
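
For reference, this is the standard MFCC pipeline the entry examines (mel-spaced triangular filterbank, log compression, DCT decorrelation), shown here as a generic single-frame sketch; the sampling rate, FFT length, and filter count are illustrative assumptions, not values from the paper.

```python
# Minimal, generic MFCC computation for one windowed frame.
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frame, sr=16000, n_fft=512, n_mels=26, n_ceps=13):
    # Power spectrum of the windowed frame.
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft)) ** 2

    # Triangular filters spaced evenly on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        fbank[i - 1, bins[i - 1]:bins[i]] = np.linspace(0, 1, bins[i] - bins[i - 1], endpoint=False)
        fbank[i - 1, bins[i]:bins[i + 1]] = np.linspace(1, 0, bins[i + 1] - bins[i], endpoint=False)

    # Log mel-spectral vector, then DCT to decorrelate the filterbank outputs.
    log_mel = np.log(spectrum @ fbank.T + 1e-10)
    return dct(log_mel, norm='ortho')[:n_ceps]

# Example: 13 cepstral coefficients from a 25 ms frame of noise.
coeffs = mfcc(np.random.randn(400))
```

The paper's question is whether these two speech-motivated choices (the mel scale and the DCT) remain appropriate when the input is music rather than speech.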

Acoustic Modeling using Deep Belief Networks

by Abdel-rahman Mohamed, George E. Dahl, Geoffrey Hinton - SUBMITTED TO IEEE TRANS. ON AUDIO, SPEECH, AND LANGUAGE PROCESSING , 2010
"... Gaussian mixture models are currently the dominant technique for modeling the emission distribution of hidden Markov models for speech recognition. We show that better phone recognition on the TIMIT dataset can be achieved by replacing Gaussian mixture models by deep neural networks that contain ma ..."
Abstract - Cited by 163 (16 self)
many layers of features and a very large number of parameters. These networks are first pretrained as a multilayer generative model of a window of spectral feature vectors without making use of any discriminative information. Once the generative pretraining has designed the features, we perform

Content-Based Retrieval of Music and Audio

by Jonathan T. Foote - MULTIMEDIA STORAGE AND ARCHIVING SYSTEMS II, PROC. OF SPIE , 1997
"... Though many systems exist for content-based retrieval of images, little work has been done on the audio portion of the multimedia stream. This paper presents a system to retrieve audio documents by acoustic similarity. The similarity measure is based on statistics derived from a supervised vector qu ..."
Abstract - Cited by 169 (9 self)

Exploiting Acoustic Feature Correlations By Joint Neural Vector Quantizer Design In A Discrete HMM System

by Christoph Neukirchen, Daniel Willett, Stefan Eickeler, Stefan Müller - Proc. ICASSP'98 , 1998
"... In previous work about hybrid speech recognizers with discrete HMMs we have shown that VQs, that are trained according to an MMI criterion, are well suited for ML estimated Bayes classifiers. This is only valid for single VQ systems. In this paper we extend the theory to speech recognizers with mult ..."
Abstract - Cited by 3 (0 self)
of recognition performance. The joint multiple VQ training decorrelates the quantizer labels and improves system performance. In addition the new training criterion allows for a less careful way of splitting up the feature vector into multiple streams that do not have to be statistically independent

Analysis of Disturbed Acoustic Features

by Laurens van de Werff, Johan de Veth, Bert Cranen, Louis Boves , 2001
"... An analysis method was developed to study the impact of training-test mismatch due to the presence of additive noise. The contributions of individual observation vector components to the emission cost are determined in the matched and mismatched condition and histograms are computed for these contri ..."
Abstract
and how in certain cases this type of information may be helpful to increase recognition accuracy by applying acoustic backing-off to selected features only. Some limitations of the approach are also discussed.

PHONETIC FEATURES AND ACOUSTIC LANDMARKS

by Carol Espy-wilson, Amit Juneja
"... A probabilistic and statistical framework is presented for automatic speech recognition based on a phonetic feature representation of speech sounds. In this acoustic-phonetic approach, the speech recognition problem is hypothesized as a maximization of the joint posterior probability of a set of pho ..."
Abstract

Speech Emotion Recognition Combining Acoustic Features and Linguistic Information in a Hybrid Support Vector Machine - Belief Network Architecture

by Björn Schuller, Gerhard Rigoll, Manfred Lang - ICASSP , 2004
"... In this contribution we introduce a novel approach to the combination of acoustic features and language information for a most robust automatic recognition of a speaker’s emotion. Seven discrete emotional states are classified throughout the work. Firstly a model for the recognition of emotion by ac ..."
Abstract - Cited by 41 (10 self)