Phoneme recognition using spectral envelope and modulation frequency features
IEEE ICASSP, 2009
Cited by 8 (6 self)
We present a new feature extraction technique for phoneme recognition that uses short-term spectral envelope and modulation frequency features. These features are derived from sub-band temporal envelopes of speech estimated using Frequency Domain Linear Prediction (FDLP). While the spectral envelope features are obtained by short-term integration of the sub-band envelopes, the modulation frequency components are derived from the long-term evolution of the sub-band envelopes. These features are combined at the phoneme posterior level and used as features for a hybrid HMM-ANN phoneme recognizer. For the phoneme recognition task on the TIMIT database, the proposed features show an improvement of 4.7% over the other feature extraction techniques.
Index Terms: spectral envelope and modulation frequency features, phoneme recognition, frequency domain linear prediction
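The short-term integration step mentioned in this abstract can be illustrated roughly as follows. This is a minimal sketch, not the authors' implementation: the function name `short_term_integrate` and the window and hop lengths are hypothetical, and the abstract does not state which window shape or sizes were used. A plain rectangular sum over sliding windows of one sub-band envelope is assumed.

```python
import numpy as np

def short_term_integrate(env, win, hop):
    """Sum a sub-band temporal envelope over sliding windows.

    env: 1-D envelope samples for one sub-band.
    win, hop: window length and shift in samples (hypothetical values;
    the abstract does not specify them).
    Returns one integrated energy value per window position.
    """
    env = np.asarray(env, dtype=float)
    starts = range(0, len(env) - win + 1, hop)
    return np.array([env[s:s + win].sum() for s in starts])

# Toy envelope: ten samples, windows of four samples shifted by two.
frames = short_term_integrate(np.arange(10), win=4, hop=2)
```

Applying the same function to every sub-band and stacking the results would give one spectral-envelope vector per frame. The modulation frequency features, which depend on the long-term evolution of the envelopes, are not covered by this sketch.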
Temporal Envelope Subtraction for Robust Speech Recognition Using Modulation Spectrum
IEEE ASRU, 2009
Cited by 1 (0 self)
In this paper, we present a new noise compensation technique for modulation frequency features derived from syllable-length segments of subband temporal envelopes. The subband temporal envelopes are estimated using frequency domain linear prediction (FDLP). We propose a technique for noise compensation in FDLP where an estimate of the noise envelope is subtracted from the noisy speech envelope. The noise-compensated FDLP envelopes are compressed with static (logarithmic) and dynamic (adaptive loops) compression and are transformed into modulation spectral features. Experiments are performed on a phoneme recognition task as well as a connected digit recognition task where the test data is corrupted with a variety of noise types at different signal-to-noise ratios. In these experiments with mismatched train and test conditions, the proposed features provide considerable improvements compared to other state-of-the-art noise-robust feature extraction techniques (average relative improvement of 25% and 35% over the baseline PLP features for phoneme and word recognition tasks, respectively).
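The subtraction and static compression steps described above might look like the sketch below. Several things here are assumptions rather than details from the abstract: the function names, the use of a relative floor to keep the result positive, and the idea that the noise envelope estimate is simply handed in by the caller (the abstract does not say how it is obtained). The dynamic adaptive-loop compression is not shown.

```python
import numpy as np

def subtract_noise_envelope(noisy_env, noise_env, floor=1e-3):
    """Subtract a noise-envelope estimate from a noisy sub-band envelope.

    floor is a hypothetical safeguard: the result is never allowed to drop
    below floor * noisy_env, so the logarithm below stays finite.
    """
    noisy_env = np.asarray(noisy_env, dtype=float)
    cleaned = noisy_env - noise_env
    return np.maximum(cleaned, floor * noisy_env)

def log_compress(env):
    """Static (logarithmic) compression of a strictly positive envelope."""
    return np.log(env)

# Toy example: a flat noise level of 1.0 under a short burst of speech energy.
noisy = np.array([1.0, 1.0, 5.0, 9.0, 1.0])
compensated = log_compress(subtract_noise_envelope(noisy, noise_env=1.0))
```

The floor plays the same role as the flooring used in classical spectral subtraction. Whether the paper uses a comparable safeguard is not stated in the abstract.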
Tandem Representations of Spectral Envelope and Modulation Frequency Features for ASR
Cited by 1 (1 self)
We present a feature extraction technique for automatic speech recognition that uses Tandem representation of short-term spectral envelope and modulation frequency features. These features, derived from sub-band temporal envelopes of speech estimated using frequency domain linear prediction, are combined at the phoneme posterior level. Tandem representations derived from these phoneme posteriors are used along with HMM-based ASR systems for both small and large vocabulary continuous speech recognition (LVCSR) tasks. For a small vocabulary continuous digit task on the OGI Digits database, the proposed features reduce the word error rate (WER) by 13% relative to other feature extraction techniques. We obtain a relative reduction of about 14% in WER for an LVCSR task using the NIST RT05 evaluation data. For phoneme recognition tasks on the TIMIT database, these features provide a relative improvement of 13% compared to other techniques.
Separating Speech from Speech Noise Annual Report 2006
2007
The main work at Columbia this year has been the development of algorithms for extracting and recognizing speech in nonstationary, noisy environments when only a single microphone channel is available. Our particular approach is based on using trained models to distinguish regions of time-frequency containing speech from nonspeech areas [2], and we have pursued this along several directions: One approach is to use trained models of the speech signal and to find the best set of model parameters that are consistent with the noisy speech observations. An alternative approach is to treat the labeling of each time-frequency cell as a simple classification task, and to train pattern recognition classifiers to perform this task. In that work, the challenge is to find the best classifier architecture and the most effective representation of the context. When more than one microphone channel is available, some different approaches to source separation become possible. The two-channel case is particularly interesting because this is the number of ears possessed by the typical listener. We have been looking at ways to separate sources in recordings made
FREQUENCY DOMAIN LINEAR PREDICTION
2008
Frequency Domain Linear Prediction (FDLP) uses auto-regressive models to represent Hilbert envelopes of relatively long segments of speech/audio signals. Although the basic FDLP audio codec achieves good quality of the reconstructed signal at high bit-rates, there is a need for scaling to lower bit-rates without degrading the reconstruction quality. Here, we present a method for improving the compression efficiency of the FDLP codec by the application of the Modified Discrete Cosine Transform (MDCT) for encoding the FDLP residual signals. In the subjective and objective quality evaluations, the proposed FDLP codec provides competent quality of the reconstructed signal compared to the state-of-the-art audio codecs for the 32-64 kbps range.
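As a rough illustration of the first sentence of this abstract, the sketch below fits an auto-regressive model to the DCT of a signal segment and reads the model's power response as an envelope over time. It is a simplified full-band version written for this listing, not the codec described here: the function names, the model order, the explicit cosine-matrix DCT, and the small diagonal loading term are all assumptions, and no sub-band decomposition or residual coding is included.

```python
import numpy as np

def dct_ii(x):
    """Orthonormal type-II DCT via an explicit cosine matrix (short segments only)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * (np.cos(np.pi * k * (2 * t + 1) / (2 * n)) @ x)
    c[0] /= np.sqrt(2.0)
    return c

def lpc_autocorr(seq, order, loading=1e-6):
    """Autocorrelation-method linear prediction; returns (a, gain) with a[0] = 1."""
    n = len(seq)
    r = np.array([np.dot(seq[:n - lag], seq[lag:]) for lag in range(order + 1)])
    r[0] *= 1.0 + loading  # assumed diagonal loading, for numerical stability only
    idx = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    a_tail = np.linalg.solve(r[idx], -r[1:])  # Toeplitz normal equations
    gain = r[0] + np.dot(a_tail, r[1:])       # prediction error power
    return np.concatenate(([1.0], a_tail)), gain

def fdlp_envelope(x, order=20):
    """AR-model approximation of the squared temporal envelope of x."""
    n = len(x)
    a, gain = lpc_autocorr(dct_ii(x), order)
    # The AR "frequency" axis of the DCT sequence maps onto time in the segment.
    w = np.pi * (np.arange(n) + 0.5) / n
    A = np.exp(-1j * np.outer(w, np.arange(order + 1))) @ a
    return gain / np.abs(A) ** 2

# Toy segment: a tone burst centred on sample 280 of 400.
t = np.arange(400)
burst = np.sin(2 * np.pi * 0.1 * t) * np.exp(-0.5 * ((t - 280) / 15.0) ** 2)
envelope = fdlp_envelope(burst)
```

Because linear prediction is applied to a frequency-domain sequence, the poles of the model track peaks in time rather than peaks in frequency, which is the duality the method's name refers to. A higher model order follows finer temporal detail.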
Error Resilient Speech Coding Using
Frequency Domain Linear Prediction (FDLP) represents a technique for auto-regressive modelling of Hilbert envelopes of a signal. In this paper, we propose a speech coding technique that uses FDLP in Quadrature Mirror Filter (QMF) sub-bands of short segments of the speech signal (25 ms). Line Spectral Frequency parameters related to autoregressive models and the spectral components of the residual signals are transmitted. For simulating the effects of lossy transmission channels, bit-packets are dropped randomly. In the objective and subjective quality evaluations, the proposed FDLP speech codec is judged to be more resilient to bit-packet losses compared to the state-of-the-art Adaptive Multi-Rate Wide-Band (AMR-WB) codec at 12 kbps.
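The lossy-channel simulation mentioned in this abstract could be approximated as below. This is only a sketch under stated assumptions: the function name `drop_packets`, the independent per-packet loss model, and the use of `None` to mark a lost packet are all hypothetical, since the abstract says only that bit-packets are dropped randomly.

```python
import random

def drop_packets(packets, loss_rate, seed=0):
    """Replace each packet with None independently with probability loss_rate.

    The independent loss model and the fixed seed are assumptions of this
    sketch; the abstract does not describe the loss pattern used.
    """
    rng = random.Random(seed)
    return [None if rng.random() < loss_rate else p for p in packets]

# Toy stream of ten encoded frames with a 20% loss rate.
received = drop_packets([bytes([i]) for i in range(10)], loss_rate=0.2)
lost = sum(p is None for p in received)
```

A decoder under test would then need some concealment strategy for the `None` entries. How the proposed codec handles them is not stated in the abstract.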
MODULATION FREQUENCY FEATURES
2009
We present a new feature extraction technique for phoneme recognition that uses short-term spectral envelope and modulation frequency features. These features are derived from sub-band temporal envelopes of speech estimated using Frequency Domain Linear Prediction (FDLP). While the spectral envelope features are obtained by short-term integration of the sub-band envelopes, the modulation frequency components are derived from the long-term evolution of the sub-band envelopes. These features are combined at the phoneme posterior level and used as features for a hybrid HMM-ANN phoneme recognizer. For the phoneme recognition task on the TIMIT database, the proposed features show an improvement of 4.7% over the other feature extraction techniques.
Index Terms: spectral envelope and modulation frequency features, phoneme recognition, frequency domain linear prediction
What is ‘nonspeech’?
2009
◮ according to research effort: a little music
◮ in the world: most everything
[Figure: sound types (speech, music, wind & water, animal sounds, contact/collision, machines & engines) arranged by information content (low to high) and origin (natural to man-made); attributes?]
E6820 (Ellis & Mandel), L6: Nonspeech and Music, February 26, 2009
Sound attributes
Attributes suggest model parameters
What do we notice about ‘general’ sound?
◮ psychophysics: pitch, loudness, ‘timbre’
◮ bright/dull; sharp/soft; grating/soothing
◮ sound is not ‘abstract’: tendency is to describe by source-events
Ecological perspective
◮ what matters about sound is ‘what happened’ → our percepts express this more-or-less directly
Wide-Band Audio Coding Based on Frequency-Domain Linear Prediction
2010
We revisit an original concept of speech coding in which the signal is separated into the carrier modulated by the signal envelope. A recently developed technique, called frequency-domain linear prediction (FDLP), is applied for the efficient estimation of the envelope. The processing in the temporal domain allows for a straightforward emulation of the forward temporal masking. This, combined with an efficient nonuniform sub-band decomposition and application of noise shaping in the spectral domain instead of the temporal domain (a technique to suppress artifacts in tonal audio signals), yields a codec that does not rely on the linear speech production model but rather uses the well-accepted concept of frequency-selective auditory perception. As such, the codec is not only specific for coding speech but also well suited for coding other important acoustic signals such as music and mixed content. The quality of the proposed codec at 66 kbps is evaluated using objective and subjective quality assessments. The evaluation indicates competitive performance with the MPEG codecs operating at similar bit rates.
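The envelope-and-carrier view of a signal that this abstract starts from can be sketched with the exact Hilbert envelope. Note the substitution: the paper estimates the envelope with FDLP, whereas this sketch computes the analytic signal directly with an FFT, and the function names are hypothetical. It shows only the decomposition itself, not the codec.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal of a real sequence: zero negative frequencies, double positive ones."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

def envelope_and_carrier(x):
    """Split x into a non-negative envelope and a unit-amplitude carrier."""
    z = analytic_signal(x)
    envelope = np.abs(z)
    carrier = np.cos(np.angle(z))
    return envelope, carrier

# Toy amplitude-modulated tone with whole numbers of cycles in the segment.
n = np.arange(512)
modulator = 1.0 + 0.5 * np.cos(2 * np.pi * 2 * n / 512)
signal = modulator * np.cos(2 * np.pi * 40 * n / 512)
envelope, carrier = envelope_and_carrier(signal)
```

The product `envelope * carrier` reproduces the input sample by sample, because it equals the real part of the analytic signal. A coder built on this idea can therefore spend bits separately on a slowly varying envelope and on the carrier.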

