Results 1 - 10 of 9,020
The Aurora Experimental Framework for the Performance Evaluation of Speech Recognition Systems under Noisy Conditions
- in ISCA ITRW ASR2000, 2000
"... This paper describes a database designed to evaluate the performance of speech recognition algorithms in noisy conditions. The database may be used either to measure front-end feature extraction algorithms, using a defined HMM recognition back-end, or to evaluate complete recognition systems. The source speech for this database is TIdigits, consisting of a connected-digits task spoken by American English talkers (downsampled to 8 kHz). A selection of 8 different real-world noises have been added to the speech over a range of signal-to-noise ratios with controlled filtering of the speech and noise. The framework ..."
Cited by 534 (6 self)
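The framework's central operation, adding a recorded noise to clean speech at a chosen signal-to-noise ratio, can be sketched as below. This is a minimal illustration under stated assumptions, not the Aurora tooling: SNR here is a plain mean-square energy ratio over the whole signal, the 8 kHz rate is taken from the abstract, and the "controlled filtering of the speech and noise" is omitted. The function names `add_noise_at_snr` and `measured_snr_db` are hypothetical.

```python
import numpy as np

def add_noise_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Return speech + g * noise, with g chosen so the energy ratio equals snr_db.

    Assumption: SNR = 10 * log10(mean(speech**2) / mean((g * noise)**2)) over the
    whole signal; no filtering or speech-activity weighting is applied.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Target noise power is p_speech / 10**(snr_db / 10); g scales the amplitude.
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def measured_snr_db(speech: np.ndarray, noisy: np.ndarray) -> float:
    """SNR of a noisy signal relative to its known clean component."""
    residual = noisy - speech
    return float(10.0 * np.log10(np.mean(speech ** 2) / np.mean(residual ** 2)))


# Usage: a 1 s, 8 kHz tone standing in for speech, plus white noise at 10 dB SNR.
fs = 8000
t = np.arange(fs) / fs
speech = 0.5 * np.sin(2 * np.pi * 440 * t)
noise = np.random.default_rng(0).standard_normal(fs)
noisy = add_noise_at_snr(speech, noise, snr_db=10.0)
print(round(measured_snr_db(speech, noisy), 6))
```

Repeating the call over a list of SNR values with the same noise recording yields a graded test set of the kind the abstract describes.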
The information bottleneck method
- 1999
"... We define the relevant information in a signal x ∈ X as being the information that this signal provides about another signal y ∈ Y. Examples include the information that face images provide about the names of the people portrayed, or the information that speech sounds provide about the words spoken. ..."
Cited by 540 (35 self)
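The abstract's notion of relevant information, the information a signal x provides about another signal y, is the mutual information I(X;Y). For discrete variables with a known joint distribution it can be computed directly, as in the sketch below (NumPy, result in bits; the function name is hypothetical). The bottleneck method itself goes further, compressing X into a representation T while trading off I(X;T) against I(T;Y); that optimization is not shown here.

```python
import numpy as np

def mutual_information(p_xy) -> float:
    """I(X;Y) in bits for a discrete joint distribution p(x, y) given as a 2-D array."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_xy = p_xy / p_xy.sum()                 # normalize in case counts were passed
    p_x = p_xy.sum(axis=1, keepdims=True)    # marginal p(x), shape (|X|, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)    # marginal p(y), shape (1, |Y|)
    nz = p_xy > 0                            # terms with p(x, y) = 0 contribute 0
    ratio = p_xy[nz] / (p_x @ p_y)[nz]       # p(x, y) / (p(x) p(y))
    return float(np.sum(p_xy[nz] * np.log2(ratio)))


# Usage: Y is a noiseless copy of a fair binary X, versus X and Y independent.
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))
print(mutual_information(np.outer([0.5, 0.5], [0.3, 0.7])))
```

The first case carries one full bit of relevant information about Y; the independent case carries none.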
Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
- IEEE Transactions on Acoustics, Speech, and Signal Processing, 1980
"... Several parametric representations of the acoustic signal were compared as to word recognition performance in a syllable-oriented continuous speech recognition system. The vocabulary included many phonetically similar monosyllabic words; therefore, the emphasis was on the ability to retain phonetically ..."
Cited by 1120 (2 self)
Coupled hidden Markov models for complex action recognition
- 1996
"... We present algorithms for coupling and training hidden Markov models (HMMs) to model interacting processes, and demonstrate their superiority to conventional HMMs in a vision task classifying two-handed actions. HMMs are perhaps the most successful framework in perceptual computing for modeling and classifying dynamic behaviors, popular because they offer dynamic time warping, a training algorithm, and a clear Bayesian semantics. However, the Markovian framework makes strong restrictive assumptions about the system generating the signal: that it is a single process having a small number of states ..."
Cited by 501 (22 self)
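For contrast with the coupled models, the conventional HMM the abstract describes has a single hidden state chain, and the probability of an observation sequence is computed with the forward algorithm. The sketch below is a standard scaled forward pass for a discrete-output HMM (NumPy; the function name and the toy parameters are hypothetical). It is not the paper's coupled-HMM algorithm, which links several state chains for interacting processes.

```python
import numpy as np

def forward_log_likelihood(pi, A, B, obs) -> float:
    """log P(obs) for a single-chain HMM with discrete outputs (scaled forward pass).

    pi:  (N,)   initial state probabilities
    A:   (N, N) transitions, A[i, j] = P(state j at t | state i at t-1)
    B:   (N, M) emissions,   B[j, k] = P(symbol k | state j)
    obs: non-empty sequence of symbol indices in [0, M)
    """
    pi, A, B = (np.asarray(x, dtype=float) for x in (pi, A, B))
    alpha = pi * B[:, obs[0]]              # joint prob. of first symbol and each state
    log_lik = 0.0
    for symbol in obs[1:]:
        scale = alpha.sum()
        log_lik += np.log(scale)           # accumulate log of the normalizers
        alpha = ((alpha / scale) @ A) * B[:, symbol]
    return float(log_lik + np.log(alpha.sum()))


# Usage: two hidden states, two output symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
print(np.exp(forward_log_likelihood(pi, A, B, [0, 1])))
```

Every quantity is indexed by one state variable per time step, which is the "single process" assumption the abstract calls restrictive.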
Construction and Evaluation of a Robust Multifeature Speech/Music Discriminator
- 1997
"... We report on the construction of a real-time computer system capable of distinguishing speech signals from music signals over a wide range of digital audio input. We have examined 13 features intended to measure conceptually distinct properties of speech and/or music signals, and combined them in se ..."
Cited by 354 (5 self)
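The excerpt does not list the 13 features. As an illustration only, the sketch below computes two frame-level measures often used for speech/music discrimination, zero-crossing rate and spectral centroid; whether and how they appear among the paper's features is not established by the text above. The Hann window and the function names are assumptions.

```python
import numpy as np

def zero_crossing_rate(frame: np.ndarray) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    negative = np.signbit(frame)
    return float(np.mean(negative[1:] != negative[:-1]))


def spectral_centroid(frame: np.ndarray, sample_rate: float) -> float:
    """Magnitude-weighted mean frequency (Hz) of a Hann-windowed frame."""
    mags = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    total = mags.sum()
    return float(np.dot(freqs, mags) / total) if total > 0 else 0.0


# Usage: a 1 kHz tone at 16 kHz crosses zero about 2000 times per second,
# i.e. about 0.125 crossings per sample, and its centroid sits near 1 kHz.
fs = 16000
n = np.arange(1024)
tone = np.sin(2 * np.pi * 1000 * n / fs + 0.1)
print(zero_crossing_rate(tone), spectral_centroid(tone, fs))
```

In a discriminator, per-frame values like these would typically be pooled over a longer window before classification; that pooling is not shown.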
Blind separation of speech mixtures via time-frequency masking
- IEEE Transactions on Signal Processing (2002, submitted), 2004
"... Binary time-frequency masks are powerful tools for the separation of sources from a single mixture. Perfect demixing via binary time-frequency masks is possible provided the time-frequency representations of the sources do not overlap: a condition we call W-disjoint orthogonality. We introduce here the concept of approximate W-disjoint orthogonality and present experimental results demonstrating the level of approximate W-disjoint orthogonality of speech in mixtures of various orders. The results demonstrate that there exist ideal binary time-frequency masks that can separate several speech signals from ..."
Cited by 322 (5 self)
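The W-disjoint orthogonality condition and an ideal binary mask can be illustrated directly on time-frequency arrays. In the sketch below (NumPy), the 64 × 32 "spectrograms" are synthetic complex arrays standing in for STFTs of two sources, and the masks are ideal in the sense that they are computed from the sources themselves; estimating masks blindly from mixtures is not shown. Function names are hypothetical.

```python
import numpy as np

def ideal_binary_mask(s1_tf: np.ndarray, s2_tf: np.ndarray) -> np.ndarray:
    """1.0 where source 1 has the larger magnitude, else 0.0."""
    return (np.abs(s1_tf) > np.abs(s2_tf)).astype(float)


def w_disjoint(s1_tf: np.ndarray, s2_tf: np.ndarray) -> bool:
    """True if the sources never occupy the same time-frequency point (S1 * S2 == 0)."""
    return bool(np.all(s1_tf * s2_tf == 0))


# Synthetic example: two sources on complementary time-frequency supports.
rng = np.random.default_rng(1)
shape = (64, 32)  # (frequency bins, frames), arbitrary
support = rng.random(shape) < 0.5
s1 = np.where(support, rng.standard_normal(shape) + 1j * rng.standard_normal(shape), 0)
s2 = np.where(~support, rng.standard_normal(shape) + 1j * rng.standard_normal(shape), 0)
mixture = s1 + s2

# Masking the mixture recovers source 1 exactly because the supports do not overlap.
estimate = ideal_binary_mask(s1, s2) * mixture
print(w_disjoint(s1, s2), np.allclose(estimate, s1))
```

When the supports overlap, the same mask only approximates each source, which is why the paper measures how nearly W-disjoint real speech mixtures are.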
Monitoring and self-repair in speech
- Cognition, 1983
"... Making a self-repair in speech typically proceeds in three phases. The first phase involves the monitoring of one’s own speech and the interruption of the flow of speech when trouble is detected. From an analysis of 959 spontaneous self-repairs it appears that interrupting follows detection promptly ... on the nature of the speech trouble in a rather regular fashion: Speech errors induce other editing terms than words that are merely inappropriate, and trouble which is detected quickly by the speaker is preferably signalled by the use of ‘uh’. The third phase consists of making the repair proper ..."
Cited by 279 (11 self)
Correlates of Linguistic Rhythm in the Speech Signal
- 1999
"... This paper presents instrumental measurements based on a consonant/vowel segmentation for eight languages. The measurements suggest that intuitive rhythm types reflect specific phonological properties, which in turn are signaled by the acoustic/phonetic properties of speech. The data support the not ..."
Cited by 211 (9 self)
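This paper is commonly associated with three measures computed from such a consonant/vowel segmentation: %V, the proportion of total duration that is vocalic; ΔV, the standard deviation of vocalic interval durations; and ΔC, the standard deviation of consonantal interval durations. A minimal sketch follows, assuming the segmentation is given as labelled durations. Merging adjacent segments with the same label into one interval, using the population standard deviation, and the function name are all assumptions of this sketch.

```python
from statistics import pstdev

def rhythm_measures(segments):
    """Return %V, deltaV and deltaC from (label, duration) pairs, label 'C' or 'V'."""
    # Merge runs of the same label, e.g. a consonant cluster, into one interval.
    intervals = []
    for label, duration in segments:
        if label not in ("C", "V"):
            raise ValueError(f"unexpected label {label!r}")
        if intervals and intervals[-1][0] == label:
            intervals[-1] = (label, intervals[-1][1] + duration)
        else:
            intervals.append((label, duration))

    vocalic = [d for lab, d in intervals if lab == "V"]
    consonantal = [d for lab, d in intervals if lab == "C"]
    if not vocalic or not consonantal:
        raise ValueError("need at least one vocalic and one consonantal interval")

    total = sum(vocalic) + sum(consonantal)
    return {
        "percent_v": 100.0 * sum(vocalic) / total,
        "delta_v": pstdev(vocalic),      # population SD of vocalic intervals
        "delta_c": pstdev(consonantal),  # population SD of consonantal intervals
    }


# Usage: durations in seconds; the two leading consonants form one cluster.
result = rhythm_measures([("C", 0.05), ("C", 0.05), ("V", 0.2), ("C", 0.3), ("V", 0.2)])
print(result)
```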
Noise power spectral density estimation based on optimal smoothing and minimum statistics
- IEEE Trans. Speech and Audio Processing, 2001
"... We describe a method to estimate the power spectral density of nonstationary noise when a noisy speech signal is given. The method can be combined with any speech enhancement algorithm which requires a noise power spectral density estimate. In contrast to other methods, our approach does not use a ..."
Cited by 276 (7 self)
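The core idea named in the title, tracking the minimum of a smoothed noisy-speech power spectrum over a sliding window, can be sketched as follows. This is a deliberately simplified version: the paper's optimal smoothing and its bias compensation are replaced by a fixed smoothing constant `alpha` and a fixed factor `bias`, and all default values (including the window length) are hypothetical. It illustrates minimum tracking only and should not be read as the published algorithm.

```python
import numpy as np

def min_stats_noise_psd(power: np.ndarray, alpha: float = 0.85,
                        window: int = 96, bias: float = 1.5) -> np.ndarray:
    """Per-bin noise PSD estimate from a (frames, bins) array of |Y|^2 values.

    Simplified minimum statistics: fixed first-order smoothing, then the minimum
    of the smoothed power over the last `window` frames, scaled by `bias`
    because a minimum underestimates the mean. All defaults are hypothetical.
    """
    power = np.asarray(power, dtype=float)
    smoothed = np.empty_like(power)
    estimate = np.empty_like(power)
    p = power[0].copy()
    for t in range(power.shape[0]):
        p = alpha * p + (1.0 - alpha) * power[t]   # recursive smoothing per bin
        smoothed[t] = p
        start = max(0, t - window + 1)
        estimate[t] = bias * smoothed[start:t + 1].min(axis=0)
    return estimate


# Usage: unit-power noise floor in 3 bins with a strong "speech" burst in frames 20-29.
frames = np.ones((60, 3))
frames[20:30] = 100.0
est = min_stats_noise_psd(frames, bias=1.0)
print(est.max())
```

The burst never raises the estimate because the window still contains noise-only frames. The same window length also sets how long the estimate lags behind a genuinely rising noise floor.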
Signal modeling techniques in speech recognition
- Proceedings of the IEEE, 1993
"... We have seen three important trends develop in the last five years in speech recognition. First, heterogeneous parameter sets that mix absolute spectral information with dynamic, or time-derivative, spectral information have become common. Second, similarity transform techniques, often used to normalize and decorrelate parameters in some computationally inexpensive way, have become popular. Third, the signal parameter estimation problem has merged with the speech recognition process so that more sophisticated statistical models of the signal’s spectrum can be estimated in a closed-loop manner ..."
Cited by 181 (5 self)
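The first trend, appending time-derivative ("delta") features to absolute spectral features, is often computed with a regression over neighbouring frames, d_t = Σ_{k=1}^{n} k (c_{t+k} − c_{t−k}) / (2 Σ_{k=1}^{n} k²). The sketch below uses that formula with replicated edge frames; the choice of formula, the half-width `n`, and the function names are assumptions, since the abstract does not say which derivative estimate is meant.

```python
import numpy as np

def delta(features: np.ndarray, n: int = 2) -> np.ndarray:
    """Regression-based time derivative of a (frames, dims) feature matrix."""
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]
    # Replicate first/last frames so every frame has n neighbours on each side.
    padded = np.pad(features, ((n, n), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, n + 1))
    out = np.zeros_like(features)
    for k in range(1, n + 1):
        ahead = padded[n + k : n + k + num_frames]    # c[t + k]
        behind = padded[n - k : n - k + num_frames]   # c[t - k]
        out += k * (ahead - behind)
    return out / denom


def with_deltas(features: np.ndarray, n: int = 2) -> np.ndarray:
    """Heterogeneous parameter set: absolute features followed by their deltas."""
    features = np.asarray(features, dtype=float)
    return np.hstack([features, delta(features, n)])


# Usage: a feature that rises by 3 per frame has delta 3 away from the edges.
ramp = 3.0 * np.arange(10, dtype=float).reshape(-1, 1)
print(with_deltas(ramp)[2:8, 1])
```

Applying `delta` to its own output gives second-order ("acceleration") terms in the same way.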