Results 1 - 10 of 9,020

The Aurora Experimental Framework for the Performance Evaluation of Speech Recognition Systems under Noisy Conditions

by David Pearce, Hans-Günter Hirsch (Ericsson Eurolab Deutschland GmbH) - in ISCA ITRW ASR2000, 2000
"... This paper describes a database designed to evaluate the performance of speech recognition algorithms in noisy conditions. The database may either be used to measure frontend feature extraction algorithms, using a defined HMM recognition back-end, or complete recognition systems. The source speech for this database is the TIdigits, consisting of a connected digits task spoken by American English talkers (downsampled to 8 kHz). A selection of 8 different real-world noises have been added to the speech over a range of signal-to-noise ratios with controlled filtering of the speech and noise. The framework ..."
Cited by 534 (6 self)

The information bottleneck method

by Naftali Tishby, Fernando C. Pereira, William Bialek, 1999
"... We define the relevant information in a signal x ∈ X as being the information that this signal provides about another signal y ∈ Y. Examples include the information that face images provide about the names of the people portrayed, or the information that speech sounds provide about the words spoken. ..."
Cited by 540 (35 self)

Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences

by Steven B. Davis, Paul Mermelstein - IEEE TRANSACTIONS ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, 1980
"... Several parametric representations of the acoustic signal were compared as to word recognition performance in a syllable-oriented continuous speech recognition system. The vocabulary included many phonetically similar monosyllabic words; therefore the emphasis was on ability to retain phonetically ..."
Cited by 1120 (2 self)

Coupled hidden Markov models for complex action recognition

by Matthew Brand, Nuria Oliver, Alex Pentland, 1996
"... We present algorithms for coupling and training hidden Markov models (HMMs) to model interacting processes, and demonstrate their superiority to conventional HMMs in a vision task classifying two-handed actions. HMMs are perhaps the most successful framework in perceptual computing for modeling and classifying dynamic behaviors, popular because they offer dynamic time warping, a training algorithm, and a clear Bayesian semantics. However, the Markovian framework makes strong restrictive assumptions about the system generating the signal: that it is a single process having a small number of states ..."
Cited by 501 (22 self)

Construction And Evaluation Of A Robust Multifeature Speech/music Discriminator

by Eric Scheirer, Malcolm Slaney, 1997
"... We report on the construction of a real-time computer system capable of distinguishing speech signals from music signals over a wide range of digital audio input. We have examined 13 features intended to measure conceptually distinct properties of speech and/or music signals, and combined them in se ..."
Cited by 354 (5 self)

Blind separation of speech mixtures via time-frequency masking

by Özgür Yılmaz, Scott Rickard - IEEE TRANSACTIONS ON SIGNAL PROCESSING (2002) SUBMITTED, 2004
"... Binary time-frequency masks are powerful tools for the separation of sources from a single mixture. Perfect demixing via binary time-frequency masks is possible provided the time-frequency representations of the sources do not overlap: a condition we call W-disjoint orthogonality. We introduce here the concept of approximate W-disjoint orthogonality and present experimental results demonstrating the level of approximate W-disjoint orthogonality of speech in mixtures of various orders. The results demonstrate that there exist ideal binary time-frequency masks that can separate several speech signals from ..."
Cited by 322 (5 self)

Monitoring and self-repair in speech

by Willem J. M. Levelt - Cognition, 1983
"... Making a self-repair in speech typically proceeds in three phases. The first phase involves the monitoring of one’s own speech and the interruption of the flow of speech when trouble is detected. From an analysis of 959 spontaneous self-repairs it appears that interrupting follows detection promptly ... on the nature of the speech trouble in a rather regular fashion: Speech errors induce other editing terms than words that are merely inappropriate, and trouble which is detected quickly by the speaker is preferably signalled by the use of ‘uh’. The third phase consists of making the repair proper ..."
Cited by 279 (11 self)

Correlates of Linguistic Rhythm in the Speech Signal

by Franck Ramus, Marina Nespor, Jacques Mehler, 1999
"... This paper presents instrumental measurements based on a consonant/vowel segmentation for eight languages. The measurements suggest that intuitive rhythm types reflect specific phonological properties, which in turn are signaled by the acoustic/phonetic properties of speech. The data support the not ..."
Cited by 211 (9 self)

Noise power spectral density estimation based on optimal smoothing and minimum statistics

by Rainer Martin - IEEE TRANS. SPEECH AND AUDIO PROCESSING, 2001
"... We describe a method to estimate the power spectral density of nonstationary noise when a noisy speech signal is given. The method can be combined with any speech enhancement algorithm which requires a noise power spectral density estimate. In contrast to other methods, our approach does not use a ..."
Cited by 276 (7 self)

Signal modeling techniques in speech recognition

by Joseph W. Picone - PROCEEDINGS OF THE IEEE, 1993
"... We have seen three important trends develop in the last five years in speech recognition. First, heterogeneous parameter sets that mix absolute spectral information with dynamic, or time-derivative, spectral information, have become common. Second, similarity transform techniques, often used to normalize and decorrelate parameters in some computationally inexpensive way, have become popular. Third, the signal parameter estimation problem has merged with the speech recognition process so that more sophisticated statistical models of the signal’s spectrum can be estimated in a closed-loop manner ..."
Cited by 181 (5 self)