Robust automatic speech recognition with missing and unreliable acoustic data
- Speech Communication
, 2001
Prediction-Driven Computational Auditory Scene Analysis for Dense Sound Mixtures
, 1996
Cited by 124 (8 self)
We interpret the sound reaching our ears as the combined effect of independent, sound-producing entities in the external world; hearing would have limited usefulness if it were defeated by overlapping sounds. Computer systems that are to interpret real-world sounds for speech recognition or for multimedia indexing must similarly interpret complex mixtures. However, existing functional models of audition employ only data-driven processing incapable of making context-dependent inferences in the face of interference. We propose a prediction-driven approach to this problem, raising numerous issues including the need to represent any kind of sound, and to handle multiple competing hypotheses. Results from an implementation of this approach illustrate its ability to analyze complex, ambient sound scenes that would confound previous systems.
MARSYAS: A framework for audio analysis
, 2000
Cited by 89 (16 self)
Existing audio tools handle the increasing amount of computer audio data inadequately. The typical tape-recorder paradigm for audio interfaces is inflexible and time consuming, especially for large data sets. On the other hand, completely automatic audio analysis and annotation is impossible using current techniques.
One Microphone Source Separation
- In Advances in Neural Information Processing Systems 13
, 2000
Cited by 77 (1 self)
Source separation, or computational auditory scene analysis, attempts to extract individual acoustic objects from input which contains a mixture of sounds from different sources, altered by the acoustic environment. Unmixing algorithms such as ICA and its extensions recover sources by reweighting multiple observation sequences, and thus cannot operate when only a single observation signal is available. I present a technique called refiltering which recovers sources by a nonstationary reweighting ("masking") of frequency sub-bands from a single recording, and argue for the application of statistical algorithms to learning this masking function. I present results of a simple factorial HMM system which learns on recordings of single speakers and can then separate mixtures using only one observation signal by computing the masking function and then refiltering.
Perceptual Coding of Digital Audio
- Proceedings of the IEEE
, 2000
Cited by 76 (0 self)
During the last decade, CD-quality digital audio has essentially replaced analog audio. Emerging digital audio applications for network, wireless, and multimedia computing systems face a series of constraints such as reduced channel bandwidth, limited storage capacity, and low cost. These new applications have created a demand for high-quality digital audio delivery at low bit rates. In response to this need, considerable research has been devoted to the development of algorithms for perceptually transparent coding of high-fidelity (CD-quality) digital audio. As a result, many algorithms have been proposed, and several have now become international and/or commercial product standards. This paper reviews algorithms for perceptually transparent coding of CD-quality digital audio, including both research and standardization activities. The paper is organized as follows. First, psychoacoustic principles are described with the MPEG psychoacoustic signal analysis model 1 discussed in some detail. Next, filter bank design issues and algorithms are addressed, with a particular emphasis placed on the Modified Discrete Cosine Transform (MDCT), a perfect reconstruction (PR) cosine-modulated filter bank that has become of central importance in perceptual audio coding. Then, we review methodologies that achieve perceptually transparent coding of FM- and CD-quality audio signals, including algorithms that manipulate transform components, subband signal decompositions, sinusoidal signal components, and linear prediction (LP) parameters, as well as hybrid algorithms that make use of more than one signal model. These discussions concentrate on architectures and applications of
Separation of Speech from Interfering Sounds Based on Oscillatory Correlation
- IEEE TRANSACTIONS ON NEURAL NETWORKS
, 1999
Cited by 67 (22 self)
A multistage neural model is proposed for an auditory scene analysis task---segregating speech from interfering sound sources. The core of the model is a two-layer oscillator network that performs stream segregation on the basis of oscillatory correlation. In the oscillatory correlation framework, a stream is represented by a population of synchronized relaxation oscillators, each of which corresponds to an auditory feature, and different streams are represented by desynchronized oscillator populations. Lateral connections between oscillators encode harmonicity, and proximity in frequency and time. Prior to the oscillator network are a model of the auditory periphery and a stage in which mid-level auditory representations are formed. The model has been systematically evaluated using a corpus of voiced speech mixed with interfering sounds, and produces improvements in terms of signal-to-noise ratio for every mixture. The performance of our model is compared with other studies on computa...
Learning at a distance I. Statistical learning of non-adjacent dependencies
- COGNITIVE PSYCHOLOGY
, 2004
Three-dimensional virtual acoustic displays
, 1991
Cited by 58 (0 self)
The development of an alternative medium for displaying information in complex human-machine interfaces is described. The three-dimensional virtual acoustic display is a means for accurately transferring information to a human operator using the auditory modality; it combines directional and semantic characteristics to form naturalistic representations of dynamic objects and events in remotely-sensed or simulated environments. Although the technology can stand alone, it is envisioned as a component of a larger multisensory environment and will no doubt find its greatest utility in that context. The general philosophy in the design of the display has been that the development of advanced computer interfaces should be driven first by an understanding of human perceptual requirements, and later by technological capabilities or constraints. In expanding on this view, the paper addresses current and potential uses of virtual acoustic displays, characterizes such displays, reviews recent approaches to their implementation and application, describes the research project at NASA Ames in some detail, and finally outlines some critical research issues for the future.
Monaural speech segregation based on pitch tracking and amplitude modulation
- IEEE Trans. Neural Networks
, 2004
Cited by 55 (23 self)
Speech segregation is an important task of auditory scene analysis (ASA), in which the speech of a certain speaker is separated from other interfering signals. Wang and Brown proposed a multistage neural model for speech segregation, the core of which is a two-layer oscillator network. In this paper, we extend their model by adding further processes based on psychoacoustic evidence to improve the performance. These processes include pitch tracking and grouping based on amplitude modulation (AM). Our model is systematically evaluated and compared with the Wang-Brown model, and it yields significantly better performance.
Application of Bayesian Probability Network to Music Scene Analysis
, 1998
Cited by 49 (0 self)
We propose a process model for hierarchical perceptual sound organization, which recognizes perceptual sounds included in incoming sound signals. We consider perceptual sound organization as a scene analysis problem in the auditory domain. Our current application is a music scene analysis system, which recognizes rhythm, chords, and source-separated musical notes included in incoming music signals. Our process model consists of multiple processing modules and a probability network for information integration. The structure of our model is conceptually based on the blackboard architecture. However, employment of a Bayesian probability network has facilitated integration of multiple sources of information provided by autonomous modules without global control knowledge.
1 Introduction
We humans recognize or understand existence, localization and movements of external entities through five senses. We call this function "scene analysis". Scene analysis is viewed here as an information pr...

