New Applications of the Sound Description Interchange Format
Proc. ICMC-98, Ann Arbor, 1998
Cited by 23 (2 self)
This paper describes the goals and design of SDIF and its standard frame types, followed by a review of recent SDIF work at CNMAT, IRCAM, and IUA.
Denoising by Sparse Approximation: Error Bounds Based on Rate-Distortion Theory
2006
Cited by 21 (5 self)
If a signal x is known to have a sparse representation with respect to a frame, it can be estimated from a noise-corrupted observation y by finding the best sparse approximation to y. Removing noise in this manner depends on the frame representing the signal efficiently while representing the noise inefficiently. The mean-squared error (MSE) of this denoising scheme and the probability that the estimate has the same sparsity pattern as the original signal are analyzed. First, an MSE bound is given that depends on a new bound on approximating a Gaussian signal as a linear combination of elements of an overcomplete dictionary. Further analyses are for dictionaries generated randomly according to a spherically symmetric distribution and for signals expressible with single dictionary elements. Easily computed approximations for the probability of selecting the correct dictionary element and for the MSE are given. Asymptotic expressions reveal a critical input signal-to-noise ratio for signal recovery.
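For the single-element case mentioned above, the best sparse approximation has a closed form when the dictionary elements have unit norm: pick the element with the largest |<y, d>| and scale it by <y, d>. The sketch below illustrates that rule; the function names, the small 5-element dictionary in R^4, and the noise values are hypothetical choices for illustration, not the paper's setup (which analyzes random dictionaries and general sparsity levels).

```python
import math

def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale v to unit Euclidean norm."""
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]

def best_one_term_approx(y, dictionary):
    """Best 1-term approximation of y over a dictionary of unit-norm elements.

    For unit-norm d, min_c ||y - c*d|| is attained at c = <y, d>, so the best
    element is the one with the largest |<y, d>|.
    Returns (index of the chosen element, estimate of the clean signal).
    """
    idx = max(range(len(dictionary)), key=lambda i: abs(dot(y, dictionary[i])))
    coef = dot(y, dictionary[idx])
    return idx, [coef * a for a in dictionary[idx]]

# Hypothetical overcomplete dictionary: 5 unit-norm elements spanning R^4.
D = [normalize(v) for v in ([1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 0, 0],
                            [0, 0, 1, 1], [0, 0, 1, 0])]
x = [2 * a for a in D[3]]                  # clean signal: a single dictionary element
noise = [0.05, -0.05, 0.02, -0.01]         # hypothetical noise realization
y = [a + b for a, b in zip(x, noise)]      # noisy observation

idx, x_hat = best_one_term_approx(y, D)
err_est = sum((a - b) ** 2 for a, b in zip(x_hat, x))   # squared error after denoising
err_obs = sum(b * b for b in noise)                      # squared error of y itself
```

Because the estimate is confined to the span of one dictionary element, only the noise component along that element survives, which is why the squared error drops well below that of the raw observation here.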
Matching Pursuit and Atomic Signal Models Based on Recursive Filter Banks
IEEE Transactions on Signal Processing, 1999
Cited by 20 (1 self)
The matching pursuit algorithm can be used to derive signal decompositions in terms of the elements of a dictionary of time-frequency atoms. Using a structured overcomplete dictionary yields a signal model that is both parametric and signal-adaptive. In this paper, we apply matching pursuit to the derivation of signal expansions based on damped sinusoids. It is shown that expansions in terms of complex damped sinusoids can be efficiently derived using simple recursive filter banks. We discuss a subspace extension of the pursuit algorithm which provides a framework for deriving real-valued expansions of real signals based on such complex atoms. Furthermore, we consider symmetric and asymmetric two-sided atoms constructed from underlying one-sided damped sinusoids. The primary concern is the application of this approach to the modeling of signals with transient behavior such as music; it is shown that time-frequency atoms based on damped sinusoids are more suitable for representing trans...
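Why recursive filters suffice can be seen with a one-sided damped complex sinusoid g_tau[n] = (lam*e^{j*omega})^(n - tau) for n >= tau. Its inner product with x obeys c[tau] = x[tau] + conj(z)*c[tau + 1], so the correlations for every time shift come out of a single first-order recursive filter run backward over the signal, one filter per (lam, omega) pair. The sketch below is an illustration under those assumptions; the backward-recursion form, the truncation at the frame end, and the function names are mine and may differ from the filter bank structure used in the paper.

```python
import cmath

def damped_correlations(x, lam, omega):
    """<x, g_tau> for every shift tau, via one backward first-order recursion.

    g_tau[n] = z**(n - tau) for n >= tau (zero before), z = lam * e^{j*omega}.
    Recursion: c[tau] = x[tau] + conj(z) * c[tau + 1], with c[N] = 0.
    """
    zc = (lam * cmath.exp(1j * omega)).conjugate()
    c = [0j] * len(x)
    acc = 0j
    for tau in range(len(x) - 1, -1, -1):
        acc = x[tau] + zc * acc
        c[tau] = acc
    return c

def direct_correlations(x, lam, omega):
    """The same inner products by explicit O(N^2) summation, for checking."""
    z = lam * cmath.exp(1j * omega)
    n_len = len(x)
    return [sum(x[n] * (z ** (n - tau)).conjugate() for n in range(tau, n_len))
            for tau in range(n_len)]

def atom_norms(n_len, lam):
    """Norms of the atoms truncated at the frame end; they shrink as tau grows."""
    return [((1 - lam ** (2 * (n_len - tau))) / (1 - lam ** 2)) ** 0.5
            for tau in range(n_len)]

# Example: the signal is itself the atom starting at sample 5.
lam, omega, n_len, start = 0.9, 0.3, 32, 5
z = lam * cmath.exp(1j * omega)
x = [0j] * start + [z ** k for k in range(n_len - start)]
c = damped_correlations(x, lam, omega)
norms = atom_norms(n_len, lam)
# Matching pursuit compares normalized correlations |<x, g>| / ||g||.
best_tau = max(range(n_len), key=lambda t: abs(c[t]) / norms[t])
```

The recursion costs O(N) per (lam, omega) pair instead of O(N^2) for direct summation, which is the efficiency argument the abstract alludes to.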
Advances in parametric audio coding
Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Mohonk, New Paltz, 1999
Cited by 19 (5 self)
For very low bit rate audio coding applications in mobile communications or on the internet, parametric audio coding has evolved as a technique complementing the more traditional approaches: transform codecs, originally designed to achieve CD-like quality, on the one hand, and specialized speech codecs on the other. Both of these techniques usually represent the audio signal waveform such that the decoder output signal approximates the encoder input signal, while taking perceptual criteria into account. In parametric audio coding, by contrast, the models of the signal source and of human perception are extended. The source model assumes that the audio signal is the sum of “components,” each of which can be approximated by a relatively simple signal model with a small number of parameters. The perception model assumes that the decoder output signal should sound as similar as possible to the encoder input signal, so approximating the waveform is no longer necessary. This approach can lead to a very efficient representation. However, a suitable set of models for signal components, a good decomposition, and good parameter estimation are all vital for achieving maximum audio quality. We give an overview of the current status of parametric audio coding developments, demonstrate the advantages and challenges of this approach, and indicate possible directions for further improvement.
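The source model described above can be pictured from the decoder side: a frame is rendered as a sum of simple components, each fixed by a few numbers. The sketch below uses constant-parameter sinusoids only; the (frequency, amplitude, phase) tuple layout, the parameter values, and the frame size are hypothetical, and real parametric coders also carry noise and transient components and interpolate parameters between frames.

```python
import math

def synthesize(components, n_samples, sample_rate):
    """Render one frame as a sum of sinusoidal components.

    components: list of (frequency_hz, amplitude, phase_rad) tuples, a
    hypothetical parameter layout chosen for illustration.
    """
    out = [0.0] * n_samples
    for freq, amp, phase in components:
        w = 2 * math.pi * freq / sample_rate     # radians per sample
        for n in range(n_samples):
            out[n] += amp * math.cos(w * n + phase)
    return out

# Hypothetical parameters: three partials of a 440 Hz tone.
components = [(440.0, 0.5, 0.0), (880.0, 0.25, 1.0), (1320.0, 0.1, -0.5)]
frame = synthesize(components, n_samples=512, sample_rate=16000)
n_params = 3 * len(components)   # 9 numbers describe 512 output samples
```

The decoder never needs the original waveform, only the parameters, which is where the efficiency of the representation comes from.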
Extending Spectral Modeling Synthesis with . . .
2000
Cited by 15 (1 self)
Sinusoidal modeling has enjoyed a rich history in both speech and music applications, including sound transformations, compression, denoising, and auditory scene analysis. For such applications, the underlying signal model must efficiently capture salient audio features (Goodwin 1998). In this article, we present an accurate, efficient, and flexible three-part model for audio signals consisting of sines, transients, and noise by extending spectral modeling synthesis (SMS) (Serra and Smith 1990) with an explicit flexible transient model called transient-modeling synthesis (TMS). The sinusoidal transformation system (STS) (McAulay and Quatieri 1986) and SMS find the slowly varying sinusoidal components in a signal using spectral-peak-picking algorithms. Subtracting the synthesized sinusoids from the original signal creates a residual consisting of transients and noise (Serra 1989; George and Smith 1992). However, sinusoids do not model this residual well. Although it is possible to model transients and noise with a sum of sinusoidal signals (as with the Fourier transform), doing so is neither efficient, because transient and noisy signals require many sinusoids for their description, nor meaningful, because transients are short-lived signals, while the sinusoidal model uses sinusoids that are active on a much larger time scale. In STS (generally applied to speech), the transient + noise residual is often masked sufficiently to be ignored (McAulay and Quatieri 1986). In music applications, this residual is often important to the integrity of the ...
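The peak-picking and subtraction steps described above can be illustrated on a single frame. This is a deliberately simplified sketch, not SMS itself: it assumes bin-centered sinusoids, uses an unwindowed pure-Python DFT, and skips the windowing, sub-bin peak interpolation, and frame-to-frame track continuation a real analyzer needs. The function names and the threshold value are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform of a real sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def pick_peaks(mag, threshold):
    """Local maxima above threshold in the positive-frequency half (DC excluded)."""
    half = len(mag) // 2
    return [k for k in range(1, half)
            if mag[k] > threshold and mag[k] >= mag[k - 1] and mag[k] > mag[k + 1]]

def sines_plus_residual(x, threshold):
    """Resynthesize the picked peaks as sinusoids and subtract them from x."""
    n = len(x)
    spec = dft(x)
    mag = [abs(s) for s in spec]
    peaks = pick_peaks(mag, threshold)
    sines = [0.0] * n
    for k in peaks:
        amp = 2 * mag[k] / n               # A*cos(...) puts N*A/2 into bin k
        phase = cmath.phase(spec[k])
        for t in range(n):
            sines[t] += amp * math.cos(2 * math.pi * k * t / n + phase)
    residual = [a - b for a, b in zip(x, sines)]
    return peaks, sines, residual

N = 64
x = [math.cos(2 * math.pi * 5 * t / N) + 0.5 * math.cos(2 * math.pi * 12 * t / N + 0.3)
     for t in range(N)]
x[20] += 0.8                               # a short transient (single-sample click)
peaks, sines, residual = sines_plus_residual(x, threshold=4.0)
```

The residual is dominated by the click at sample 20, illustrating the point made above: after the sinusoids are removed, the transient is left behind for a separate model such as TMS.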
Methods for separation of harmonic sound sources using sinusoidal modeling
Proc. AES 106th Convention, 1999
Cited by 13 (1 self)
Methods are proposed for separation of harmonic sound sources using sinusoidal modeling. A local nonlinear least-squares (NLS) frequency estimator is proposed to resolve sinusoids that are close in frequency. An iterative analysis scheme using interpolated parameter trajectories and subtraction of detected components is presented. A measure is proposed for testing the accuracy of the model.
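The benefit of an NLS criterion for close sinusoids can be shown with a brute-force stand-in: for each candidate frequency pair, fit amplitudes and phases jointly by linear least squares and keep the pair with the smallest residual. This is not the paper's local estimator (whose details are not given here); the grid search, the helper names, and the test frequencies are assumptions chosen for illustration. The two components below are about half a DFT bin apart, yet the joint fit separates them.

```python
import math

def solve(a, b):
    """Solve a small dense system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ls_residual(x, freqs):
    """Residual energy after a joint LS fit of sinusoids at fixed freqs (cycles/sample)."""
    n = len(x)
    cols = []
    for f in freqs:   # cos/sin pair per frequency covers amplitude and phase linearly
        cols.append([math.cos(2 * math.pi * f * t) for t in range(n)])
        cols.append([math.sin(2 * math.pi * f * t) for t in range(n)])
    gram = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(u * v for u, v in zip(ci, x)) for ci in cols]
    coef = solve(gram, rhs)
    return sum((x[t] - sum(c * col[t] for c, col in zip(coef, cols))) ** 2
               for t in range(n))

def nls_two_sinusoids(x, f_lo, f_hi, step):
    """Grid search over frequency pairs f1 < f2 minimizing the joint LS residual."""
    grid = [f_lo + i * step for i in range(int(round((f_hi - f_lo) / step)) + 1)]
    best = None
    for i, f1 in enumerate(grid):
        for f2 in grid[i + 1:]:
            r = ls_residual(x, (f1, f2))
            if best is None or r < best[0]:
                best = (r, f1, f2)
    return best[1], best[2]

# Two sinusoids 0.008 cycles/sample apart in 64 samples (DFT resolution is 1/64).
x = [math.cos(2 * math.pi * 0.100 * t) + 0.8 * math.cos(2 * math.pi * 0.108 * t + 0.5)
     for t in range(64)]
f1, f2 = nls_two_sinusoids(x, 0.090, 0.120, 0.002)
```

A practical estimator would refine frequencies locally (for example by Gauss-Newton steps from peak-picked starting points) rather than scanning a grid, which is presumably what "local" refers to in the abstract.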
Sound Source Separation in Monaural Music Signals
2006
Cited by 11 (3 self)
Sound source separation refers to the task of estimating the signals produced by individual sound sources from a complex acoustic mixture. It has several applications, since monophonic signals can be processed more efficiently and flexibly than polyphonic mixtures. This thesis deals with the separation of monaural, or one-channel, music recordings. We concentrate on separation methods in which the sources to be separated are not known beforehand. Instead, separation is enabled by utilizing the common properties of real-world sound sources: their continuity, sparseness, and repetition in time and frequency, and their harmonic spectral structures. One of the separation approaches taken here uses unsupervised learning, and the other uses model-based inference based on sinusoidal modeling. Most of the existing unsupervised separation algorithms are based on a linear instantaneous signal model, where each frame of the input mixture signal is ...
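A common concrete form of the linear instantaneous model is a nonnegative factorization of the magnitude spectrogram, X ~= B G, where each column of B is a basis spectrum and each row of G its time-varying gain. The sketch below uses plain Lee-Seung multiplicative updates for the squared Euclidean distance as a generic, swapped-in example; it omits the sparseness and temporal-continuity criteria the thesis mentions, and the toy spectra and gains are hypothetical.

```python
import random

def matmul(a, b):
    """Product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def frob_error(x, y):
    """Squared Frobenius distance between two equal-shaped matrices."""
    return sum((p - q) ** 2 for rx, ry in zip(x, y) for p, q in zip(rx, ry))

def nmf(x, n_components, n_iter=200, seed=0, eps=1e-12):
    """Factor nonnegative x (freq x time) as b @ g with multiplicative updates.

    g <- g * (b^T x) / (b^T b g);  b <- b * (x g^T) / (b g g^T)
    eps guards against division by zero.
    """
    rng = random.Random(seed)
    n_freq, n_time = len(x), len(x[0])
    b = [[rng.random() + eps for _ in range(n_components)] for _ in range(n_freq)]
    g = [[rng.random() + eps for _ in range(n_time)] for _ in range(n_components)]
    for _ in range(n_iter):
        bt = transpose(b)
        num, den = matmul(bt, x), matmul(matmul(bt, b), g)
        g = [[g[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_time)]
             for i in range(n_components)]
        gt = transpose(g)
        num, den = matmul(x, gt), matmul(b, matmul(g, gt))
        b = [[b[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_components)]
             for i in range(n_freq)]
    return b, g

# Hypothetical two-source mixture: fixed spectra s1, s2 with gains a1, a2 over 8 frames.
s1 = [1.0, 0.8, 0.1, 0.0, 0.0, 0.0]
s2 = [0.0, 0.1, 0.2, 1.0, 0.6, 0.3]
a1 = [1.0, 1.0, 0.0, 0.0, 1.0, 0.5, 0.0, 0.0]
a2 = [0.0, 0.5, 1.0, 1.0, 0.0, 0.0, 1.0, 0.2]
X = [[s1[f] * a1[t] + s2[f] * a2[t] for t in range(8)] for f in range(6)]
B, G = nmf(X, n_components=2)
```

Each recovered column of B and row of G can then be resynthesized as one separated "source"; with real recordings a component usually corresponds to a note or instrument only approximately.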
Speeding up HILN – MPEG-4 parametric audio encoding with reduced complexity
AES 109th Convention, 2000
Cited by 7 (3 self)
Parametric modelling permits an efficient representation of audio signals and is utilised for very low bit rate coding by the MPEG-4 standard. Here we look at the MPEG-4 parametric audio coding tools "Harmonic and Individual Lines plus Noise" (HILN), which are based on a decomposition of the audio signal into components that are described by appropriate source models and represented by model parameters. Until now, HILN encoding has mainly focused on maximum audio quality at the expense of high computational complexity. In this paper, different approaches to speeding up HILN encoding are presented, and the tradeoff between computational complexity and audio quality is analysed.
MIRAI: Multi-hierarchical, FS-tree based Music Information Retrieval System
(Invited Paper), Proceedings of RSEISP 2007, M. Kryszkiewicz et al. (Eds.), LNAI, 2007
Cited by 4 (4 self)
With the rapid growth of online music repositories, there is a need for content-based automatic indexing that helps users find their favorite music objects in real time. Recently, numerous successful approaches to musical feature extraction and selection have been proposed for instrument recognition in monophonic sounds. Unfortunately, none of these methods can be successfully applied to polyphonic sounds. Identifying musical instruments in polyphonic sounds is still difficult and challenging, especially when harmonic partials overlap. This has stimulated research on music sound separation and on new features for content-based automatic music information retrieval. Our goal is to build a cooperative query answering system (QAS) for a musical database that retrieves all objects satisfying queries like "find all musical pieces in pentatonic scale with a viola and piano where viola is playing for minimum 20 seconds and piano for minimum 10 seconds". We use a database of musical sounds containing almost 4000 sounds taken from MUMS (McGill University Master Samples) as a vehicle to construct several classifiers for automatic instrument recognition. The classifiers showing the best performance are adopted for automatic indexing of musical pieces by instrument. Our musical database is represented as an FS-tree (Frame Segment Tree). The cooperativeness of the QAS is driven by several hierarchical structures used for classifying musical instruments.
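Once classifiers have labeled which instruments play in which time segments, the example query above reduces to summing per-instrument playing time and filtering. The sketch below uses a flat list of (instrument, start, end) segments per piece as a hypothetical stand-in for the FS-tree index; the `Piece` type, field names, and sample data are illustrative assumptions, and the cooperative (query-relaxation) behavior of the QAS is not modeled.

```python
from dataclasses import dataclass, field

@dataclass
class Piece:
    title: str
    scale: str
    segments: list = field(default_factory=list)   # (instrument, start_s, end_s)

def playing_time(piece, instrument):
    """Seconds during which `instrument` is labeled as playing, merging overlaps."""
    spans = sorted((s, e) for inst, s, e in piece.segments if inst == instrument)
    total, cur_s, cur_e = 0.0, None, None
    for s, e in spans:
        if cur_e is None or s > cur_e:           # disjoint: close the current span
            if cur_e is not None:
                total += cur_e - cur_s
            cur_s, cur_e = s, e
        else:                                    # overlapping: extend it
            cur_e = max(cur_e, e)
    if cur_e is not None:
        total += cur_e - cur_s
    return total

def query(pieces, scale, min_seconds):
    """Titles in `scale` where every listed instrument plays at least its minimum."""
    return [p.title for p in pieces
            if p.scale == scale
            and all(playing_time(p, inst) >= secs for inst, secs in min_seconds.items())]

# Hypothetical indexed pieces.
pieces = [
    Piece("A", "pentatonic", [("viola", 0, 12), ("viola", 10, 25), ("piano", 5, 16)]),
    Piece("B", "pentatonic", [("viola", 0, 30), ("piano", 0, 8)]),
    Piece("C", "major", [("viola", 0, 30), ("piano", 0, 30)]),
]
hits = query(pieces, "pentatonic", {"viola": 20, "piano": 10})
```

Merging overlapping segments matters because classifiers working frame by frame can emit overlapping detections for the same instrument, which would otherwise be double-counted.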

