HILN - The MPEG-4 Parametric Audio Coding Tools
in Proc. of IEEE Int. Symposium on Circuits and Systems, 2000
Cited by 16 (3 self)
The MPEG-4 Audio Standard combines tools for efficient and flexible coding of audio. For very low bitrate applications, tools based on a parametric signal representation are utilised. The parametric speech coding tools (HVXC) are already available in Version 1 of MPEG-4. The main focus of this paper is on the parametric audio coding tools "Harmonic and Individual Lines plus Noise" (HILN) which are included in Version 2 of MPEG-4. As already indicated by their name, the HILN tools are based on the decomposition of the audio signal into components which are described by appropriate source models and represented by model parameters. This paper gives an overview of the HILN tools, presents the recent advances in signal modelling and parameter coding, and concludes with an evaluation of the subjective audio quality.
A prototype system for object coding of musical audio
in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
Cited by 8 (5 self)
This article deals with low bitrate object coding of musical audio, and more precisely with the extraction of pitched sound objects in polyphonic music. After a brief review of existing methods, we discuss the potential benefits of recasting this problem in a Bayesian framework. We define pitched objects by a set of probabilistic priors and derive efficient algorithms to infer active objects and their parameters. Preliminary experiments suggest that the proposed method results in a better sound quality than simple sinusoidal coding while achieving a lower bitrate.
Audio modeling based on delayed sinusoids
IEEE Trans. Speech Audio Process., 2004
Cited by 7 (5 self)
In this work, we present an evolution of the DDS (Damped & Delayed Sinusoidal) model introduced within the framework of general signal modeling. This model is named the Partial Damped & Delayed Sinusoidal (PDDS) model and takes into account a single time delay parameter for a set (sum) of damped sinusoids. This modification is more consistent with the transient audio modeling problem. We show the validity of this approach by comparison with the well-known EDS (Exponentially Damped Sinusoids) approach. Finally, the performance of three high-resolution model parameter estimation algorithms is compared on synthetic fast time-varying signals and on two typical audio transients.
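As a rough illustration of the kind of signal this abstract describes, the sketch below synthesizes a sum of exponentially damped sinusoids that all start at one shared delay. The function name `pdds_like`, the tuple layout of `partials`, and the real-valued cosine form are assumptions made for this example; they are not taken from the paper.

```python
import math

def pdds_like(n_samples, delay, partials):
    """Sum of damped sinusoids that share a single onset delay.

    partials: list of (amplitude, damping, omega, phase) tuples
              (hypothetical layout, chosen for this sketch).
    """
    out = [0.0] * n_samples
    # Samples before the shared delay stay at zero.
    for n in range(delay, n_samples):
        t = n - delay  # time measured from the common onset
        out[n] = sum(a * math.exp(-d * t) * math.cos(w * t + p)
                     for a, d, w, p in partials)
    return out

# One partial with zero frequency, so only the decaying envelope remains.
signal = pdds_like(8, delay=3, partials=[(1.0, 0.5, 0.0, 0.0)])
```

Setting `delay=0` reduces the sketch to a plain sum of exponentially damped sinusoids, which is the EDS case the abstract uses as its comparison point.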
Speeding up HILN – MPEG-4 parametric audio encoding with reduced complexity
in AES 109th Convention, 2000
Cited by 7 (3 self)
Parametric modelling permits an efficient representation of audio signals and is utilised for very low bit rate coding by the MPEG-4 Standard. Here we look at the MPEG-4 parametric audio coding tools "Harmonic and Individual Lines plus Noise" (HILN) which are based on a decomposition of the audio signal into components that are described by appropriate source models and represented by model parameters. Until now, HILN encoding mainly focused on maximum audio quality at the expense of high computational complexity. In this paper, different approaches to speed up HILN encoding are presented and the tradeoff between computational complexity and audio quality is analysed.
Low Bitrate Object Coding of Musical Audio Using Bayesian Harmonic Models
2006
Cited by 7 (2 self)
This article deals with the decomposition of music signals into pitched sound objects made of harmonic sinusoidal partials for very low bitrate coding purposes. After a brief review of existing methods, we recast this problem in the Bayesian framework. We propose a family of probabilistic signal models combining learnt object priors and various perceptually motivated distortion measures. We design efficient algorithms to infer object parameters and build a coder based on the interpolation of frequency and amplitude parameters. Listening tests suggest that the loudness-based distortion measure outperforms other distortion measures and that our coder results in a better sound quality than baseline transform and parametric coders at 8 kbit/s and 2 kbit/s. This work constitutes a new step towards a fully object-based coding system, which would represent audio signals as collections of meaningful note-like sound objects.
Prioritizing signals for selective real-time audio processing
in International Conference on Auditory Display (ICAD’05), 2005
Cited by 5 (1 self)
This paper studies various priority metrics that can be used to progressively select sub-parts of a number of audio signals for real-time processing. In particular, five level-related metrics were examined: RMS level, A-weighted level, Zwicker and Moore loudness models and a masking threshold-based model. We conducted a pilot subjective evaluation study to determine which metric performs best at reconstructing mixtures of various types (speech, ambient and music) using only a budgeted amount of the original audio data. Our results suggest that A-weighting performs the worst while results obtained with loudness metrics appear to depend on the type of signals. RMS level offers a good compromise for all cases. Our results also show that significant sub-parts of the original audio data can be omitted in most cases, without noticeable degradation in the generated mixtures, which validates the usability of our selective processing approach for real-time applications. In this context, we successfully implemented a prototype 3D audio rendering pipeline using our selective approach.
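To make the idea of budgeted selection by a level metric concrete, here is a minimal sketch that ranks frames by RMS level and keeps only the loudest ones. The helper names `rms` and `select_by_rms`, the frame layout, and the fixed frame-count budget are assumptions for illustration; the paper's actual selection scheme is not described in this abstract. Frames are assumed to be non-empty.

```python
import math

def rms(frame):
    """Root-mean-square level of one non-empty frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def select_by_rms(frames, budget):
    """Return indices of the `budget` frames with the highest RMS level,
    listed in their original order."""
    ranked = sorted(range(len(frames)),
                    key=lambda i: rms(frames[i]),
                    reverse=True)
    return sorted(ranked[:budget])

# Four short frames; only two may be processed.
frames = [[0.1, -0.1], [0.9, -0.8], [0.0, 0.0], [0.5, 0.5]]
kept = select_by_rms(frames, budget=2)
```

Swapping `rms` for another scoring function, such as an A-weighted level or a loudness model, would give the other metrics the abstract lists, with the selection step unchanged.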
A 6kbps to 85kbps scalable audio coder
in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 2000
Cited by 4 (0 self)
Scalable audio coding is important in network environments, such as the Internet, where bandwidth is not guaranteed, packet loss is common, and client connection data rates are heterogeneous. Signal models provide a general framework for attacking a wide range of challenges in the unicast delivery of real-time audio over packet-switched networks. The specific signal model in this work generates a parametric representation for general wide-band audio signals. The model consists of three complementary components: sines, transients, and noise. Because the human hearing system ultimately judges the validity of a model for audio signals, psychoacoustic principles are explicitly considered in the three-part model. Once analyzed, the parameters are quantized, compressed and packed into a single 85 kbps bit-stream. From this bit-stream, bit-streams at several bit-rates between 6 kbps and 85 kbps may be readily extracted. The audio coder offers a wide range of scalability while the audio quality of the coding scheme gracefully degrades from perceptually lossless to low-quality.
Sparse Linear Regression With Structured Priors and Application to Denoising of Musical Audio
Cited by 3 (0 self)
We describe in this paper an audio denoising technique based on sparse linear regression with structured priors. The noisy signal is decomposed as a linear combination of atoms belonging to two modified discrete cosine transform (MDCT) bases, plus a residual part containing the noise. One MDCT basis has a long time resolution, and thus high frequency resolution, and is aimed at modeling tonal parts of the signal, while the other MDCT basis has short time resolution and is aimed at modeling transient parts (such as attacks of notes). The problem is formulated within a Bayesian setting. Conditional upon an indicator variable which is either 0 or 1, one expansion coefficient is set to zero or given a hierarchical prior. Structured priors are employed for the indicator variables; using two types of Markov chains, persistency along the time axis is favored for expansion coefficients of the tonal layer, while persistency along the frequency axis is favored for the expansion coefficients of the transient layer. Inference about the denoised signal and model parameters is performed using a Gibbs sampler, a standard Markov chain Monte Carlo (MCMC) sampling technique. We present results for denoising of a short glockenspiel excerpt and a long polyphonic music excerpt. Our approach is compared with unstructured sparse regression and with structured sparse regression in a single resolution MDCT basis (no transient layer). The results show that better denoising is obtained, both from signal-to-noise ratio measurements and from subjective criteria, when both a transient and tonal layer are used, in conjunction with our proposed structured prior framework.
Index Terms: Bayesian variable selection, denoising, Markov chain Monte Carlo (MCMC) methods, nonlinear signal approximation, sparse component analysis, sparse regression, sparse representations.
Model-Based Analysis of Noisy Musical Recordings with Application to Audio Restoration
2004
Cited by 2 (0 self)
Helsinki University of Technology (Teknillinen korkeakoulu), Department of Electrical and Communications Engineering, Laboratory of Acoustics and Audio Signal Processing
A complex envelope sinusoidal model for audio coding
Proc. of the 10th Int. Conference on Digital Audio Effects, DAFx-07, 2007
Cited by 2 (2 self)
A modification to the hybrid sinusoidal model is proposed for the purpose of high-quality audio coding. In our proposal the amplitude envelope of each harmonic partial is modeled by a narrowband complex signal. Such a representation incorporates most of the signal energy associated with sinusoidal components, including that related to frequency estimation and quantization errors. It also takes into account the natural width of each spectral line. The advantages of this model extension are a more straightforward and robust representation of the deterministic component and a clean stochastic residual without ghost sinusoids. The reconstructed signal is virtually free from harmonic artifacts and more natural sounding. We propose to encode the complex envelopes by means of MCLT transform coefficients, interleaved across partials, within an MPEG-like coding scheme. We present experimental results showing that high compression efficiency is achieved.

