Perceptual Coding of Digital Audio
Proceedings of the IEEE, 2000
Cited by 76 (0 self)
During the last decade, CD-quality digital audio has essentially replaced analog audio. Emerging digital audio applications for network, wireless, and multimedia computing systems face a series of constraints such as reduced channel bandwidth, limited storage capacity, and low cost. These new applications have created a demand for high-quality digital audio delivery at low bit rates. In response to this need, considerable research has been devoted to the development of algorithms for perceptually transparent coding of high-fidelity (CD-quality) digital audio. As a result, many algorithms have been proposed, and several have now become international and/or commercial product standards. This paper reviews algorithms for perceptually transparent coding of CD-quality digital audio, including both research and standardization activities. The paper is organized as follows. First, psychoacoustic principles are described with the MPEG psychoacoustic signal analysis model 1 discussed in some detail. Next, filter bank design issues and algorithms are addressed, with a particular emphasis placed on the Modified Discrete Cosine Transform (MDCT), a perfect reconstruction (PR) cosine-modulated filter bank that has become of central importance in perceptual audio coding. Then, we review methodologies that achieve perceptually transparent coding of FM- and CD-quality audio signals, including algorithms that manipulate transform components, subband signal decompositions, sinusoidal signal components, and linear prediction (LP) parameters, as well as hybrid algorithms that make use of more than one signal model. These discussions concentrate on architectures and applications of
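The perfect-reconstruction property attributed to the MDCT above can be checked numerically. The following is a minimal sketch, not code from the paper: it assumes a sine window (one common choice that satisfies the Princen-Bradley condition), a 2/N scaling on the inverse transform, and a signal whose length is a multiple of the hop size N. All function names are hypothetical.

```python
import math

def sine_window(N):
    # Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1.
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(frame):
    # Maps 2N windowed samples to N coefficients (direct O(N^2) form).
    N = len(frame) // 2
    return [
        sum(frame[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
            for n in range(2 * N))
        for k in range(N)
    ]

def imdct(coeffs):
    # Maps N coefficients back to 2N time-aliased samples.
    N = len(coeffs)
    return [
        (2.0 / N) * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                        for k in range(N))
        for n in range(2 * N)
    ]

def analyze_synthesize(signal, N):
    # Windowed MDCT analysis followed by windowed IMDCT and overlap-add.
    # The aliasing in each frame cancels against its 50%-overlapped neighbours.
    assert len(signal) % N == 0, "sketch assumes length is a multiple of N"
    w = sine_window(N)
    padded = [0.0] * N + list(signal) + [0.0] * N
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):
        frame = [padded[start + n] * w[n] for n in range(2 * N)]
        y = imdct(mdct(frame))
        for n in range(2 * N):
            out[start + n] += y[n] * w[n]
    return out[N:N + len(signal)]

signal = [math.sin(0.3 * i) + 0.1 * i for i in range(32)]
rebuilt = analyze_synthesize(signal, N=8)
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
```

Each frame yields only N coefficients for 2N input samples, yet the overlap-added output matches the input up to floating-point rounding. This critically sampled behaviour is what makes the transform attractive for coding.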
New Applications of the Sound Description Interchange Format
Proc. ICMC-98, Ann Arbor, 1998
Cited by 23 (2 self)
This paper describes the goals and design of SDIF and its standard frame types, followed by a review of recent SDIF work at CNMAT, IRCAM, and IUA.
Structured audio and effects processing in the MPEG-4 multimedia standard
ACM Multimedia Sys. J, 1999
Cited by 14 (5 self)
While previous generations of the MPEG multimedia standard have focused primarily on coding and transmission of content digitally sampled from the real world, MPEG-4 contains extensive support for structured, synthetic and synthetic/natural hybrid coding methods. An overview is presented of the "Structured Audio" and "AudioBIFS" components of MPEG-4, which enable the description of synthetic soundtracks, musical scores, and effects algorithms and the compositing, manipulation, and synchronization of real and synthetic audio sources. A discussion of the separation of functionality between the systems layer and the audio toolset of MPEG-4 is presented, and prospects for efficient DSP-based implementations are discussed.
Unsupervised analysis of polyphonic music by sparse coding
IEEE Transactions on Neural Networks, 2006
Cited by 13 (3 self)
We investigate a data-driven approach to the analysis and transcription of polyphonic music, using a probabilistic model which is able to find sparse linear decompositions of a sequence of short-term Fourier spectra. The resulting system represents each input spectrum as a weighted sum of a small number of “atomic” spectra chosen from a larger dictionary; this dictionary is, in turn, learned from the data in such a way as to represent the given training set in an (information theoretically) efficient way. When exposed to examples of polyphonic music, most of the dictionary elements take on the spectral characteristics of individual notes in the music, so that the sparse decomposition can be used to identify the notes in a polyphonic mixture. Our approach differs from other methods of polyphonic analysis based on spectral decomposition by combining all of the following: a) a formulation in terms of an explicitly given probabilistic model, in which the process of estimating which notes are present corresponds naturally with the inference of latent variables in the model; b) a particularly simple generative model, motivated by very general considerations about efficient coding, that makes very few assumptions about the musical origins of the signals being processed; and c) the ability to learn a dictionary of atomic spectra (most of which converge to harmonic spectral profiles associated with specific notes) from polyphonic examples alone, with no separate training on monophonic examples required. Index Terms: Learning overcomplete dictionaries, polyphonic music, probabilistic modeling, redundancy reduction, sparse factorial coding, unsupervised learning.
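To make the phrase "weighted sum of a small number of atomic spectra" concrete, here is a minimal sketch that uses greedy matching pursuit with non-negative weights over a fixed, hand-made dictionary. This is a swapped-in technique for illustration only: the paper infers the decomposition with a probabilistic model and learns the dictionary from data, neither of which is attempted here. The toy atoms, the function names, and the `max_atoms` and `tol` parameters are all assumptions.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def matching_pursuit(spectrum, dictionary, max_atoms=3, tol=1e-9):
    # Greedily pick the unit-norm atom most correlated with the residual,
    # subtract its contribution, and repeat. Returns {atom index: weight}
    # together with the final residual.
    residual = list(spectrum)
    weights = {}
    for _ in range(max_atoms):
        scores = [dot(residual, atom) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda i: scores[i])
        if scores[best] <= tol:
            break  # nothing left that any atom explains
        weights[best] = weights.get(best, 0.0) + scores[best]
        residual = [r - scores[best] * a
                    for r, a in zip(residual, dictionary[best])]
    return weights, residual

# Toy "note" atoms with disjoint spectral support (hypothetical data).
dictionary = [
    normalize([1.0, 0.5, 0.0, 0.0, 0.0, 0.0]),
    normalize([0.0, 0.0, 1.0, 0.5, 0.0, 0.0]),
    normalize([0.0, 0.0, 0.0, 0.0, 1.0, 0.5]),
]
# A mixture of atoms 0 and 2; atom 1 is silent.
spectrum = [2.0 * a + 0.5 * c for a, c in zip(dictionary[0], dictionary[2])]
weights, residual = matching_pursuit(spectrum, dictionary)
active_notes = sorted(weights)
```

Because the toy atoms do not overlap, the greedy search recovers the two active atoms exactly. With realistic, overlapping harmonic spectra a greedy search is only approximate, which is one reason to prefer an explicit probabilistic formulation like the one described above.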
Transmitting Audio Content as Sound Objects
In Proceedings of AES22 International Conference on Virtual, Synthetic and Entertainment Audio, Espoo, Finland, 2002
Cited by 11 (2 self)
As audio and music applications tend toward a higher level of abstraction and fill in the gap between the signal processing world and the end-user, we are more and more interested in processing content and not (only) signal. This change in point of view leads to the redefinition of several “classical” concepts, and a new conceptual framework needs to be set to give support to these new trends. In [2], a model for the transmission of audio content was introduced. The model is now extended to include the idea of Sound Objects. With these thoughts in mind, examples of design decisions that have led to the implementation of the CLAM framework are also given.
Generalized Audio Coding with MPEG-4 Structured Audio
1999
Cited by 10 (3 self)
This paper introduces the concept of generalized audio coding, in which the Structured Audio decoder is used to emulate the behavior of other audio decoders. We prove that the MPEG-4 Structured Audio tool can be used to mimic the behavior of any other kind of decoder and that structured-audio coding is a universally minimal coding technique. We provide examples of simple natural audio coders that use the SA toolset, and characterize the overhead that arises in the transcoding process. Generalized audio coding removes marketplace barriers to the use of special-purpose or signal-adaptive coding formats, and thus promotes greater overall efficiency in the world of audio coding.
AudioBIFS: Describing Audio Scenes with the MPEG-4 Multimedia Standard
1999
Cited by 10 (0 self)
We present an overview of the AudioBIFS system, part of the Binary Format for Scene Description (BIFS) tool in the MPEG-4 International Standard. AudioBIFS is the tool that integrates the synthetic and natural sound coding functions in MPEG-4. It allows the flexible construction of soundtracks and sound scenes using compressed sound, sound synthesis, streaming audio, interactive and terminal-dependent presentation, three-dimensional (3-D) spatialization, environmental auralization, and dynamic download of custom signal-processing effects algorithms. MPEG-4 sound scenes are based on a model that is a superset of the model in VRML 2.0, and we describe how MPEG-4 is built upon VRML and the new capabilities provided by MPEG-4. We discuss the use of structured audio orchestra language, the MPEG-4 SAOL, for writing downloadable effects, present an example sound scene built with AudioBIFS, and describe the current state of implementations of the standard.
Structured Audio, Kolmogorov Complexity, and Generalized Audio Coding
2001
Cited by 7 (0 self)
Structured-audio techniques are a recent development in audio coding that develop new connections between the existing practices of audio synthesis and audio compression. A theoretical basis for this coding model is presented, grounded in information theory and Kolmogorov complexity theory. It is demonstrated that algorithmic structured audio can provide higher compression ratios than other techniques for many audio signals and proved rigorously that it can provide compression at least as good as every other technique (up to a constant term) for every audio signal. The MPEG-4 Structured Audio standard is the first practical application of algorithmic coding theory. It points the direction toward a new paradigm of generalized audio coding, in which structured-audio coding subsumes all other audio-coding techniques. Generalized audio coding offers new marketplace models that enable advances in compression technology to be rapidly leveraged toward the solution of problems in audio coding. Index Terms: Audio compression, MPEG-4, sound synthesis, structured audio.
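The "at least as good, up to a constant term" guarantee has the same shape as the invariance theorem of Kolmogorov complexity. As a sketch in notation that is assumed here rather than taken from the paper: let $U$ be a universal (Turing-complete) decoder, standing in for an algorithmic structured-audio decoder, let $V$ be any other decoder, and let $K_D(x)$ be the length of the shortest bitstream that makes decoder $D$ output the signal $x$.

```latex
% Invariance-style bound (notation assumed, not quoted from the paper):
% U can run a fixed emulator for V, of length c_V bits, followed by
% V's own shortest bitstream for x.
\[
  K_U(x) \;\le\; K_V(x) + c_V \qquad \text{for every signal } x,
\]
% where c_V depends only on the pair (U, V), never on x.
```

The constant $c_V$ is the one-time cost of shipping the emulator. For long signals it is amortized, which is how a bound of this form can hold for every signal at once.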

