Chord Segmentation and Recognition using EM-Trained Hidden Markov Models
, 2003
Cited by 64 (2 self)
Automatic extraction of content description from commercial audio recordings has a number of important applications, from indexing and retrieval through to novel musicological analyses based on very large corpora of recorded performances. Chord sequences are a description that captures much of the character of a piece in a compact form and using a modest lexicon. Chords also have the attractive property that a piece of music can (mostly) be segmented into time intervals that consist of a single chord, much as recorded speech can (mostly) be segmented into time intervals that correspond to specific words. In this work, we build a system for automatic chord transcription using speech recognition tools. For features we use "pitch class profile" vectors to emphasize the tonal content of the signal, and we show that these features far outperform cepstral coefficients for our task. Sequence recognition is accomplished with hidden Markov models (HMMs) directly analogous to subword models in a speech recognizer, and trained by the same Expectation-Maximization (EM) algorithm.
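The "pitch class profile" feature named in this abstract can be illustrated with a minimal sketch. It assumes the input is already a list of spectral peaks (frequency in Hz, magnitude); the function name `pitch_class_profile`, the 440 Hz reference, and the example peaks are illustrative assumptions, not details taken from the paper.

```python
import math

def pitch_class_profile(peaks, ref_freq=440.0):
    """Fold spectral peaks (freq_hz, magnitude) into a normalized 12-bin vector.

    Bin 0 is the pitch class of ref_freq (A when ref_freq is 440 Hz).
    """
    profile = [0.0] * 12
    for freq, mag in peaks:
        if freq <= 0:
            continue  # skip invalid frequencies
        # Distance from the reference in semitones, rounded to the nearest one
        semitones = round(12 * math.log2(freq / ref_freq))
        # Octave information is discarded by taking the result modulo 12
        profile[semitones % 12] += mag
    total = sum(profile)
    return [p / total for p in profile] if total > 0 else profile

# Hypothetical peaks for an A major triad: A3, C#4, E4, A4
peaks = [(220.0, 1.0), (277.18, 0.8), (329.63, 0.8), (440.0, 0.5)]
pcp = pitch_class_profile(peaks)
active_bins = {i for i, p in enumerate(pcp) if p > 0}
```

Because octaves are folded together, the two A peaks land in the same bin, which is why such a vector emphasizes tonal content rather than spectral envelope.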
A Generative Model for Music Transcription
, 2005
Cited by 26 (7 self)
In this paper we present a graphical model for polyphonic music transcription. Our model, formulated as a Dynamical Bayesian Network, embodies a transparent and computationally tractable approach to this acoustic analysis problem. An advantage of our approach is that it places emphasis on explicitly modelling the sound generation procedure. It provides a clear framework in which high-level (cognitive) prior information on music structure can be coupled with low-level (acoustic, physical) information in a principled manner to perform the analysis. The model is a special case of the generally intractable switching Kalman filter model. Where possible, we derive exact polynomial-time inference procedures, and otherwise efficient approximations. We argue that our generative model based approach is computationally feasible for many music applications and is readily extensible to more general auditory scene analysis scenarios.
Personal communication with A. Agogino
- IEEE Trans. Audio, Speech, and Language Proc
, 2007
Cited by 16 (4 self)
Although the process of analyzing an audio recording of a music performance is complex and difficult even for a human listener, there are limited forms of information that may be tractably extracted and yet still enable interesting applications. We discuss melody – roughly, the part a listener might whistle or hum – as one such reduced descriptor of music audio, and consider how to define it, and what use it might be. We go on to describe the results of full-scale evaluations of melody transcription systems conducted in 2004 and 2005, including an overview of the systems submitted, details of how the evaluations were conducted, and a discussion of the results. For our definition of melody, current systems can achieve around 70% correct transcription at the frame level, including distinguishing between the presence or absence of the melody. Melodies transcribed at this level are readily recognizable, and show promise for practical applications.
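A frame-level score of the kind quoted in this abstract could be computed roughly as follows. This is a sketch under stated assumptions: pitch tracks are lists of per-frame frequencies in Hz, a value of 0 marks a frame with no melody, and the 50-cent tolerance is an assumed threshold rather than one given in the abstract. The function name `frame_accuracy` is hypothetical.

```python
import math

def frame_accuracy(reference_hz, estimate_hz, tolerance_cents=50.0):
    """Fraction of frames where the estimate agrees with the reference.

    A frame counts as correct if both tracks are unvoiced (<= 0), or both are
    voiced and the pitches differ by at most tolerance_cents.
    """
    correct = 0
    for ref, est in zip(reference_hz, estimate_hz):
        if ref <= 0 and est <= 0:
            correct += 1  # both agree that no melody is present
        elif ref > 0 and est > 0:
            cents = abs(1200 * math.log2(est / ref))
            if cents <= tolerance_cents:
                correct += 1
    return correct / len(reference_hz)

# Hypothetical five-frame tracks: one octave-related miss and one false alarm
reference = [0.0, 220.0, 220.0, 440.0, 0.0]
estimate = [0.0, 221.0, 330.0, 440.0, 110.0]
accuracy = frame_accuracy(reference, estimate)
```

Scoring voiced and unvoiced frames together is what lets a single number reflect both pitch errors and errors in detecting whether the melody is present.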
Polyphonic Music Modeling with Random Fields
- MM'03
, 2003
Cited by 14 (1 self)
Recent interest in the area of music information retrieval and related technologies is exploding. However, very few of the existing techniques take advantage of recent developments in statistical modeling. In this paper we discuss an application of Random Fields to the problem of creating accurate yet flexible statistical models of polyphonic music. With such models in hand, the challenges of developing effective searching, browsing and organization techniques for the growing bodies of music collections may be successfully met. We offer an evaluation of these models in terms of perplexity and prediction accuracy, and show that random fields not only outperform Markov chains, but are much more robust in terms of overfitting.
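Perplexity, the evaluation measure mentioned in this abstract, can be illustrated on the Markov-chain baseline. The sketch below assumes a first-order chain over symbolic note names with add-one smoothing; the smoothing choice, the toy sequences, and the function names are assumptions made for illustration and say nothing about the paper's random field models.

```python
import math
from collections import Counter, defaultdict

def train_markov_chain(sequence, alphabet):
    """Return a function prob(prev, nxt) estimated with add-one smoothing."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(sequence, sequence[1:]):
        counts[prev][nxt] += 1

    def prob(prev, nxt):
        total = sum(counts[prev].values())
        return (counts[prev][nxt] + 1) / (total + len(alphabet))

    return prob

def perplexity(probabilities):
    """2 raised to the average negative log2 probability of the predictions."""
    log_sum = sum(math.log2(p) for p in probabilities)
    return 2 ** (-log_sum / len(probabilities))

# Hypothetical training and held-out note sequences
alphabet = ["C", "E", "G"]
train = list("CEGCEGCEG")
held_out = list("CEGC")

prob = train_markov_chain(train, alphabet)
predicted = [prob(a, b) for a, b in zip(held_out, held_out[1:])]
ppl = perplexity(predicted)
```

A model that guessed uniformly over this three-symbol alphabet would score a perplexity of 3, so lower values indicate that the model predicts the held-out sequence better than chance.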
Automatic Labelling of Tabla Signals
- In Proc. of the 4th ISMIR Conf
, 2003
Cited by 13 (1 self)
Most of the recent developments in the field of music indexing and music information retrieval are focused on western music. In this paper, we present an automatic music transcription system dedicated to Tabla - a North Indian percussion instrument. Our approach is based on three main steps: firstly, the audio signal is segmented into adjacent segments where each segment represents a single stroke. Secondly, rhythmic information such as relative durations is calculated using beat detection techniques. Finally, the transcription (recognition of the strokes) is performed by means of a statistical model based on a Hidden Markov Model (HMM). The structure of this model is designed to represent the time dependencies between successive strokes and to take into account the specificities of the tabla score notation (transcription symbols may be context dependent). Real-time transcription of Tabla soli (or performances) with an error rate of 6.5% is made possible with this transcriber. The transcription system, along with some additional features such as sound synthesis or phrase correction, is integrated in a user-friendly environment called Tablascope.
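The final recognition step described here, decoding a stroke sequence with an HMM, can be sketched with a standard Viterbi decoder. Everything specific in the example is hypothetical: the two stroke labels, the discrete "low"/"high" observations, and all probabilities are made-up values, and the paper's context-dependent model structure is not reproduced.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence, working in the log domain."""
    scores = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
              for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            # Best predecessor for state s at this time step
            prev = max(states, key=lambda r: scores[r] + math.log(trans_p[r][s]))
            new_scores[s] = (scores[prev] + math.log(trans_p[prev][s])
                             + math.log(emit_p[s][obs]))
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state
    path = [max(states, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical two-stroke model with made-up probabilities
states = ["Dha", "Na"]
start_p = {"Dha": 0.5, "Na": 0.5}
trans_p = {"Dha": {"Dha": 0.3, "Na": 0.7}, "Na": {"Dha": 0.6, "Na": 0.4}}
emit_p = {"Dha": {"low": 0.9, "high": 0.1}, "Na": {"low": 0.2, "high": 0.8}}

strokes = viterbi(["low", "high", "low", "high"], states, start_p, trans_p, emit_p)
```

The transition table is where dependencies between successive strokes would be encoded; a context-dependent notation would require a richer state space than this two-state toy.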
Unsupervised analysis of polyphonic music by sparse coding
- IEEE Transactions on Neural Networks
, 2006
Cited by 13 (3 self)
We investigate a data-driven approach to the analysis and transcription of polyphonic music, using a probabilistic model which is able to find sparse linear decompositions of a sequence of short-term Fourier spectra. The resulting system represents each input spectrum as a weighted sum of a small number of “atomic” spectra chosen from a larger dictionary; this dictionary is, in turn, learned from the data in such a way as to represent the given training set in an (information theoretically) efficient way. When exposed to examples of polyphonic music, most of the dictionary elements take on the spectral characteristics of individual notes in the music, so that the sparse decomposition can be used to identify the notes in a polyphonic mixture. Our approach differs from other methods of polyphonic analysis based on spectral decomposition by combining all of the following: a) a formulation in terms of an explicitly given probabilistic model, in which the process estimating which notes are present corresponds naturally with the inference of latent variables in the model; b) a particularly simple generative model, motivated by very general considerations about efficient coding, that makes very few assumptions about the musical origins of the signals being processed; and c) the ability to learn a dictionary of atomic spectra (most of which converge to harmonic spectral profiles associated with specific notes) from polyphonic examples alone; no separate training on monophonic examples is required.
Index Terms: Learning overcomplete dictionaries, polyphonic music, probabilistic modeling, redundancy reduction, sparse factorial coding, unsupervised learning.
Bayesian Models of Musical Structure and Cognition
- MUSICAE SCIENTIAE
, 2004
Cited by 7 (0 self)
This paper explores the application of Bayesian probabilistic modeling to issues of music cognition and music theory. The main concern is with the problem of key-finding: the process of inferring the key from a pattern of notes. The Bayesian perspective leads to a simple, elegant, and highly effective model of this process; the same approach can also be extended to other aspects of music perception, such as metrical structure and melodic structure. Bayesian modeling also relates in interesting ways to a number of other musical issues, including musical tension, ambiguity, expectation, and the quantitative description of styles and stylistic differences.
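The key-finding idea, inferring a key from a pattern of notes by Bayes' rule, can be sketched as a posterior over the twelve major keys. The likelihood used here is a deliberately crude assumption (each scale tone has probability `p_in`, each non-scale tone `p_out`, with a uniform prior); it is not the model from the paper, and the names and values are hypothetical.

```python
import math

MAJOR_SCALE = {0, 2, 4, 5, 7, 9, 11}  # semitone offsets from the tonic
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def key_posterior(pitch_classes, p_in=0.12, p_out=0.032):
    """Posterior over major keys given observed pitch classes (0 = C).

    Assumed likelihood: 7 scale tones at p_in plus 5 others at p_out sum to 1.
    """
    log_post = []
    for tonic in range(12):
        log_p = math.log(1 / 12)  # uniform prior over keys
        for pc in pitch_classes:
            in_scale = (pc - tonic) % 12 in MAJOR_SCALE
            log_p += math.log(p_in if in_scale else p_out)
        log_post.append(log_p)
    # Normalize in a numerically stable way
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    norm = sum(weights)
    return {NAMES[t]: w / norm for t, w in enumerate(weights)}

# Hypothetical melody: G A B C D E F# G
notes = [7, 9, 11, 0, 2, 4, 6, 7]
posterior = key_posterior(notes)
best_key = max(posterior, key=posterior.get)
```

The posterior also gives a simple handle on ambiguity: closely related keys such as C and D major keep some probability mass here because only one of the observed notes falls outside their scales.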
Bayesian Analysis of Polyphonic Western Tonal Music
- Journal of the Acoustical Society of America
, 2006
Cited by 7 (1 self)
This paper deals with the computational analysis of musical audio from recorded audio waveforms. This general problem includes, as sub-tasks, music transcription, extraction of musical pitch, dynamics, timbre, instrument identity, and source separation. Analysis of real musical signals is a highly ill-posed task which is made complicated by the presence of transient sounds, background interference or the complex structure of musical pitches in the time-frequency domain. This paper focuses on models and algorithms for computer transcription of multiple musical pitches in audio, elaborated from previous work by two of the authors. The audio data are assumed to be pre-segmented into fixed pitch regimes such as individual chords. The models presented apply to pitched (tonal) music and are formulated via a Gabor representation of non-stationary signals. A Bayesian probabilistic structure is employed for representation of prior information about the parameters of the notes. This paper introduces a numerical Bayesian inference strategy for estimation of the pitches and other parameters of the waveform. The improved algorithm is much quicker, and makes the approach feasible in realistic situations.
Specmurt Analysis of Multi-Pitch Music Signals with Adaptive Estimation of Common Harmonic Structure
- Proc. Int. Symp. Music Info. Retrieval
, 2005
Cited by 5 (0 self)
This paper describes a multi-pitch analysis method using specmurt analysis with iterative estimation of the quasi-optimal common harmonic structure function. Specmurt analysis (Sagayama et al., 2004) is based upon the idea that a superimposed harmonic structure pattern can be expressed as a convolution of two components, a fundamental frequency distribution and a ‘common harmonic structure’ function, if each underlying tone component has a similar harmonic structure pattern. As proved in our previous work (Sagayama et al., 2004), an inappropriate common structure function leads to inaccurate analysis results. The iterative algorithm proposed in this paper automatically chooses a proper structure, which results in finding concurrent multiple fundamental frequencies and reduces the dependency on a heuristically chosen initial common harmonic structure. The experimental evaluation showed promising results.
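The convolution idea in this abstract can be sketched on a toy log-frequency axis: if the observed spectrum is a fundamental frequency distribution convolved with a known common harmonic structure, dividing in the Fourier domain recovers the fundamentals. The sketch assumes circular convolution, 12 bins per octave, and a fixed, made-up harmonic structure; the iterative estimation of that structure, which is the paper's contribution, is not shown.

```python
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform (adequate for short toy signals)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                for t in range(n))
        out.append(s / n if inverse else s)
    return out

def circular_convolve(u, h):
    """Circular convolution of two equal-length sequences."""
    n = len(u)
    return [sum(u[j] * h[(i - j) % n] for j in range(n)) for i in range(n)]

def deconvolve(v, h):
    """Recover u from v = u * h by division in the Fourier domain."""
    V, H = dft(v), dft(h)
    U = [vk / hk for vk, hk in zip(V, H)]
    return [z.real for z in dft(U, inverse=True)]

n_bins = 48  # assumed: 4 octaves at 12 bins per octave
# Assumed harmonic structure: partials 1-4 fall at offsets 0, 12, 19, 24 bins
h = [0.0] * n_bins
for offset, amp in [(0, 1.0), (12, 0.4), (19, 0.25), (24, 0.15)]:
    h[offset] = amp
# Two hypothetical simultaneous fundamentals
u = [0.0] * n_bins
u[3], u[10] = 1.0, 0.6

v = circular_convolve(u, h)  # simulated multi-pitch spectrum
recovered = deconvolve(v, h)
```

The division is only safe because this particular `h` has no zeros in its transform; with a poorly chosen structure the quotient becomes unstable, which is consistent with the abstract's point that an inappropriate common structure degrades the analysis.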

