## Techniques for the Automated Analysis of Musical Audio (2003)

Citations: 46 (0 self)

### BibTeX

@TECHREPORT{Hainsworth03techniquesfor,

author = {Stephen Webley Hainsworth},

title = {Techniques for the Automated Analysis of Musical Audio},

institution = {},

year = {2003}

}

### Citations

4567 | A tutorial on hidden Markov models and selected applications in speech recognition
- Rabiner
- 1989
Citation Context ...s of a symphonic movement. Ideas can also be gleaned from areas not specifically concerned with musical audio such as audio coding [223, 297, 334], noise reduction [124, 358], speech processing (e.g. [276, 254]), time-frequency signal classification [75] and blind source separation [46, 74]. None of these areas will be discussed further in this thesis. 2.5.1 Introduction 2.5 LITERATURE SURVEY OF BEAT TRACKI...

4141 | Pattern classification and scene analysis
- Duda, Hart
- 1973
Citation Context ...fication form the second stage of the method. A four-dimensional feature vector is used, with parameters as follows: 1. FSR; 2. a parameter determined by using Fisher linear discriminant analysis (LDA) [103] on frequency and NP; 3. one found using LDA on NFR and frequency; 4. a parameter using LDA on an inversion of σf and σt, used to remove the effect of clusters consisting of only one sample, which give ...

1735 | Orthonormal Bases of Compactly Supported Wavelets
- Daubechies
- 1988
Citation Context ...s at higher frequency: the hop length used is constant throughout the frequency range and corresponds to the length used for the lowest frequency. See figure 2.8 for an illustration of this. Wavelets [72] are another constant-Q analysis method that have been popular in image processing and applied to music [214, 242, 243, 342]. The disadvantage of constant-Q methods in general is that they assume a lo...

1314 | A tutorial on particle filters for on-line non-linear/non-gaussian bayesian tracking
- Arulampalam, Maskell, et al.
- 2002
Citation Context ...y the algorithm. This allows the use of standard estimation procedures such as the Kalman filter [27], Markov chain Monte Carlo methods [122] or sequential Monte Carlo (particle filtering) algorithms [9]. Walmsley 1999 looked at beat analysis as part of his work on transcription [345] though the work on beat tracking was never published [343]. In line with his other work, the beat tracker was a fully...

1262 | Error bounds for convolution codes and an asymptotically optimal decoding algorithm
- Viterbi
- 1967
Citation Context ...as Baum-Welch or expectation-maximisation (EM) algorithms. A full discussion is beyond the scope of this document though Rabiner [276] contains a review and discussion. However, the Viterbi algorithm [341] will be discussed further as it will be used in Chapter 6. The forward-backward algorithm can be used to find the probability of a path through the states for a given observation set but does not fin...

1173 | Novel approach to nonlinear/non gaussian bayesian state estimation
- Gordon, Salmond, et al.
- 1993
Citation Context ...n [151], though their work was ignored for some time due to computational infeasibility. Interest was revived at the end of the 1980s and the first working particle filter was derived by Gordon et al [128] (who termed it the bootstrap filter) and simultaneously by Kitagawa [181]. Subsequently, interest increased within the engineering and statistics fields and many advances have been made. Arulampalam ...

729 | On sequential monte carlo sampling methods for bayesian filtering
- Doucet, Godsill, et al.
- 2000
Citation Context ...ootstrap filter) and simultaneously by Kitagawa [181]. Subsequently, interest increased within the engineering and statistics fields and many advances have been made. Arulampalam et al [9] and Doucet [95, 99] serve as good introductions and [97] gives a snapshot of the state of the art. The remainder of this chapter will introduce some of the concepts surrounding particle filtering in preparation for Chap...

657 | Automatic musical genre classification of audio signals
- Tzanetakis, Essl, et al.
- 2001
Citation Context ...tify chords; they did not test their algorithm conclusively. Musical style detection is a low level task for humans, though it can be difficult to implement in a computer algorithm. Tzanetakis & Cook [326], Whitman [352] or Xu et al [361] can serve as a starting point in this somewhat underexplored area. Another low level task for humans is structure discovery, e.g. finding the chorus in popular music ...

646 | A generative theory of tonal music
- Lerdahl, Jackendoff
- 1983
Citation Context ...d or mis-estimated event to a position on the metrical grid, a non trivial task in itself [49]. The phase of the beat is often determined by a series of stresses or accents, termed phenomenal accents [199, 253] or salience [90, 271]. It is generally assumed that stresses fall on the beat more often than not and that significant chordal changes also do so. While this is not always the case, and indeed many m...

559 | Filtering via simulation: auxiliary particle filters
- Pitt, Shephard
- 1999
Citation Context ...k to a lag of length, κ) though this is algorithmically difficult. A variety of particle filtering algorithms which apply specific improvements have been produced. The auxiliary particle filter (APF) [6, 267] utilises a second step within the algorithm to produce an intermediary distribution, the aim being to boost the number of particles in likely... (Footnote 3: See [122] for more details on these.)

535 | Auditory Scene Analysis
- Bregman
- 1990
Citation Context ...e Tristan chord are not. He went on to account for the listening process being a constant flux of chords with varying stability. Bregman: Another approach is that of Bregman [30, 31] who was very much influenced by the Gestalt principles proposed in vision processing. The Gestalt theory is a bottom-up process whereby elements are grouped into wholes by rules such as proximity, co...

502 | Sequential Monte Carlo methods for dynamic systems
- Liu, Chen
- 1998
Citation Context ...ised multinomial resampling at every step, while the sequential importance resampling (SIR) filter only does so when the sample size drops below a threshold, i.e. $\hat{N}_{\text{eff}} < N_{\text{thresh}}$. Residual resampling [205]. Residual resampling was an attempt to reduce the variance exhibited by the multinomial sampling scheme. A number of particles, $N_i = \lfloor N w_k^{(i)} \rfloor$, is chosen deterministically for each $i$, where $\lfloor \cdot \rfloor$ den...

459 | Detection of Abrupt Changes: Theory and Application
- Basseville, Nikiforov
- 1993
Citation Context ...tral change. This method shows improvements over both individual approaches. Traditional Change Detection: The area of change detection is well researched and there are many books on the subject (e.g. [17, 144]). However, several important assumptions are made in these methods: firstly, it is assumed that the signal is generated by one of a small number of models and that these models are well defined. Seco...

449 | Design and analysis of modern tracking systems (Artech House)
- Blackman, Popoli
- 1999
Citation Context ... [315] produced another model-based approach to polyphonic transcription. He used Pielemeier’s modal transform [261] as a front end, the peaks of which were tracked through frames by Kalman filtering [27]. Following this, the tracks were associated into notes by Bayesian methods using grouping rules such as common onset and harmonicity. Multiple hypotheses were maintained via the multiple hypothesis t...

433 | Monte Carlo filter and smoother for non-Gaussian nonlinear state space models
- Kitagawa
- 1996
Citation Context ... residual weights, $N w_k^{(i)} - N_i$, $i = 1, \ldots, N$, which are then sampled from in a multinomial fashion to obtain the rest of the particles. Systematic resampling [47, 182]. Here, the weights are formed into a discrete cumulative distribution function (CDF), $c_i$, by setting $c_i = c_{i-1} + w_k^{(i)}$, $i = 2, \ldots, N$ and $c_1 = 0$. New samples are chosen by the following process: f...

431 | Blind signal separation: statistical principles
- Cardoso
- 1998
Citation Context ...ncerned with musical audio such as audio coding [223, 297, 334], noise reduction [124, 358], speech processing (e.g. [276, 254]), time-frequency signal classification [75] and blind source separation [46, 74]. None of these areas will be discussed further in this thesis. 2.5.1 Introduction 2.5 LITERATURE SURVEY OF BEAT TRACKING Beat tracking with computers has been an active area of research since the ear...

416 | Speech analysis/synthesis based on a sinusoidal representation
- McAulay, Quatieri
- 1986
Citation Context ...n the signal, with no knowledge or use of the underlying signal parameters. An example of this is audio coding where early systems neither took account of nor accurately represented signal transients [223]; this led to a resynthesised signal which was perceptually dissimilar from the original. By first analysing the signal and locating transients, these portions can be processed separately and more int...

370 | On the use of windows for harmonic analysis with the discrete Fourier transform
- Harris
- 1978
Citation Context ...f various windows. The simplest window function is the rectangular window function which is equivalent to using the DFT on a short segment of the total signal. Others are explored in detail by Harris [153] who found that the Kaiser-Bessel and Blackman-Harris windows were the best for resolving the simple test signals used. The main two parameters which determine window performance are the width of the ...

348 | An introduction to the psychology of hearing
- Moore
- 1989
Citation Context ...n, as they interact with the above discussion. The most obvious of these is auditory psychology and physiology: how the ear processes sound waves into the nerve impulses destined for the brain. Moore [234] is an authority on this. Beyond this in the auditory chain, there have been a number of psychoacoustic models proposed as to how pitch is perceived in the brain; examples of so-called place models ar...

348 | On the quantum correction for thermodynamic equilibrium
- Wigner
- 1932
Citation Context ...ation to the reassignment method. 2.3.4 Bilinear Distributions: The above Fourier methods all suffer from a limit on time-frequency resolution of signal components due to the windowing process. Wigner [354] proposed a distribution that was applied to the context of signal processing by Ville [338], and is now known as the Wigner-Ville (WV) distribution: $WV(z; \omega, t) = \frac{1}{2\pi} \int z^{*}\left(t - \tfrac{1}{2}\tau\right) e^{-j\omega\tau}\, z(\ldots$

340 | Psychoacoustics: Facts and Models
- Zwicker, Fastl
- 1990
Citation Context ...applied to adjacent spectrogram frames. Recently, Duxbury, Bello et al [105, 106] have combined the previous two approaches... (Footnote 2: This is similar to the psychoacoustic masking thresholds found for humans [234, 363]. Footnote 3: Gradient of E2 ≥ 0.13 was used. In order for this to be useful over all examples, the energy function was first normalised.) [Figure residue: axes labelled E f and Gradient of E f]

326 | Tempo and beat analysis of acoustic musical signals
- Scheirer
- 1998
Citation Context ...leator [318] M; Eck [108] S. 2) autocorrelative: Brown [36] S; Musclefish [357] A; Tzanetakis et al [327] A; Foote & Uchihashi [117] A; Mayor [222] A; Paulus & Klapuri [255] A. 3) oscillating filters: Scheirer [292] A X; Large [195] S X; McAuley [224] S X; Toiviainen [323] S X; Eck [109] A. 4) histogramming: Gouyon et al [137] A; Seppänen [296] A X; Wang & Vilermo [351] A; Uhle & Herre [329] A; Jensen & Andersen [168] A X...

320 | Time-Frequency Analysis
- Cohen
- 1995
Citation Context ...z(t) is then termed the analytic signal. Much of the time-frequency mathematics is derived for analytic signals, though in reality, sometimes quadrature approximations to the analytic signal are used [67]. The energy of the signal $E_z$ can be written in two ways (Parseval’s theorem), namely $E_z = \int |z(t)|^2 \, dt = \frac{1}{2\pi} \int |Z(\omega)|^2 \, d\omega$ (2.18). This confers the property of energy densities upon $|z(t)|^2$ and |Z...

293 | Stochastic Differential Equations
- Øksendal
- 2005
Citation Context ...$z_k = \begin{bmatrix} \tau_k \\ \dot{\tau}_k \end{bmatrix}$ (6.32), where $\tau_k$ is in beats and $\dot{\tau}_k$ is in beats per second (obviously related to bpm). Brownian motion, which is a limiting form of the random walk, defines the tempo process by [10, 245] $\dot{\tau}(t) = \sqrt{q}\, B(t) + \dot{\tau}(0)$ (6.33), where $q$ controls the variance of the Brownian motion process $B(t)$ (which is loosely the integral of a Gaussian noise process [15]) and hence the state evolution. This ...

282 | Cognitive Foundations of Musical Pitch
- Krumhansl
- 1990
Citation Context ... are added, both from within the scale (e.g. 6ths, 9ths) and out of it (e.g. flattened 10ths, sharpened 13ths), which gives jazz its characteristic feel. Pitch Perception and Models: Krumhansl [191] has conducted extensive studies into scales and tonal context. She produced profiles for each type of scale defining the perceptual match of each semitone within the tonal context or mode. Dowling [1...

267 | Rao-Blackwellised particle filtering for dynamic Bayesian networks
- Doucet, Freitas, et al.
- 2000
Citation Context ... space up in this way, the dimensionality considered in the particle filter itself is dramatically reduced and the number of particles needed to achieve a given accuracy is also significantly reduced [98]. 5.6 SUMMARY. In this chapter, a stochastic approach to modelling physical phenomena was introduced. State space representations for time-varying systems were introduced (§5.2) and Bayes’ theorem used...

234 | Content-based classification, search and retrieval of audio
- Wold, Blum, et al.
- 1996
Citation Context ...49 (table columns: Input, Causal) 1) rule-based: Steedman [310] S; Longuet-Higgins & Lee [207] S; Povel & Essens [271] S; Parncutt [253] S; Temperley & Sleator [318] M; Eck [108] S. 2) autocorrelative: Brown [36] S; Musclefish [357] A; Tzanetakis et al [327] A; Foote & Uchihashi [117] A; Mayor [222] A; Paulus & Klapuri [255] A. 3) oscillating filters: Scheirer [292] A X; Large [195] S X; McAuley [224] S X; Toiviainen [323] S X; Eck [109] ...

188 | Non-Linear and Non-Stationary Time Series Analysis
- Priestley
- 1988

185 | The Analysis and Cognition of Basic Melodic Structures: The Implication-Realization Model
- Narmour
- 1990
Citation Context ... a similar point regarding the use of notated music as a starting point for cognitive research. Narmour: A comprehensive theory which maybe accounted for some of the above criticism is that of Narmour [239, 240]. In contrast to the GTTM which aimed “to account for the musical intuitions of a listener experienced in a particular musical idiom” [166], Narmour claimed “The theory is intended to capture the ever...

183 | The logical structure of linguistic theory
- Chomsky
- 1955
Citation Context ...erdahl & Jackendoff, the Generative Theory of Tonal Music (GTTM) [199]. This is a rule based approach where the musical input is parsed in an analogous manner to the generative linguistics of Chomsky [64]. [Figure labels: SCHENKER, STRUCTURE, Longuet-Higgins, Dowling, LERDAHL & JACKENDOFF, Deliège, Steedman, Meredith, Krumhansl, Jones, Cambouropoulos, Parncutt, NARMOUR, TEMPERLEY, COGNITION, BREGMAN, Leman...]

183 | Adaptive filtering and change detection
- Gustafsson
- 2000
Citation Context ...tral change. This method shows improvements over both individual approaches. Traditional Change Detection: The area of change detection is well researched and there are many books on the subject (e.g. [17, 144]). However, several important assumptions are made in these methods: firstly, it is assumed that the signal is generated by one of a small number of models and that these models are well defined. Seco...

178 | The Cognition of Basic Musical Structure
- Temperley
- 2001
Citation Context ...mensurate to their level of knowledge and use these to build up a global structure representation. She produced a set of rules to this effect and tested them on several examples. Another is Temperley [317] who took a more computational standpoint and implemented a number of systematic rules on computer to account for such experiences as metrical structure, pitch spelling, ke...

177 | The Physics of Musical Instruments
- Fletcher, Rossing
- 1998
Citation Context ...es. Tones are ideally periodic sine waves but in real instruments, non-linearities in the source (and deliberate variations introduced by the performer) usually mean that they are only quasi-periodic [116]. The pitch, which is a perceptual quality, is determined by the frequency of the fundamental tone and the harmonic tones whose frequencies are approximately integer multiples of this fundamental. ...

173 | Computational auditory scene analysis
- Brown, Cooke
- 1994
Citation Context ...later attempted to produce working computer implementations of this, calling it computer auditory scene analysis: Ellis’s [110] focus was upon general auditory scenes (e.g. a city street) while Brown [33, 34] concentrated more upon speech and musical examples. Slaney [302] criticises what he terms ‘pure audition’, i.e. bottom-up cognitive models where there is little top-down flow of information, citing B...

168 | Bayesian statistics without tears: a sampling-resampling perspective
- Smith, Gelfand
- 1992
Citation Context ...ampling schemes should be unbiased though Crisan & Doucet [69] suggest that this need not be the case. Various resampling schemes have been proposed and shall now be discussed. Multinomial resampling [290, 305]. The idea here is to resample (with replacement) $N$ times from the approximate discrete distribution given in (5.51) so that the $\Pr(x_k^{*(j)} = \tilde{x}_k^{(i)}) = w_k^{(i)}$. This is equivalent to simulating {Ni...

167 | Automatic Extraction of Tempo and Beat from Expressive Performances
- Dixon
- 2001
Citation Context ...t to a position on the metrical grid, a non trivial task in itself [49]. The phase of the beat is often determined by a series of stresses or accents, termed phenomenal accents [199, 253] or salience [90, 271]. It is generally assumed that stresses fall on the beat more often than not and that significant chordal changes also do so. While this is not always the case, and indeed many musical styles exhibit ...

167 | Using the SIR algorithm to simulate posterior distributions
- Rubin
- 1988
Citation Context ...ampling schemes should be unbiased though Crisan & Doucet [69] suggest that this need not be the case. Various resampling schemes have been proposed and shall now be discussed. Multinomial resampling [290, 305]. The idea here is to resample (with replacement) $N$ times from the approximate discrete distribution given in (5.51) so that the $\Pr(x_k^{*(j)} = \tilde{x}_k^{(i)}) = w_k^{(i)}$. This is equivalent to simulating {Ni...

165 | Prediction-driven computational auditory scene analysis. Unpublished doctoral dissertation
- Ellis
- 1996
Citation Context ... transform of the absolute squared Fourier coefficients (or power density spectrum). The correlogram is a model of the auditory processing within the ear and which relies heavily upon autocorrelation [33, 111, 209]. Brown [37, 40] devised a ‘narrowed’ autocorrelation function and Tolonen described an ‘enhanced summary’ autocorrelation function [324], both in the context of music audio analysis. The downfall of ...

163 | Non-negative matrix factorization for polyphonic music transcription
- Smaragdis, Brown
- 2003
Citation Context ...A) where a mathematical separation of number of sources is considered. Ideally, the sources are independent [46] but in music this is not the case. Some investigations into ICA for music are given in [1, 188, 304] and in the more usual speech separation context by Davies [74]. Also important for separating spectra are instrument models informed by the earlier decisions on which instruments are present. The cho...

163 | The unscented particle filter
- Merwe, Freitas, et al.
- 2001
Citation Context ... update the filter. The EKF breaks down if the model becomes highly non-linear, when significant errors are introduced into the posterior due to the use of only the first term in the Taylor expansion [331]. Another improvement is the unscented Kalman filter (UKF) where so-called sigma points are deterministically chosen to reflect the statistics of the distribution at a given time step (e.g. by taking ...

161 | Rao-blackwellisation of sampling schemes
- Casella, Robert
- 1996
Citation Context ... UKF (see §5.3.1) to particle filters and is better than using a straight local linearisation of the importance distribution (e.g. [99]) when this is an issue. Finally, so-called Rao-Blackwellisation [48, 100] can be used if the state space can be broken down conveniently into two sets of variables. Jump Markov linear systems (JMLS) [96] typify this where the state space, x0:k, is broken down into {r0:k, z...

159 | Musical Sound Modeling with Sinusoids Plus Noise
- Serra
- 1997
Citation Context ...esynthesised signal which was perceptually dissimilar from the original. By first analysing the signal and locating transients, these portions can be processed separately and more intelligently (e.g. [297, 335] in the audio coding case). This document explores techniques for analysing musical audio signals and for eliciting their content. A complete representation of all the musical structure and related in...

151 | A survey of convergence results on particle filtering
- Crisan, Doucet
- 2002
Citation Context ...hat if the samples are not statistically independent, then a SLLN can still be found under weak conditions and a central limit theorem under stronger conditions. More convergence results are given in [69, 97]. The problem with the above discussion is that it is usually impossible to sample from $p(x_{0:k} \mid y_{1:k})$ at any given time $t_k$ because the posterior is multivariate, non-standard and known only up to a pro...

145 | Estimating and Interpreting the Instantaneous Frequency of a Signal - Part 2: Algorithms and...
- Boashash
- 1992

140 | An audio-based real-time beat tracking system for music with or without drum-sounds
- Goto
- 2001
Citation Context ...) histogramming: Gouyon et al [137] A; Seppänen [296] A X; Wang & Vilermo [351] A; Uhle & Herre [329] A; Jensen & Andersen [168] A X. 5) multiple agent: Allen & Dannenberg [4] M; Rosenthal [287] A; Goto et al [129] A X; Dixon [90] A/M. 6) probabilistic: Laroche [196] A; Walmsley [343] A; Cemgil et al [49, 50] M; Raphael [279] A/M; Morris & Sethares [237] A; Klapuri [185] A X; Lam & Godsill [192] A. Table 2.1: Table sum...

135 | Particle Filters for State Estimation of Jump Markov Linear Systems
- Doucet, Gordon, et al.
- 1999
Citation Context ...ce of particle filtering algorithms. Gordon et al [128] suggest the use of jitter, where each particle has a small amount of noise added to introduce diversity. Berzuini & Gilks [22] and Doucet et al [100] suggest the use of a Markov transition kernel, $q(x'_{0:k} \mid x_{0:k})$. If a Gibbs or a Metropolis-Hastings (MH) step of invariant distribution is used, then the new set of particles is still distributed ...

121 | A real-time music-scene-description system: predominant-f0 estimation for detecting melody and bass lines in real-world audio signals
- Goto
- 2004
Citation Context ...ingle notes. The analysed results were then used for synthesis and perceptually better sounds were generated as a result. Odegard et al [244] used frequency reassignment for seismic analysis and Goto [132] used an instantaneous frequency method based on Abe [3] as a component in his program to extract melody and bass fundamental frequencies from audio recordings. An application not involved with time s...

120 | Virtual pitch and phase sensitivity of a computer model of the auditory periphery. I: Pitch identification
- Meddis, Hewitt
- 1991
Citation Context ...o how pitch is perceived in the brain; examples of so-called place models are Goldstein [126], Terhardt [320] and Wightman [353]. In contrast, temporal models include Licklider [204], Meddis & Hewitt [225, 226] and Slaney & Lyon [303]. A fuller discussion can be found in [145, 345]. Another branch of research which merits mention is that of performance analysis. Often it is easier to analyse the mechanical ...

116 | To catch a chorus: using chroma-based representations for audio thumbnailing
- Bartsch, Wakefield
- 2001
Citation Context ..., Whitman [352] or Xu et al [361] can serve as a starting point in this somewhat underexplored area. Another low level task for humans is structure discovery, e.g. finding the chorus in popular music [16, 131] or labelling the relevant parts of a symphonic movement. Ideas can also be gleaned from areas not specifically concerned with musical audio such as audio coding [223, 297, 334], noise reduction [124,...

112 | Zero-crossings of a wavelet transform
- Mallat
- 1991
Citation Context ... length used for the lowest frequency. See figure 2.8 for an illustration of this. Wavelets [72] are another constant-Q analysis method that have been popular in image processing and applied to music [214, 242, 243, 342]. The disadvantage of constant-Q methods in general is that they assume a logarithmic relationship for the data in music. While this is true for the interval relationships between notes (leading to th...