## Time Series Representations for Music Information Retrieval (2004)

Citations: 1 (1 self)

### BibTeX

@MISC{Adams04timeseries,
author = {Norman H. Adams},
title = {Time Series Representations for Music Information Retrieval},
year = {2004}
}

### Abstract

Time series representations are common in MIR applications such as query-by-humming, where a sung query might be represented by a series of 'notes' for database retrieval. While such a transcription into (pitch, duration) pairs is convenient and musically intuitive, there is no evidence that it is an optimal representation. The present work explores three time series representations for sung queries: a sequence of notes, a 'smooth' pitch contour, and a novel sequence of pitch histograms. Dynamic alignment procedures are described for the three representations. Multiple continuity constraints are explored and a modified dynamic alignment procedure is described for the histogram representation. We measure the performance of the three representations using a collection of naturally sung queries applied to a target database of varying size. The results show that the note representation lends itself to rapid retrieval whereas the contour representation lends itself to robust performance. The histogram representation yields performance nearly as robust as the contour representation, but with computational complexity similar to the note representation.
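As a rough illustration of what "dynamic alignment" means for a pitch contour, the following is a minimal sketch of textbook dynamic time warping (DTW). It is not the paper's implementation: the function name `dtw_cost`, the absolute-difference local cost, the unconstrained three-way recursion, and the example pitch values are all assumptions chosen for illustration. The continuity constraints and the modified procedure for histograms mentioned in the abstract are not reproduced here.

```python
import math


def dtw_cost(query, target):
    """Return the minimum cumulative DTW alignment cost between two sequences.

    Illustrative sketch only: local cost is |q - t| and any of the three
    standard predecessor cells (diagonal, up, left) is allowed.
    """
    K, N = len(query), len(target)
    if K == 0 or N == 0:
        raise ValueError("both sequences must be non-empty")

    # cost[k][n] = minimum cost of aligning query[:k] with target[:n]
    cost = [[math.inf] * (N + 1) for _ in range(K + 1)]
    cost[0][0] = 0.0

    for k in range(1, K + 1):
        for n in range(1, N + 1):
            local = abs(query[k - 1] - target[n - 1])
            cost[k][n] = local + min(
                cost[k - 1][n - 1],  # advance both sequences
                cost[k - 1][n],      # advance the query only
                cost[k][n - 1],      # advance the target only
            )
    return cost[K][N]


# Hypothetical example: a sung contour that lingers on each pitch
# (MIDI note numbers) compared with a three-note target theme.
sung = [60, 60, 60, 62, 62, 64]
theme = [60, 62, 64]
print(dtw_cost(sung, theme))  # prints 0.0
```

The stretched query aligns to the theme at zero cost even though the two sequences differ in length, which is the elasticity in the time index that a sample-by-sample comparison lacks. The table has (K+1)(N+1) cells, so the cost of one comparison grows with the product of the two sequence lengths, which is why shorter representations are cheaper to align.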

### Citations

899 |
Algorithms on Strings, Trees, and Sequences
- Gusfield
- 1997
Citation Context ...ethod. of reliably detecting note boundaries, a direct comparison of query and target sequences yields poor performance. A more robust method for comparison is taken from biological sequence analysis [12]. Let the estimated melody be given by a sequence of K 'notes' pairs, $Q = (q_1, q_2, \ldots, q_K)$ where $q_k = (q_{k,\mathrm{pit}}, q_{k,\mathrm{dur}})$. Similarly, let a target theme be given by a sequence of N 'notes' pairs... |

834 | A tutorial on hidden Markov models - Rabiner, Juang - 1989 |

430 | Dynamic programming algorithm optimization for spoken word recognition
- Sakoe, Chiba
- 1978
Citation Context ...ery contour of "Row, Row, Row Your Boat" with the true melody. A more robust metric is yielded with dynamic time warping (DTW), an alignment procedure popularized in the 1970s for speech recognition [14, 36, 37]. The increasing number of multimedia applications for large collections of time series has recently reinvigorated interest in DTW within the database community [18, 34, 42, 43]. To find the optimal a... |

192 | Query by Humming: Musical Information Retrieval in an Audio Database
- Ghias, Logan, et al.
- 1995
Citation Context ... DTW algorithm, hence a modified dynamic alignment procedure is presented. We evaluate the performance of the three representations in the context of a query-by-humming system. Early QBH successes by [11, 15, 28] have inspired numerous alternative techniques [9, 13, 32]. While many exciting and novel ideas have been explored, results are often inconclusive and difficult to generalize [9, 10]. In the present wo... |

182 |
Fundamentals of Statistical Signal Processing: Detection Theory
- Kay
- 1998
Citation Context ... fixed. Simultaneously searching the space of all possible note boundaries and note pitches is considerably more difficult than first searching the space of note boundaries and then assigning pitches [1, 16]. Many QBH systems [11, 22, 23, 28] require the user to articulate each note with a separate 'da' or 'ta'; the user is required to perform note segmentation. In this case, a simple amplitude threshold... |

173 | Efficient retrieval of similar time sequences under time warping
- Yi, Jagadish, et al.
- 1998
Citation Context ...esentations lend themselves to rapid retrieval whereas detailed representations lend themselves to accurate retrieval. Time series representations are ubiquitous in information retrieval applications [18, 43]. MIR is no exception; sequences of notes, pitch estimates, or MFCCs, for example, are common [7, 9, 25]. It is well established that for comparing two time series, a direct comparison, such as the Eu... |

155 | Dimensionality Reduction for fast similarity search in large time series databases
- Keogh, Chakrabarti, et al.
- 2000
Citation Context ...esentations lend themselves to rapid retrieval whereas detailed representations lend themselves to accurate retrieval. Time series representations are ubiquitous in information retrieval applications [18, 43]. MIR is no exception; sequences of notes, pitch estimates, or MFCCs, for example, are common [7, 9, 25]. It is well established that for comparing two time series, a direct comparison, such as the Eu... |

139 | Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound
- Boersma
- 1993
Citation Context ...is lag-scaled and subtracted from itself to remove false peaks 4. Smooth contours are then constructed using a nearest-neighbor algorithm. In the present work we use an algorithm proposed by Boersma [4, 6], which we found to be robust to the volatile pitch fluctuations common to untrained singers. The algorithm computes the autocorrelation for overlapping windows of recorded data. The bias of the windo... |

132 |
Minimum prediction residual principle applied to speech recognition
- Itakura
- 1975
Citation Context ...ery contour of "Row, Row, Row Your Boat" with the true melody. A more robust metric is yielded with dynamic time warping (DTW), an alignment procedure popularized in the 1970s for speech recognition [14, 36, 37]. The increasing number of multimedia applications for large collections of time series has recently reinvigorated interest in DTW within the database community [18, 34, 42, 43]. To find the optimal a... |

100 | A probabilistic approach to fast pattern matching in time series databases
- Keogh, Smyth
- 1997
Citation Context ...ries approximated by such piecewise linear representations [20]. 3.5.1 Implementation: In the present work the piecewise linear approximation is computed using a simple 'bottom-up' recursive algorithm [19, 21, 33]. The initial approximation is given by consecutive pairs of contour values; there are $K/2$ initial segments yielding zero approximation error. The algorithm then merges the two neighboring segments th... |

96 | A computationally efficient multipitch analysis model
- Tolonen, Karjalainen
- 2000
Citation Context ...orrelation-based algorithm for estimating the pitch contour, but the two methods 'tweak' the autocorrelation in different ways. Mazzoni & Dannenberg use an algorithm proposed by Tolonen & Karjalainen [40] in which the autocorrelation is lag-scaled and subtracted from itself to remove false peaks 4. Smooth contours are then constructed using a nearest-neighbor algorithm. In the present work we use an ... |

80 |
The New Zealand Digital Library MELody inDEX
- McNab, Smith, et al.
- 1997
Citation Context ...ckground & Motivation: To date, most QBH systems first transcribe the sung melody into a sequence of 'notes' 2, and use this representation to search a database of similarly represented target themes [2, 5, 11, 22, 27, 28]. While a note representation may be musically intuitive, there is no hard evidence to imply that it is optimal in any specific technical sense. Neglecting the musical justification for notes, the tra... |

51 | Scaling Up Dynamic Time Warping to Massive Datasets
- Keogh, Pazzani
- 1999
Citation Context ...applicability to the current work is Keogh's use of DTW to measure the similarity between time series approximated by such piecewise linear representations [20]. 3.5.1 Implementation: In the present work the piecewise linear approximation is computed using a simple 'bottom-up' recursive algorithm [19, 21, 33]. The initial approximation is given by consecutive p... |

47 | Melody matching directly from audio - Mazzoni - 2001 |

37 | The MUSART testbed for query-by-humming evaluation
- Dannenberg, Birmingham, et al.
Citation Context ...curate retrieval. Time series representations are ubiquitous in information retrieval applications [18, 43]. MIR is no exception; sequences of notes, pitch estimates, or MFCCs, for example, are common [7, 9, 25]. It is well established that for comparing two time series, a direct comparison, such as the Euclidean distance, yields a brittle metric [17, 43]. A similarity metric that is robust to elastic shifts ... |

33 | CubyHum: A fully operational query by humming system
- Pauws
- 2002
Citation Context ... present work we compare the relative merits of three time series representations for sung queries. Two representations have been previously proposed: a pitch contour 1 [25], and a sequence of notes [9, 28, 32]. A unified presentation of the alignment procedure is given for the two representations. For the contour representation we explore multiple continuity constraints and find that judicious slope constr... |

32 | Toward the Scientific Evaluation of Music Information Retrieval Systems - Downie |

25 |
HMM-Based Musical Query Retrieval
- Shifrin, Pardo, et al.
- 2002
Citation Context ...omputational complexity of comparing a query to a target be kept to a minimum. Other methods have been proposed for comparing query and target note representations, such as hidden Markov models (HMM) [29, 38]. While HMMs have been found to yield good results in some situations, they are not considered in this work. Note representations and string-matching are discussed in detail in Section 4. Some resear... |

21 |
Waveform Segmentation Through Functional Approximation
- Pavlidis
- 1973
Citation Context ...ries approximated by such piecewise linear representations [20]. 3.5.1 Implementation: In the present work the piecewise linear approximation is computed using a simple 'bottom-up' recursive algorithm [19, 21, 33]. The initial approximation is given by consecutive pairs of contour values; there are $K/2$ initial segments yielding zero approximation error. The algorithm then merges the two neighboring segments th... |

21 |
Mid-level music melody representation of polyphonic audio for query-by-humming system
- Song, Bae, et al.
- 2002
Citation Context ...ach representation rather than a specific implementation. Finally, we propose a novel sequence of pitch histograms as an alternative query representation. Unlike other pitch histogram representations [13, 39], which were proposed to allow for errors in pitch detection, we employ pitch histograms to eliminate ambiguous timing information, thus quickening the alignment procedure. This representation violate... |

19 |
Johnny Can't Sing: A Comprehensive Error Model for Sung Music Queries
- Meek, Birmingham
- 2002
Citation Context ...arget theme be given by a sequence of N 'notes' pairs, $T = (t_1, t_2, \ldots, t_N)$. Because note offset time is an unreliable statistic, inter-onset interval (IOI) is used in place of note duration [29]. The prevailing method for aligning two sequences of notes is string matching [12]. The alignment is achieved by inserting, deleting, and replacing elements of the query sequence in order to match th... |

18 |
Melody Retrieval with Humming
- Kageyama, Mochizuki, et al.
- 1993
Citation Context ... DTW algorithm, hence a modified dynamic alignment procedure is presented. We evaluate the performance of the three representations in the context of a query-by-humming system. Early QBH successes by [11, 15, 28] have inspired numerous alternative techniques [9, 13, 32]. While many exciting and novel ideas have been explored, results are often inconclusive and difficult to generalize [9, 10]. In the present wo... |

7 | Analysis of vowels in sung queries for a music information retrieval system
- Mellody, Bartsch, et al.
Citation Context ...formation. Singers can sing the same melody with varying amplitude envelopes, lyrics and style. Furthermore, reliable timbral or phonetic-stream information is difficult to extract from sung melodies [30]. As such, we desire our query representation to be, at least nominally, independent of such variables. An estimate of the sung pitch contour is therefore a natural query representation. Direct use of... |

7 |
Recognition of isolated musical patterns using context dependent dynamic time warping
- Pikrakis, Theodoridis, et al.
Citation Context ...0s for speech recognition [14, 36, 37]. The increasing number of multimedia applications for large collections of time series has recently reinvigorated interest in DTW within the database community [18, 34, 42, 43]. To find the optimal alignment path between Q and T we recursively compute an N × K matrix of minimum prefix alignment costs, # = [# n,k ]. # n,k is the minimum alignment cost for (q 1 q k ) and (t 1 t ... |

3 | Automatic Segmentation of Sung Melodies
- Adams
- 2002
Citation Context ...on is one method for reducing the dimension of the query for database searching. The present work compares five note transcription methods. Three of the note estimators have been previously described [1, 2]. The other two new methods will be described in greater detail [3]. 4.1 Note Segmentation: Often, naturally sung melody transcription is implemented as a three-stage process. First, the sung pitch con... |

3 | Automatic Assessment of the Spasmodic Voice - Bartsch - 2002 |

3 |
Iterative Deepening Dynamic Time Warping for Time
- Chu, Keogh, et al.
- 2002
Citation Context ...ment boundaries are defined using a maximum allowable segment approximation error rather than a constant segment length. This approach to dimension reduction has been recently explored by Keogh et al. [8, 19-21]. [Figure 8: Alignment path boundaries for the general and Itakura continuity constraints. Panels: Edit-Distance Path Boundaries; Itakura Path Boundaries.] Of immediate applicability to the current work is Keo... |

3 |
Exact Indexing of Dynamic Time
- Keogh
- 2002
Citation Context ... pitch estimates, or MFCCs, for example, are common [7, 9, 25]. It is well established that for comparing two time series, a direct comparison, such as the Euclidean distance, yields a brittle metric [17, 43]. A similarity metric that is robust to elastic shifts and scales 1 of the time index is required. This has reinvigorated interest in dynamic time warping (DTW), string-matching, and other efficient t... |

3 |
Evaluation of a Melody Transcription System
- McNab, Smith
Citation Context [Figure caption: ...our representation. The plot on the left gives the performance vs. total approximation error and the right plot gives the performance vs. average query dimension.] ...rudimentary note segmentation method [26, 29]. This restriction is undesirable, however, especially for systems designed for untrained singers. Three segmentation methods presented in [1] are included in the present work: a smoothed pitch derivat... |

3 |
An enhanced dynamic time warping model for improved estimation of dtw parameters
- Yaniv, Burshtein
Citation Context ...0s for speech recognition [14, 36, 37]. The increasing number of multimedia applications for large collections of time series has recently reinvigorated interest in DTW within the database community [18, 34, 42, 43]. To find the optimal alignment path between Q and T we recursively compute an N × K matrix of minimum prefix alignment costs, # = [# n,k ]. # n,k is the minimum alignment cost for (q 1 q k ) and (t 1 t ... |

2 |
Robust Sound Modeling for Song Identification
- Cano, Batlle, et al.
- 2002
Citation Context ...curate retrieval. Time series representations are ubiquitous in information retrieval applications [18, 43]. MIR is no exception; sequences of notes, pitch estimates, or MFCCs, for example, are common [7, 9, 25]. It is well established that for comparing two time series, a direct comparison, such as the Euclidean distance, yields a brittle metric [17, 43]. A similarity metric that is robust to elastic shifts ... |

2 | Three Dimensional Continuous DP Algorithm for Multiple
- Heo, Suzuki, et al.
Citation Context ...ach representation rather than a specific implementation. Finally, we propose a novel sequence of pitch histograms as an alternative query representation. Unlike other pitch histogram representations [13, 39], which were proposed to allow for errors in pitch detection, we employ pitch histograms to eliminate ambiguous timing information, thus quickening the alignment procedure. This representation violate... |

2 |
A Practical Query-By-Humming System for a Large Music Database
- Kosugi, Nishihara, et al.
Citation Context ...ckground & Motivation: To date, most QBH systems first transcribe the sung melody into a sequence of 'notes' 2, and use this representation to search a database of similarly represented target themes [2, 5, 11, 22, 27, 28]. While a note representation may be musically intuitive, there is no hard evidence to imply that it is optimal in any specific technical sense. Neglecting the musical justification for notes, the tra... |

2 |
Tech Report: Melody Matching Using Time Warping
- Mazzoni
- 2002
Citation Context ... the reliance on sung melody transcription as the de facto method for reducing query dimensionality. In particular, Mazzoni and Dannenberg have proposed direct use of the pitch contour in QBH systems [24, 25]. In their work, Mazzoni and Dannenberg use dynamic time warping (DTW) to align the pitch contour of the sung query with the pitch contour for every melody in the target database. String-matching and DTW are largely ... |

1 |
Coding of Sung Queries for Music Information Retrieval
- Adams, Bartsch, et al.
Citation Context ...M estimator, a second cost scheme is included, the standard DTW cost scheme given by (2). The curve representing the HMM note estimator with the DTW cost scheme is labelled "Note: HMM, DTW." In [2, 3], numerous note estimators were compared using a single target database of fourteen themes; it was found that the baseline pitch-derivative yielded the worst performance by a considerable margin, the ... |

1 |
Note Segmentation and Quantization for Music Information Retrieval, submitted to
- Adams, Bartsch, et al.
Citation Context ...easing the size of the target database. 3 This contrasts substantially from the common practice of having participants sing isolated pitches on a neutral vowel. 2.2 Target Database: In previous work [2, 3] we employed a target database consisting solely of the fourteen themes for which we have sample queries. While we report classification accuracy in excess of 90% in [2, 3], this measure was not indic... |

1 |
The MUSART Testbed for Query-By-Humming Evaluation
- al
Citation Context ...curate retrieval. Time series representations are ubiquitous in information retrieval applications [18, 43]. MIR is no exception; sequences of notes, pitch estimates, or MFCCs, for example, are common [7, 9, 25]. It is well established that for comparing two time series, a direct comparison, such as the Euclidean distance, yields a brittle metric [17, 43]. A similarity metric that is robust to elastic shifts ... |

1 |
Segmenting Time Series: A Survey and
- Keogh, Chu, et al.
- 1993
Citation Context ...ries approximated by such piecewise linear representations [20]. 3.5.1 Implementation: In the present work the piecewise linear approximation is computed using a simple 'bottom-up' recursive algorithm [19, 21, 33]. The initial approximation is given by consecutive pairs of contour values; there are $K/2$ initial segments yielding zero approximation error. The algorithm then merges the two neighboring segments th... |

1 |
A Linear Hidden Markov Model for
- Liu, Wu, et al.
- 2003
Citation Context ...rching the space of all possible note boundaries and note pitches is considerably more difficult than first searching the space of note boundaries and then assigning pitches [1, 16]. Many QBH systems [11, 22, 23, 28] require the user to articulate each note with a separate 'da' or 'ta'; the user is required to perform note segmentation. In this case, a simple amplitude threshold is used to detect note boundaries.... |

1 |
The Directory of Tunes, Spencer Brown and Co
- Parsons
- 1975
Citation Context ...mple pitch contour. Other authors use "pitch contour" to refer to a coarsely quantized sequence of pitch differences of a note sequence, as in Parsons' directory of themes, e.g., "UDUDRU" [31]. 2 Strictly speaking, many systems do not estimate the note sequence, but rather a coarsely quantized sequence of pitch differences. Furthermore, many systems do not code note duration [29]. 2 query ... |

1 |
Fundamentals of Speech Recognition
- Rabiner, Juang
- 1993
Citation Context ...ery contour of "Row, Row, Row Your Boat" with the true melody. A more robust metric is yielded with dynamic time warping (DTW), an alignment procedure popularized in the 1970s for speech recognition [14, 36, 37]. The increasing number of multimedia applications for large collections of time series has recently reinvigorated interest in DTW within the database community [18, 34, 42, 43]. To find the optimal a... |