MATCH: A Music Alignment Tool Chest
 6th International Conference on Music Information Retrieval (ISMIR)
, 2005
Cited by 54 (5 self)
We present MATCH, a toolkit for aligning audio recordings of different renditions of the same piece of music, based on an efficient implementation of a dynamic time warping algorithm. A forward path estimation algorithm constrains the alignment path so that dynamic time warping can be performed with time and space costs that are linear in the size of the audio files. Frames of audio are represented by a positive spectral difference vector, which emphasises note onsets in the alignment process. In tests with Classical and Romantic piano music, the average alignment error was 41 ms (median 20 ms), with only 2 out of 683 test cases failing to align. The software is useful for content-based indexing of audio files and for the study of performance interpretation; it can also be used in real time for tracking live performances. The toolkit also provides functions for displaying the cost matrix, the forward and backward paths, and any metadata associated with the recordings, which can be shown in real time as the alignment is computed.
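A minimal sketch of the positive spectral difference idea described above, assuming each frame is already a list of non-negative spectral magnitudes. MATCH's own frequency mapping, normalisation and frame parameters are not reproduced here, and the Euclidean frame cost is an illustrative choice rather than the toolkit's documented metric.

```python
import math


def positive_spectral_difference(frames):
    """Illustrative sketch: half-wave rectified difference of consecutive spectra.

    Only increases in energy (typical of note onsets) survive; decreases are
    clipped to zero. The first frame has no predecessor, so it maps to zeros.
    """
    if not frames:
        return []
    diffs = [[0.0] * len(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        diffs.append([max(0.0, c - p) for p, c in zip(prev, cur)])
    return diffs


def frame_cost(a, b):
    """Hypothetical local cost between two feature vectors (Euclidean distance)."""
    return math.dist(a, b)
```

For example, the spectra [[0, 1, 0], [2, 1, 0], [1, 3, 0]] yield differences [[0, 0, 0], [2, 0, 0], [0, 2, 0]]: the drop in the first bin at the third frame is discarded, while the rise in the second bin is kept.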
A Comparison of Melodic Database Retrieval Techniques Using Sung Queries
, 2002
Cited by 42 (10 self)
Query-by-humming systems search a database of music for good matches to a sung, hummed, or whistled melody. Errors in transcription and variations in pitch and tempo can cause substantial mismatch between queries and targets. Thus, algorithms for measuring melodic similarity in query-by-humming systems should be robust. We compare several variations of search algorithms in an effort to improve search precision. In particular, we describe a new frame-based algorithm that significantly outperforms note-by-note algorithms in tests using sung queries and a database of MIDI-encoded music.
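The abstract does not spell out the frame-based matcher, so the following is only an assumption-laden illustration of the frame-based idea: melodies are rendered as one pitch value per fixed-length frame, normalised for key by subtracting the median pitch and for tempo by uniform resampling, then compared frame by frame. The 0.1 s frame size, the median normalisation and the uniform resampling are hypothetical choices; the paper's matcher may align frames differently (for example with dynamic time warping).

```python
import statistics


def notes_to_frames(notes, frame_sec=0.1):
    """Render (midi_pitch, duration_sec) notes as one pitch value per frame."""
    frames = []
    for pitch, dur in notes:
        frames.extend([float(pitch)] * max(1, round(dur / frame_sec)))
    return frames


def remove_transposition(frames):
    """Subtract the median pitch so a query sung in a different key can still match."""
    centre = statistics.median(frames)
    return [p - centre for p in frames]


def resample(frames, n):
    """Nearest-frame resampling to n frames, i.e. a uniform tempo change."""
    length = len(frames)
    return [frames[min(length - 1, i * length // n)] for i in range(n)]


def melody_distance(query, target, n=64):
    """Hypothetical score: mean absolute pitch difference (semitones)
    after key and tempo normalisation; 0.0 means identical contours."""
    q = resample(remove_transposition(query), n)
    t = resample(remove_transposition(target), n)
    return sum(abs(a - b) for a, b in zip(q, t)) / n
```

With this sketch, a query sung two semitones higher and at half the tempo of the target still scores 0.0, which is the kind of robustness to pitch and tempo variation the abstract calls for; note-level transcription errors, by contrast, are never introduced because no note segmentation is needed on the query side.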
Speech Recognition using Neural Networks
, 1995
Cited by 38 (0 self)
This thesis examines how artificial neural networks can benefit a large-vocabulary, speaker-independent, continuous speech recognition system. Currently, most speech recognition systems are based on hidden Markov models (HMMs), a statistical framework that supports both acoustic and temporal modeling. Despite their state-of-the-art performance, HMMs make a number of suboptimal modeling assumptions that limit their potential effectiveness. Neural networks avoid many of these assumptions, while they can also learn complex functions, generalize effectively, tolerate noise, and support parallelism. While neural networks can readily be applied to acoustic modeling, it is not yet clear how they can be used for temporal modeling. Therefore, we explore a class of systems called NN-HMM hybrids, in which neural networks perform acoustic modeling, and HMMs perform temporal modeling. We argue that a NN-HMM hybrid has several theoretical advantages over a pure HMM system, including better acoustic ...
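In the hybrid literature, a common way to couple the two components is to divide the network's class posteriors P(q | x) by the class priors P(q), giving scaled likelihoods that stand in for HMM emission probabilities during Viterbi decoding. The sketch below illustrates that general recipe on a toy two-state left-to-right model; the posteriors, priors and transition probabilities are made-up values, and this is not presented as the thesis's own system.

```python
import math


def scaled_log_likelihoods(posteriors, priors):
    """Convert per-frame network posteriors P(q|x_t) into log P(q|x_t) - log P(q),
    which equals log p(x_t|q) - log p(x_t) and can replace HMM emission scores."""
    return [[math.log(p) - math.log(pr) for p, pr in zip(frame, priors)]
            for frame in posteriors]


def viterbi(emissions, log_trans, log_init):
    """Best state sequence given emission scores (T x S), a log transition
    matrix (S x S) and log initial probabilities (S). Use -math.inf for
    forbidden transitions."""
    S = len(log_init)
    score = [log_init[s] + emissions[0][s] for s in range(S)]
    back = []  # back[t-1][s] = best predecessor of state s at frame t
    for t in range(1, len(emissions)):
        best_prev, new_score = [], []
        for s in range(S):
            r = max(range(S), key=lambda k: score[k] + log_trans[k][s])
            best_prev.append(r)
            new_score.append(score[r] + log_trans[r][s] + emissions[t][s])
        back.append(best_prev)
        score = new_score
    # Trace back from the best final state.
    state = max(range(S), key=lambda s: score[s])
    path = [state]
    for best_prev in reversed(back):
        state = best_prev[state]
        path.append(state)
    return path[::-1]
```

Here the network handles the acoustic side (per-frame class scores) while the HMM's transition structure handles the temporal side, mirroring the division of labour described in the abstract. With posteriors that favour state 0 for two frames and then state 1, the decoder returns [0, 0, 1, 1].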
Asymptotic Performance of Vector Quantizers with a Perceptual Distortion Measure
 in Proc. IEEE Int. Symp. on Information Theory, p. 55
, 1997
Cited by 36 (3 self)
Gersho's bounds on the asymptotic performance of vector quantizers are valid for vector distortions which are powers of the Euclidean norm. Yamada, Tazaki and Gray generalized the results to distortion measures that are increasing functions of the norm of their argument. In both cases, the distortion is uniquely determined by the vector quantization error, i.e., the Euclidean difference between the original vector and the codeword into which it is quantized. We generalize these asymptotic bounds to input-weighted quadratic distortion measures, a class of distortion measures often used for perceptually meaningful distortion. The generalization involves a more rigorous derivation of a fixed-rate result of Gardner and Rao and a new result for variable-rate codes. We also consider the problem of source mismatch, where the quantizer is designed using a probability density different from the true source density. The resulting asymptotic performance in terms of distortion increase in dB is shown...
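To make the distortion class concrete: an input-weighted quadratic distortion has the form d(x, y) = (x - y)^T B(x) (x - y), where B(x) is a positive-definite weighting matrix that depends only on the input x; with B(x) = I it reduces to the squared Euclidean error. The sketch below evaluates such a distortion and picks the best codeword under it. The weighting function `hypothetical_weights` and the tiny codebook are invented for illustration and do not model any particular perceptual criterion.

```python
def input_weighted_distortion(x, y, weight_fn):
    """d(x, y) = (x - y)^T B(x) (x - y), with B(x) = weight_fn(x) (a list of rows)."""
    B = weight_fn(x)
    e = [xi - yi for xi, yi in zip(x, y)]
    n = len(e)
    return sum(e[i] * B[i][j] * e[j] for i in range(n) for j in range(n))


def encode(x, codebook, weight_fn):
    """Index of the codeword that minimises the input-weighted distortion."""
    return min(range(len(codebook)),
               key=lambda k: input_weighted_distortion(x, codebook[k], weight_fn))


def hypothetical_weights(x):
    """Hypothetical diagonal weighting: errors in large-magnitude components are
    assumed to matter less, so they get weight 1 / (1 + |x_i|)."""
    n = len(x)
    return [[1.0 / (1.0 + abs(x[i])) if i == j else 0.0 for j in range(n)]
            for i in range(n)]
```

Because B depends on the input, the same error vector can cost different amounts for different inputs, which is why the distortion is no longer a function of the quantization error alone. With the codebook [[1, 0], [3, 1.5]], the input [3, 0] is coded to the second codeword under the identity weighting but to the first under `hypothetical_weights`.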
Statistical Trajectory Models for Phonetic Recognition
, 1994
Cited by 33 (3 self)
The main goal of this work is to develop an alternative methodology for acoustic-phonetic modelling of speech sounds. The approach uses a segment-based framework to capture the dynamical behavior and statistical dependencies of the acoustic attributes used to represent the speech waveform. Temporal behavior is modelled explicitly by creating dynamic tracks of the acoustic attributes used to represent the waveform, and by estimating the spatio-temporal correlation structure of the resulting errors. The tracks serve as templates from which synthetic segments of the acoustic attributes are generated. Scoring of a hypothesized phonetic segment is then based on the error between the measured acoustic attributes and the synthetic segments generated for each phonetic model.
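A rough sketch of trajectory-based segment scoring, under simplifying assumptions: each phonetic model is a template track of feature vectors, a synthetic segment is produced by linearly resampling that track to the observed segment length, and the residual error is scored with an independent diagonal Gaussian. The spatio-temporal correlation structure mentioned in the abstract is deliberately omitted, and the function names, linear resampling and variances are hypothetical.

```python
import math


def synthesize_segment(track, length):
    """Hypothetical generator: linearly resample a template track (list of
    feature vectors) to `length` frames."""
    K = len(track)
    out = []
    for t in range(length):
        pos = t * (K - 1) / (length - 1) if length > 1 else 0.0
        lo = int(pos)
        hi = min(lo + 1, K - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(track[lo], track[hi])])
    return out


def segment_log_score(measured, track, variances):
    """Log-likelihood of the measured-minus-synthetic error under an
    independent Gaussian per feature (a simplification of the full model)."""
    synth = synthesize_segment(track, len(measured))
    score = 0.0
    for frame, pred in zip(measured, synth):
        for x, mu, var in zip(frame, pred, variances):
            score += -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
    return score
```

A hypothesized segment would then be labelled with whichever phonetic model scores highest; for a measured one-dimensional segment [[0.1], [1.0], [1.9]], a rising template [[0.0], [2.0]] outscores a falling one [[2.0], [0.0]].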
Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package
 Journal of Statistical Software
, 2009
Cited by 33 (0 self)
This introduction to the R package dtw is a (slightly) modified version of Giorgino (2009), published in the Journal of Statistical Software. Dynamic time warping is a popular technique for comparing time series, providing both a distance measure that is insensitive to local compressions and stretches, and the warping that optimally deforms one of the two input series onto the other. A variety of algorithms and constraints have been discussed in the literature. The dtw package provides a unification of them; it allows R users to compute time series alignments that freely mix a variety of continuity constraints, restriction windows, endpoints, local distance definitions, and so on. The package also provides functions for visualizing alignments and constraints using several classic diagram types.
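For readers outside R, the core computation that the package generalises can be sketched in a few lines of Python. This is not the dtw package's API: it hard-codes one step pattern (horizontal, vertical and diagonal steps, each adding the local cost once), an absolute-difference local distance and an optional Sakoe-Chiba band, whereas the package lets these choices be mixed freely.

```python
import math


def dtw(x, y, dist=lambda a, b: abs(a - b), window=None):
    """Classic DTW with an optional Sakoe-Chiba band |i - j| <= window.
    Returns (total cost, warping path as 0-based index pairs)."""
    n, m = len(x), len(y)
    if window is None:
        window = max(n, m)
    window = max(window, abs(n - m))  # band must be wide enough to reach (n, m)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            c = dist(x[i - 1], y[j - 1])
            D[i][j] = c + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from (n, m) to (1, 1), always moving to the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)],
                   key=lambda p: D[p[0]][p[1]])
        path.append((i - 1, j - 1))
    return D[n][m], path[::-1]
```

For example, `dtw([1, 2, 3], [1, 2, 2, 3])` returns a cost of 0.0 with path [(0, 0), (1, 1), (1, 2), (2, 3)]: the repeated 2 is absorbed by a horizontal step. Setting `window=0` on equal-length inputs forces the purely diagonal alignment.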
Off-Line Signature Verification By the Tracking of Feature and Stroke Positions
 Pattern Recognition
, 2003
Cited by 31 (0 self)
There are inevitable variations in the signature patterns written by the same person. The variations can occur in the shape or in the relative positions of the characteristic features. In this paper, two methods are proposed to track the variations. Given the set of training signing samples, the first method measures the positional variations of the one-dimensional projection profiles of the signature patterns, and the second method determines the variations in relative stroke positions in the two-dimensional signature patterns. The statistics on these variations are determined from the training set. Given a signature to be verified, the positional displacements are determined and the authenticity is decided based on the statistics of the training samples. For the purpose of comparison, two existing methods proposed by other researchers were implemented and tested on the same database. Furthermore, two volunteers were recruited to perform the same verification task. Results show that the proposed system compares favorably with other methods and outperforms the volunteers.
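The ingredients of the first method can be illustrated with a minimal sketch, assuming a binary signature image stored as a list of rows (1 = ink): a vertical projection profile, the horizontal shift that best aligns a profile with a reference, and a check of that shift against the spread observed in training. The sum-of-absolute-differences alignment, the search range and the k-standard-deviation threshold are hypothetical choices for illustration, not the paper's decision rule.

```python
import statistics


def vertical_projection(image):
    """Count ink pixels (value 1) in each column of a binary image (list of rows)."""
    return [sum(col) for col in zip(*image)]


def best_shift(profile, reference, max_shift=5):
    """Shift s minimising sum |profile[i + s] - reference[i]| (out-of-range = 0)."""
    def cost(s):
        total = 0
        for i, r in enumerate(reference):
            j = i + s
            p = profile[j] if 0 <= j < len(profile) else 0
            total += abs(p - r)
        return total
    return min(range(-max_shift, max_shift + 1), key=cost)


def within_training_variation(shift, training_shifts, k=2.0):
    """Hypothetical rule: accept if the shift lies within k standard deviations
    of the shifts measured on genuine training samples (needs >= 2 samples)."""
    mu = statistics.mean(training_shifts)
    sd = statistics.stdev(training_shifts) or 1.0  # avoid a zero-width band
    return abs(shift - mu) <= k * sd
```

For instance, the profile [0, 0, 0, 1, 3, 1, 0] is the reference [0, 1, 3, 1, 0, 0, 0] displaced two columns to the right, so `best_shift` returns 2; whether that displacement is acceptable then depends on how much the genuine training signatures varied.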
Landmark-Based Speech Recognition: Report of the 2004 Johns Hopkins Summer Workshop
, 2005
A Probabilistic Model of Melodic Similarity
 in International Computer Music Conference (ICMC). 2002. Goteborg, Sweden: The International Computer Music Association
, 2002
Cited by 22 (0 self)
Melodic similarity is an important concept for music databases, musicological studies, and interactive music systems. Dynamic programming is commonly used to compare melodies, often with a distance function based on pitch differences measured in semitones. This approach computes an "edit distance" as a measure of melodic dissimilarity. The problem can also be viewed in probabilistic terms: What is the probability that a melody is a "mutation" of another melody, given a table of mutation probabilities? We explain this approach and demonstrate how it can be used to search a database of melodies. Our experiments show that the probabilistic model performs better than a typical "edit distance" comparison.
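The contrast between the two views can be sketched with a single dynamic-programming alignment whose substitution cost is either the pitch difference in semitones (the classic edit distance) or the negative log of a mutation probability; under the latter, the minimum-cost alignment is the most probable single alignment. The mutation table, the insertion/deletion costs and the example melodies below are invented for illustration, and scoring only the best alignment (rather than, say, summing over all alignments) is a simplifying assumption, not necessarily the paper's model.

```python
import math


def alignment_cost(a, b, sub_cost, indel_cost):
    """Minimum total cost to turn pitch sequence a into b (edit-distance DP)."""
    n, m = len(a), len(b)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + indel_cost
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j] + indel_cost,                       # deletion
                          D[i][j - 1] + indel_cost,                       # insertion
                          D[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]))  # substitution
    return D[n][m]


def semitone_cost(p, q):
    """Classic substitution cost: absolute pitch difference in semitones."""
    return abs(p - q)


# Hypothetical mutation table: probability of an absolute pitch error (semitones).
MUTATION_P = {0: 0.7, 1: 0.1, 2: 0.05, 12: 0.05}
OTHER_P = 0.01   # hypothetical probability for any unlisted error
INDEL_P = 0.02   # hypothetical probability of an inserted or dropped note


def mutation_cost(p, q):
    """Negative log probability of singing pitch q when p was intended."""
    return -math.log(MUTATION_P.get(abs(p - q), OTHER_P))


def probabilistic_cost(a, b):
    return alignment_cost(a, b, mutation_cost, -math.log(INDEL_P))


# Hypothetical example melodies (MIDI note numbers).
THEME = [60, 62, 64, 65]
OCTAVE_SLIP = [60, 62, 76, 65]  # third note sung an octave high
WRONG_NOTES = [60, 63, 65, 65]  # two notes sung a semitone sharp
```

With these made-up numbers and an insertion/deletion cost of 3 for the semitone version, the edit distance ranks WRONG_NOTES closer to THEME (2 versus 6), while the probabilistic cost ranks OCTAVE_SLIP closer, because the table treats octave errors as comparatively likely. This is the kind of difference a learned mutation table can capture that a fixed semitone metric cannot.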