Clustering by compression
IEEE Transactions on Information Theory, 2005
Cited by 297 (25 self)
Abstract. We present a new method for clustering based on compression. The method does not use subject-specific features or background knowledge, and works as follows. First, we determine a parameter-free, universal similarity distance, the normalized compression distance or NCD, computed from the lengths of compressed data files (singly and in pairwise concatenation). Second, we apply a hierarchical clustering method. The NCD is not restricted to a specific application area, and works across application area boundaries. A theoretical precursor, the normalized information distance, co-developed by one of the authors, is provably optimal. However, the optimality comes at the price of using the noncomputable notion of Kolmogorov complexity. We propose axioms to capture the real-world setting, and show that the NCD approximates optimality. To extract a hierarchy of clusters from the distance matrix, we determine a dendrogram (ternary tree) by a new quartet method and a fast heuristic to implement it. The method is implemented and available as public software, and is robust to the choice of compressor. To substantiate our claims of universality and robustness, we report evidence of successful application in areas as diverse as genomics, virology, languages, literature, music, handwritten digits, astronomy, and combinations of objects from completely different domains, using statistical, dictionary, and block-sorting compressors. In genomics, we present new evidence for major questions in Mammalian evolution, based on whole-mitochondrial genomic analysis: the Eutherian orders and the Marsupionta hypothesis against the Theria hypothesis.
Index Terms: Heterogeneous data analysis, hierarchical unsupervised clustering, Kolmogorov complexity, normalized compression distance, parameter-free data mining, quartet tree method, universal dissimilarity distance.
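The abstract names the NCD but does not spell it out. The Python sketch below uses the standard definition, NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), where C(·) is the compressed length in bytes. Two choices are assumptions for illustration:

* `zlib` (a dictionary-based LZ77 compressor) and `bz2` (a Burrows–Wheeler block-sorting compressor) stand in for the paper's compressors.
* The helper names are hypothetical.

The quartet-tree clustering step is not shown.

```python
import bz2
import zlib
from typing import List, Sequence


def compressed_size(data: bytes, compressor: str = "zlib") -> int:
    """C(x): length in bytes of `data` after compression."""
    if compressor == "zlib":  # dictionary-based (LZ77 + Huffman)
        return len(zlib.compress(data, 9))
    if compressor == "bz2":  # block-sorting (Burrows-Wheeler)
        return len(bz2.compress(data, 9))
    raise ValueError(f"unknown compressor: {compressor!r}")


def ncd(x: bytes, y: bytes, compressor: str = "zlib") -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx = compressed_size(x, compressor)
    cy = compressed_size(y, compressor)
    cxy = compressed_size(x + y, compressor)
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_matrix(objects: Sequence[bytes], compressor: str = "zlib") -> List[List[float]]:
    """Symmetric pairwise NCD matrix (input to the clustering step).

    The diagonal is set to 0 by convention.
    """
    n = len(objects)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = ncd(objects[i], objects[j], compressor)
    return d
```

Real compressors are not idempotent, so a computed NCD(x, x) is usually slightly above 0. For that reason the sketch fixes the diagonal rather than computing it.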
A Chorus-Section Detecting Method for Musical Audio Signals
2003
Cited by 97 (15 self)
Abstract
This paper describes a method for obtaining a list of chorus (refrain) sections in compact-disc recordings of popular music. The detection of chorus sections is essential for the computational modeling of music understanding and is useful in various applications, such as automatic chorus-preview functions in music browsers or retrieval systems. Most previous methods detected a repeated section of a given length as the chorus and had difficulty identifying both ends of a chorus section and dealing with modulations (key changes). By analyzing relationships between various repeated sections, our method, called RefraiD, can detect all the chorus sections in a song and estimate both ends of each section. It can also detect modulated chorus sections by introducing a similarity measure under which modulated repetitions are judged correctly. Experimental results with a popular-music database show that this method detects the correct chorus sections in 80 of 100 songs.
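One way to judge modulated repetitions correctly is to compare 12-dimensional chroma vectors under all 12 circular pitch-class shifts and keep the best match. The Python sketch below does this. The specific distance is an assumption and not necessarily the paper's exact formula: a Euclidean distance between max-normalized chroma vectors, scaled into [0, 1]. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def max_normalize(v: Sequence[float]) -> List[float]:
    """Scale a non-negative chroma vector so its largest bin is 1."""
    peak = max(v)
    return [x / peak for x in v] if peak > 0 else list(v)


def chroma_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """1 minus the Euclidean distance between max-normalized chroma vectors.

    The distance is divided by sqrt(12), so the result lies in [0, 1].
    """
    na, nb = max_normalize(a), max_normalize(b)
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(na, nb)))
    return 1.0 - dist / math.sqrt(12)


def transpose(v: Sequence[float], semitones: int) -> List[float]:
    """Rotate pitch classes upward: bin p of the input moves to bin p + semitones."""
    k = semitones % 12
    return list(v[-k:]) + list(v[:-k]) if k else list(v)


def modulated_similarity(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Best similarity between `a` and any transposition of `b`, plus the shift used."""
    scores = [(chroma_similarity(a, transpose(b, k)), k) for k in range(12)]
    return max(scores, key=lambda s: s[0])
```

For example, a C-major triad and a D-major triad score below 1 under the plain similarity. The modulated similarity finds an exact match once the D-major vector is shifted by 10 semitones (down a whole step).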
Acoustic chord transcription and key extraction from audio using key-dependent HMMs trained on synthesized audio
IEEE TASLP, 2008
Cited by 52 (2 self)
Abstract
We describe an acoustic chord transcription system that uses symbolic data to train hidden Markov models and gives best-of-class frame-level recognition results. We avoid the extremely laborious task of human annotation of chord names and boundaries (which must be done to provide machine learning models with ground truth) by performing automatic harmony analysis on symbolic music files. In parallel, we synthesize audio from the same symbolic files and extract acoustic feature vectors that are in perfect alignment with the labels. We therefore generate a large set of labeled training data with a minimal amount of human labor, which allows for richer models. Thus, we build 24 key-dependent HMMs, one for each key, using the key information derived from symbolic data. Each key model defines a unique state-transition characteristic and helps avoid confusions seen in the observation vector. Given acoustic input, we identify a musical key by choosing the key model with the maximum likelihood, and we obtain the chord sequence from the optimal state path of the corresponding key model; both are returned by a Viterbi decoder. This not only increases chord recognition accuracy but also gives key information. Experimental results show that the models trained on synthesized data perform very well on real recordings, even though the labels automatically generated from symbolic data are not 100% accurate. We also demonstrate the robustness of the tonal centroid feature, which outperforms the conventional chroma feature.
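The key-selection step (decode the input with every key model and keep the one with maximum likelihood) can be sketched in Python as follows. This is a toy illustration under stated assumptions:

* Discrete observation symbols and a small number of chord states per key stand in for the paper's continuous tonal-centroid features and full chord vocabulary.
* The `Model` tuple layout and function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (start probabilities, transition matrix, emission matrix) for one key's chord HMM
Model = Tuple[Sequence[float], Sequence[Sequence[float]], Sequence[Sequence[float]]]


def _log(p: float) -> float:
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs: Sequence[int], start, trans, emit) -> Tuple[float, List[int]]:
    """Return the log-probability of the best state path and the path itself."""
    n = len(start)
    delta = [_log(start[s]) + _log(emit[s][obs[0]]) for s in range(n)]
    backpointers: List[List[int]] = []
    for o in obs[1:]:
        prev, delta, ptr = delta, [], []
        for s in range(n):
            # best predecessor of state s at this time step
            best = max(range(n), key=lambda p: prev[p] + _log(trans[p][s]))
            delta.append(prev[best] + _log(trans[best][s]) + _log(emit[s][o]))
            ptr.append(best)
        backpointers.append(ptr)
    state = max(range(n), key=lambda s: delta[s])
    score, path = delta[state], [state]
    for ptr in reversed(backpointers):  # trace the best path backwards
        state = ptr[state]
        path.append(state)
    path.reverse()
    return score, path


def recognize(obs: Sequence[int], key_models: Dict[str, Model]) -> Tuple[str, List[int]]:
    """Decode with every key model; return the best-scoring key and its chord path."""
    results = {key: viterbi(obs, *model) for key, model in key_models.items()}
    best_key = max(results, key=lambda k: results[k][0])
    return best_key, results[best_key][1]
```

For instance, suppose there are two keys whose two chord states mostly emit symbols {0, 1} and {2, 3} respectively. The sequence [0, 0, 1, 1, 0] is then assigned to the first key, with chord path [0, 0, 1, 1, 0].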
Media Segmentation using Self-Similarity Decomposition
In Proc. SPIE Storage and Retrieval for Multimedia Databases, 2003
Cited by 50 (3 self)
Abstract
We present a framework for analyzing the structure of digital media streams. Though our methods work for video, text, and audio, we concentrate on detecting the structure of digital music files. In the first step, spectral data is used to construct a similarity matrix calculated from inter-frame spectral similarity. The digital audio can be robustly segmented by correlating a kernel along the diagonal of the similarity matrix. Once segmented, spectral statistics of each segment are computed. In the second step, segments are clustered based on the self-similarity of their statistics. This reveals the structure of the digital music in a set of segment boundaries and labels. Finally, the music can be summarized by selecting clusters with repeated segments throughout the piece. The summary can be customized for various applications based on the structure of the original music.
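The segmentation step correlates a kernel along the diagonal of the similarity matrix, and can be sketched in Python as below. Two choices are assumptions for illustration:

* Similarity between frames is measured with the cosine.
* The kernel is an untapered ±1 checkerboard. A Gaussian taper on the kernel is a common refinement and is omitted here.

The function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def similarity_matrix(frames: Sequence[Sequence[float]]) -> Matrix:
    """S[i][j] = similarity between spectral frames i and j."""
    return [[cosine(a, b) for b in frames] for a in frames]


def checkerboard_kernel(half: int) -> Matrix:
    """+1 where both indices fall on the same side of the centre, -1 otherwise."""
    size = 2 * half
    return [[1.0 if (i < half) == (j < half) else -1.0 for j in range(size)]
            for i in range(size)]


def novelty(S: Matrix, half: int) -> List[float]:
    """Correlate the kernel along the main diagonal of S.

    High values mark frames where the audio before and after differs most.
    """
    n = len(S)
    kernel = checkerboard_kernel(half)
    scores = []
    for t in range(n):
        total = 0.0
        for di in range(-half, half):
            for dj in range(-half, half):
                r, c = t + di, t + dj
                if 0 <= r < n and 0 <= c < n:  # cells outside S count as zero
                    total += kernel[di + half][dj + half] * S[r][c]
        scores.append(total)
    return scores
```

Peaks in the novelty curve are then taken as segment boundaries. The kernel half-width `half` trades off sensitivity to short versus long segments.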
Summarizing popular music via structural similarity analysis
In Proc. IEEE Workshop Applications of Signal Processing to Audio and Acoustics, 2003
Cited by 43 (0 self)
Abstract
We present a framework for summarizing digital media based on structural analysis. Though these methods are applicable to general media, we concentrate here on characterizing repetitive structure in popular music. In the first step, a similarity matrix is calculated from inter-frame spectral similarity. Segment boundaries, such as verse-chorus transitions, are found by correlating a kernel along the diagonal of the matrix. Once segmented, spectral statistics of each segment are computed. In the second step, segments are clustered based on the pairwise similarity of their statistics, using a matrix decomposition approach. Finally, the audio is summarized by combining segments representing the clusters most frequently repeated throughout the piece. We present results on a small corpus showing more than 90% correct detection of verse and chorus segments.
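The second step clusters segments by the pairwise similarity of their statistics and then decomposes the resulting matrix. A minimal Python sketch follows, under three assumptions that the abstract does not confirm:

* Each segment is summarized by a diagonal Gaussian (per-dimension mean and variance).
* Segment similarity is exp(−symmetric KL divergence).
* Power iteration for the dominant eigenvector stands in for the matrix decomposition.

All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Stats = Tuple[List[float], List[float]]  # (per-dimension mean, per-dimension variance)


def segment_stats(frames: Sequence[Sequence[float]], eps: float = 1e-6) -> Stats:
    """Diagonal-Gaussian summary of one segment; eps keeps variances positive."""
    n = len(frames)
    mean = [sum(col) / n for col in zip(*frames)]
    var = [sum((x - m) ** 2 for x in col) / n + eps
           for col, m in zip(zip(*frames), mean)]
    return mean, var


def symmetric_kl(p: Stats, q: Stats) -> float:
    """KL(p||q) + KL(q||p) for diagonal Gaussians (the log-variance terms cancel)."""
    (mp, vp), (mq, vq) = p, q
    return 0.5 * sum(
        a / b + b / a - 2.0 + (x - y) ** 2 * (1.0 / a + 1.0 / b)
        for x, a, y, b in zip(mp, vp, mq, vq)
    )


def segment_similarity(segments: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Segment-by-segment similarity matrix with entries in (0, 1]."""
    stats = [segment_stats(s) for s in segments]
    return [[math.exp(-symmetric_kl(p, q)) for q in stats] for p in stats]


def leading_component(S: Sequence[Sequence[float]], iters: int = 100) -> List[float]:
    """Power iteration: dominant eigenvector of the symmetric similarity matrix."""
    n = len(S)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v
```

Segments with large weights in the leading component are the ones most strongly repeated, which makes them natural candidates for the summary.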
Summarizing Video using Non-Negative Similarity Matrix Factorization
Cited by 39 (0 self)
Abstract
We present a novel approach to automatically extracting summary excerpts from audio and video. Our approach is to maximize the average similarity between the excerpt and the source. We first calculate a similarity matrix by comparing each pair of time samples using a quantitative similarity measure. To determine the segment with the highest average similarity, we maximize the summation of the self-similarity matrix over the support of the segment. To select multiple excerpts while avoiding redundancy, we compute the non-negative matrix factorization (NMF) of the similarity matrix into its essential structural components. We then build a summary composed of excerpts from the main components, selecting the excerpts for maximum average similarity within each component. Variations integrating segmentation and other information are also discussed, and experimental results are presented.
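Selecting the single excerpt with the highest average similarity reduces to summing rows of the similarity matrix over the candidate segment. The Python sketch below makes three assumptions:

* The excerpt length is fixed.
* "Average similarity between the excerpt and the source" means the mean of S[i][j] for i in the excerpt and j over all frames.
* The function name `best_excerpt` is hypothetical.

Prefix sums make each candidate start position O(1) to score. The NMF step used for multiple excerpts is not shown.

```python
from itertools import accumulate
from typing import Sequence, Tuple


def best_excerpt(S: Sequence[Sequence[float]], length: int) -> Tuple[int, float]:
    """Return (start, score) of the excerpt maximizing mean similarity to the source."""
    n = len(S)
    if not 0 < length <= n:
        raise ValueError("length must be between 1 and len(S)")
    row_sums = [sum(row) for row in S]  # similarity of each frame to the whole source
    prefix = [0.0] + list(accumulate(row_sums))  # prefix[q] = sum(row_sums[:q])

    def avg_similarity(q: int) -> float:
        return (prefix[q + length] - prefix[q]) / (length * n)

    best = max(range(n - length + 1), key=avg_similarity)
    return best, avg_similarity(best)
```

With frames labeled a, b, c, c, c, b, c and S[i][j] = 1 for equal labels (0 otherwise), a 3-frame excerpt starts at index 2. That is the run of c frames, the label most frequently repeated in the source.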
Sequence Representation of Music Structure Using Higher-Order Similarity Matrix and Maximum-Likelihood Approach
Cited by 33 (9 self)
Abstract
In this paper, we present a novel method for the automatic estimation of the structure of music tracks using a sequence representation. A set of timbre-related (MFCC and Spectral Contrast) and pitch-related (Pitch Class Profile) features are first extracted from the signal, yielding three similarity matrices that are then combined. We then introduce the use of higher-order (2nd and 3rd order) similarity matrices in order to reinforce the diagonals corresponding to common repetitions and reduce the background noise. Segments are then detected, and a maximum-likelihood approach is proposed in order to derive simultaneously the underlying sequence representation of the music track and the most representative segment of each sequence. The proposed method is evaluated, with positive results, on the MPEG-7 “melody repetition” test set.
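The abstract does not define how the three matrices are combined or how the higher-order matrices are built. One plausible construction is sketched below, with both choices labeled as assumptions:

* The matrices are averaged element-wise.
* An order-k matrix is formed by measuring the cosine similarity between rows of the order-(k−1) matrix. Under this rule, two frames count as similar when they are similar to the same other frames.

The function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def combine(*matrices: Matrix) -> Matrix:
    """Element-wise mean of equally sized similarity matrices (assumed combination rule)."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / len(matrices) for j in range(n)]
            for i in range(n)]


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def higher_order(S: Matrix, order: int = 2) -> Matrix:
    """Order-k matrix: cosine similarity between rows of the order-(k-1) matrix.

    order=1 returns S unchanged.
    """
    for _ in range(order - 1):
        S = [[_cosine(a, b) for b in S] for a in S]
    return S
```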
Deriving musical structure from signal analysis for music audio summary generation: Sequence and state approach
In Lecture
Cited by 28 (8 self)
Abstract. In this paper, we investigate the derivation of musical structures directly from signal analysis with the aim of generating visual and audio summaries. From the audio signal, we first derive features: static features (MFCC, chromagram) or proposed dynamic features. Two approaches are then studied in order to automatically derive the structure of a piece of music. The sequence approach considers the audio signal as a repetition of sequences of events. Sequences are derived from the similarity matrix of the features by a proposed algorithm based on a 2D structuring filter and pattern matching. The state approach considers the audio signal as a succession of states. Since human segmentation and grouping perform better upon subsequent hearings, this natural approach is followed here using a proposed multi-pass approach combining time segmentation and unsupervised learning methods. Both sequence and state representations are used for the creation of an audio summary using various techniques.
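The state approach's multi-pass idea can be sketched as follows: a first segmentation pass seeds an unsupervised clustering pass. Two choices are assumptions for illustration:

* A merge threshold turns segment means into initial states. `merge_threshold` is a hypothetical parameter.
* Plain k-means refines the states, rather than, for example, a subsequent HMM pass.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def _mean(vectors: Sequence[Sequence[float]]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def initial_states(frames: Sequence[Sequence[float]], boundaries: Sequence[int],
                   merge_threshold: float) -> List[Vector]:
    """Pass 1: one mean per segment; means closer than the threshold are merged."""
    edges = [0] + sorted(boundaries) + [len(frames)]
    means = [_mean(frames[a:b]) for a, b in zip(edges, edges[1:]) if b > a]
    states: List[Vector] = []
    for m in means:
        if all(math.dist(m, s) > merge_threshold for s in states):
            states.append(m)
    return states


def kmeans(frames: Sequence[Sequence[float]], centroids: List[Vector],
           iters: int = 20) -> Tuple[List[int], List[Vector]]:
    """Pass 2: refine the seeded states; return a state label for every frame."""
    labels: List[int] = []
    for _ in range(iters):
        labels = [min(range(len(centroids)), key=lambda c: math.dist(f, centroids[c]))
                  for f in frames]
        new_centroids = []
        for c in range(len(centroids)):
            members = [f for f, lab in zip(frames, labels) if lab == c]
            # keep an empty state's previous centroid instead of dropping it
            new_centroids.append(_mean(members) if members else centroids[c])
        centroids = new_centroids
    return labels, centroids
```

For an A-B-A-B layout, the four first-pass segments collapse to two seed states. Every frame is then labeled with the state of its section.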
Automatic music classification and summarization
IEEE Trans. Speech and Audio Processing, 2005
Cited by 27 (2 self)
Abstract
Automatic music classification and summarization are very useful for music indexing, content-based music retrieval, and online music distribution, but extracting the most common and salient themes from unstructured raw music data is a challenge. In this paper, we propose effective algorithms to automatically classify and summarize music content. Support vector machines are applied to classify music into pure music and vocal music by learning from training data. Separate sets of features are then extracted for pure music and for vocal music to characterize the music content. Based on the calculated features, a clustering algorithm is applied to structure the music content. Finally, a music summary is created based on the clustering results and domain knowledge related to pure and vocal music. Support vector machine learning shows better performance in music classification than traditional Euclidean distance methods and hidden Markov model methods. Listening tests are conducted to evaluate the quality of summarization. The experiments on different genres of pure and vocal music show that the summarization results are significant and effective.
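The abstract does not give the SVM's kernel, features, or training procedure. As a self-contained stand-in, the sketch below trains a linear SVM with Pegasos-style stochastic sub-gradient descent on the hinge loss, folding the bias in as a constant feature. The following are hypothetical:

* the two-dimensional features;
* the ±1 labels (+1 vocal, −1 pure music);
* the function names.

```python
import random
from typing import List, Sequence


def _augment(x: Sequence[float]) -> List[float]:
    """Append a constant 1 so the bias is learned as an ordinary weight."""
    return list(x) + [1.0]


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.1, epochs: int = 200, seed: int = 0) -> List[float]:
    """Pegasos-style training; y[i] in {-1, +1}. Returns the weight vector."""
    rng = random.Random(seed)
    data = [_augment(x) for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(data)), len(data)):  # shuffled pass
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wk * xk for wk, xk in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wk for wk in w]  # regularization shrink
            if margin < 1.0:  # hinge loss is active for this example
                w = [wk + eta * y[i] * xk for wk, xk in zip(w, data[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    score = sum(wk * xk for wk, xk in zip(w, _augment(x)))
    return 1 if score >= 0 else -1
```

A kernel SVM would replace the dot products with kernel evaluations. The linear version is used here only because it fits in a few self-contained lines.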
Automatic Structure Detection for Popular Music
IEEE MultiMedia
Cited by 26 (0 self)
Abstract
Music structure is very important for semantic music understanding. We propose a novel approach for popular music structure detection. The proposed approach applies beat space segmentation, chord detection, singing voice boundary detection, and melody- and content-based similarity region detection to music structure detection. A frequency scale called the “Octave Scale” is used to calculate cepstral coefficients that represent the music content. The experiments show that the proposed approach achieves better performance than existing methods. We also outline some applications that can use our refined music structural analysis.
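The abstract only says that an "Octave Scale" is used to compute cepstral coefficients. A minimal Python sketch of that general idea follows:

1. Sum a power spectrum into octave-spaced bands.
2. Take logs.
3. Apply a DCT-II, as in MFCC computation but with octave rather than mel spacing.

The following are assumptions, and the paper's actual filter bank may subdivide octaves differently:

* the lowest band edge (`f_low`);
* the one-band-per-octave layout;
* the number of coefficients;
* the function names.

```python
import bisect
import math
from typing import List, Sequence


def octave_band_edges(f_low: float, nyquist: float) -> List[float]:
    """[0, f_low, 2*f_low, 4*f_low, ..., nyquist]: one band below f_low, then octaves."""
    edges = [0.0, f_low]
    while edges[-1] * 2 < nyquist:
        edges.append(edges[-1] * 2)
    edges.append(nyquist)
    return edges


def band_log_energies(power: Sequence[float], sample_rate: float,
                      edges: Sequence[float], floor: float = 1e-10) -> List[float]:
    """Sum power-spectrum bins into bands, then take the log.

    Bin k lies at k * nyquist / (len(power) - 1) Hz.
    """
    nyquist = sample_rate / 2
    energies = [0.0] * (len(edges) - 1)
    for k, p in enumerate(power):
        f = k * nyquist / (len(power) - 1)
        # index of the band containing f; the top band also includes nyquist itself
        b = min(bisect.bisect_right(edges, f) - 1, len(edges) - 2)
        energies[b] += p
    return [math.log(e + floor) for e in energies]


def dct_ii(x: Sequence[float], n_coeffs: int) -> List[float]:
    """Unnormalized DCT-II: X_k = sum_n x_n * cos(pi * k * (n + 0.5) / N)."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (n + 0.5) / N) for n in range(N))
            for k in range(n_coeffs)]


def octave_cepstrum(power: Sequence[float], sample_rate: float,
                    f_low: float = 62.5, n_coeffs: int = 6) -> List[float]:
    """Octave-spaced cepstral coefficients for one frame's power spectrum."""
    edges = octave_band_edges(f_low, sample_rate / 2)
    return dct_ii(band_log_energies(power, sample_rate, edges), n_coeffs)
```

At a 16 kHz sample rate with `f_low = 62.5`, this layout yields 8 bands, so `n_coeffs` should not exceed 8.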