Clustering by compression
- IEEE Transactions on Information Theory, 2005
- Cited by 120 (12 self)
We present a new method for clustering based on compression. The method does not use subject-specific features or background knowledge, and works as follows: First, we determine a parameter-free, universal, similarity distance, the normalized compression distance or NCD, computed from the lengths of compressed data files (singly and in pairwise concatenation). Second, we apply a hierarchical clustering method. The NCD is not restricted to a specific application area, and works across application area boundaries. A theoretical precursor, the normalized information distance, co-developed by one of the authors, is provably optimal. However, the optimality comes at the price of using the noncomputable notion of Kolmogorov complexity. We propose axioms to capture the real-world setting, and show that the NCD approximates optimality. To extract a hierarchy of clusters from the distance matrix, we determine a dendrogram (ternary tree) by a new quartet method and a fast heuristic to implement it. The method is implemented and available as public software, and is robust under choice of different compressors. To substantiate our claims of universality and robustness, we report evidence of successful application in areas as diverse as genomics, virology, languages, literature, music, handwritten digits, astronomy, and combinations of objects from completely different domains, using statistical, dictionary, and block sorting compressors. In genomics, we presented new evidence for major questions in Mammalian evolution, based on whole-mitochondrial genomic analysis: the Eutherian orders and the Marsupionta hypothesis against the Theria hypothesis. Index Terms: Heterogeneous data analysis, hierarchical unsupervised clustering, Kolmogorov complexity, normalized compression distance, parameter-free data mining, quartet tree method, universal dissimilarity distance.
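The distance computed "from the lengths of compressed data files (singly and in pairwise concatenation)" is commonly written as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length in bytes. The following is a minimal sketch of that formula only, not the authors' public software. It assumes zlib as a stand-in compressor, and the helper names and toy inputs are hypothetical.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor, level 9)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance from single and concatenated compressed lengths."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # pairwise concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical toy inputs: repetitive text and seeded pseudo-random bytes.
text = b"the quick brown fox jumps over the lazy dog " * 20
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(text)))

print("text vs. text :", round(ncd(text, text), 3))
print("text vs. noise:", round(ncd(text, noise), 3))
```

With a real compressor the value for identical inputs is small but not exactly zero, and unrelated inputs score close to one. A full pipeline would fill a pairwise distance matrix with these values before the hierarchical clustering step.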
Representing musical genre: A state of the art
- Journal of New Music Research, 2003
- Cited by 82 (5 self)
Musical genre is probably the most popular music descriptor. In the context of large musical databases and Electronic Music Distribution, genre is therefore crucial metadata for the description of music content. However, genre is intrinsically ill-defined and attempts at defining genre precisely have a strong tendency to end up in circular, ungrounded projections of fantasies. Is genre an intrinsic attribute of music titles, as, say, tempo? Or is genre an extrinsic description of the whole piece? In this article, we discuss the various approaches in representing musical genre, and propose to classify these approaches in three main categories: manual, prescriptive and emergent approaches. We discuss the pros and cons of each approach, and illustrate our study with results of the Cuidado IST project.
The quest for ground truth in musical artist similarity
- in Proc. International Symposium on Music Information Retrieval ISMIR-2002, 2002
- Cited by 56 (8 self)
It would be interesting and valuable to devise an automatic measure of the similarity between two musicians based only on an analysis of their recordings. To develop such a measure, however, presupposes some ‘ground truth’ training data describing the actual similarity between certain pairs of artists that constitute the desired output of the measure. Since artist similarity is wholly subjective, such data is not easily obtained. In this paper, we describe several attempts to construct a full matrix of similarity measures between a set of some 400 popular artists by regularizing limited subjective judgment data. We also detail our attempts to evaluate these measures by comparison with direct subjective similarity judgments collected via a web-based survey in April 2002. Overall, we find that subjective artist similarities are quite variable between users, casting doubt on the concept of a single ‘ground truth’. Our best measure, however, gives reasonable agreement with the subjective data, and forms a usable stand-in. In addition, our evaluation methodology may be useful for comparing other measures of artist similarity.
Automatic genre classification using large high-level musical feature sets
- In Int. Conf. on Music Information Retrieval, ISMIR 2004, 2004
- Cited by 43 (3 self)
This paper presents a system that extracts 109 musical features from symbolic recordings (MIDI, in this case) and uses them to classify the recordings by genre. The features used here are based on instrumentation, texture, rhythm, dynamics, pitch statistics, melody and chords. The classification is performed hierarchically using different sets of features at different levels of the hierarchy. Which features are used at each level, and their relative weightings, are determined using genetic algorithms. Classification is performed using a novel ensemble of feedforward neural networks and k-nearest neighbour classifiers. Arguments are presented emphasizing the importance of using high-level musical features, something that has been largely neglected in automatic classification systems to date in favour of low-level features. The effect on classification performance of varying the number of candidate features is examined in order to empirically demonstrate the importance of using a large variety of musically meaningful features. Two differently sized hierarchies are used in order to test the performance of the system under different conditions. Very encouraging classification success rates of 98% for root genres and 90% for leaf genres are obtained for a hierarchical taxonomy consisting of 9 leaf genres.
Algorithmic clustering of music based on string compression
- Computer Music Journal, 2004
- Cited by 35 (12 self)
All musical pieces are similar, but some are more similar than others. Apart from serving as an infinite source of discussion (“Haydn is just like Mozart.” “No, he’s not!”), such similarities are also crucial for the design of efficient music information retrieval systems. The amount of digitized music available on the Internet has grown dramatically in recent years, both in the public domain and on commercial sites; Napster and its clones are prime examples. Web sites offering musical content in some form like MP3, MIDI, or other, need a way to organize their wealth of material; they need to somehow classify their files according to musical genres and subgenres, putting similar pieces together. The purpose of such organization is to enable users to navigate to pieces of music they already know and like, but also to give them advice and recommendations (“If you like this, you might also like...”). Currently, such organization is mostly done manually by humans, or based on patterns in the purchasing behaviors of customers. However, some recent research has been examining the possibilities of automating music classification. A human expert, comparing different pieces of music with the goal of clustering similar works together, will generally look for certain specific similarities. Previous attempts to automate this process do the same. Generally speaking, they take a file containing a piece of music and extract from it various specific numerical features, related to pitch, rhythm, harmony, etc. One can extract such features using, for instance, Fourier transforms (Tzanetakis and Cook 2002) or wavelet transforms
Combining Musical and Cultural Features for Intelligent Style Detection
- in Proc. Int. Conf. Music Information Retrieval (ISMIR), 2002
- Cited by 34 (2 self)
Musical genres aid in the listening-and-retrieval (L&R) process by allowing a user or consumer a sense of reference. By organizing physical shelves in record stores by genres, shoppers can browse and discover new music by walking down an aisle. But the digitization of
Musical genre classification using support vector machines
- In Proceedings of IEEE ICASSP03, Hong Kong, 2003
- Cited by 25 (0 self)
Automatic musical genre classification is very useful for music indexing and retrieval. In this paper, an efficient and effective automatic musical genre classification approach is presented. A set of features is extracted and used to characterize music content. A multi-layer classifier based on support vector machines is applied to musical genre classification. Support vector machines are used to obtain the optimal class boundaries between different genres of music by learning from training data. Experimental results show that multi-layer support vector machines perform well in musical genre classification and compare favorably with the traditional Euclidean-distance-based method and other statistical learning methods.
Musical Query-by-Description as a Multiclass Learning Problem
- In Proc. IEEE Multimedia Signal Processing Conference (MMSP), 2002
- Cited by 20 (1 self)
We present the query-by-description (QBD) component of "Kandem," a time-aware music retrieval system. The QBD system we describe learns a relation between descriptive text concerning a musical artist and their actual acoustic output, making such queries as "Play me something loud with an electronic beat" possible by merely analyzing the audio content of a database. We show a novel machine learning technique based on Regularized Least-Squares Classification (RLSC) that can quickly and efficiently learn the non-linear relation between descriptive language and audio features by treating the problem as a large number of possible output classes linked to the same set of input features. We show how the RLSC training can easily eliminate irrelevant labels.
Automatic Genre Classification of MIDI Recordings
- 2004
- Cited by 20 (12 self)
A software system that automatically classifies MIDI files into hierarchically organized taxonomies of musical genres is presented. This extensible software includes an easy-to-use and flexible GUI. An extensive library of high-level musical features is compiled, including many original features. A novel hybrid classification system is used that makes use of hierarchical, flat and round-robin classification. Both k-nearest neighbour and neural network-based classifiers are used, and feature selection and weighting are performed using genetic algorithms. A thorough review of previous research in automatic genre classification is presented, along with an overview of automatic feature selection and classification techniques. Also included is a discussion of the theoretical issues relating to musical genre, including but not limited to what mechanisms humans use to classify music by genre and how realistic genre taxonomies can be constructed.
Melody Retrieval On The Web
- Proceedings of ACM/SPIE Conference on Multimedia Computing and Networking, 2000
- Cited by 15 (1 self)
The emergence of digital music on the Internet requires new information retrieval methods adapted to its specific characteristics and needs. While music retrieval based on text information, such as title, composers, or subject classification, has been implemented in many existing systems, retrieval of a piece of music based on music contents, especially based on an incomplete, imperfect recall of a fragment of the music, has not yet been fully explored. This thesis explores the main problems involved in a web-based melody retrieval system. I propose to build a query-by-humming system, which can find a piece of music in the digital music repository based on a few hummed notes, using a melody representation that combines both the pitch contour and the beat information. Since an input query (hummed melody) may have various errors due to uncertainty of the user's memory or the user's singing ability, the system should be able to tolerate the errors. Furthermore, extracting m...
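To make the idea of error-tolerant matching on a pitch contour concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the thesis's method: it encodes each melody as an up/down/same string (beat information is left out), compares strings with plain Levenshtein edit distance, and uses hypothetical function names and toy melodies given as MIDI note numbers.

```python
def contour(pitches):
    """Encode a pitch sequence as U (up), D (down), S (same) per interval."""
    steps = []
    for prev, cur in zip(pitches, pitches[1:]):
        steps.append("U" if cur > prev else "D" if cur < prev else "S")
    return "".join(steps)


def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


# Hypothetical toy repository (MIDI note numbers).
repository = {
    "twinkle": [60, 60, 67, 67, 69, 69, 67],
    "ode": [64, 64, 65, 67, 67, 65, 64],
}

# A hummed query: transposed up, with one wrong interval.
query = [62, 62, 69, 69, 70, 71, 69]

query_contour = contour(query)
best = min(repository,
           key=lambda name: edit_distance(query_contour, contour(repository[name])))
print(best)
```

Because the contour records only the direction of each interval, the transposed query still matches, and the edit distance absorbs the single wrong interval. A representation that also carries beat information, as the abstract proposes, would need a richer alphabet than the three symbols used here.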

