Time Series Models for Semantic Music Annotation
IEEE Transactions on Audio, Speech, and Language Processing, 2011
Cited by 21 (12 self)
Many state-of-the-art systems for automatic music tagging model music based on bag-of-features representations, which give little or no account of temporal dynamics, a key characteristic of the audio signal. We describe a novel approach to automatic music annotation and retrieval that captures temporal (e.g., rhythmical) aspects as well as timbral content. The proposed approach leverages a recently proposed song model that is based on a generative time series model of the musical content, the dynamic texture mixture (DTM) model, which treats fragments of audio as the output of a linear dynamical system. To model characteristic temporal dynamics and timbral content at the tag level, a novel, efficient, and hierarchical expectation-maximization (EM) algorithm for DTM (HEM-DTM) is used to summarize the common information shared by DTMs modeling individual songs associated with a tag. Experiments show that learning the semantics of music benefits from modeling temporal dynamics. Index Terms: Audio annotation and retrieval, dynamic texture model, music information retrieval.
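As a rough illustration of the linear dynamical system behind a single dynamic texture component, the sketch below samples a short observation sequence in plain Python. It uses the standard LDS form (state update x_{t+1} = A x_t + v_t, observation y_t = C x_t + w_t). The function name sample_lds, the matrices, and the noise levels are illustrative assumptions, not values from the paper, and the sketch covers neither the mixture nor the HEM-DTM learning step.

```python
import random

def mat_vec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def sample_lds(A, C, x0, steps, state_noise=0.1, obs_noise=0.05, seed=0):
    """Sample observations from a linear dynamical system (illustrative only).

    x_{t+1} = A x_t + v_t,  v_t ~ N(0, state_noise^2 I)
    y_t     = C x_t + w_t,  w_t ~ N(0, obs_noise^2 I)
    """
    rng = random.Random(seed)
    x = list(x0)
    observations = []
    for _ in range(steps):
        # Emit a noisy observation of the current hidden state.
        y = [c + rng.gauss(0.0, obs_noise) for c in mat_vec(C, x)]
        observations.append(y)
        # Advance the hidden state with additive Gaussian noise.
        x = [a + rng.gauss(0.0, state_noise) for a in mat_vec(A, x)]
    return observations

# Hypothetical 2-D hidden state (a decaying rotation) observed in 3 dimensions.
A = [[0.9, -0.2], [0.2, 0.9]]
C = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
frames = sample_lds(A, C, x0=[1.0, 0.0], steps=5)
```

In an audio setting each observation would stand for one feature vector of a fragment; here the three output dimensions are arbitrary.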
AUTOMATIC MUSIC TAGGING WITH TIME SERIES MODELS
Cited by 7 (3 self)
State-of-the-art systems for automatic music tagging model music based on bag-of-feature representations, which give little or no account of temporal dynamics, a key characteristic of the audio signal. We describe a novel approach to automatic music annotation and retrieval that captures temporal (e.g., rhythmical) aspects as well as timbral content. The proposed approach leverages a recently proposed song model that is based on a generative time series model of the musical content, the dynamic texture mixture (DTM) model, which treats fragments of audio as the output of a linear dynamical system. To model characteristic temporal dynamics and timbral content at the tag level, a novel, efficient hierarchical EM algorithm for DTM (HEM-DTM) is used to summarize the common information shared by DTMs modeling individual songs associated with a tag. Experiments show that learning the semantics of music benefits from modeling temporal dynamics.
MULTIVARIATE AUTOREGRESSIVE MIXTURE MODELS FOR MUSIC AUTO-TAGGING
Cited by 6 (2 self)
We propose the multivariate autoregressive model for content-based music auto-tagging. At the song level, our approach leverages the multivariate autoregressive mixture (ARM) model, a generative time-series model for audio, which assumes each feature vector in an audio fragment is a linear function of previous feature vectors. To tackle tag-model estimation, we propose an efficient hierarchical EM algorithm for ARMs (HEM-ARM), which summarizes the acoustic information common to the ARMs modeling the individual songs associated with a tag. We compare the ARM model with the recently proposed dynamic texture mixture (DTM) model. We hence investigate the relative merits of different modeling choices for music time-series: i) the flexibility of selecting a higher memory order in ARM, ii) the capability of DTM to learn a specific frequency basis for each particular tag, and iii) the effect of the hidden layer of the DT versus the time efficiency of learning and inference with fully observable AR components. Finally, we experiment with a support vector machine (SVM) approach that classifies songs based on a kernel calculated on the frequency responses of the corresponding song ARMs. We show that the proposed approach outperforms SVMs trained on a different kernel function, based on a competing generative model.
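The assumption that each feature vector is a linear function of previous feature vectors can be sketched as a one-step prediction with an order-p multivariate autoregressive model. The function name ar_predict and the coefficient matrices below are hypothetical. The additive noise term is omitted for clarity, and the sketch covers neither the mixture nor HEM-ARM estimation.

```python
def ar_predict(history, coeffs):
    """Predict the next feature vector as a linear function of the last p vectors.

    history: list of feature vectors, oldest first (needs at least p entries).
    coeffs:  list of p square matrices; coeffs[0] multiplies the most recent vector.
    """
    p = len(coeffs)
    if len(history) < p:
        raise ValueError("need at least p previous vectors")
    dim = len(history[-1])
    prediction = [0.0] * dim
    for lag, matrix in enumerate(coeffs, start=1):
        past = history[-lag]  # vector observed `lag` steps ago
        for i, row in enumerate(matrix):
            prediction[i] += sum(a * b for a, b in zip(row, past))
    return prediction

# Hypothetical order-2 model on 2-D features.
coeffs = [
    [[0.5, 0.0], [0.0, 0.5]],    # weight on x_{t-1}
    [[0.25, 0.0], [0.0, 0.25]],  # weight on x_{t-2}
]
history = [[4.0, 8.0], [2.0, 4.0]]
next_vector = ar_predict(history, coeffs)
```

Raising the memory order in this sketch only means passing more coefficient matrices, which is the flexibility the abstract attributes to ARM.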
SEMANTIC ANNOTATION AND RETRIEVAL OF MUSIC USING A BAG OF SYSTEMS REPRESENTATION
Cited by 6 (3 self)
We present a content-based auto-tagger that leverages a rich dictionary of musical codewords, where each codeword is a generative model that captures timbral and temporal characteristics of music. This leads to a higher-level, concise “Bag of Systems” (BoS) representation of the characteristics of a musical piece. Once songs are represented as a BoS histogram over codewords, traditional algorithms for text document retrieval can be leveraged for music auto-tagging. Compared to estimating a single generative model to directly capture the musical characteristics of songs associated with a tag, the BoS approach offers the flexibility to combine different classes of generative models at various time resolutions through the selection of the BoS codewords. Experiments show that this enriches the audio representation and leads to superior auto-tagging performance.
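A minimal sketch of building a BoS-style histogram follows, assuming each fragment is hard-assigned to the codeword that gives it the highest log-likelihood. The hard assignment and the i.i.d. Gaussian scorers standing in for real generative codewords are assumptions made for illustration, and the names gaussian_loglik and bos_histogram are hypothetical.

```python
import math

def gaussian_loglik(mean, std):
    """Return a scorer: log-likelihood of a fragment under an i.i.d. Gaussian.

    Stand-in for a real codeword model (e.g. a dynamic texture); assumption only.
    """
    def score(fragment):
        return sum(
            -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)
            for x in fragment
        )
    return score

def bos_histogram(fragments, codewords):
    """Count, per codeword, the fragments it explains best; normalize to sum to 1."""
    counts = [0] * len(codewords)
    for fragment in fragments:
        scores = [codeword(fragment) for codeword in codewords]
        counts[scores.index(max(scores))] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codewords)

# Two hypothetical codewords and four toy 1-D fragments.
codewords = [gaussian_loglik(0.0, 1.0), gaussian_loglik(5.0, 1.0)]
fragments = [[0.1, -0.2], [4.9, 5.3], [5.1, 4.8], [0.0, 0.3]]
histogram = bos_histogram(fragments, codewords)
```

Because codewords are plain callables here, models of different classes or time resolutions could share one dictionary, which mirrors the flexibility described in the abstract.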
Anomaly Detection and Localization in Crowded Scenes
Cited by 5 (0 self)
The detection and localization of anomalous behaviors in crowded scenes is considered, and a joint detector of temporal and spatial anomalies is proposed. The proposed detector is based on a video representation that accounts for both appearance and dynamics, using a set of mixture of dynamic textures models. These models are used to implement 1) a center-surround discriminant saliency detector that produces spatial saliency scores, and 2) a model of normal behavior that is learned from training data and produces temporal saliency scores. Spatial and temporal anomaly maps are then defined at multiple spatial scales, by considering the scores of these operators at progressively larger regions of support. The multiscale scores act as potentials of a conditional random field that guarantees global consistency of the anomaly judgments. A data set of densely crowded pedestrian walkways is introduced and used to evaluate the proposed anomaly detector. Experiments on this and other data sets show that the latter achieves state-of-the-art anomaly detection results. Index Terms: Video analysis, surveillance, anomaly detection, crowded scene, dynamic texture, center-surround saliency.
Wavelet Domain Multi-fractal Analysis for Static and Dynamic Texture Classification
2011
Cited by 4 (1 self)
In this paper, we propose a new texture descriptor for both static and dynamic textures. The new descriptor is built on wavelet-based spatial-frequency analysis of two complementary wavelet pyramids: the standard multi-scale one and the so-called wavelet leader one. The introduced wavelet pyramids essentially capture the local texture responses in multiple high-pass channels in a multi-scale and multi-orientation fashion, in which there exists a strong power-law relationship for natural images. Such a power-law relationship is characterized by the so-called multi-fractal analysis. In addition, two more techniques, scale normalization and multi-orientation image averaging, are introduced to further improve the robustness of the proposed descriptor. Combining these techniques, the proposed descriptor enjoys both high discriminative power and robustness against many environmental changes. We apply the descriptor to classify both static and dynamic textures. Our method has demonstrated excellent performance in comparison with state-of-the-art approaches on several public benchmark datasets.
The variational hierarchical EM algorithm for clustering hidden Markov models
Cited by 4 (3 self)
In this paper, we derive a novel algorithm to cluster hidden Markov models (HMMs) according to their probability distributions. We propose a variational hierarchical EM algorithm that i) clusters a given collection of HMMs into groups of HMMs that are similar, in terms of the distributions they represent, and ii) characterizes each group by a “cluster center”, i.e., a novel HMM that is representative of the group. We illustrate the benefits of the proposed algorithm on hierarchical clustering of motion capture sequences as well as on automatic music tagging.
Growing a Bag of Systems Tree for Fast and Accurate Classification
Cited by 4 (2 self)
The bag-of-systems (BoS) representation is a descriptor of motion in a video, where dynamic texture (DT) codewords represent the typical motion patterns in spatio-temporal patches extracted from the video. The efficacy of the BoS descriptor depends on the richness of the codebook, which directly depends on the number of codewords in the codebook. However, for even modest-sized codebooks, mapping videos onto the codebook results in a heavy computational load. In this paper we propose the BoS Tree, which constructs a bottom-up hierarchy of codewords that enables efficient mapping of videos to the BoS codebook. By leveraging the tree structure to efficiently index the codewords, the BoS Tree allows for fast look-ups in the codebook and enables the practical use of larger, richer codebooks. We demonstrate the effectiveness of BoS Trees on classification of three video datasets, as well as on annotation of a music dataset.
Intrinsic Characterization of Dynamic Surfaces
Cited by 3 (3 self)
This paper presents a novel approach to characterize deformable surfaces using intrinsic property dynamics. 3D dynamic surfaces representing humans in motion can be obtained using multiple view stereo reconstruction methods or depth cameras. Nowadays these technologies have become capable of capturing surface variations in real time, and give details such as clothing wrinkles and deformations. Assuming repetitive patterns in the deformations, we propose to model complex surface variations using sets of linear dynamical systems (LDS) where observations across time are given by surface intrinsic properties such as local curvatures. We introduce an approach based on bags of dynamical systems, where each surface feature to be represented in the codebook is modeled by a set of LDS equipped with timing structure. Experiments are performed on datasets of real-world dynamical surfaces and show compelling results for description, classification and segmentation.
That was fast! Speeding up NN search of high dimensional distributions.
Cited by 1 (1 self)
We present a data structure for fast nearest neighbor retrieval of generative models of documents based on Kullback-Leibler (KL) divergence. Our data structure, which shares some similarity with Bregman Ball Trees, consists of a hierarchical partition of a database, and uses a novel branch and bound methodology for search. The main technical contribution of the paper is a novel and efficient algorithm for deciding whether to explore nodes during backtracking, based on a variational approximation. This reduces the number of computations per node, and overcomes the limitations of Bregman Ball Trees on high dimensional data. In addition, our strategy is applicable also to probability distributions with hidden state variables, and is not limited to regular exponential family distributions. Experiments demonstrate substantial speedups over both Bregman Ball Trees and over brute force search, on both moderate and high dimensional histogram data. In addition, experiments on linear dynamical systems demonstrate the flexibility of our approach to latent variable models. Proceedings of the 30th
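For reference, the brute force search that serves as a baseline in the abstract above can be sketched for histogram data as a linear scan that minimizes KL divergence. This is only the baseline, not the paper's tree structure or its variational pruning test. The direction KL(query || entry), the eps smoothing, and the function names are assumptions of this sketch.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    eps guards against log(0); this smoothing is an assumption of the sketch.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def nearest_by_kl(query, database):
    """Return (index, divergence) of the database entry minimizing KL(query || entry)."""
    best_index, best_value = -1, math.inf
    for index, entry in enumerate(database):
        value = kl_divergence(query, entry)
        if value < best_value:
            best_index, best_value = index, value
    return best_index, best_value

# Toy database of three normalized histograms and one query.
database = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.3, 0.4, 0.3]]
index, value = nearest_by_kl([0.6, 0.3, 0.1], database)
```

The scan costs one divergence evaluation per database entry, which is the cost a hierarchical partition with branch and bound aims to reduce.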