Mel Frequency Cepstral Coefficients: An Evaluation of Robustness of MP3 Encoded Music
- In Proceedings of the International Symposium on Music Information Retrieval, 2006
Cited by 9 (0 self)
In large MP3 databases, files are typically generated with different parameter settings, i.e., bit rates and sampling rates. This is of concern for MIR applications, as encoding differences can potentially confound meta-data estimation and similarity evaluation. In this paper we discuss the influence of MP3 coding on the Mel frequency cepstral coefficients (MFCCs). The main result is that the widely used subset of the MFCCs is robust at bit rates equal to or higher than 128 kbits/s, for the implementations we have investigated. However, for lower bit rates, e.g., 64 kbits/s, the implementation of the Mel filter bank becomes an issue.
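Since the abstract singles out the Mel filter bank as the implementation-sensitive part, a minimal sketch of how filter band edges are often placed may help. It assumes the common 2595 * log10(1 + f/700) mel convention and hypothetical settings (20 filters, 0 to 11025 Hz); the abstract does not say which convention or settings the evaluated implementations use.

```python
import math

def hz_to_mel(f_hz):
    """Hz to mel, using the 2595*log10(1 + f/700) convention (an assumption)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(n_filters, f_min_hz, f_max_hz):
    """Return n_filters + 2 edge frequencies (Hz), equally spaced on the mel scale.

    Filter i would span edges[i] .. edges[i + 2], centred on edges[i + 1].
    """
    m_lo, m_hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (m_hi - m_lo) / (n_filters + 1)
    return [mel_to_hz(m_lo + i * step) for i in range(n_filters + 2)]

# Hypothetical settings chosen only for illustration.
edges = mel_band_edges(n_filters=20, f_min_hz=0.0, f_max_hz=11025.0)
print([round(e) for e in edges])
```

Choices such as the number of filters and the upper edge frequency are the kind of settings that can differ between toolkits.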
Logistic Stick-Breaking Process
Cited by 3 (2 self)
A logistic stick-breaking process (LSBP) is proposed for non-parametric clustering of general spatially- or temporally-dependent data, imposing the belief that proximate data are more likely to be clustered together. The sticks in the LSBP are realized via multiple logistic regression functions, with shrinkage priors employed to favor contiguous and spatially localized segments. The LSBP is also extended for the simultaneous processing of multiple data sets, yielding a hierarchical logistic stick-breaking process (H-LSBP). The model parameters (atoms) within the H-LSBP are shared across the multiple learning tasks. Efficient variational Bayesian inference is derived, and comparisons are made to related techniques in the literature. Experimental analysis is performed for audio waveforms and images, and it is demonstrated that for segmentation applications the LSBP yields generally homogeneous segments with sharp boundaries.
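The idea of sticks realized through logistic functions can be illustrated with a minimal sketch. It uses the generic stick-breaking construction with a logistic link; the one-dimensional position `t`, the slope, and the two breakpoints are hypothetical values chosen for illustration, and the paper's actual regression functions and shrinkage priors are not reproduced here.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def stick_breaking_weights(scores):
    """Turn K-1 real-valued scores into K cluster weights that sum to 1.

    Stick k takes a sigmoid(score_k) fraction of whatever length is left
    after sticks 1..k-1; the final cluster receives the remainder.
    """
    weights = []
    remaining = 1.0
    for g in scores:
        frac = sigmoid(g)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)
    return weights

def scores_at(t, breakpoints=(2.0, 6.0), slope=5.0):
    """Hypothetical position-dependent scores: stick k is favoured before breakpoint k."""
    return [slope * (b - t) for b in breakpoints]

for t in (0.0, 4.0, 8.0):
    w = stick_breaking_weights(scores_at(t))
    print(t, [round(x, 4) for x in w])
```

Because the scores vary smoothly with position, nearby positions receive nearly identical weights, which is what encourages contiguous segments in this toy setting.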
Audio Classification of Bird Species: a Statistical Manifold Approach
Cited by 2 (1 self)
Our goal is to automatically identify which species of bird is present in an audio recording using supervised learning. Devising effective algorithms for bird species classification is a preliminary step toward extracting useful ecological data from recordings collected in the field. We propose a probabilistic model for audio features within a short interval of time, then derive its Bayes risk-minimizing classifier, and show that it is closely approximated by a nearest-neighbor classifier using Kullback-Leibler divergence to compare histograms of features. We note that feature histograms can be viewed as points on a statistical manifold, and KL divergence approximates geodesic distances defined by the Fisher information metric on such manifolds. Motivated by this fact, we propose the use of another approximation to the Fisher information metric, namely the Hellinger metric. The proposed classifiers achieve over 90% accuracy on a data set containing six species of bird, and outperform support vector machines.
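A nearest-neighbor classifier over feature histograms, with either KL divergence or the Hellinger distance as the comparison, might look like the sketch below. The toy histograms, the labels `species_a` and `species_b`, the smoothing constant, and the argument order of the KL divergence are all assumptions for illustration; the abstract does not specify them.

```python
import math

def normalize(counts, eps=1e-9):
    """Turn raw bin counts into a probability histogram, smoothing empty bins."""
    total = sum(counts) + eps * len(counts)
    return [(c + eps) / total for c in counts]

def kl_divergence(p, q):
    """KL(p || q) for two histograms with strictly positive bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def hellinger(p, q):
    """Hellinger distance between two histograms; always lies in [0, 1]."""
    s = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(0.5 * s)

def nearest_neighbor(query, labeled, dist):
    """Return the label of the training histogram closest to the query."""
    return min(labeled, key=lambda item: dist(query, item[0]))[1]

# Toy training set: one histogram per (hypothetical) species.
training = [
    (normalize([8, 1, 1]), "species_a"),
    (normalize([1, 1, 8]), "species_b"),
]
query = normalize([6, 3, 1])

print(nearest_neighbor(query, training, kl_divergence))
print(nearest_neighbor(query, training, hellinger))
```

Swapping the `dist` argument is the only change needed to move between the two measures, which makes side-by-side comparison straightforward.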
Unsupervised speaker change detection for broadcast news segmentation
- In Proc. EUSIPCO, 2006
Cited by 1 (1 self)
This paper presents a speaker change detection system for news broadcast segmentation based on a vector quantization (VQ) approach. The system does not make any assumption about the number of speakers or speaker identity. The system uses mel frequency cepstral coefficients; change detection is done using the VQ distortion measure and is evaluated against two other statistics, namely the symmetric Kullback-Leibler (KL2) distance and the so-called ‘divergence shape distance’. First level alarms are further tested using the VQ distortion. We find that the false alarm rate can be reduced without significant losses in the detection of correct changes. We furthermore evaluate the generalizability of the approach by testing the complete system on an independent set of broadcasts, including a channel not present in the training set.
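As a rough illustration of a VQ distortion measure, the sketch below scores a window of feature frames against a codebook built from the preceding window. The two-dimensional frames, the fixed codebook, and the use of squared Euclidean distance are assumptions for illustration; the abstract does not describe the codebook training or any threshold.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def vq_distortion(frames, codebook):
    """Average squared distance from each frame to its nearest codeword."""
    total = sum(min(squared_distance(f, c) for c in codebook) for f in frames)
    return total / len(frames)

# Hypothetical codebook, standing in for one trained on the window
# before a candidate change point.
codebook = [(0.0, 0.0), (1.0, 1.0)]

# Frames after the candidate point: one window resembling the codebook,
# one window that does not.
same_speaker = [(0.1, 0.0), (0.9, 1.1)]
new_speaker = [(5.0, 5.0), (6.0, 5.0)]

print(vq_distortion(same_speaker, codebook))
print(vq_distortion(new_speaker, codebook))
```

A low distortion suggests the new window is well described by the earlier speaker's codebook, while a high value flags a candidate change.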
Sticky Hidden Markov Modeling of Comparative Genomic Hybridization
We develop a sticky hidden Markov model (HMM) with a Dirichlet distribution (DD) prior, motivated by the problem of analyzing comparative genomic hybridization (CGH) data. As formulated, the sticky DD-HMM prior is employed to infer the number of states in an HMM, while also imposing state persistence. The form of the proposed hierarchical model allows efficient variational Bayesian (VB) inference, of interest for large-scale CGH problems. We compare alternative formulations of the sticky HMM, while also examining the relative efficacy of VB and Markov chain Monte Carlo (MCMC) inference. To validate the formulation, example results are presented for an illustrative synthesized data set, and for speaker diarization from audio data (the first problem class for which the sticky HMM was developed). Our main application is CGH, for which we consider data for breast cancer. For the latter, we also make comparisons and partially validate the CGH analysis through factor analysis of associated (but distinct) gene-expression data.
WebVoice: A Toolkit for Perceptual Insights into Speech Processing
Feature extraction and modeling techniques for speech processing are often complex. Understanding a new technique theoretically can be difficult for a novice, just as it is difficult for a practitioner to find the best parameter settings and/or combination of methods for a new task or data. In this paper, a novel approach and a corresponding software toolkit for facilitating both education and experimentation in speech processing are presented: listening to the results of feature extraction and modeling is made possible via resynthesis of intermediate pattern recognition results. The software is made publicly available as a web service called WebVoice with accompanying user interfaces for ease of use.
Technical Report (Not Peer Reviewed): Acoustic Classification of Bird Species from Syllables: an Empirical Study
In order to automatically extract ecologically useful information from audio recordings of birds, we need fast and accurate algorithms to classify bird sounds. We conduct a large-scale empirical study to evaluate algorithms for classifying bird species from audio using combinations of 3 feature sets (Mel-frequency cepstral coefficients, average spectra, and noise-robust measurements), with 10 classifiers including support vector machines and AdaBoost (with J48), on a 2.49 Gb data set consisting of recordings of 20 species of bird from the Cornell Macaulay library. Using implementations from the Weka machine learning system, Random Forest is always close to AdaBoost and usually more accurate than SVM, while being an order of magnitude faster than both.

