Analysis of the Meter of Acoustic Musical Signals
IEEE Trans. Speech and Audio Processing, 2004
Cited by 59 (7 self)
A method is described which analyzes the basic pattern of beats in a piece of music, the musical meter. The analysis is performed jointly at three different time scales: at the temporally atomic tatum pulse level, at the tactus pulse level which corresponds to the tempo of a piece, and at the musical measure level. Acoustic signals from arbitrary musical genres are considered. For the initial time-frequency analysis, a new technique is proposed which measures the degree of musical accent as a function of time at four different frequency ranges. This is followed by a bank of comb filter resonators which extracts features for estimating the periods and phases of the three pulses. The features are processed by a probabilistic model which represents primitive musical knowledge and uses the low-level observations to perform joint estimation of the tatum, tactus, and measure pulses. The model takes into account the temporal dependencies between successive estimates and enables both causal and noncausal analysis. The method is validated using a manually annotated database of 474 music signals from various genres. The method works robustly for different types of music and improves over two state-of-the-art reference methods in simulations.
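The comb filter stage named in this abstract can be illustrated with a minimal sketch. It assumes a single accent signal, a fixed feedback gain, and raw output energy as the score for each candidate delay. The function names, the gain value, and the scoring rule are hypothetical choices for illustration and are not taken from the paper, which passes such features on to a probabilistic model.

```python
def comb_energy(accent, delay, gain=0.9):
    """Energy of a feedback comb filter y[n] = (1 - gain) * x[n] + gain * y[n - delay]."""
    y = [0.0] * len(accent)
    for n, x in enumerate(accent):
        # Feedback term is zero until the delay line has filled.
        feedback = y[n - delay] if n >= delay else 0.0
        y[n] = (1.0 - gain) * x + gain * feedback
    return sum(v * v for v in y)


def estimate_period(accent, delays, gain=0.9):
    """Return the candidate delay (in samples) whose resonator responds most strongly."""
    return max(delays, key=lambda d: comb_energy(accent, d, gain))


# Toy accent signal (assumed input): one unit pulse every 20 samples.
accent = [1.0 if n % 20 == 0 else 0.0 for n in range(400)]
period = estimate_period(accent, range(5, 61))
```

With a fixed gain, resonators tuned to integer multiples of the true period approach the same steady-state energy, so on long signals this simple score cannot separate them reliably. A practical system would need some form of normalization or prior knowledge to choose among them.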
Multiple Fundamental Frequency Estimation Based on Harmonicity and Spectral Smoothness
2003
Cited by 46 (5 self)
A new method for estimating the fundamental frequencies of concurrent musical sounds is described. The method is based on an iterative approach, where the fundamental frequency of the most prominent sound is estimated, the sound is subtracted from the mixture, and the process is repeated for the residual signal. For the estimation stage, an algorithm is proposed which utilizes the frequency relationships of simultaneous spectral components, without assuming ideal harmonicity. For the subtraction stage, the spectral smoothness principle is proposed as an efficient new mechanism in estimating the spectral envelopes of detected sounds. With these techniques, multiple fundamental frequency estimation can be performed quite accurately in a single time frame, without the use of long-term temporal features. The experimental data comprised recorded samples of 30 musical instruments from four different sources. Multiple fundamental frequency estimation was performed for random sound source and pitch combinations. Error rates for mixtures ranging from one to six simultaneous sounds were 1.8%, 3.9%, 6.3%, 9.9%, 14%, and 18%, respectively. In musical interval and chord identification tasks, the algorithm outperformed the average of ten trained musicians. The method works robustly in noise, and is able to handle sounds that exhibit inharmonicities. The inharmonicity factor and spectral envelope of each sound are estimated along with the fundamental frequency.
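The iterative estimate-and-subtract loop described in this abstract can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the paper's algorithm: the spectrum is a plain list of magnitudes indexed by bin, the estimation stage is naive harmonic summation with ideal harmonicity, and the subtraction stage zeroes every harmonic bin instead of applying spectral smoothness. All function names and parameters are hypothetical.

```python
def harmonic_salience(spectrum, f0_bin, n_harmonics=5):
    """Sum the magnitudes at integer multiples of f0_bin (naive harmonic summation)."""
    return sum(spectrum[h * f0_bin]
               for h in range(1, n_harmonics + 1)
               if h * f0_bin < len(spectrum))


def iterative_f0(spectrum, candidates, max_sounds=3, min_salience=1.0, n_harmonics=5):
    """Repeatedly pick the most salient F0 candidate and cancel it from the residual."""
    residual = list(spectrum)
    found = []
    for _ in range(max_sounds):
        best = max(candidates,
                   key=lambda c: harmonic_salience(residual, c, n_harmonics))
        if harmonic_salience(residual, best, n_harmonics) < min_salience:
            break  # nothing prominent is left in the residual
        found.append(best)
        # Crude cancellation: remove all energy at the harmonic bins.
        for h in range(1, n_harmonics + 1):
            if h * best < len(residual):
                residual[h * best] = 0.0
    return found


# Toy mixture (assumed data): two harmonic sounds with F0s at bins 100 and 150.
spectrum = [0.0] * 1000
for h in range(1, 6):
    spectrum[100 * h] += 1.0 / h
    spectrum[150 * h] += 0.8 / h
f0_bins = iterative_f0(spectrum, range(50, 200))
```

In the toy mixture, bin 300 is shared by both sounds, and the crude cancellation removes all of its energy when the first sound is subtracted. The second sound is still found here, but this kind of over-subtraction at coinciding partials is the problem that a smoother estimate of each sound's spectral envelope is meant to reduce.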
Signal Processing Methods for the Automatic Transcription of Music
2004
Cited by 33 (3 self)
Signal processing methods for the automatic transcription of music are developed in this thesis. Music transcription is here understood as the process of analyzing a music signal so as to write down the parameters of the sounds that occur in it. The applied notation can be the traditional musical notation or any symbolic representation which gives sufficient information for performing the piece using the available musical instruments. Recovering the musical notation automatically for a given acoustic signal allows musicians to reproduce and modify the original performance. Another principal application is structured audio coding: a MIDI-like representation is extremely compact yet retains the identifiability and characteristics of a piece of music to an important degree. The scope of this thesis is in the automatic transcription of the harmonic and melodic parts of real-world music signals. Detecting or labeling the sounds of percussive instruments (drums) is not attempted, although the presence of these is allowed in the target signals. Algorithms are proposed that address two distinct subproblems of music transcription. The main part of the thesis is dedicated to multiple fundamental frequency (F0) estimation, that is, estimation of the F0s of several concurrent musical sounds. The other subproblem addressed is musical meter estimation. This has to do with rhythmic aspects of music and refers to the estimation of the regular pattern of strong and weak beats in a piece of music. For multiple-F0 estimation, two different algorithms are proposed. Both methods are based on an iterative approach, where the F0 of the most prominent sound is estimated, the sound is cancelled from the mixture, and the process is repeated for the residual. The first method is derived in a prag...
An overview of text-independent speaker recognition: from features to supervectors
2009
Cited by 31 (14 self)
This paper gives an overview of automatic speaker recognition technology, with an emphasis on text-independent recognition. Speaker recognition has been studied actively for several decades. We give an overview of both the classical and the state-of-the-art methods. We start with the fundamentals of automatic speaker recognition, concerning feature extraction and speaker modeling. We elaborate advanced computational techniques to address robustness and session variability. The recent progress from vectors towards supervectors opens up a new area of exploration and represents a technology trend. We also provide an overview of this recent development and discuss the evaluation methodology of speaker recognition systems. We conclude the paper with discussion on future directions.
Automatic music transcription as we know it today
Journal of New Music Research, 2004
Cited by 18 (0 self)
The aim of this overview is to describe methods for the automatic transcription of Western polyphonic music. The transcription task is here understood as transforming an acoustic musical signal into a MIDI-like symbolic representation. Only pitched musical instruments are considered: recognizing the sounds of drum instruments is not discussed. The main emphasis is laid on estimating the multiple fundamental frequencies of several concurrent sounds. Various approaches to solve this problem are discussed, including methods that are based on modelling the human auditory periphery, methods that mimic the human auditory scene analysis function, signal model-based Bayesian inference methods, and data-adaptive methods. Another subproblem addressed is the rhythmic parsing of acoustic musical signals. From the transcription point of view, this amounts to the temporal segmentation of music signals at different time scales. The relationship between the two subproblems and the general structure of the transcription problem is discussed.
A Probabilistic Model for the Transcription of Single-Voice Melodies
Tampere University of Technology, 2003
Cited by 16 (3 self)
A method is proposed for the automatic transcription of single-voice melodies from an acoustic waveform into a symbolic musical notation (a MIDI file). The system consists of a signal processing front-end which calculates a continuous pitch track and of a probabilistic model which converts the pitch track into a discrete musical notation. Our proposed probabilistic model consists of three parts operating in parallel: a pitch trajectory model, a musicological model, and a duration model. The first handles imperfections in the performed/estimated pitch values using a hidden Markov model, the second estimates musical key signature to improve the transcription accuracy, and the last models the duration of the notes.
Pitch-based emphasis detection for characterization of meeting recordings
In Proc. ASRU, Virgin Islands, 2003
Cited by 13 (0 self)
The automatic extraction of key utterances in spoken data has emerged as an interesting and difficult topic in automatic speech recognition. “Emphasis” or “excitement” may be a useful identifier for these utterances of interest. In this paper, we undertake the task of reliably and automatically identifying emphasized or excited utterances in natural speech in a meeting setting. We start by endeavoring to establish reliable ground truth emphasis labels by using several hand-labelers. The results show that human listeners can reliably identify emphasized utterances in meeting recordings. We then build an automatic emphasis detection system, which uses normalized pitch as its only acoustic predictor. The results show that this pitch-based emphasis detection scheme can distinguish between non-emphasized and emphasized utterances with an accuracy of 92% when ambiguous cases are excluded, a rate comparable to human interlabeler agreement.
Modelling of note events for singing transcription
In Proc. ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio, 2004
Cited by 12 (5 self)
This paper concerns the automatic transcription of music and proposes a method for transcribing sung melodies. The method produces symbolic notations (i.e., MIDI files) from acoustic inputs based on two probabilistic models: a note event model and a musicological model. Note events are described with a hidden Markov model (HMM) using four musical features: pitch, voicing, accent, and metrical accent. The model uses these features to calculate the likelihoods of different notes and performs note segmentation. The musicological model applies key estimation and the likelihoods of two-note and three-note sequences to determine transition likelihoods between different note events. These two models form a melody transcription system with a modular architecture which can be extended with desired front-end feature extractors and musicological rules. The system correctly transcribes over 90% of notes, thus halving the number of errors compared to a simple rounding of pitch estimates to the nearest MIDI note.
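The baseline mentioned in this abstract, rounding pitch estimates to the nearest MIDI note, can be sketched as follows. The conversion assumes equal temperament with A4 = 440 Hz mapped to MIDI note 69. The function names and the use of None for unvoiced frames are illustrative assumptions, not details from the paper.

```python
import math


def hz_to_midi(freq_hz, a4_hz=440.0):
    """Round a frequency in Hz to the nearest MIDI note number (A4 = 69)."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return int(round(69 + 12 * math.log2(freq_hz / a4_hz)))


def round_pitch_track(track_hz):
    """Map per-frame pitch estimates to MIDI notes; None marks an unvoiced frame."""
    return [None if f is None else hz_to_midi(f) for f in track_hz]


# Assumed toy pitch track: A4, an unvoiced frame, then A5.
notes = round_pitch_track([440.0, None, 880.0])
```

Because each frame is rounded independently, a sung note that drifts across a semitone boundary is split into several notes. Frame-wise rounding has no notion of note events or musical context, which is the gap the note event and musicological models address.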
Comparative evaluation of F0 estimation algorithms
Cited by 9 (1 self)
This paper reports the comparative evaluation of several speech F0 estimation algorithms over a wide database of laryngograph-labeled speech. Included are several classic algorithms that are available in software on the net, as well as two new algorithms that offer greatly reduced error rates. Particular attention is given to the methodology of evaluation.
A Hierarchical Approach to Onset Detection
In Proc. Int. Computer Music Conference, 2006
Cited by 9 (0 self)
Onset detection in vocal music and many other instruments is complicated by the possibility of soft transitions between notes. Most systems try to identify onsets within a short-time window as it is easier to define transition functions over a restricted space. However, it may not be possible to detect soft onsets without considering a long-time window, for which defining and computing the transition function can be hard and computationally costly. We present a method which looks for onsets between locations of increasing distance and is able to capture such onsets without considering all the points within the window. For the onset identification function we use both a simple manual function and support vector machines trained using a labelled corpus.

