Automatic music transcription: challenges and future directions
- J INTELL INF SYST, 2013
Cited by 15 (4 self)
Automatic music transcription is considered by many to be a key enabling technology in music signal processing. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse limitations of current methods and identify promising directions for future research. Current transcription methods use general purpose models which are unable to capture the rich diversity found in music signals. One way to overcome the limited performance of transcription systems is to tailor algorithms to specific use-cases. Semi-automatic approaches are another way of achieving a more reliable transcription. Also, the wealth of musical scores and corresponding
AUTOMATIC MUSIC TRANSCRIPTION: BREAKING THE GLASS CEILING
2012
Cited by 11 (6 self)
Automatic music transcription is considered by many to be the Holy Grail in the field of music signal analysis. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse limitations of current methods and identify promising directions for future research. Current transcription methods use general purpose models which are unable to capture the rich diversity found in music signals. In order to overcome the limited performance of transcription systems, algorithms have to be tailored to specific use-cases. Semiautomatic approaches are another way of achieving a more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available are a rich potential source of training data, via forced alignment of audio to scores, but large scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information across different methods and musical aspects.
DISCRIMINATIVE NON-NEGATIVE MATRIX FACTORIZATION FOR MULTIPLE PITCH ESTIMATION
Cited by 5 (1 self)
In this paper, we present a supervised method to improve the multiple pitch estimation accuracy of the non-negative matrix factorization (NMF) algorithm. The idea is to extend the sparse NMF framework by incorporating pitch information present in time-aligned musical scores in order to extract features that enforce the separability between pitch labels. We introduce two discriminative criteria that maximize inter-class scatter and quantify the predictive potential of a given decomposition using logistic regressors. Those criteria are applied to both the latent variable and the deterministic autoencoder views of NMF, and we devise efficient update rules for each. We evaluate our method on three polyphonic datasets of piano recordings and orchestral instrument mixes. Both models greatly enhance the quality of the basis spectra learned by NMF and the accuracy of multiple pitch estimation.
POLYPHONIC PIANO TRANSCRIPTION USING NON-NEGATIVE MATRIX FACTORISATION WITH GROUP SPARSITY
Cited by 4 (0 self)
Non-negative Matrix Factorisation (NMF) is a popular tool in musical signal processing. However, problems using this methodology in the context of Automatic Music Transcription (AMT) have been noted, resulting in the proposal of supervised and constrained variants of NMF for this purpose. Group sparsity has previously been seen to be effective for AMT when used with stepwise methods. In this paper group sparsity is introduced to supervised NMF decompositions and a dictionary tuning approach to AMT is proposed based upon group sparse NMF using the β-divergence. Experimental results are given showing improved AMT results over the state-of-the-art NMF-based AMT system.
Index Terms: Automatic music transcription, non-negative matrix factorisation, group sparsity
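To make the idea of a supervised NMF decomposition concrete, here is a minimal sketch in Python. It is not the method from the paper above. It assumes a fixed, pre-learned dictionary `W` (one spectral template per pitch) and estimates the activations `h` for a single spectrum frame `v` using the standard multiplicative update for the Kullback-Leibler divergence, which is the β = 1 case of the β-divergence. Group sparsity and dictionary tuning are left out, and the function name, the toy templates, and the iteration count are all illustrative assumptions.

```python
def estimate_activations(v, W, n_iter=200, eps=1e-12):
    """Estimate non-negative activations h such that W h approximates v.

    v: list of F non-negative magnitudes (one spectrum frame).
    W: list of F rows, each holding K template values (fixed dictionary).
    Uses the multiplicative update for the KL divergence (beta = 1).
    """
    n_bins, n_templates = len(W), len(W[0])
    h = [1.0] * n_templates  # positive initialisation
    # Column sums of W form the denominator of the KL update.
    col_sums = [sum(W[f][k] for f in range(n_bins)) + eps
                for k in range(n_templates)]
    for _ in range(n_iter):
        # Current reconstruction W h (eps avoids division by zero).
        approx = [sum(W[f][k] * h[k] for k in range(n_templates)) + eps
                  for f in range(n_bins)]
        ratio = [v[f] / approx[f] for f in range(n_bins)]
        h = [h[k] * sum(W[f][k] * ratio[f] for f in range(n_bins)) / col_sums[k]
             for k in range(n_templates)]
    return h


# Toy dictionary (hypothetical): two templates that overlap in bin 1.
W = [[1.0, 0.0],
     [0.5, 0.5],
     [0.0, 1.0],
     [0.0, 0.0]]
# Frame synthesised from activations [2.0, 0.5].
v = [2.0, 1.25, 0.5, 0.0]
h = estimate_activations(v, W)
```

In a transcription setting, each entry of `h` would be read as the strength of one pitch in the frame, with a later decision step turning strengths into notes.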
Automatic Transcription of Polyphonic Music Exploiting Temporal Evolution
2012
Cited by 4 (1 self)
Automatic music transcription is the process of converting an audio recording into a symbolic representation using musical notation. It has numerous applications in music information retrieval, computational musicology, and the creation of interactive systems. Even for expert musicians, transcribing polyphonic pieces of music is not a trivial task, and while the problem of automatic pitch estimation for monophonic signals is considered to be solved, the creation of an automated system able to transcribe polyphonic music without setting restrictions on the degree of polyphony and the instrument type still remains
Automatic Transcription of Turkish Makam Music
- In Proceedings of ISMIR - International Conference on Music Information Retrieval, 2013
Cited by 3 (2 self)
In this paper we propose an automatic system for transcribing makam music of Turkey. We document the specific traits of this music that deviate from properties that were targeted by transcription tools so far and we compile a dataset of makam recordings along with aligned microtonal ground-truth. An existing multi-pitch detection algorithm is adapted for transcribing music in 20 cent resolution, and the final transcription is centered around the tonic frequency of the recording. Evaluation metrics for transcribing microtonal music are utilized and results show that transcription of Turkish makam music in e.g. an interactive transcription software is feasible using the current state-of-the-art.
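As a rough illustration of what "20 cent resolution, centered around the tonic" can mean in practice, the Python sketch below converts a detected frequency to cents relative to a tonic and snaps it to a 20 cent grid. The conversion uses the standard definition of 1200 cents per octave. The function names and the rounding scheme are assumptions for illustration, not details from the paper.

```python
import math


def cents_above_tonic(freq_hz, tonic_hz):
    """Interval from the tonic to freq_hz in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(freq_hz / tonic_hz)


def quantize_cents(cents, resolution=20):
    """Snap an interval in cents to the nearest multiple of `resolution`."""
    return resolution * round(cents / resolution)


# A just perfect fifth (ratio 3:2) above a 220 Hz tonic is about 702 cents,
# which lands on the 700 cent grid point at 20 cent resolution.
fifth = cents_above_tonic(330.0, 220.0)
grid_point = quantize_cents(fifth)
```

Working in cents relative to the tonic keeps the representation independent of the absolute tuning of a given recording.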
THE TEMPERAMENT POLICE: THE TRUTH, THE GROUND TRUTH, AND NOTHING BUT THE TRUTH
Cited by 1 (1 self)
The tuning system of a keyboard instrument is chosen so that frequently used musical intervals sound as consonant as possible. Temperament refers to the compromise arising from the fact that not all intervals can be maximally consonant simultaneously. Recent work showed that it is possible to estimate temperament from audio recordings with no prior knowledge of the musical score, using a conservative (high precision, low recall) automatic transcription algorithm followed by frequency estimation using quadratic interpolation and bias correction from the log magnitude spectrum. In this paper we develop a harpsichord-specific transcription system to analyse over 500 recordings of solo harpsichord music for which the temperament is specified on the CD sleeve notes. We compare the measured temperaments with the annotations and discuss the differences between temperament as a theoretical construct and as a practical issue for professional performers and tuners. The implications are that ground truth is not always scientific truth, and that content-based analysis has an important role in the study of historical performance practice.
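The quadratic interpolation step mentioned in this abstract can be sketched as follows in Python. The sketch fits a parabola through the log magnitudes at a spectral peak bin and its two neighbours and takes the vertex as the refined peak position. This is the generic textbook form of the technique. The bias correction the abstract refers to is not included, and the function names are illustrative assumptions.

```python
def interpolate_peak(log_mags, k):
    """Refine a spectral peak at bin k by parabolic interpolation.

    log_mags: log magnitude spectrum as a list of floats.
    k: index of a local maximum, with 1 <= k <= len(log_mags) - 2.
    Returns the fractional bin position of the parabola's vertex.
    """
    a, b, c = log_mags[k - 1], log_mags[k], log_mags[k + 1]
    curvature = a - 2.0 * b + c
    if curvature == 0.0:
        return float(k)  # flat neighbourhood: no refinement possible
    return k + 0.5 * (a - c) / curvature


def bin_to_hz(bin_position, sample_rate, fft_size):
    """Convert a (possibly fractional) FFT bin position to a frequency in Hz."""
    return bin_position * sample_rate / fft_size


# Samples of a parabola whose true vertex lies at bin 3.3.
log_mags = [-(x - 3.3) ** 2 for x in range(7)]
refined_bin = interpolate_peak(log_mags, 3)
frequency = bin_to_hz(refined_bin, sample_rate=44100, fft_size=4096)
```

Because the offset is computed from three samples only, its accuracy depends on how closely the window's main lobe resembles a parabola in the log domain, which is why a separate bias correction is often applied afterwards.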
UNSUPERVISED TRAINING OF DETECTION THRESHOLD FOR POLYPHONIC MUSICAL NOTE TRACKING BASED ON EVENT PERIODICITY
Cited by 1 (1 self)
A common approach to the detection of simultaneous musical notes in an acoustic recording involves defining a function that yields activation levels for each candidate musical note over time. These levels tend to be high when the note is active and low when it is not. Therefore, by applying a simple threshold decision process, it is possible to decide whether each note is active or not at a given time. Such a threshold, in general, is hard to set and has no physical meaning. In this paper, it is shown that the rhythmic characteristic of the musical signal may be used to obtain a suitable threshold. The proposed method for obtaining the threshold is shown to have a greater generalization capability over different databases.
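The simple threshold decision process described in this abstract can be sketched in Python as below. The sketch marks each frame of each candidate note as active when its activation exceeds a fixed threshold, then groups consecutive active frames into note events. It does not implement the paper's contribution, which is choosing the threshold from rhythmic periodicity; here the threshold is simply passed in by hand. All names and the toy activation values are illustrative assumptions.

```python
def notes_active(activations, threshold):
    """Per pitch, flag each frame whose activation level exceeds the threshold."""
    return {pitch: [level > threshold for level in levels]
            for pitch, levels in activations.items()}


def note_events(active_frames):
    """Group consecutive active frames into (onset, offset) frame pairs.

    The offset index is exclusive.
    """
    events = []
    onset = None
    for i, is_on in enumerate(active_frames):
        if is_on and onset is None:
            onset = i  # note starts
        elif not is_on and onset is not None:
            events.append((onset, i))  # note ends
            onset = None
    if onset is not None:
        events.append((onset, len(active_frames)))  # note held to the end
    return events


# Hypothetical activation levels for MIDI pitches 60 and 64 over six frames.
activations = {
    60: [0.1, 0.8, 0.9, 0.2, 0.1, 0.7],
    64: [0.0, 0.1, 0.6, 0.7, 0.1, 0.0],
}
active = notes_active(activations, threshold=0.5)
events = {pitch: note_events(frames) for pitch, frames in active.items()}
```

With these toy values, raising or lowering `threshold` changes which frames count as active, which illustrates why the abstract calls the threshold hard to set.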
EXPLICIT DURATION HIDDEN MARKOV MODELS FOR MULTIPLE-INSTRUMENT POLYPHONIC MUSIC TRANSCRIPTION
Cited by 1 (0 self)
In this paper, a method for multiple-instrument automatic music transcription is proposed that models the temporal evolution and duration of tones. The proposed model supports the use of spectral templates per pitch and instrument which correspond to sound states such as attack, sustain, and decay. Pitch-wise explicit duration hidden Markov models (EDHMMs) are integrated into a convolutive probabilistic framework for modelling the temporal evolution and duration of the sound states. A two-stage transcription procedure integrating note tracking information is performed in order to provide more robust pitch estimates. The proposed system is evaluated on multi-pitch detection and instrument assignment using various publicly available datasets. Results show that the proposed system outperforms a hidden Markov model-based transcription system using the same framework, as well as several state-of-the-art automatic music transcription systems.
Automatic transcription of pitched and unpitched sounds from polyphonic music
- in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP
Cited by 1 (1 self)
Automatic transcription of polyphonic music has been an active research field for several years and is considered by many to be a key enabling technology in music signal processing. However, current transcription approaches either focus on detecting pitched sounds (from pitched musical instruments) or on detecting unpitched sounds (from drum kits). In this paper, we propose a method that jointly transcribes pitched and unpitched sounds from polyphonic music recordings. The proposed model extends the probabilistic latent component analysis algorithm and supports the detection of pitched sounds from multiple instruments as well as the detection of unpitched sounds from drum kit components, including bass drums, snare drums, cymbals, hi-hats, and toms. Our experiments based on polyphonic Western music containing both pitched and unpitched instruments led to very encouraging results in multi-pitch detection and drum transcription tasks.
Index Terms: Music signal analysis, automatic music transcription, multi-pitch detection, drum transcription