Results 1 - 10 of 20
Source/filter model for unsupervised main melody extraction from polyphonic audio signals
- IEEE Trans. on Audio, Speech, and Language Processing
, 2010
Cited by 37 (8 self)
Abstract—Extracting the main melody from a polyphonic music recording seems natural even to untrained human listeners. To a certain extent it is related to the concept of source separation, with the human ability of focusing on a specific source in order to extract relevant information. In this paper, we propose a new approach for the estimation and extraction of the main melody (and in particular the leading vocal part) from polyphonic audio signals. To that aim, we propose a new signal model where the leading vocal part is explicitly represented by a specific source/filter model. The proposed representation is investigated in the framework of two statistical models: a Gaussian Scaled Mixture Model (GSMM) and an extended Instantaneous Mixture Model (IMM). For both models, the estimation of the different parameters is done within a maximum-likelihood framework adapted from single-channel source separation techniques. The desired sequence of fundamental frequencies is then inferred from the estimated parameters. The results obtained in a recent evaluation campaign (MIREX08) show that the proposed approaches are very promising and reach state-of-the-art performance on all test sets.
Index Terms—Blind audio source separation, Expectation-Maximization (EM) algorithm, Gaussian scaled mixture model (GSMM), main melody extraction, maximum likelihood, music, non-negative matrix factorization (NMF), source/filter model, spectral analysis.
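The paper fits its GSMM/IMM parameters with EM; as a loose illustration of the NMF-style spectral factorization these models build on, a minimal multiplicative-update NMF sketch could look like the following (function and variable names are illustrative, not the paper's algorithm):

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9):
    """Multiplicative-update NMF minimizing Euclidean distance: V ~= W @ H.

    V: non-negative magnitude spectrogram (freq x frames).
    Returns non-negative factors W (spectral templates) and H (activations).
    """
    rng = np.random.default_rng(0)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        # Lee-Seung multiplicative updates; preserve non-negativity.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

In a source/filter setting, the melody source would get a dedicated, structured set of templates rather than the free templates used here.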
MUSIC/VOICE SEPARATION USING THE SIMILARITY MATRIX
Cited by 9 (1 self)
Repetition is a fundamental element in generating and perceiving structure in music. Recent work has applied this principle to separate the musical background from the vocal foreground in a mixture, by simply extracting the underlying repeating structure. While existing methods are effective, they depend on an assumption of periodically repeating patterns. In this work, we generalize the repetition-based source separation approach to handle cases where repetitions also happen intermittently or without a fixed period, thus allowing the processing of music pieces with fast-varying repeating structures and isolated repeating elements. Instead of looking for periodicities, the proposed method uses a similarity matrix to identify the repeating elements. It then calculates a repeating spectrogram model using the median and extracts the repeating patterns using time-frequency masking. Evaluation on a data set of 14 full-track real-world pop songs showed that using a similarity matrix can overall improve separation performance compared with a previous repetition-based source separation method and a recent competitive music/voice separation method, while still being computationally efficient.
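The similarity-matrix idea described above can be sketched as follows. The choice of cosine similarity, the number of similar frames k, and all names are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def repeating_background_mask(S, k=5, eps=1e-9):
    """Sketch of similarity-based repeating-pattern masking.

    S: magnitude spectrogram (freq x frames).
    For each frame, find its k most similar frames (cosine similarity),
    take the element-wise median as the repeating model, and derive a
    soft mask for the repeating (background) component.
    """
    norms = np.linalg.norm(S, axis=0) + eps
    sim = (S.T @ S) / np.outer(norms, norms)      # frame-to-frame similarity
    n_frames = S.shape[1]
    model = np.empty_like(S)
    for j in range(n_frames):
        idx = np.argsort(sim[j])[::-1][:k]        # k most similar frames
        model[:, j] = np.median(S[:, idx], axis=1)
    model = np.minimum(model, S)                  # repeating part <= mixture
    return model / (S + eps)                      # soft background mask in [0, 1]
```

Applying the mask to the mixture spectrogram yields the background estimate; one minus the mask yields the vocal foreground.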
A Tandem Algorithm for Singing Pitch Extraction and Voice Separation From Music Accompaniment
Cited by 7 (0 self)
Abstract—Singing pitch estimation and singing voice separation are challenging due to the presence of music accompaniments that are often nonstationary and harmonic. Inspired by computational auditory scene analysis (CASA), this paper investigates a tandem algorithm that estimates the singing pitch and separates the singing voice jointly and iteratively. Rough pitches are first estimated and then used to separate the target singer by considering harmonicity and temporal continuity. The separated singing voice and estimated pitches are used to improve each other iteratively. To enhance the performance of the tandem algorithm on musical recordings, we propose a trend estimation algorithm to detect the pitch range of the singing voice in each time frame. The detected trend substantially reduces the difficulty of singing pitch detection by removing a large number of wrong pitch candidates produced either by musical instruments or by the overtones of the singing voice. Systematic evaluation shows that the tandem algorithm outperforms previous systems for pitch extraction and singing voice separation.
Index Terms—Computational auditory scene analysis (CASA), iterative procedure, pitch extraction, singing voice separation, tandem algorithm.
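The benefit of restricting the pitch search to a detected range can be illustrated with a toy autocorrelation pitch estimator whose search is confined to [fmin, fmax]. This is a sketch of the general idea, not the paper's CASA-based algorithm:

```python
import numpy as np

def pitch_autocorr(frame, sr, fmin=80.0, fmax=400.0):
    """Toy autocorrelation pitch estimate restricted to [fmin, fmax] Hz.

    Restricting the lag search range plays the role of a (fixed) pitch
    trend: candidates outside the range are never considered.
    """
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sr / fmax)                      # shortest allowed period
    lag_max = min(int(sr / fmin), len(ac) - 1)    # longest allowed period
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sr / lag
```

In the paper the range itself is estimated per frame (the "trend"); here it is simply a fixed argument.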
VOCALIST GENDER RECOGNITION IN RECORDED POPULAR MUSIC
Cited by 4 (3 self)
We introduce the task of vocalist gender recognition in popular music and evaluate the benefit of Non-Negative Matrix Factorization-based enhancement of melodic components for this aim. The underlying automatic separation of drum beats is described in detail, and the significant gain obtained by its use is verified in extensive test runs on a novel database of 1.5 days of MP3-coded popular songs, based on transcriptions from the karaoke game UltraStar. Support Vector Machines and Hidden Naive Bayes serve as classifiers. Overall, the suggested methods lead to fully automatic recognition of the predominant vocalist gender with 87.31% accuracy at the song level, for artists unknown to the system, in originally recorded music.
Automatic Transcription of Polyphonic Music Exploiting Temporal Evolution
, 2012
Cited by 4 (1 self)
Automatic music transcription is the process of converting an audio recording into a symbolic representation using musical notation. It has numerous applications in music information retrieval, computational musicology, and the creation of interactive systems. Even for expert musicians, transcribing polyphonic pieces of music is not a trivial task, and while the problem of automatic pitch estimation for monophonic signals is considered to be solved, the creation of an automated system able to transcribe polyphonic music without setting restrictions on the degree of polyphony and the instrument type still remains
Efficient implementation of a system for solo accompaniment separation in polyphonic music
- in Proc. 20th Eur. Signal Process. Conf
Cited by 3 (1 self)
Our goal is to obtain improved perceptual quality for separated solo instruments and accompaniment in polyphonic music. The proposed approach uses a pitch detection algorithm in conjunction with a spectral-filtering-based source separation. The algorithm was designed to work with polyphonic signals regardless of the main instrument, type of accompaniment, or musical style. Our approach features a fundamental frequency estimation stage, a refined harmonic structure for the spectral mask, and a post-processing stage to reduce artifacts. The processing chain has been kept light. The use of perceptual measures for quality assessment revealed improved quality in the extracted signals with respect to our previous approach. The results obtained with our algorithm were compared with other state-of-the-art algorithms under SISEC
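A harmonic spectral mask built from an estimated fundamental frequency, as used in the masking stage described above, can be sketched as follows. The harmonic count and bandwidth are illustrative assumptions, not the paper's refined harmonic structure:

```python
import numpy as np

def harmonic_mask(n_bins, sr, n_fft, f0, n_harmonics=10, width_hz=40.0):
    """Binary spectral mask selecting bins near the harmonics of f0.

    n_bins: number of spectrogram frequency bins (e.g. n_fft // 2 + 1).
    A band of width_hz is kept around each of the first n_harmonics
    multiples of f0; everything else is assigned to the accompaniment.
    """
    freqs = np.arange(n_bins) * sr / n_fft        # bin center frequencies
    mask = np.zeros(n_bins, dtype=bool)
    for h in range(1, n_harmonics + 1):
        mask |= np.abs(freqs - h * f0) <= width_hz / 2
    return mask
```

Multiplying a spectrogram frame by this mask keeps the solo estimate; the complement keeps the accompaniment.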
Music information retrieval meets music education
- Multimodal Music Processing, volume 3 of Dagstuhl Follow-Ups
, 2012
Cited by 2 (0 self)
This paper addresses the use of Music Information Retrieval (MIR) techniques in music education and their integration in learning software. A general overview of systems that are either commercially available or at the research stage is presented. Furthermore, three well-known MIR methods used in music learning systems, and the state of the art for each, are described: music transcription, solo and accompaniment track creation, and generation of performance instructions. As a representative example of a music learning system developed within the MIR community, the Songs2See software is outlined. Finally, challenges and directions for future research are described.
Automatic Transcription of Pitch Content in Music and Selected Applications
Cited by 2 (0 self)
Transcription of music refers to the analysis of a music signal in order to produce a parametric representation of the sounding notes in the signal. This is conventionally carried out by listening to a piece of music and writing down the symbols of common musical notation to represent the occurring notes in the piece. Automatic transcription of music refers to the extraction of such representations using signal-processing methods. This thesis concerns the automatic transcription of pitched notes in musical audio and its applications. Emphasis is laid on the transcription of realistic polyphonic music, where multiple pitched and percussive instruments sound simultaneously. The methods included in this thesis are based on a framework which combines low-level acoustic modeling and high-level musicological modeling. The emphasis in the acoustic modeling is on note events, so that the methods produce discrete-pitch notes with onset times and durations
Combining rhythm-based and pitch-based methods for background and melody separation
- IEEE/ACM Trans. Audio, Speech, and Language Processing
, 2014
Cited by 2 (0 self)
Abstract—Musical works are often composed of two characteristic components: the background (typically the musical accompaniment), which generally exhibits a strong rhythmic structure with distinctive repeating time elements, and the melody (typically the singing voice or a solo instrument), which generally exhibits a strong harmonic structure with a distinctive predominant pitch contour. Drawing from findings in cognitive psychology, we propose to investigate the simple combination of two dedicated approaches for separating those two components: a rhythm-based method that focuses on extracting the background via a rhythmic mask derived from identifying the repeating time elements in the mixture, and a pitch-based method that focuses on extracting the melody via a harmonic mask derived from identifying the predominant pitch contour in the mixture. Evaluation on a data set of song clips showed that combining two such contrasting yet complementary methods can help to improve separation performance, from the point of view of both components, compared with using only one of those methods, and also compared with two other state-of-the-art approaches.
Index Terms—Background, melody, pitch, rhythm, separation.
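One simple way to combine a rhythmic (background) mask and a harmonic (melody) mask into complementary soft masks is sketched below. This particular combination rule is an assumption made for illustration, not necessarily the paper's:

```python
import numpy as np

def combine_masks(rhythm_mask, harmonic_mask, eps=1e-9):
    """Combine a background mask and a melody mask (both in [0, 1]).

    Each mask is weighted by the complement of the other, then the pair
    is renormalized so the two soft masks sum to one per bin.
    """
    background = rhythm_mask * (1.0 - harmonic_mask)
    melody = harmonic_mask * (1.0 - rhythm_mask)
    total = background + melody + eps
    return background / total, melody / total
```

Applying each normalized mask to the mixture spectrogram then yields the two component estimates.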
ADAPTATION OF A SPEECH RECOGNIZER FOR SINGING VOICE
Cited by 1 (1 self)
This paper studies the speaker adaptation techniques that can be applied to adapt a speech recognizer to singing voice. Maximum likelihood linear regression (MLLR) techniques are studied, with specific attention to choosing the number and types of transforms. The recognition performance of the different methods is measured in terms of phoneme recognition rate and singing-to-lyrics alignment errors of the adapted recognizers. The different methods improve the correct recognition rate by up to 10 percentage points compared to the non-adapted system. In singing-to-lyrics alignment we obtain a best mean absolute alignment error of 0.94 seconds, compared to 1.26 seconds for the non-adapted system. Global adaptation was found to provide the largest improvement, but a small further improvement was obtained with regression tree adaptation.
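The core MLLR mean update applies an affine transform to the Gaussian mean vectors of the acoustic model; a minimal sketch is below. In full MLLR, A and b are estimated by maximum likelihood from adaptation data (here, singing), and regression-tree adaptation would use a different transform per tree node; in this sketch they are simply given:

```python
import numpy as np

def mllr_adapt_means(means, A, b):
    """Apply a single (global) MLLR affine transform to Gaussian means.

    means: (n_gaussians, dim) array of mean vectors.
    Adapted mean for each Gaussian: mu' = A @ mu + b.
    """
    return means @ A.T + b
```

With the identity transform the model is unchanged; estimating A and b on singing data shifts the means toward the singing-voice acoustic space.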