Results 1-7 of 7
Sound Source Separation in Monaural Music Signals
2006
Cited by 11 (3 self)
Sound source separation refers to the task of estimating the signals produced by individual sound sources from a complex acoustic mixture. It has several applications, since monophonic signals can be processed more efficiently and flexibly than polyphonic mixtures. This thesis deals with the separation of monaural, or one-channel, music recordings. We concentrate on separation methods in which the sources to be separated are not known beforehand. Instead, separation is enabled by exploiting the common properties of real-world sound sources: their continuity, sparseness, and repetition in time and frequency, and their harmonic spectral structures. One of the separation approaches taken here uses unsupervised learning, and the other uses model-based inference based on sinusoidal modeling. Most of the existing unsupervised separation algorithms are based on a linear instantaneous signal model, where each frame of the input mixture signal is ...
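A minimal sketch of a linear instantaneous model, assuming the common NMF formulation in which a magnitude spectrogram X (frequency x frames) is approximated as B @ G: each column of B is a fixed basis spectrum and each row of G is its time-varying gain, so every frame is a weighted sum of the basis spectra. The Euclidean multiplicative updates, the soft-mask separation step and the function names are illustrative assumptions, not necessarily the thesis's exact algorithm.

```python
import numpy as np

def nmf(X, n_components, n_iter=200, seed=0, eps=1e-12):
    """Factorise a non-negative spectrogram X (freq x frames) as X ~ B @ G.

    Columns of B are basis spectra; row k of G holds the time-varying gain
    of basis k. Uses multiplicative updates for the squared Euclidean distance.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = X.shape
    B = rng.random((n_freq, n_components)) + eps
    G = rng.random((n_components, n_frames)) + eps
    for _ in range(n_iter):
        G *= (B.T @ X) / (B.T @ B @ G + eps)
        B *= (X @ G.T) / (B @ G @ G.T + eps)
    return B, G

def separate(X, B, G, eps=1e-12):
    """Split X into one spectrogram per component using soft (Wiener-like) masks."""
    X_hat = B @ G + eps
    return [X * np.outer(B[:, k], G[k]) / X_hat for k in range(B.shape[1])]

rng = np.random.default_rng(1)
X = rng.random((30, 40)) + 0.01  # stand-in for a magnitude spectrogram
B, G = nmf(X, n_components=3)
parts = separate(X, B, G)  # the masked parts sum back to X
```

Because the masks sum to one in every time-frequency bin, the separated spectrograms add up to the mixture, which is a convenient property when the components are later grouped into sources.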
CONVOLUTIVE NON-NEGATIVE MATRIX FACTORISATION WITH A SPARSENESS CONSTRAINT
Cited by 2 (1 self)
Discovering a representation which allows auditory data to be parsimoniously represented is useful for many machine learning and signal processing tasks. Such a representation can be constructed by Non-negative Matrix Factorisation (NMF), a method for finding parts-based representations of non-negative data. We present an extension to NMF that is convolutive and includes a sparseness constraint. In combination with a spectral magnitude transform, this method discovers auditory objects and their associated sparse activation patterns.
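A minimal sketch of convolutive NMF with a sparseness penalty on the activations. It assumes the generalised Kullback-Leibler divergence as the reconstruction cost, an L1 penalty lam * sum(H) (which appears as lam in the denominator of the H update), and unit-norm rescaling of the bases; these are illustrative choices rather than the paper's stated formulation. Each basis W[:, :, k] is a short spectrogram patch spanning n_lags frames, and H holds its activations.

```python
import numpy as np

def shift(M, t):
    """Shift the columns of M right by t (left if t < 0), filling with zeros."""
    if t == 0:
        return M
    out = np.zeros_like(M)
    if t > 0:
        out[:, t:] = M[:, :-t]
    else:
        out[:, :t] = M[:, -t:]
    return out

def reconstruct(W, H):
    """Convolutive model: sum over lags t of W[t] @ shift(H, t)."""
    return sum(W[t] @ shift(H, t) for t in range(W.shape[0]))

def kl(V, L):
    """Generalised Kullback-Leibler divergence D(V || L) for positive arrays."""
    return float(np.sum(V * np.log(V / L) - V + L))

def sparse_conv_nmf(V, n_components, n_lags, lam=0.1, n_iter=200, seed=0, eps=1e-12):
    """Approximate V (freq x frames) by reconstruct(W, H) with sparse H."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_lags, n_freq, n_components)) + eps
    H = rng.random((n_components, n_frames)) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # H update: negative / positive parts of the KL gradient, plus lam (L1 penalty)
        R = V / (reconstruct(W, H) + eps)
        num = sum(W[t].T @ shift(R, -t) for t in range(n_lags))
        den = sum(W[t].T @ shift(ones, -t) for t in range(n_lags)) + lam + eps
        H *= num / den
        # W update: every lag is updated against the same reconstruction
        R = V / (reconstruct(W, H) + eps)
        for t in range(n_lags):
            H_t = shift(H, t)
            W[t] *= (R @ H_t.T) / (ones @ H_t.T + eps)
        # Rescale each basis to unit l2 norm and compensate in H (model unchanged)
        norms = np.sqrt((W ** 2).sum(axis=(0, 1))) + eps
        W /= norms
        H *= norms[:, None]
    return W, H

rng = np.random.default_rng(1)
V = rng.random((20, 60)) + 0.01  # stand-in for a magnitude spectrogram
W, H = sparse_conv_nmf(V, n_components=3, n_lags=4, lam=0.5)
```

Normalising W matters once a sparseness penalty is present: without it, the penalty could be reduced trivially by shrinking H while growing W.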
Discovering speech phones using convolutive non-negative matrix factorisation with a sparseness constraint
Neurocomputing, 2008
Cited by 2 (0 self)
Discovering a representation that allows auditory data to be parsimoniously represented is useful for many machine learning and signal processing tasks. Such a representation can be constructed by Non-negative Matrix Factorisation (NMF), a method for finding parts-based representations of non-negative data. Here, we present an extension to convolutive NMF that includes a sparseness constraint, where the resultant algorithm has multiplicative updates and utilises the beta divergence as its reconstruction objective. In combination with a spectral magnitude transform of speech, this method discovers auditory objects that resemble speech phones along with their associated sparse activation patterns. We use these in a supervised separation scheme for monophonic mixtures, finding improved separation performance in comparison to classic convolutive NMF. Keywords: Non-negative matrix factorisation; Sparse representations; Convolutive
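For reference, the beta divergence between a datum x and its reconstruction y is d_β(x | y) = (x^β + (β − 1) y^β − β x y^(β−1)) / (β(β − 1)) for β not in {0, 1}, with the generalised Kullback-Leibler (β = 1) and Itakura-Saito (β = 0) divergences as its limits; β = 2 gives half the squared Euclidean distance. The sketch below implements it together with one multiplicative update of the activations. For brevity it is shown for ordinary (non-convolutive) NMF, and the sparseness penalty lam * sum(H) added to the denominator is a common heuristic assumed here, not taken from the paper.

```python
import numpy as np

def beta_divergence(X, Y, beta):
    """Sum of elementwise beta divergences d_beta(X | Y); X, Y strictly positive."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if beta == 1:  # generalised Kullback-Leibler
        return float(np.sum(X * np.log(X / Y) - X + Y))
    if beta == 0:  # Itakura-Saito
        R = X / Y
        return float(np.sum(R - np.log(R) - 1))
    return float(np.sum(
        (X ** beta + (beta - 1) * Y ** beta - beta * X * Y ** (beta - 1))
        / (beta * (beta - 1))
    ))

def update_H(V, W, H, beta, lam=0.0, eps=1e-12):
    """One multiplicative update of H for V ~ W @ H under the beta divergence,
    with an (assumed) L1 sparseness penalty lam * sum(H)."""
    L = W @ H + eps
    return H * (W.T @ (V * L ** (beta - 2))) / (W.T @ L ** (beta - 1) + lam + eps)
```

Choosing β between 0 and 2 trades off how strongly low-energy bins count in the fit; for 1 ≤ β ≤ 2 and lam = 0, this H update with fixed W does not increase the divergence.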
ANALYSIS-AND-MANIPULATION APPROACH TO PITCH AND DURATION OF MUSICAL INSTRUMENT SOUNDS WITHOUT DISTORTING TIMBRAL CHARACTERISTICS
Cited by 1 (1 self)
This paper presents an analysis-manipulation method that can generate musical instrument sounds with arbitrary pitches and durations from the sound of a given musical instrument (called the seed) without distorting its timbral characteristics. Based on psychoacoustical knowledge of the auditory effects of timbre, we defined timbral features, derived from the spectrogram of a musical instrument sound, as (i) the relative amplitudes of the harmonic peaks, (ii) the distribution of the inharmonic component, and (iii) the temporal envelopes. First, to analyze the timbral features of a seed, we separated it into harmonic and inharmonic components using Itoyama’s integrated model. For pitch manipulation, we took into account the pitch dependency of features (i) and (ii): we predicted the value of each feature using a cubic polynomial that approximated the distribution of that feature over pitch. To manipulate duration, we focused on preserving feature (iii) in the attack and decay portions of a seed; therefore, only the steady-state portion was expanded or shrunk. In addition, we propose a method for reproducing the properties of vibrato. Experimental results demonstrated the quality of the sounds synthesized with our method: the spectral and MFCC distances between the synthesized sounds and actual sounds of 32 instruments were reduced by 64.70% and 32.31%, respectively.
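The two manipulation rules above can be sketched with NumPy: a cubic polynomial fitted to a feature measured at several pitches predicts that feature at a new pitch, and a duration change resamples only the steady segment of a temporal envelope while leaving attack and decay untouched. The segment boundaries, the use of linear interpolation and the function names are hypothetical choices for illustration; the paper's actual procedure may differ.

```python
import numpy as np

def predict_feature(pitches, values, target_pitch):
    """Fit a cubic polynomial to a feature measured at several pitches
    (e.g. a harmonic's relative amplitude) and evaluate it at target_pitch."""
    coeffs = np.polyfit(pitches, values, deg=3)
    return float(np.polyval(coeffs, target_pitch))

def stretch_steady(envelope, attack_end, decay_start, target_len):
    """Change the length of a temporal envelope by resampling only the steady
    segment envelope[attack_end:decay_start]; attack and decay are kept as-is."""
    attack = envelope[:attack_end]
    steady = envelope[attack_end:decay_start]
    decay = envelope[decay_start:]
    new_len = target_len - len(attack) - len(decay)
    if new_len < 2 or len(steady) < 2:
        raise ValueError("steady segment too short to resample")
    old_grid = np.linspace(0.0, 1.0, len(steady))
    new_grid = np.linspace(0.0, 1.0, new_len)
    return np.concatenate([attack, np.interp(new_grid, old_grid, steady), decay])

# Synthetic envelope (not data from the paper): 10-frame attack,
# 20-frame steady part, 15-frame decay, stretched to 80 frames.
env = np.concatenate([np.linspace(0, 1, 10), np.full(20, 0.8), np.linspace(0.8, 0, 15)])
longer = stretch_steady(env, attack_end=10, decay_start=30, target_len=80)
```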
COMPRESSIVE SAMPLING OF NON-NEGATIVE SIGNALS
Cited by 1 (1 self)
Traditional Nyquist-Shannon sampling dictates that a continuous-time signal be sampled at twice its bandwidth to achieve perfect recovery. However, it has recently been demonstrated that, by exploiting the structure of the signal, it is possible to sample a signal below the Nyquist rate and achieve perfect reconstruction using a random projection, a sparse representation and ℓ1-norm minimisation. These methods constitute a new and emerging theory known as Compressive Sampling (or Compressed Sensing). Here, we apply Compressive Sampling to non-negative signals and propose an algorithm, Non-negative Underdetermined Iteratively Reweighted Least Squares (NUIRLS), for signal recovery. NUIRLS is derived within the framework of Non-negative Matrix Factorisation (NMF) and uses Iteratively Reweighted Least Squares as its objective, recovering non-negative minimum ℓp-norm solutions, 0 ≤ p ≤ 1. We demonstrate that, for sufficiently sparse non-negative signals, the signals recovered by NUIRLS and NMF are essentially the same, which suggests that a non-negativity constraint is enough to recover sufficiently sparse signals.
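NUIRLS itself adds non-negativity within the NMF framework, which is not reproduced here. As a point of reference, the sketch below shows the standard, sign-unconstrained IRLS iteration for minimum ℓp-norm solutions of an underdetermined system A s = x: each step solves a weighted least-norm problem in closed form while a smoothing parameter eps is decreased geometrically. The schedule, tolerances and function name are assumptions. In the demo, a sparse non-negative signal is measured through a Gaussian random projection with far fewer measurements than samples.

```python
import numpy as np

def irls_min_lp(A, x, p=0.5, eps_min=1e-8, max_inner=100):
    """Approximate the minimum l_p-norm solution of A @ s = x (0 <= p <= 1).

    Each step computes s = Q A^T (A Q A^T)^{-1} x with
    Q = diag((s**2 + eps)**(1 - p/2)), then eps is reduced by a factor of 10.
    """
    s = np.linalg.lstsq(A, x, rcond=None)[0]  # minimum l2-norm starting point
    eps = 1.0
    while eps > eps_min:
        for _ in range(max_inner):
            q = (s ** 2 + eps) ** (1.0 - p / 2.0)  # diagonal of Q
            s_new = q * (A.T @ np.linalg.solve((A * q) @ A.T, x))
            converged = np.linalg.norm(s_new - s) < np.sqrt(eps) / 100.0
            s = s_new
            if converged:
                break
        eps /= 10.0
    return s

rng = np.random.default_rng(0)
n, m, k = 128, 48, 4  # signal length, number of measurements, nonzeros
A = rng.standard_normal((m, n))  # random projection
s_true = np.zeros(n)
s_true[rng.choice(n, size=k, replace=False)] = rng.uniform(1.0, 2.0, size=k)
x = A @ s_true  # 48 measurements of a 128-sample signal
s_hat = irls_min_lp(A, x)
```

With only 4 nonzeros out of 128 samples, s_hat is expected to match s_true closely even though the system has 80 more unknowns than equations.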
Shifted 2D Non-negative Tensor Factorisation
... developed as a means of separating harmonic instruments from single-channel mixtures. This technique uses a model that is convolutive in both time and frequency, and so can capture instruments that have both time-varying spectra and time-varying fundamental frequencies simultaneously. However, in many cases two or more channels are available, in which case it would be advantageous to have a multi-channel version of the algorithm. To this end, a shifted 2D Non-negative Tensor Factorisation algorithm is derived, which extends Non-negative Matrix Factor 2D Deconvolution to the multi-channel case. The use of this algorithm for multi-channel sound source separation of pitched instruments is demonstrated.
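A model that is convolutive in both time and frequency can be written as a reconstruction formula. Assuming a log-frequency (constant-Q) spectrogram, so that a change of pitch is a vertical shift, one common way to write the multi-channel version is X̂[c, f, n] = Σ_k G[c, k] Σ_τ Σ_φ W[f − φ, k, τ] H[k, φ, n − τ], where G holds each source's gain in each channel, W its time-frequency template and H its activations over pitch shifts φ and time. The array layout and names are assumptions, it is not certain this matches the paper's exact notation, and only the model evaluation is sketched, not the derived update rules.

```python
import numpy as np

def sntf2d_reconstruct(G, W, H):
    """Evaluate a shifted 2D tensor model.

    G: (n_channels, K)      gain of each source in each channel
    W: (n_freq, K, T)       time-frequency template of each source
    H: (K, P, n_frames)     activations over P pitch shifts and time
    Returns X_hat: (n_channels, n_freq, n_frames).
    """
    n_freq, K, T = W.shape
    _, P, n_frames = H.shape
    if P > n_freq or T > n_frames:
        raise ValueError("shifts must fit inside the spectrogram")
    S = np.zeros((K, n_freq, n_frames))  # single-channel spectrogram of each source
    for k in range(K):
        for tau in range(T):
            for phi in range(P):
                w = np.zeros(n_freq)
                w[phi:] = W[: n_freq - phi, k, tau]  # template slice moved up phi bins
                h = np.zeros(n_frames)
                h[tau:] = H[k, phi, : n_frames - tau]  # activation delayed by tau frames
                S[k] += np.outer(w, h)
    # Mix each source into every channel with its channel gain
    return np.einsum("ck,kfn->cfn", G, S)
```

The single-channel NMF2D model is the special case with one channel and G equal to ones; the tensor version shares W and H across channels and only the gains G differ.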
DRUM TRANSCRIPTION FROM MULTICHANNEL RECORDINGS WITH NON-NEGATIVE MATRIX FACTORIZATION
Automatic drum transcription enables handling symbolic data instead of plain acoustic information in music information retrieval applications. Usually the input to a transcription system is single-channel audio, and as a result the proposed solutions are designed for this kind of input. However, in a studio environment, multichannel recordings of the drums are often available. This paper proposes an extension of a non-negative matrix factorization drum transcription method to handle multichannel data. The method creates spectral templates for all target drums from all available channels and, during transcription, estimates time-varying gains for each of them so that their sum approximates the recorded signal. Sound event onsets are detected from the estimated gains. The system is evaluated with multichannel data from a publicly available data set and compared with other methods. The results suggest that using multiple channels instead of a single-channel mix improves the transcription result.
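The transcription step described above can be sketched under stated assumptions: each channel's magnitude spectrogram is stacked along the frequency axis, so one fixed template per drum spans all channels; the gains are estimated with KL-divergence multiplicative updates; and onsets are frames where the rise of a gain curve is a local maximum above a threshold. These choices, the toy templates and the function names are hypothetical and not necessarily the paper's method.

```python
import numpy as np

def estimate_gains(X, B, n_iter=300, eps=1e-12):
    """Estimate non-negative gains G so that B @ G approximates X.

    X: (n_channels * n_freq, n_frames) channel spectrograms stacked along frequency.
    B: (n_channels * n_freq, n_drums) templates stacked the same way, kept fixed.
    """
    G = np.ones((B.shape[1], X.shape[1]))
    col_sums = B.sum(axis=0)[:, None] + eps
    for _ in range(n_iter):
        G *= (B.T @ (X / (B @ G + eps))) / col_sums  # KL multiplicative update
    return G

def detect_onsets(gain, threshold):
    """Frames where the rise of a gain curve is a local maximum above threshold."""
    rise = np.maximum(np.diff(gain, prepend=gain[0]), 0.0)
    onsets = []
    for n in range(len(rise)):
        prev_rise = rise[n - 1] if n > 0 else 0.0
        next_rise = rise[n + 1] if n + 1 < len(rise) else 0.0
        if rise[n] > threshold and rise[n] >= prev_rise and rise[n] >= next_rise:
            onsets.append(n)
    return onsets

# Toy example: 2 drums, 2 channels with 4 frequency bins each (synthetic values)
B = np.array([[1.0, 0.05], [0.5, 0.1], [0.1, 0.5], [0.05, 1.0],    # channel 1
              [0.3, 0.02], [0.15, 0.05], [0.03, 0.8], [0.01, 1.5]])  # channel 2
G_true = np.zeros((2, 20))
G_true[0, [5, 15]], G_true[0, [6, 16]] = 1.0, 0.5  # drum 0 hits at frames 5 and 15
G_true[1, 10], G_true[1, 11] = 1.0, 0.5            # drum 1 hit at frame 10
X = B @ G_true
G = estimate_gains(X, B)
onsets = [detect_onsets(G[d], threshold=0.3) for d in range(B.shape[1])]
```

Stacking the channels means a drum that is quiet in the overhead mix but loud in its close microphone still contributes strongly to the fit, which is one plausible reason multichannel input helps.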

