## Non-negative Matrix Factorization based Algorithms to cluster Frequency Basis Functions for Monaural Sound Source Separation (2013)

### Citations

11946 | Maximum-likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context ...d the KL divergence as detailed in [23]. Convergence proofs The convergence proofs can be derived by making use of an auxiliary function similar to that used in the Expectation-Maximization algorithm =-=[26]-=-, [27]. Let $g(b, b')$ be an auxiliary function for function $f$ such that it satisfies the following conditions: $g(b, b') \geq f(b)$ and $g(b, b) = f(b)$ (1.42). Then, function $f$ is non-increasing under the fol... |

2412 | Nonlinear dimensionality reduction by locally linear embedding
- Roweis, Saul
- 2000
Citation Context ...based on the simple modification of the clustering stage of the source-filter model as shown above in the block diagram. Here, we use LLE, a dimension reduction technique proposed by Saul et al in =-=[74]-=-. The Mel-scale basis functions are obtained from the NMF frequency basis functions in a similar manner as detailed in section 1.6.1. Then, an attempt is made to group the frequency basis functions in mel ... |

2305 | Survey of independent component analysis
- Hyvärinen
- 1999
Citation Context ...tions. 1.5.1 Independent Component Analysis ICA has been successfully used to solve blind source separation problems in several application areas [64, 67]. A survey of ICA-based algorithms is given in =-=[63]-=-. ICA separates an observation vector by finding a de-mixing matrix, so that the estimated variables, the elements of vector, are statistically independent from each other. Consider the cocktail pa... |

1848 | Independent component analysis, a new concept? Signal processing
- Comon
- 1994
Citation Context ...ources from a given mixture. 1.5 DSP methods for Source Separation Independent Component Analysis (ICA) was developed for the estimation of sound signals (independent components) from a given mixture =-=[66, 57, 58]-=- in case of determined systems. Another method, the Degenerate Unmixing Estimation Technique (DUET) [47], was proposed to separate a given source from an audio mixture using a time-frequency mask correspond... |

955 | Sparse coding with an overcomplete basis set: a strategy employed by V1?, Vision Res. 37
- Olshausen, Field
- 1997

883 | Fast and robust fixed-point algorithms for independent component analysis
- Hyvärinen
- 1999
Citation Context ...ources from a given mixture. 1.5 DSP methods for Source Separation Independent Component Analysis (ICA) was developed for the estimation of sound signals (independent components) from a given mixture =-=[66, 57, 58]-=- in case of determined systems. Another method, the Degenerate Unmixing Estimation Technique (DUET) [47], was proposed to separate a given source from an audio mixture using a time-frequency mask correspond... |

737 | Introduction to Random Signals and Applied Kalman Filtering with Matlab Exercises and Solutions, Wiley, 3rd Edition
- Brown, Hwang
- 1997
Citation Context ... In effect, the Wiener filter method allocates energy in a given time-frequency bin to the sources according to a least-squares best fit. Thus, the masks obtained are optimal in the least-squares sense =-=[54]-=-. However, the masks generated do not give any quantitative measure to justify that they are equally good in the perceptual sense. Hence, it can be argued that from a perceptual point of view, other... |

687 | Auditory scene analysis
- Bregman
- 1990
Citation Context ...sed over the years consistently in Audio Source Separation. But to make it clearer, by source one actually means an auditory stream, which is to be understood in the same way as Bregman used the term decades ago =-=[61]-=-. An auditory stream is produced by a continuous activity of a physical source in the form of waves by interaction with the environment. For example, in the case of a piano played in a closed reverber... |

528 | Blind signal separation: Statistical principles
- Cardoso
- 1998
Citation Context ...ources from a given mixture. 1.5 DSP methods for Source Separation Independent Component Analysis (ICA) was developed for the estimation of sound signals (independent components) from a given mixture =-=[66, 57, 58]-=- in case of determined systems. Another method, the Degenerate Unmixing Estimation Technique (DUET) [47], was proposed to separate a given source from an audio mixture using a time-frequency mask correspond... |

490 | What is the goal of sensory coding
- Field, J
- 1994
Citation Context ... group sparsity in NMF with IS divergence The term sparse refers to a signal model, where only a few units of data out of a large population can be used to efficiently represent a typical data vector =-=[14]-=-. A property of NMF is that it typically generates a sparse representation of the given audio data. This makes the frequency basis functions sparse in nature. However, NMF does not impose any quantitat... |

486 | Non-negative matrix factorization with sparseness constraints
- Hoyer
- 2004
Citation Context ...s corresponding to sources were estimated by minimizing the weighted divergence between the above model and the observed power spectrogram. Another method was proposed in [28] that uses sparse coding =-=[17]-=- with some modifications as well as a temporal continuity constraint [31]. A cost term, comprised of the sum of squared differences between the gains in the adjacent frames of the activation funct... |

481 | Psychoacoustics, Facts and Models
- Zwicker, Fastl
- 1990
Citation Context ...he high-energy components of the input signal can be compressed by modelling the loudness perception of the human auditory system using perceptually motivated weights for each critical band in each frame =-=[30]-=-. Thus, for each critical band, the perceptually significant low-energy characteristics of sources can also be estimated. The individual components corresponding to sources were estimated by minimizin... |

371 | An information-maximization approach to blind separation and blind deconvolution
- Bell, Sejnowski
- 1995
Citation Context ...by computing $s_j^{(i)} = w_j^T x^{(i)}$. ICA Algorithm While there are many ICA algorithms, here we present a derivation of a method for Maximum likelihood estimation to find independent sources as detailed in =-=[59]-=-. We suppose that the distribution of each source $s_i$ is given by a density $p(s)$, and that the joint distribution of the sources $s$ is given by $p(s) = \prod_{i=1}^{n} p_s(s_i)$ (1.18) It can be noted that preproces... |

321 | Blind separation of speech mixtures via time-frequency masking
- Yilmaz, Rickard
- 2004
Citation Context ... the sounds will interfere with each other in the mixture and the estimated parameters for each source, $A_i$ and $D_i$, will deviate from their actual values. An algorithm was proposed to solve this problem in =-=[49]-=-. The algorithm was based on the fact that the estimated attenuation and time delay parameters ($A_i$, $D_i$) for each source will still contain values within a close range of the actual parameter value. ... |

298 | Mel frequency cepstral coefficients for music modeling
- Logan
- 2000

268 | Performance measurement in blind audio source separation
- Vincent, Gribonval, et al.
- 2006
Citation Context ...e signal-to-interference ratio (SIR), and the signal-to-artifacts ratio (SAR). These measures are widely used for the evaluation of separation quality and the details of these metrics can be found in =-=[89]-=-. SDR determines the overall sound quality of the recovered signal, SIR measures the interference of other sources in the separated sound source, and SAR calculates the artefacts present in separated s... |

243 | A Fast Computational Algorithm for the Discrete Cosine Transform
- Chen, Smith, et al.
- 1977
Citation Context ...r but it also increases the offset value log(c). Therefore, there is a trade-off between these two and the value of at was determined through experiments. Then, the Discrete Cosine Transform (DCT) =-=[70]-=- is used to separate out or decorrelate the source component and spectral component by dropping out eigenvalues corresponding to signal energy (first coefficient) and higher frequency components. In t... |

207 | Computational auditory scene analysis
- Brown, Cooke
- 1994
Citation Context ...at can replicate the human auditory system for ASA. The computational modelling of the human auditory system to process real-world sound signals is called Computational Auditory Scene Analysis (CASA) =-=[3]-=-, [4]. CASA systems aim to computationally implement the rules derived from psychoacoustics to segregate or stream the components of sounds in a similar way to human hearing. CASA systems aim to be ab... |

206 | Charting a manifold
- Brand
- 2003
Citation Context ...hod were not consistent, i.e. we obtained a random separation each time for the same test mixture. This may be due to the fact that NMF gives a sparse representation and, for the reasons stated in =-=[75, 76]-=-, LLE is quite sensitive to sparse data sets. Processing of sparse data by LLE deteriorates the local geometry of the data manifolds in the embedding space. This is because the reconst... |

182 | Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria
- Virtanen
- 2007
Citation Context ...ection 1.4. In recent years, many factorisation techniques, such as Non-negative Matrix Factorisation (NMF) [23] of magnitude spectrograms, have been proposed to separate out sources from spectrograms =-=[24, 28, 25]-=-. NMF decomposes a spectrogram into frequency basis functions which typically correspond to the notes and the chords in the given mixture. It is important to note that the number of notes present in ... |

171 | Signal estimation from modified short-time fourier transform
- Griffin, Lim
- 1984
Citation Context ...ls from the separated source spectrograms. Many attempts have been made to overcome this problem. Griffin and Lim proposed a phase estimation technique to recover the phase of the source spectrograms =-=[35]-=-. Le Roux et al [36] have used explicit consistency constraints on the STFT spectrograms for the phase reconstruction. An alternative approach to resynthesize the recovered signals was to simply reuse... |

154 | Calculation of a constant Q spectral transform
- Brown
- 1991

152 | Nonnegative matrix factorization with the Itakura-Saito divergence. With application to music analysis
- Févotte, Bertin, et al.
- 2009
Citation Context ...both A and B and it would not be feasible to find the global minima. A detailed description and derivation of the IS divergence update equations and their application in musical analysis can be found in =-=[22]-=-. Having discussed the convergence proofs of the cost functions, we will now explain how the decomposition of the magnitude spectrogram using NMF is of benefit in musical applications, especially sound... |

125 | Blind separation of disjoint orthogonal signals: demixing n sources from 2 mixtures
- Jourjine, Rickard, et al.
- 2000
Citation Context ...e can be obtained using binary time-frequency masks provided the TFRs of the individual sources present do not overlap with each other [47]. This phenomenon is known as W-disjoint orthogonality (WDO) =-=[50]-=-. Let $x(t)$ denote a mixture signal containing $P$ sources such that $x(t) = \sum_{i=1}^{P} s_i(t)$ (1.22), where $s_i(t)$ represents the signal produced by the $i$th source. Further, the time-frequency repres... |

94 | Aggregate and mixed-order markov models for statistical language processing
- Saul, Pereira
- 1997
Citation Context ...KL divergence as detailed in [23]. Convergence proofs The convergence proofs can be derived by making use of an auxiliary function similar to that used in the Expectation-Maximization algorithm [26], =-=[27]-=-. Let $g(b, b')$ be an auxiliary function for function $f$ such that it satisfies the following conditions: $g(b, b') \geq f(b)$ and $g(b, b) = f(b)$ (1.42). Then, function $f$ is non-increasing under the following... |

91 | Algorithm 862: MATLAB tensor classes for fast algorithm prototyping
- Bader, Kolda
Citation Context ... resolution. Notations We now define the parameters and notations used in the SNMF model. The notations for tensor parameters used to define the SNMF model [44] follow the conventions described in =-=[90]-=-. Calligraphic upper-case letters (R) are used to denote tensors of any given dimension. A contracted tensor product of two tensors of finite dimension is defined as follows. Let a tensor R be of dime... |

64 | On the approximate w-disjoint orthogonality of speech
- Rickard, Yilmaz
- 2002
Citation Context ... developed for the estimation of sound signals (independent components) from a given mixture [66, 57, 58] in case of determined systems. Another method, the Degenerate Unmixing Estimation Technique (DUET) =-=[47]-=-, was proposed to separate a given source from an audio mixture using a time-frequency mask corresponding to that source. Barry et al developed a source separation algorithm known as ADRess that uses t... |

63 | Computational auditory scene analysis: a representational approach
- Brown
- 1992
Citation Context ...n replicate the human auditory system for ASA. The computational modelling of the human auditory system to process real-world sound signals is called Computational Auditory Scene Analysis (CASA) [3], =-=[4]-=-. CASA systems aim to computationally implement the rules derived from psychoacoustics to segregate or stream the components of sounds in a similar way to human hearing. CASA systems aim to be able to... |

57 | Sound source separation using sparse coding with temporal continuity objective
- Virtanen
- 2003
Citation Context ...is functions in the original implementation of GS [53]. However, many recent works in audio have used NMF of magnitude spectra instead of power spectra because it gave better sound separation quality =-=[28, 68, 41]-=-. Therefore, we will use magnitude spectrograms for the calculation of the frequency basis functions. To this end, we propose that this incorporation of GS in NMF of magnitude spectra may improve ... |

52 | Polyphonic transcription by nonnegative sparse coding of power spectra
- Abdallah, Plumbley
- 2004
Citation Context ...l continuity and sparseness was favoured by penalizing non-zero gains in B. In [82], the clustering was done manually. A non-negative sparse coding algorithm was suggested by Abdallah and Plumbley in =-=[24]-=- that assumes that the sources sum in the power spectral domain, so that the observation vector and basis functions are power spectra. Despite these improvements to group the NMF basis function for so... |

45 | Supervised and semi-supervised separation of sounds from single-channel mixtures
- Smaragdis, Raj, et al.
- 2007
Citation Context ...to set the optimal level of sparsity automatically. Nevertheless, there are cases in which additional constraints may be imposed to control the degree of sparseness to identify components in mixtures =-=[16, 28]-=-. Such a constraint, proposed in [53], generates a set of NMF basis functions which benefit from sparsity at a group level. Given a magnitude spectrogram X of size m × n, the power sp... |

42 | Non negative sparse representation for Wiener based source separation with a single sensor
- Benaroya, Gribonval, et al.
Citation Context ...iginal complex valued spectrogram to obtain the complex-valued individual source spectrograms. The generalised Wiener filter in the context of monaural separation was first proposed by Benaroya et al =-=[34]-=-. Recently, Le Roux et al [37] have utilised a spectrogram consistency constraint to obtain better performing masks for phase estimation of the recovered spectrograms. It can be noted that the creatio... |

42 | Constant-q transform toolbox for music processing
- Schörkhuber, Klapuri
- 2010
Citation Context ...g a sparse matrix $Y$. The CQT can then be obtained by multiplying a linear-domain spectrogram $P$ by the conjugate transpose of the sparse matrix $Y$, as shown in equation 3.1. Following the terminology of =-=[85]-=-, we will refer to the matrix $Y$ as a spectral kernel. $Q = Y^{*} P$ (3.1) The CQT method [85] used here is an extension of the method discussed above [84]. Earlier, since a wide range of frequencies was co... |

41 | Bayesian extensions to nonnegative matrix factorisation for audio signal modelling
- Virtanen, Cemgil, et al.
- 2008
Citation Context ...he iterative multiplicative updates used for the translated frequency basis functions in D are determined in a similar manner as done in [44]. This can be formulated as follows: $D \leftarrow D \cdot \dfrac{\langle\langle P A\rangle_{\{1,1\}} H\rangle_{\{[1,3],[1,3]\}}}{\langle\langle P O\rangle_{\{1,1\}} H\rangle_{\{[1,3],[1,3]\}}}$ (4.5) where O of size n×r is a tensor of all ones. Similarly, the multiplicative updates for the activation functions in H are calculated as follows: H ← H · ( 〈〈PD... |

39 | Subjective and objective quality assessment of audio source separation
- Emiya, Vincent, et al.
Citation Context ...imited to testing and evaluating the performance of the proposed family of masks using various separation algorithms. I was also involved in the discussion of the results obtained using the PEASS toolbox =-=[38]-=-. However, the original idea and the derivation of the divergence-based masks are due to Derry Fitzgerald [39]. 6.2 Divergence-Based Masks As mentioned previously in section 6.1, the generalised Wiene... |

37 | Audio-visual enhancement of speech noise
- Girin, Schwartz, et al.
- 2001

35 | Drum transcription with nonnegative spectrogram factorization.
- Paulus, Virtanen
- 2005
Citation Context ...step in automatic music transcription. It is comparatively easier to estimate the fundamental frequencies corresponding to individual notes for a single instrument than for a mixture of instruments =-=[5]-=-. SSS can be used for automatic speech recognition. When speaking into a microphone, such as that of a mobile phone, there may be sources of interference, like background noise, that can deteriorate the target ... |

35 | Algorithms for nonnegative matrix factorization with the beta-divergence, Neural Computation
- Févotte, Idier
Citation Context ...xyβ−1) if β ∈ {0, 2} $x \log\frac{x}{y} + y - x$ if $\beta = 1$; $\frac{x}{y} - \log\frac{x}{y} - 1$ if $\beta = 0$ (1.35). It can be noted that $D_\beta$ is continuous in $\beta$ at 0 and 1. A thorough review of the beta divergences can be found in =-=[60]-=-. The three most commonly used divergences that belong to the beta-divergence family are as follows: $D_{EUC}(x, y) = \frac{1}{2}(x - y)^2$, the Euclidean norm; $D_{KL}(x, y) = x \log\frac{x}{y} + y - x$, the Kullback-Leibl... |

33 | Discovering Auditory Objects Through Non-Negativity Constraints
- Smaragdis
- 2004
Citation Context ...ke use of the magnitude or the power spectrograms and disregard the phase information of the given audio signal. In general, the SSS algorithms filter out the phase information from the audio signals =-=[1, 33]-=- and reduce the algorithm to a subset of an image signal processing problem. This helps in reducing the complexity in the analysis of the signal in order to separate meaningful identities to reconstru... |

32 | Sound Source Separation: Azimuth Discrimination and Resynthesis
- Barry, Lawlor, et al.
- 2004
Citation Context ... two microphones, hence it will not work in the case of mono signals. 1.5.3 Azimuth Discrimination and Resynthesis (ADRess) ADRess is an efficient source separation algorithm developed by Barry et al =-=[62]-=-. It is based on azimuth discrimination of sources within the stereo field. It uses the pan positions to estimate the sources in stereo recordings. The algorithm is designed for stereo recordings made... |

31 | Extended nonnegative tensor factorisation models for musical sound source separation
- FitzGerald, Cranitch, et al.
Citation Context ...basis functions. This method of unsupervised clustering is explained in the following section. 1.6.1 Source-Filter Based Clustering for Monaural BSS Separation According to the source-filter model in =-=[20]-=- and [79], each frequency basis vector in A is a product of an excitation or source signal E and an instrument-specific resonance filter R. These filters are mainly responsible for the formants in the... |

30 | Automatic drum transcription and source separation
- FitzGerald
- 2004

29 | Mixing audiovisual speech processing and blind source separation for the extraction of speech signals from convolutive mixtures
- Rivet, Girin, et al.
- 2007
Citation Context ...ement of the source speaker and spatial location of a source operate quite well in a relative sense to help in focusing on separating individual sources from complex mixtures even in noisy conditions =-=[21]-=-. The psychoacoustic cues such as binaural masking, source classification and sound localization also help in filtering out the separate sounds from a sound mixture. The field of study that deals w... |

25 | Computing Mel-frequency cepstral coefficients on the power spectrum
- Molau, Pitz, et al.
- 2001
Citation Context ...urce separation method exploits this instrument-specific information to filter out the resonance effect in the mixture. Here, we will briefly cover the calculation of MFCCs. The Mel scale =-=[72]-=- is defined as a perceptual scale of pitches perceived by listeners to be equidistant from one another. The frequency $f$ expressed in mels, $m$, is given by: $m = 2595 \log_{10}\left(\frac{f}{700} + 1\right)$ (1.77) ... |

22 | Music transcription with ISA and HMM
- Vincent, Rodet
- 2004
Citation Context ...ce between the above model and the observed power spectrogram. Another method was proposed in [28] that uses sparse coding [17] with some modifications as well as a temporal continuity constraint =-=[31]-=-. A cost term, comprised of the sum of squared differences between the gains in the adjacent frames of the activation function in B, was used to impart the temporal continuity and sparseness was favou... |

15 | Shifted non-negative matrix factorisation for sound source separation
- FitzGerald, Coyle
Citation Context ...here is still room for improvement in clustering the basis functions to sources. Recently, Shifted NMF (SNMF) was proposed in order to avoid the need for clustering the frequency basis functions =-=[44]-=-. The SNMF algorithm assumes that the timbre of the notes played by an instrument remains constant. However, this assumption is not true in general. Another drawback of using the SNMF algorithm is tha... |

15 | Robust locally linear embedding
- Chang, Yeung
- 2006
Citation Context ...hod were not consistent, i.e. we obtained a random separation each time for the same test mixture. This may be due to the fact that NMF gives a sparse representation and, for the reasons stated in =-=[75, 76]-=-, LLE is quite sensitive to sparse data sets. Processing of sparse data by LLE deteriorates the local geometry of the data manifolds in the embedding space. This is because the reconst... |

15 | Foundations of Modern Auditory Theory
- Tobias
- 1970

15 | Unsupervised single-channel music source separation by average harmonic structure modeling
- Duan, Zhang, et al.
Citation Context ...ctions. This method of unsupervised clustering is explained in the following section. 1.6.1 Source-Filter Based Clustering for Monaural BSS Separation According to the source-filter model in [20] and =-=[79]-=-, each frequency basis vector in A is a product of an excitation or source signal E and an instrument-specific resonance filter R. These filters are mainly responsible for the formants in the mixture.... |

14 | Constructing an Invertible Constant-Q Transform with Nonstationary Gabor Frames
- Velasco, Holighaus, et al.
- 2011
Citation Context ...mproved, in the order of 10 dB over the original implementation of the SNMF separation algorithm. It is important to note that more recently, a new invertible CQT method was proposed by Velasco et al =-=[55]-=-, where it uses the non-stationary Gabor frames [56] to reconstruct more efficient and near to perfect inverse CQT from the CQT transform. However, we have not tested the proposed CQT method [55] to e... |

11 | Explicit consistency constraints for STFT spectrograms and their application to phase reconstruction
- Le Roux, Ono, et al.
- 2008
Citation Context ...d source spectrograms. Many attempts have been made to overcome this problem. Griffin and Lim proposed a phase estimation technique to recover the phase of the source spectrograms [35]. Le Roux et al =-=[36]-=- have used explicit consistency constraints on the STFT spectrograms for the phase reconstruction. An alternative approach to resynthesize the recovered signals was to simply reuse the phase of the gi... |

11 | Itakura-Saito nonnegative matrix factorization with group sparsity
- Lefèvre, Bach, et al.
- 2011
Citation Context ...Nevertheless, there are cases in which additional constraints may be imposed to control the degree of sparseness to identify components in mixtures [16, 28]. Such a constraint, proposed in =-=[53]-=-, generates a set of NMF basis functions which benefit from sparsity at a group level. Given a magnitude spectrogram $X$ of size $m \times n$, the power spectrogram can be calculated by $V = |X|^2$ (1.84) T... |

10 | Monaural sound source separation by perceptually weighted non-negative matrix factorization
- Virtanen
- 2007
Citation Context ... may yield better grouping of the frequency basis functions. Here, we use the relation between KL-NMF (NMF using KL divergence) and the ML problem of estimating A and B using the Poisson distribution =-=[29]-=- as explained in section 5.3. We also propose that the GS constraint can further be integrated in the SNMF model for better separation of the individual sources. Here, the SNMF model refers to the SNM... |

10 | An independent component analysis approach to automatic music transcription
- Abdallah, Plumbley
- 2003
Citation Context ... We will discuss these techniques in the following sections. 1.5.1 Independent Component Analysis ICA has been successfully used to solve blind source separation problems in several application areas =-=[64, 67]-=-. A survey of ICA based algorithms is done in [63]. ICA separates an observation vector by finding a de-mixing matrix, so that the estimated variables, the elements of vector, are statistically indepe... |

10 | An efficient algorithm for the calculation of a Constant Q
- Brown, Puckette
- 1992
Citation Context ...dio mixture signal Figure 1.4 shows the constant Q magnitude spectrogram of a test signal containing music signals of two pitched instruments. We will first discuss the calculation of the CQT detailed in =-=[84]-=-. In Western music, according to the even-tempered chromatic scale [52], the fundamental frequencies of adjacent notes are geometrically spaced by a factor of $\sqrt[12]{2}$. Thus, a frequency spacing of 1... |

9 | Consistent wiener filtering: Generalized time-frequency masking respecting spectrogram consistency
- Le Roux, Vincent, et al.
- 2010
Citation Context ...ram to obtain the complex-valued individual source spectrograms. The generalised Wiener filter in the context of monaural separation was first proposed by Benaroya et al [34]. Recently, Le Roux et al =-=[37]-=- have utilised a spectrogram consistency constraint to obtain better performing masks for phase estimation of the recovered spectrograms. It can be noted that the creation of soft masks is the same as... |

8 | Generalised Short-Time Power Spectra and Autocorrelation Functions
- Schroeder, Atal
- 1962
Citation Context ...Constant Q transform (CQT). 1.4.1 Short-time Fourier Transform (STFT) The Short-Time Fourier Transform (STFT) is a powerful general-purpose tool for obtaining a TFR. The STFT was first proposed in =-=[9]-=-. The STFT is used for analysing non-stationary signals, whose frequency characteristics vary with time. In essence, the STFT extracts several frames of the signal to be analysed with a window that mo... |

8 | Investigating single-channel audio source separation methods based on non-negative matrix factorization
- Wang, Plumbley
- 2006
Citation Context ...ared differences between the gains in the adjacent frames of the activation function in B, was used to impart the temporal continuity and sparseness was favoured by penalizing non-zero gains in B. In =-=[82]-=-, the clustering was done manually. A non-negative sparse coding algorithm was suggested by Abdallah and Plumbley in [24] that assumes that the sources sum in the power spectral domain, so that the ob... |

7 | Intervals ... and tuning, in The psychology of music
- Burns
- 1999
Citation Context ...am of a test signal containing music signals of two pitched instruments. We will first discuss the calculation of the CQT detailed in [84]. In Western music, according to the even-tempered chromatic scale =-=[52]-=-, the fundamental frequencies of adjacent notes are geometrically spaced by a factor of $\sqrt[12]{2}$. Thus, a frequency spacing of $\sqrt[12]{2} f$ would cover all the notes for musical analysis. Therefore, the ... |

7 | Nonstationary Gabor frames
- Jaillet, Balazs, et al.
- 2009
Citation Context ...lementation of the SNMF separation algorithm. It is important to note that more recently, a new invertible CQT method was proposed by Velasco et al [55], where it uses the non-stationary Gabor frames =-=[56]-=- to reconstruct more efficient and near to perfect inverse CQT from the CQT transform. However, we have not tested the proposed CQT method [55] to evaluate the performance in context of the separation... |

6 | Constant-Q signal analysis and synthesis
- Youngenberg, Boll
- 1978
Citation Context ... [k] for each value of $k$, where $k = 1, 2, \ldots, K$ indexes the frequency bins in the Constant Q domain. The CQT was first proposed by J. C. Brown [45], inspired by many earlier works including =-=[10, 11, 12]-=-. Figure 1.4: Constant Q Spectrogram of an audio mixture signal Figure 1.4 shows the constant Q magnitude spectrogram of a test signal containing music signals of two pitched instruments. We will firs... |

6 | Source-filter based clustering for monaural blind source separation.
- Spiertz, Gnann
- 2009
Citation Context ...cluster the basis functions obtained from factorisation techniques. Supervised clustering methods have been discussed in [28] and [29] to map the separated signals to their sources. Spiertz and Gnann =-=[41]-=- have used a source-filter model to cluster the separated frequency basis functions by mapping the basis functions to the Mel frequency cepstral domain where clustering is performed. While these metho... |

6 | Dictionary Learning methods for single-channel audio source separation
- Lefèvre
- 2012
Citation Context ...tions the linear summation of magnitude spectra has produced better results. With the assumption that the local amplitudes of the sources are independent from each other and for the reasons stated in =-=[51]-=- [22], the minimisation of the IS divergence cost function, DIS(V||V̂) (see equation 1.36) is equivalent to the maximum likelihood problem of estimating (W, H) in sum of Gaussian components. This is b... |

5 |
Clustering NMF basis functions using shifted NMF for monaural sound source separation
- Jaiswal, FitzGerald, et al.
- 2011
(Show Context)
Citation Context ...Fkl−gkl 11.79 27.09 12.38; SNMFgkl−kl 10.83 26.04 11.43; SNMFgkl−gkl 10.98 25.81 11.64. Table 5.2: Mean SDR, SIR and SAR for separated sound sources using SNMF algorithm. To compare the results listed in =-=[2]-=-, the highest scores of the quality measures for the separated sound sources for each mixture were hand-picked for the given range of frequency shifts such that SDR = max_k SDR_k, k ∈ K (5.26) where K ... |

5 | A Weighted K-means Algorithm applied to Brain Tissue Classification
- Abras, Ballarin
- 2005
(Show Context)
Citation Context ...ectral component by dropping out eigenvalues corresponding to signal energy (first coefficient) and higher frequency components. In the last step, a simple clustering (of sources) method like k-means =-=[46]-=- is applied on k MFCC components to find a permutation matrix that indicates which basis function belongs to which source in the given mixture. NMF Clustering As discussed in section 1.6.1, the DCT he... |

4 |
A contribution to the theory of short-time spectral analysis with non-uniform bandwidth filters
- Gambardella
- 1971
(Show Context)
Citation Context ... [k] for each value of k, where k = 1, 2, ..., K indexes the frequency bins in the Constant Q domain. The CQT was first proposed by J. C. Brown [45], inspired by many earlier works including =-=[10, 11, 12]-=-. Figure 1.4: Constant Q Spectrogram of an audio mixture signal. Figure 1.4 shows the constant Q magnitude spectrogram of a test signal containing music signals of two pitched instruments. We will firs... |

4 |
The Mellin transforms and constant-Q spectral analysis
- Gambardella
- 1979
(Show Context)
Citation Context ... [k] for each value of k, where k = 1, 2, ..., K indexes the frequency bins in the Constant Q domain. The CQT was first proposed by J. C. Brown [45], inspired by many earlier works including =-=[10, 11, 12]-=-. Figure 1.4: Constant Q Spectrogram of an audio mixture signal. Figure 1.4 shows the constant Q magnitude spectrogram of a test signal containing music signals of two pitched instruments. We will firs... |

4 |
User assisted source separation using non negative matrix factorisation
- Fitzgerald
- 2011
(Show Context)
Citation Context ...he timbre of the instrument to change with pitch. The details of the algorithm can be found in [20]. The second algorithm is the NMF-based user-assisted source separation algorithm (UA) as detailed in =-=[18]-=-. In this algorithm, the user sings along with the given song and records the source to be separated. The recording is then factorised to obtain frequency basis functions using NMF. The resultant fr... |

4 | On the use of masking filters in sound source separation
- Fitzgerald, Jaiswal
- 2012
(Show Context)
Citation Context ...rithms. I was also involved in the discussion of the results obtained using PEASS toolbox [38]. However, the original idea and the derivation of the divergence based masks is done by Derry Fitzgerald =-=[39]-=-. 6.2 Divergence-Based Masks As mentioned previously in section 6.1, the generalised Wiener filtering approach is optimal in a least-square sense. However, in case of sound source separation algorithm... |

4 |
Upmixing from mono - a source separation approach
- FitzGerald
- 2011
(Show Context)
Citation Context ...of the separated sources is equal to the original mixture, the interference due to the other sources and errors in separation will often be masked and will be less prominent in the upmix stereo space =-=[40]-=-. In effect, the Wiener filter method allocates energy in a given time-frequency bin to the sources according to a least-square best fit. Thus, the masks obtained are optimal in the least-square sense... |

3 | Single channel Source Separation using short-time independent component analysis
- Barry, Fitzgerald, et al.
- 2005
(Show Context)
Citation Context ... We will discuss these techniques in the following sections. 1.5.1 Independent Component Analysis ICA has been successfully used to solve blind source separation problems in several application areas =-=[64, 67]-=-. A survey of ICA based algorithms is done in [63]. ICA separates an observation vector by finding a de-mixing matrix, so that the estimated variables, the elements of vector, are statistically indepe... |

2 |
NMF-based algorithms for user assisted sound source separation
- Fitzgerald
- 2012
(Show Context)
Citation Context ...e backing tracks were available separately. Further, these recordings were used to create mono mixtures by manually synchronising and mixing the tracks. The details of the testset can be found in =-=[19]-=-. The third algorithm is the SNMF clustering algorithm (SNMFmask) discussed in chapter 2. The testset used for the first and the third algorithms is the same as detailed in section 2.4. Here SNMF was used... |

2 | Shifted NMF Using an Efficient Constant Q Transform for - Jaiswal, FitzGerald, et al. - 2011 |

2 | Locally linear embedding for classification, Pattern Recognition Group - Ridder, Duin - 2002 |

2 |
Towards an inverse Constant Q Transform, 120th Audio Engineering Society Convention
- FitzGerald, Cranitch, et al.
- 2006
(Show Context)
Citation Context ...esentations using CQT give a far better understanding of the musical signals and can be potentially used for the musical signal processing. An approximate inverse transform was proposed by Fitzgerald =-=[88]-=- with the assumption that the music signals can be sparsely represented in the linear frequency domain. However, the assumption does not hold good for all audio signals and the algorithm was extremely... |

1 |
Discrete-word recognition utilizing a word dictionary and phonological rules
- Itahashi, Makino, et al.
- 1973
(Show Context)
Citation Context ... there may be sources of interference like background noise, that can deteriorate the target speech signal. Here, source separation can be used to separate out the noise from the target speech signal =-=[6]-=-. Separation of source signals can be used to remove or change temporal properties (move or extend in time) of certain instruments or vocals to create remixes or karaoke applications. Further, these S... |

1 |
The Good Vibrations Problem, 134th International Audio Engineering Society Convention
- Fitzgerald
- 2013
(Show Context)
Citation Context ...ngs. Recently, Fitzgerald has utilised his sound source separation technologies to create the first ever officially released stereo mixes of several songs of the Beach Boys, including Good Vibrations =-=[7]-=-. 1.2.2 The Clustering Problem In general, the separation of the individual sound sources from a given audio mixture is done using a time-frequency representation such as a spectrogram. A detailed ... |

1 |
Non-negative matrix factorisation for polyphonic music transcription
- Smaragdis, Brown
- 2003
(Show Context)
Citation Context ...ection 1.4. In recent years, many factorisation techniques, such as Non-negative Matrix Factorisation (NMF) [23] of magnitude spectrograms have been proposed to separate out sources from spectrograms =-=[24, 28, 25]-=-. NMF decomposes a spectrogram into frequency basis functions which typically correspond to the notes and the chords in the given mixture. It is important to note that the number of notes present in ... |

1 |
Algorithms for non-negative matrix factorisation, Advances in Neural Information Processing Systems
- Lee, Seung
- 2000
(Show Context)
Citation Context ...such as a spectrogram. A detailed description of time-frequency representation is given in section 1.4. In recent years, many factorisation techniques, such as Non-negative Matrix Factorisation (NMF) =-=[23]-=- of magnitude spectrograms have been proposed to separate out sources from spectrograms [24, 28, 25]. NMF decomposes a spectrogram into frequency basis functions which typically correspond to the not... |

1 | Real-time time frequency based blind source separation - Rickard, Balan, et al. - 2001 |

1 | Using Tensor Factorization model to Separate drums from Polyphonic music - FitzGerald, Cranitch, et al. - 2009 |

1 | An introduction to locally linear embedding, 2001, Available from http://www.cs.toronto.edu/~roweis/lle - Saul, Roweis |

1 |
Clustering-based locally linear embedding
- Kanghua, Chunheng
- 2008
(Show Context)
Citation Context ...ures that the weights computed restore the intrinsic geometrical properties of the original data. Details on how the defined constraints help in optimising the calculation of weights can be found in =-=[80]-=-. Having obtained the weights Wij, the algorithm maps a low dimensional data point corresponding to each of the high dimensional data points in P . This is done by randomly initialising dataset Y that ... |

1 |
MATLAB toolbox for the CQT, http://www.elec.qmul.ac.uk/people/anssik/cqt
- Schörkhuber, Klapuri
(Show Context)
Citation Context ... using this approach. A complete implementation of CQT can be found in [85]. In this chapter, we have used the MATLAB toolbox of the reference implementation of the above discussed method provided at =-=[83]-=- to obtain the Constant Q spectrogram. 3.7 Experimental Set-up The experimental setup for this experiment is the same as described in section 2.4. The same set of input mixtures were taken for eva... |

1 | Shifted NMF with Group Sparsity for clustering NMF basis functions - Jaiswal, Fitzgerald, et al. - 2012 |

1 |
Towards Shifted NMF for improved monaural separation
- Jaiswal, Fitzgerald, et al.
- 2013
(Show Context)
Citation Context ...tant Q Transform (CQT) of the frequency basis functions. Here, we argue that incorporating the CQT into the SNMF model can be used to better the separation quality of individual sources. An algorithm =-=[87]-=- is presented to estimate sound sources and will be shown to be an improvement to the existing techniques. The system model for the proposed algorithm is shown in figure 4.1. Figure 4.1: Block Dia... |

1 |
Advanced Orchestra Library Set, Available at http://eleceng.dit.ie/derryfitzgerald/index.php?uid=489&menu_id=52
- Siedlaczek
(Show Context)
Citation Context ... in MATLAB for single channel audio mixtures. The SNMF model was tested for 25 monaural input mixtures of 2 instruments from a total of 15 different orchestral instruments taken from a sample library =-=[91]-=- including brass, woodwind and strings. The signals in the test set varied in duration of roughly 4 to 8 seconds with a sampling frequency of 44.1 kHz. To imitate real world melodies, the notes play... |