A SURVEY OF CONVOLUTIVE BLIND SOURCE SEPARATION METHODS
Springer Handbook on Speech Processing and Speech Communication
Cited by 14 (0 self)
In this chapter, we provide an overview of existing algorithms for blind source separation of convolutive audio mixtures. We provide a taxonomy, wherein many of the existing algorithms can be organized, and we present published results from those algorithms that have been applied to real-world audio separation tasks.
Two-microphone Separation of Speech Mixtures
Cited by 5 (3 self)
Abstract—Separation of speech mixtures, often referred to as the cocktail party problem, has been studied for decades. In many source separation tasks, the separation method is limited by the assumption of at least as many sensors as sources. Further, many methods require that the number of signals within the recorded mixtures be known in advance. In many real-world applications these limitations are too restrictive. We propose a novel method for underdetermined blind source separation using an instantaneous mixing model which assumes closely spaced microphones. Two source separation techniques have been combined: independent component analysis (ICA) and binary time-frequency masking. By estimating binary masks from the outputs of an ICA algorithm, basis speech signals can be extracted iteratively from a convolutive mixture. The basis signals are afterwards improved by grouping similar signals. Using two microphones, we can in principle separate an arbitrary number of mixed speech signals. We show separation results for mixtures with as many as seven speech signals under instantaneous conditions. We also show that the proposed method is applicable to segregating speech signals under reverberant conditions, and we compare it to another state-of-the-art algorithm. The number of source signals is not assumed to be known in advance, and the extracted signals can be maintained as stereo signals. Index Terms—Underdetermined speech separation, ICA, time-frequency masking, ideal binary mask.
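A minimal sketch of the binary-masking step, assuming the two ICA outputs are already available as complex STFT coefficients flattened over time and frequency. The dominance rule with threshold `tau` and the helper names `binary_masks` and `apply_mask` are illustrative assumptions, not the authors' exact mask estimator; the iterative re-application of ICA and the grouping of basis signals are not shown.

```python
def binary_masks(y1, y2, tau=1.0):
    """Binary time-frequency masks from two ICA outputs (hypothetical rule).

    y1, y2: equal-length sequences of complex STFT coefficients, one entry per
    time-frequency bin. A bin is assigned to output 1 when |y1| > tau * |y2|
    and to output 2 when |y2| > tau * |y1|; with tau >= 1 no bin gets both.
    """
    m1 = [1 if abs(a) > tau * abs(b) else 0 for a, b in zip(y1, y2)]
    m2 = [1 if abs(b) > tau * abs(a) else 0 for a, b in zip(y1, y2)]
    return m1, m2


def apply_mask(mask, x):
    """Keep the mixture coefficients x where the mask is 1, zero elsewhere."""
    return [m * v for m, v in zip(mask, x)]


m1, m2 = binary_masks([3, 0.5, 1j], [1, 2, 1])
print(m1, m2)  # [1, 0, 0] [0, 1, 0]
```

In the third bin both outputs have magnitude 1, so with `tau = 1` the bin is assigned to neither mask; raising `tau` leaves more ambiguous bins unassigned.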
Modeling Perceptual Similarity of Audio Signals for Blind Source Separation Evaluation
Cited by 4 (0 self)
Abstract. Existing perceptual models of audio quality, such as PEAQ, were designed to measure audio codec performance and are not well suited to evaluating audio source separation algorithms. The relationship of many other signal quality measures to human perception is not well established. We collected subjective human assessments of distortions encountered when separating audio sources from mixtures of two to four harmonic sources. We then correlated these assessments with 18 machine-measurable parameters. Results show a strong correlation (r = 0.96) between a linear combination of a subset of four of these parameters and mean human assessments. This correlation is stronger than that between human assessments and several measures currently in use.
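The reported correlation can be illustrated with a short sketch: a linear score computed from per-signal parameters, compared with mean human ratings through Pearson's r. The weights, bias, and any data passed in are hypothetical; in practice the weights would be fitted (for example by least-squares regression) to the listening-test data, which is not shown here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def linear_score(features, weights, bias=0.0):
    """Linear combination of machine-measured parameters, one score per signal.

    features: one row of parameter values per separated signal (hypothetical).
    """
    return [bias + sum(w * f for w, f in zip(weights, row)) for row in features]
```

Usage: `pearson_r(linear_score(params, weights), mean_ratings)` compares the model's scores with the averaged listener ratings.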
Active Source Estimation for Improved Source Separation
Northwestern University, EECS Dept., 2006
Cited by 3 (3 self)
Recent work in blind source separation applied to anechoic mixtures of speech allows for reconstruction of sources that rarely overlap in a time-frequency representation. While the assumption that speech mixtures do not overlap significantly in time-frequency is reasonable, music mixtures rarely meet this constraint, requiring new approaches. We introduce a method that uses spatial cues from anechoic, stereo music recordings and assumptions regarding the structure of musical source signals to effectively separate mixtures of tonal music. We discuss existing techniques for creating partial source signal estimates from regions of the mixture where source signals do not overlap significantly. We use these partial signals within a new demixing framework, in which we estimate harmonic masks for each source, allowing the determination of the number of active sources in important time-frequency frames of the mixture. We then propose a method for distributing energy from time-frequency frames of the mixture to multiple source signals. This makes it possible to handle mixtures containing time-frequency frames in which multiple harmonic sources are active, without requiring knowledge of source characteristics. *An abbreviated version of this paper was submitted on December 1st, 2006, to the EURASIP
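As a rough illustration of harmonic masks and of counting the active sources per time-frequency bin, the sketch below marks the bins that lie within a tolerance of the first few harmonics of an assumed fundamental frequency. The fundamentals, tolerance, and number of harmonics are hypothetical parameters; the authors' mask construction and their energy-distribution rule are not reproduced here.

```python
def harmonic_mask(freqs_hz, f0_hz, n_harmonics, tol_hz):
    """1 for bins within tol_hz of a harmonic k*f0 (1 <= k <= n_harmonics), else 0."""
    mask = []
    for f in freqs_hz:
        k = round(f / f0_hz)  # nearest harmonic number
        on = 1 <= k <= n_harmonics and abs(f - k * f0_hz) <= tol_hz
        mask.append(1 if on else 0)
    return mask


def active_source_count(masks):
    """Number of sources whose harmonic mask is on in each bin."""
    return [sum(col) for col in zip(*masks)]


freqs = [0, 100, 150, 200, 310, 400]
m_a = harmonic_mask(freqs, 100, 3, 15)  # hypothetical source with f0 = 100 Hz
m_b = harmonic_mask(freqs, 150, 3, 15)  # hypothetical source with f0 = 150 Hz
print(active_source_count([m_a, m_b]))  # [0, 1, 1, 1, 2, 0]
```

Bins with a count of 2 (here 310 Hz, near both 300 Hz harmonics) are the ones where energy would have to be shared between sources.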
NONNEGATIVE MATRIX FACTORIZATION AND SPATIAL COVARIANCE MODEL FOR UNDER-DETERMINED REVERBERANT AUDIO SOURCE SEPARATION
Cited by 3 (3 self)
We address the problem of blind audio source separation in the under-determined and convolutive case. The contribution of each source to the mixture channels in the time-frequency domain is modeled by a zero-mean Gaussian random vector with a full-rank covariance matrix composed of two terms: a variance, which represents the spectral properties of the source and is modeled by a nonnegative matrix factorization (NMF) model, and another full-rank covariance matrix, which encodes the spatial properties of the source contribution in the mixture. We estimate these parameters by maximizing the likelihood of the mixture using an expectation-maximization (EM) algorithm. Theoretical propositions are corroborated by experimental studies on stereo reverberant music mixtures.
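Under the model described above, the mixture covariance at a time-frequency bin is the sum of the source covariances, Σ_x = Σ_j v_j R_j, where v_j is the NMF variance and R_j the spatial covariance; given these parameters, each source image can be recovered with the multichannel Wiener filter v_j R_j Σ_x^{-1} x, the posterior mean under this Gaussian model. The sketch below assumes a stereo mixture, a single bin, and already-estimated parameters; the EM updates are not shown, and the function names are illustrative.

```python
def nmf_variance(W, H, f, n):
    """Source variance v(f, n) = sum_k W[f][k] * H[k][n] (NMF model)."""
    return sum(W[f][k] * H[k][n] for k in range(len(H)))


def inv2(A):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def matvec2(A, x):
    return [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]


def wiener_images(variances, spatial_covs, x):
    """Wiener (posterior-mean) stereo image of each source at one (f, n) bin.

    variances: v_j for each source; spatial_covs: 2x2 R_j for each source;
    x: the two mixture STFT coefficients at this bin.
    """
    # Mixture covariance: Sigma_x = sum_j v_j * R_j
    Sx = [[sum(v * R[i][k] for v, R in zip(variances, spatial_covs)) for k in range(2)]
          for i in range(2)]
    Sx_inv = inv2(Sx)
    images = []
    for v, R in zip(variances, spatial_covs):
        Sj = [[v * R[i][k] for k in range(2)] for i in range(2)]  # Sigma_j = v_j R_j
        images.append(matvec2(matmul2(Sj, Sx_inv), x))
    return images
```

Because the filters Σ_j Σ_x^{-1} sum to the identity, the estimated source images add up exactly to the observed mixture at each bin.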
Underdetermined Sparse Blind Source Separation with Delays
In Workshop on Signal Processing with Adaptative Sparse Structured Representation (SPARS05), 2005
Cited by 2 (1 self)
In this paper, we address the problem of under-determined blind source separation (BSS), mainly for speech signals, in an anechoic environment. Our approach is based on exploiting the sparsity of Gabor expansions of speech signals. For parameter estimation, we adopt the clustering approach of DUET [19]. However, unlike DUET, which uses only two mixtures, we use all available mixtures to obtain more precise estimates. For source extraction, we propose two methods, both based on constrained optimization. Our first method uses a constrained ℓq (0 < q < 1) approach, and our second method uses a constrained "modified" ℓ1 minimization approach. In both cases, our algorithms use all available mixtures and are suited to the anechoic mixing scenario. Experiments indicate that the proposed algorithms outperform DUET in many different settings.
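The DUET-style parameter estimation mentioned above relies on per-bin features computed from a pair of mixtures. A minimal sketch, assuming the anechoic model x2/x1 ≈ a·exp(-iωδ) at frequency ω, computes the symmetric attenuation a - 1/a and the relative delay δ for one time-frequency point. The function name is hypothetical, and the clustering of these features and the extension to more than two mixtures described in the paper are not shown.

```python
import cmath


def duet_features(x1, x2, omega):
    """Symmetric attenuation and relative delay of one time-frequency point.

    x1, x2: complex STFT coefficients of the two mixtures at frequency omega
    (radians per sample, omega != 0, x1 != 0).
    Assumed model: x2 / x1 ~= a * exp(-1j * omega * delta).
    """
    r = x2 / x1
    a = abs(r)
    alpha = a - 1.0 / a               # symmetric attenuation
    delta = -cmath.phase(r) / omega   # delay in samples, valid while |omega * delta| < pi
    return alpha, delta
```

For a bin dominated by a single source, these features concentrate near that source's (alpha, delta) pair, which is what the clustering step exploits.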
3D-Audio Matting, Post-editing and Re-rendering from Field Recordings
Cited by 2 (1 self)
Figure 1: Left: We use multiple arbitrarily positioned microphones (circled in yellow) to simultaneously record real-world auditory environments. Middle: We analyze the recordings to extract the positions of various sound components through time. Right: This high-level representation allows for post-editing and re-rendering the acquired soundscape within generic 3D-audio rendering architectures. We present a novel approach to real-time spatial rendering of realistic auditory environments and sound sources recorded live, in the field. Using a set of standard microphones distributed throughout a real-world environment, we record the sound field simultaneously from several locations. After spatial calibration, we segment from this set of recordings a number of auditory components, together with their locations. We compare existing time-delay-of-arrival (TDOA) estimation techniques between pairs of widely spaced microphones and introduce a novel, efficient hierarchical localization algorithm. Using the high-level representation thus obtained, we can edit and re-render the acquired auditory scene over a variety of listening setups. In particular, we can move or alter the different sound sources and arbitrarily choose the listening position. We can also composite elements of different scenes together in a spatially consistent way. Our approach provides efficient rendering of complex soundscapes which would be challenging to model using discrete point sources and traditional virtual acoustics techniques. We demonstrate a wide range of possible applications for games, virtual and augmented reality, and audio-visual post-production.
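The pairwise time-delay-of-arrival step can be illustrated with GCC-PHAT, a standard TDOA estimator. It is shown here only as an example of the class of existing techniques being compared, not as the authors' hierarchical localization algorithm. The sketch uses a naive DFT and circular shifts to stay self-contained; a practical implementation would use an FFT and zero-padding to avoid wrap-around.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def gcc_phat_delay(x, y):
    """Circular delay (in samples) of y relative to x, estimated with GCC-PHAT.

    A positive result means y lags x; lags above N/2 are reported as negative.
    """
    n = len(x)
    X, Y = dft(x), dft(y)
    cross = [yk * xk.conjugate() for xk, yk in zip(X, Y)]
    # PHAT weighting: keep only the phase of the cross-spectrum
    phat = [c / abs(c) if abs(c) > 1e-12 else 0.0 for c in cross]
    cc = [v.real for v in idft(phat)]
    peak = max(range(n), key=lambda t: cc[t])
    return peak if peak <= n // 2 else peak - n
```

Converting the lag to seconds (dividing by the sampling rate) gives the TDOA for one microphone pair; combining several pairs is what a localization stage would then do.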
Underdetermined Anechoic Blind Source Separation via ℓq-Basis-Pursuit with q < 1
Cited by 2 (0 self)
In this paper, we address the problem of under-determined Blind Source Separation (BSS) of anechoic speech mixtures. We propose a demixing algorithm that exploits the sparsity of certain time-frequency expansions of speech signals. Our algorithm merges ℓq-basis-pursuit with ideas based on the degenerate unmixing estimation technique (DUET) [1]. There are two main novel components to our approach: (1) Our algorithm makes use of all available mixtures in the anechoic scenario where both attenuations and arrival delays between sensors are considered, without imposing any structure on the microphone positions. (2) We illustrate experimentally that the separation performance is improved when one uses ℓq-basis-pursuit with q < 1 compared to the q = 1 case. Moreover, we provide a probabilistic interpretation of the proposed algorithm that explains why a choice of 0.1 ≤ q ≤ 0.4 is appropriate in the case of speech. Experimental results on both simulated and real data demonstrate significant gains in separation performance when compared to other state-of-the-art BSS algorithms reported in the literature. A preliminary version of this work can be found in [2].
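For intuition about ℓq-basis-pursuit with q < 1: in the real-valued case, the minimizer of Σ|s_i|^q subject to As = x (0 < q ≤ 1) is attained at a basic solution with at most M nonzero entries, so for small problems it can be found by enumerating column subsets. The sketch below assumes a real 2 × N mixing matrix at a single time-frequency point, a simplification of the complex anechoic setting in the paper; it is an illustration, not the authors' implementation, and the function names are hypothetical.

```python
from itertools import combinations


def solve_2x2(a, b, x):
    """Solve u*a + v*b = x for 2-vectors a, b (Cramer's rule); None if singular."""
    det = a[0] * b[1] - a[1] * b[0]
    if abs(det) < 1e-12:
        return None
    u = (x[0] * b[1] - x[1] * b[0]) / det
    v = (a[0] * x[1] - a[1] * x[0]) / det
    return u, v


def lq_basis_pursuit_2xn(columns, x, q):
    """Minimize sum(|s_i|**q) subject to sum_i s_i * columns[i] == x.

    Real-valued, two mixtures, 0 < q <= 1: the minimizer lies among the basic
    solutions supported on two columns, so all column pairs are enumerated.
    """
    n = len(columns)
    best, best_cost = None, float("inf")
    for i, j in combinations(range(n), 2):
        sol = solve_2x2(columns[i], columns[j], x)
        if sol is None:
            continue
        s = [0.0] * n
        s[i], s[j] = sol
        cost = sum(abs(v) ** q for v in s)
        if cost < best_cost:
            best, best_cost = s, cost
    return best


print(lq_basis_pursuit_2xn([(1, 0), (0, 1), (1, 1)], (2, 2), 0.3))  # [0.0, 0.0, 2.0]
```

The mixture point lies on the third column's direction, so the sparsest explanation (one active source) wins over splitting the energy across the first two columns.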
A SURVEY OF CONVOLUTIVE BLIND SOURCE SEPARATION METHODS
Springer Handbook on Speech Processing and Speech Communication
Cited by 1 (0 self)
In this chapter, we provide an overview of existing algorithms for blind source separation of convolutive audio mixtures. We provide a taxonomy, wherein many of the existing algorithms can be organized, and we present published results from those algorithms that have been applied to real-world audio separation tasks.
Finding Intensities of Individual Notes in Piano Music
Abstract. Timing and dynamics are two important factors in music performance. Research on dynamics-related issues is comparatively rare because data on dynamics are difficult to obtain from music performances. Nevertheless, such research is vital to understanding music performance, and here we investigate ways to identify the intensities of individual piano notes in a mixture of simultaneous notes. The approach is divided into two stages. Stage 1 consists of two steps. The first step (Stage 1a) determines the intensity of an individual note within a mixture of simultaneous notes from the magnitude of its fundamental frequency, given the pitches of the notes. Pairs of simultaneous notes one or two octaves apart are also included in this study. The second step (Stage 1b) artificially generates a mixture of notes from a database of recorded single notes, subsequently referred to as the “estimated mixture”. The time lags between individual notes in the estimated mixture are adjusted to minimize the residual between the estimated mixture and the input. If the ratio of the residual power to the signal power exceeds a threshold, neighboring intensities are searched in Stage 2. The proposed method was verified with real data, and the results were satisfactory.
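Stage 1b can be sketched as a search over time lags: build the estimated mixture from single-note recordings with fixed gains, pick the lag that minimizes the residual, and compare the residual-to-signal power ratio with a threshold. The sketch assumes two notes, integer sample lags, the first note fixed at lag 0, and gains taken from Stage 1a; the names, the lag range, and the threshold are hypothetical.

```python
def shifted(note, lag, n):
    """Delay a note by `lag` samples (zero-padded) and truncate to length n."""
    out = [0.0] * n
    for i, v in enumerate(note):
        if 0 <= i + lag < n:
            out[i + lag] = v
    return out


def estimated_mixture(notes, gains, lags, n):
    """Sum of single-note recordings, each scaled by its gain and delayed by its lag."""
    mix = [0.0] * n
    for note, g, lag in zip(notes, gains, lags):
        for i, v in enumerate(shifted(note, lag, n)):
            mix[i] += g * v
    return mix


def residual_ratio(target, estimate):
    """Residual power divided by signal power."""
    res = sum((t - e) ** 2 for t, e in zip(target, estimate))
    sig = sum(t * t for t in target)
    return res / sig


def fit_second_note_lag(target, notes, gains, max_lag, threshold):
    """Search the lag of the second note (first note fixed at lag 0).

    Returns (best_lag, ratio, needs_stage2); needs_stage2 is True when the
    residual-to-signal power ratio still exceeds the threshold.
    """
    n = len(target)

    def ratio(lag):
        return residual_ratio(target, estimated_mixture(notes, gains, [0, lag], n))

    best = min(range(max_lag + 1), key=ratio)
    r = ratio(best)
    return best, r, r > threshold
```

When the gains are right, the best lag drives the ratio to (near) zero; a wrong gain leaves a residual above the threshold, which is the condition that hands the note over to the Stage 2 intensity search.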

