Results 1-10 of 133
Efficient Coding of Time-Relative Structure Using Spikes
2005
Cited by 53 (2 self)
Nonstationary acoustic features provide essential cues for many auditory tasks, including sound localization, auditory stream analysis, and speech recognition. These features can best be characterized relative to a precise point in time, such as the onset of a sound or the beginning of a harmonic periodicity. Extracting these types of features is a difficult problem. Part of the difficulty is that with standard block-based signal analysis methods, the representation is sensitive to the arbitrary alignment of the blocks with respect to the signal. Convolutional techniques such as shift-invariant transformations can reduce this sensitivity, but these do not yield a code that is efficient, that is, one that forms a nonredundant representation of the underlying structure. Here, we develop a non-block-based method for signal representation that is both time-relative and efficient. Signals are represented using a linear superposition of time-shiftable kernel functions, each with an associated magnitude and temporal position. Signal decomposition in this method is a nonlinear process that consists of optimizing the kernel function scaling coefficients and temporal positions to form an efficient, shift-invariant representation. We demonstrate the properties of this representation for the purpose of characterizing structure in various types of nonstationary acoustic signals. The computational problem investigated here has direct relevance to the neural coding at the auditory nerve and the more general issue of how to encode complex, time-varying signals with a population of spiking neurons.
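The idea of representing a signal as a superposition of time-shiftable kernels, each with a magnitude and a temporal position, can be illustrated with a small sketch. The abstract does not say which optimizer the authors use; the code below substitutes plain greedy matching pursuit as a stand-in. The function name, the unit-norm kernel assumption, and the requirement that each kernel be shorter than the signal are all assumptions made for this illustration.

```python
import numpy as np

def shiftable_matching_pursuit(signal, kernels, n_events):
    """Greedy decomposition of `signal` into time-shifted, scaled kernels.

    Illustrative stand-in, not the paper's optimizer. Assumes every kernel
    is a 1-D array with unit L2 norm and is shorter than the signal.
    Returns a list of (kernel_index, position, amplitude) events and the
    residual that remains after subtracting them.
    """
    residual = np.asarray(signal, dtype=float).copy()
    events = []
    for _ in range(n_events):
        best = None
        for k, kernel in enumerate(kernels):
            # Inner product of the residual with the kernel at every valid shift.
            corr = np.correlate(residual, kernel, mode="valid")
            pos = int(np.argmax(np.abs(corr)))
            if best is None or abs(corr[pos]) > abs(best[2]):
                best = (k, pos, float(corr[pos]))
        k, pos, amp = best
        # Remove the selected event from the residual.
        residual[pos:pos + len(kernels[k])] -= amp * kernels[k]
        events.append(best)
    return events, residual
```

Each returned event plays the role of one "spike": an identity (which kernel), a time, and an amplitude. Because positions are found by sliding the kernel over the whole signal, the code does not depend on how the signal is aligned to any block boundary.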
A maximum likelihood approach to single-channel source separation
Journal of Machine Learning Research, 2003
Cited by 47 (0 self)
This paper presents a new technique for achieving blind signal separation when given only a single-channel recording. The main concept is based on exploiting a priori sets of time-domain basis functions learned by independent component analysis (ICA) to the separation of mixed source signals observed in a single channel. The inherent time structure of sound sources is reflected in the ICA basis functions, which encode the sources in a statistically efficient manner. We derive a learning algorithm using a maximum likelihood approach given the observed single-channel data and sets of basis functions. For each time point we infer the source parameters and their contribution factors. This inference is possible due to prior knowledge of the basis functions and the associated coefficient densities. A flexible model for density estimation allows accurate modeling of the observation, and our experimental results exhibit a high level of separation performance for simulated mixtures as well as real environment recordings employing mixtures of two different sources.
Sparse and shift-invariant representations of music
IEEE Transactions on Speech and Audio Processing, 2006
Cited by 46 (9 self)
Sequential optimal design of neurophysiology experiments
2008
Cited by 42 (8 self)
Adaptively optimizing experiments has the potential to significantly reduce the number of trials needed to build parametric statistical models of neural systems. However, application of adaptive methods to neurophysiology has been limited by severe computational challenges. Since most neurons are high-dimensional systems, optimizing neurophysiology experiments requires computing high-dimensional integrations and optimizations in real time. Here we present a fast algorithm for choosing the most informative stimulus by maximizing the mutual information between the data and the unknown parameters of a generalized linear model (GLM) which we want to fit to the neuron’s activity. We rely on important log-concavity and asymptotic normality properties of the posterior to facilitate the required computations. Our algorithm requires only low-rank matrix manipulations and a 2-dimensional search to choose the optimal stimulus. The average running time of these operations scales quadratically with the dimensionality of the GLM, making real-time adaptive experimental design feasible even for high-dimensional stimulus and parameter spaces. For example, we ...
Unsupervised analysis of polyphonic music by sparse coding
IEEE Transactions on Neural Networks, 2006
Cited by 41 (4 self)
We investigate a data-driven approach to the analysis and transcription of polyphonic music, using a probabilistic model which is able to find sparse linear decompositions of a sequence of short-term Fourier spectra. The resulting system represents each input spectrum as a weighted sum of a small number of “atomic” spectra chosen from a larger dictionary; this dictionary is, in turn, learned from the data in such a way as to represent the given training set in an (information-theoretically) efficient way. When exposed to examples of polyphonic music, most of the dictionary elements take on the spectral characteristics of individual notes in the music, so that the sparse decomposition can be used to identify the notes in a polyphonic mixture. Our approach differs from other methods of polyphonic analysis based on spectral decomposition by combining all of the following: a) a formulation in terms of an explicitly given probabilistic model, in which the process estimating which notes are present corresponds naturally with the inference of latent variables in the model; b) a particularly simple generative model, motivated by very general considerations about efficient coding, that makes very few assumptions about the musical origins of the signals being processed; and c) the ability to learn a dictionary of atomic spectra (most of which converge to harmonic spectral profiles associated with specific notes) from polyphonic examples alone; no separate training on monophonic examples is required.
Index Terms: learning overcomplete dictionaries, polyphonic music, probabilistic modeling, redundancy reduction, sparse factorial coding, unsupervised learning.
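To make "a weighted sum of a small number of atomic spectra" concrete, here is a minimal sketch of sparse decomposition against a fixed dictionary. It is not the probabilistic inference procedure of the paper. It substitutes a generic L1-penalized least-squares solver (iterative soft thresholding, ISTA), and the function name, the penalty weight `lam`, and the fixed iteration count are assumptions chosen for illustration. Dictionary learning is left out.

```python
import numpy as np

def ista_sparse_code(D, x, lam, n_iter=200):
    """Approximately minimize 0.5 * ||x - D a||^2 + lam * ||a||_1 over a.

    D   : (n_features, n_atoms) dictionary, one atom per column.
    x   : (n_features,) input vector, e.g. one short-term spectrum.
    lam : sparsity penalty (hypothetical value chosen by the user).
    """
    D = np.asarray(D, dtype=float)
    x = np.asarray(x, dtype=float)
    # Squared spectral norm of D is a Lipschitz constant of the gradient.
    L = np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        z = a - grad / L
        # Soft thresholding drives small coefficients exactly to zero.
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return a
```

With a dictionary whose columns resemble note spectra, the nonzero entries of the returned vector would indicate which atoms are active in the input frame.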
Nonlinear Extraction of Independent Components of Natural Images Using Radial Gaussianization
2009
Cited by 30 (5 self)
We consider the problem of efficiently encoding a signal by transforming it to a new representation whose components are statistically independent. A widely studied linear solution, known as independent component analysis (ICA), exists for the case when the signal is generated as a linear transformation of independent non-Gaussian sources. Here, we examine a complementary case, in which the source is non-Gaussian and elliptically symmetric. In this case, no invertible linear transform suffices to decompose the signal into independent components, but we show that a simple nonlinear transformation, which we call radial Gaussianization (RG), is able to remove all dependencies. We then examine this methodology in the context of natural image statistics. We first show that distributions of spatially proximal bandpass filter responses are better described as elliptical than as linearly transformed independent sources. Consistent with this, we demonstrate that the reduction in dependency achieved by applying RG to either nearby pairs or blocks of bandpass filter responses is significantly greater than that achieved by ICA. Finally, we show that the RG transformation may be closely approximated by divisive normalization, which has been used to model the nonlinear response properties of visual neurons.
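A rough sketch of the radial Gaussianization idea for pairs of responses follows. It is restricted to the 2-D case, where the radius of a standard Gaussian follows a Rayleigh law with a closed-form inverse CDF. The function name and the rank-based estimate of the radial CDF are assumptions of this illustration, not details taken from the paper.

```python
import numpy as np

def radial_gaussianize_2d(x):
    """Map 2-D samples (rows of `x`) toward a standard Gaussian cloud.

    Illustrative sketch: whiten the data, then remap each sample's radius
    so that the radii follow the Rayleigh law of a 2-D standard Gaussian.
    Directions of the whitened samples are left unchanged.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean(axis=0)
    # Whiten so that elliptical contours become circular.
    cov = np.cov(x, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    white = x @ evecs / np.sqrt(evals)
    r = np.linalg.norm(white, axis=1)
    # Rank-based empirical CDF of the radii, kept strictly inside (0, 1).
    ranks = np.argsort(np.argsort(r))
    u = (ranks + 0.5) / len(r)
    # Inverse Rayleigh CDF: r = sqrt(-2 * log(1 - u)).
    r_new = np.sqrt(-2.0 * np.log1p(-u))
    # Guard against division by zero for a sample at the exact center.
    return white * (r_new / np.maximum(r, 1e-12))[:, None]
```

For elliptically symmetric input, the direction of a whitened sample is independent of its radius, so replacing the radial distribution with the Gaussian one is enough to make the two output components approximately independent.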
TOWARDS USING HIERARCHICAL POSTERIORS FOR FLEXIBLE AUTOMATIC SPEECH RECOGNITION SYSTEMS
Cited by 23 (13 self)
Local state (or phone) posterior probabilities are often investigated as local classifiers (e.g., hybrid HMM/ANN systems) or as transformed acoustic features (e.g., “TANDEM”) towards improved speech recognition systems. In this paper, we present initial results towards boosting these approaches by improving the local state, phone, or word posterior estimates, using all possible acoustic information (as available in the whole utterance), as well as possible prior information (such as topological constraints). Furthermore, this approach results in a family of new HMM-based systems, where only (local and global) posterior probabilities are used, while also providing a new, principled approach towards a hierarchical use/integration of these posteriors, from the frame level up to the sentence level. Initial results on several speech (as well as other multimodal) tasks resulted in significant improvements. In this paper, we present recognition results on Numbers’95 and on a reduced vocabulary version (1000 words) of the DARPA Conversational Telephone Speech-to-text (CTS) task.
LEARNING A BETTER REPRESENTATION OF SPEECH SOUND WAVES USING RESTRICTED BOLTZMANN MACHINES
Cited by 21 (5 self)
State-of-the-art speech recognition systems rely on preprocessed speech features such as Mel cepstrum or linear predictive coding coefficients that collapse high-dimensional speech sound waves into low-dimensional encodings. While these have been successfully applied in speech recognition systems, such low-dimensional encodings may lose some relevant information and express other information in a way that makes it difficult to use for discrimination. Higher-dimensional encodings could both improve performance in recognition tasks and also be applied to speech synthesis by better modeling the statistical structure of the sound waves. In this paper we present a novel approach for modeling speech sound waves using a Restricted Boltzmann machine (RBM) with a novel type of hidden variable, and we report initial results demonstrating phoneme recognition performance better than the current state of the art for methods based on Mel cepstrum coefficients.
Parametric Dictionary Design for Sparse Coding
IEEE Trans. on Signal Processing, 2009
Cited by 20 (5 self)
This paper introduces a new dictionary design method for sparse coding of a class of signals. It has been shown that one can sparsely approximate some natural signals using an overcomplete set of parametric functions, e.g. [1], [2]. A problem in using these parametric dictionaries is how to choose the parameters. In practice these parameters have been chosen by an expert or through a set of experiments. In the sparse approximation context, it has been shown that an incoherent dictionary is appropriate for the sparse approximation methods. In this paper we first characterize the dictionary design problem, subject to a constraint on the dictionary. Then we briefly explain that equiangular tight frames have minimum coherence. The complexity of the problem does not allow it to be solved exactly. We introduce a practical method to approximately solve it. Some experiments show the advantages one gets by using these dictionaries.
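The notion of coherence used above can be made concrete with a short sketch. It computes the mutual coherence of a dictionary (the largest absolute inner product between two distinct unit-norm atoms) and the Welch lower bound that equiangular tight frames attain. The function names are chosen for this illustration, and the code covers only the coherence measure, not the paper's design method.

```python
import numpy as np

def mutual_coherence(D):
    """Largest absolute inner product between distinct normalized columns of D."""
    D = np.asarray(D, dtype=float)
    D = D / np.linalg.norm(D, axis=0)   # normalize each atom to unit norm
    G = np.abs(D.T @ D)                 # absolute Gram matrix
    np.fill_diagonal(G, 0.0)            # ignore each atom's product with itself
    return float(G.max())

def welch_bound(d, n_atoms):
    """Lower bound on the coherence of n_atoms unit vectors in d dimensions."""
    return float(np.sqrt((n_atoms - d) / (d * (n_atoms - 1))))
```

As a usage example, three unit vectors in the plane spaced 120 degrees apart form an equiangular tight frame: their mutual coherence is 0.5, which equals `welch_bound(2, 3)`.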
Sparse Spectrotemporal Coding of Sounds
2003
Cited by 17 (2 self)
In this paper, we demonstrate that these characteristics of the auditory system can also be understood in terms of sparse activity in response to natural input, which in this case is approximated by speech data. Representations that efficiently code for speech data, adequately represented by spectrograms, are also of obvious technical interest, since the right type of sound representation might be a key to improved recognition of natural language, speech denoising, or speech generation.