Support vector machines for speech recognition
 Proceedings of the International Conference on Spoken Language Processing
, 1998
Cited by 114 (2 self)
Statistical techniques based on hidden Markov models (HMMs) with Gaussian emission densities have dominated signal processing and pattern recognition literature for the past 20 years. However, HMMs trained using maximum likelihood techniques suffer from an inability to learn discriminative information and are prone to overfitting and overparameterization. Recent work in machine learning has focused on models, such as the support vector machine (SVM), that automatically control generalization and parameterization as part of the overall optimization process. In this paper, we show that SVMs provide a significant improvement in performance on a static pattern classification task based on the Deterding vowel data. We also describe an application of SVMs to large vocabulary speech recognition, and demonstrate an improvement in error rate on a continuous alphadigit task (OGI Alphadigits) and a large vocabulary conversational speech task (Switchboard). Issues related to the development and optimization of an SVM/HMM hybrid system are discussed.
Synergy, Redundancy, and Independence in Population Codes
 The Journal of Neuroscience
, 2003
Cited by 55 (0 self)
A key issue in understanding the neural code for an ensemble of neurons is the nature and strength of correlations between neurons and how these correlations are related to the stimulus. The issue is complicated by the fact that there is not a single notion of independence or lack of correlation. We distinguish three kinds: (1) activity independence; (2) conditional independence; and (3) information independence. Each notion is related to an information measure: the information between cells, the information between cells given the stimulus, and the synergy of cells about the stimulus, respectively. We show that these measures form an interrelated framework for evaluating contributions of signal and noise correlations to the joint information conveyed about the stimulus and that at least two of the three measures must be calculated to characterize a population code. This framework is compared with others recently proposed in the literature. In addition, we distinguish questions about how information is encoded by a population of neurons from how that information can be decoded. Although information theory is natural and powerful for questions of encoding, it is not sufficient for characterizing the process of decoding. Decoding fundamentally requires an error measure that quantifies the importance of the deviations of estimated stimuli from actual stimuli. Because there is no a priori choice of error measure, questions about decoding cannot be put on the same level of generality as for encoding.
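The three measures named in this abstract can be illustrated on a toy distribution. The following is a minimal sketch, not the paper's own code: it assumes a discrete joint distribution over (stimulus, cell 1, cell 2) stored as a dictionary, and takes synergy to mean I(S;R1,R2) - I(S;R1) - I(S;R2). All function and variable names are hypothetical.

```python
from collections import defaultdict
from math import log2

def marginal(joint, keep):
    """Sum a joint distribution {(s, r1, r2): p} down to the positions in `keep`."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in keep)] += p
    return dict(out)

def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def mutual_info(joint, a, b):
    """I(A;B) = H(A) + H(B) - H(A,B); a and b are tuples of positions."""
    return (entropy(marginal(joint, a)) + entropy(marginal(joint, b))
            - entropy(marginal(joint, a + b)))

def cond_mutual_info(joint, a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return (entropy(marginal(joint, a + c)) + entropy(marginal(joint, b + c))
            - entropy(marginal(joint, a + b + c)) - entropy(marginal(joint, c)))

def synergy(joint):
    """I(S;R1,R2) - I(S;R1) - I(S;R2), with S at position 0 and the cells at 1 and 2."""
    return (mutual_info(joint, (0,), (1, 2))
            - mutual_info(joint, (0,), (1,))
            - mutual_info(joint, (0,), (2,)))

# Toy example (an assumption, not data from the paper): two independent
# binary cells and a stimulus equal to their XOR.
xor_joint = {(r1 ^ r2, r1, r2): 0.25 for r1 in (0, 1) for r2 in (0, 1)}

print("I(R1;R2)   =", mutual_info(xor_joint, (1,), (2,)))
print("I(R1;R2|S) =", cond_mutual_info(xor_joint, (1,), (2,), (0,)))
print("synergy    =", synergy(xor_joint))
```

In the XOR toy case the cells are activity independent but not conditionally independent, and the synergy equals I(R1;R2|S) - I(R1;R2), an identity that holds for any joint distribution.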
Predictability, Complexity, and Learning
, 2001
Cited by 46 (2 self)
We define predictive information Ipred(T) as the mutual information between the past and the future of a time series. Three qualitatively different behaviors are found in the limit of large observation times T: Ipred(T) can remain finite, grow logarithmically, or grow as a fractional power law. If the time series allows us to learn a model with a finite number of parameters, then Ipred(T) grows logarithmically with a coefficient that counts the dimensionality of the model space. In contrast, power-law growth is associated, for example, with the learning of infinite-parameter (or nonparametric) models such as continuous functions with smoothness constraints. There are connections between the predictive information and measures of complexity that have been defined both in learning theory and the analysis of physical systems through statistical mechanics and dynamical systems theory. Furthermore, in the same way that entropy provides the unique measure of available information consistent with some simple and plausible conditions, we argue that the divergent part of Ipred(T) provides the unique measure for the complexity of dynamics underlying a time series. Finally, we discuss how these ideas may be useful in problems in physics, statistics, and biology.
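The three regimes described in this abstract can be written compactly. The notation below is a sketch under assumptions: S(T) denotes the entropy of a window of duration T, K the number of model parameters, and the exponent α is a label introduced here. The prefactor K/2 is the standard form of the "coefficient that counts the dimensionality" and is not stated in the abstract itself.

```latex
% Predictive information between a past of duration T and a future of duration T'
I_{\mathrm{pred}}(T, T') = S(T) + S(T') - S(T + T'),
\qquad
I_{\mathrm{pred}}(T) = \lim_{T' \to \infty} I_{\mathrm{pred}}(T, T')

% Large-T behavior (three cases)
I_{\mathrm{pred}}(T) \to \text{const}
\quad \text{(finite)}, \qquad
I_{\mathrm{pred}}(T) \approx \frac{K}{2} \log_2 T
\quad \text{(finite-parameter model)}, \qquad
I_{\mathrm{pred}}(T) \propto T^{\alpha},\ 0 < \alpha < 1
\quad \text{(nonparametric model)}
```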
On Decoding the Responses of a Population of Neurons from Short Time Windows
, 1999
Cited by 46 (5 self)
The effectiveness of various stimulus identification (decoding) procedures for extracting the information carried by the responses of a population of neurons to a set of repeatedly presented stimuli is studied analytically, in the limit of short time windows. It is shown that in this limit, the entire information content of the responses can sometimes be decoded, and when this is not the case, the lost information is quantified. In particular, the mutual information extracted by taking into account only the most likely stimulus in each trial turns out to be, if not equal, much closer to the true value than that calculated from all the probabilities that each of the possible stimuli in the set was the actual one. The relation between the mutual information extracted by decoding and the percentage of correct stimulus decodings is also derived analytically in the same limit, showing that the metric content index can be estimated reliably from a few cells recorded from brief periods. Computer simulations as well as the activity of real neurons recorded in the primate hippocampus serve to confirm these results and illustrate the utility and limitations of the approach.
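The two quantities related in this abstract, the mutual information extracted by decoding and the percentage of correct decodings, can both be read off a confusion matrix. The following is a minimal sketch assuming a table of trial counts indexed as confusion[actual][decoded]. The function name is hypothetical, and no correction for limited sampling is applied, so estimates from few trials will tend to be biased upward.

```python
from math import log2

def decoding_summary(confusion):
    """Return (mutual information in bits, fraction correct) for a confusion
    matrix where confusion[s][d] counts trials with actual stimulus s that
    were decoded as stimulus d."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    p_actual = [sum(row) / total for row in confusion]
    p_decoded = [sum(confusion[s][d] for s in range(n)) / total for d in range(n)]

    info = 0.0
    for s in range(n):
        for d in range(n):
            if confusion[s][d]:
                p = confusion[s][d] / total
                info += p * log2(p / (p_actual[s] * p_decoded[d]))

    correct = sum(confusion[i][i] for i in range(n)) / total
    return info, correct

# Hypothetical counts: 4 stimuli, 10 trials each.
perfect = [[10 if s == d else 0 for d in range(4)] for s in range(4)]
noisy = [[7, 1, 1, 1], [1, 7, 1, 1], [1, 1, 7, 1], [1, 1, 1, 7]]

print(decoding_summary(perfect))
print(decoding_summary(noisy))
```

With perfect decoding of four equiprobable stimuli the extracted information reaches log2(4) = 2 bits; the noisy table gives 70% correct and a strictly smaller information value.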
Optimal Short-Term Population Coding: When Fisher Information Fails
, 2002
Cited by 44 (10 self)
Efficient coding has been proposed as a first principle explaining neuronal response properties in the central nervous system. The shape of optimal codes, however, strongly depends on the natural limitations of the particular physical system. Here we investigate how optimal neuronal encoding strategies are influenced by the finite number of neurons N (place constraint), the limited decoding time window length T (time constraint), the maximum neuronal firing rate fmax (power constraint), and the maximal average rate ⟨f⟩max (energy constraint). While Fisher information provides a general lower bound for the mean squared error of unbiased signal reconstruction, its use to characterize the coding precision is limited. Analyzing simple examples, we illustrate some typical pitfalls and thereby show that Fisher information provides a valid measure for the precision of a code only if the dynamic range (fmin T, fmax T) is sufficiently large. In particular, we demonstrate that the optimal width of Gaussian tuning curves depends on the available decoding time T. Within the broader class of unimodal tuning functions, it turns out that the shape of a Fisher-optimal coding scheme is not unique. We solve this ambiguity by taking the minimum mean square error into account, which leads to flat tuning curves. The tuning width, however, remains to be determined by energy constraints rather than by the principle of efficient coding.
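For orientation, the Fisher information bound mentioned in this abstract can be computed for a simple population. The following is a minimal sketch, not the paper's analysis: it assumes independent Poisson spike counts with mean T·f_i(x) and Gaussian tuning curves, for which the Fisher information is J(x) = T Σ_i f_i'(x)² / f_i(x) and the Cramér-Rao bound on the mean squared error of an unbiased estimator is 1/J(x). Function and parameter names are hypothetical.

```python
from math import exp

def gaussian_rate(x, center, width, f_max):
    """Gaussian tuning curve: firing rate at stimulus x."""
    return f_max * exp(-0.5 * ((x - center) / width) ** 2)

def fisher_information(x, centers, width, f_max, T):
    """J(x) = T * sum_i f_i'(x)^2 / f_i(x) for independent Poisson neurons
    (an assumed noise model) observed for a time window T."""
    total = 0.0
    for c in centers:
        f = gaussian_rate(x, c, width, f_max)
        if f > 0.0:
            df = -f * (x - c) / width ** 2  # derivative of the tuning curve
            total += T * df * df / f
    return total

def cramer_rao_bound(x, centers, width, f_max, T):
    """Lower bound on the mean squared error of any unbiased estimator."""
    return 1.0 / fisher_information(x, centers, width, f_max, T)

centers = [i * 0.5 for i in range(-10, 11)]  # hypothetical preferred stimuli
for T in (0.01, 0.1, 1.0):
    print(T, cramer_rao_bound(0.2, centers, width=0.5, f_max=50.0, T=T))
```

The bound scales as 1/T regardless of how few spikes are expected in the window, which is one way to see why, as the abstract argues, it can stop being a valid measure of precision when the dynamic range (fmin T, fmax T) is small.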
Narrow Versus Wide Tuning Curves: What’s Best for a Population Code?
, 1999
Cited by 40 (0 self)
Neurophysiologists are often faced with the problem of evaluating the quality of a code for a sensory or motor variable, either to relate it to the performance of the animal in a simple discrimination task or to compare the codes at various stages along the neuronal pathway. One common belief that has emerged from such studies is that sharpening of tuning curves improves the quality of the code, although only to a certain point; sharpening beyond that is believed to be harmful. We show that this belief relies on either problematic technical analysis or improper assumptions about the noise. We conclude that one cannot tell, in the general case, whether narrow tuning curves are better than wide ones; the answer depends critically on the covariance of the noise. The same conclusion applies to other manipulations of the tuning curve profiles such as gain increase.
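The dependence on the noise covariance can be seen in a two-neuron toy case. The following is a minimal sketch under stated assumptions, not the paper's derivation: it assumes Gaussian noise with a stimulus-independent covariance matrix Σ, for which the Fisher information is J = f'ᵀ Σ⁻¹ f', where f' holds the tuning-curve slopes. The function name and the numbers are hypothetical.

```python
def fisher_info_2d(slopes, cov):
    """J = f'^T Sigma^{-1} f' for two neurons with Gaussian noise whose
    covariance does not depend on the stimulus (an assumption)."""
    a, b = slopes
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    # Sigma^{-1} = (1/det) * [[s22, -s12], [-s21, s11]]
    return (a * (s22 * a - s12 * b) + b * (-s21 * a + s11 * b)) / det

independent = [[1.0, 0.0], [0.0, 1.0]]
correlated = [[1.0, 0.5], [0.5, 1.0]]

# Same slopes for both neurons: positive noise correlation lowers J.
print(fisher_info_2d((1.0, 1.0), independent), fisher_info_2d((1.0, 1.0), correlated))
# Opposite slopes: the same correlation raises J.
print(fisher_info_2d((1.0, -1.0), independent), fisher_info_2d((1.0, -1.0), correlated))
```

Identical tuning slopes therefore give different coding quality depending on the covariance, which is the kind of dependence the abstract points to.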
Model-based decoding, information estimation, and change-point detection in multi-neuron spike trains
 UNDER REVIEW, NEURAL COMPUTATION
, 2007
Cited by 37 (17 self)
Understanding how stimulus information is encoded in spike trains is a central problem in computational neuroscience. Decoding methods provide an important tool for addressing this problem, by allowing us to explicitly read out the information contained in spike responses. Here we introduce several decoding methods based on point-process neural encoding models (i.e. “forward” models that predict spike responses to novel stimuli). These models have concave log-likelihood functions, allowing for efficient fitting via maximum likelihood. Moreover, we may use the likelihood of the observed spike trains under the model to perform optimal decoding. We present: (1) a tractable algorithm for computing the maximum a posteriori (MAP) estimate of the stimulus, that is, the most probable stimulus to have generated the observed single or multiple-spike train response, given some prior distribution over the stimulus; (2) a Gaussian approximation to the posterior distribution, which allows us to quantify the fidelity with which various stimulus features are encoded; (3) an efficient method for estimating the mutual information between the stimulus and the response; and (4) a framework for the detection of change-point times (e.g. the time at which the stimulus undergoes a change in mean or variance), by marginalizing over the posterior distribution of stimuli. We show several examples illustrating the performance of these estimators with simulated data.
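Items (1) and (2) of this abstract can be illustrated in a deliberately simplified setting. The following is a minimal sketch, not the authors' algorithm: it assumes a scalar stimulus x, neurons with Poisson spike counts of mean T·exp(k_i·x + b_i), and a zero-mean Gaussian prior, so that the log posterior is concave and Newton's method finds the MAP estimate. The curvature at the optimum then gives a Gaussian approximation to the posterior. All names and numbers are hypothetical.

```python
from math import exp

def log_posterior(x, counts, gains, offsets, T, prior_var):
    """Log posterior up to a constant: Poisson log-likelihood plus Gaussian log-prior."""
    ll = sum(n * (k * x + b) - T * exp(k * x + b)
             for n, k, b in zip(counts, gains, offsets))
    return ll - 0.5 * x * x / prior_var

def grad_hess(x, counts, gains, offsets, T, prior_var):
    """First and second derivatives of the log posterior with respect to x."""
    rates = [T * exp(k * x + b) for k, b in zip(gains, offsets)]
    grad = sum(k * (n - r) for n, k, r in zip(counts, gains, rates)) - x / prior_var
    hess = -sum(k * k * r for k, r in zip(gains, rates)) - 1.0 / prior_var
    return grad, hess

def map_decode(counts, gains, offsets, T, prior_var, iters=100, tol=1e-10):
    """Newton's method with step halving; returns (MAP estimate, posterior variance
    of the Gaussian approximation)."""
    args = (counts, gains, offsets, T, prior_var)
    x = 0.0
    for _ in range(iters):
        grad, hess = grad_hess(x, *args)
        step = -grad / hess  # hess < 0 everywhere, so this is an ascent direction
        current = log_posterior(x, *args)
        while log_posterior(x + step, *args) < current and abs(step) > tol:
            step *= 0.5
        x += step
        if abs(step) < tol:
            break
    _, hess = grad_hess(x, *args)
    return x, -1.0 / hess

# Hypothetical observed spike counts and model parameters for two neurons.
x_map, post_var = map_decode(counts=[3, 7], gains=[1.0, -0.5],
                             offsets=[0.0, 1.0], T=1.0, prior_var=4.0)
print(x_map, post_var)
```

Concavity is what makes this tractable: the Hessian is negative everywhere, so the Newton iteration has a single maximum to find and the returned variance is always positive.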
Multi-Dimensional Encoding Strategy of Spiking Neurons
 Neural Computation
, 2000
Cited by 31 (6 self)
Neural responses in sensory systems are typically triggered by a multitude of stimulus features. Using information theory, we study the encoding accuracy of a population of stochastically spiking neurons characterized by different tuning widths for the different features. The optimal encoding strategy for representing one feature most accurately consists of (i) narrow tuning in the dimension to be encoded to increase the single-neuron Fisher information, and (ii) broad tuning in all other dimensions to increase the number of active neurons. Extremely narrow tuning without sufficient receptive field overlap will severely worsen the coding. This implies the existence of an optimal tuning width for the feature to be encoded. Empirically, only a subset of all stimulus features will normally be accessible. In this case, relative encoding errors can be calculated which yield a criterion for the function of a neural population based on the measured tuning curves.
Representational Accuracy of Stochastic Neural Populations
, 2001
Cited by 27 (5 self)
this article that the choice of a variability model has a major, nontrivial impact on the encoding properties of the neural population. The immense variability of individual response parameters, such as tuning widths or correlation coefficients, has also been neglected in most previous work. Although these parameter variations are always found in empirical data, they were considered functionally insignificant, and hence theoretical studies have almost always assumed uniform parameters throughout the population. We will show here that this uniform case is unfavorable in the sense that the introduction of parameter variability improves the encoding performance
Fast Population Coding
 Neural Computation
, 2007
Cited by 27 (4 self)
Uncertainty coming from the noise in its neurons and the ill-posed nature of many tasks plagues neural computations. Maybe surprisingly, many studies show that the brain manipulates these forms of uncertainty in a probabilistically consistent and normative manner, and there is now a rich theoretical literature on the capabilities of populations of neurons to implement computations in the face of uncertainty. However, one major facet of uncertainty has received comparatively little attention: time. In a dynamic, rapidly changing world, data are only temporarily relevant. Here, we analyze the computational consequences of encoding stimulus trajectories in populations of neurons. For the most obvious, simple, instantaneous encoder, the correlations induced by natural, smooth stimuli engender a decoder that requires access to information that is nonlocal both in time and across neurons. This formally amounts to a ruinous representation. We show that there is an alternative encoder that is computationally and representationally powerful in which each spike contributes independent information; it is independently decodable, in other words. We suggest this as an appropriate foundation for understanding time-varying population codes. Furthermore, we show how adaptation to