Results 1  10
of
100
An informationmaximization approach to blind separation and blind deconvolution
 NEURAL COMPUTATION
, 1995
"... ..."
Multitask Learning
 MACHINE LEARNING
, 1997
"... Multitask Learning is an approach to inductive transfer that improves generalization by using the domain information contained in the training signals of related tasks as an inductive bias. It does this by learning tasks in parallel while using a shared representation; what is learned for each task ..."
Abstract

Cited by 465 (7 self)
 Add to MetaCart
Multitask Learning is an approach to inductive transfer that improves generalization by using the domain information contained in the training signals of related tasks as an inductive bias. It does this by learning tasks in parallel while using a shared representation; what is learned for each task can help other tasks be learned better. This paper reviews prior work on MTL, presents new evidence that MTL in backprop nets discovers task relatedness without the need of supervisory signals, and presents new results for MTL with knearest neighbor and kernel regression. In this paper we demonstrate multitask learning in three domains. We explain how multitask learning works, and show that there are many opportunities for multitask learning in real domains. We present an algorithm and results for multitask learning with casebased methods like knearest neighbor and kernel regression, and sketch an algorithm for multitask learning in decision trees. Because multitask learning works, can be applied to many different kinds of domains, and can be used with different learning algorithms, we conjecture there will be many opportunities for its use on realworld problems.
The Helmholtz Machine
, 1995
"... Discovering the structure inherent in a set of patterns is a fundamental aim of statistical inference or learning. One fruitful approach is to build a parameterized stochastic generative model, independent draws from which are likely to produce the patterns. For all but the simplest generative model ..."
Abstract

Cited by 194 (22 self)
 Add to MetaCart
Discovering the structure inherent in a set of patterns is a fundamental aim of statistical inference or learning. One fruitful approach is to build a parameterized stochastic generative model, independent draws from which are likely to produce the patterns. For all but the simplest generative models, each pattern can be generated in exponentially many ways. It is thus intractable to adjust the parameters to maximize the probability of the observed patterns. We describe a way of finessing this combinatorial explosion by maximizing an easily computed lower bound on the probability of the observations. Our method can be viewed as a form of hierarchical selfsupervised learning that may relate to the function of bottomup and topdown cortical processing pathways.
Computational Models of Sensorimotor Integration
 SCIENCE
, 1997
"... The sensorimotor integration system can be viewed as an observer attempting to estimate its own state and the state of the environment by integrating multiple sources of information. We describe a computational framework capturing this notion, and some specific models of integration and adaptati ..."
Abstract

Cited by 178 (10 self)
 Add to MetaCart
The sensorimotor integration system can be viewed as an observer attempting to estimate its own state and the state of the environment by integrating multiple sources of information. We describe a computational framework capturing this notion, and some specific models of integration and adaptation that result from it. Psychophysical results from two sensorimotor systems, subserving the integration and adaptation of visuoauditory maps, and estimation of the state of the hand during arm movements, are presented and analyzed within this framework. These results suggest that: (1) Spatial information from visual and auditory systems is integrated so as to reduce the variance in localization. (2) The effects of a remapping in the relation between visual and auditory space can be predicted from a simple learning rule. (3) The temporal propagation of errors in estimating the hand's state is captured by a linear dynamic observer, providing evidence for the existence of an intern...
Learning with Labeled and Unlabeled Data
, 2001
"... In this paper, on the one hand, we aim to give a review on literature dealing with the problem of supervised learning aided by additional unlabeled data. On the other hand, being a part of the author's first year PhD report, the paper serves as a frame to bundle related work by the author as well as ..."
Abstract

Cited by 165 (3 self)
 Add to MetaCart
In this paper, on the one hand, we aim to give a review on literature dealing with the problem of supervised learning aided by additional unlabeled data. On the other hand, being a part of the author's first year PhD report, the paper serves as a frame to bundle related work by the author as well as numerous suggestions for potential future work. Therefore, this work contains more speculative and partly subjective material than the reader might expect from a literature review. We give a rigorous definition of the problem and relate it to supervised and unsupervised learning. The crucial role of prior knowledge is put forward, and we discuss the important notion of inputdependent regularization. We postulate a number of baseline methods, being algorithms or algorithmic schemes which can more or less straightforwardly be applied to the problem, without the need for genuinely new concepts. However, some of them might serve as basis for a genuine method. In the literature revi...
Slow Feature Analysis: Unsupervised Learning of Invariances
"... Invariant features of temporally varying signals are useful for analysis and classification. Slow feature analysis (SFA) is a new method for learning invariant or slowly varying features from a vectorial input signal. It is based on a nonlinear expansion of the input signal and application of princi ..."
Abstract

Cited by 146 (9 self)
 Add to MetaCart
Invariant features of temporally varying signals are useful for analysis and classification. Slow feature analysis (SFA) is a new method for learning invariant or slowly varying features from a vectorial input signal. It is based on a nonlinear expansion of the input signal and application of principal component analysis to this expanded signal and its time derivative. It is guaranteed to find the optimal solution within a family of functions directly and can learn to extract a large number of decorrelated features, which are ordered by their degree of invariance. SFA can be applied hierarchically to process highdimensional input signals and extract complex features. SFA is applied first to complex cell tuning properties based on simple cell output, including disparity and motion. Then more complicated inputoutput functions are learned by repeated application of SFA. Finally, a hierarchical network of SFA modules is presented as a simple model of the visual system. The same unstructured network can learn translation, size, rotation, contrast, or, to a lesser degree, illumination invariance for onedimensional objects, depending on only the training stimulus. Surprisingly, only a few training objects suffice to achieve good generalization to new objects. The generated representation is suitable for object recognition. Performance degrades if the network is trained to learn multiple invariances simultaneously.
Generative models for discovering sparse distributed representations
 Philosophical Transactions of the Royal Society B
, 1997
"... We describe a hierarchical, generative model that can be viewed as a nonlinear generalization of factor analysis and can be implemented in a neural network. The model uses bottomup, topdown and lateral connections to perform Bayesian perceptual inference correctly. Once perceptual inference has b ..."
Abstract

Cited by 120 (5 self)
 Add to MetaCart
We describe a hierarchical, generative model that can be viewed as a nonlinear generalization of factor analysis and can be implemented in a neural network. The model uses bottomup, topdown and lateral connections to perform Bayesian perceptual inference correctly. Once perceptual inference has been performed the connection strengths can be updated using a very simple learning rule that only requires locally available information. We demonstrate that the network learns to extract sparse, distributed, hierarchical representations.
Clustering Based on Conditional Distributions in an Auxiliary Space
 Neural Computation
, 2001
"... We study the problem of learning groups or categories that are local ..."
Abstract

Cited by 79 (22 self)
 Add to MetaCart
We study the problem of learning groups or categories that are local