Results 1–10 of 94
Extracting and Composing Robust Features with Denoising Autoencoders
, 2008
Abstract

Cited by 77 (15 self)
Previous work has shown that the difficulties in learning deep generative or discriminative models can be overcome by an initial unsupervised learning step that maps inputs to useful intermediate representations. We introduce and motivate a new training principle for unsupervised learning of a representation based on the idea of making the learned representations robust to partial corruption of the input pattern. This approach can be used to train autoencoders, and these denoising autoencoders can be stacked to initialize deep architectures. The algorithm can be motivated from a manifold learning and information theoretic perspective or from a generative model perspective. Comparative experiments clearly show the surprising advantage of corrupting the input of autoencoders on a pattern classification benchmark suite.
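The corruption-then-reconstruction idea in this abstract can be sketched in a few lines. The following is a minimal, hypothetical illustration, not the paper's implementation: a tied-weight autoencoder trained by SGD to reconstruct the *clean* input from a masked (partially zeroed) version. The paper itself uses a cross-entropy reconstruction loss and stacks these layers to initialize a deep network; the squared-error objective and all names below are my own simplifications.

```python
import math
import random

def corrupt(x, p, rng):
    """Masking corruption: set each input component to 0 with probability p."""
    return [0.0 if rng.random() < p else v for v in x]

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def dae_step(x, W, b, c, p, lr, rng):
    """One SGD step of a tied-weight denoising autoencoder (squared error).

    Encoder: h = sigmoid(W x~ + b);  Decoder: z = sigmoid(W^T h + c).
    The reconstruction target is the clean x, but the encoder only sees
    the corrupted x~ -- that asymmetry is the denoising criterion.
    """
    d, k = len(x), len(b)
    xt = corrupt(x, p, rng)
    h = [sigmoid(sum(W[j][i] * xt[i] for i in range(d)) + b[j]) for j in range(k)]
    z = [sigmoid(sum(W[j][i] * h[j] for j in range(k)) + c[i]) for i in range(d)]
    # Backprop of L = sum_i (z_i - x_i)^2 through both sigmoid layers.
    dz = [2.0 * (z[i] - x[i]) * z[i] * (1.0 - z[i]) for i in range(d)]
    dh = [sum(W[j][i] * dz[i] for i in range(d)) * h[j] * (1.0 - h[j]) for j in range(k)]
    for j in range(k):
        for i in range(d):
            # Tied weights: gradient from the decoder path plus the encoder path.
            W[j][i] -= lr * (dz[i] * h[j] + dh[j] * xt[i])
        b[j] -= lr * dh[j]
    for i in range(d):
        c[i] -= lr * dz[i]
    return sum((z[i] - x[i]) ** 2 for i in range(d))
```

Training on even two toy binary patterns drives the reconstruction error of the clean inputs down, despite the encoder never seeing them uncorrupted.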
Bootstrapping with Noise: An Effective Regularization Technique
 Connection Science
, 1996
Abstract

Cited by 60 (16 self)
Bootstrap samples with noise are shown to be an effective smoothness and capacity control technique for training feedforward networks and for other statistical methods such as generalized additive models. It is shown that the noisy bootstrap performs best in conjunction with weight decay regularization and ensemble averaging. The two-spiral problem, a highly nonlinear noise-free dataset, is used to demonstrate these findings. The combination of noisy bootstrap and ensemble averaging is also shown to be useful for generalized additive modeling, and is demonstrated on the well-known Cleveland Heart Data [7].
Keywords: Noise Injection, Combining Estimators, Pattern Classification, Two Spiral Problem, Clinical Data Analysis.
1 Introduction
The bootstrap technique has become one of the major tools for producing empirical confidence intervals of estimated parameters or predictors [8]. One way to view the bootstrap is as a method to simulate the noise inherent in the data, and thus effectively increase t...
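As a rough sketch of the technique described above (function and parameter names are illustrative, not from the paper): draw a bootstrap sample with replacement, then jitter only the inputs with zero-mean Gaussian noise, leaving the targets unchanged.

```python
import random

def noisy_bootstrap(data, n, sigma, rng):
    """Bootstrap-with-noise: resample (x, y) pairs with replacement, then add
    zero-mean Gaussian noise of standard deviation sigma to the inputs only."""
    sample = []
    for _ in range(n):
        x, y = rng.choice(data)  # bootstrap draw (with replacement)
        sample.append(([v + rng.gauss(0.0, sigma) for v in x], y))
    return sample
```

In the paper's setting each such resampled, jittered set would train one feedforward network with weight decay; averaging several such networks gives the ensemble effect the abstract reports as most beneficial.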
Computing the meanings of words in reading: cooperative division of labor between visual and phonological processes
 PSYCHOLOGICAL REVIEW
, 2003
Incorporating Prior Information in Machine Learning by Creating Virtual Examples
 Proceedings of the IEEE
, 1998
Abstract

Cited by 42 (2 self)
One of the key problems in supervised learning is the insufficient size of the training set. The natural way for an intelligent learner to counter this problem and generalize successfully is to exploit prior information that may be available about the domain or that can be learned from prototypical examples. We discuss the notion of using prior knowledge by creating virtual examples, thereby expanding the effective training set size. We show that in some contexts this idea is mathematically equivalent to incorporating the prior knowledge as a regularizer, suggesting that the strategy is well-motivated. The process of creating virtual examples in real-world pattern recognition tasks is highly non-trivial. We provide demonstrative examples from object recognition and speech recognition to illustrate the idea.
1 Learning from Examples
Recently, machine learning techniques have become increasingly popular as an alternative to knowledge-based approaches to artificial intelligence pro...
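A minimal sketch of the virtual-examples idea (the transforms here are hypothetical; the paper's own examples use domain-specific distortions such as 3-D rotations of objects or speaker transformations):

```python
def virtual_examples(x, y, transforms):
    """Expand one labelled example into virtual examples by applying
    label-preserving transforms that encode prior knowledge of invariances."""
    return [(t(x), y) for t in transforms]

# Hypothetical invariance for a 1-D periodic pattern: circular shifts.
shifts = [lambda seq, k=k: seq[k:] + seq[:k] for k in range(4)]
augmented = virtual_examples([1, 2, 3, 4], "classA", shifts)
```

Each virtual example carries the original label, so the effective training set grows by a factor of the number of transforms without any new labelled data.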
Diversity in Neural Network Ensembles
, 2004
Abstract

Cited by 37 (4 self)
We study the issue of error diversity in ensembles of neural networks. In ensembles of regression estimators, the measurement of diversity can be formalised as the Bias-Variance-Covariance decomposition. In ensembles of classifiers, there is no neat theory in the literature to date. Our objective is to understand how to precisely define, measure, and create diverse errors for both cases. As a focal point we study one algorithm, Negative Correlation (NC) Learning, which has been claimed, with supporting empirical evidence, to enforce useful error diversity, creating neural network ensembles with very competitive performance on both classification and regression problems. Given the lack of a solid understanding of its dynamics, we engage in a theoretical and empirical investigation. In an initial empirical stage, we demonstrate the application of an evolutionary search algorithm to locate the optimal value of λ, the configurable parameter in NC. We observe the behaviour of the optimal parameter under different ensemble architectures and datasets; we note a high degree of unpredictability, and embark on a more formal investigation. During the theoretical investigations, we find that NC succeeds due to exploiting the ...
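For concreteness, the NC penalty mentioned above can be written down directly. Below is a hedged sketch (variable names are mine): each ensemble member i pays its squared error plus λ times the penalty p_i = (f_i − f̄)·Σ_{j≠i}(f_j − f̄), and since the deviations from the ensemble mean sum to zero, p_i simplifies to −(f_i − f̄)². The penalty therefore rewards disagreement with the ensemble mean.

```python
def nc_losses(outputs, target, lam):
    """Per-member Negative Correlation losses for one training input:
    e_i = (f_i - y)^2 + lam * p_i, where the NC penalty simplifies to
    p_i = -(f_i - fbar)^2 because deviations from the mean sum to zero."""
    fbar = sum(outputs) / len(outputs)
    return [(f - target) ** 2 - lam * (f - fbar) ** 2 for f in outputs]
```

With lam = 0 this reduces to independent training; raising lam trades individual accuracy for diversity, which is consistent with the abstract's finding that the optimal λ is architecture- and dataset-dependent enough to warrant an evolutionary search.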
Second Order Cone Programming Approaches for Handling Missing and Uncertain Data
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
Abstract

Cited by 34 (9 self)
We propose a novel second order cone programming formulation for designing robust classifiers which can handle uncertainty in observations. Similar formulations are also derived for designing regression functions which are robust to uncertainties in the regression setting. The proposed formulations are independent of the underlying distribution, requiring only the existence of second order moments. These formulations are then specialized to the case of missing values in observations for both classification and regression problems. Experiments show that the proposed formulations outperform imputation.
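The key constraint behind such formulations can be stated compactly. The sketch below is a feasibility check only, not the optimizer, and its names are hypothetical: it tests a second-order-cone constraint of the form y(wᵀμ + b) ≥ 1 + κ‖Σ^{1/2}w‖, which replaces the usual margin constraint when an observation is known only through its mean μ and covariance Σ.

```python
import math

def robust_margin_holds(w, b, mu, Sigma, y, kappa):
    """Check y * (w . mu + b) >= 1 + kappa * sqrt(w^T Sigma w): a
    Chebyshev-style guarantee that the margin holds with high probability
    for any distribution with the given first and second moments."""
    d = len(w)
    margin = y * (sum(w[i] * mu[i] for i in range(d)) + b)
    quad = sum(w[i] * Sigma[i][j] * w[j] for i in range(d) for j in range(d))
    return margin >= 1.0 + kappa * math.sqrt(quad)
```

Solving for (w, b) subject to many such constraints is what requires a second-order cone programming solver; a larger κ (higher required confidence) shrinks the feasible set.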
Regularized Principal Manifolds
 In Computational Learning Theory: 4th European Conference
, 2001
Abstract

Cited by 32 (4 self)
Many settings of unsupervised learning can be viewed as quantization problems: the minimization ...
BYY Harmony Learning, Independent State Space, and Generalized APT Financial Analyses
, 2001
Abstract

Cited by 23 (20 self)
First, the relationship between factor analysis (FA) and the well-known arbitrage pricing theory (APT) for financial markets is discussed comparatively, with a number of to-be-improved problems listed. An overview is made from a unified perspective of the related studies in the literature of statistics, control theory, signal processing, and neural networks. Second, we introduce the fundamentals of the Bayesian Ying Yang (BYY) system and the harmony learning principle, which has been systematically developed in the past several years as a unified statistical framework for parameter learning, regularization, and model selection, in both non-temporal and temporal stochastic environments. We further show that a specific case of the framework, called the BYY independent state space (ISS) system, provides a general guide for systematically tackling various FA-related learning tasks and the above to-be-improved problems for APT analyses. Third, on various specific cases of the BYY ISS s...
Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion
, 2010
From data distributions to regularization in invariant learning
 Neural Computation
, 1995
Abstract

Cited by 22 (0 self)
Ideally, pattern recognition machines provide constant output when the inputs are transformed under a group G of desired invariances. These invariances can be achieved by enhancing the training data to include examples of inputs transformed by elements of G, while leaving the corresponding targets unchanged. Alternatively, the cost function for training can include a regularization term that penalizes changes in the output when the input is transformed under the group. This paper relates the two approaches, showing precisely the sense in which the regularized cost function approximates the result of adding transformed (or distorted) examples to the training data. The cost function for the enhanced training set is equivalent to the sum of the original cost function plus a regularizer. For unbiased models, the regularizer reduces to the intuitively obvious choice ...
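The claimed equivalence can be checked numerically on a toy case. In the sketch below (my notation, not the paper's), the enhanced-training-set cost expands as (f(tx) − y)² = (f(x) − y)² + (f(tx) − f(x))² + 2(f(tx) − f(x))(f(x) − y); for an unbiased model the cross term vanishes, leaving exactly "original cost plus an invariance regularizer".

```python
def enhanced_cost(f, data, transforms):
    """Squared-error cost on the training set enlarged with transformed examples."""
    total = 0.0
    for x, y in data:
        total += (f(x) - y) ** 2
        for t in transforms:
            total += (f(t(x)) - y) ** 2
    return total

def regularized_cost(f, data, transforms):
    """Original cost (one term per copy of each example) plus the invariance
    regularizer sum_t (f(t(x)) - f(x))^2. This drops the cross term
    2*(f(t(x)) - f(x))*(f(x) - y), which vanishes for an unbiased model."""
    total = 0.0
    for x, y in data:
        total += (1 + len(transforms)) * (f(x) - y) ** 2
        for t in transforms:
            total += (f(t(x)) - f(x)) ** 2
    return total
```

For a model that fits the data exactly (zero residuals) the two costs agree to machine precision; with nonzero residuals they differ by exactly the cross term.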