Results 1 - 10 of 728
A fast learning algorithm for deep belief nets
Neural Computation, 2006
"... We show how to use “complementary priors ” to eliminate the explaining away effects that make inference difficult in denselyconnected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a ..."
Cited by 970 (49 self)
Greedy layer-wise training of deep networks
2006
"... Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of computational elements required to represent some functions. Deep multilayer neural networks have many levels of nonlinearities allow ..."
Cited by 394 (48 self)
introduced a greedy layer-wise unsupervised learning algorithm for Deep Belief Networks (DBN), a generative model with many layers of hidden causal variables. In the context of the above optimization problem, we study this algorithm empirically and explore variants to better understand its success
Inducing Features of Random Fields
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997
"... We present a technique for constructing random fields from a set of training samples. The learning paradigm builds increasingly complex fields by allowing potential functions, or features, that are supported by increasingly large subgraphs. Each feature has a weight that is trained by minimizing the ..."
Cited by 670 (10 self)
the Kullback-Leibler divergence between the model and the empirical distribution of the training data. A greedy algorithm determines how features are incrementally added to the field and an iterative scaling algorithm is used to estimate the optimal values of the weights. The random field models and techniques
Policy gradient methods for reinforcement learning with function approximation.
In NIPS, 1999
"... Abstract Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly repres ..."
Cited by 439 (20 self)
Learning to Order Things
Journal of Artificial Intelligence Research, 1998
"... There are many applications in which it is desirable to order rather than classify instances. Here we consider the problem of learning how to order, given feedback in the form of preference judgments, i.e., statements to the effect that one instance should be ranked ahead of another. We outline a ..."
Cited by 409 (12 self)
that the problem of finding the ordering that agrees best with a preference function is NP-complete, even under very restrictive assumptions. Nevertheless, we describe a simple greedy algorithm that is guaranteed to find a good approximation. We then discuss an online learning algorithm, based on the "
Using mutual information for selecting features in supervised neural net learning
IEEE Transactions on Neural Networks, 1994
"... This paper investigates the application of the mutual infor“ criterion to evaluate a set of candidate features and to select an informative subset to be used as input data for a neural network classifier. Because the mutual information measures arbitrary dependencies between random variables, it is ..."
Cited by 358 (1 self)
, the use of the mutual information for tasks characterized by high input dimensionality requires suitable approximations because of the prohibitive demands on computation and samples. An algorithm is proposed that is based on a “greedy” selection of the features and that takes both the mutual information
Sparse Greedy Matrix Approximation for Machine Learning
2000
"... In kernel based methods such as Regularization Networks large datasets pose signi cant problems since the number of basis functions required for an optimal solution equals the number of samples. We present a sparse greedy approximation technique to construct a compressed representation of the ..."
Cited by 222 (10 self)
Approximation and learning by greedy algorithms
Ann. Statist., 2008
"... We consider the problem of approximating a given element f from a Hilbert space H by means of greedy algorithms and the application of such procedures to the regression problem in statistical learning theory. We improve on the existing theory of convergence rates for both the orthogonal greedy algor ..."
Cited by 56 (9 self)
Induction of Selective Bayesian Classifiers
Conference on Uncertainty in Artificial Intelligence, 1994
"... In this paper, we examine previous work on the naive Bayesian classifier and review its limitations, which include a sensitivity to correlated features. We respond to this problem by embedding the naive Bayesian induction scheme within an algorithm that carries out a greedy search through the space ..."
Cited by 265 (7 self)
Adaptive forward-backward greedy algorithm for learning sparse representations
IEEE Trans. Inform. Theory, 2011
"... Consider linear prediction models where the target function is a sparse linear combination of a set of basis functions. We are interested in the problem of identifying those basis functions with nonzero coefficients and reconstructing the target function from noisy observations. Two heuristics that ..."
Cited by 101 (9 self)
that are widely used in practice are forward and backward greedy algorithms. First, we show that neither idea is adequate. Second, we propose a novel combination that is based on the forward greedy algorithm but takes backward steps adaptively whenever beneficial. We prove strong theoretical results showing