Results 1 - 10 of 179,474
The Nature of Statistical Learning Theory
, 1999
"... Statistical learning theory was introduced in the late 1960’s. Until the 1990’s it was a purely theoretical analysis of the problem of function estimation from a given collection of data. In the middle of the 1990’s new types of learning algorithms (called support vector machines) based on the deve ..."
Cited by 13236 (32 self)
Maximum likelihood from incomplete data via the EM algorithm
 JOURNAL OF THE ROYAL STATISTICAL SOCIETY, SERIES B
, 1977
"... A broadly applicable algorithm for computing maximum likelihood estimates from incomplete data is presented at various levels of generality. Theory showing the monotone behaviour of the likelihood and convergence of the algorithm is derived. Many examples are sketched, including missing value situat ..."
Cited by 11972 (17 self)
Bias and Variance in Value Function Estimation
 Proc. of the 21st International Conference on Machine Learning
, 2004
"... We consider the bias and variance of value function estimation that are caused by using an empirical model instead of the true model. We analyze these bias and variance for Markov processes from a classical (frequentist) statistical point of view, and in a Bayesian setting. Using a second orde ..."
Bias and variance approximation in value function estimates
 Management Science
, 2007
"... We consider a finite state, finite action, infinite horizon, discounted reward Markov Decision Process and study the bias and variance in the value function estimates that result from empirical estimates of the model parameters. We provide closed-form approximations for the bias and variance, which ..."
Cited by 30 (10 self)
Comparing value-function estimation algorithms in undiscounted problems
, 1999
"... We compare scaling properties of several value-function estimation algorithms. In particular, we prove that Q-learning can scale exponentially slowly with the number of states. We identify the reasons of the slow convergence and show that both TD(λ) and learning with a fixed learning-rate enjo ..."
Cited by 5 (2 self)
Experimental Estimates of Education Production Functions
 Princeton University, Industrial Relations Section Working Paper No. 379
, 1997
Cited by 529 (19 self)
This paper analyzes data on 11,600 students and their teachers who were randomly assigned to different size classes from kindergarten through third grade. Statistical methods are used to adjust for nonrandom attrition and transitions between classes. The main conclusions are (1) on average, performance on standardized tests increases by four percentile points the first year students attend small classes; (2) the test score advantage of students in small classes expands by about one percentile point per year in subsequent years; (3) teacher aides and measured teacher characteristics have little effect; (4) class size has a larger effect for minority students and those on free lunch; (5) Hawthorne effects were unlikely.
Nonparametric estimation of average treatment effects under exogeneity: a review
 REVIEW OF ECONOMICS AND STATISTICS
, 2004
"... Recently there has been a surge in econometric work focusing on estimating average treatment effects under various sets of assumptions. One strand of this literature has developed methods for estimating average treatment effects for a binary treatment under assumptions variously described as exogen ..."
Cited by 630 (25 self)
considered estimation and inference for average treatment effects under weaker assumptions than typical of the earlier literature by avoiding distributional and functional-form assumptions. Various methods of semiparametric estimation have been proposed, including estimating the unknown regression functions
Missing value estimation methods for DNA microarrays
, 2001
"... Motivation: Gene expression microarray experiments can generate data sets with multiple missing expression values. Unfortunately, many algorithms for gene expression analysis require a complete matrix of gene array values as input. For example, methods such as hierarchical clustering and K-means clu ..."
Cited by 477 (24 self)
Smooth minimization of nonsmooth functions
 Math. Programming
, 2005
"... In this paper we propose a new approach for constructing efficient schemes for nonsmooth convex optimization. It is based on a special smoothing technique, which can be applied to the functions with explicit max-structure. Our approach can be considered as an alternative to black-box minimization. F ..."
Cited by 523 (1 self)