Results 1–10 of 124
Assessment and Propagation of Model Uncertainty
, 1995
Cited by 108 (0 self)
In this paper I discuss a Bayesian approach to solving this problem that has long been available in principle but is only now becoming routinely feasible, by virtue of recent computational advances, and examine its implementation in examples that involve forecasting the price of oil and estimating the chance of catastrophic failure of the U.S. Space Shuttle.
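The model-uncertainty idea the abstract describes is often operationalized as Bayesian model averaging: predictions from candidate models are combined, weighted by approximate posterior model probabilities. A minimal sketch, assuming BIC-based weights as the approximation (the function name `bma_predict` and the toy forecasts are illustrative, not from the paper):

```python
import numpy as np

def bma_predict(predictions, bics):
    """Average per-model predictions, weighting each model by an
    approximate posterior probability proportional to exp(-BIC/2).
    (Hypothetical helper; BIC weights are one common approximation.)"""
    bics = np.asarray(bics, dtype=float)
    w = np.exp(-0.5 * (bics - bics.min()))  # shift by min BIC for stability
    w /= w.sum()
    return w @ np.asarray(predictions, dtype=float), w

# three candidate models, two forecast horizons (toy numbers)
preds = [[10.2, 11.0], [9.8, 10.5], [12.1, 13.0]]
bics = [100.0, 101.5, 108.0]
avg, weights = bma_predict(preds, bics)
```

The averaged forecast reflects uncertainty across models rather than conditioning on a single selected model, which is the point the abstract argues.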
Prediction risk and architecture selection for neural networks
, 1994
Cited by 75 (2 self)
We describe two important sets of tools for neural network modeling: prediction risk estimation and network architecture selection. Prediction risk is defined as the expected performance of an estimator in predicting new observations. Estimated prediction risk can be used both for estimating the quality of model predictions and for model selection. Prediction risk estimation and model selection are especially important for problems with limited data. Techniques for estimating prediction risk include data resampling algorithms such as nonlinear cross-validation (NCV) and algebraic formulae such as the predicted squared error (PSE) and generalized prediction error (GPE). We show that exhaustive search over the space of network architectures is computationally infeasible even for networks of modest size. This motivates the use of heuristic strategies that dramatically reduce the search complexity. These strategies employ directed search algorithms, such as selecting the number of nodes via sequential network construction (SNC) and pruning inputs and weights via sensitivity-based pruning (SBP) and optimal brain damage (OBD), respectively.
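The resampling route to prediction risk can be sketched with K-fold cross-validation used for model selection. This is a minimal illustration of the general idea, not the paper's NCV algorithm, using polynomial fits in place of neural networks (all data and names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)

def cv_risk(degree, k=5):
    """K-fold cross-validation estimate of prediction risk:
    the expected squared error when predicting held-out data."""
    idx = np.arange(x.size)
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        coef = np.polyfit(x[train], y[train], degree)
        errs.append(np.mean((np.polyval(coef, x[fold]) - y[fold]) ** 2))
    return float(np.mean(errs))

# estimated risk guides model (here: degree) selection
risks = {d: cv_risk(d) for d in (1, 3, 9)}
best = min(risks, key=risks.get)
```

As the abstract notes, such estimates matter most with limited data, where training error alone badly understates the risk of flexible models.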
The psychometric function: I. Fitting, sampling, and goodness of fit
, 2001
Cited by 70 (10 self)
The psychometric function relates an observer’s performance to an independent variable, usually some physical quantity of a stimulus in a psychophysical task. This paper, together with its companion paper (Wichmann & Hill, 2001), describes an integrated approach to (1) fitting psychometric functions, (2) assessing the goodness of fit, and (3) providing confidence intervals for the function’s parameters and other estimates derived from them, for the purposes of hypothesis testing. The present paper deals with the first two topics, describing a constrained maximum-likelihood method of parameter estimation and developing several goodness-of-fit tests. Using Monte Carlo simulations, we deal with two specific difficulties that arise when fitting functions to psychophysical data. First, we note that human observers are prone to stimulus-independent errors (or lapses). We show that failure to account for this can lead to serious biases in estimates of the psychometric function’s parameters and illustrate how the problem may be overcome. Second, we note that psychophysical data sets are usually rather small by the standards required by most of the commonly applied statistical tests. We demonstrate the potential errors of applying traditional χ² methods to psychophysical data and advocate use of Monte Carlo resampling techniques that do not rely on asymptotic theory. We have made available the software to implement our methods. The performance of an observer on a psychophysical …
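The constrained maximum-likelihood fit with a lapse parameter can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' software: a logistic shape, a 2AFC guess rate of 0.5, a lapse rate constrained to a small interval, and a brute-force grid search standing in for a proper constrained optimizer; the data are invented.

```python
import numpy as np

# stimulus levels and number correct out of n trials (illustrative data)
x = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
k = np.array([3, 4, 6, 9, 10, 10])
n = 10

def neg_log_lik(alpha, beta, lam):
    """Binomial negative log-likelihood for a logistic psychometric
    function with guess rate 0.5 (2AFC) and lapse rate lam."""
    F = 1.0 / (1.0 + np.exp(-beta * (x - alpha)))
    p = 0.5 + (0.5 - lam) * F          # lapses cap performance below 1
    p = np.clip(p, 1e-9, 1 - 1e-9)
    return -np.sum(k * np.log(p) + (n - k) * np.log(1 - p))

# constrained search: lapse rate restricted to [0, 0.06]
grid = [(a, b, l)
        for a in np.linspace(0.1, 0.6, 26)
        for b in np.linspace(1, 30, 30)
        for l in np.linspace(0, 0.06, 7)]
alpha, beta, lam = min(grid, key=lambda t: neg_log_lik(*t))
```

Setting `lam` to zero and refitting shows the bias the abstract warns about: observed lapses at high stimulus levels get absorbed into distorted slope and threshold estimates.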
A Nonparametric Statistical Comparison of Principal Component and Linear Discriminant Subspaces for Face Recognition
 In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
, 2001
Cited by 57 (9 self)
The FERET evaluation compared recognition rates for different semi-automated and automated face recognition algorithms. We extend FERET by considering when differences in recognition rates are statistically distinguishable subject to changes in test imagery. Nearest Neighbor classifiers using principal component and linear discriminant subspaces are compared using different choices of distance metric. Probability distributions for algorithm recognition rates and pairwise differences in recognition rates are determined using a permutation methodology. The principal component subspace with Mahalanobis distance is the best combination; using L2 is second best. Choice of distance measure for the linear discriminant subspace matters little, and performance is always worse than the principal components classifier using either Mahalanobis or L1 distance. We make the source code for the algorithms, scoring procedures and Monte Carlo study available in the hope that others will extend this comparison to newer algorithms.
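The permutation methodology for deciding whether two recognition rates are statistically distinguishable can be sketched as a paired sign-flipping test. This is a generic sketch, not the paper's exact procedure; the per-probe scores below are invented (1 = correct rank-1 match, 0 = miss):

```python
import numpy as np

def permutation_test(a, b, n_perm=10000, seed=0):
    """Paired permutation test for the difference in recognition
    rates of two algorithms scored on the same probe images.
    Under the null, the two labels are exchangeable per probe;
    returns a two-sided p-value."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a), np.asarray(b)
    observed = abs(float(np.mean(a - b)))
    count = 0
    for _ in range(n_perm):
        swap = rng.random(a.size) < 0.5        # flip labels per probe
        d = np.where(swap, a - b, b - a).mean()
        count += abs(d) >= observed
    return count / n_perm

a = np.array([1, 1, 1, 0, 1, 1, 0, 1, 1, 1])   # algorithm A, rate 0.8
b = np.array([1, 0, 1, 0, 1, 0, 0, 1, 0, 1])   # algorithm B, rate 0.5
p = permutation_test(a, b)
```

With only ten probes the difference is not significant, illustrating the abstract's point: a gap in raw recognition rates need not be statistically distinguishable.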
Notung: A Program for Dating Gene Duplications and Optimizing Gene Family Trees
 Journal of Computational Biology
, 2000
Cited by 44 (3 self)
Large scale gene duplication is a major force driving the evolution of genetic functional innovation.
A Statistical Perspective on Knowledge Discovery in Databases
, 1996
Cited by 41 (0 self)
The quest to find models usefully characterizing data is a process central to the scientific method, and has been carried out on many fronts. Researchers from an expanding number of fields have designed algorithms to discover rules or equations that capture key relationships between variables in a database. The task of this chapter is to provide a perspective on statistical techniques applicable to KDD; accordingly, we review below some major advances in statistics in the last few decades. We next highlight some distinctive features of what may be called a "statistical viewpoint." Finally, we survey some influential classical and modern statistical methods for practical model induction.
The psychometric function: II. Bootstrap-based confidence intervals and sampling, Perception and Psychophysics 63
, 2001
Cited by 41 (12 self)
The psychometric function relates an observer’s performance to an independent variable, usually a physical quantity of an experimental stimulus. Even if a model is successfully fit to the data and its goodness of fit is acceptable, experimenters require an estimate of the variability of the parameters to assess whether differences across conditions are significant. Accurate estimates of variability are difficult to obtain, however, given the typically small size of psychophysical data sets: Traditional statistical techniques are only asymptotically correct and can be shown to be unreliable in some common situations. Here and in our companion paper (Wichmann & Hill, 2001), we suggest alternative statistical techniques based on Monte Carlo resampling methods. The present paper’s principal topic is the estimation of the variability of fitted parameters and derived quantities, such as thresholds and slopes. First, we outline the basic bootstrap procedure and argue in favor of the parametric, as opposed to the nonparametric, bootstrap. Second, we describe how the bootstrap bridging assumption, on which the validity of the procedure depends, can be tested. Third, we show how one’s choice of sampling scheme (the placement of sample points on the stimulus axis) strongly affects the reliability of bootstrap confidence intervals, and we make recommendations on how to sample the psychometric function efficiently. Fourth, we show that, under certain circumstances, the (arbitrary) choice of the distribution function can exert an unwanted influence on …
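The parametric bootstrap the abstract advocates can be sketched in a few lines: simulate many synthetic data sets from the fitted curve, recompute the derived quantity (here a threshold) from each, and read confidence limits off the resulting distribution. Everything below is illustrative (invented fitted values, a crude interpolation-based threshold in place of a full refit):

```python
import numpy as np

rng = np.random.default_rng(1)
levels = np.array([0.1, 0.2, 0.3, 0.4, 0.5])           # stimulus levels
p_fit = np.array([0.55, 0.65, 0.78, 0.90, 0.97])       # fitted curve values
n = 40                                                  # trials per level

def threshold(p):
    """Level at which performance crosses 0.75 (linear interpolation;
    stands in for refitting the full psychometric function)."""
    return float(np.interp(0.75, p, levels))

boot = []
for _ in range(2000):
    # parametric bootstrap: binomial data simulated from the fitted curve
    p_sim = rng.binomial(n, p_fit) / n
    boot.append(threshold(np.maximum.accumulate(p_sim)))  # enforce monotone
lo, hi = np.percentile(boot, [2.5, 97.5])               # 95% CI for threshold
```

The sampling-scheme point in the abstract shows up directly here: moving the `levels` away from the region where the curve crosses 0.75 visibly widens the interval `[lo, hi]`.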
A hybrid micro-macroevolutionary approach to gene tree reconstruction
 J. Comput. Biol
, 2006
Cited by 36 (1 self)
Gene family evolution is determined by microevolutionary processes (e.g., point mutations) and macroevolutionary processes (e.g., gene duplication and loss), yet macroevolutionary considerations are rarely incorporated into gene phylogeny reconstruction methods. We present a dynamic program to find the most parsimonious gene family tree with respect to a macroevolutionary optimization criterion, the weighted sum of the number of gene duplications and losses. The existence of a polynomial-delay algorithm for duplication/loss phylogeny reconstruction stands in contrast to most formulations of phylogeny reconstruction, which are NP-complete. We next extend this result to obtain a two-phase method for gene tree reconstruction that takes both micro- and macroevolution into account. In the first phase, a gene tree is constructed from sequence data, using any of the previously known algorithms for gene phylogeny construction. In the second phase, the tree is refined by rearranging regions of the tree that do not have strong support in the sequence data to minimize the duplication/loss cost. Components of the tree with strong support are left intact. This hybrid approach incorporates both micro- and macroevolutionary considerations, yet its computational requirements are modest in practice because the two-phase approach constrains the search space. Our hybrid algorithm can …
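The duplication component of the duplication/loss cost is classically computed by LCA reconciliation: map each gene-tree node to the lowest common ancestor of its leaves' species, and count a duplication wherever a node maps to the same species node as one of its children. A minimal sketch of that standard counting step (not the paper's dynamic program or rearrangement algorithm; the three-species tree and gene trees are toy inputs):

```python
# Toy species tree ((A,B),C) encoded as parent pointers.
species_parent = {"A": "AB", "B": "AB", "AB": "ABC", "C": "ABC"}

def depth(s):
    d = 0
    while s in species_parent:
        s, d = species_parent[s], d + 1
    return d

def species_lca(u, v):
    du, dv = depth(u), depth(v)
    while du > dv: u, du = species_parent[u], du - 1
    while dv > du: v, dv = species_parent[v], dv - 1
    while u != v:
        u, v = species_parent[u], species_parent[v]
    return u

def count_duplications(tree):
    """Return (LCA mapping of the root, number of duplications) for a
    gene tree given as nested tuples with species-named leaves."""
    if isinstance(tree, str):
        return tree, 0
    (lm, ld), (rm, rd) = (count_duplications(t) for t in tree)
    m = species_lca(lm, rm)
    dup = int(m == lm or m == rm)  # child maps to same species => duplication
    return m, ld + rd + dup

# gene tree ((A,B),A): root maps to AB, same as its left child => 1 duplication
mapping, dups = count_duplications((("A", "B"), "A"))
```

The same mapping also yields the loss count, so a weighted duplication/loss cost of this kind is the objective the paper's second phase minimizes over candidate rearrangements.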
Thresholds from psychometric functions: superiority of bootstrap to incremental and probit variance estimators
 Psychological Bulletin
, 1991
Cited by 30 (5 self)
The bootstrap method provides a powerful, general procedure for estimating the variance of a parameter of a function. The parametric version of the method was used to estimate the standard deviation of a threshold from a psychometric function and the standard deviation of its slope. Bootstrap standard deviations were compared with those obtained by a classical incremental method and by the asymptotic method of probit analysis. Twelve representative experimental conditions were tested in Monte Carlo studies, each of 1,000 data sets. All methods performed equally well with large data sets, but with small data sets the bootstrap was superior in both percentage bias and relative efficiency. There are many occasions in which it is desirable to measure the strength of a stimulus in terms of its response in an organism. Typically, different levels of a known treatment are applied to subjects and the effects of that treatment are recorded at each level. Thus, in psychophysics, one might construct a psychometric function, which describes the relationship between the level of a stimulus and the probability of a subject making a particu…
Tight Bounds on the Learnability of Evolution
, 1999
Cited by 26 (1 self)
Evolution is often modeled as a stochastic process which modifies DNA. Among the most popular such processes are Cavender-Farris (CF) trees, which are represented as edge-weighted trees. The Phylogeny Construction Problem is that of, given k samples drawn from a CF tree, outputting a CF tree which is close to the original. Each CF tree naturally defines a random variable, and the gold standard for reconstructing such trees is the Maximum Likelihood Estimator of this variable. This approach is notoriously computationally expensive. In this paper, we show that a very simple algorithm, which is a variant on one of the most popular algorithms used by practitioners, converges on the true tree at a rate which differs from the optimum by a constant. We do this by analyzing upper and lower bounds for the convergence rate of learning very simple CF trees, and then show that the learnability of each CF tree is sandwiched between two such simpler trees. Our results rely on the fact that, if the …