Improved heterogeneous distance functions
Journal of Artificial Intelligence Research, 1997
Cited by 199 (10 self)
Instance-based learning techniques typically handle continuous and linear input values well, but often do not handle nominal input attributes appropriately. The Value Difference Metric (VDM) was designed to find reasonable distance values between nominal attribute values, but it largely ignores continuous attributes, requiring discretization to map continuous values into nominal values. This paper proposes three new heterogeneous distance functions, called the Heterogeneous Value Difference Metric (HVDM), the Interpolated Value Difference Metric (IVDM), and the Windowed Value Difference Metric (WVDM). These new distance functions are designed to handle applications with nominal attributes, continuous attributes, or both. In experiments on 48 applications the new distance metrics achieve higher classification accuracy on average than three previous distance functions on those datasets that have both nominal and continuous attributes.
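The abstract names HVDM without giving its formula. The sketch below follows the definition as HVDM is commonly stated, which is an assumption to check against the paper: each continuous attribute contributes |x_a − y_a| / (4σ_a), each nominal attribute contributes a normalized VDM computed from the class-conditional frequencies P(c | a = v), an unknown value contributes 1, and the per-attribute terms are combined as the square root of their sum of squares. The names `fit_hvdm` and `hvdm` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit_hvdm(rows, labels, nominal):
    """Collect the statistics HVDM needs from training data.

    rows: list of tuples, with None marking an unknown value.
    labels: class label of each row.
    nominal: set of attribute indices treated as nominal.
    """
    classes = sorted(set(labels))
    stats = {}
    for a in range(len(rows[0])):
        if a in nominal:
            # value -> Counter(class -> rows having that value and class)
            counts = defaultdict(Counter)
            for r, c in zip(rows, labels):
                if r[a] is not None:
                    counts[r[a]][c] += 1
            stats[a] = ("nominal", dict(counts))
        else:
            col = [r[a] for r in rows if r[a] is not None]
            mean = sum(col) / len(col)
            sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
            stats[a] = ("linear", sd)
    return stats, classes


def _class_probs(counter, classes):
    """Estimate P(class | attribute value); all zeros for an unseen value."""
    n = sum(counter.values())
    return [counter[c] / n if n else 0.0 for c in classes]


def hvdm(x, y, stats, classes):
    """Heterogeneous distance between instances x and y (sketch)."""
    total = 0.0
    for a, (kind, s) in stats.items():
        if x[a] is None or y[a] is None:
            d = 1.0  # unknown values are treated as maximally distant
        elif kind == "linear":
            # continuous attribute: difference scaled by four standard deviations
            d = abs(x[a] - y[a]) / (4 * s) if s > 0 else 0.0
        else:
            # nominal attribute: normalized VDM over class-conditional probabilities
            px = _class_probs(s.get(x[a], Counter()), classes)
            py = _class_probs(s.get(y[a], Counter()), classes)
            d = math.sqrt(sum((p - q) ** 2 for p, q in zip(px, py)))
        total += d * d
    return math.sqrt(total)


rows = [(1.0, "a"), (3.0, "a"), (5.0, "b"), (7.0, "b")]
stats, classes = fit_hvdm(rows, [0, 0, 1, 1], nominal={1})
print(hvdm(rows[0], rows[2], stats, classes))  # sqrt(0.2 + 2.0), about 1.483
```

The usual rationale for the 4σ scaling is that most continuous differences then fall roughly between 0 and 1, comparable to the nominal term, so neither attribute type dominates the sum.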
The unicorn, the normal curve, and other improbable creatures
Psychological Bulletin, 1989
Cited by 34 (0 self)
An investigation of the distributional characteristics of 440 large-sample achievement and psychometric measures found all to be significantly nonnormal at the α = .01 significance level. Several classes of contamination were found, including tail weights from the uniform to the double exponential, exponential-level asymmetry, severe digit preferences, multimodalities, and modes external to the mean/median interval. Thus, the underlying tenets of normality-assuming statistics appear fallacious for these commonly used types of data. However, findings here also fail to support the types of distributions used in most prior robustness research suggesting the failure of such statistics under nonnormal conditions. A reevaluation of the statistical robustness literature appears appropriate in light of these findings.
During recent years a considerable literature devoted to robust statistics has appeared. This research reflects a growing concern among statisticians regarding the robustness, or insensitivity, of parametric statistics to violations of their underlying assumptions. Recent findings suggest that the most commonly used of these statistics exhibit varying degrees of nonrobustness to certain violations of the normality assumption. Although the importance of such findings is underscored by numerous empirical studies documenting nonnormality in a variety of fields, a startling lack of such evidence exists for achievement...
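Tail weight and asymmetry of the kind described above are often summarized by sample skewness and excess kurtosis. The sketch below is not Micceri's classification procedure; it only computes those two moment statistics, for which familiar reference values are 0 for the normal distribution, an excess kurtosis of −1.2 for the uniform, and +3 for the double exponential (Laplace). The function name `moment_shape` is hypothetical.

```python
def moment_shape(xs):
    """Return (skewness, excess kurtosis) computed from population moments.

    Requires at least two distinct values, since the variance must be nonzero.
    """
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


# Negative excess kurtosis indicates lighter tails than the normal,
# positive excess kurtosis heavier tails.
print(moment_shape([1, 2, 3, 4, 5]))  # skewness 0, excess kurtosis about -1.3
```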
Component selection and smoothing in multivariate nonparametric regression
Cited by 34 (0 self)
We propose a new method for model selection and model fitting in multivariate nonparametric regression models, in the framework of smoothing spline ANOVA. The “COSSO” is a method of regularization with the penalty functional being the sum of component norms, instead of the squared norm employed in the traditional smoothing spline method. The COSSO provides a unified framework for several recent proposals for model selection in linear models and smoothing spline ANOVA models. Theoretical properties, such as the existence and the rate of convergence of the COSSO estimator, are studied. In the special case of a tensor product design with periodic functions, a detailed analysis reveals that the COSSO does model selection by applying a novel soft thresholding type operation to the function components. We give an equivalent formulation of the COSSO estimator which leads naturally to an iterative algorithm. We compare the COSSO with MARS, a popular method that builds functional ANOVA models, in simulations and real examples. The COSSO method can be extended to classification problems and we compare its performance with those of a number of machine learning algorithms on real datasets. The COSSO gives very competitive performance in these studies.
1. Introduction. Consider...
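The key contrast in the abstract is between penalizing the sum of component norms and penalizing a squared norm. In an idealized orthonormal setting (an assumption made for this sketch; it is not the COSSO estimator or its algorithm), minimizing ½‖θ_j − z_j‖² + λ‖θ_j‖ separately for each component j has the closed-form group soft-thresholding solution below, which sets weak components exactly to zero. A squared-norm penalty λ‖θ_j‖² instead gives z_j / (1 + 2λ), which shrinks every component but never removes one. Both function names are hypothetical.

```python
import math


def group_soft_threshold(components, lam):
    """Proximal step for lam * sum_j ||theta_j||: shrink each vector's norm by lam.

    Components with norm <= lam are set exactly to zero (dropped from the
    model); the others keep their direction with norm reduced by lam.
    """
    out = []
    for z in components:
        norm = math.sqrt(sum(t * t for t in z))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append([scale * t for t in z])
    return out


def squared_norm_shrink(components, lam):
    """Proximal step for lam * sum_j ||theta_j||^2: uniform shrinkage, never zero."""
    return [[t / (1.0 + 2.0 * lam) for t in z] for z in components]


comps = [[3.0, 4.0], [0.3, 0.4]]          # component norms 5.0 and 0.5
print(group_soft_threshold(comps, 1.0))   # first shrunk to norm 4, second zeroed
print(squared_norm_shrink(comps, 1.0))    # both scaled by 1/3, none zeroed
```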
Information-theoretic image formation
IEEE Transactions on Information Theory, 1998
Cited by 28 (5 self)
The emergent role of information theory in image formation is surveyed. Unlike the subject of information-theoretic communication theory, information-theoretic imaging is far from a mature subject. The possible role of information theory in problems of image formation is to provide a rigorous framework for defining the imaging problem, for defining measures of optimality used to form estimates of images, for addressing issues associated with the development of algorithms based on these optimality criteria, and for quantifying the quality of the approximations. The definition of the imaging problem consists of an appropriate model for the data and an appropriate model for the reproduction space, which is the space within which image estimates take values. Each problem statement has an associated optimality criterion that measures the overall quality of an estimate. The optimality criteria include maximizing the likelihood function and minimizing mean squared error for stochastic problems, and minimizing squared error and discrimination for deterministic problems. The development of algorithms is closely tied to the definition of the imaging problem and the associated optimality criterion. Algorithms with a strong information-theoretic motivation are obtained by the method of expectation maximization. Related alternating minimization algorithms are discussed. In quantifying the quality of approximations, global and local measures are discussed. Global measures include the (mean) squared error and discrimination between an estimate and the truth, and probability of error for recognition or hypothesis testing problems. Local measures include Fisher information.
Index Terms: image analysis, image formation, image processing, image reconstruction, image restoration, imaging, inverse problems, maximum-likelihood estimation, pattern recognition.
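Expectation maximization is named as the source of algorithms with a strong information-theoretic motivation. One standard instance, assumed here rather than taken from the survey, is the multiplicative EM update for Poisson data y ~ Poisson(Hλ), often called Richardson–Lucy or Shepp–Vardi; no iteration decreases the Poisson log-likelihood. The name `em_poisson` and the small system matrix are hypothetical.

```python
def em_poisson(H, y, iters=200):
    """Multiplicative EM updates for y_i ~ Poisson((H lam)_i) with lam >= 0.

    H: nonnegative m x n system matrix as a list of rows; y: m observed counts.
    Update: lam_j <- lam_j / s_j * sum_i H[i][j] * y_i / (H lam)_i,
    where s_j = sum_i H[i][j]. A positive start keeps every iterate nonnegative.
    """
    m, n = len(H), len(H[0])
    s = [sum(H[i][j] for i in range(m)) for j in range(n)]  # column sums
    lam = [1.0] * n  # flat positive starting image
    for _ in range(iters):
        pred = [sum(H[i][j] * lam[j] for j in range(n)) for i in range(m)]
        ratio = [y[i] / pred[i] if pred[i] > 0 else 0.0 for i in range(m)]
        back = [sum(H[i][j] * ratio[i] for i in range(m)) for j in range(n)]
        lam = [lam[j] * back[j] / s[j] if s[j] > 0 else 0.0 for j in range(n)]
    return lam


# Hypothetical 2-pixel blur: each measurement sees one pixel fully, the other at half weight.
H = [[1.0, 0.5], [0.5, 1.0]]
print(em_poisson(H, [4.0, 5.0], iters=500))  # approaches [2.0, 4.0], which reproduces y exactly
```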
Component Selection and Smoothing in Smoothing Spline Analysis of Variance Models
Institute of Statistics Mimeo Series 2556, NCSU, 2003
Cited by 27 (9 self)
We propose a new method for model selection and model fitting in nonparametric regression models, in the framework of smoothing spline ANOVA. The "COSSO" is a method of regularization with the penalty functional being the sum of component norms, instead of the squared norm employed in the traditional smoothing spline method. The COSSO provides a unified framework for several recent proposals for model selection in linear models and smoothing spline ANOVA models. Theoretical properties, such as the existence and the rate of convergence of the COSSO estimator, are studied. In the special case of a tensor product design with periodic functions, a detailed analysis reveals that the COSSO applies a novel soft thresholding type operation to the function components and selects the correct model structure with probability tending to one. We give...
Multiscale Poisson intensity and density estimation
IEEE Transactions on Information Theory, 2005
Cited by 25 (12 self)
The nonparametric Poisson intensity and density estimation methods studied in this paper offer near minimax convergence rates for broad classes of densities and intensities with arbitrary levels of smoothness. The methods and theory presented here share many of the desirable features associated with wavelet-based estimators: computational speed, spatial adaptivity, and the capability of detecting discontinuities and singularities with high resolution. Unlike traditional wavelet-based approaches, which impose an upper bound on the degree of smoothness to which they can adapt, the estimators studied here guarantee nonnegativity and do not require any a priori knowledge of the underlying signal’s smoothness to guarantee near-optimal performance. At the heart of these methods lie multiscale decompositions based on free-knot, free-degree piecewise-polynomial functions and penalized likelihood estimation. The degrees as well as the locations of the polynomial pieces can be adapted to the observed data, resulting in near minimax optimal convergence rates. For piecewise analytic signals, in particular, the error of this estimator converges at nearly the parametric rate. These methods can be further refined in two dimensions, and it is demonstrated that platelet-based estimators in two dimensions exhibit similar near-optimal error convergence rates for images consisting of smooth surfaces separated by smooth boundaries.
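The estimators above combine multiscale partitions with penalized likelihood. A much simplified sketch of that idea, under assumptions that are not the paper's: piecewise-constant (degree 0) fits instead of free-degree polynomials, breakpoints restricted to a recursive dyadic partition, and a fixed penalty `gamma` per piece. Bottom-up pruning then finds the dyadic partition maximizing the Poisson log-likelihood minus the penalty exactly, because each interval is either kept whole or replaced by the best fits of its two halves. The name `rdp_estimate` is hypothetical.

```python
import math


def _poisson_loglik(c, length):
    """Maximized Poisson log-likelihood (log(y!) terms dropped) of a constant
    intensity over `length` bins holding `c` counts in total."""
    # The MLE intensity per bin is c / length, giving c*log(c/length) - c.
    return c * math.log(c / length) - c if c > 0 else 0.0


def rdp_estimate(counts, gamma):
    """Penalized-likelihood piecewise-constant intensity on a dyadic partition.

    counts: bin counts; their number must be a power of two.
    gamma: penalty charged per piece of the partition.
    Returns the per-bin intensity estimate.
    """
    n = len(counts)
    if n == 0 or n & (n - 1):
        raise ValueError("number of bins must be a power of two")

    def best(lo, hi):
        """Best penalized score and per-bin estimate for bins lo..hi-1."""
        c = sum(counts[lo:hi])
        length = hi - lo
        keep_score = _poisson_loglik(c, length) - gamma
        keep_est = [c / length] * length
        if length == 1:
            return keep_score, keep_est
        mid = (lo + hi) // 2
        left_score, left_est = best(lo, mid)
        right_score, right_est = best(mid, hi)
        if left_score + right_score > keep_score:
            return left_score + right_score, left_est + right_est
        return keep_score, keep_est

    return best(0, n)[1]


counts = [10, 10, 10, 10, 0, 0, 0, 0]      # hypothetical binned event counts
print(rdp_estimate(counts, gamma=1.0))     # keeps the jump between the halves
print(rdp_estimate(counts, gamma=1000.0))  # heavy penalty: one constant piece
```

Raising `gamma` favors fewer, longer pieces; lowering it lets the partition follow finer structure in the counts.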
How to Fit a Response Time Distribution
Cited by 24 (0 self)
Among the most valuable tools in behavioral science is statistically fitting mathematical models of cognition to data, response time distributions in particular. However, techniques for fitting distributions vary widely and little is known about the efficacy of different techniques. In this article, we assessed several fitting techniques by simulating six widely cited models of response time and using the fitting procedures to recover model parameters. The techniques include the maximization of likelihood and least-squares fits of the theoretical distributions to different empirical estimates of the simulated distributions. A running example was used to illustrate the different estimation and fitting procedures. The simulation studies revealed that empirical density estimates are biased even for very large sample sizes. Some fitting techniques yielded more accurate and less variable parameter estimates than others. Methods that involved least-squares fits to density estimates generally yielded very poor parameter estimates.
The importance of considering the entire response time (RT) distribution in testing formal models of cognition is now widely appreciated. Fitting a model to mean RT alone can mask important details of the data that examination of the entire distribution would reveal, such as the behavior of fast and slow responses across the conditions of an experiment (e.g., Heathcote, Popiel & Mewhort, 1991), the extent of facilitation between perceptual channels (Miller, 1982), and the effects of practice on RT quantiles (Logan, 1992). Techniques for testing hypotheses based on the RT distribution have been developed (Townsend, 1990). In addition, the RT distribution provides an important meeting ground between theory and da...
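As a small illustration of maximum-likelihood fitting, assume a Wald (inverse Gaussian) model for RTs. This is a choice made for the sketch; the abstract does not list the six models the authors simulated. For the Wald density the ML estimates have a closed form, μ̂ = mean(x) and 1/λ̂ = mean(1/x_i − 1/μ̂), so no numerical optimizer is needed. The function names and the sample RTs are hypothetical.

```python
import math


def wald_loglik(xs, mu, lam):
    """Log-likelihood of the Wald (inverse Gaussian) density for RTs xs > 0."""
    return sum(
        0.5 * math.log(lam / (2.0 * math.pi * x ** 3))
        - lam * (x - mu) ** 2 / (2.0 * mu ** 2 * x)
        for x in xs
    )


def fit_wald_ml(xs):
    """Closed-form maximum-likelihood estimates (mu, lam).

    Requires positive RTs that are not all identical (otherwise 1/lam is 0).
    """
    n = len(xs)
    mu = sum(xs) / n
    inv_lam = sum(1.0 / x - 1.0 / mu for x in xs) / n
    return mu, 1.0 / inv_lam


rts = [0.42, 0.51, 0.48, 0.65, 0.90, 0.55]  # hypothetical RTs in seconds
mu_hat, lam_hat = fit_wald_ml(rts)
print(mu_hat, lam_hat, wald_loglik(rts, mu_hat, lam_hat))
```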
A Practical Algorithm For General Large Scale Nonlinear Optimization Problems
SIAM Journal on Optimization, 1994
Cited by 22 (10 self)
We provide an effective and efficient implementation of a sequential quadratic programming (SQP) algorithm for the general large-scale nonlinear programming problem. In this algorithm the quadratic programming subproblems are solved by an interior point method that can be prematurely halted by a trust region constraint. Numerous computational enhancements to improve the numerical performance are presented. These include a dynamic procedure for adjusting the merit function parameter and procedures for adjusting the trust region radius. Numerical results and comparisons are presented.
Key words: nonlinear programming, interior point, SQP, merit function, trust region, large scale
1. Introduction. In a series of recent papers, [3], [6], and [8], the authors have developed a new algorithmic approach for solving large, nonlinear, constrained optimization problems. This proposed procedure is, in essence, a sequential quadratic programming (SQP) method that uses an interior point algorithm...
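The abstract presupposes the basic SQP step. As a hedged sketch of only that core, for a single equality constraint: each iteration solves the KKT system of a quadratic model of the Lagrangian and takes the full step. The trust region, the interior-point QP solver, and the merit-function updates that the paper contributes are all omitted, and the function names and test problem are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def sqp_equality(grad_f, hess_f, c, grad_c, hess_c, x, lam, iters=20):
    """Basic SQP (Lagrange-Newton) iteration for min f(x) s.t. c(x) = 0.

    Each step solves the KKT system of the quadratic subproblem
        min_p  g.p + 0.5 p.H p   s.t.  c(x) + a.p = 0,
    where H is the Hessian of the Lagrangian f + lam * c.
    No trust region, merit function, or interior-point QP solver is used.
    """
    n = len(x)
    for _ in range(iters):
        g, a = grad_f(x), grad_c(x)
        Hf, Hc = hess_f(x), hess_c(x)
        H = [[Hf[i][j] + lam * Hc[i][j] for j in range(n)] for i in range(n)]
        # Bordered KKT matrix [[H, a], [a^T, 0]]
        K = [H[i] + [a[i]] for i in range(n)] + [list(a) + [0.0]]
        rhs = [-gi for gi in g] + [-c(x)]
        sol = solve_linear(K, rhs)
        x = [xi + pi for xi, pi in zip(x, sol[:n])]
        lam = sol[n]  # the subproblem's multiplier becomes the new estimate
    return x, lam


# Hypothetical test problem: minimize x0 + x1 on the circle x0^2 + x1^2 = 2.
x_opt, lam_opt = sqp_equality(
    grad_f=lambda x: [1.0, 1.0],
    hess_f=lambda x: [[0.0, 0.0], [0.0, 0.0]],
    c=lambda x: x[0] ** 2 + x[1] ** 2 - 2.0,
    grad_c=lambda x: [2.0 * x[0], 2.0 * x[1]],
    hess_c=lambda x: [[2.0, 0.0], [0.0, 2.0]],
    x=[-1.2, -0.8],
    lam=1.0,
)
print(x_opt, lam_opt)  # converges to about [-1.0, -1.0] and 0.5
```

Without a trust region or merit function, this pure Newton form converges quickly from a good start but may be drawn to another stationary point (here the maximizer (1, 1)) from a poor one; safeguards of the kind the paper describes are aimed at such behavior.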
Asymptotic Performance Bounds for the Kernel Estimate
1988
Cited by 13 (10 self)
We consider an arbitrary sequence of kernel density estimates $f_n$ with kernels $K_n$ possibly depending upon $n$. Under a mild restriction on the sequence $K_n$, we obtain inequalities of the type $E\int |f_n - f| \ge (1 + o(1))\,\Psi(n, f)$, where $f$ is the density being estimated and $\Psi(n, f)$ is a function of $n$ and $f$ only. The function $\Psi$ can be considered as an indicator of the difficulty of estimating $f$ with any kernel estimate.
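To make the error criterion concrete, the sketch below builds a Gaussian-kernel estimate f_n from n draws of a standard normal f and approximates its L1 error, the integral of |f_n − f|, by a midpoint sum. The kernel, the rule-of-thumb bandwidth h = 1.06 n^(-1/5) (so the effective kernel changes with n), the integration range, and all function names are assumptions for illustration; the paper's lower bound itself is not computed.

```python
import math
import random


def gaussian_kde(sample, h):
    """Kernel estimate f_n(x) = (1/(n h)) * sum_i K((x - X_i) / h), Gaussian K."""
    n = len(sample)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def f_n(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)

    return f_n


def l1_distance(g, f, lo=-6.0, hi=6.0, steps=1200):
    """Approximate the integral of |g - f| over [lo, hi] with the midpoint rule."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        total += abs(g(x) - f(x))
    return total * dx


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def kde_l1_error(n, seed=0):
    """L1 error of a Gaussian-kernel estimate built from n standard normal draws."""
    rng = random.Random(seed)
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    h = 1.06 * n ** (-0.2)  # rule-of-thumb bandwidth, assuming unit variance
    return l1_distance(gaussian_kde(sample, h), std_normal_pdf)


for n in (20, 200, 2000):
    print(n, kde_l1_error(n))
```

The L1 error always lies between 0 and 2 and typically shrinks as n grows; the paper's result bounds from below how small its expected value can be for any kernel sequence satisfying the stated restriction.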