Results 1-10 of 181
Mean shift: A robust approach toward feature space analysis
In PAMI, 2002
Cited by 1469 (34 self)
A general nonparametric technique is proposed for the analysis of a complex multimodal feature space and to delineate arbitrarily shaped clusters in it. The basic computational module of the technique is an old pattern recognition procedure, the mean shift. We prove for discrete data the convergence of a recursive mean shift procedure to the nearest stationary point of the underlying density function and thus its utility in detecting the modes of the density. The equivalence of the mean shift procedure to the Nadaraya–Watson estimator from kernel regression and the robust M-estimators of location is also established. Algorithms for two low-level vision tasks, discontinuity-preserving smoothing and image segmentation, are described as applications. In these algorithms the only user-set parameter is the resolution of the analysis, and either gray-level or color images are accepted as input. Extensive experimental results illustrate their excellent performance.
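The recursive procedure described in this abstract can be illustrated with a minimal one-dimensional sketch. It assumes a Gaussian kernel and a fixed bandwidth; the function name `mean_shift_mode`, its parameters, and the sample data are hypothetical and are not taken from the paper.

```python
import math

def mean_shift_mode(points, start, bandwidth, tol=1e-6, max_iter=500):
    """Iterate mean shift steps from `start` until the step is smaller than `tol`.

    Assumption: Gaussian kernel with a single fixed bandwidth (1-D data).
    """
    x = float(start)
    for _ in range(max_iter):
        # Kernel weight of every sample relative to the current estimate.
        weights = [math.exp(-0.5 * ((p - x) / bandwidth) ** 2) for p in points]
        total = sum(weights)
        if total == 0.0:
            break  # no sample is close enough to influence the estimate
        # The mean shift step moves x to the kernel-weighted sample mean.
        new_x = sum(w * p for w, p in zip(weights, points)) / total
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x

# Two well-separated clusters, centred at 1.0 and 5.0.
samples = [1.0, 1.1, 0.9, 1.2, 0.8, 5.0, 5.1, 4.9, 5.2, 4.8]
low_mode = mean_shift_mode(samples, start=0.0, bandwidth=0.5)
high_mode = mean_shift_mode(samples, start=6.0, bandwidth=0.5)
```

Starting points on either side of the data settle on different modes, which is the behavior that makes the procedure usable for clustering: points whose trajectories end at the same mode are assigned to the same cluster.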
Correlation-based feature selection for discrete and numeric class machine learning
2000
Cited by 145 (1 self)
Algorithms for feature selection fall into two broad categories: wrappers use the learning algorithm itself to evaluate the usefulness of features, while filters evaluate features according to heuristics based on general characteristics of the data. For application to large databases, filters have proven to be more practical than wrappers because they are much faster. However, most existing filter algorithms only work with discrete classification problems. This paper describes a fast, correlation-based filter algorithm that can be applied to continuous and discrete problems. Experiments using the new method as a preprocessing step for naive Bayes, instance-based learning, decision trees, locally weighted regression, and model trees show it to be an effective feature selector: it reduces the data in dimensionality by more than sixty percent in most cases without negatively affecting accuracy. Also, decision and model trees built from the preprocessed data are often significantly smaller.
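The abstract does not state how a filter scores a feature subset, so the sketch below is an assumption rather than the paper's method. It uses a commonly cited correlation-based merit heuristic, k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-to-target correlation and r_ff is the mean feature-to-feature correlation. Pearson correlation on numeric data stands in for whatever correlation measure the paper uses, and the names `pearson` and `subset_merit` are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def subset_merit(features, target):
    """Score a feature subset: reward correlation with the target, penalize redundancy."""
    k = len(features)
    # Mean absolute feature-to-target correlation.
    r_cf = sum(abs(pearson(f, target)) for f in features) / k
    if k == 1:
        return r_cf
    # Mean absolute correlation over all distinct feature pairs.
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = sum(abs(pearson(features[i], features[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

target = [1, 2, 3, 4, 5, 6]
informative = [2, 4, 6, 8, 10, 12]   # perfectly correlated with the target
noise = [1, -1, 1, -1, 1, -1]        # weakly correlated with the target

merit_single = subset_merit([informative], target)
merit_duplicate = subset_merit([informative, informative], target)
merit_with_noise = subset_merit([informative, noise], target)
```

Under this heuristic, adding an exact duplicate of a feature leaves the merit unchanged and adding a weakly relevant feature lowers it, which is why a filter of this kind can discard many features without consulting the learning algorithm.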
The Variable Bandwidth Mean Shift and Data-Driven Scale Selection
In Proc. 8th Intl. Conf. on Computer Vision, 2001
Cited by 98 (9 self)
We present two solutions for the scale selection problem in computer vision. The first one is completely nonparametric and is based on the adaptive estimation of the normalized density gradient. Employing the sample point estimator, we define the Variable Bandwidth Mean Shift, prove its convergence, and show its superiority over the fixed bandwidth procedure. The second technique has a semiparametric nature and imposes a local structure on the data to extract reliable scale information. The local scale of the underlying density is taken as the bandwidth which maximizes the magnitude of the normalized mean shift vector. Both estimators provide practical tools for autonomous image and quasi real-time video analysis and several examples are shown to illustrate their effectiveness.
1 Motivation for Variable Bandwidth
The efficacy of Mean Shift analysis has been demonstrated in computer vision problems such as tracking and segmentation in [5, 6]. However, one of the limitations of the mean shift procedure as defined in these papers is that it involves the specification of a scale parameter. While the results obtained appear satisfactory, when the local characteristics of the feature space differ significantly across data, it is difficult to find an optimal global bandwidth for the mean shift procedure. In this paper we address the issue of locally adapting the bandwidth. We also study an alternative approach for data-driven scale selection which imposes a local structure on the data. The proposed solutions are tested in the framework of quasi real-time video analysis. We review first the intrinsic limitations of the fixed bandwidth density estimation methods. Then, two of the most popular variable bandwidth estimators, the balloon and the sample point, are introduced and...
Estimation of genetic networks and functional structures between genes by using Bayesian networks and nonparametric regression
Pacific Symposium on Biocomputing, 2002
Cited by 98 (25 self)
We propose a new method for constructing a genetic network from gene expression data by using Bayesian networks. We use nonparametric regression for capturing nonlinear relationships between genes and derive a new criterion for choosing the network in general situations. In a theoretical sense, our proposed theory and methodology include previous methods based on the Bayes approach. We applied the proposed method to the S. cerevisiae cell cycle data and showed the effectiveness of our method by comparing it with previous methods.
An algorithm for data-driven bandwidth selection
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003
Cited by 79 (7 self)
The analysis of a feature space that exhibits multiscale patterns often requires kernel estimation techniques with locally adaptive bandwidths, such as the variable-bandwidth mean shift. Proper selection of the kernel bandwidth is, however, a critical step for superior space analysis and partitioning. This paper presents a mean shift-based approach for local bandwidth selection in the multimodal, multivariate case. Our method is based on a fundamental property of normal distributions regarding the bias of the normalized density gradient. We demonstrate that, within the large sample approximation, the local covariance is estimated by the matrix that maximizes the magnitude of the normalized mean shift vector. Using this property, we develop a reliable algorithm which takes into account the stability of local bandwidth estimates across scales. The validity of our theoretical results is proven in various space partitioning experiments involving the variable-bandwidth mean shift.
Index Terms: Variable-bandwidth mean shift, bandwidth selection, multiscale analysis, Jensen-Shannon divergence, feature space.
The Finite Moment Log Stable Process and Option Pricing
2002
Cited by 51 (9 self)
We document a surprising pattern in market prices of S&P 500 index options. When implied volatilities are graphed against a standard measure of moneyness, the implied volatility smirk does not flatten out as maturity increases up to the observable horizon of two years. This behavior contrasts sharply with the implications of many pricing models and with the asymptotic behavior implied by the central limit theorem (CLT). We develop a parsimonious model which deliberately violates the CLT assumptions and thus captures the observed behavior of the volatility smirk over the maturity horizon. Calibration exercises demonstrate its superior performance against several widely used alternatives.
Robust forecasting of mortality and fertility rates: a functional data approach
Department of Econometrics and Business Statistics working paper, 2005
Cited by 42 (14 self)
A new method is proposed for forecasting age-specific mortality and fertility rates observed over time. This approach allows for smooth functions of age, is robust for outlying years due to wars and epidemics, and provides a modelling framework that is easily adapted to allow for constraints and other information. Ideas from functional data analysis, nonparametric smoothing and robust statistics are combined to form a methodology that is widely applicable to any functional time series data observed discretely and possibly with error. The model is a generalization of the Lee-Carter model commonly used in mortality and fertility forecasting. The methodology is applied to French mortality data and Australian fertility data, and the forecasts obtained are shown to be superior to those from the Lee-Carter method and several of its variants.
Adaptive Regression by Mixing
Journal of the American Statistical Association
Cited by 39 (7 self)
Adaptation over different procedures is of practical importance. Different procedures perform well under different conditions. In many practical situations, it is rather hard to assess which conditions are (approximately) satisfied so as to identify the best procedure for the data at hand. Thus automatic adaptation over various scenarios is desirable. A practically feasible method, named Adaptive Regression by Mixing (ARM), is proposed to convexly combine general candidate regression procedures. Under mild conditions, the resulting estimator is theoretically shown to perform optimally in rates of convergence without knowing which of the original procedures works best. Simulations are conducted in several settings, including comparing a parametric model with nonparametric alternatives, comparing a neural network with a projection pursuit in multidimensional regression, and combining bandwidths in kernel regression. The results clearly support the theoretical property of ARM. The ARM ...
Cross-Validation and the Estimation of Conditional Probability Densities
Journal of the American Statistical Association, 2004
Cited by 37 (3 self)
Many practical problems, especially some connected with forecasting, require nonparametric estimation of conditional densities from mixed data. For example, given an explanatory data vector X for a prospective customer, with components that could include the customer’s salary, occupation, age, sex, marital status and address, a company might wish to estimate the density of the expenditure, Y, that could be made by that person, basing the inference on observations of (X, Y) for previous clients. Choosing appropriate smoothing parameters for this problem can be tricky, not least because plug-in rules take a particularly complex form in the case of mixed data. An obvious difficulty is that there exists no general formula for the optimal smoothing parameters. More insidiously, and more seriously, it can be difficult to determine which components of X are relevant to the problem of conditional inference. For example, if the jth component of X is independent of Y then that component is irrelevant to estimating the density of Y given X, and ideally should be dropped before conducting inference. In this paper we show that cross-validation overcomes these difficulties. It automatically determines which components are relevant and which are not, through assigning large smoothing parameters to the latter and consequently shrinking them towards the uniform distribution on the respective marginals. This effectively removes irrelevant components from contention, by suppressing their contribution to estimator variance; they already have very small bias, a consequence of their independence of Y. Cross-validation also gives us important information about which components are relevant: the relevant components are precisely those which cross-validation has chosen to smooth in a traditional way, by assigning them smoothing parameters of conventional size.
Indeed, cross-validation produces asymptotically optimal smoothing for relevant components, while eliminating irrelevant components by oversmoothing. In the problem of nonparametric estimation of a conditional density, cross-validation comes into its own as a method with no obvious peers.