Results 11–20 of 541
Integrating structured biological data by kernel maximum mean discrepancy
 In ISMB
, 2006
"... Motivation: Many problems in data integration in bioinformatics can be posed as one common question: Are two sets of observations generated by the same distribution? We propose a kernelbased statistical test for this problem, based on the fact that two distributions are different if and only if the ..."
Abstract

Cited by 52 (14 self)
Motivation: Many problems in data integration in bioinformatics can be posed as one common question: Are two sets of observations generated by the same distribution? We propose a kernel-based statistical test for this problem, based on the fact that two distributions are different if and only if there exists at least one function having different expectation on the two distributions. Consequently we use the maximum discrepancy between function means as the basis of a test statistic. The Maximum Mean Discrepancy (MMD) can take advantage of the kernel trick, which allows us to apply it not only to vectors, but to strings, sequences, graphs, and other common structured data types arising in molecular biology. Results: We study the practical feasibility of an MMD-based test on three central data integration tasks: testing cross-platform comparability of microarray data, cancer diagnosis, and data-content-based schema matching for two different protein function classification schemas. In all of these experiments, including high-dimensional ones, MMD is very accurate in finding samples that were generated from the same distribution, and outperforms its best competitors. Conclusions: We have defined a novel statistical test of whether two samples are from the same distribution, compatible with both multivariate and structured data, that is fast, easy to implement, and works well, as confirmed by our experiments.
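As a rough illustration of the statistic itself (a sketch, not the authors' full test, which additionally compares the statistic against a threshold derived from the null distribution), the biased empirical MMD^2 with a Gaussian RBF kernel takes a few lines of NumPy; the bandwidth sigma is a placeholder choice:

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # Gaussian RBF kernel matrix: k(a, b) = exp(-||a - b||^2 / (2 sigma^2)).
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2_biased(X, Y, sigma=1.0):
    # Biased empirical MMD^2: mean k(x,x') + mean k(y,y') - 2 mean k(x,y).
    Kxx = rbf_kernel(X, X, sigma)
    Kyy = rbf_kernel(Y, Y, sigma)
    Kxy = rbf_kernel(X, Y, sigma)
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

# Example: two samples from the same distribution give an MMD^2 near zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = rng.normal(size=(200, 5))
print(mmd2_biased(X, Y))
```

Because the kernel is the only ingredient that touches the data, swapping in a string or graph kernel extends the same statistic to the structured data types mentioned above.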
Optimal Pricing Mechanisms with Unknown Demand
 American Economic Review
, 2003
"... The standard profitmaximizing multiunit auction intersects the submitted demand curve with a preset reservation supply curve, which is determined using the distribution from which the buyers ’ valuations are drawn. However, when this distribution is unknown, a preset supply curve cannot maximize ..."
Abstract

Cited by 48 (0 self)
The standard profit-maximizing multi-unit auction intersects the submitted demand curve with a preset reservation supply curve, which is determined using the distribution from which the buyers' valuations are drawn. However, when this distribution is unknown, a preset supply curve cannot maximize monopoly profits. The optimal pricing mechanism in this situation sets a price to each buyer on the basis of the demand distribution inferred statistically from other buyers' bids. The resulting profit converges to the optimal monopoly profit with known demand as the number of buyers goes to infinity, and convergence can be substantially faster than with sequential price experimentation.
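A stylized sketch of the leave-one-out idea, under the simplifying assumptions of unit demands and unlimited supply (the paper's mechanism differs in detail): each buyer is offered the revenue-maximizing price computed from the empirical distribution of the other buyers' bids.

```python
import numpy as np

def leave_one_out_prices(bids):
    """For each buyer i, the price maximizing p * P(bid >= p) under the
    empirical distribution of the other buyers' bids (unit demand,
    unlimited supply; an illustrative simplification)."""
    bids = np.asarray(bids, dtype=float)
    n = len(bids)
    prices = np.empty(n)
    for i in range(n):
        others = np.delete(bids, i)
        # Candidate prices are the other buyers' bids themselves.
        revenue = [p * np.mean(others >= p) for p in others]
        prices[i] = others[int(np.argmax(revenue))]
    return prices

bids = np.random.default_rng(1).uniform(0, 1, size=50)
print(leave_one_out_prices(bids)[:5])
```

Because buyer i's own bid never enters the price offered to i, truthful bidding remains incentive compatible in this stylized version.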
Prediction in dynamic models with time-dependent conditional heteroskedasticity, Working paper no
, 1990
"... This paper considers forecasting the conditional mean and variance from a singleequation dynamic model with autocorrelated disturbances following an ARMA process, and innovations with timedependent conditional heteroskedasticity as represented by a linear GARCH process. Expressions for the minimum ..."
Abstract

Cited by 43 (7 self)
This paper considers forecasting the conditional mean and variance from a single-equation dynamic model with autocorrelated disturbances following an ARMA process, and innovations with time-dependent conditional heteroskedasticity as represented by a linear GARCH process. Expressions for the minimum MSE predictor and the conditional MSE are presented. We also derive the formula for all the theoretical moments of the prediction error distribution from a general dynamic model with GARCH(1, 1) innovations. These results are then used in the construction of ex ante prediction confidence intervals by means of the Cornish-Fisher asymptotic expansion. An empirical example relating to the uncertainty of the expected depreciation of foreign exchange rates illustrates the usefulness of the results.
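For the GARCH(1, 1) case, the h-step-ahead conditional variance forecast follows the standard recursion E[sigma^2_{t+h}] = omega + (alpha + beta) E[sigma^2_{t+h-1}], which the sketch below implements (parameter values are placeholders, not estimates from the paper):

```python
def garch11_variance_forecast(omega, alpha, beta, sigma2_t, eps2_t, horizon):
    # One-step forecast uses the last observed squared shock and variance;
    # later steps iterate E[s2_{t+h}] = omega + (alpha + beta) * E[s2_{t+h-1}].
    forecasts = []
    s2 = omega + alpha * eps2_t + beta * sigma2_t
    for _ in range(horizon):
        forecasts.append(s2)
        s2 = omega + (alpha + beta) * s2
    return forecasts

# Forecasts converge to the unconditional variance omega / (1 - alpha - beta).
print(garch11_variance_forecast(0.1, 0.05, 0.9, sigma2_t=2.0, eps2_t=1.5, horizon=5))
```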
A kernel statistical test of independence
, 2008
"... Although kernel measures of independence have been widely applied in machine learning (notably in kernel ICA), there is as yet no method to determine whether they have detected statistically significant dependence. We provide a novel test of the independence hypothesis for one particular kernel inde ..."
Abstract

Cited by 41 (26 self)
Although kernel measures of independence have been widely applied in machine learning (notably in kernel ICA), there is as yet no method to determine whether they have detected statistically significant dependence. We provide a novel test of the independence hypothesis for one particular kernel independence measure, the Hilbert-Schmidt independence criterion (HSIC). The resulting test costs O(m^2), where m is the sample size. We demonstrate that this test outperforms established contingency table and functional correlation-based tests, and that this advantage is greater for multivariate data. Finally, we show the HSIC test also applies to text (and to structured data more generally), for which no other independence test presently exists.
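The biased empirical HSIC estimate is the trace of the doubly centered kernel matrices, HSIC = trace(K H L H) / m^2; the test itself compares this statistic against a null threshold, which the minimal sketch below omits (RBF bandwidths are placeholder choices):

```python
import numpy as np

def rbf(A, sigma=1.0):
    # Gaussian RBF kernel matrix on a single sample.
    sq = np.sum(A**2, 1)[:, None] + np.sum(A**2, 1)[None, :] - 2 * A @ A.T
    return np.exp(-sq / (2 * sigma**2))

def hsic_biased(X, Y, sigma_x=1.0, sigma_y=1.0):
    # Biased HSIC estimate: trace(K H L H) / m^2, with H = I - (1/m) 11^T.
    m = len(X)
    H = np.eye(m) - np.ones((m, m)) / m
    K, L = rbf(X, sigma_x), rbf(Y, sigma_y)
    return np.trace(K @ H @ L @ H) / m**2

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
print(hsic_biased(X, X + 0.1 * rng.normal(size=X.shape)))  # dependent: large
print(hsic_biased(X, rng.normal(size=(100, 2))))           # independent: near 0
```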
Exploring estimator bias-variance tradeoffs using the uniform CR bound
 IEEE Trans. on Signal Processing
, 1996
"... We introduce a plane, which we call the deltasigma plane, that is indexed by the norm of the estimator bias gradient and the variance of the estimator. The norm of the bias gradient is related to the maximum variation in the estimator bias function over a neighborhood of parameter space. Using a un ..."
Abstract

Cited by 40 (15 self)
We introduce a plane, which we call the delta-sigma plane, that is indexed by the norm of the estimator bias gradient and the variance of the estimator. The norm of the bias gradient is related to the maximum variation in the estimator bias function over a neighborhood of parameter space. Using a uniform Cramér-Rao (CR) bound on estimator variance, a delta-sigma tradeoff curve is specified which defines an "unachievable region" of the delta-sigma plane for a specified statistical model. In order to place an estimator on this plane for comparison to the delta-sigma tradeoff curve, the estimator variance, bias gradient, and bias gradient norm must be evaluated. We present a simple and accurate method for experimentally determining the bias gradient norm based on applying a bootstrap estimator to a sample mean constructed from the gradient of the log-likelihood. We demonstrate the methods developed in this paper for linear Gaussian and nonlinear Poisson inverse problems.
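As a simplified illustration of placing an estimator on the plane (a generic Monte Carlo approximation, not the paper's bootstrap procedure), the sketch below estimates the variance and a finite-difference bias gradient norm for a shrinkage estimator of a scalar Gaussian mean; the estimator and parameter values are our own toy choices:

```python
import numpy as np

def bias_and_variance(theta, shrink=0.8, n=25, reps=20000, seed=0):
    # Shrinkage estimator of a scalar Gaussian mean: theta_hat = shrink * xbar.
    rng = np.random.default_rng(seed)
    x = rng.normal(theta, 1.0, size=(reps, n))
    est = shrink * x.mean(axis=1)
    return est.mean() - theta, est.var()

# Finite-difference bias gradient (scalar parameter) plus variance:
# together they place the estimator as a (delta, sigma) point on the plane.
theta, h = 1.0, 0.05
b_plus, _ = bias_and_variance(theta + h)
b_minus, var = bias_and_variance(theta - h)
delta = abs((b_plus - b_minus) / (2 * h))   # |bias gradient| ~ |shrink - 1|
print(f"delta = {delta:.3f}, sigma^2 = {var:.4f}")
```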
A kernel method for the two-sample problem
 In Advances in Neural Information Processing Systems 19
, 2007
"... We propose a framework for analyzing and comparing distributions, allowing us to design statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert ..."
Abstract

Cited by 38 (13 self)
We propose a framework for analyzing and comparing distributions, allowing us to design statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS). We present two tests based on large deviation bounds for the test statistic, while a third is based on the asymptotic distribution of this statistic. The test statistic can be computed in quadratic time, although efficient linear time approximations are available. Several classical metrics on distributions are recovered when the function space used to compute the difference in expectations is allowed to be more general (e.g., a Banach space). We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
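The linear-time approximation mentioned above averages the kernel statistic over disjoint pairs of sample points rather than over all pairs; a minimal sketch with an RBF kernel (bandwidth is a placeholder):

```python
import numpy as np

def mmd_linear(X, Y, sigma=1.0):
    """Linear-time MMD estimate: average h over disjoint pairs, where
    h((x1,y1),(x2,y2)) = k(x1,x2) + k(y1,y2) - k(x1,y2) - k(x2,y1)."""
    k = lambda a, b: np.exp(-np.sum((a - b)**2, -1) / (2 * sigma**2))
    m = (min(len(X), len(Y)) // 2) * 2      # even number of usable points
    x1, x2 = X[0:m:2], X[1:m:2]
    y1, y2 = Y[0:m:2], Y[1:m:2]
    return np.mean(k(x1, x2) + k(y1, y2) - k(x1, y2) - k(x2, y1))

rng = np.random.default_rng(0)
print(mmd_linear(rng.normal(size=(1000, 3)),
                 rng.normal(1.0, 1.0, size=(1000, 3))))  # shifted: clearly > 0
```

Each point is touched once, so the estimate costs O(m) time and O(1) memory beyond the data, at the price of higher variance than the quadratic-time statistic.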
On synopses for distinct-value estimation under multiset operations
 In SIGMOD
, 2007
"... The task of estimating the number of distinct values (DVs) in a large dataset arises in a wide variety of settings in computer science and elsewhere. We provide DV estimation techniques that are designed for use within a flexible and scalable “synopsis warehouse” architecture. In this setting, incom ..."
Abstract

Cited by 37 (7 self)
The task of estimating the number of distinct values (DVs) in a large dataset arises in a wide variety of settings in computer science and elsewhere. We provide DV estimation techniques that are designed for use within a flexible and scalable “synopsis warehouse” architecture. In this setting, incoming data is split into partitions and a synopsis is created for each partition; each synopsis can then be used to quickly estimate the number of DVs in its corresponding partition. By combining and extending a number of results in the literature, we obtain both appropriate synopses and novel DV estimators to use in conjunction with these synopses. Our synopses can be created in parallel, and can then be easily combined to yield synopses and DV estimates for arbitrary unions, intersections or differences of partitions. Our synopses can also handle deletions of individual partition elements. We use the theory of order statistics to show that our DV estimators are unbiased, and to establish moment formulas and sharp error bounds. Based on a novel limit theorem, we can exploit results due to Cohen in order to select synopsis sizes when initially designing the warehouse. Experiments and theory indicate that our synopses and estimators lead to lower computational costs and more accurate DV estimates than previous approaches.
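One concrete synopsis in this spirit keeps the k smallest hash values of a partition's elements (a KMV-style sketch); unions then reduce to merging hash sets, and an unbiased estimate has the form (k-1)/U_(k), with U_(k) the k-th smallest hash. The hashing details below are illustrative, not the paper's exact construction:

```python
import hashlib

def kmv_synopsis(items, k=256):
    # Keep the k smallest hash values, mapped into (0, 1).
    hashes = {int(hashlib.sha1(str(x).encode()).hexdigest(), 16) / 2**160
              for x in items}
    return sorted(hashes)[:k]

def dv_estimate(synopsis, k):
    # Unbiased estimator (k - 1) / U_(k); with fewer than k distinct
    # values the synopsis holds every hash, so count exactly.
    if len(synopsis) < k:
        return len(synopsis)
    return (k - 1) / synopsis[k - 1]

def union(syn_a, syn_b, k):
    # Synopsis of a union of partitions = k smallest of the merged hash sets.
    return sorted(set(syn_a) | set(syn_b))[:k]

k = 256
a = kmv_synopsis(range(0, 60000), k)
b = kmv_synopsis(range(40000, 100000), k)
print(dv_estimate(union(a, b, k), k))   # true union DV count: 100000
```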
Wavelet-domain filtering for photon imaging systems
 IEEE Trans. on Image Processing
, 1999
"... Abstract—Many imaging systems rely on photon detection as the basis of image formation. One of the major sources of error in these systems is Poisson noise due to the quantum nature of the photon detection process. Unlike additive Gaussian white noise, the variance of Poisson noise is proportional t ..."
Abstract

Cited by 35 (7 self)
Many imaging systems rely on photon detection as the basis of image formation. One of the major sources of error in these systems is Poisson noise due to the quantum nature of the photon detection process. Unlike additive Gaussian white noise, the variance of Poisson noise is proportional to the underlying signal intensity, and consequently separating signal from noise is a very difficult task. In this paper, we perform a novel gedankenexperiment to devise a new wavelet-domain filtering procedure for noise removal in photon imaging systems. The filter adapts to both the signal and the noise, and balances the tradeoff between noise removal and excessive smoothing of image details. Designed using the statistical method of cross-validation, the filter is simultaneously optimal in a small-sample predictive sum of squares sense and asymptotically optimal in the mean-square-error sense. The filtering procedure has a simple interpretation as a joint edge detection/estimation process. Moreover, we derive an efficient algorithm for performing the filtering that has the same order of complexity as the fast wavelet transform itself. The performance of the new filter is assessed with simulated data experiments and tested with actual nuclear medicine imagery. Index Terms: Photon-limited imaging, Poisson processes, wavelets.
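The paper's filter is derived via cross-validation; as a generic stand-in that illustrates signal-adaptive wavelet-domain shrinkage for Poisson data, the sketch below applies one level of an unnormalized Haar transform to a 1-D count signal and shrinks each detail coefficient by an estimate of its local noise variance:

```python
import numpy as np

def haar_poisson_denoise(counts):
    """One-level Haar shrinkage for Poisson counts (illustrative, not the
    paper's cross-validated filter). For a pair (x1, x2), a = (x1+x2)/2 and
    d = (x1-x2)/2, so Var(d) = (lam1+lam2)/4, estimated by a/2."""
    x = np.asarray(counts, dtype=float)
    x1, x2 = x[0::2], x[1::2]
    a, d = (x1 + x2) / 2, (x1 - x2) / 2
    noise_var = a / 2
    shrink = np.maximum(0.0, 1 - noise_var / np.maximum(d**2, 1e-12))
    d_hat = shrink * d                      # signal- and noise-adaptive shrinkage
    out = np.empty_like(x)
    out[0::2], out[1::2] = a + d_hat, a - d_hat
    return out

rng = np.random.default_rng(0)
intensity = np.repeat([5.0, 50.0], 64)      # piecewise-constant test signal
noisy = rng.poisson(intensity).astype(float)
print(np.mean((haar_poisson_denoise(noisy) - intensity)**2) <
      np.mean((noisy - intensity)**2))      # shrinkage reduces MSE
```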
Estimating the "Wrong" Graphical Model: Benefits in the ComputationLimited Setting
 Journal of Machine Learning Research
, 2006
"... Consider the problem of joint parameter estimation and prediction in a Markov random field: that is, the model parameters are estimated on the basis of an initial set of data, and then the fitted model is used to perform prediction (e.g., smoothing, denoising, interpolation) on a new noisy observa ..."
Abstract

Cited by 35 (2 self)
Consider the problem of joint parameter estimation and prediction in a Markov random field: that is, the model parameters are estimated on the basis of an initial set of data, and then the fitted model is used to perform prediction (e.g., smoothing, denoising, interpolation) on a new noisy observation.
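To make the prediction step concrete, here is a small illustrative sketch (our own toy setup, not the paper's method): a fitted Ising-style coupling parameter is used to denoise a binary chain by coordinate-wise updates (iterated conditional modes).

```python
import numpy as np

def icm_denoise(y, coupling=0.8, obs_weight=1.0, sweeps=10):
    """Iterated conditional modes for a chain MRF with +/-1 states:
    greedily maximize sum_i obs_weight*x_i*y_i + sum_i coupling*x_i*x_{i+1}.
    The coupling would come from the (possibly 'wrong') fitted model."""
    x = y.copy()
    n = len(x)
    for _ in range(sweeps):
        for i in range(n):
            nbr = (x[i - 1] if i > 0 else 0) + (x[i + 1] if i < n - 1 else 0)
            x[i] = 1 if obs_weight * y[i] + coupling * nbr >= 0 else -1
    return x

rng = np.random.default_rng(0)
truth = np.where(np.sin(np.linspace(0, 6 * np.pi, 200)) >= 0, 1, -1)
noisy = np.where(rng.random(200) < 0.2, -truth, truth)   # 20% flip noise
print((icm_denoise(noisy) == truth).mean())              # accuracy after smoothing
```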
Fast Approximate Spectral Clustering
, 2009
"... Spectral clustering refers to a flexible class of clustering procedures that can produce highquality clusterings on small data sets but which has limited applicability to largescale problems due to its computational complexity of O(n 3), with n the number of data points. We extend the range of spe ..."
Abstract

Cited by 35 (1 self)
Spectral clustering refers to a flexible class of clustering procedures that can produce high-quality clusterings on small data sets but which has limited applicability to large-scale problems due to its computational complexity of O(n^3), with n the number of data points. We extend the range of spectral clustering by developing a general framework for fast approximate spectral clustering in which a distortion-minimizing local transformation is first applied to the data. This framework is based on a theoretical analysis that provides a statistical characterization of the effect of local distortion on the misclustering rate. We develop two concrete instances of our general framework, one based on local k-means clustering (KASP) and one based on random projection trees (RASP). Extensive experiments show that these algorithms can achieve significant speedups with little degradation in clustering accuracy. Specifically, our algorithms outperform k-means by a large margin in terms of accuracy, and run several times faster than approximate spectral clustering based on the Nyström method, with comparable accuracy and significantly smaller memory footprint. Remarkably, our algorithms make it possible for a single machine to spectral cluster data sets with a million observations within several minutes.
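A minimal version of the KASP idea using scikit-learn (the library choice and parameter values are ours, not the paper's): run k-means with many centroids as the distortion-minimizing preprocessing step, spectral cluster the centroids, then give each point the label of its centroid.

```python
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_blobs

def kasp(X, n_clusters, n_representatives=200, seed=0):
    # Step 1: distortion-minimizing preprocessing via k-means.
    km = KMeans(n_clusters=n_representatives, n_init=4, random_state=seed).fit(X)
    # Step 2: spectral clustering on the (much smaller) set of centroids.
    sc = SpectralClustering(n_clusters=n_clusters, random_state=seed)
    centroid_labels = sc.fit_predict(km.cluster_centers_)
    # Step 3: each point inherits the cluster label of its nearest centroid.
    return centroid_labels[km.labels_]

X, _ = make_blobs(n_samples=20000, centers=4, random_state=0)
labels = kasp(X, n_clusters=4)
print(np.bincount(labels))
```

The O(n^3) eigendecomposition now runs only on the representatives, so the cost is dominated by the k-means pass over the full data.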