A Hilbert space embedding for distributions
In Algorithmic Learning Theory: 18th International Conference, 2007
Cited by 110 (45 self)

Abstract: We describe a technique for comparing distributions without the need for density estimation as an intermediate step. Our approach relies on mapping the distributions into a reproducing kernel Hilbert space. Applications of this technique can be found in two-sample tests, which are used for determining whether two sets of observations arise from the same distribution, covariate shift correction, local learning, measures of independence, and density estimation. Kernel methods are widely used in supervised learning [1, 2, 3, 4]; however, they are much less established in the areas of testing, estimation, and analysis of probability distributions, where information-theoretic approaches [5, 6] have long been dominant. Recent examples include [7] in the context of construction of graphical models, [8] in the context of feature extraction, and [9] in the context of independent component analysis. These methods by and large share a common issue: to compute quantities such as the mutual information, entropy, or Kullback-Leibler divergence, we require sophisticated space partitioning and/or
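The embedding this abstract describes maps a distribution P to its kernel mean μ_P = E[k(x, ·)]; two distributions can then be compared by the RKHS distance between their embeddings. A minimal sketch, assuming a Gaussian RBF kernel and the biased plug-in estimator (the bandwidth and sample sizes below are illustrative, not from the paper):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian RBF kernel matrix: k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

def embedding_distance2(X, Y, gamma=1.0):
    """Biased plug-in estimate of ||mu_P - mu_Q||^2 in the RKHS:
    mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean())

rng = np.random.default_rng(0)
same = embedding_distance2(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
diff = embedding_distance2(rng.normal(size=(200, 2)),
                           rng.normal(loc=2.0, size=(200, 2)))
```

Samples from a shifted distribution land much farther apart in the embedding (`diff` well above `same`), which is the basis of the two-sample tests the abstract mentions.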
MCDB: a Monte Carlo approach to managing uncertain data, 2008
Cited by 108 (3 self)

Abstract: To deal with data uncertainty, existing probabilistic database systems augment tuples with attribute-level or tuple-level probability values, which are loaded into the database along with the data itself. This approach can severely limit the system's ability to gracefully handle complex or unforeseen types of uncertainty, and does not permit the uncertainty model to be dynamically parameterized according to the current state of the database. We introduce MCDB, a system for managing uncertain data that is based on a Monte Carlo approach. MCDB represents uncertainty via "VG functions," which are used to pseudorandomly generate realized values for uncertain attributes. VG functions can be parameterized on the results of SQL queries over "parameter tables" that are stored in the database, facilitating what-if analyses. By storing parameters, and not probabilities, and by estimating, rather than exactly computing, the probability distribution over possible query answers, MCDB avoids many of the limitations of prior systems. For example, MCDB can easily handle arbitrary joint probability distributions over discrete or continuous attributes, arbitrarily complex SQL queries, and arbitrary functionals of the query-result distribution such as means, variances, and quantiles. To achieve good performance, MCDB uses novel query processing techniques, executing a query plan exactly once, but over "tuple bundles" instead of ordinary tuples. Experiments indicate that our enhanced functionality can be obtained with acceptable overheads relative to traditional systems.
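The "VG function" idea (pseudorandom generation of realized values, parameterized from tables) can be mimicked outside a database engine. A hypothetical sketch only: `vg_revenue` and the parameter rows below are invented for illustration, and MCDB's real VG functions run inside the query processor over tuple bundles:

```python
import numpy as np

# Illustrative only: a "VG function" here is just a parameterized sampler.
def vg_revenue(params, rng):
    """Generate one realized value for an uncertain 'revenue' attribute,
    parameterized by per-row (mean, std) taken from a parameter table."""
    return rng.normal(params["mean"], params["std"])

def monte_carlo_query(param_table, n_worlds=1000, seed=0):
    """Estimate the distribution of SUM(revenue) over possible worlds."""
    rng = np.random.default_rng(seed)
    totals = [sum(vg_revenue(row, rng) for row in param_table)
              for _ in range(n_worlds)]
    return np.mean(totals), np.quantile(totals, [0.05, 0.95])

params = [{"mean": 100.0, "std": 10.0}, {"mean": 50.0, "std": 5.0}]
mean, (lo, hi) = monte_carlo_query(params)
```

Storing the parameters (not probabilities) and estimating the query-answer distribution by sampling is what lets this approach return arbitrary functionals such as the mean and quantiles above.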
A kernel statistical test of independence, 2008
Cited by 95 (48 self)

Abstract: Although kernel measures of independence have been widely applied in machine learning (notably in kernel ICA), there is as yet no method to determine whether they have detected statistically significant dependence. We provide a novel test of the independence hypothesis for one particular kernel independence measure, the Hilbert-Schmidt independence criterion (HSIC). The resulting test costs O(m^2), where m is the sample size. We demonstrate that this test outperforms established contingency-table and functional-correlation-based tests, and that this advantage is greater for multivariate data. Finally, we show the HSIC test also applies to text (and to structured data more generally), for which no other independence test presently exists.
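The HSIC statistic itself is short to write down: with kernel matrices K and L on the two samples and the centering matrix H = I - (1/m)11', the biased empirical HSIC is (1/m^2) tr(KHLH). A sketch assuming Gaussian kernels; note the paper's actual test also needs a null distribution (e.g. a gamma approximation or permutations) to turn the statistic into a significance threshold:

```python
import numpy as np

def rbf(X, gamma=1.0):
    """Gaussian RBF kernel matrix on one sample."""
    sq = (X**2).sum(1)[:, None] + (X**2).sum(1)[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * sq)

def hsic(X, Y, gamma=1.0):
    """Biased empirical HSIC: (1/m^2) * trace(K H L H).
    Cost is O(m^2) in the sample size m, matching the abstract."""
    m = X.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m
    return np.trace(rbf(X, gamma) @ H @ rbf(Y, gamma) @ H) / m**2

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
indep = hsic(x, rng.normal(size=(200, 1)))           # unrelated pair
dep = hsic(x, x + 0.1 * rng.normal(size=(200, 1)))   # strongly dependent pair
```

The statistic is near zero for the independent pair and markedly larger for the dependent one.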
Unsupervised Learning of Distributions on Binary Vectors Using Two Layer Networks, 1994
Cited by 95 (1 self)

Abstract: this paper is related to both of these lines of work and has some advantages over each of them. If we find a good model of the distribution, we can tackle other interesting learning problems, such as the problem of estimating the conditional distribution on certain components of the vector x when provided with the values for the other components (a kind of regression problem), or predicting the actual values for certain components of x based on the values of the other components (a kind of pattern completion task). In the example of the binary images presented above, this would amount to the task of recovering the value of a pixel whose value has been corrupted. We can often also use the distribution model to help us in a supervised learning task. This is because it is often easier to express the mapping of an instance to the correct label by using "features" that are correlation patterns among the bits of the instance. For example, it is easier to describe each of the ten digits in terms of patterns such as lines and circles, rather than in terms of the values of individual pixels, which are more likely to change between different instances of the same digit. The process of learning an unknown distribution from examples is usually called density estimation or
Integrating structured biological data by kernel maximum mean discrepancy
In ISMB, 2006
Cited by 85 (20 self)

Abstract: Motivation: Many problems in data integration in bioinformatics can be posed as one common question: Are two sets of observations generated by the same distribution? We propose a kernel-based statistical test for this problem, based on the fact that two distributions are different if and only if there exists at least one function having different expectation on the two distributions. Consequently we use the maximum discrepancy between function means as the basis of a test statistic. The Maximum Mean Discrepancy (MMD) can take advantage of the kernel trick, which allows us to apply it not only to vectors, but also to strings, sequences, graphs, and other common structured data types arising in molecular biology. Results: We study the practical feasibility of an MMD-based test on three central data integration tasks: testing cross-platform comparability of microarray data, cancer diagnosis, and data-content-based schema matching for two different protein function classification schemas. In all of these experiments, including high-dimensional ones, MMD is very accurate in finding samples that were generated from the same distribution, and outperforms its best competitors. Conclusions: We have defined a novel statistical test of whether two samples are from the same distribution, compatible with both multivariate and structured data, that is fast, easy to implement, and works well, as confirmed by our experiments.
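Because MMD only needs a kernel, it applies directly to the structured types the abstract lists. A toy sketch using a k-spectrum (k-mer counting) string kernel on DNA-like strings; the sequences and the kernel choice are illustrative, not from the paper:

```python
from collections import Counter

def spectrum_kernel(s, t, k=2):
    """k-spectrum string kernel: inner product of k-mer count vectors."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(cs[g] * ct[g] for g in cs)

def mmd2_strings(A, B, k=2):
    """Biased squared MMD between two samples of strings."""
    kxx = sum(spectrum_kernel(a, b, k) for a in A for b in A) / len(A) ** 2
    kyy = sum(spectrum_kernel(a, b, k) for a in B for b in B) / len(B) ** 2
    kxy = sum(spectrum_kernel(a, b, k) for a in A for b in B) / (len(A) * len(B))
    return kxx + kyy - 2.0 * kxy

at_rich = ["ATATAT", "ATTATA", "TATAAT"]
gc_rich = ["GCGCGC", "CGGCGC", "GCCGCG"]
at_rich2 = ["TAATAT", "ATATTA", "TTAATA"]
d_far = mmd2_strings(at_rich, gc_rich)    # disjoint 2-mer vocabularies
d_near = mmd2_strings(at_rich, at_rich2)  # similar 2-mer profiles
```

Samples with similar k-mer profiles give a small discrepancy, while samples over disjoint vocabularies give a large one; any positive-definite graph or sequence kernel can be substituted the same way.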
Prediction in dynamic models with time-dependent conditional heteroskedasticity, Working paper no., 1990
Cited by 85 (13 self)

Abstract: This paper considers forecasting the conditional mean and variance from a single-equation dynamic model with autocorrelated disturbances following an ARMA process, and innovations with time-dependent conditional heteroskedasticity as represented by a linear GARCH process. Expressions for the minimum MSE predictor and the conditional MSE are presented. We also derive the formula for all the theoretical moments of the prediction error distribution from a general dynamic model with GARCH(1, 1) innovations. These results are then used in the construction of ex ante prediction confidence intervals by means of the Cornish-Fisher asymptotic expansion. An empirical example relating to the uncertainty of the expected depreciation of foreign exchange rates illustrates the usefulness of the results.
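For the GARCH(1, 1) case the abstract treats, the h-step-ahead conditional variance forecast has a closed form that decays geometrically toward the unconditional variance ω/(1 − α − β). A sketch with illustrative parameter values (not from the paper's empirical example):

```python
def garch11_variance_forecast(omega, alpha, beta, sigma2_next, h):
    """h-step-ahead conditional variance forecast for GARCH(1, 1):
    E[sigma^2_{t+h}] = omega * sum_{i=0}^{h-2} (a+b)^i + (a+b)^(h-1) * sigma^2_{t+1},
    which converges to the unconditional variance omega / (1 - a - b)."""
    phi = alpha + beta
    return omega * sum(phi**i for i in range(h - 1)) + phi**(h - 1) * sigma2_next

# Forecasts decay geometrically from the current variance to the long run.
lr = 0.1 / (1 - 0.05 - 0.9)  # unconditional variance = 2.0
f1 = garch11_variance_forecast(0.1, 0.05, 0.9, sigma2_next=4.0, h=1)
f20 = garch11_variance_forecast(0.1, 0.05, 0.9, sigma2_next=4.0, h=20)
```

The one-step forecast equals the current conditional variance, and longer horizons shrink toward the long-run level; the paper's contribution is the higher moments of the prediction error needed for Cornish-Fisher intervals, which this point forecast alone does not give.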
General Notions of Statistical Depth Function, 2000
Cited by 80 (28 self)

Abstract: Statistical depth functions are being formulated ad hoc with increasing popularity in nonparametric inference for multivariate data. Here we introduce several general structures for depth functions, classify many existing examples as special cases, and establish results on the possession, or lack thereof, of four key properties desirable for depth functions in general. Roughly speaking, these properties may be described as: affine invariance, maximality at center, monotonicity relative to deepest point, and vanishing at infinity. This provides a more systematic basis for selection of a depth function. In particular, from these and other considerations it is found that the halfspace depth behaves very well overall in comparison with various competitors.
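The halfspace (Tukey) depth the abstract singles out assigns to a point x the smallest probability mass of any closed halfspace containing x. In one dimension this is exactly computable; a minimal sketch illustrating the abstract's "maximality at center" and "vanishing at infinity" properties:

```python
def halfspace_depth_1d(x, sample):
    """Exact univariate halfspace (Tukey) depth: the minimum fraction of
    sample points in a closed halfline containing x."""
    n = len(sample)
    below = sum(s <= x for s in sample)
    above = sum(s >= x for s in sample)
    return min(below, above) / n

data = [1, 2, 3, 4, 5]
d_median = halfspace_depth_1d(3, data)    # maximal at the center: 3/5
d_edge = halfspace_depth_1d(1, data)      # shallow at the extreme: 1/5
d_far = halfspace_depth_1d(100, data)     # vanishing far from the data: 0.0
```

In higher dimensions the same definition minimizes over all directions, which is what makes the multivariate version computationally harder but affine invariant.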
Optimal Pricing Mechanisms with Unknown Demand
American Economic Review, 2003
Cited by 73 (3 self)

Abstract: The standard profit-maximizing multi-unit auction intersects the submitted demand curve with a preset reservation supply curve, which is determined using the distribution from which the buyers' valuations are drawn. However, when this distribution is unknown, a preset supply curve cannot maximize monopoly profits. The optimal pricing mechanism in this situation sets a price for each buyer on the basis of the demand distribution inferred statistically from the other buyers' bids. The resulting profit converges to the optimal monopoly profit with known demand as the number of buyers goes to infinity, and convergence can be substantially faster than with sequential price experimentation. (Department of Economics, Stanford University, Stanford.)
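The mechanism's key idea, pricing each buyer from the demand distribution inferred from the other buyers' bids, can be caricatured with an empirical posted-price rule. This is a loose sketch only; the revenue rule and bid values below are assumptions for illustration, and the paper's mechanism additionally handles multi-unit supply and derives formal convergence rates:

```python
def prices_from_other_bids(bids):
    """Post to each buyer the revenue-maximizing price against the empirical
    bid distribution of the OTHER buyers (never the buyer's own bid)."""
    prices = []
    for i in range(len(bids)):
        others = bids[:i] + bids[i + 1:]
        # empirical expected revenue of posting price p: p * #{v >= p}
        best = max(others, key=lambda p: p * sum(v >= p for v in others))
        prices.append(best)
    return prices

prices = prices_from_other_bids([1.0, 2.0, 3.0, 10.0])
```

Because a buyer's own bid never affects her own price, truthful bidding is not distorted, which is the sense in which the demand inference here is statistical rather than strategic.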
A kernel method for the two sample problem
Advances in Neural Information Processing Systems 19, 2007
Cited by 72 (19 self)

Abstract: We propose a framework for analyzing and comparing distributions, allowing us to design statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS). We present two tests based on large deviation bounds for the test statistic, while a third is based on the asymptotic distribution of this statistic. The test statistic can be computed in quadratic time, although efficient linear-time approximations are available. Several classical metrics on distributions are recovered when the function space used to compute the difference in expectations is allowed to be more general (e.g., a Banach space). We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
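One way to calibrate the quadratic-time statistic the abstract describes: the paper derives large-deviation and asymptotic-distribution thresholds, while the permutation null used below is a common simpler alternative chosen here as an assumption for illustration:

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    """Gaussian RBF kernel matrix between two samples."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

def mmd2(X, Y, gamma=1.0):
    """Biased squared MMD, computable in quadratic time."""
    return (rbf(X, X, gamma).mean() + rbf(Y, Y, gamma).mean()
            - 2.0 * rbf(X, Y, gamma).mean())

def mmd_permutation_test(X, Y, n_perm=200, gamma=1.0, seed=0):
    """p-value by permutation: reshuffle the pooled sample under the null
    hypothesis that both samples come from the same distribution."""
    rng = np.random.default_rng(seed)
    observed = mmd2(X, Y, gamma)
    pooled, m = np.vstack([X, Y]), len(X)
    null = []
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        null.append(mmd2(pooled[perm[:m]], pooled[perm[m:]], gamma))
    return (1 + sum(s >= observed for s in null)) / (1 + n_perm)

rng = np.random.default_rng(1)
p = mmd_permutation_test(rng.normal(size=(100, 1)),
                         rng.normal(loc=3.0, size=(100, 1)))
```

Well-separated samples yield a p-value near the 1/(n_perm + 1) floor, so the null of equal distributions is rejected.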
The Limiting Distribution of the Maximum Rank Correlation Estimator
Econometrica, 1993
Cited by 71 (0 self)

Abstract: Han's maximum rank correlation (MRC) estimator is shown to be √n-consistent and asymptotically normal. The proof rests on a general method for determining the asymptotic distribution of a maximization estimator, a simple U-statistic decomposition, and a uniform bound for degenerate U-processes. A consistent estimator of the asymptotic covariance matrix is provided, along with a result giving the explicit form of this matrix for any model within the scope of the MRC estimator. The latter result is applied to the binary choice model, and it is found that the MRC estimator does not achieve the semiparametric efficiency bound.
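The MRC estimator maximizes Han's rank correlation objective: the fraction of pairs whose ordering under the linear index x'β agrees with the ordering of the outcomes. A sketch of the objective only (maximizing it needs a derivative-free search, omitted here; the data are illustrative):

```python
import itertools
import numpy as np

def mrc_objective(beta, X, y):
    """Han's maximum rank correlation objective: the share of pairs (i, j)
    for which (y_i > y_j) agrees with (x_i'beta > x_j'beta)."""
    idx = X @ beta
    pairs = list(itertools.combinations(range(len(y)), 2))
    agree = sum((y[i] > y[j]) == (idx[i] > idx[j]) for i, j in pairs)
    return agree / len(pairs)

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
obj_good = mrc_objective(np.array([1.0]), X, y)   # perfect rank agreement
obj_bad = mrc_objective(np.array([-1.0]), X, y)   # completely reversed ordering
```

Because the objective depends only on rankings, it is invariant to monotone transformations of the outcome, which is what makes it a pairwise U-statistic and drives the √n asymptotics analyzed in the paper.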