Results 1 - 10 of 126
On coupling constructions and rates in the CLT for dependent summands with applications to the antivoter model and weighted U-statistics
, 1997
"... This paper deals with rates of convergence in the CLT for certain types of dependency. The main idea is to combine a modification of a theorem of Stein, requiring a coupling construction, with a dynamic set-up provided by a Markov structure that suggests natural coupling variables. More specifically ..."
Abstract
-
Cited by 51 (2 self)
- Add to MetaCart
This paper deals with rates of convergence in the CLT for certain types of dependency. The main idea is to combine a modification of a theorem of Stein, requiring a coupling construction, with a dynamic set-up provided by a Markov structure that suggests natural coupling variables. More specifically, given a stationary Markov chain $X^{(t)}$ and a function $U = U(X^{(t)})$, we propose a way to study the proximity of $U$ to a normal random variable when the state space is large. We apply the general method to the study of two problems. In the first, we consider the antivoter chain $X^{(t)} = (X^{(t)}_i)_{i \in \mathcal{V}}$, $t = 0, 1, \ldots$, where $\mathcal{V}$ is the vertex set of an $n$-vertex regular graph and $X^{(t)}_i = +1$ or $-1$. The chain evolves from time $t$ to $t+1$ by choosing a random vertex $i$ and a random neighbor $j$ of it, and setting $X^{(t+1)}_i = -X^{(t)}_j$ and $X^{(t+1)}_k = X^{(t)}_k$ for all $k \neq i$. For a stationary antivoter chain, we study the normal approximation of $U_n = U^{(t)}_n = \sum_i X^{(t)}_i$ for large $n$ and consider conditions on sequences of graphs under which $U_n$ is asymptotically normal, a problem posed by Aldous and Fill. The same approach may also be applied in situations where a Markov chain does not appear in the original statement of a problem but is constructed as an auxiliary device. This is illustrated by considering weighted U-statistics. In particular, we are able to unify and generalize some results on normal convergence for degenerate weighted U-statistics and provide rates.
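The transition rule quoted above is simple to simulate. The following is a minimal sketch on a ring, one convenient choice of $n$-vertex regular graph; the function names and the choice of graph are illustrative, not from the paper.

```python
import random

def antivoter_step(x, neighbors):
    """One transition: choose a random vertex i and a random neighbor j,
    then set x[i] to the opposite of x[j]; all other sites are unchanged."""
    i = random.randrange(len(x))
    j = random.choice(neighbors[i])
    x[i] = -x[j]

def simulate_ring(n=100, steps=10_000, seed=1):
    """Run the chain on an n-cycle (a 2-regular graph) from a random
    +/-1 configuration and return U_n = sum_i x_i."""
    random.seed(seed)
    neighbors = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
    x = [random.choice([-1, 1]) for _ in range(n)]
    for _ in range(steps):
        antivoter_step(x, neighbors)
    return sum(x)

print(simulate_ring())
```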
On Kendall’s process
- Journal of Multivariate Analysis
, 1996
"... Let Z1,..., Zn be a random sample of size n2 from a d-variate continuous distribution function H, and let Vi, n stand for the proportion of observations Zj, j{i, such that ZjZi componentwise. The purpose of this paper is to examine the limiting behavior of the empirical distribution function Kn deri ..."
Abstract
-
Cited by 30 (5 self)
- Add to MetaCart
Let $Z_1, \ldots, Z_n$ be a random sample of size $n \ge 2$ from a $d$-variate continuous distribution function $H$, and let $V_{i,n}$ stand for the proportion of observations $Z_j$, $j \neq i$, such that $Z_j \le Z_i$ componentwise. The purpose of this paper is to examine the limiting behavior of the empirical distribution function $K_n$ derived from the (dependent) pseudo-observations $V_{i,n}$. This random quantity is a natural nonparametric estimator of $K$, the distribution function of the random variable $V = H(Z)$, whose expectation is an affine transformation of the population version of Kendall's tau in the case $d = 2$. Since the sample version of $\tau$ is related in the same way to the mean of $K_n$, Genest and Rivest (1993, J. Amer. Statist. Assoc.) suggested that $\sqrt{n}\,[K_n(t) - K(t)]$ be referred to as Kendall's process. Weak regularity conditions on $K$ and $H$ are found under which this centered process is asymptotically Gaussian, and an explicit expression for its limiting covariance function is given. These conditions, which are fairly easy to check, are seen to apply to large classes of multivariate distributions.
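For concreteness, here is a minimal numpy sketch of the pseudo-observations $V_{i,n}$ and the estimator $K_n$; normalizing by $n - 1$ is one common convention and is an assumption here, not a detail taken from the paper.

```python
import numpy as np

def pseudo_observations(Z):
    """V_{i,n}: for each row Z_i, the fraction of the other rows Z_j
    (j != i) satisfying Z_j <= Z_i in every component."""
    n = len(Z)
    V = np.empty(n)
    for i in range(n):
        dominated = np.all(Z <= Z[i], axis=1)
        V[i] = (dominated.sum() - 1) / (n - 1)  # exclude Z_i itself
    return V

def K_n(t, V):
    """Empirical distribution function of the pseudo-observations at t."""
    return np.mean(V <= t)

# Toy usage on a simulated bivariate sample.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 2))
V = pseudo_observations(Z)
print(K_n(0.5, V))
```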
On weighted U-statistics for stationary processes
- Ann. Probab.
, 2004
"... Abstract. A weighted U-statistic based on a random sample X1,..., Xn has the form ..."
Abstract
-
Cited by 15 (3 self)
- Add to MetaCart
A weighted U-statistic based on a random sample $X_1, \ldots, X_n$ has the form …
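The display following "has the form" was lost in extraction. For orientation only, the textbook form of a bivariate weighted U-statistic is sketched below; this is an assumption about what the display contained, not a quotation from the paper, whose version may differ (e.g., with weights depending on the gap $j - i$ in the stationary setting).

```latex
% Generic bivariate weighted U-statistic (assumed form): fixed
% weights w_{ij} and a symmetric kernel h.
U_n = \sum_{1 \le i < j \le n} w_{ij}\, h(X_i, X_j)
```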
Fourier Methods for Estimating The Central Subspace and The Central Mean Subspace in Regression
"... In high dimensional regression, it is important to estimate the central and central mean subspaces, to which the projections of the predictors preserve sufficient information about the response and the mean response, respectively. Using the Fourier transform, we have derived the candidate matrices w ..."
Abstract
-
Cited by 13 (1 self)
- Add to MetaCart
(Show Context)
In high-dimensional regression, it is important to estimate the central and central mean subspaces, onto which the projections of the predictors preserve sufficient information about the response and the mean response, respectively. Using the Fourier transform, we derive candidate matrices whose column spaces recover the central and central mean subspaces exhaustively. Under the assumption of normally distributed predictors, explicit estimates of the central and central mean subspaces are obtained. Bootstrap procedures are used for determining dimensionality and choosing tuning parameters. Simulation results and an application to a real data set are reported. Our methods demonstrate competitive performance compared to SIR, SAVE and other existing methods. The approach proposed in the paper provides a novel view of sufficient dimension reduction and may lead to more powerful tools in the future.
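Whatever the specific candidate matrix, the final step described here is the same: take the span of its leading eigenvectors as the estimated subspace. A minimal numpy sketch of that step, with `M` standing in for any estimated symmetric candidate matrix and `d` for the dimension chosen by the bootstrap (both assumptions for illustration):

```python
import numpy as np

def leading_subspace(M, d):
    """Orthonormal basis for the estimated subspace: the d eigenvectors
    of the symmetric candidate matrix M with the largest eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    return eigvecs[:, np.argsort(eigvals)[-d:][::-1]]

# Usage: reduce predictors X (n x p) to the estimated d directions.
# X_reduced = X @ leading_subspace(M, d)
```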
Monitoring wafer map data from integrated circuit fabrication processes for spatially clustered defects
- Technometrics
, 1997
"... ..."
A sample-and-clean framework for fast and accurate query processing on dirty data
- in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD)
, 2014
"... In emerging Big Data scenarios, obtaining timely, high-quality answers to aggregate queries is difficult due to the challenges of processing and cleaning large, dirty data sets. To increase the speed of query processing, there has been a resurgence of interest in sampling-based approximate query pro ..."
Abstract
-
Cited by 12 (6 self)
- Add to MetaCart
(Show Context)
In emerging Big Data scenarios, obtaining timely, high-quality answers to aggregate queries is difficult due to the challenges of processing and cleaning large, dirty data sets. To increase the speed of query processing, there has been a resurgence of interest in sampling-based approximate query processing (SAQP). In its usual formulation, however, SAQP does not address data cleaning at all, and in fact exacerbates answer quality problems by introducing sampling error. In this paper, we explore an intriguing opportunity: using sampling to actually improve answer quality. We introduce the Sample-and-Clean framework, which applies data cleaning to a relatively small subset of the data and uses the results of the cleaning process to lessen the impact of dirty data on aggregate query answers. We derive confidence intervals as a function of sample size and show how our approach addresses error bias. We evaluate the Sample-and-Clean framework using data from three sources: the TPC-H benchmark with synthetic noise, a subset of the Microsoft academic citation index, and a sensor data set. Our results are consistent with the theoretical confidence intervals and suggest that the Sample-and-Clean framework can produce significant improvements in accuracy compared to query processing without data cleaning, and in speed compared to data cleaning without sampling.
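A minimal sketch of the correction idea as described in the abstract (clean only a small random sample, then debias the cheap full-data aggregate by the mean cleaned-vs-dirty difference). This follows the general description, not the paper's exact estimators; all names are illustrative.

```python
import math
import random
import statistics

def sample_and_clean_avg(rows, dirty, clean, k=200, z=1.96, seed=0):
    """Estimate avg(clean value) over all rows by cleaning only k sampled
    rows: start from the dirty average over all rows, then correct it by
    the mean (clean - dirty) difference measured on the sample.  Returns
    the corrected estimate and a normal-approximation 95% CI (the CI
    covers only the sampling error of the correction)."""
    random.seed(seed)
    sample = random.sample(rows, k)
    diffs = [clean(r) - dirty(r) for r in sample]
    correction = statistics.mean(diffs)
    half = z * statistics.stdev(diffs) / math.sqrt(k)
    est = statistics.mean(dirty(r) for r in rows) + correction
    return est, (est - half, est + half)

# Toy usage: each row carries a ground-truth value and a dirty reading.
random.seed(42)
rows = [{"true": random.gauss(10, 1)} for _ in range(10_000)]
for r in rows:
    r["dirty"] = r["true"] + random.gauss(2, 0.5)  # systematic bias + noise
print(sample_and_clean_avg(rows, dirty=lambda r: r["dirty"],
                           clean=lambda r: r["true"]))
```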
Model-free estimation of defect clustering in integrated circuit fabrication
- IEEE Transactions on Semiconductor Manufacturing
, 1997
"... ..."
Depth estimators and tests based on the likelihood principle with applications to regression
- Journal of Multivariate Analysis
, 2005
"... Abstract We investigate depth notions for general models which are derived via the likelihood principle. We show that the so-called likelihood depth for regression in generalized linear models coincides with the regression depth of ..."
Abstract
-
Cited by 10 (5 self)
- Add to MetaCart
We investigate depth notions for general models that are derived via the likelihood principle. We show that the so-called likelihood depth for regression in generalized linear models coincides with the regression depth of …
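To make the notion being matched here concrete, below is a naive sketch of regression depth for simple linear regression, in the spirit of Rousseeuw and Hubert's definition (the smallest number of points blocking a rotation of a candidate line to a nonfit). This illustrates the concept only; it is not the paper's construction, and tie handling at split points is simplified.

```python
import numpy as np

def regression_depth(a, b, x, y):
    """Naive O(n^2) regression depth of the candidate line y = a + b*x:
    minimize, over split points u, the number of observations that
    separate positive from negative residuals across the split."""
    r = y - (a + b * x)                      # residuals under the candidate
    splits = np.concatenate(([x.min() - 1.0], np.unique(x)))
    best = len(x)
    for u in splits:
        left, right = x <= u, x > u
        pos, neg = r >= 0, r < 0
        best = min(best,
                   int(np.sum(left & pos) + np.sum(right & neg)),
                   int(np.sum(left & neg) + np.sum(right & pos)))
    return best

# Usage: depth of the least-squares line on a toy sample.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 50)
y = 2 * x + rng.normal(0, 0.2, 50)
b, a = np.polyfit(x, y, 1)
print(regression_depth(a, b, x, y))
```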