Results 1-10 of 126
On coupling constructions and rates in the CLT for dependent summands with applications to the antivoter model and weighted
, 1997
Abstract

Cited by 51 (2 self)
This paper deals with rates of convergence in the CLT for certain types of dependency. The main idea is to combine a modification of a theorem of Stein, requiring a coupling construction, with a dynamic setup provided by a Markov structure that suggests natural coupling variables. More specifically, given a stationary Markov chain $X^{(t)}$ and a function $U = U(X^{(t)})$, we propose a way to study the proximity of $U$ to a normal random variable when the state space is large. We apply the general method to the study of two problems. In the first, we consider the antivoter chain $X^{(t)} = (X_i^{(t)})_{i \in V}$, $t = 0, 1, \ldots$, where $V$ is the vertex set of an $n$-vertex regular graph and $X_i^{(t)} = +1$ or $-1$. The chain evolves from time $t$ to $t + 1$ by choosing a random vertex $i$ and a random neighbor $j$ of it, and setting $X_i^{(t+1)} = -X_j^{(t)}$ and $X_k^{(t+1)} = X_k^{(t)}$ for all $k \neq i$. For a stationary antivoter chain, we study the normal approximation of $U_n = U_n^{(t)} = \sum_i X_i^{(t)}$ for large $n$ and consider some conditions on sequences of graphs under which $U_n$ is asymptotically normal, a problem posed by Aldous and Fill. The same approach may also be applied in situations where a Markov chain does not appear in the original statement of a problem but is constructed as an auxiliary device. This is illustrated by considering weighted U-statistics. In particular, we are able to unify and generalize some results on normal convergence for degenerate weighted U-statistics and provide rates.
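The antivoter dynamics described above can be simulated directly, which makes the update rule concrete. This is a minimal sketch, not the authors' construction. It assumes an odd cycle as the regular graph, which is 2-regular and non-bipartite. On a bipartite graph the chain reduces to a voter model, which gets absorbed at consensus. The function names `cycle_neighbors`, `antivoter_step` and `simulate_u` are hypothetical. The sketch only records $U_n = \sum_i X_i$ along a long run and says nothing about convergence rates.

```python
import random
from statistics import fmean, pstdev


def cycle_neighbors(n):
    """Neighbour lists of the n-cycle (a 2-regular graph)."""
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]


def antivoter_step(x, nbrs, rng):
    """One transition: pick a vertex i and a neighbour j uniformly, set x_i = -x_j."""
    i = rng.randrange(len(x))
    j = rng.choice(nbrs[i])
    x[i] = -x[j]  # all other coordinates x_k, k != i, are unchanged


def simulate_u(n=101, burn_in=20000, samples=200, gap=500, seed=0):
    """Return values of U_n = sum_i X_i recorded every `gap` steps after a burn-in."""
    if n % 2 == 0:
        # Even cycles are bipartite; the antivoter chain would get absorbed there.
        raise ValueError("use an odd cycle (non-bipartite regular graph)")
    rng = random.Random(seed)
    nbrs = cycle_neighbors(n)
    x = [rng.choice((-1, 1)) for _ in range(n)]
    for _ in range(burn_in):
        antivoter_step(x, nbrs, rng)
    out = []
    for _ in range(samples):
        for _ in range(gap):
            antivoter_step(x, nbrs, rng)
        out.append(sum(x))
    return out


if __name__ == "__main__":
    us = simulate_u()
    print(fmean(us), pstdev(us))  # sample mean and spread of U_n along the run
```

The burn-in is a crude stand-in for stationarity, and successive recorded values are correlated. A histogram of the output is only an informal look at normality, not a check of the paper's bounds.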
On Kendall’s process
 Journal of Multivariate Analysis
, 1996
Abstract

Cited by 30 (5 self)
Let $Z_1, \ldots, Z_n$ be a random sample of size $n \geq 2$ from a $d$-variate continuous distribution function $H$, and let $V_{i,n}$ stand for the proportion of observations $Z_j$, $j \neq i$, such that $Z_j \leq Z_i$ componentwise. The purpose of this paper is to examine the limiting behavior of the empirical distribution function $K_n$ derived from the (dependent) pseudo-observations $V_{i,n}$. This random quantity is a natural nonparametric estimator of $K$, the distribution function of the random variable $V = H(Z)$, whose expectation is an affine transformation of the population version of Kendall’s tau in the case $d = 2$. Since the sample version of $\tau$ is related in the same way to the mean of $K_n$, Genest and Rivest (1993, J. Amer. Statist. Assoc.) suggested that $\sqrt{n}\,[K_n(t) - K(t)]$ be referred to as Kendall’s process. Weak regularity conditions on $K$ and $H$ are found under which this centered process is asymptotically Gaussian, and an explicit expression for its limiting covariance function is given. These conditions, which are fairly easy to check, are seen to apply to large classes of multivariate distributions. © 1996 Academic Press, Inc.
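The pseudo-observations and the link to Kendall's tau can be checked numerically. For bivariate data without ties, every concordant pair is counted exactly once in $\sum_i (n-1) V_{i,n}$. It follows that the sample tau satisfies $\tau_n = 4\bar V_n - 1$, where $\bar V_n$ is the mean of $K_n$. The sketch below assumes continuous data, so ties do not occur. The helper names `pseudo_observations`, `empirical_k` and `kendall_tau_2d` are hypothetical and are not from the paper.

```python
import random
from statistics import fmean


def pseudo_observations(z):
    """V_{i,n}: proportion of Z_j (j != i) with Z_j <= Z_i in every coordinate."""
    n = len(z)
    if n < 2:
        raise ValueError("need a sample of size n >= 2")
    v = []
    for i, zi in enumerate(z):
        below = sum(
            1
            for j, zj in enumerate(z)
            if j != i and all(a <= b for a, b in zip(zj, zi))
        )
        v.append(below / (n - 1))
    return v


def empirical_k(v, t):
    """K_n(t): empirical distribution function of the pseudo-observations at t."""
    return sum(1 for vi in v if vi <= t) / len(v)


def kendall_tau_2d(z):
    """Sample Kendall's tau for bivariate data, assuming no ties."""
    n = len(z)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (z[i][0] - z[j][0]) * (z[i][1] - z[j][1])
            s += 1 if prod > 0 else -1  # concordant pair +1, discordant pair -1
    return 2 * s / (n * (n - 1))


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [rng.gauss(0, 1) for _ in range(200)]
    sample = [(x, x + rng.gauss(0, 1)) for x in xs]
    v = pseudo_observations(sample)
    print(4 * fmean(v) - 1, kendall_tau_2d(sample))  # agree up to rounding
```

Under independence with $d = 2$, $V = H(Z)$ is a product of two independent uniforms, so $K(t) = t - t\log t$. That gives a simple reference curve for comparison with $K_n$.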
On weighted U-statistics for stationary processes
 Ann. Probab.
, 2004
Abstract

Cited by 15 (3 self)
A weighted U-statistic based on a random sample $X_1, \ldots, X_n$ has the form
Fourier Methods for Estimating The Central Subspace and The Central Mean Subspace in Regression
Abstract

Cited by 13 (1 self)
In high-dimensional regression, it is important to estimate the central and central mean subspaces, onto which projections of the predictors preserve sufficient information about the response and the mean response, respectively. Using the Fourier transform, we have derived candidate matrices whose column spaces recover the central and central mean subspaces exhaustively. Under the assumption that the predictors are normally distributed, explicit estimates of the central and central mean subspaces are derived. Bootstrap procedures are used to determine dimensionality and to choose tuning parameters. Simulation results and an application to a real data set are reported. Our methods demonstrate competitive performance compared with SIR, SAVE, and other existing methods. The approach proposed in the paper provides a novel view of sufficient dimension reduction and may lead to more powerful tools in the future.
Monitoring wafer map data from integrated circuit fabrication processes for spatially clustered defects
 Technometrics
, 1997
A sample-and-clean framework for fast and accurate query processing on dirty data
 To appear in: ACM Special Interest Group on Management of Data (SIGMOD)
, 2014
Abstract

Cited by 12 (6 self)
In emerging Big Data scenarios, obtaining timely, high-quality answers to aggregate queries is difficult due to the challenges of processing and cleaning large, dirty data sets. To increase the speed of query processing, there has been a resurgence of interest in sampling-based approximate query processing (SAQP). In its usual formulation, however, SAQP does not address data cleaning at all and, in fact, exacerbates answer quality problems by introducing sampling error. In this paper, we explore an intriguing opportunity: using sampling to actually improve answer quality. We introduce the Sample-and-Clean framework, which applies data cleaning to a relatively small subset of the data and uses the results of the cleaning process to lessen the impact of dirty data on aggregate query answers. We derive confidence intervals as a function of sample size and show how our approach addresses error bias. We evaluate the Sample-and-Clean framework using data from three sources: the TPC-H benchmark with synthetic noise, a subset of the Microsoft academic citation index, and a sensor data set. Our results are consistent with the theoretical confidence intervals and suggest that the Sample-and-Clean framework can produce significant improvements in accuracy compared to query processing without data cleaning, and in speed compared to data cleaning without sampling.
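The core idea of cleaning only a small random sample can be illustrated for an AVG query. This is a minimal sketch under stated assumptions, not the framework's actual estimators. It assumes numeric records given as `(record_id, dirty_value)` pairs and a hypothetical cleaning oracle `clean_fn`. It also assumes normal-approximation confidence intervals without a finite-population correction. Two estimates are computed. The first averages the cleaned sample directly. The second takes the dirty answer over all data and subtracts the estimated mean error. The function name `sample_and_clean_avg` is hypothetical.

```python
import math
import random
from statistics import fmean, stdev


def sample_and_clean_avg(records, clean_fn, k, z=1.96, seed=0):
    """Estimate the AVG of cleaned values by cleaning only k sampled records.

    records:  list of (record_id, dirty_value) pairs.
    clean_fn: hypothetical cleaning oracle returning a record's clean value.
    Returns ((direct, half_width), (corrected, half_width)).
    """
    if not 2 <= k <= len(records):
        raise ValueError("need 2 <= k <= number of records")
    rng = random.Random(seed)
    sample = rng.sample(records, k)  # uniform sample without replacement
    cleaned = [clean_fn(r) for r in sample]

    # Direct estimate: mean of the cleaned sample, with a CLT-based interval.
    direct = fmean(cleaned)
    direct_hw = z * stdev(cleaned) / math.sqrt(k)

    # Correction estimate: dirty answer on all data minus the estimated mean error.
    errors = [r[1] - c for r, c in zip(sample, cleaned)]
    corrected = fmean(r[1] for r in records) - fmean(errors)
    corrected_hw = z * stdev(errors) / math.sqrt(k)
    return (direct, direct_hw), (corrected, corrected_hw)


if __name__ == "__main__":
    # Hypothetical data: true value equals the id; every 10th value is inflated 10x.
    data = [(i, float(i * 10 if i % 10 == 0 else i)) for i in range(1000)]
    oracle = lambda r: float(r[0])  # stand-in for an expensive cleaning step
    print(sample_and_clean_avg(data, oracle, k=50))
```

The correction-based estimate tends to help when only a small fraction of the records are dirty, because then the sampled errors have small variance. The direct estimate does not depend on the dirty values at all.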
Model-free estimation of defect clustering in integrated circuit fabrication
 IEEE Transactions on Semiconductor Manufacturing
, 1997
Depth estimators and tests based on the likelihood principle with applications to regression.
 Journal of Multivariate Analysis
, 2005
Abstract

Cited by 10 (5 self)
We investigate depth notions for general models which are derived via the likelihood principle. We show that the so-called likelihood depth for regression in generalized linear models coincides with the regression depth of