Large Sample Theory for Semiparametric Regression Models with Two-Phase, Outcome-Dependent Sampling
, 2000
Outcome-dependent, two-phase sampling designs can dramatically reduce the costs of observational studies by judicious selection of the most informative subjects for purposes of detailed covariate measurement. Here we derive asymptotic information bounds and the form of the efficient score and influence functions for the semiparametric regression models studied by Lawless, Kalbfleisch, and Wild (1999) under two-phase sampling designs. We relate the efficient score to the least-favorable parametric submodel by use of formal calculations suggested by Newey (1994). We then proceed to show that the maximum likelihood estimators proposed by Lawless, Kalbfleisch, and Wild (1999) for both the parametric and nonparametric parts of the model are asymptotically normal and efficient, and that the efficient influence function for the parametric part agrees with the more general calculations of Robins, Hsieh, and Newey (1995).
Local Maximum Likelihood Estimation and Inference
 J. Royal Statist. Soc. B
, 1998
Local maximum likelihood estimation is a nonparametric counterpart of the widely used parametric maximum likelihood technique. It extends the scope of the parametric maximum likelihood method to a much wider class of parametric spaces. Associated with this nonparametric estimation scheme is the issue of bandwidth selection and bias and variance assessment. This article provides a unified approach to selecting a bandwidth and constructing confidence intervals in local maximum likelihood estimation. The approach is then applied to least squares nonparametric regression and to nonparametric logistic regression. Our experiences in these two settings show that the general idea outlined here is powerful and encouraging.
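As an illustration of the least squares setting mentioned in the abstract, the sketch below fits a kernel-weighted straight line around a target point, which is what local maximum likelihood reduces to under a Gaussian working likelihood. The Gaussian kernel, the function name `local_linear_fit`, and the fixed bandwidth `h` are assumptions made for this example; the article's bandwidth selection and confidence interval procedures are not implemented here.

```python
import math

def local_linear_fit(x, y, x0, h):
    """Kernel-weighted least squares line around x0.

    Returns (intercept, slope), where the intercept is the fitted
    value at x0. Gaussian kernel and fixed bandwidth h are
    illustrative assumptions, not the article's method.
    """
    # Kernel weights: observations near x0 count more.
    w = [math.exp(-0.5 * ((xi - x0) / h) ** 2) for xi in x]
    # Weighted moments of the centred design (xi - x0).
    s0 = sum(w)
    s1 = sum(wi * (xi - x0) for wi, xi in zip(w, x))
    s2 = sum(wi * (xi - x0) ** 2 for wi, xi in zip(w, x))
    t0 = sum(wi * yi for wi, yi in zip(w, y))
    t1 = sum(wi * (xi - x0) * yi for wi, xi, yi in zip(w, x, y))
    # Solve the 2x2 weighted normal equations in closed form.
    det = s0 * s2 - s1 ** 2
    intercept = (s2 * t0 - s1 * t1) / det
    slope = (s0 * t1 - s1 * t0) / det
    return intercept, slope

# Toy check on exactly linear data: the local fit should recover the line.
x = [i / 10 for i in range(21)]
y = [1.0 + 2.0 * xi for xi in x]
fitted_value, fitted_slope = local_linear_fit(x, y, x0=0.7, h=0.3)
```

A smaller `h` makes the fit more local (less bias, more variance), which is the trade-off that bandwidth selection has to resolve.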
Nonlinear methods for multivariate statistical calibration and their use in palaeoecology: a comparison of inverse (k-nearest neighbours, partial least squares and weighted averaging partial least squares) and classical approaches
 Chemometrics and Intelligent Laboratory Systems
, 1995
Randomization Inference with Natural Experiments: An Analysis of Ballot Effects in the 2003 California Recall Election
 Journal of the American Statistical Association 101:888–900
, 2006
Since the 2000 U.S. Presidential election, social scientists have rediscovered a long tradition of research that investigates the effects of ballot format on voting. Using a new dataset collected by the New York Times, we investigate the causal effect of being listed on the first ballot page in the 2003 California gubernatorial recall election. California law mandates a unique randomization procedure of ballot order that, when appropriately modeled, can be used to approximate a classical randomized experiment in a real-world setting. We apply (nonparametric) randomization inference based on Fisher’s exact test, which directly incorporates the actual randomization procedure and yields accurate confidence intervals. Our results suggest that over forty percent of the minor candidates gained more votes when listed on the first page of the ballot, while there is no significant effect for the top two candidates. We also investigate how randomization inference differs from conventional estimators that do not fully incorporate California’s complex treatment assignment mechanism. The results indicate appreciable differences between the two approaches.
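The general idea of randomization inference can be sketched with a small exact test of Fisher's sharp null hypothesis (no effect for any unit). This is a minimal sketch that assumes complete randomization with a fixed number of treated units, which is simpler than California's actual ballot-order procedure; the function name and the toy data are hypothetical.

```python
from itertools import combinations

def randomization_p_value(outcomes, treated):
    """Exact two-sided p-value under the sharp null of no effect.

    Assumes (for illustration only) that every assignment with the
    same number of treated units was equally likely.
    """
    n = len(outcomes)
    m = sum(treated)
    total = sum(outcomes)

    def diff_in_means(treated_idx):
        t_sum = sum(outcomes[i] for i in treated_idx)
        return t_sum / m - (total - t_sum) / (n - m)

    observed = diff_in_means([i for i in range(n) if treated[i]])
    extreme = 0
    n_assignments = 0
    # Under the sharp null the outcomes are fixed, so the statistic can
    # be recomputed for every assignment that could have occurred.
    for idx in combinations(range(n), m):
        n_assignments += 1
        if abs(diff_in_means(idx)) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / n_assignments

# Hypothetical toy data: outcome per unit and a first-page indicator.
vote_share = [2.1, 1.8, 2.5, 0.9, 1.1, 0.7]
first_page = [1, 1, 1, 0, 0, 0]
observed_diff, p_value = randomization_p_value(vote_share, first_page)
```

To mirror a non-uniform assignment mechanism, the loop over `combinations` would be replaced by an enumeration of (or random draws from) the assignments that the real procedure can produce.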
Non-Gaussian conditional linear AR(1) models
 Australian and New Zealand Journal of Statistics
, 2000
We give a general formulation of a non-Gaussian conditional linear AR(1) model subsuming most of the non-Gaussian AR(1) models that have appeared in the literature. We derive some general results giving properties for the stationary process mean, variance and correlation structure, and conditions for stationarity. These results highlight similarities and differences with the Gaussian AR(1) model, and unify many separate results appearing in the literature. Examples illustrate the wide range of properties that can appear under the conditional linear autoregressive assumption. These results are used in analysing three real data sets, illustrating general methods of estimation, model diagnostics and model selection. In particular, we show that the theoretical results can be used to develop diagnostics for deciding if a time series can be modelled by some linear autoregressive model, and for selecting among several candidate models.
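The kind of result the abstract refers to can be illustrated with a short derivation. The symbols $\phi$ and $\lambda$ are notation assumed here, not taken from the abstract, and the sketch assumes a stationary Markov process with finite variance and $\phi \neq 1$; the paper's own conditions may be stated differently.

```latex
\begin{align*}
E[X_t \mid X_{t-1}] &= \phi X_{t-1} + \lambda
  && \text{(conditional linearity)} \\
\mu = E[X_t] &= \phi\,\mu + \lambda
  \;\Longrightarrow\; \mu = \frac{\lambda}{1-\phi}
  && \text{(iterated expectations, stationarity)} \\
\operatorname{Cov}(X_t, X_{t-k}) &= \phi\,\operatorname{Cov}(X_{t-1}, X_{t-k})
  \;\Longrightarrow\; \operatorname{Corr}(X_t, X_{t-k}) = \phi^{k},
  \quad k \ge 0
  && \text{(Markov property)}
\end{align*}
```

The autocorrelation function therefore has the same geometric form as in the Gaussian AR(1) model, even though the marginal and conditional distributions need not be Gaussian.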
Trial-to-trial variability and its effect on time-varying dependence between two neurons
 J. Neurophysiology
, 2005
The joint peri-stimulus time histogram (JPSTH) and cross-correlogram provide a visual representation of correlated activity for a pair of neurons, and the way this activity may increase or decrease over time. In a companion paper (Cai et al. 2004a) we showed how a Bootstrap evaluation of the peaks in the smoothed diagonals of the JPSTH may be used to establish the likely validity of apparent time-varying correlation. As noted by Brody (1999a,b) and Ben-Shaul et al. (2001), trial-to-trial variation can confound correlation and synchrony effects. In this paper we elaborate on that observation, and present a method of estimating the time-dependent trial-to-trial variation in spike trains that may exceed the natural variation displayed by Poisson and non-Poisson point processes. The statistical problem is somewhat subtle because relatively few spikes per trial are available for estimating a firing-rate function that fluctuates over time. The method developed here uses principal components of the trial-to-trial variability in firing rate functions to obtain a small number of parameters (typically two or three) that characterize the deviation of each trial’s firing rate function from the across-trial average firing rate, represented by the
Improved semiparametric time series models of air pollution and mortality
 J. Am. Statist. Ass
, 2004
In 2002, methodological issues around time series analyses of air pollution and health attracted the attention of the scientific community, policy makers, the press, and the diverse stakeholders concerned with air pollution. As the Environmental Protection Agency (EPA) was finalizing its most recent review of epidemiological evidence on particulate matter air pollution (PM), statisticians and epidemiologists found that the S-Plus implementation of Generalized Additive Models (GAM) can overestimate effects of air pollution and understate statistical uncertainty in time series studies of air pollution and health. This discovery delayed the completion of the PM Criteria Document prepared as part of the review of the U.S. National Ambient Air Quality Standard (NAAQS), as the time-series findings were a critical component of the evidence. In addition, it raised concerns about the adequacy of current model formulations and their software implementations. In this paper we provide improvements in semiparametric regression directly relevant to risk estimation in time series studies of air pollution. First, we introduce a closed form estimate of the asymptotically exact covariance matrix of the linear component of a GAM. To ease the implementation of these calculations, we develop the S package gam.exact, an extended version of gam.
Explaining Rare Events in International Relations
, 2000
Some of the most important phenomena in international conflict are coded as "rare events data," binary dependent variables with dozens to thousands of times fewer events, such as wars, coups, etc., than "nonevents". Unfortunately, rare events data are difficult to explain and predict, a problem that seems to have at least two sources. First, and most importantly, the data collection strategies used in international conflict are grossly inefficient. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs, or to collect much more meaningful explanatory variables. Second, logistic regression, and other commonly ...
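The sampling design described in this abstract (keep every event, keep a small random fraction of nonevents) can be sketched as follows. The intercept adjustment shown is the standard offset for logistic regression fitted to outcome-selected samples; it is included here as an assumption about how such a sample would typically be analysed, not as something stated in the truncated abstract. All function names and the toy data are hypothetical.

```python
import math
import random

def sample_events_and_nonevents(records, nonevent_fraction, seed=0):
    """Keep all events (y == 1) and a random fraction of nonevents (y == 0)."""
    rng = random.Random(seed)
    events = [r for r in records if r["y"] == 1]
    nonevents = [r for r in records if r["y"] == 0]
    k = max(1, round(nonevent_fraction * len(nonevents)))
    return events + rng.sample(nonevents, k)

def prior_corrected_intercept(b0_sample, population_event_rate, sample_event_rate):
    """Shift a logit intercept estimated on the outcome-selected sample
    back to the population scale (slopes need no such shift)."""
    tau, ybar = population_event_rate, sample_event_rate
    return b0_sample - math.log(((1 - tau) / tau) * (ybar / (1 - ybar)))

# Hypothetical toy population: 20 events among 10,000 observations.
records = [{"y": 1} for _ in range(20)] + [{"y": 0} for _ in range(9980)]
sample = sample_events_and_nonevents(records, nonevent_fraction=0.01)

tau = 20 / 10000                                   # population event rate
ybar = sum(r["y"] for r in sample) / len(sample)   # event rate in the sample
b0_sample = math.log(ybar / (1 - ybar))            # intercept-only logit fit
b0_population = prior_corrected_intercept(b0_sample, tau, ybar)
```

In this intercept-only toy case the corrected intercept equals the population log-odds of an event, which gives a quick sanity check on the offset.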
Quadruped robot obstacle negotiation via reinforcement learning
 In Proceedings of the IEEE International Conference on Robotics and Automation
, 2006
Identifying Quantitative Trait Loci in Experimental Crosses
, 1997
Identifying quantitative trait loci in experimental crosses, by Karl William Broman. Doctor of Philosophy in Statistics, University of California, Berkeley. Professor Terence P. Speed, Chair. Identifying the genetic loci responsible for variation in traits which are quantitative in nature (such as the yield from an agricultural crop or the number of abdominal bristles on a fruit fly) is a problem of great importance to biologists. The number and effects of such loci help us to understand the biochemical basis of these traits, and of their evolution in populations over time. Moreover, knowledge of these loci may aid in designing selection experiments to improve the traits. We focus on data from a large experimental cross. The usual methods for analyzing such data use multiple tests of hypotheses. We feel the problem is best viewed as one of model selection. After a brief review of the major methods in this area, we discuss the use of model selection to identify quantitative trait loci. Forwa...