Results 1-10 of 187
Dependency networks for inference, collaborative filtering, and data visualization
 Journal of Machine Learning Research
Cited by 156 (10 self)
We describe a graphical model for probabilistic relationships, an alternative to the Bayesian network, called a dependency network. The graph of a dependency network, unlike a Bayesian network, is potentially cyclic. The probability component of a dependency network, like a Bayesian network, is a set of conditional distributions, one for each node given its parents. We identify several basic properties of this representation and describe a computationally efficient procedure for learning the graph and probability components from data. We describe the application of this representation to probabilistic inference, collaborative filtering (the task of predicting preferences), and the visualization of acausal predictive relationships.
Smoothing Spline ANOVA for Exponential Families, with Application to the Wisconsin Epidemiological Study of Diabetic Retinopathy
 ANN. STATIST
, 1995
Cited by 83 (44 self)
Let $y_i$, $i = 1, \dots, n$, be independent observations with the density of $y_i$ of the form $h(y_i, f_i) = \exp[y_i f_i - b(f_i) + c(y_i)]$, where $b$ and $c$ are given functions and $b$ is twice continuously differentiable and bounded away from 0. Let $f_i = f(t(i))$, where $t = (t_1, \dots, t_d) \in T^{(1)} \otimes \cdots \otimes T^{(d)} = T$, the $T^{(\alpha)}$ are measurable spaces of rather general form, and $f$ is an unknown function on $T$ with some assumed "smoothness" properties. Given $\{y_i, t(i), i = 1, \dots, n\}$, it is desired to estimate $f(t)$ for $t$ in some region of interest contained in $T$. We develop the fitting of smoothing spline ANOVA models to this data of the form $f(t) = C + \sum_{\alpha} f_{\alpha}(t_{\alpha}) + \sum_{\alpha < \beta} f_{\alpha\beta}(t_{\alpha}, t_{\beta}) + \cdots$. The components of the decomposition satisfy side conditions which generalize the usual side conditions for parametric ANOVA. The estimate of $f$ is obtained as the minimizer...
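As a worked instance of the density form h(y, f) = exp[y f - b(f) + c(y)] stated in the abstract above, the Bernoulli case can be written out as follows. The choice of the Bernoulli distribution and the symbol p (the success probability) are illustrative assumptions, not taken from the abstract.

```latex
% Bernoulli data with success probability p and logit f = \log\frac{p}{1-p}:
\begin{aligned}
p^{y}(1-p)^{1-y}
  &= \exp\Bigl[\, y \log\frac{p}{1-p} + \log(1-p) \Bigr] \\
  &= \exp\bigl[\, y f - \log(1 + e^{f}) \bigr],
  \qquad y \in \{0, 1\},
\end{aligned}
% so that b(f) = \log(1 + e^{f}) and c(y) = 0.
```

Here f is the logit of p, so estimating f(t) amounts to estimating the log odds as a function of t.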
Generalized Partially Linear Single-Index Models
 Journal of the American Statistical Association
, 1998
Cited by 63 (24 self)
The typical generalized linear model for a regression of a response $Y$ on predictors $(X, Z)$ has conditional mean function based upon a linear combination of $(X, Z)$. We generalize these models to have a nonparametric component, replacing the linear combination $\alpha_0^T X + \beta_0^T Z$ by $\eta_0(\alpha_0^T X) + \beta_0^T Z$, where $\eta_0(\cdot)$ is an unknown function. We call these generalized partially linear single-index models (GPLSIM). The models include the "single-index" models, which have $\beta_0 = 0$. Using local linear methods, estimates of the unknown parameters $(\alpha_0, \beta_0)$ and the unknown function $\eta_0(\cdot)$ are proposed, and their asymptotic distributions obtained. Examples illustrate the models and the proposed estimation methodology.
Local polynomial kernel regression for generalized linear models and quasi-likelihood functions
 Journal of the American Statistical Association, 90
, 1995
Cited by 57 (7 self)
Generalized linear models were introduced as a means of extending the techniques of ordinary parametric regression to several commonly used regression models arising from non-normal likelihoods. Typically these models have a variance that depends on the mean function. However, in many cases the likelihood is unknown, but the relationship between mean and variance can be specified. This has led to the consideration of quasi-likelihood methods, where the conditional log-likelihood is replaced by a quasi-likelihood function. In this article we investigate the extension of the nonparametric regression technique of local polynomial fitting with a kernel weight to these more general contexts. In the ordinary regression case local polynomial fitting has been seen to possess several appealing features in terms of intuitive and mathematical simplicity. One noteworthy feature is the better performance near the boundaries compared to the traditional kernel regression estimators. These properties are shown to carry over to the generalized linear model and quasi-likelihood model. The end result is a class of kernel-type estimators for smoothing in quasi-likelihood models. These estimators can be viewed as a straightforward generalization of the usual parametric estimators. In addition, their simple asymptotic distributions allow for simple interpretation
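Local polynomial fitting with a kernel weight, in the ordinary regression case that the abstract above takes as its starting point, can be sketched as a kernel-weighted least-squares line fitted around each target point. This is a minimal sketch and not the quasi-likelihood estimator of the paper; the Gaussian kernel, the bandwidth argument `h`, and the function name `local_linear_fit` are assumptions made for illustration.

```python
import math


def local_linear_fit(x, y, x0, h):
    """Estimate the regression function at x0 with a kernel-weighted
    least-squares line (local polynomial fitting of degree 1)."""
    # Gaussian kernel weights centred at x0 (the kernel choice is an assumption).
    w = [math.exp(-0.5 * ((xi - x0) / h) ** 2) for xi in x]

    # Weighted moments of the centred design (xi - x0).
    s0 = sum(w)
    s1 = sum(wi * (xi - x0) for wi, xi in zip(w, x))
    s2 = sum(wi * (xi - x0) ** 2 for wi, xi in zip(w, x))
    t0 = sum(wi * yi for wi, yi in zip(w, y))
    t1 = sum(wi * (xi - x0) * yi for wi, xi, yi in zip(w, x, y))

    # Solve the 2x2 weighted normal equations; the intercept is the fit at x0.
    det = s0 * s2 - s1 * s1
    return (s2 * t0 - s1 * t1) / det


# Usage: data that lie exactly on a line are reproduced (up to floating-point
# error) even at the boundary point x0 = 0.
xs = [i / 10 for i in range(11)]
ys = [2.0 * v + 1.0 for v in xs]
boundary_fit = local_linear_fit(xs, ys, 0.0, 0.3)
```

Because the local line absorbs the slope, the fit at a boundary point does not suffer the first-order bias of a locally constant kernel average, which is the boundary behaviour the abstract refers to.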
A generalized approximate cross validation for smoothing splines with non-Gaussian data
 Statistica Sinica 6
, 1996
Cited by 53 (23 self)
In this paper, we propose a Generalized Approximate Cross Validation (GACV) function for estimating the smoothing parameter in the penalized log likelihood regression problem with non-Gaussian data. This GACV is obtained by, first, obtaining an approximation to the leaving-out-one function based on the negative log likelihood, and then, in a step reminiscent of that used to get from leaving-out-one cross validation to GCV in the Gaussian case, we replace diagonal elements of certain matrices by 1/n times the trace. A numerical simulation with Bernoulli data is used to compare the smoothing parameter λ chosen by this approximation procedure with the λ chosen from the two most often used algorithms based on the generalized cross validation procedure (O’Sullivan et al. (1986), Gu (1990, 1992)). In the examples here, the GACV estimate produces a better fit of the truth in terms of minimizing the Kullback-Leibler distance. Figures suggest that the GACV curve may be an approximately unbiased estimate of the Kullback-Leibler distance in the Bernoulli data case; however, a theoretical proof is yet to be found.
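The Gaussian-case step that the abstract above alludes to, replacing each diagonal element of the smoother matrix by 1/n times its trace, can be illustrated for an ordinary linear smoother. This is a minimal sketch of that step only, not of the GACV itself. The Nadaraya-Watson kernel smoother stands in for the smoothing spline, and the Gaussian kernel, the bandwidth `h`, and the names `smoother_matrix` and `loo_cv_and_gcv` are assumptions chosen for illustration.

```python
import math


def smoother_matrix(x, h):
    """Rows of the Nadaraya-Watson hat matrix H, so that fitted = H y."""
    rows = []
    for xi in x:
        k = [math.exp(-0.5 * ((xi - xj) / h) ** 2) for xj in x]
        total = sum(k)
        rows.append([kj / total for kj in k])
    return rows


def loo_cv_and_gcv(x, y, h):
    """Return (leave-one-out CV score, GCV score) for bandwidth h."""
    n = len(x)
    H = smoother_matrix(x, h)
    fitted = [sum(hij * yj for hij, yj in zip(row, y)) for row in H]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    diag = [H[i][i] for i in range(n)]

    # Leave-one-out shortcut: each residual is inflated by 1 / (1 - H_ii).
    loo = sum((r / (1.0 - d)) ** 2 for r, d in zip(resid, diag)) / n

    # GCV: every H_ii is replaced by the average diagonal, trace(H) / n.
    mean_diag = sum(diag) / n
    gcv = sum(r ** 2 for r in resid) / n / (1.0 - mean_diag) ** 2
    return loo, gcv
```

For this smoother the leave-one-out shortcut is exact, so the two scores differ only through the spread of the diagonal elements around their mean.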
On Feature Selection: Learning with Exponentially many Irrelevant Features as Training Examples
 Proceedings of the Fifteenth International Conference on Machine Learning
, 1998
Cited by 37 (4 self)
We consider feature selection in the "wrapper" model of feature selection. This typically involves an NP-hard optimization problem that is approximated by heuristic search for a "good" feature subset. First considering the idealization where this optimization is performed exactly, we give a rigorous bound for generalization error under feature selection. The search heuristics typically used are then immediately seen as trying to achieve the error given in our bounds, and succeeding to the extent that they succeed in solving the optimization. The bound suggests that, in the presence of many "irrelevant" features, the main source of error in wrapper model feature selection is from "overfitting" hold-out or cross-validation data. This motivates a new algorithm that, again under the idealization of performing search exactly, has sample complexity (and error) that grows logarithmically in the number of "irrelevant" features, which means it can tolerate having a number of "irrelevant" f...
Modelling spatially correlated data via mixtures: a Bayesian approach
 Journal of the Royal Statistical Society, Series B
, 2002
Cited by 31 (2 self)
This paper develops mixture models for spatially indexed data. We confine attention to the case of finite, typically irregular, patterns of points or regions with prescribed spatial relationships, and to problems where it is only the weights in the mixture that vary from one location to another. Our specific focus is on Poisson distributed data, and applications in disease mapping. We work in a Bayesian framework, with the Poisson parameters drawn from gamma priors, and an unknown number of components. We propose two alternative models for spatially dependent weights, based on transformations of autoregressive Gaussian processes: in one (the Logistic normal model), the mixture component labels are exchangeable; in the other (the Grouped continuous model), they are ordered. Reversible jump Markov chain Monte Carlo algorithms for posterior inference are developed. Finally, the performance of both of these formulations is examined on synthetic data and real data on mortality from rare disease.
Do Get-Out-the-Vote Calls Reduce Turnout? The Importance of Statistical Methods for Field Experiments
 American Political Science Review
, 2005
Cited by 28 (13 self)
In their landmark study of a field experiment, Gerber and Green (2000) found that get-out-the-vote calls reduce turnout by five percentage points. In this article, I introduce statistical methods that can uncover discrepancies between experimental design and actual implementation. The application of this methodology shows that Gerber and Green’s negative finding is caused by inadvertent deviations from their stated experimental protocol. The initial discovery led to revisions of the original data by the authors and retraction of the numerical results in their article. Analysis of their revised data, however, reveals new systematic patterns of implementation errors. Indeed, treatment assignments of the revised data appear to be even less randomized than before their corrections. To adjust for these problems, I employ a more appropriate statistical method and demonstrate that telephone canvassing increases turnout by five percentage points. This article demonstrates how statistical methods can find and correct complications of field experiments. Voter mobilization campaigns are a central part of democratic elections. In the 2000 general election, for example, the Democratic and Republican parties spent an estimated $100 million on
Large Sample Theory for Semiparametric Regression Models with Two-Phase, Outcome Dependent Sampling
, 2000
Cited by 20 (9 self)
Outcome-dependent, two-phase sampling designs can dramatically reduce the costs of observational studies by judicious selection of the most informative subjects for purposes of detailed covariate measurement. Here we derive asymptotic information bounds and the form of the efficient score and influence functions for the semiparametric regression models studied by Lawless, Kalbfleisch, and Wild (1999) under two-phase sampling designs. We relate the efficient score to the least-favorable parametric submodel by use of formal calculations suggested by Newey (1994). We then proceed to show that the maximum likelihood estimators proposed by Lawless, Kalbfleisch, and Wild (1999) for both the parametric and nonparametric parts of the model are asymptotically normal and efficient, and that the efficient influence function for the parametric part agrees with the more general calculations of Robins, Hsieh, and Newey (1995).
Local Maximum Likelihood Estimation and Inference
 J. Royal Statist. Soc. B
, 1998
Cited by 16 (4 self)
Local maximum likelihood estimation is a nonparametric counterpart of the widely used parametric maximum likelihood technique. It extends the scope of the parametric maximum likelihood method to a much wider class of parametric spaces. Associated with this nonparametric estimation scheme is the issue of bandwidth selection and bias and variance assessment. This article provides a unified approach to selecting a bandwidth and constructing confidence intervals in local maximum likelihood estimation. The approach is then applied to least-squares nonparametric regression and to nonparametric logistic regression. Our experiences in these two settings show that the general idea outlined here is powerful and encouraging.