Generalized Additive Models, 1995
Cited by 968 (32 self)
This article describes flexible statistical methods that may be used to identify and characterize nonlinear regression effects. These methods are called "generalized additive models". For example, a commonly used statistical model in medical research is the logistic regression model for binary data. Here we relate the mean of the binary response μ = P(y = 1) to the predictors via a linear regression model and the logit link function: log
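The logit link named in this abstract can be illustrated with a minimal sketch. It shows only the textbook relationship between a linear predictor and the mean of a binary response; the function names, coefficients, and predictor values are hypothetical and are not taken from the article.

```python
import math

def logit(mu):
    """Logit link: the log-odds of a probability mu in (0, 1)."""
    return math.log(mu / (1.0 - mu))

def inv_logit(eta):
    """Inverse link: map a linear predictor eta back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def linear_predictor(alpha, betas, xs):
    """alpha + sum_j beta_j * x_j, the linear part of logistic regression."""
    return alpha + sum(b * x for b, x in zip(betas, xs))

# Hypothetical intercept, coefficients, and predictor values.
alpha = -1.0
betas = [0.5, 2.0]
xs = [2.0, 0.25]

eta = linear_predictor(alpha, betas, xs)  # -1.0 + 1.0 + 0.5
mu = inv_logit(eta)                       # modelled P(y = 1)
```

A generalized additive model keeps the same link but replaces each linear term with a smooth function of the corresponding predictor.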
A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-three Old and New Classification Algorithms, 2000
Cited by 134 (6 self)
Twenty-two decision tree, nine statistical, and two neural network algorithms are compared on thirty-two datasets in terms of classification accuracy, training time, and (in the case of trees) number of leaves. Classification accuracy is measured by mean error rate and mean rank of error rate. Both criteria place a statistical, spline-based algorithm called Polyclass at the top, although it is not statistically significantly different from twenty other algorithms. Another statistical algorithm, logistic regression, is second with respect to the two accuracy criteria. The most accurate decision tree algorithm is Quest with linear splits, which ranks fourth and fifth, respectively. Although spline-based statistical algorithms tend to have good accuracy, they also require relatively long training times. Polyclass, for example, is third last in terms of median training time. It often requires hours of training compared to seconds for other algorithms. The Quest and logistic regression algor...
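The two accuracy criteria in this abstract, mean error rate and mean rank of error rate, can be sketched as follows. This is an illustrative reading of the criteria, not the authors' code: the algorithm names and error rates are made up, and giving tied algorithms the average of their ranks is an assumption.

```python
def mean_error_rates(errors):
    """errors maps algorithm name -> list of error rates, one per dataset."""
    return {algo: sum(e) / len(e) for algo, e in errors.items()}

def mean_ranks(errors):
    """Rank algorithms within each dataset (1 = lowest error), then average.

    Tied algorithms share the average of the ranks they occupy (assumption).
    """
    algos = list(errors)
    n_datasets = len(next(iter(errors.values())))
    totals = {algo: 0.0 for algo in algos}
    for d in range(n_datasets):
        ordered = sorted((errors[algo][d], algo) for algo in algos)
        i = 0
        while i < len(ordered):
            j = i
            # Extend j over the run of algorithms tied with position i.
            while j + 1 < len(ordered) and ordered[j + 1][0] == ordered[i][0]:
                j += 1
            avg_rank = (i + j) / 2 + 1
            for k in range(i, j + 1):
                totals[ordered[k][1]] += avg_rank
            i = j + 1
    return {algo: totals[algo] / n_datasets for algo in algos}

# Hypothetical error rates for three algorithms on three datasets.
errors = {
    "A": [0.10, 0.20, 0.30],
    "B": [0.20, 0.20, 0.10],
    "C": [0.30, 0.10, 0.20],
}
means = mean_error_rates(errors)
ranks = mean_ranks(errors)
```

In this toy data A and C have the same mean error rate but different mean ranks, which shows why the two criteria can order algorithms differently.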
Polynomial Splines and Their Tensor Products in Extended Linear Modeling - Ann. Statist, 1997
Cited by 121 (14 self)
ANOVA type models are considered for a regression function or for the logarithm of a probability function, conditional probability function, density function, conditional density function, hazard function, conditional hazard function, or spectral density function. Polynomial splines are used to model the main effects, and their tensor products are used to model any interaction components that are included. In the special context of survival analysis, the baseline hazard function is modeled and nonproportionality is allowed. In general, the theory involves the L2 rate of convergence for the fitted model and its components. The methodology involves least squares and maximum likelihood estimation, stepwise addition of basis functions using Rao statistics, stepwise deletion using Wald statistics, and model selection using BIC, cross-validation or an independent test set. Publicly available software, written in C and interfaced to S/S-PLUS, is used to apply this methodology to...
Classification trees with unbiased multiway splits - Journal of the American Statistical Association, 2001
Cited by 35 (6 self)
Two univariate split methods and one linear combination split method are proposed for the construction of classification trees with multiway splits. Examples are given where the trees are more compact and hence easier to interpret than binary trees. A major strength of the univariate split methods is that they have negligible bias in variable selection, both when the variables differ in the number of splits they offer and when they differ in number of missing values. This is an advantage because inferences from the tree structures can be adversely affected by selection bias. The new methods are shown to be highly competitive in terms of computational speed and classification accuracy of future observations. Key words and phrases: Decision tree, linear discriminant analysis, missing value, selection bias.
Statistical Themes and Lessons for Data Mining, 1997
Cited by 30 (3 self)
Data mining is on the interface of Computer Science and Statistics, utilizing advances in both disciplines to make progress in extracting information from large databases. It is an emerging field that has attracted much attention in a very short period of time. This article highlights some statistical themes and lessons that are directly relevant to data mining and attempts to identify opportunities where close cooperation between the statistical and computational communities might reasonably provide synergy for further progress in data analysis.
Projection Estimation In Multiple Regression With Application To Functional Anova Models - Ann. Statist, 1996
Cited by 23 (5 self)
A general theory on rates of convergence in multiple regression is developed, where the regression function is modeled as a member of an arbitrary linear function space (called a model space), which may be finite- or infinite-dimensional. A least squares estimate restricted to some approximating space, which is in fact a projection, is employed. The error in estimation is decomposed into three parts: variance component, estimation bias, and approximation error. The contributions to the integrated squared error from the first two parts are bounded in probability by N_n/n, where N_n is the dimension of the approximating space, while the contribution from the third part is governed by the approximation power of the approximating space. When the regression function is not in the model space, the projection estimate converges to its best approximation. The theory is applied to a functional ANOVA model, where the multivariate regression function is modeled as a specified sum of a constant te...
Bayesian Treed Models - Machine Learning, 2000
Cited by 17 (1 self)
When simple parametric models such as linear regression fail to adequately approximate a function across an entire set of data, an alternative may be to consider a partition of the data, and then use a separate simple model within each subset of the partition. Such an alternative is provided by a treed model which uses a binary tree to identify such a partition. However, treed models go further than conventional trees (e.g., CART, C4.5) by fitting models rather than simple means or proportions across the partition. In this paper, we propose a Bayesian approach for finding and fitting parametric treed models, in particular focusing on Bayesian treed regression. The potential of this approach is illustrated by a cross-validation comparison of predictive performance with neural nets, MARS, and conventional trees on simulated and real data sets. Keywords: binary trees, Markov chain Monte Carlo, model selection, stochastic search. 1 Hugh Chipman is Associate Professor of Statistics...
Local Likelihood and Local Partial Likelihood in Hazard Regression - Ann. Statist, 1996
Cited by 13 (2 self)
In survival analysis, the relationship between a survival time and a covariate is conveniently modeled with the proportional hazards regression model. This model usually assumes that the covariate has a log-linear effect on the hazard function. In this paper we consider the proportional hazards regression model with a nonparametric risk effect. We discuss estimation of the risk function and its derivatives in two cases: when the baseline hazard function is parametrized and when it is not parametrized. In the case of a parametric baseline hazard function, inference is based on a local version of the likelihood function, while in the case of a nonparametric baseline hazard, we use a local version of the partial likelihood. This results in maximum local likelihood estimators and maximum local partial likelihood estimators respectively. We establish the asymptotic normality of the estimators. It turns out that both methods have the same asymptotic bias and variance in a common situation, eve...
Bootstrap Confidence Intervals for Smoothing Splines and their Comparison to Bayesian `Confidence Intervals' - J. Statist. Comput. Simulation, 1994
Cited by 11 (2 self)
We construct bootstrap confidence intervals for smoothing spline and smoothing spline ANOVA estimates based on Gaussian data, and penalized likelihood smoothing spline estimates based on data from exponential families. Several variations of bootstrap confidence intervals are considered and compared. We find that the commonly used bootstrap percentile intervals are inferior to the T intervals and to intervals based on bootstrap estimation of mean squared errors. The best variations of the bootstrap confidence intervals behave similarly to the well known Bayesian confidence intervals. These bootstrap confidence intervals have an average coverage probability across the function being estimated, as opposed to a pointwise property. Keywords: BAYESIAN CONFIDENCE INTERVALS, BOOTSTRAP CONFIDENCE INTERVALS, PENALIZED LOG LIKELIHOOD ESTIMATES, SMOOTHING SPLINES, SMOOTHING SPLINE ANOVA'S. 1 Introduction Smoothing splines and smoothing spline ANOVAs (SS ANOVAs) have been used successfully in a bro...
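For readers unfamiliar with the percentile interval that this abstract uses as its baseline, a minimal sketch follows. It applies the generic nonparametric percentile bootstrap to a sample mean, which is a deliberately simpler setting than the smoothing spline estimates studied in the paper; the function name, data, and defaults are hypothetical.

```python
import random
import statistics

def percentile_bootstrap_ci(data, stat=statistics.mean, n_boot=2000,
                            level=0.95, seed=0):
    """Percentile bootstrap interval for stat(data).

    Resamples the data with replacement n_boot times, recomputes the
    statistic each time, and reads the interval off the sorted replicates.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    reps = sorted(
        stat([rng.choice(data) for _ in range(n)]) for _ in range(n_boot)
    )
    tail = (1.0 - level) / 2.0
    lo_idx = int(tail * n_boot)
    hi_idx = min(n_boot - 1, int((1.0 - tail) * n_boot))
    return reps[lo_idx], reps[hi_idx]

# Hypothetical sample.
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 3.1]
lower, upper = percentile_bootstrap_ci(data)
```

The abstract reports that intervals of this percentile type were inferior to the T intervals in the authors' comparison, so the sketch shows the baseline rather than the recommended variant.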

