Results 1–10 of 30
Tree induction vs. logistic regression: A learning-curve analysis
CeDER Working Paper #IS-01-02, Stern School of Business, 2001
Cited by 71 (16 self)

Abstract:
Tree induction and logistic regression are two standard, off-the-shelf methods for building models for classification. We present a large-scale experimental comparison of logistic regression and tree induction, assessing classification accuracy and the quality of rankings based on class-membership probabilities. We use a learning-curve analysis to examine the relationship of these measures to the size of the training set. The results of the study show several remarkable things. (1) Contrary to prior observations, logistic regression does not generally outperform tree induction. (2) More specifically, and not surprisingly, logistic regression is better for smaller training sets and tree induction for larger data sets. Importantly, this often holds for training sets drawn from the same domain (i.e., the learning curves cross), so conclusions about induction-algorithm superiority on a given domain must be based on an analysis of the learning curves. (3) Contrary to conventional wisdom, tree induction is effective at producing probability-based rankings, although apparently comparatively less so for a given training-set size than at making classifications. Finally, (4) the domains on which tree induction and logistic regression are ultimately preferable can be characterized surprisingly well by a simple measure of the signal-to-noise ratio.
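Finding (2) — learning curves that can cross as the training set grows — can be illustrated with a toy experiment. Everything below (the synthetic data generator, the one-feature gradient-descent logistic fit, and the one-split stump standing in for tree induction) is an invented sketch, not the paper's experimental setup:

```python
import math
import random

def make_data(n, seed=0):
    """Toy binary-classification data: y depends noisily on one feature x."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-3, 3)
        p = 1.0 / (1.0 + math.exp(-2.0 * x))   # true class-membership probability
        data.append((x, 1 if rng.random() < p else 0))
    return data

def fit_logistic(train, steps=2000, lr=0.1):
    """Logistic regression on one feature via plain gradient descent."""
    w, b = 0.0, 0.0
    n = len(train)
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in train:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (p - y) * x
            gb += (p - y)
        w -= lr * gw / n
        b -= lr * gb / n
    return lambda x: 1 if w * x + b > 0 else 0

def fit_stump(train):
    """One-split 'tree': pick the threshold with the best training accuracy."""
    best_t, best_acc, best_sign = 0.0, -1.0, 1
    for t in sorted(x for x, _ in train):
        for sign in (1, -1):
            acc = sum(1 for x, y in train
                      if (1 if sign * (x - t) > 0 else 0) == y) / len(train)
            if acc > best_acc:
                best_t, best_acc, best_sign = t, acc, sign
    return lambda x: 1 if best_sign * (x - best_t) > 0 else 0

def accuracy(model, test):
    return sum(1 for x, y in test if model(x) == y) / len(test)

test_set = make_data(500, seed=1)
for n in (20, 200):                        # two points on each learning curve
    train = make_data(n, seed=2)
    print(n,
          round(accuracy(fit_logistic(train), test_set), 3),
          round(accuracy(fit_stump(train), test_set), 3))
```

The paper's point is that which method wins depends on the training-set size, so per-domain conclusions require comparing the whole curves, not a single n.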
Bayesian model averaging
Statistical Science, 1999
Cited by 49 (1 self)

Abstract:
Standard statistical practice ignores model uncertainty. Data analysts typically select a model from some class of models and then proceed as if the selected model had generated the data. This approach ignores the uncertainty in model selection, leading to overconfident inferences and decisions that are more risky than one thinks they are. Bayesian model averaging (BMA) provides a coherent mechanism for accounting for this model uncertainty. Several methods for implementing BMA have recently emerged. We discuss these methods and present a number of examples. In these examples, BMA provides improved out-of-sample predictive performance. We also provide a catalogue of ...
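One common way to implement the averaging the abstract describes is the BIC approximation, P(M_k | D) ≈ exp(-BIC_k/2) / Σ_j exp(-BIC_j/2). A minimal sketch (the BIC values below are made up for illustration):

```python
import math

def bma_weights(bics):
    """Approximate posterior model probabilities from BIC values:
    P(M_k | data) ∝ exp(-BIC_k / 2), normalized over the model set."""
    m = min(bics)                          # subtract the min for numerical stability
    raw = [math.exp(-(b - m) / 2.0) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]

def bma_predict(predictions, weights):
    """Model-averaged prediction: each model's prediction weighted by its posterior."""
    return sum(p * w for p, w in zip(predictions, weights))

w = bma_weights([100.0, 102.0, 110.0])     # lower BIC -> larger posterior weight
print([round(x, 3) for x in w])
```

Averaging predictions with these weights, rather than conditioning on one selected model, is what yields the improved out-of-sample performance the abstract reports.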
Bayesian Model Averaging in proportional hazard models: Assessing the risk of a stroke
Applied Statistics, 1997
Cited by 35 (5 self)

Abstract:
Evaluating the risk of stroke is important in reducing the incidence of this devastating disease. Here, we apply Bayesian model averaging to variable selection in Cox proportional hazard models in the context of the Cardiovascular Health Study, a comprehensive investigation into the risk factors for stroke. We introduce a technique based on the leaps-and-bounds algorithm which efficiently locates and fits the best models in the very large model space and thereby extends all-subsets regression to Cox models. For each independent variable considered, the method provides the posterior probability that it belongs in the model. This is more directly interpretable than the corresponding P-values, and also more valid in that it takes account of model uncertainty. P-values from models preferred by stepwise methods tend to overstate the evidence for the predictive value of a variable. In our data, Bayesian model averaging predictively outperforms standard model selection methods for assessing ...
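The per-variable posterior probabilities the abstract contrasts with P-values fall out of the model weights directly: a variable's posterior inclusion probability is the total weight of the models that contain it. A sketch over an invented three-model space (the variable names and weights are illustrative, not from the study):

```python
def inclusion_probability(models, weights, var):
    """Posterior probability that `var` belongs in the model: the sum of the
    posterior weights of all candidate models containing it."""
    return sum(w for m, w in zip(models, weights) if var in m)

# Hypothetical model space over stroke risk factors, with posterior weights
models = [{"age", "sbp"}, {"age"}, {"sbp", "smoking"}]
weights = [0.6, 0.3, 0.1]
print(inclusion_probability(models, weights, "age"))      # weight of models 1 and 2
```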
Bayesian Variable Selection for Proportional Hazards Models
1996
Cited by 17 (1 self)

Abstract:
The authors consider the problem of Bayesian variable selection for proportional hazards regression models with right-censored data. They propose a semiparametric approach in which a nonparametric prior is specified for the baseline hazard rate and a fully parametric prior is specified for the regression coefficients. For the baseline hazard, they use a discrete gamma process prior, and for the regression coefficients and the model space, they propose a semi-automatic parametric informative prior specification that focuses on the observables rather than the parameters. To implement the methodology, they propose a Markov chain Monte Carlo method to compute the posterior model probabilities. Examples using simulated and real data are given to demonstrate the methodology.
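A minimal sketch of the MCMC idea — a Metropolis walk over variable-inclusion indicators, estimating posterior model probabilities from visit frequencies. It uses exp(-BIC/2) as a stand-in for the authors' semiparametric posterior, and the two-variable model space with its BIC values is invented:

```python
import math
import random

def log_score(gamma, bic):
    """Un-normalized log posterior of a model (here: -BIC/2, a toy stand-in)."""
    return -bic[tuple(gamma)] / 2.0

def mcmc_model_probs(bic, p, iters=20000, seed=0):
    """Metropolis sampler over inclusion indicators gamma in {0,1}^p.
    Each step proposes flipping one coordinate; visit frequencies estimate
    the posterior model probabilities."""
    rng = random.Random(seed)
    gamma = [0] * p
    counts = {}
    for _ in range(iters):
        j = rng.randrange(p)
        prop = list(gamma)
        prop[j] = 1 - prop[j]
        # Accept with probability min(1, posterior ratio)
        if math.log(rng.random()) < log_score(prop, bic) - log_score(gamma, bic):
            gamma = prop
        key = tuple(gamma)
        counts[key] = counts.get(key, 0) + 1
    return {k: v / iters for k, v in counts.items()}

# Toy 2-variable model space with made-up BIC values
bic = {(0, 0): 110.0, (1, 0): 100.0, (0, 1): 108.0, (1, 1): 103.0}
probs = mcmc_model_probs(bic, 2)
print(max(probs, key=probs.get))           # the lowest-BIC model dominates
```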
Likelihood-based Data Squashing: A Modeling Approach to Instance Construction
2002
Cited by 17 (1 self)

Abstract:
Squashing is a lossy data compression technique that preserves statistical information. Specifically, squashing compresses a massive dataset to a much smaller one so that outputs from statistical analyses carried out on the smaller (squashed) dataset reproduce outputs from the same statistical analyses carried out on the original dataset. Likelihood-based data squashing (LDS) differs from a previously published squashing algorithm insofar as it uses a statistical model to squash the data. The results show that LDS provides excellent squashing performance even when the target statistical analysis departs from the model used to squash the data.
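A toy version of the squashing idea — not the LDS algorithm itself — replaces bins of points with weighted pseudo-points chosen so that weighted analyses on the small set approximate analyses on the full set:

```python
def squash(points, n_bins=10):
    """Toy squashing: partition 1-D data into bins and replace each bin with a
    single weighted pseudo-point at the bin mean."""
    lo, hi = min(points), max(points)
    width = (hi - lo) / n_bins or 1.0      # guard against all-equal data
    bins = {}
    for x in points:
        i = min(int((x - lo) / width), n_bins - 1)
        bins.setdefault(i, []).append(x)
    return [(sum(xs) / len(xs), len(xs)) for xs in bins.values()]

def weighted_mean(pseudo):
    """A weighted analysis on the squashed set: here, the sample mean."""
    total_w = sum(w for _, w in pseudo)
    return sum(x * w for x, w in pseudo) / total_w

data = [0.1 * i for i in range(1000)]      # 1000 original points
pseudo = squash(data)                      # at most 10 weighted pseudo-points
print(len(pseudo), round(weighted_mean(pseudo), 6))
```

For this statistic the compression is exact; LDS goes further by choosing pseudo-points to match the likelihood under a statistical model, so that model-based analyses are also reproduced.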
Could a CAMELS Downgrade Model Improve Off-Site Surveillance?
Federal Reserve Bank of St. Louis Economic Review, 2002
Cited by 8 (2 self)

Abstract:
The cornerstone of bank supervision is a regular schedule of thorough, on-site examinations. Under rules set forth in the Federal Deposit Insurance Corporation Improvement Act of 1991 (FDICIA), most U.S. banks must submit to a full-scope federal or state examination every 12 months; small, well-capitalized banks must be examined every 18 months. These examinations focus on six components of bank safety and soundness: capital protection (C), asset quality (A), management competence (M), earnings strength (E), liquidity risk exposure (L), and market risk sensitivity (S). At the close of each exam, examiners award a grade of one (best) through five (worst) to each component. Supervisors then draw on these six component ratings to assign a composite CAMELS rating, which is also expressed on a scale of one through five. (See the insert for a detailed description of the composite ratings.) In general, banks with composite ratings of one or two are considered safe and sound, whereas banks with ratings of three, four, or five are considered unsatisfactory. As of March 31, 2000, nearly 94 percent of U.S. banks posted composite CAMELS ratings of one or two. Bank supervisors support on-site examinations with off-site surveillance. Off-site surveillance uses quarterly financial data and anecdotal evidence to schedule and plan on-site exams. Although on-site examination is the most effective tool for spotting safety-and-soundness problems, it is costly and ... (R. Alton Gilbert is a vice president and banking advisor, Andrew P. Meyer is an economist, and Mark D. Vaughan is a supervisory policy officer and economist at the Federal Reserve Bank of St. Louis.)
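The rating scheme described above can be sketched in code. The composite formula here (a rounded mean of the six component grades) is purely hypothetical — real composite CAMELS ratings reflect examiner judgment, not a fixed formula — but the one-through-five scale and the satisfactory cutoff follow the abstract:

```python
def composite_rating(c, a, m, e, l, s):
    """Illustrative composite: round the mean of the six component grades,
    each on a scale of 1 (best) to 5 (worst). Invented formula."""
    grades = (c, a, m, e, l, s)
    assert all(1 <= g <= 5 for g in grades)
    return round(sum(grades) / 6)

def is_satisfactory(composite):
    """Composites of 1 or 2 are considered safe and sound; 3-5 unsatisfactory."""
    return composite <= 2

print(composite_rating(2, 1, 2, 2, 1, 2), is_satisfactory(2))
```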
Bayesian Analysis of Ordered Categorical Data from Industrial Experiments
Technometrics, 1995
Cited by 5 (1 self)

Abstract:
Data from industrial experiments often involve an ordered categorical response, such as a qualitative rating. ANOVA-based analyses may be inappropriate for such data, suggesting the use of Generalized Linear Models (GLMs). When the data are observed from a fractionated experiment, likelihood-based GLM estimates may be infinite, especially when factors have large effects. These difficulties are overcome with a Bayesian GLM, which is implemented via the Gibbs sampling algorithm. Techniques for modeling data and for subsequently using the identified model to optimize the process are outlined. An important advantage in the optimization stage is that uncertainty in the parameter estimates is accounted for in the model. For robust design experiments, the Bayesian approach easily incorporates the variability of the noise factors using the response modeling approach (Welch, Yu, Kang and Sacks 1990; Shoemaker, Tsui and Wu 1991). This approach and its techniques are used to analyze two ...
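Ordered categorical responses are typically handled with cumulative-link GLMs. A minimal cumulative-logit sketch (the cutpoints and linear predictor below are invented) shows how the category probabilities arise:

```python
import math

def category_probs(eta, cutpoints):
    """Cumulative-logit (proportional odds) model: P(Y <= k) = logistic(c_k - eta).
    Returns the probability of each ordered category given linear predictor eta;
    cutpoints must be increasing."""
    def logistic(z):
        return 1.0 / (1.0 + math.exp(-z))
    cum = [logistic(c - eta) for c in cutpoints] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

# Hypothetical cutpoints for a 4-level quality rating
p = category_probs(eta=0.5, cutpoints=[-1.0, 0.0, 1.5])
print([round(x, 3) for x in p])
```

The Bayesian version the abstract describes puts priors on eta's coefficients and the cutpoints and samples them via Gibbs, which keeps estimates finite even when the data would separate perfectly.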
Syntactic probabilities affect pronunciation variation in spontaneous speech
Language and Cognition 1(2):147–165, 2009
Cited by 3 (1 self)

Abstract:
Speakers frequently have a choice among multiple ways of expressing one and the same thought. When choosing between syntactic constructions for expressing a given meaning, speakers are sensitive to probabilistic tendencies for syntactic, semantic or contextual properties of an utterance to favor one construction or another. Taken together, such tendencies may align to make one construction overwhelmingly more probable, marginally more probable, or no more probable than another. Here, we present evidence that acoustic features of spontaneous speech reflect these probabilities: when speakers choose a less probable construction, they are more likely to be disfluent, and their fluent words are likely to have a relatively longer duration. Conversely, words in more probable constructions are shorter and spoken more fluently. Our findings suggest that the differing probabilities of a syntactic construction in context are not epiphenomenal, but reflect a part of speakers' knowledge of their language.
Penalized Cox models and Frailty
1998
Cited by 1 (0 self)

Abstract:
A very general mechanism for penalized regression has been added to the coxph ...
SAS/STAT® 12.3 User's Guide: The LOGISTIC Procedure
Abstract:
For a Web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication. The scanning, uploading, and distribution of this book via the Internet or any other means without the permission of the publisher is illegal and punishable by law. Please purchase only authorized electronic editions and do not participate in or encourage electronic piracy of copyrighted materials. Your support of others' rights is appreciated. U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation by the U.S. ...