Results 1 - 2 of 2
Logistic Regression for Data Mining and High-Dimensional Classification
, 2004
Abstract

Cited by 10 (1 self)
The focus of this thesis is fast and robust adaptations of logistic regression (LR) for data mining and high-dimensional classification problems. LR is well understood and widely used in the statistics, machine learning, and data analysis communities. Its benefits include a firm statistical foundation and a probabilistic model useful for "explaining" the data. There is a perception that LR is slow, unstable, and unsuitable for large learning or classification tasks. Through fast approximate numerical methods, regularization to avoid numerical instability, and an efficient implementation, we will show that LR can outperform modern algorithms like Support Vector Machines (SVM) on a variety of learning tasks. Our novel implementation, which uses a modified iteratively reweighted least squares estimation procedure, can compute model parameters for sparse binary datasets with hundreds of thousands of rows and attributes, and millions or tens of millions of nonzero elements, in just a few seconds. Our implementation also handles real-valued dense datasets of similar size.
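For orientation, the textbook form of iteratively reweighted least squares (IRLS) for logistic regression with a ridge penalty can be sketched as below. This is a minimal, dense, pure-Python illustration and not the thesis's modified procedure or its sparse implementation; the names `irls_logistic`, `solve`, `sigmoid`, and the `ridge` parameter are assumptions made for this example.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def irls_logistic(X, y, ridge=1.0, iters=25, tol=1e-8):
    """Fit weights by Newton/IRLS steps on the ridge-penalized log-likelihood.

    X is a list of feature rows (include a constant 1.0 column for an
    intercept), y is a list of 0/1 labels. The ridge term (assumed > 0)
    keeps the linear system well conditioned, even on separable data.
    """
    d = len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        # Gradient g = X^T (y - p) - ridge * w
        # Curvature H = X^T diag(p * (1 - p)) X + ridge * I
        g = [-ridge * wj for wj in w]
        H = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
        for xi, yi in zip(X, y):
            p = sigmoid(sum(a * b for a, b in zip(xi, w)))
            s = p * (1.0 - p)
            for i in range(d):
                g[i] += xi[i] * (yi - p)
                for j in range(d):
                    H[i][j] += s * xi[i] * xi[j]
        step = solve(H, g)
        w = [wj + sj for wj, sj in zip(w, step)]
        if max(abs(sj) for sj in step) < tol:
            break
    return w
```

Each iteration solves one weighted least-squares system, which is where the "reweighted" in the name comes from. On large sparse data the dense solve above would be the bottleneck, which is presumably why the abstract emphasizes fast approximate numerical methods.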
Summary of biosurveillance-relevant technologies
, 2003
Abstract

Cited by 4 (2 self)
This short report, compiled upon request from Dave Siegrist and Ted Senator, surveys the spectrum of technologies that can help with Biosurveillance. We indicate which we have chosen, so far, to use in our development of analysis methods, and our reasons.

1 Time-weighted averaging
This is directly applicable to a scalar signal (such as "number of respiratory cases today"). This method, more commonly used in computational finance, simply compares the count during the current time period with the weighted average of the counts of recent days. Exponential weighting is typically used, where the half-life is known as the "time window" parameter. This time-window parameter is typically chosen by hand. We prefer the Serfling and Univariate HMM methods described below.

2 Serfling method
This method (Serfling, 1963) is a cyclic regression model, and is the standard CDC algorithm for flu detection. It is, again, applicable to scalar signals. It assumes that the signal follows a sinusoid with a period of one year, and thus finds the four parameters of that model, where the parameters are chosen to minimize the sum of squares of residuals. It is an easy matter of regression analysis to determine, on any date, whether
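As a rough illustration of the time-weighted averaging idea in item 1, the sketch below compares today's count with an exponentially weighted average of recent daily counts. The function names, the default half-life of 7 days, and the choice to report a simple difference are assumptions made for this example; the report does not specify them.

```python
def ewma_baseline(counts, half_life):
    """Exponentially weighted average of past daily counts.

    counts is ordered oldest to newest. A day that is `half_life` days
    older than another receives half its weight.
    """
    decay = 0.5 ** (1.0 / half_life)
    num = 0.0
    den = 0.0
    weight = 1.0
    for c in reversed(counts):  # most recent day gets weight 1
        num += weight * c
        den += weight
        weight *= decay
    return num / den

def excess_over_baseline(history, today, half_life=7.0):
    """Difference between today's count and the weighted recent average."""
    return today - ewma_baseline(history, half_life)
```

A large positive excess would be the signal of interest. How large counts as alarming depends on the hand-chosen time-window parameter, which is the weakness the report points to.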