Results 11–20 of 62
Coordinate Descent Method for Large-scale L2-loss Linear SVM
Abstract

Cited by 17 (10 self)
Linear support vector machines (SVM) are useful for classifying large-scale sparse data. Problems with sparse features are common in applications such as document classification and natural language processing. In this paper, we propose a novel coordinate descent algorithm for training linear SVM with the L2-loss function. At each step, the proposed method minimizes a one-variable subproblem while fixing other variables. The subproblem is solved by Newton steps with a line search technique. The procedure converges globally at a linear rate. Experiments show that our method is more efficient and stable than state-of-the-art methods such as Pegasos and TRON.
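The coordinate-wise update the abstract describes can be sketched in a few lines. The sketch below is reconstructed from the abstract alone, not from the authors' code: it takes one undamped Newton step per coordinate and omits the line search, and all names and defaults are illustrative.

```python
import numpy as np

def cd_l2svm(X, y, C=1.0, n_epochs=50):
    # Cyclic coordinate descent for the L2-loss linear SVM objective
    #   0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w @ x_i))^2
    # with one Newton step per coordinate visit (the paper adds a line
    # search, omitted here for brevity).
    n, d = X.shape
    w = np.zeros(d)
    wx = np.zeros(n)                       # cached values of w @ x_i
    for _ in range(n_epochs):
        for j in range(d):
            viol = 1.0 - y * wx
            act = viol > 0                 # examples inside the margin
            # first and (generalized) second derivative w.r.t. w_j
            g = w[j] - 2.0 * C * np.sum(y[act] * X[act, j] * viol[act])
            h = 1.0 + 2.0 * C * np.sum(X[act, j] ** 2)
            step = -g / h
            w[j] += step
            wx += step * X[:, j]           # keep the cache consistent
    return w
```

Caching `w @ x_i` is what makes each one-variable subproblem cheap on sparse data: an update to w_j only touches examples with a nonzero j-th feature.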
Fighting Phishing at the User Interface
 In Lorrie Cranor and Simson Garfinkel (Eds.), Security and Usability: Designing Secure Systems that People Can Use
, 2005
Abstract

Cited by 17 (1 self)
The problem that this thesis concentrates on is phishing attacks. Phishing attacks use email messages and web sites designed to look as if they come from a known and legitimate organization, in order to deceive users into submitting their personal, financial, or computer account information online at those fake web sites. Phishing is a semantic attack. The fundamental problem of phishing is that when a user submits sensitive information online under an attack, his mental model about this submission differs from the system model that actually performs it. Specifically, the system sends the data to a different web site from the one where the user intends to submit the data. The fundamental solution to phishing is to bridge the semantic gap between the user’s mental model and the system model. The user interface is where human users interact with the computer system. It is where a user’s intention transforms into a system operation. It is where the semantic gap opens under phishing attacks. And therefore, it is where phishing should be solved.
Algorithms for sparse linear classifiers in the massive data setting, 2006. Manuscript. Available from www.stat.rutgers.edu/~madigan/papers
, 2005
Abstract

Cited by 16 (0 self)
Classifiers favoring sparse solutions, such as support vector machines, relevance vector machines, LASSO-regression-based classifiers, etc., provide competitive methods for classification problems in high dimensions. However, current algorithms for training sparse classifiers typically scale quite unfavorably with respect to the number of training examples. This paper proposes online and multi-pass algorithms for training sparse linear classifiers for high-dimensional data. These algorithms have computational complexity and memory requirements that make learning on massive datasets feasible. The central idea that makes this possible is a straightforward quadratic approximation to the likelihood function.
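The "quadratic approximation" idea can be illustrated concretely, though the sketch below is a generic reconstruction, not the authors' exact algorithm: for each streamed example, the logistic loss is replaced by a per-coordinate quadratic upper bound (curvature at most 0.25·x_j²), so the L1-penalized coordinate update has a closed-form soft-threshold solution.

```python
import math

def online_sparse_logistic(examples, d, lam=0.05, n_passes=1):
    # One weight vector, updated one example at a time.  Each example's
    # logistic loss is replaced by a coordinate-wise quadratic majorizer,
    # so the L1-regularized update is a single soft-threshold step.
    w = [0.0] * d
    for _ in range(n_passes):
        for x, y in examples:          # x: sparse list of (index, value); y in {0, 1}
            z = sum(v * w[j] for j, v in x)
            p = 1.0 / (1.0 + math.exp(-z))
            for j, v in x:
                if v == 0.0:
                    continue
                g = (p - y) * v        # gradient of the logistic loss at w_j
                h = 0.25 * v * v       # upper bound on the curvature
                target = w[j] - g / h
                # soft-threshold: minimizer of the quadratic model + lam*|w_j|
                w[j] = math.copysign(max(abs(target) - lam / h, 0.0), target)
    return w
```

Because every update touches only the nonzero features of one example, memory is O(d) and each pass is linear in the number of nonzeros, which is what makes the massive-data setting feasible.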
Bayesian Multinomial Logistic Regression for Author Identification
 In Maxent Conference
, 2005
Abstract

Cited by 14 (0 self)
Motivated by high-dimensional applications in authorship attribution, we describe a Bayesian multinomial logistic regression model together with an associated learning algorithm.
A WEAKLY INFORMATIVE DEFAULT PRIOR DISTRIBUTION FOR LOGISTIC AND OTHER REGRESSION MODELS
Abstract

Cited by 14 (4 self)
We propose a new prior distribution for classical (non-hierarchical) logistic regression models, constructed by first scaling all non-binary variables to have mean 0 and standard deviation 0.5, and then placing independent Student-t prior distributions on the coefficients. As a default choice, we recommend the Cauchy distribution with center 0 and scale 2.5, which in the simplest setting is a longer-tailed version of the distribution attained by assuming one-half additional success and one-half additional failure in a logistic regression. Cross-validation on a corpus of datasets shows the Cauchy class of prior distributions to outperform existing implementations of Gaussian and Laplace priors. We recommend this prior distribution as a default choice for routine applied use. It has the advantage of always giving answers, even when there is complete separation in logistic regression (a common problem, even when the sample size is large and the number of predictors is small), and also automatically applying more shrinkage to higher-order interactions. This can ...
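The recipe in the abstract (scale non-binary inputs to sd 0.5, then a Cauchy(0, 2.5) prior per coefficient) is easy to demonstrate. The sketch below is only a stand-in: the paper fits the posterior mode with an approximate-EM adaptation of a standard glm routine, whereas here plain gradient ascent on the log posterior is used, and the intercept is given the same prior for brevity.

```python
import numpy as np

def fit_logistic_cauchy(X, y, scale=2.5, lr=0.5, n_iter=5000):
    # MAP logistic regression with independent Cauchy(0, scale) priors on
    # the coefficients of the standardized predictors.
    Xs = 0.5 * (X - X.mean(0)) / X.std(0)        # mean 0, sd 0.5, as recommended
    Xs = np.column_stack([np.ones(len(y)), Xs])  # intercept column
    w = np.zeros(Xs.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xs @ w))
        # d/dw of log-likelihood plus d/dw of log Cauchy density, -2w/(s^2+w^2)
        grad = Xs.T @ (y - p) - 2.0 * w / (scale ** 2 + w ** 2)
        w += lr * grad / len(y)
    return w, Xs                                 # Xs returned for prediction
```

The "always gives answers" property is the interesting part: under complete separation the maximum-likelihood coefficients diverge, but the Cauchy prior keeps the posterior mode finite.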
Author Identification on the Large Scale
 In Proc. of the Meeting of the Classification Society of North America
, 2005
Abstract

Cited by 13 (0 self)
This paper is on techniques for identifying authors in large collections of textual artifacts (emails, communiqués, transcribed speech, etc.). Our approach focuses on very high-dimensional, topic-free document representations and particular attribution problems, such as: (1) Which one of these K authors wrote this particular document? (2) Did any of these K authors write this particular document? Scientific investigation into measuring style and authorship of texts goes back to the late nineteenth century, with the pioneering studies of Mendenhall [36] and Mascol [34, 35] on distributions of sentence and word lengths in works of literature and the gospels of the New Testament. The underlying notion was that works by different authors are strongly distinguished by quantifiable features of the text. By the mid-twentieth century, this line of research had grown into what became known as "stylometrics", and a variety of textual statistics had been proposed to quantify textual style. The style of early work was characterized by a search for invariant properties of textual statistics, such as Zipf's distribution and Yule's K statistic.
Boosting with Structural Sparsity
Abstract

Cited by 13 (1 self)
We derive generalizations of AdaBoost and related gradient-based coordinate descent methods that incorporate sparsity-promoting penalties for the norm of the predictor that is being learned. The end result is a family of coordinate descent algorithms that integrate forward feature induction and back-pruning through regularization and give an automatic stopping criterion for feature induction. We study penalties based on the ℓ1, ℓ2, and ℓ∞ norms of the predictor and introduce mixed-norm penalties that build upon the initial penalties. The mixed-norm regularizers facilitate structural sparsity in parameter space, which is a useful property in multiclass prediction and other related tasks. We report empirical results that demonstrate the power of our approach in building accurate and structurally sparse models.
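The interplay of forward induction, back-pruning, and automatic stopping can be sketched for the ℓ1 case only (the mixed-norm family is beyond a listing snippet); the code below is our own illustration of that combination, not the paper's derivation, with illustrative step sizes.

```python
import numpy as np

def l1_boost(H, y, lam=0.1, eta=0.1, max_rounds=500):
    # Greedy coordinate descent on (1/n) * sum_i exp(-y_i * (H @ alpha)_i)
    # plus lam * ||alpha||_1.  Each round picks the weak-hypothesis column
    # with the largest loss gradient; the soft-threshold both induces new
    # features and prunes old ones back to zero.
    n, m = H.shape                     # H[i, j]: weak hypothesis j on example i
    alpha = np.zeros(m)
    for _ in range(max_rounds):
        wts = np.exp(-y * (H @ alpha))              # exp-loss example weights
        grad = -(H * (y * wts)[:, None]).sum(0) / n
        j = int(np.argmax(np.abs(grad)))
        if abs(grad[j]) <= lam:                     # automatic stopping criterion
            break
        target = alpha[j] - eta * grad[j]
        alpha[j] = np.sign(target) * max(abs(target) - eta * lam, 0.0)
    return alpha
```

The stopping test is just the ℓ1 subgradient-optimality condition: once no coordinate's gradient magnitude exceeds lam, no update can decrease the penalized objective, so feature induction halts on its own.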
Loss functions for preference levels: Regression with discrete ordered labels
 Proceedings of the IJCAI Multidisciplinary Workshop on Advances in Preference Handling
, 2005
Abstract

Cited by 12 (4 self)
We consider different types of loss functions for discrete ordinal regression, i.e. fitting labels that may take one of several discrete, but ordered, values. These types of labels arise when preferences are specified by selecting, for each item, one of several rating “levels”, e.g. one through five stars. We present two general threshold-based constructions which can be used to generalize loss functions for binary labels, such as the logistic and hinge loss, and another generalization of the logistic loss based on a probabilistic model for discrete ordered labels. Experiments on the 1 Million MovieLens data set indicate that one of our constructions is a significant improvement over previous classification- and regression-based approaches.
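One of the threshold-based constructions can be written out compactly. The sketch below is the "all-threshold" flavor with a logistic base loss, reconstructed from the abstract alone (the function name and threshold convention are our own): thresholds b_1 ≤ … ≤ b_{K-1} cut the real line into K ordered levels, and the binary logistic loss is summed over every threshold, each oriented by whether the true level lies above or below it.

```python
import math

def all_threshold_logistic(z, level, thresholds):
    # z: real-valued score; level: true rating in 0..K-1;
    # thresholds: sorted cut points b_1 <= ... <= b_{K-1}.
    loss = 0.0
    for k, b in enumerate(thresholds):
        s = 1.0 if level > k else -1.0    # is the true level above threshold k?
        loss += math.log1p(math.exp(-s * (z - b)))
    return loss
```

Because every threshold contributes, a prediction that is off by two levels is penalized more than one off by a single level, which is exactly what a plain binary loss cannot express.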
On compressionbased text classification
 In Proc. ECIR-05, 300–314
, 2005
Abstract

Cited by 11 (0 self)
Compression-based text classification methods are easy to apply, requiring virtually no preprocessing of the data. Most such methods are character-based, and thus have the potential to automatically capture non-word features of a document, such as punctuation, word stems, and features spanning more than one word. However, compression-based classification methods have drawbacks (such as slow running time), and not all such methods are equally effective. We present the results of a number of experiments designed to evaluate the effectiveness and behavior of different compression-based text classification methods on English text. Among our experiments are some specifically designed to test whether the ability to capture non-word (including super-word) features causes character-based text compression methods to achieve more accurate classification.
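The "easy to apply" claim is easy to verify: a minimal compression-based classifier needs only a general-purpose compressor. The sketch below is a generic baseline of this family (not a specific method from the paper): assign a document to the class whose training corpus compresses it most cheaply when the document is appended.

```python
import zlib

def compression_classify(doc, corpora):
    # corpora: dict mapping class label -> training text for that class.
    # The extra bytes needed to compress (corpus + doc) versus corpus alone
    # approximate the cross-entropy of the document under that class's model.
    best, best_cost = None, None
    for label, corpus in corpora.items():
        base = len(zlib.compress(corpus.encode()))
        joint = len(zlib.compress((corpus + doc).encode()))
        cost = joint - base
        if best_cost is None or cost < best_cost:
            best, best_cost = label, cost
    return best
```

Because the compressor works on raw characters, punctuation, word stems, and cross-word sequences all contribute to the match, with no tokenization or feature engineering.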
Logistic Regression for Data Mining and High-Dimensional Classification
, 2004
Abstract

Cited by 10 (1 self)
The focus of this thesis is fast and robust adaptations of logistic regression (LR) for data mining and high-dimensional classification problems. LR is well-understood and widely used in the statistics, machine learning, and data analysis communities. Its benefits include a firm statistical foundation and a probabilistic model useful for “explaining” the data. There is a perception that LR is slow, unstable, and unsuitable for large learning or classification tasks. Through fast approximate numerical methods, regularization to avoid numerical instability, and an efficient implementation, we will show that LR can outperform modern algorithms like Support Vector Machines (SVM) on a variety of learning tasks. Our novel implementation, which uses a modified iteratively reweighted least squares estimation procedure, can compute model parameters for sparse binary datasets with hundreds of thousands of rows and attributes, and millions or tens of millions of nonzero elements, in just a few seconds. Our implementation also handles real-valued dense datasets of similar size.
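For reference, the classical (unmodified) iteratively reweighted least squares procedure that the thesis adapts looks like this; the sketch is the textbook Newton iteration, with a small ridge term standing in for the regularization the thesis uses against numerical instability, and all names are illustrative.

```python
import numpy as np

def irls_logistic(X, y, n_iter=25, ridge=1e-6):
    # Iteratively reweighted least squares (Newton's method) for logistic
    # regression: each step solves a weighted least-squares problem on a
    # "working response" built from the current fit.
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        s = np.maximum(p * (1.0 - p), 1e-12)     # IRLS weights
        z = X @ w + (y - p) / s                  # working response
        A = X.T @ (s[:, None] * X) + ridge * np.eye(d)
        w = np.linalg.solve(A, X.T @ (s * z))
    return w
```

Each iteration costs one weighted least-squares solve; the thesis's speedups come from replacing this exact solve with fast approximate numerical methods on sparse data.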