Results 11–20 of 98
Making Logistic Regression A Core Data Mining Tool: A Practical Investigation of Accuracy, Speed, and Simplicity
, 2004
Cited by 36 (0 self)
Binary classification is a core data mining task. For large datasets or real-time applications, desirable classifiers are accurate, fast, and need no parameter tuning. We present a simple implementation of logistic regression that meets these requirements. A combination of regularization, truncated Newton methods, and iteratively reweighted least squares makes it faster and more accurate than modern SVM implementations, and relatively insensitive to parameters. It is robust to linear dependencies and some scaling problems, making most data preprocessing unnecessary.
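The combination the abstract describes can be illustrated with a minimal sketch: L2-regularized logistic regression fit by iteratively reweighted least squares, i.e. Newton steps on the penalized log-likelihood. This is a generic textbook version, not the authors' implementation; `lam` and the iteration count are illustrative choices.

```python
import numpy as np

def irls_logistic(X, y, lam=1.0, n_iter=20):
    """Fit L2-regularized logistic regression by iteratively
    reweighted least squares (Newton's method).
    X: (n, d) features; y: (n,) labels in {0, 1}; lam: ridge penalty."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))              # predicted probabilities
        s = p * (1.0 - p)                             # IRLS weights (diagonal)
        grad = X.T @ (p - y) + lam * w                # gradient of penalized loss
        H = X.T @ (X * s[:, None]) + lam * np.eye(d)  # Hessian (always invertible)
        w -= np.linalg.solve(H, grad)                 # Newton step
    return w
```

The ridge term keeps the Hessian well conditioned even with linearly dependent columns, which is one reason regularized IRLS tolerates skipped preprocessing.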
Information-theoretic semantic multimedia indexing
 in ACM Conference on Image and Video Retrieval
, 2007
Cited by 25 (10 self)
To solve the problem of indexing collections with diverse text documents, image documents, or documents with both text and images, one needs to develop a model that supports heterogeneous types of documents. In this paper, we show how information theory supplies us with the tools necessary to develop a unique model for text, image, and text/image retrieval. In our approach, for each possible query keyword we estimate a maximum entropy model based exclusively on preprocessed continuous features. The unified continuous feature space for text and visual data is constructed by using a minimum description length criterion to find the optimal feature-space representation (optimal from an information-theoretic point of view). We evaluate our approach in three experiments: text-only retrieval, image-only retrieval, and combined text and image retrieval.
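The minimum description length selection step can be illustrated with a toy sketch that chooses a histogram bin count for a single feature by trading data code length against model cost. This is a simplified BIC-style stand-in for the paper's feature-space construction; the scaling to [0, 1] and the exact penalty form are assumptions.

```python
import numpy as np

def mdl_bins(x, max_bins=32):
    """Pick a histogram bin count B by a two-part MDL score:
    data code length (negative log-likelihood under the histogram
    density) plus a model cost of (B-1)/2 * log(n) for the B-1 free
    bin probabilities. Assumes scalar features scaled to [0, 1]."""
    n = len(x)
    best_b, best_cost = 1, np.inf
    for b in range(1, max_bins + 1):
        counts, _ = np.histogram(x, bins=b, range=(0.0, 1.0))
        probs = (counts + 0.5) / (n + 0.5 * b)     # smoothed bin probabilities
        nll = -np.sum(counts * np.log(probs * b))  # density = prob * b on [0, 1]
        cost = nll + 0.5 * (b - 1) * np.log(n)
        if cost < best_cost:
            best_b, best_cost = b, cost
    return best_b
```

A structured (e.g. bimodal) feature earns a finer representation than a structureless one, which is the sense in which MDL picks the "optimal" feature-space granularity.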
Boosting with Structural Sparsity
Cited by 25 (2 self)
We derive generalizations of AdaBoost and related gradient-based coordinate descent methods that incorporate sparsity-promoting penalties for the norm of the predictor that is being learned. The end result is a family of coordinate descent algorithms that integrate forward feature induction and back-pruning through regularization and give an automatic stopping criterion for feature induction. We study penalties based on the ℓ1, ℓ2, and ℓ∞ norms of the predictor and introduce mixed-norm penalties that build upon the initial penalties. The mixed-norm regularizers facilitate structural sparsity in parameter space, which is a useful property in multiclass prediction and other related tasks. We report empirical results that demonstrate the power of our approach in building accurate and structurally sparse models.
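A minimal illustration of sparsity-promoting coordinate descent of this kind, using the ℓ1 penalty with squared loss (the lasso) rather than the boosting losses studied in the paper; the soft-threshold step is what performs back-pruning and eventually stops inducing new features.

```python
import numpy as np

def l1_coordinate_descent(X, y, lam=10.0, n_sweeps=100):
    """Coordinate descent for min_w 0.5*||y - Xw||^2 + lam*||w||_1.
    Each pass updates one coordinate in closed form by soft-thresholding;
    coordinates whose partial correlation stays below lam are held at
    exactly zero, pruning weak features automatically."""
    n, d = X.shape
    w = np.zeros(d)
    r = y.copy()                      # residual y - Xw
    col_sq = (X ** 2).sum(axis=0)     # precomputed column norms
    for _ in range(n_sweeps):
        for j in range(d):
            r += X[:, j] * w[j]       # remove coordinate j from residual
            rho = X[:, j] @ r         # partial correlation with residual
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * w[j]       # restore residual
    return w
```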
Multinomial Naive Bayes for Text Categorization Revisited
 In: Lecture Notes in Computer Science
, 2005
Cited by 22 (1 self)
This paper presents empirical results for several versions of the multinomial naive Bayes classifier on four text categorization problems, and a way of improving it using locally weighted learning. More specifically, it compares standard multinomial naive Bayes to the recently proposed transformed weight-normalized complement naive Bayes classifier (TWCNB) [1], and shows that some of the modifications included in TWCNB may not be necessary to achieve optimum performance on some datasets. However, it does show that TF-IDF conversion and document length normalization are important. It also shows that support vector machines can sometimes significantly outperform both methods. Finally, it shows how the performance of multinomial naive Bayes can be improved using locally weighted learning. However, the overall conclusion of our paper is that support vector machines are still the method of choice if the aim is to maximize accuracy.
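The two transformations the paper singles out, TF-IDF conversion and document length normalization, followed by a plain multinomial naive Bayes fit, can be sketched as follows. This is a generic illustration, not the TWCNB variant; the smoothing constant `alpha` is a conventional default.

```python
import numpy as np

def tfidf_length_norm(counts):
    """TF-IDF transform with L2 document-length normalization,
    the two preprocessing steps the paper finds most important."""
    tf = np.log1p(counts)                          # dampened term frequency
    n_docs = counts.shape[0]
    df = (counts > 0).sum(axis=0)                  # document frequency per term
    idf = np.log(n_docs / (1.0 + df))
    x = tf * idf
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)            # length normalization

def train_mnb(X, y, alpha=1.0):
    """Multinomial naive Bayes on (possibly real-valued) feature rows.
    Returns class labels, log priors, and log feature probabilities."""
    classes = np.unique(y)
    log_prior = np.log(np.array([(y == c).mean() for c in classes]))
    feat = np.array([X[y == c].sum(axis=0) + alpha for c in classes])
    log_prob = np.log(feat / feat.sum(axis=1, keepdims=True))
    return classes, log_prior, log_prob

def predict_mnb(X, classes, log_prior, log_prob):
    """Predict by the class with the highest log posterior score."""
    return classes[np.argmax(X @ log_prob.T + log_prior, axis=1)]
```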
Robustness of adaptive filtering methods in a cross-benchmark evaluation
 In Proceedings of SIGIR2003
, 2005
Cited by 20 (5 self)
This paper reports a cross-benchmark evaluation of regularized logistic regression (LR) and incremental Rocchio for adaptive filtering. Using four corpora from the Topic Detection and Tracking (TDT) forum and the Text Retrieval Conferences (TREC), we evaluated these methods with non-stationary topics at various granularity levels, and measured performance with different utility settings. We found that LR performs strongly and robustly in optimizing T11SU (a TREC utility function), while Rocchio is better for optimizing Ctrk (the TDT tracking cost), a high-recall oriented objective function. Using systematic cross-corpus parameter optimization with both methods, we obtained the best results ever reported on TDT5, TREC10, and TREC11. Relevance feedback on a small portion (0.05–0.2%) of the TDT5 test documents yielded significant performance improvements, measuring up to a 54% reduction in Ctrk and a 20.9% increase in T11SU (with β=0.1), compared to the results of the top-performing system in TDT2004 without relevance feedback information.
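The incremental Rocchio side of this comparison can be sketched as a profile update plus a thresholded cosine-score delivery rule. The `alpha`/`beta`/`gamma` values below are conventional Rocchio defaults, not the cross-corpus-optimized settings from the paper.

```python
import numpy as np

def rocchio_update(profile, doc, relevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Incremental Rocchio step for adaptive filtering: move the topic
    profile toward judged-relevant documents and away from judged
    non-relevant ones."""
    if relevant:
        return alpha * profile + beta * doc
    return alpha * profile - gamma * doc

def filter_stream(profile, docs, threshold):
    """Deliver a document when its cosine score against the profile
    exceeds the threshold; return indices of delivered documents."""
    delivered = []
    for i, doc in enumerate(docs):
        denom = np.linalg.norm(profile) * np.linalg.norm(doc) + 1e-12
        if profile @ doc / denom > threshold:
            delivered.append(i)
    return delivered
```

In a full adaptive filter, each delivered document's relevance judgment would feed back through `rocchio_update`, and the threshold itself would be tuned to the utility function (T11SU vs. Ctrk) being optimized.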
Fighting Phishing at the User Interface
 in Security and Usability, O’Reilly
, 2005
Algorithms for sparse linear classifiers in the massive data setting, 2006. Manuscript. Available from www.stat.rutgers.edu/˜madigan/papers
, 2005
Cited by 18 (0 self)
Classifiers favoring sparse solutions, such as support vector machines, relevance vector machines, LASSO-regression based classifiers, etc., provide competitive methods for classification problems in high dimensions. However, current algorithms for training sparse classifiers typically scale quite unfavorably with respect to the number of training examples. This paper proposes online and multi-pass algorithms for training sparse linear classifiers for high-dimensional data. These algorithms have computational complexity and memory requirements that make learning on massive datasets feasible. The central idea that makes this possible is a straightforward quadratic approximation to the likelihood function.
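The quadratic-approximation idea can be illustrated with a one-pass sketch that keeps a diagonal curvature estimate and soft-thresholds each coordinate, so memory stays O(d) regardless of the number of examples. This is a simplified stand-in, not the authors' algorithm; `lam` and `eta` are illustrative settings.

```python
import numpy as np

def online_sparse_logistic(stream, d, lam=0.1, eta=0.5):
    """One-pass sparse logistic regression sketch. Each example's
    log-likelihood is handled through a local quadratic approximation:
    accumulate a diagonal curvature estimate h, take a damped
    Newton-like step, then soft-threshold against the l1 penalty."""
    w = np.zeros(d)
    h = np.ones(d)                              # accumulated diagonal curvature
    for x, y in stream:                         # y in {0, 1}
        p = 1.0 / (1.0 + np.exp(-x @ w))        # current predicted probability
        g = (p - y) * x                         # gradient of the log-loss
        h += p * (1.0 - p) * x * x              # curvature of the local quadratic
        z = w - eta * g / h                     # step on the quadratic approximation
        w = np.sign(z) * np.maximum(np.abs(z) - lam / h, 0.0)  # soft-threshold
    return w
```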
Author Identification on the Large Scale
 In Proc. of the Meeting of the Classification Society of North America
, 2005
Cited by 18 (0 self)
The focus of this paper is on techniques for identifying authors in large collections of textual artifacts (emails, communiqués, transcribed speech, etc.). Our approach focuses on very high-dimensional, topic-free document representations and particular attribution problems, such as: (1) Which one of these K authors wrote this particular document? (2) Did any of these K authors write this particular document? Scientific investigation into measuring style and authorship of texts goes back to the late nineteenth century, with the pioneering studies of Mendenhall [36] and Mascol [34, 35] on distributions of sentence and word lengths in works of literature and the gospels of the New Testament. The underlying notion was that works by different authors are strongly distinguished by quantifiable features of the text. By the mid-twentieth century, this line of research had grown into what became known as "stylometrics", and a variety of textual statistics had been proposed to quantify textual style. The style of early work was characterized by a search for invariant properties of textual statistics, such as Zipf's distribution and Yule's K statistic.
Bayesian Multinomial Logistic Regression for Author Identification
 In Maxent Conference
, 2005
Cited by 17 (0 self)
Motivated by high-dimensional applications in authorship attribution, we describe a Bayesian multinomial logistic regression model together with an associated learning algorithm.
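A Bayesian multinomial logistic model with an independent Gaussian prior on each weight reduces, at the MAP point, to softmax regression with an ℓ2 penalty. A minimal gradient-descent sketch follows; the paper's actual prior and fitting algorithm may differ.

```python
import numpy as np

def map_multinomial_logistic(X, y, k, lam=1.0, lr=0.1, n_iter=500):
    """MAP estimate for multinomial (softmax) logistic regression with a
    zero-mean Gaussian prior on each weight; the prior contributes the
    lam * W / n term to the negative log-posterior gradient."""
    n, d = X.shape
    W = np.zeros((k, d))
    Y = np.eye(k)[y]                                  # one-hot labels
    for _ in range(n_iter):
        scores = X @ W.T
        scores -= scores.max(axis=1, keepdims=True)   # numerical stability
        P = np.exp(scores)
        P /= P.sum(axis=1, keepdims=True)             # softmax probabilities
        grad = (P - Y).T @ X / n + lam * W / n        # neg. log-posterior gradient
        W -= lr * grad
    return W
```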
Loss functions for preference levels: Regression with discrete ordered labels
 Proceedings of the IJCAI Multidisciplinary Workshop on Advances in Preference Handling
, 2005
Cited by 15 (4 self)
We consider different types of loss functions for discrete ordinal regression, i.e. fitting labels that may take one of several discrete, but ordered, values. These types of labels arise when preferences are specified by selecting, for each item, one of several rating "levels", e.g. one through five stars. We present two general threshold-based constructions which can be used to generalize loss functions for binary labels, such as the logistic and hinge loss, and another generalization of the logistic loss based on a probabilistic model for discrete ordered labels. Experiments on the 1 Million MovieLens data set indicate that one of our constructions is a significant improvement over previous classification- and regression-based approaches.
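One threshold-based construction generalizing the binary logistic loss can be sketched as follows: K ordered levels induce K−1 thresholds, and the loss sums a binary logistic term per threshold, with the true label determining which side the score should fall on. This is an "all-threshold" style sketch; the paper's exact constructions may differ.

```python
import numpy as np

def all_threshold_logistic_loss(score, label, thresholds):
    """Ordinal logistic loss: for each threshold b_k, score the binary
    question 'is the true label above level k?' with the logistic loss,
    and sum over all K-1 thresholds."""
    loss = 0.0
    for k, b in enumerate(thresholds):
        sign = 1.0 if label > k else -1.0   # +1 if the label lies above threshold k
        loss += np.log1p(np.exp(-sign * (score - b)))
    return loss

def predict_level(score, thresholds):
    """Predicted ordinal level: the number of thresholds the score exceeds."""
    return int(sum(score > b for b in thresholds))
```

With sorted thresholds the per-threshold penalties reinforce each other, so a score far on the wrong side of several thresholds is penalized more than one wrong by a single level, which is exactly the ordering-aware behavior plain classification losses lack.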