Making logistic regression a core data mining tool with TR-IRLS (2005)

by A M Paul Komarek
Venue: In ICDM

Results 1 - 10 of 19

Trust region Newton method for large-scale logistic regression

by Chih-jen Lin, Ruby C. Weng, S. Sathiya Keerthi - In Proceedings of the 24th International Conference on Machine Learning (ICML), 2007
Cited by 35 (5 self)
Large-scale logistic regression arises in many applications such as document classification and natural language processing. In this paper, we apply a trust region Newton method to maximize the log-likelihood of the logistic regression model. The proposed method uses only approximate Newton steps in the beginning, but achieves fast convergence in the end. Experiments show that it is faster than the commonly used quasi-Newton approach for logistic regression. We also compare it with existing linear SVM implementations.
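The idea of taking approximate Newton steps can be sketched in a few lines. The following is a minimal illustration only, assuming L2-regularized logistic regression with labels in {0, 1}; it is a plain Newton iteration with conjugate-gradient inner solves and deliberately omits the trust region logic described in the abstract, so it is not the authors' method. All function names and the parameters `lam`, `outer_iters` and `cg_iters` are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gradient(w, X, y, lam):
    # Gradient of lam/2 * ||w||^2 - sum_i log p(y_i | x_i; w), with y_i in {0, 1}.
    g = [lam * wj for wj in w]
    for xi, yi in zip(X, y):
        r = sigmoid(dot(w, xi)) - yi
        for j, xij in enumerate(xi):
            g[j] += r * xij
    return g

def hessian_vector(w, X, v, lam):
    # H v = lam * v + X^T D X v with D_ii = p_i (1 - p_i); H is never formed.
    hv = [lam * vj for vj in v]
    for xi in X:
        p = sigmoid(dot(w, xi))
        c = p * (1.0 - p) * dot(xi, v)
        for j, xij in enumerate(xi):
            hv[j] += c * xij
    return hv

def conjugate_gradient(matvec, b, max_iter, tol):
    # Approximately solve H s = b; stopping early gives an approximate Newton step.
    s = [0.0] * len(b)
    r = list(b)
    d = list(b)
    rr = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rr) <= tol:
            break
        hd = matvec(d)
        alpha = rr / dot(d, hd)
        s = [sj + alpha * dj for sj, dj in zip(s, d)]
        r = [rj - alpha * hj for rj, hj in zip(r, hd)]
        rr_new = dot(r, r)
        d = [rj + (rr_new / rr) * dj for rj, dj in zip(r, d)]
        rr = rr_new
    return s

def newton_cg(X, y, lam=1.0, outer_iters=20, cg_iters=10, tol=1e-8):
    # Assumption: no trust region or line search, so this suits only
    # small, well-regularized problems.
    w = [0.0] * len(X[0])
    for _ in range(outer_iters):
        g = gradient(w, X, y, lam)
        gnorm = math.sqrt(dot(g, g))
        if gnorm <= tol:
            break
        step = conjugate_gradient(lambda v: hessian_vector(w, X, v, lam),
                                  [-gj for gj in g], cg_iters, 0.1 * gnorm)
        w = [wj + sj for wj, sj in zip(w, step)]
    return w
```

The inner solver only needs Hessian-vector products, which is what makes this family of methods attractive when the number of features is large. A production implementation would add a safeguard, such as the trust region the abstract refers to, to keep each step within a region where the quadratic model is reliable.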

Large-scale text categorization by batch mode active learning

by Steven C. H. Hoi, Rong Jin, Michael R. Lyu - In Proceedings of the International World Wide Web Conference, 2006
Cited by 14 (5 self)
Large-scale text categorization is an important research topic for Web data mining. One of the challenges in large-scale text categorization is how to reduce the human effort in labeling text documents for building reliable classification models. In the past, there have been many studies on applying active learning methods to automatic text categorization, which try to select the most informative documents for manual labeling. Most of these studies focused on selecting a single unlabeled document in each iteration. As a result, the text categorization model has to be retrained after each labeled document is solicited. In this paper, we present a novel active learning algorithm that selects a batch of text documents for manual labeling in each iteration. The key to batch mode active learning is reducing the redundancy among the selected examples so that each example provides unique information for model updating. To this end, we use the Fisher information matrix as the measurement of model uncertainty and choose the set of documents that can efficiently minimize the Fisher information matrix of a classification model. Extensive experiments with three different datasets have shown that our algorithm is more effective than the state-of-the-art active learning techniques for text categorization and can be a promising tool toward large-scale text categorization on the World Wide Web.

Algorithms for sparse linear classifiers in the massive data setting, 2006. Manuscript. Available from www.stat.rutgers.edu/~madigan/papers

by Suhrid Balakrishnan, David Madigan, 2005
Cited by 12 (0 self)
Classifiers favoring sparse solutions, such as support vector machines, relevance vector machines, LASSO-regression based classifiers, etc., provide competitive methods for classification problems in high dimensions. However, current algorithms for training sparse classifiers typically scale quite unfavorably with respect to the number of training examples. This paper proposes online and multi-pass algorithms for training sparse linear classifiers for high dimensional data. These algorithms have computational complexity and memory requirements that make learning on massive datasets feasible. The central idea that makes this possible is a straightforward quadratic approximation to the likelihood function.

Contextual Advertising by Combining Relevance with Click Feedback

by Deepayan Chakrabarti, 2008
Cited by 11 (3 self)
Contextual advertising supports much of the Web’s ecosystem today. User experience and revenue (shared by the site publisher and the ad network) depend on the relevance of the displayed ads to the page content. As with other document retrieval systems, relevance is provided by scoring the match between individual ads (documents) and the content of the page where the ads are shown (query). In this paper we show how this match can be improved significantly by augmenting the ad-page scoring function with extra parameters from a logistic regression model on the words in the pages and ads. A key property of the proposed model is that it can be mapped to standard cosine similarity matching and is suitable for efficient and scalable implementation over inverted indexes. The model parameter values are learnt from logs containing ad impressions and clicks, with shrinkage estimators being used to combat sparsity. To scale our computations to train on an extremely large training corpus consisting of several gigabytes of data, we parallelize our fitting algorithm in a Hadoop [10] framework. Experimental evaluation is provided showing improved click prediction over a holdout set of impression and click events from a large-scale real-world ad placement engine. Our best model achieves a 25% lift in precision relative to a traditional information retrieval model which is based on cosine similarity, for recalling 10% of the clicks in our test data.

Dual Coordinate Descent Methods for Logistic Regression and Maximum Entropy Models

by Hsiang-fu Yu, Fang-lan Huang, Chih-jen Lin
Cited by 2 (2 self)
Most optimization methods for logistic regression or maximum entropy solve the primal problem. They range from iterative scaling and coordinate descent to quasi-Newton and truncated Newton. Less effort has been made to solve the dual problem. In contrast, for support vector machines (SVM), methods have been shown to be very effective for solving the dual problem. In this paper, we apply coordinate descent methods to solve the dual form of logistic regression and maximum entropy. Interestingly, many details are different from the situation in SVM. We carefully study the theoretical convergence as well as numerical issues. The proposed method is shown to be faster than most state-of-the-art methods for training logistic regression and maximum entropy.

Research methodology in studies of assessor effort for retrieval evaluation

by Ben Carterette, James Allan - In Proceedings of RIAO, 2007
Cited by 1 (0 self)
As evaluation is an important but difficult part of information retrieval system design and experimentation, evaluation questions have been the subject of much research. An “evaluation study” is an investigation into some aspect of evaluation. These types of studies typically experiment on ranked results from actual retrieval systems, most often those that were submitted to TREC tracks. We argue that the standard of evidence in these types of studies should be increased to the level required of text retrieval studies, by testing on multiple data sets, multiple subsets of data, and comparison to baselines using hypothesis testing. We demonstrate that baseline performance on the standard data sets is quite high, necessitating strong evidence to support claims.

Learning to Hash Logistic Regression for Fast 3D Scan Point Classification

by Jens Behley, Kristian Kersting, Dirk Schulz, Volker Steinhage, Armin B. Cremers
Cited by 1 (0 self)
Segmenting range data into semantic categories has become a more and more active field of research in robotics. In this paper, we advocate viewing this task as a problem of fast, large-scale retrieval. Intuitively, given a dataset of millions of labeled scan points and their neighborhoods, we simply search for similar points in the datasets and use the labels of the retrieved ones to predict the labels of a novel point using some local prediction model such as majority vote or logistic regression. However, actually carrying this out requires highly efficient ways of (1) storing millions of scan points in memory and (2) quickly finding similar scan points to a target scan point. In this paper, we propose to address both issues by employing Weiss et al.’s recent spectral hashing. It represents each item in a database by a compact binary code that is constructed so that similar items will have similar binary code words. In turn, similar neighbors have codes within a small Hamming distance of the code for the query. Then, we learn a logistic regression model locally over all points with the same binary code word. Our experiments on real-world 3D scans show that the resulting approach, called spectrally hashed logistic regression, can be ultra fast at prediction time and outperforms state-of-the-art approaches such as logistic regression and nearest neighbor.
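The retrieval step described above (binary codes, Hamming-distance lookup, local prediction) can be sketched as follows. This is a rough illustration under stated assumptions: `binary_code` is a simple per-coordinate threshold hash used as a stand-in, not spectral hashing, and the local model is the majority vote that the abstract mentions as an option rather than a per-bucket logistic regression. All names and the `thresholds` and `radius` parameters are hypothetical.

```python
from collections import Counter, defaultdict

def binary_code(point, thresholds):
    # Stand-in hash (an assumption, not spectral hashing): one bit per
    # coordinate, set when the coordinate exceeds its threshold.
    code = 0
    for value, t in zip(point, thresholds):
        code = (code << 1) | (1 if value > t else 0)
    return code

def hamming(a, b):
    # Number of differing bits between two integer codes.
    return bin(a ^ b).count("1")

def build_buckets(points, labels, thresholds):
    # Group training labels by the binary code of their point.
    buckets = defaultdict(list)
    for p, lab in zip(points, labels):
        buckets[binary_code(p, thresholds)].append(lab)
    return buckets

def predict(query, buckets, thresholds, radius=1):
    # Majority vote over labels in all buckets within the Hamming radius.
    q = binary_code(query, thresholds)
    votes = Counter()
    for code, labs in buckets.items():
        if hamming(q, code) <= radius:
            votes.update(labs)
    return votes.most_common(1)[0][0] if votes else None
```

Because each bucket is keyed by an integer code, a lookup with `radius=0` is a single dictionary access; larger radii trade speed for recall. Replacing the vote with a model fitted per bucket would bring the sketch closer to the approach the abstract describes.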

Challenges in the analysis of mass-throughput data: A technical commentary from the statistical machine learning perspective

by Constantin F. Aliferis, Alexander Statnikov, Ioannis Tsamardinos - Cancer Informatics
Cited by 1 (0 self)
Sound data analysis is critical to the success of modern molecular medicine research that involves collection and interpretation of mass-throughput data. The novel nature and high-dimensionality in such datasets pose a series of nontrivial data analysis problems. This technical commentary discusses the problems of over-fitting, error estimation, curse of dimensionality, causal versus predictive modeling, integration of heterogeneous types of data, and lack of standard protocols for data analysis. We attempt to shed light on the nature and causes of these problems and to outline viable methodological approaches to overcome them.

Logistic Regression and Stochastic Gradient Training

by Charles Elkan, 2010
Cited by 1 (0 self)
An important extension of the idea of likelihood is conditional likelihood. The conditional likelihood of $\theta$ given data $x$ and $y$ is $L(\theta; y \mid x) = f(y \mid x; \theta)$. Intuitively, $y$ follows a probability distribution that is different for different $x$, but $x$ itself is never unknown, so there is no need to have a probabilistic model of it. Technically, for each $x$ there is a different distribution $f(y \mid x; \theta)$ of $y$, but all these distributions share the same parameters $\theta$. Given training data consisting of $\langle x_i, y_i \rangle$ pairs, the principle of maximum conditional likelihood says to choose a parameter estimate $\hat{\theta}$ that maximizes the product $\prod_i f(y_i \mid x_i; \theta)$. Note that we do not need to assume that the $x_i$ are independent in order to justify the conditional likelihood being a product; we just need to assume that the $y_i$ are independent conditional on the $x_i$. For any specific value of $x$, $\hat{\theta}$ can be used to predict values for $y$; we assume that we never want to predict values of $x$. If $y$ is a binary outcome and $x$ is a real-valued vector, then the conditional model
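As a concrete illustration of maximizing a conditional likelihood, the sketch below evaluates the log of the product of f(y_i | x_i; theta) over the training pairs and increases it by stochastic gradient ascent. The abstract is cut off before it states the conditional model, so the logistic form p(y = 1 | x) = sigmoid(theta · x) used here is an assumption based on the entry's title; the function names and the `epochs`, `rate` and `seed` parameters are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_conditional_likelihood(theta, data):
    # log prod_i f(y_i | x_i; theta) = sum_i log f(y_i | x_i; theta), y_i in {0, 1}.
    total = 0.0
    for x, y in data:
        p = sigmoid(sum(t * xj for t, xj in zip(theta, x)))
        total += math.log(p if y == 1 else 1.0 - p)
    return total

def sgd(data, epochs=200, rate=0.1, seed=0):
    # Stochastic gradient ascent: one parameter update per training example.
    rng = random.Random(seed)
    theta = [0.0] * len(data[0][0])
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            p = sigmoid(sum(t * xj for t, xj in zip(theta, x)))
            # For one example, the gradient of log f(y | x; theta) is (y - p) * x.
            theta = [t + rate * (y - p) * xj for t, xj in zip(theta, x)]
    return theta
```

Only the distribution of y given x is modeled; the x values enter purely as fixed inputs, which mirrors the point that no probabilistic model of x is needed. A constant learning rate keeps the sketch short, at the cost of the estimate hovering near the maximizer rather than converging to it exactly.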

Collective Context-Aware Topic Models for Entity Disambiguation

by Prithviraj Sen, 2012
Cited by 1 (0 self)
A crucial step in adding structure to unstructured data is to identify references to entities and disambiguate them. Such disambiguated references can help enhance readability and draw similarities across different pieces of running text in an automated fashion. Previous research has tackled this problem by first forming a catalog of entities from a knowledge base, such as Wikipedia, and then using this catalog to disambiguate references in unseen text. However, most of the previously proposed models either do not use all text in the knowledge base, potentially missing out on discriminative features, or do not exploit word-entity proximity to learn high-quality catalogs. In this work, we propose topic models that keep track of the context of every word in the knowledge base, so that words appearing within the same context as an entity are more likely to be associated with that entity. Thus, our topic models utilize all text present in the knowledge base and help learn high-quality catalogs. Our models also learn groups of co-occurring entities, thus enabling collective disambiguation. Unlike most previous topic models, our models are non-parametric and do not require the user to specify the exact number of groups present in the knowledge base. In experiments performed on an extract of Wikipedia containing almost 60,000 references, our models outperform SVM-based baselines by as much as 18% in terms of disambiguation accuracy, translating to an increment of almost 11,000 correctly disambiguated references.