Results 1 - 10 of 667
Machine Learning in Automated Text Categorization
- ACM Computing Surveys, 2002
"... The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this p ..."
Abstract - Cited by 1734 (22 self)
The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting of the manual definition of a classifier by domain experts) are very good effectiveness, considerable savings in expert labor, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We discuss in detail issues pertaining to three problems: document representation, classifier construction, and classifier evaluation.
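A minimal sketch of the inductive approach the abstract describes: a classifier is learned from preclassified documents instead of being hand-written by domain experts. This is illustrative only; it assumes scikit-learn is available, and the tiny corpus and category labels are hypothetical.

```python
# Illustrative sketch (not from the survey): learn a text classifier from
# preclassified documents. Assumes scikit-learn; corpus and labels are
# hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_docs = [
    "stock markets fell sharply on inflation fears",
    "the team won the championship final",
    "central bank raises interest rates again",
    "striker scores twice in opening match",
]
train_labels = ["finance", "sports", "finance", "sports"]

# Document representation (TF-IDF) plus classifier construction in one pipeline.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_docs, train_labels)

print(model.predict(["bond yields climb as markets react"]))  # likely ['finance']
```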
Large margin methods for structured and interdependent output variables
- Journal of Machine Learning Research, 2005
"... Learning general functional dependencies between arbitrary input and output spaces is one of the key challenges in computational intelligence. While recent progress in machine learning has mainly focused on designing flexible and powerful input representations, this paper addresses the complementary ..."
Abstract - Cited by 624 (12 self)
Learning general functional dependencies between arbitrary input and output spaces is one of the key challenges in computational intelligence. While recent progress in machine learning has mainly focused on designing flexible and powerful input representations, this paper addresses the complementary issue of designing classification algorithms that can deal with more complex outputs, such as trees, sequences, or sets. More generally, we consider problems involving multiple dependent output variables, structured output spaces, and classification problems with class attributes. In order to accomplish this, we propose to appropriately generalize the well-known notion of a separation margin and derive a corresponding maximum-margin formulation. While this leads to a quadratic program with a potentially prohibitive, i.e., exponential, number of constraints, we present a cutting plane algorithm that solves the optimization problem in polynomial time for a large class of problems. The proposed method has important applications in areas such as computational biology, natural language processing, information retrieval/extraction, and optical character recognition. Experiments from various domains involving different types of output spaces emphasize the breadth and generality of our approach.
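In one of the paper's formulations (margin rescaling), with joint feature map Ψ and output-space loss Δ, the generalized maximum-margin problem the abstract refers to can be written as:

```latex
% Structured max-margin QP (margin-rescaling form): one constraint per
% training example i and per incorrect output y, hence exponentially many.
\min_{w,\;\xi \ge 0} \;\; \frac{1}{2}\lVert w \rVert^{2} + \frac{C}{n}\sum_{i=1}^{n}\xi_{i}
\quad \text{s.t.} \quad
\big\langle w,\, \Psi(x_i, y_i) - \Psi(x_i, y) \big\rangle \;\ge\; \Delta(y_i, y) - \xi_i
\qquad \forall i,\; \forall y \neq y_i .
```

The cutting plane algorithm sidesteps the exponential constraint set by repeatedly adding only the most violated constraint for each example, which is what makes polynomial-time training possible.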
Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis
- In Proceedings of HLT-EMNLP, 2005
"... This paper presents a new approach to phrase-level sentiment analysis that first determines whether an expression is neutral or polar and then disambiguates the polarity of the polar expressions. With this approach, the system is able to automatically identify the contextual polarity for a large sub ..."
Abstract - Cited by 454 (15 self)
This paper presents a new approach to phrase-level sentiment analysis that first determines whether an expression is neutral or polar and then disambiguates the polarity of the polar expressions. With this approach, the system is able to automatically identify the contextual polarity for a large subset of sentiment expressions, achieving results that are significantly better than the baseline.
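The two-step pipeline the abstract describes can be sketched as follows; the classifier functions here are hypothetical stand-ins, not the paper's actual feature-based models.

```python
# Sketch of the two-step contextual-polarity pipeline (hypothetical
# classifiers; the paper's actual features and models differ).
def contextual_polarity(expression, neutral_vs_polar, disambiguate_polarity):
    """Step 1: decide neutral vs. polar; step 2: disambiguate polar cases."""
    if neutral_vs_polar(expression) == "neutral":
        return "neutral"
    # Only expressions judged polar reach the finer-grained second step.
    return disambiguate_polarity(expression)  # e.g. "positive" or "negative"
```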
Online passive-aggressive algorithms
- Journal of Machine Learning Research, 2006
"... We present a unified view for online classification, regression, and uniclass problems. This view leads to a single algorithmic framework for the three problems. We prove worst case loss bounds for various algorithms for both the realizable case and the non-realizable case. The end result is new alg ..."
Abstract - Cited by 435 (24 self)
We present a unified view of online classification, regression, and uniclass problems. This view leads to a single algorithmic framework for the three problems. We prove worst-case loss bounds for various algorithms for both the realizable case and the non-realizable case. The end result is new algorithms and accompanying loss bounds for hinge-loss regression and uniclass. We also get refined loss bounds for previously studied classification algorithms.
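For binary classification, the passive-aggressive update has a closed form; the sketch below shows the PA-I variant from the paper (variable names are mine, and C is the aggressiveness hyperparameter).

```python
import numpy as np

# Passive-aggressive (PA-I) update for binary classification, following the
# paper's closed-form step; variable names are mine.
def pa_update(w, x, y, C=1.0):
    """w: weight vector; x: feature vector; y: label in {-1, +1}."""
    loss = max(0.0, 1.0 - y * np.dot(w, x))   # hinge loss at this example
    if loss == 0.0:
        return w                              # passive: margin already >= 1
    tau = min(C, loss / np.dot(x, x))         # PA-I step size, capped by C
    return w + tau * y * x                    # aggressive: satisfy the margin
```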
Discriminative Reranking for Natural Language Parsing
2005
"... This article considers approaches which rerank the output of an existing probabilistic parser. The base parser produces a set of candidate parses for each input sentence, with associated probabilities that define an initial ranking of these parses. A second model then attempts to improve upon this i ..."
Abstract - Cited by 333 (9 self)
This article considers approaches that rerank the output of an existing probabilistic parser. The base parser produces a set of candidate parses for each input sentence, with associated probabilities that define an initial ranking of these parses. A second model then attempts to improve upon this initial ranking, using additional features of the tree as evidence. The strength of our approach is that it allows a tree to be represented as an arbitrary set of features, without concerns about how these features interact or overlap and without the need to define a derivation or a generative model which takes these features into account. We introduce a new method for the reranking task, based on the boosting approach to ranking problems described in Freund et al. (1998). We apply the boosting method to parsing the Wall Street Journal treebank. The method combined the log-likelihood under a baseline model (that of Collins [1999]) with evidence from an additional 500,000 features over parse trees that were not included in the original model. The new model achieved 89.75% F-measure, a 13% relative decrease in F-measure error over the baseline model’s score of 88.2%. The article also introduces a new algorithm for the boosting approach which takes advantage of the sparsity of the feature space in the parsing data. Experiments show significant efficiency gains for the new algorithm over the obvious implementation of the boosting approach. We argue that the method is an appealing alternative, in terms of both simplicity and efficiency, to work on feature selection methods within log-linear (maximum-entropy) models. Although the experiments in this article are on natural language parsing (NLP), the approach should be applicable to many other NLP problems which are naturally framed as ranking tasks, for example, speech recognition, machine translation, or natural language generation.
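The reranking idea reduces to rescoring each candidate parse with the base model's log-likelihood plus a weighted sum of arbitrary tree features. A sketch, with a hypothetical feature extractor and weight dictionary standing in for the paper's 500,000 features:

```python
# Sketch of reranking: combine the base parser's log-likelihood with a
# linear score over arbitrary tree features. The feature extractor and
# weight dictionary here are hypothetical stand-ins.
def rerank(candidates, base_logprob, features, alpha):
    """candidates: parse trees; base_logprob(tree) -> float;
    features(tree) -> {feature_name: count}; alpha: {feature_name: weight}."""
    def score(tree):
        feats = features(tree)
        return base_logprob(tree) + sum(
            alpha.get(name, 0.0) * count for name, count in feats.items())
    return max(candidates, key=score)
```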
Using Maximum Entropy for Text Classification
1999
"... This paper proposes the use of maximum entropy techniques for text classification. Maximum entropy is a probability distribution estimation technique widely used for a variety of natural language tasks, such as language modeling, part-of-speech tagging, and text segmentation. The underlying principl ..."
Abstract - Cited by 326 (6 self)
This paper proposes the use of maximum entropy techniques for text classification. Maximum entropy is a probability distribution estimation technique widely used for a variety of natural language tasks, such as language modeling, part-of-speech tagging, and text segmentation. The underlying principle of maximum entropy is that without external knowledge, one should prefer distributions that are uniform. Constraints on the distribution, derived from labeled training data, inform the technique where to be minimally non-uniform. The maximum entropy formulation has a unique solution which can be found by the improved iterative scaling algorithm. In this paper, maximum entropy is used for text classification by estimating the conditional distribution of the class variable given the document. In experiments on several text datasets we compare accuracy to naive Bayes and show that maximum entropy is sometimes significantly better, but also sometimes worse. Much future work remains …
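The conditional model the abstract describes takes the standard exponential form over feature functions f_i(d, c) with weights λ_i fit by improved iterative scaling (the notation is the usual one, not quoted from the paper):

```latex
% Conditional maximum-entropy model for text classification.
P(c \mid d) \;=\; \frac{1}{Z(d)} \exp\!\Big(\sum_{i} \lambda_i f_i(d, c)\Big),
\qquad
Z(d) \;=\; \sum_{c'} \exp\!\Big(\sum_{i} \lambda_i f_i(d, c')\Big).
```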
Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales
- In Proc. 43rd ACL, 2005
"... We address the rating-inference problem, wherein rather than simply decide whether a review is “thumbs up ” or “thumbs down”, as in previous sentiment analysis work, one must determine an author’s evaluation with respect to a multi-point scale (e.g., one to five “stars”). This task represents an int ..."
Abstract - Cited by 298 (2 self)
We address the rating-inference problem, wherein, rather than simply deciding whether a review is “thumbs up” or “thumbs down” as in previous sentiment analysis work, one must determine an author’s evaluation with respect to a multi-point scale (e.g., one to five “stars”). This task represents an interesting twist on standard multi-class text categorization because there are several different degrees of similarity between class labels; for example, “three stars” is intuitively closer to “four stars” than to “one star”. We first evaluate human performance at the task. Then, we apply a meta-algorithm, based on a metric labeling formulation of the problem, that alters a given n-ary classifier’s output in an explicit attempt to ensure that similar items receive similar labels. We show that the meta-algorithm can provide significant improvements over both multi-class and regression versions of SVMs when we employ a novel similarity measure appropriate to the problem.
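Metric labeling, the formulation the meta-algorithm is based on, trades off a per-item classifier cost against pairwise smoothness over similar items; in its standard form (notation mine, not quoted from the paper):

```latex
% Standard metric-labeling objective: d is a metric on the label set
% (e.g. |i - j| on a one-to-five star scale), and the second term pushes
% similar items toward similar labels.
\min_{\,l}\; \sum_{x} \mathrm{cost}\big(x,\, l(x)\big)
\;+\; \alpha \sum_{x \sim x'} \mathrm{sim}(x, x') \cdot d\big(l(x),\, l(x')\big)
```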
Sharing Visual Features for Multiclass and Multiview Object Detection
2004
"... We consider the problem of detecting a large number of different classes of objects in cluttered scenes. Traditional approaches require applying a battery of different classifiers to the image, at multiple locations and scales. This can be slow and can require a lot of training data, since each clas ..."
Abstract - Cited by 279 (6 self)
We consider the problem of detecting a large number of different classes of objects in cluttered scenes. Traditional approaches require applying a battery of different classifiers to the image, at multiple locations and scales. This can be slow and can require a lot of training data, since each classifier requires the computation of many different image features. In particular, for independently trained detectors, the (run-time) computational complexity and the (training-time) sample complexity scale linearly with the number of classes to be detected. It seems unlikely that such an approach will scale up to allow recognition of hundreds or thousands of objects.
Logistic Regression, AdaBoost and Bregman Distances
2000
"... We give a unified account of boosting and logistic regression in which each learning problem is cast in terms of optimization of Bregman distances. The striking similarity of the two problems in this framework allows us to design and analyze algorithms for both simultaneously, and to easily adapt al ..."
Abstract - Cited by 259 (45 self)
We give a unified account of boosting and logistic regression in which each learning problem is cast in terms of optimization of Bregman distances. The striking similarity of the two problems in this framework allows us to design and analyze algorithms for both simultaneously, and to easily adapt algorithms designed for one problem to the other. For both problems, we give new algorithms and explain their potential advantages over existing methods. These algorithms can be divided into two types based on whether the parameters are iteratively updated sequentially (one at a time) or in parallel (all at once). We also describe a parameterized family of algorithms which interpolates smoothly between these two extremes. For all of the algorithms, we give convergence proofs using a general formalization of the auxiliary-function proof technique. As one of our sequential-update algorithms is equivalent to AdaBoost, this provides the first general proof of convergence for AdaBoost. We show that all of our algorithms generalize easily to the multiclass case, and we contrast the new algorithms with iterative scaling. We conclude with a few experimental results with synthetic data that highlight the behavior of the old and newly proposed algorithms in different settings.
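The two objectives being unified are, in their standard forms, the exponential loss minimized by AdaBoost and the logistic loss of logistic regression, both over a linear combination f(x) = Σ_j α_j h_j(x) (standard notation, not quoted from the paper):

```latex
% AdaBoost minimizes the exponential loss; logistic regression the log loss.
\mathrm{ExpLoss}(f) \;=\; \sum_{i}\exp\big(-y_i f(x_i)\big),
\qquad
\mathrm{LogLoss}(f) \;=\; \sum_{i}\ln\!\big(1 + \exp(-y_i f(x_i))\big).
```

Casting each as the minimization of a Bregman distance is what lets the paper analyze both with a single auxiliary-function convergence argument.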