Results 1  10
of
667
Machine Learning in Automated Text Categorization
 ACM COMPUTING SURVEYS
, 2002
"... The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this p ..."
Abstract

Cited by 1734 (22 self)
 Add to MetaCart
The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert labor power, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation.
Large margin methods for structured and interdependent output variables
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2005
"... Learning general functional dependencies between arbitrary input and output spaces is one of the key challenges in computational intelligence. While recent progress in machine learning has mainly focused on designing flexible and powerful input representations, this paper addresses the complementary ..."
Abstract

Cited by 624 (12 self)
 Add to MetaCart
Learning general functional dependencies between arbitrary input and output spaces is one of the key challenges in computational intelligence. While recent progress in machine learning has mainly focused on designing flexible and powerful input representations, this paper addresses the complementary issue of designing classification algorithms that can deal with more complex outputs, such as trees, sequences, or sets. More generally, we consider problems involving multiple dependent output variables, structured output spaces, and classification problems with class attributes. In order to accomplish this, we propose to appropriately generalize the wellknown notion of a separation margin and derive a corresponding maximummargin formulation. While this leads to a quadratic program with a potentially prohibitive, i.e. exponential, number of constraints, we present a cutting plane algorithm that solves the optimization problem in polynomial time for a large class of problems. The proposed method has important applications in areas such as computational biology, natural language processing, information retrieval/extraction, and optical character recognition. Experiments from various domains involving different types of output spaces emphasize the breadth and generality of our approach.
Recognizing Contextual Polarity in PhraseLevel Sentiment Analysis
 In Proceedings of HLTEMNLP
, 2005
"... This paper presents a new approach to phraselevel sentiment analysis that first determines whether an expression is neutral or polar and then disambiguates the polarity of the polar expressions. With this approach, the system is able to automatically identify the contextual polarity for a large sub ..."
Abstract

Cited by 454 (15 self)
 Add to MetaCart
(Show Context)
This paper presents a new approach to phraselevel sentiment analysis that first determines whether an expression is neutral or polar and then disambiguates the polarity of the polar expressions. With this approach, the system is able to automatically identify the contextual polarity for a large subset of sentiment expressions, achieving results that are significantly better than baseline. 1
Online passiveaggressive algorithms
 JMLR
, 2006
"... We present a unified view for online classification, regression, and uniclass problems. This view leads to a single algorithmic framework for the three problems. We prove worst case loss bounds for various algorithms for both the realizable case and the nonrealizable case. The end result is new alg ..."
Abstract

Cited by 435 (24 self)
 Add to MetaCart
We present a unified view for online classification, regression, and uniclass problems. This view leads to a single algorithmic framework for the three problems. We prove worst case loss bounds for various algorithms for both the realizable case and the nonrealizable case. The end result is new algorithms and accompanying loss bounds for hingeloss regression and uniclass. We also get refined loss bounds for previously studied classification algorithms.
Discriminative Reranking for Natural Language Parsing
, 2005
"... This article considers approaches which rerank the output of an existing probabilistic parser. The base parser produces a set of candidate parses for each input sentence, with associated probabilities that define an initial ranking of these parses. A second model then attempts to improve upon this i ..."
Abstract

Cited by 333 (9 self)
 Add to MetaCart
(Show Context)
This article considers approaches which rerank the output of an existing probabilistic parser. The base parser produces a set of candidate parses for each input sentence, with associated probabilities that define an initial ranking of these parses. A second model then attempts to improve upon this initial ranking, using additional features of the tree as evidence. The strength of our approach is that it allows a tree to be represented as an arbitrary set of features, without concerns about how these features interact or overlap and without the need to define a derivation or a generative model which takes these features into account. We introduce a new method for the reranking task, based on the boosting approach to ranking problems described in Freund et al. (1998). We apply the boosting method to parsing the Wall Street Journal treebank. The method combined the loglikelihood under a baseline model (that of Collins [1999]) with evidence from an additional 500,000 features over parse trees that were not included in the original model. The new model achieved 89.75 % Fmeasure, a 13 % relative decrease in Fmeasure error over the baseline model’s score of 88.2%. The article also introduces a new algorithm for the boosting approach which takes advantage of the sparsity of the feature space in the parsing data. Experiments show significant efficiency gains for the new algorithm over the obvious implementation of the boosting approach. We argue that the method is an appealing alternative—in terms of both simplicity and efficiency—to work on feature selection methods within loglinear (maximumentropy) models. Although the experiments in this article are on natural language parsing (NLP), the approach should be applicable to many other NLP problems which are naturally framed as ranking tasks, for example, speech recognition, machine translation, or natural language generation.
Using Maximum Entropy for Text Classification
, 1999
"... This paper proposes the use of maximum entropy techniques for text classification. Maximum entropy is a probability distribution estimation technique widely used for a variety of natural language tasks, such as language modeling, partofspeech tagging, and text segmentation. The underlying principl ..."
Abstract

Cited by 326 (6 self)
 Add to MetaCart
This paper proposes the use of maximum entropy techniques for text classification. Maximum entropy is a probability distribution estimation technique widely used for a variety of natural language tasks, such as language modeling, partofspeech tagging, and text segmentation. The underlying principle of maximum entropy is that without external knowledge, one should prefer distributions that are uniform. Constraints on the distribution, derived from labeled training data, inform the technique where to be minimally nonuniform. The maximum entropy formulation has a unique solution which can be found by the improved iterative scaling algorithm. In this paper, maximum entropy is used for text classification by estimating the conditional distribution of the class variable given the document. In experiments on several text datasets we compare accuracy to naive Bayes and show that maximum entropy is sometimes significantly better, but also sometimes worse. Much future work remains, but the re...
Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales
 In Proc. 43st ACL
, 2005
"... We address the ratinginference problem, wherein rather than simply decide whether a review is “thumbs up ” or “thumbs down”, as in previous sentiment analysis work, one must determine an author’s evaluation with respect to a multipoint scale (e.g., one to five “stars”). This task represents an int ..."
Abstract

Cited by 298 (2 self)
 Add to MetaCart
(Show Context)
We address the ratinginference problem, wherein rather than simply decide whether a review is “thumbs up ” or “thumbs down”, as in previous sentiment analysis work, one must determine an author’s evaluation with respect to a multipoint scale (e.g., one to five “stars”). This task represents an interesting twist on standard multiclass text categorization because there are several different degrees of similarity between class labels; for example, “three stars ” is intuitively closer to “four stars ” than to “one star”. We first evaluate human performance at the task. Then, we apply a metaalgorithm, based on a metric labeling formulation of the problem, that alters a givenary classifier’s output in an explicit attempt to ensure that similar items receive similar labels. We show that the metaalgorithm can provide significant improvements over both multiclass and regression versions of SVMs when we employ a novel similarity measure appropriate to the problem. 1
Sharing Visual Features for Multiclass And Multiview Object Detection
, 2004
"... We consider the problem of detecting a large number of different classes of objects in cluttered scenes. Traditional approaches require applying a battery of different classifiers to the image, at multiple locations and scales. This can be slow and can require a lot of training data, since each clas ..."
Abstract

Cited by 279 (6 self)
 Add to MetaCart
(Show Context)
We consider the problem of detecting a large number of different classes of objects in cluttered scenes. Traditional approaches require applying a battery of different classifiers to the image, at multiple locations and scales. This can be slow and can require a lot of training data, since each classifier requires the computation of many different image features. In particular, for independently trained detectors, the (runtime) computational complexity, and the (trainingtime) sample complexity, scales linearly with the number of classes to be detected. It seems unlikely that such an approach will scale up to allow recognition of hundreds or thousands of objects.
Logistic Regression, AdaBoost and Bregman Distances
, 2000
"... We give a unified account of boosting and logistic regression in which each learning problem is cast in terms of optimization of Bregman distances. The striking similarity of the two problems in this framework allows us to design and analyze algorithms for both simultaneously, and to easily adapt al ..."
Abstract

Cited by 259 (45 self)
 Add to MetaCart
(Show Context)
We give a unified account of boosting and logistic regression in which each learning problem is cast in terms of optimization of Bregman distances. The striking similarity of the two problems in this framework allows us to design and analyze algorithms for both simultaneously, and to easily adapt algorithms designed for one problem to the other. For both problems, we give new algorithms and explain their potential advantages over existing methods. These algorithms can be divided into two types based on whether the parameters are iteratively updated sequentially (one at a time) or in parallel (all at once). We also describe a parameterized family of algorithms which interpolates smoothly between these two extremes. For all of the algorithms, we give convergence proofs using a general formalization of the auxiliaryfunction proof technique. As one of our sequentialupdate algorithms is equivalent to AdaBoost, this provides the first general proof of convergence for AdaBoost. We show that all of our algorithms generalize easily to the multiclass case, and we contrast the new algorithms with iterative scaling. We conclude with a few experimental results with synthetic data that highlight the behavior of the old and newly proposed algorithms in different settings.