A comparison of event models for Naive Bayes text classification (1998)

by Andrew McCallum, Kamal Nigam
Results 1 - 10 of 1,025 citing documents

Thumbs up? Sentiment Classification using Machine Learning Techniques

by Bo Pang, Lillian Lee, Shivakumar Vaithyanathan - IN PROCEEDINGS OF EMNLP , 2002
"... We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three mac ..."
Abstract - Cited by 1101 (7 self) - Add to MetaCart
We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.
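As a rough illustration of the kind of bag-of-words setup the abstract describes (not the authors' original experiments), the sketch below compares a multinomial Naive Bayes classifier and a linear SVM using scikit-learn; the names `texts` and `labels` are assumed inputs loaded elsewhere, e.g. from a movie-review corpus.

```python
# Minimal sketch (not the authors' original setup): a bag-of-words pipeline
# comparing multinomial Naive Bayes and a linear SVM on a sentiment task.
# Assumes `texts` (list of review strings) and `labels` (0/1 sentiment)
# are loaded elsewhere.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def compare_classifiers(texts, labels):
    X = CountVectorizer().fit_transform(texts)   # word-count features
    for name, clf in [("NB", MultinomialNB()), ("SVM", LinearSVC())]:
        scores = cross_val_score(clf, X, labels, cv=3)
        print(f"{name}: mean accuracy {scores.mean():.3f}")
```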

Text Classification from Labeled and Unlabeled Documents using EM

by Kamal Nigam, Andrew Kachites Mccallum, Sebastian Thrun, Tom Mitchell - MACHINE LEARNING , 1999
"... This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. This is important because in many text classification problems obtaining training labels is expensive, while large qua ..."
Abstract - Cited by 1033 (15 self) - Add to MetaCart
This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. This is important because in many text classification problems obtaining training labels is expensive, while large quantities of unlabeled documents are readily available. We introduce an algorithm for learning from labeled and unlabeled documents based on the combination of Expectation-Maximization (EM) and a naive Bayes classifier. The algorithm first trains a classifier using the available labeled documents, and probabilistically labels the unlabeled documents. It then trains a new classifier using the labels for all the documents, and iterates to convergence. This basic EM procedure works well when the data conform to the generative assumptions of the model. However these assumptions are often violated in practice, and poor performance can result. We present two extensions to the algorithm that improve ...
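A minimal sketch of the EM loop described above, assuming scikit-learn's MultinomialNB as the generative classifier and sparse count matrices as inputs; the paper itself weights unlabeled documents by their class posteriors rather than committing to hard labels, so this is a simplification.

```python
# Sketch of the EM procedure described in the abstract (simplified: hard labels).
# Assumes X_labeled and X_unlabeled are scipy sparse count matrices.
import numpy as np
from scipy.sparse import vstack
from sklearn.naive_bayes import MultinomialNB

def em_naive_bayes(X_labeled, y_labeled, X_unlabeled, n_iter=10):
    clf = MultinomialNB().fit(X_labeled, y_labeled)   # initial training on labeled data
    X_all = vstack([X_labeled, X_unlabeled])
    for _ in range(n_iter):
        # E-step: probabilistically label the unlabeled documents
        soft = clf.predict_proba(X_unlabeled)
        # M-step: retrain on all documents (the paper weights unlabeled
        # documents by these posteriors; here we take the argmax for brevity)
        y_unlabeled = clf.classes_[soft.argmax(axis=1)]
        y_all = np.concatenate([y_labeled, y_unlabeled])
        clf = MultinomialNB().fit(X_all, y_all)
    return clf
```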

Transductive Inference for Text Classification using Support Vector Machines

by Thorsten Joachims , 1999
"... This paper introduces Transductive Support Vector Machines (TSVMs) for text classification. While regular Support Vector Machines (SVMs) try to induce a general decision function for a learning task, Transductive Support Vector Machines take into account a particular test set and try to minimiz ..."
Abstract - Cited by 892 (4 self) - Add to MetaCart
This paper introduces Transductive Support Vector Machines (TSVMs) for text classification. While regular Support Vector Machines (SVMs) try to induce a general decision function for a learning task, Transductive Support Vector Machines take into account a particular test set and try to minimize misclassifications of just those particular examples. The paper presents an analysis of why TSVMs are well suited for text classification. These theoretical findings are supported by experiments on three test collections. The experiments show substantial improvements over inductive methods, especially for small training sets, cutting the number of labeled training examples down to a twentieth on some tasks. This work also proposes an algorithm for training TSVMs efficiently, handling 10,000 examples and more.
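The following is a simplified, self-training-style sketch of the transductive idea using scikit-learn, not Joachims' exact algorithm (which switches pairs of tentative test labels while annealing the cost assigned to test examples); `X_train`, `y_train`, and `X_test` are assumed inputs.

```python
# Simplified sketch of transductive training: start from an inductive SVM,
# tentatively label the test set, and retrain with the test examples included,
# gradually increasing their influence. Not Joachims' exact TSVM algorithm.
import numpy as np
from scipy.sparse import vstack
from sklearn.svm import LinearSVC

def tsvm_sketch(X_train, y_train, X_test, steps=5):
    clf = LinearSVC().fit(X_train, y_train)            # inductive starting point
    for step in range(1, steps + 1):
        y_test = clf.predict(X_test)                   # tentative test labels
        X_all = vstack([X_train, X_test])
        y_all = np.concatenate([y_train, y_test])
        # weight the tentatively-labeled test examples less at first
        w = np.concatenate([np.ones(X_train.shape[0]),
                            np.full(X_test.shape[0], step / steps)])
        clf = LinearSVC().fit(X_all, y_all, sample_weight=w)
    return clf, clf.predict(X_test)
```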

A Re-Examination of Text Categorization Methods

by Yiming Yang, Xin Liu , 1999
"... This paper reports a controlled study with statistical significance tests on five text categorization methods: the Support Vector Machines (SVM), a k-Nearest Neighbor (kNN) classifier, a neural network (NNet) approach, the Linear Leastsquares Fit (LLSF) mapping and a NaiveBayes (NB) classifier. We f ..."
Abstract - Cited by 853 (24 self) - Add to MetaCart
This paper reports a controlled study with statistical significance tests on five text categorization methods: the Support Vector Machines (SVM), a k-Nearest Neighbor (kNN) classifier, a neural network (NNet) approach, the Linear Least-squares Fit (LLSF) mapping and a Naive Bayes (NB) classifier. We focus on the robustness of these methods in dealing with a skewed category distribution, and their performance as a function of the training-set category frequency. Our results show that SVM, kNN and LLSF significantly outperform NNet and NB when the number of positive training instances per category is small (less than ten), and that all the methods perform comparably when the categories are sufficiently common (over 300 instances).

Citation Context

...C literature. An increasing number of learning approaches have been applied, including regression models [9, 32], nearest neighbor classification [17, 29, 33, 31, 14], Bayesian probabilistic approaches [25, 16, 20, 13, 12, 18, 3], decision trees [9, 16, 20, 2, 12], inductive rule learning [1, 5, 6, 21], neural networks [28, 22], on-line learning [6, 15] and Support Vector Machines [12]. While the rich literature provides valuable...

Semi-Supervised Learning Literature Survey

by Xiaojin Zhu , 2006
"... We review the literature on semi-supervised learning, which is an area in machine learning and more generally, artificial intelligence. There has been a whole spectrum of interesting ideas on how to learn from both labeled and unlabeled data, i.e. semi-supervised learning. This document is a chapter ..."
Abstract - Cited by 782 (8 self) - Add to MetaCart
We review the literature on semi-supervised learning, which is an area in machine learning and more generally, artificial intelligence. There has been a whole spectrum of interesting ideas on how to learn from both labeled and unlabeled data, i.e. semi-supervised learning. This document is a chapter excerpt from the author’s doctoral thesis (Zhu, 2005). However the author plans to update the online version frequently to incorporate the latest development in the field. Please obtain the latest version at http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf

Citation Context

...ify the two components. For instance, the mixtures on the second and third line give the same p(x), but they classify x = 0.5 differently. Gaussian is identifiable. Mixture of multivariate Bernoulli (McCallum & Nigam, 1998a) is not identifiable. More discussions on identifiability and semi-supervised learning can be found in e.g. (Ratsaby & Venkatesh, 1995) and (Corduneanu & Jaakkola, 2001). 2.2 Model Correctness If th...

Local features and kernels for classification of texture and object categories: a comprehensive study

by J. Zhang, S. Lazebnik, C. Schmid - International Journal of Computer Vision , 2007
"... Recently, methods based on local image features have shown promise for texture and object recognition tasks. This paper presents a large-scale evaluation of an approach that represents images as distributions (signatures or histograms) of features extracted from a sparse set of keypoint locations an ..."
Abstract - Cited by 653 (34 self) - Add to MetaCart
Recently, methods based on local image features have shown promise for texture and object recognition tasks. This paper presents a large-scale evaluation of an approach that represents images as distributions (signatures or histograms) of features extracted from a sparse set of keypoint locations and learns a Support Vector Machine classifier with kernels based on two effective measures for comparing distributions, the Earth Mover’s Distance and the χ 2 distance. We first evaluate the performance of our approach with different keypoint detectors and descriptors, as well as different kernels and classifiers. We then conduct a comparative evaluation with several state-of-the-art recognition methods on four texture and five object databases. On most of these databases, our implementation exceeds the best reported results and achieves comparable performance on the rest. Finally, we investigate the influence of background correlations on recognition performance via extensive tests on the PASCAL database, for which ground-truth object localization information is available. Our experiments demonstrate that image representations based on distributions of local features are surprisingly effective for classification of texture and object images under challenging real-world conditions, including significant intra-class variations and substantial background clutter.
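As a small illustration of one ingredient of the approach, the sketch below trains an SVM with a precomputed chi-squared kernel over histogram features; `train_hists` and `test_hists` are assumed nonnegative histogram arrays produced by some feature-extraction pipeline, and the fixed `gamma` scaling stands in for the paper's kernel normalization.

```python
# Sketch of an SVM over a chi-squared kernel between feature histograms.
# Assumes (n_samples, n_bins) nonnegative numpy arrays of histograms.
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

def chi2_svm(train_hists, train_labels, test_hists, gamma=1.0):
    K_train = chi2_kernel(train_hists, gamma=gamma)            # train-vs-train kernel
    clf = SVC(kernel="precomputed").fit(K_train, train_labels)
    K_test = chi2_kernel(test_hists, train_hists, gamma=gamma)  # test-vs-train kernel
    return clf.predict(K_test)
```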

Citation Context

... 55], and video retrieval [56]. The success of orderless models for these object recognition tasks may be explained with the help of an analogy to bag-of-words models for text document classification [40, 46]. Whereas for texture recognition, local features play the role of textons, or frequently repeated elements, for object recognition tasks, local features play the role of “visual words” predictive of ...

An extensive empirical study of feature selection metrics for text classification

by George Forman, Isabelle Guyon, André Elisseeff - J. of Machine Learning Research , 2003
"... Machine learning for text classification is the cornerstone of document categorization, news filtering, document routing, and personalization. In text domains, effective feature selection is essential to make the learning task efficient and more accurate. This paper presents an empirical comparison ..."
Abstract - Cited by 496 (15 self) - Add to MetaCart
Machine learning for text classification is the cornerstone of document categorization, news filtering, document routing, and personalization. In text domains, effective feature selection is essential to make the learning task efficient and more accurate. This paper presents an empirical comparison of twelve feature selection methods (e.g. Information Gain) evaluated on a benchmark of 229 text classification problem instances that were gathered from Reuters, TREC, OHSUMED, etc. The results are analyzed from multiple goal perspectives—accuracy, F-measure, precision, and recall—since each is appropriate in different situations. The results reveal that a new feature selection metric we call ‘Bi-Normal Separation ’ (BNS), outperformed the others by a substantial margin in most situations. This margin widened in tasks with high class skew, which is rampant in text classification problems and is particularly challenging for induction algorithms. A new evaluation methodology is offered that focuses on the needs of the data mining practitioner faced with a single dataset who seeks to choose one (or a pair of) metrics that are most likely to yield the best performance. From this perspective, BNS was the top single choice for all goals except precision, for which Information Gain yielded the best result most often. This analysis also revealed, for example, that Information Gain and Chi-Squared have correlated failures, and so they work poorly together. When choosing optimal pairs of metrics for each of the four performance goals, BNS is consistently a member of the pair—e.g., for greatest recall, the pair BNS + F1-measure yielded the best performance on the greatest number of tasks by a considerable margin.
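A short sketch of the Bi-Normal Separation metric as the abstract describes it: the absolute difference of inverse standard-normal CDFs applied to a feature's true-positive and false-positive occurrence rates. The presence-matrix input format and the clipping bound `eps` are illustrative assumptions.

```python
# Sketch of Bi-Normal Separation (BNS) feature scores:
# BNS = |F^-1(tpr) - F^-1(fpr)|, with F^-1 the inverse standard normal CDF.
# Assumes X_binary is a (n_docs, n_features) 0/1 numpy array and y is 0/1 labels.
import numpy as np
from scipy.stats import norm

def bns_scores(X_binary, y, eps=0.0005):
    pos, neg = X_binary[y == 1], X_binary[y == 0]
    tpr = np.clip(pos.mean(axis=0), eps, 1 - eps)   # fraction of positives with the feature
    fpr = np.clip(neg.mean(axis=0), eps, 1 - eps)   # fraction of negatives with the feature
    return np.abs(norm.ppf(tpr) - norm.ppf(fpr))
```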

Citation Context

...validated. Additionally, some classifiers work primarily with positive features, e.g. the Multinomial Naïve Bayes model, which has been shown to be both better than the traditional Naïve Bayes model (McCallum & Nigam, 1998), and considerably inferior to other induction methods for text classification (e.g., Yang & Liu, 1999; Dumais et al., 1998). Negative features are numerous, given the large class skew, and quite val...

Learning to Extract Symbolic Knowledge from the World Wide Web

by Mark Craven, Dan DiPasquo, Dayne Freitag, Andrew McCallum, Tom Mitchell, Kamal Nigam, Sean Slattery , 1998
"... The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of the research described here is to automatically create a computer understandable world wide knowledge base whose content mirrors that of the World Wide Web. Such a ..."
Abstract - Cited by 403 (29 self) - Add to MetaCart
The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of the research described here is to automatically create a computer understandable world wide knowledge base whose content mirrors that of the World Wide Web. Such a

Citation Context

...; the words are considered to be "events" and the document is comprised of a collection of these events. We use the second approach, since it has been found to out-perform the first on several data sets [41]. We formulate Naive Bayes for text classification as follows. Given a set of classes C = {c_1, ..., c_|C|} and a document consisting of n words, (w_1, w_2, ..., w_n), we classify the document as a m...
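For concreteness, a minimal sketch of the decision rule the excerpt begins to lay out: score each class by its log prior plus the summed log word likelihoods and take the argmax. The dictionaries `log_prior` and `log_word_given_class` are assumed to be estimated from training counts (with smoothing) elsewhere.

```python
# Sketch of the Naive Bayes decision rule: argmax_c log P(c) + sum_i log P(w_i | c).
# log_prior and log_word_given_class are assumed precomputed from training data.
import math

def classify(words, classes, log_prior, log_word_given_class):
    def score(c):
        # unseen words fall back to a crude floor, standing in for proper smoothing
        return log_prior[c] + sum(log_word_given_class[c].get(w, math.log(1e-9))
                                  for w in words)
    return max(classes, key=score)
```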

Toward Optimal Active Learning through Sampling Estimation of Error Reduction

by Nicholas Roy, Andrew Mccallum - In Proc. 18th International Conf. on Machine Learning , 2001
"... This paper presents an active learning method that directly optimizes expected future error. This is in contrast to many other popular techniques that instead aim to reduce version space size. These other methods are popular because for many learning models, closed form calculation of the expec ..."
Abstract - Cited by 353 (2 self) - Add to MetaCart
This paper presents an active learning method that directly optimizes expected future error. This is in contrast to many other popular techniques that instead aim to reduce version space size. These other methods are popular because for many learning models, closed form calculation of the expected future error is intractable. Our approach is made feasible by taking a sampling approach to estimating the expected reduction in error due to the labeling of a query. In experimental results on two real-world data sets we reach high accuracy very quickly, sometimes with four times fewer labeled examples than competing methods.
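A minimal sketch of the sampling idea described above, written against scikit-learn-style estimators: for each candidate query, retrain once per possible label, weight by the current posterior, and score the expected loss over the pool. The function name and the simple (1 − max posterior) loss estimate are illustrative assumptions, not the paper's exact estimator.

```python
# Sketch of expected-error-reduction query selection.
# Assumes dense feature matrices and a classifier exposing predict_proba.
import numpy as np
from sklearn.base import clone

def select_query(clf, X_labeled, y_labeled, X_pool, candidate_idx):
    posteriors = clf.predict_proba(X_pool[candidate_idx])   # current P(y|x) for candidates
    best_idx, best_loss = None, np.inf
    for row, i in enumerate(candidate_idx):
        expected_loss = 0.0
        for k, label in enumerate(clf.classes_):
            # pretend the candidate received this label and retrain
            retrained = clone(clf).fit(
                np.vstack([X_labeled, X_pool[i:i + 1]]),
                np.append(y_labeled, label))
            # crude future-error estimate: mean (1 - max posterior) over the pool
            probs = retrained.predict_proba(X_pool)
            expected_loss += posteriors[row, k] * (1 - probs.max(axis=1)).mean()
        if expected_loss < best_loss:
            best_idx, best_loss = i, expected_loss
    return best_idx
```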

Citation Context

...ance, x ∈ X, independently given the class. For text classification, the common variant of naive Bayes has unordered word counts for features, and uses a per-class multinomial to generate the words (McCallum & Nigam, 1998a). Let w_t be the t-th word in the dictionary of words V, and θ = (θ_{y_j}; θ_{w_t|y_j}), y_j ∈ Y, w_t ∈ V, be the parameters of the model, where θ_{y_j} is the prior probability of class y_j (otherwise writ...

Information-Theoretic Co-Clustering

by Inderjit S. Dhillon, Subramanyam Mallela, Dharmendra S. Modha - In KDD , 2003
"... Two-dimensional contingency or co-occurrence tables arise frequently in important applications such as text, web-log and market-basket data analysis. A basic problem in contingency table analysis is co-clustering: simultaneous clustering of the rows and columns. A novel theoretical formulation views ..."
Abstract - Cited by 346 (12 self) - Add to MetaCart
Two-dimensional contingency or co-occurrence tables arise frequently in important applications such as text, web-log and market-basket data analysis. A basic problem in contingency table analysis is co-clustering: simultaneous clustering of the rows and columns. A novel theoretical formulation views the contingency table as an empirical joint probability distribution of two discrete random variables and poses the co-clustering problem as an optimization problem in information theory -- the optimal co-clustering maximizes the mutual information between the clustered random variables subject to constraints on the number of row and column clusters.
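As a small sketch of the objective described above, the function below computes the mutual information retained by a given row/column clustering of a joint distribution; the cluster-assignment inputs are illustrative, and the paper's algorithm searches over such clusterings to maximize this quantity.

```python
# Mutual information I(X_hat; Y_hat) of a co-clustered joint distribution p(x, y).
# Assumes p_xy is a normalized 2-D numpy array and row_cl/col_cl are integer
# numpy arrays assigning each row/column to a cluster label 0..k-1.
import numpy as np

def clustered_mutual_information(p_xy, row_cl, col_cl):
    n_r, n_c = row_cl.max() + 1, col_cl.max() + 1
    q = np.zeros((n_r, n_c))
    # aggregate the joint distribution over row and column clusters
    for i in range(p_xy.shape[0]):
        for j in range(p_xy.shape[1]):
            q[row_cl[i], col_cl[j]] += p_xy[i, j]
    pr = q.sum(axis=1, keepdims=True)   # cluster row marginals
    pc = q.sum(axis=0, keepdims=True)   # cluster column marginals
    nz = q > 0
    return float((q[nz] * np.log(q[nz] / (pr @ pc)[nz])).sum())
```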

Citation Context

...data set consists of approximately 20,000 newsgroup articles collected evenly from 20 different usenet newsgroups. This data set has been used for testing several supervised text classification tasks [1, 19, 14, 6] and unsupervised document clustering tasks [18, 8]. Many of the newsgroups share similar topics and about 4.5% of the documents are cross-posted, making the boundaries between some newsgroups rathe...
