Results 1 - 10 of 2,590
N-gram-based text categorization

by William B. Cavnar, John M. Trenkle - In Proc. of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, 1994
"... Text categorization is a fundamental task in document processing, allowing the automated handling of enormous streams of documents in electronic form. One difficulty in handling some classes of documents is the presence of different kinds of textual errors, such as spelling and grammatical errors in email, and character recognition errors in documents that come through OCR. Text categorization must work reliably on all input, and thus must tolerate some level of these kinds of problems. We describe here an N-gram-based approach to text categorization that is tolerant of textual errors. The system ..."
Cited by 445 (0 self)

Machine Learning in Automated Text Categorization

by Fabrizio Sebastiani - ACM Computing Surveys, 2002
"... The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this p ..."
Cited by 1734 (22 self)

An evaluation of statistical approaches to text categorization

by Yiming Yang - Journal of Information Retrieval, 1999
"... This paper focuses on a comparative evaluation of a wide range of text categorization methods, including previously published results on the Reuters corpus and new results of additional experiments. A controlled study using three classifiers, kNN, LLSF and WORD, was conducted to examine th ..."
Cited by 663 (22 self)

Inductive learning algorithms and representations for text categorization

by Susan Dumais, John Platt, Mehran Sahami, David Heckerman - in Proceedings of the International Conference on Information and Knowledge Management, 1998
"... Text categorization, the assignment of natural language texts to one or more predefined categories based on their content, is an important component in many information organization and management tasks. We compare the effectiveness of five different automatic learning algorithms for text ..."
Cited by 652 (8 self)

A Re-Examination of Text Categorization Methods

by Yiming Yang, Xin Liu, 1999
"... This paper reports a controlled study with statistical significance tests on five text categorization methods: the Support Vector Machines (SVM), a k-Nearest Neighbor (kNN) classifier, a neural network (NNet) approach, the Linear Least-squares Fit (LLSF) mapping and a Naive Bayes (NB) classifier. We f ..."
Cited by 853 (24 self)

A Comparative Study on Feature Selection in Text Categorization

by Yiming Yang, Jan O. Pedersen, 1997
"... This paper is a comparative study of feature selection methods in statistical learning of text categorization. The focus is on aggressive dimensionality reduction. Five methods were evaluated, including term selection based on document frequency (DF), information gain (IG), mutual information (MI), ..."
Cited by 1320 (15 self)

BoosTexter: A Boosting-based System for Text Categorization

by Robert E. Schapire , Yoram Singer
"... This work focuses on algorithms which learn from examples to perform multiclass text and speech categorization tasks. Our approach is based on a new and improved family of boosting algorithms. We describe in detail an implementation, called BoosTexter, of the new boosting algorithms for text catego ..."
Cited by 667 (20 self)

RCV1: A new benchmark collection for text categorization research

by David D. Lewis, Yiming Yang, Tony G. Rose, Fan Li - Journal of Machine Learning Research, 2004
"... Reuters Corpus Volume I (RCV1) is an archive of over 800,000 manually categorized newswire stories recently made available by Reuters, Ltd. for research purposes. Use of this data for research on text categorization requires a detailed understanding of the real world constraints under which the data ..."
Cited by 663 (11 self)

Text Categorization with Support Vector Machines: Learning with Many Relevant Features

by Thorsten Joachims, 1998
"... This paper explores the use of Support Vector Machines (SVMs) for learning text classifiers from examples. It analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task. Empirical results support the theoretical findings. SVMs achieve substan ..."
Cited by 2303 (9 self)

A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization

by Thorsten Joachims, 1997
"... The Rocchio relevance feedback algorithm is one of the most popular and widely applied learning methods from information retrieval. Here, a probabilistic analysis of this algorithm is presented in a text categorization framework. The analysis gives theoretical insight into the heuristics used in the ..."
Cited by 456 (1 self)