Results 1–9 of 9
Text categorization using compression models
In Proceedings of DCC-00, IEEE Data Compression Conference, Snowbird, US, 2000
Abstract

Cited by 27 (1 self)
Text categorization, or the assignment of natural language texts to predefined categories based on their content, is of growing importance as the volume of information available on the internet continues to overwhelm us. The use of predefined categories implies a “supervised learning” approach to categorization, where already-classified articles—which ...
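The compression-based scheme this abstract describes can be illustrated with an off-the-shelf compressor: compress each category's training text with and without the new document appended, and assign the category whose model accounts for the document most cheaply. The sketch below uses zlib as a stand-in for the PPM-style models the paper actually evaluates; the corpora and labels are invented for illustration.

```python
import zlib

def compressed_size(text):
    return len(zlib.compress(text.encode("utf-8"), 9))

def classify(doc, category_corpora):
    # The extra bytes needed to encode corpus+doc over corpus alone
    # approximate the document's cross-entropy under that category's model.
    def extra_bytes(corpus):
        return compressed_size(corpus + " " + doc) - compressed_size(corpus)
    return min(category_corpora, key=lambda c: extra_bytes(category_corpora[c]))

# Invented toy corpora standing in for per-category training text.
corpora = {
    "weather": "rain sun cloud storm wind forecast temperature humidity " * 20,
    "sports": "goal team match score player league season win defeat " * 20,
}
print(classify("the forecast says rain and storm tomorrow", corpora))
```

The vocabulary overlap with the "weather" corpus lets the compressor reuse matches, so the extra cost for that category is lower.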
Feature Reduction for Document Clustering and Classification, 2000
Abstract

Cited by 13 (2 self)
Often users receive search results which contain a wide range of documents, only some of which are relevant to their information needs. To address this problem, ever more systems not only locate information for users, but also organise that information on their behalf. We look at two main automatic approaches to information organisation: interactive clustering of search results and precategorising documents to provide hierarchical browsing structures. To be feasible in real-world applications, both of these approaches require accurate yet efficient algorithms. Yet both suffer from the curse of dimensionality: documents are typically represented by hundreds or thousands of words (features) which must be analysed and processed during clustering or classification. In this paper, we discuss feature reduction techniques and their application to document clustering and classification, showing that feature reduction improves efficiency as well as accuracy. We validate these algorithms using human relevance assignments and categorisation.
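One minimal form of the feature reduction discussed above is document-frequency pruning: drop terms that occur in too few or too many documents before clustering or classification. This is just one of many reduction techniques and not necessarily the one the paper evaluates; the thresholds below are illustrative.

```python
from collections import Counter

def reduce_features(docs, min_df=2, max_df_ratio=0.5):
    # Count in how many documents each term occurs (document frequency).
    df = Counter()
    for doc in docs:
        df.update(set(doc.split()))
    n = len(docs)
    # Keep terms that are neither too rare nor near-ubiquitous.
    return {t for t, c in df.items() if c >= min_df and c / n <= max_df_ratio}

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
    "a bird flew over the log",
]
print(sorted(reduce_features(docs)))
```

Here "the" is pruned as near-ubiquitous and the singleton terms as too rare, shrinking the feature space before any clustering or classification step runs.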
Naive Bayes for regression
Machine Learning, 2000
Abstract

Cited by 13 (0 self)
Despite its simplicity, the naive Bayes learning scheme performs well on most classification tasks, and is often significantly more accurate than more sophisticated methods. Although the probability estimates that it produces can be inaccurate, it often assigns maximum probability to the correct class. This suggests that its good performance might be restricted to situations where the output is categorical. It is therefore interesting to see how it performs in domains where the predicted value is numeric, because in this case, predictions are more sensitive to inaccurate probability estimates. This paper shows how to apply the naive Bayes methodology to numeric prediction (i.e., regression) tasks by modeling the probability distribution of the target value with kernel density estimators, and compares it to linear regression, locally weighted linear regression, and a method that produces “model trees”—decision trees with linear regression functions at the leaves. Although we exhibit an artificial dataset for which naive Bayes is the method of choice, on real-world datasets it is almost uniformly worse than locally weighted linear regression and model trees. The comparison with linear regression depends on the error measure: for one measure naive Bayes performs similarly, while for another it is worse. We also show that standard naive Bayes applied to regression problems by discretizing the target value performs similarly badly. We then present empirical evidence that isolates naive Bayes’ independence assumption as the culprit for its poor performance in the regression setting. These results indicate that the simplistic statistical assumption that naive Bayes makes is indeed more restrictive for regression than for classification.
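The kernel-density formulation can be sketched as follows: estimate p(y) and each conditional p(x_j | y) with Gaussian kernels over the training data, and return the candidate target that maximizes p(y) ∏_j p(x_j | y). The bandwidths and the grid search below are illustrative simplifications, not the paper's exact procedure.

```python
import math

def gauss(u, h):
    # Gaussian kernel with bandwidth h.
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2 * math.pi))

def nb_regress(X, y, x_new, h_y=0.5, h_x=0.5):
    n = len(y)
    def score(y_cand):
        # Kernel weights of the training targets around the candidate target.
        weights = [gauss(y_cand - yi, h_y) for yi in y]
        wsum = sum(weights) or 1e-12
        s = wsum / n  # kernel density estimate of p(y')
        for j in range(len(x_new)):
            # p(x_j | y'): kernel density of feature j, weighted by target proximity.
            cond = sum(w * gauss(x_new[j] - X[i][j], h_x)
                       for i, w in enumerate(weights)) / wsum
            s *= cond
        return s
    y_min, y_max = min(y), max(y)
    grid = [y_min + k * (y_max - y_min) / 200 for k in range(201)]
    return max(grid, key=score)

# Toy data where the target equals the single feature.
X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [0.0, 1.0, 2.0, 3.0, 4.0]
print(round(nb_regress(X, y, [2.0]), 2))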
Text Classification Beyond the Bag-of-Words Representation
Abstract

Cited by 4 (0 self)
Most known text classifiers represent documents as bags of words and process documents as a whole. In practice, e.g., when handling long documents with sections, it would be useful to have algorithms that can seamlessly switch from labeling documents to labeling regions, or individual tokens, and be able to account for the sequence and context of words at least in a limited way. We discuss how both of these issues can be addressed by treating text classification as an instance of a generalized token labeling problem which we define. We show that hidden Markov models (HMMs) cover this range of labeling problems and point out that in our setting an HMM is in fact a direct generalization of the naive Bayes classifier and allows limited context to be treated. We derive algorithms for the associated learning problems that cover the full range from completely labeled data to labels that impose only weak constraints on the possible state sequences. In experiments, we demonstrate the clear advantages of HMMs for classification.
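The claim that naive Bayes is a degenerate HMM can be made concrete: with a single state per class and no transitions, the forward score of a token sequence collapses to a sum of per-token emission log-probabilities. A toy sketch (invented training data; add-one smoothing assumed):

```python
import math
from collections import Counter

def train_emissions(docs_by_class):
    # Per-class token emission probabilities with add-one smoothing.
    vocab = {t for docs in docs_by_class.values() for d in docs for t in d.split()}
    emissions = {}
    for c, docs in docs_by_class.items():
        counts = Counter(t for d in docs for t in d.split())
        total = sum(counts.values()) + len(vocab)
        emissions[c] = {t: (counts[t] + 1) / total for t in vocab}
    return emissions

def log_score(tokens, emis_c):
    # Naive Bayes viewed as an HMM that never leaves its single class state:
    # the sequence score is just the sum of emission log-probabilities.
    return sum(math.log(emis_c[t]) for t in tokens if t in emis_c)

def classify(doc, emissions):
    return max(emissions, key=lambda c: log_score(doc.split(), emissions[c]))

emissions = train_emissions({
    "spam": ["buy cheap pills now", "cheap offer buy now"],
    "ham": ["meeting agenda for monday", "please review the agenda"],
})
print(classify("cheap pills offer", emissions))
```

Adding multiple states per class and a transition matrix turns this scorer into a full HMM that can label regions or individual tokens, which is the generalization the paper develops.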
Using fuzzy clustering to improve naive Bayes classifiers and probabilistic networks
Proceedings of the Ninth IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2000), 2000
Abstract

Cited by 3 (0 self)
Although probabilistic networks and fuzzy clustering may seem to be disparate areas of research, they can both be seen as generalizations of naive Bayes classifiers. If all descriptive attributes are numeric, naive Bayes classifiers often assume an axis-parallel multidimensional normal distribution for each class. Probabilistic networks remove the requirement that the distributions must be axis-parallel by taking covariances into account where this is necessary. Fuzzy clustering tries to find general or axis-parallel distributions to cluster the data. Although it neglects the class information, it can be used to improve the result of the above-mentioned methods by removing the restriction to only one distribution per class.
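The axis-parallel normal assumption mentioned above corresponds to Gaussian naive Bayes: per class, each feature gets its own mean and variance, and covariances are ignored. A minimal sketch (toy data; the variance floor 1e-6 is an illustrative regularizer, not from the paper):

```python
import math

def fit_gnb(X, y):
    # Axis-parallel normal per class: per-feature mean and variance only,
    # ignoring covariances, as the naive Bayes assumption dictates.
    params = {}
    for c in set(y):
        rows = [x for x, yi in zip(X, y) if yi == c]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        varis = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-6
                 for col, m in zip(zip(*rows), means)]
        params[c] = (means, varis, len(rows) / len(X))
    return params

def predict(params, x):
    def loglik(c):
        means, varis, prior = params[c]
        ll = math.log(prior)
        for xi, m, v in zip(x, means, varis):
            ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        return ll
    return max(params, key=loglik)

X = [[1.0, 1.1], [1.2, 0.9], [0.9, 1.0], [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]
y = ["a", "a", "a", "b", "b", "b"]
print(predict(fit_gnb(X, y), [1.1, 1.0]))
```

Replacing the per-feature variances with a full covariance matrix per class gives the probabilistic-network relaxation; allowing several such components per class is the fuzzy-clustering refinement the abstract describes.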
Average-Case Analysis of Classification Algorithms for Boolean Functions and Decision Trees, 2000
Abstract

Cited by 2 (1 self)
We conduct an average-case analysis of the generalization error rate of classification algorithms with finite model classes. Unlike worst-case approaches, we do not rely on bounds that hold for all possible learning problems. Instead, we study the behavior of a learning algorithm for a given problem, taking properties of the problem and the learner into account. The solution depends only on known quantities (e.g., the sample size) and the histogram of error rates in the model class, which we determine for the case that the sought target is a randomly drawn Boolean function. We then discuss how the error histogram can be estimated from a given sample and thus show how the analysis can be applied approximately in the more realistic scenario that the target is unknown. Experiments show that our analysis can predict the behavior of decision tree algorithms fairly accurately even if the error histogram is estimated from a sample.
Learning Conceptual Descriptions of Categories, 1999
Abstract
In this work we propose a model to learn conceptual descriptions of categories from pre-categorized texts. The model is general and parametric, and it captures most of the statistical approaches to classification as well as allowing the definition of more symbolic learning schemes. The algorithm scheme has been instantiated into three different algorithms, which have been implemented and tested on a collection of documents obtained from the Web. As a possible application of the descriptions obtained, classification was done on a test set. Results are somewhat surprising, and stand in contrast with most experiments done in the literature, possibly giving hints about a different research direction.

1 Introduction
1.1 Motivation
The use of statistical analysis often allows the treatment of phenomena whose complexity goes beyond our modeling capabilities. As an example, the theory of chaotic systems shows how some complex phenomena, such as meteorological conditions, are not describable...
Predicting the Generalization Performance of Cross Validatory Model Selection Criteria, 2000
Abstract
We conduct an average-case analysis of the generalization error rate of holdout testing and n-fold cross-validation "wrappers" for model selection. Unlike previous approaches, we do not rely on worst-case bounds that hold for all possible learning problems. Instead, we study the behavior of a learning algorithm with a cross-validation wrapper for a given problem, taking properties of the problem (that can be estimated using the sample) into account. We have to pay for this (and for the efficiency of our solution) by making some approximations. Experiments show that our analysis can nevertheless predict the behavior of cross-validation wrappers fairly accurately.
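The cross-validation wrapper whose behavior is analyzed above can be sketched in a few lines: estimate each candidate learner's error by n-fold cross-validation and select the learner with the lowest estimate. The two toy learners below (a majority-class baseline and a one-feature threshold rule) are illustrative, not from the paper.

```python
import random

def kfold_cv_error(X, y, train_fn, n_folds=5, seed=0):
    # Estimate generalization error by averaging held-out error over n folds.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = train_fn([X[i] for i in train], [y[i] for i in train])
        errors += sum(model(X[i]) != y[i] for i in fold)
    return errors / len(X)

def select_model(X, y, candidates):
    # The "wrapper": pick the candidate learner with the lowest estimated error.
    return min(candidates, key=lambda fn: kfold_cv_error(X, y, fn))

def majority(X_tr, y_tr):
    most = max(set(y_tr), key=y_tr.count)
    return lambda x: most

def threshold(X_tr, y_tr):
    m = sum(x[0] for x in X_tr) / len(X_tr)
    return lambda x: int(x[0] > m)

X = [[float(i)] for i in range(10)]
y = [int(i >= 5) for i in range(10)]
print(select_model(X, y, [majority, threshold]) is threshold)
```

On this separable toy problem the threshold rule's cross-validated error is far below the majority baseline's, so the wrapper selects it; the paper's analysis predicts how reliable such selections are on average.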
Text Classification beyond . . .
Proceedings of the ICML Workshop on Text Learning, 2002
Abstract
Most known text classifiers represent documents as bags of words and process documents as a whole. In practice, e.g., when handling long documents with sections, it would be useful to have algorithms that can seamlessly switch from labeling documents to labeling regions, or individual tokens, and be able to account for the sequence and context of words at least in a limited way. We discuss how both of these issues can be addressed by treating text classification as an instance of a generalized token labeling problem which we define. We show that hidden Markov models (HMMs) cover this range of labeling problems and point out that in our setting an HMM is in fact a direct generalization of the naive Bayes classifier and allows limited context to be treated. We derive algorithms for the associated learning problems that cover the full range from completely labeled data to labels that impose only weak constraints on the possible state sequences. In experiments, we demonstrate the clear advantages of HMMs for classification.