## Text Categorisation: A Survey (1999)

Citations: | 65 - 0 self |

### BibTeX

@TECHREPORT{Aas99textcategorisation:,
    author = {Kjersti Aas and Line Eikvil},
    title = {Text Categorisation: A Survey},
    institution = {},
    year = {1999}
}

### Citations

5438 |
C4.5: Programs for Machine Learning
- Quinlan
- 1993
Citation Context ... in the new training set, and the tree that performs this task with the lowest overall error rate is declared the winner. 3.4.3 Other algorithms Two other well-known decision tree algorithms are C4.5 [22] and CHAID [16]. The differences between these algorithms and CART will be briefly described in this section. C4.5 is the most recent version of the decision-tree algorithm that Quinlan has been evolvin... |

4457 | Classification and Regression Trees - Breiman, Friedman, et al. - 1984 |

4178 | Pattern Classification and Scene Analysis - Duda, Hart - 1973 |

3423 |
Introduction to Modern Information Retrieval
- Salton, McGill
- 1983
Citation Context ..., walked, and walking. The Porter stemmer [21] is a well-known algorithm for this task. 2.2 Indexing The perhaps most commonly used document representation is the so-called vector space model (SMART) [24]. In the vector space model, documents are represented by vectors of words. Usually, one has a collection of documents which is represented by a word-by-document matrix A, where each entry represents t... |

3025 | Indexing by latent semantic analysis
- Deerwester, Dumais, et al.
- 1990
Citation Context ...parameterisation is the process of constructing new features as combinations or transformations of the original features. In this section we describe one such approach, Latent Semantic Indexing (LSI) [2, 7]. LSI is based on the assumption that there is some underlying or latent structure in the pattern of word usage across documents, and that statistical techniques can be used to estimate this structure... |

2765 | Bagging predictors
- Breiman
- 1996
Citation Context ...tween the two types is the way the different versions of the training set are created. In what follows a closer description of the two types of algorithms is given. 3.6.1 The Bagging Algorithm Bagging [4] takes as input a classification algorithm $f(\cdot)$ and a training set $T$ and returns a set of classifiers $f^{*}(\cdot) = \{f_1(\cdot), \ldots, f_R(\cdot)\}$. Here $f_r(\cdot)$ is a classifier that is learned from a bootstrap sample $T_r$ of... |

1976 | An algorithm for suffix stripping - Porter - 1980 |

1879 | Text categorization with support vector machines: learning with many relevant features. European Conference on Machine Learning
- Joachims
- 1998
Citation Context ... how large this probability must be (specified by a threshold) for the document to be assigned to the category. By adjusting this threshold, one can achieve different levels of recall and precision. In [13] the results for different thresholds are combined using interpolation. 4.2 Multi-class and multi-label classification To measure the performance of a classifier that produces a ranked list of categories... |

1774 | Experiments with a new boosting algorithm
- Freund, Schapire
- 1996
Citation Context ...d), $\ldots$, $f_R(d)$. The result of the voting classifier is the class that obtains the most votes from the single classifiers when applied to $d$: $f^{*}(d) = \arg\max_{y} \sum_{r:\, f_r(d) = y} 1$ (3.23) 3.6.2 Boosting Boosting [11] encompasses a family of methods. Like bagging, these methods choose a training set of size $N$ for classifier $f_r$ by randomly selecting with replacement examples from the original training set. Unlike ba... |

1693 | Term weighting approaches in automatic text retrieval
- Salton, Buckley
- 1988
Citation Context ... collection for which the word occurs at least once. $a_{ik} = f_{ik} \log\left(\frac{N}{n_i}\right)$ (2.4) tfc-weighting The tf×idf-weighting does not take into account that documents may be of different lengths. The tfc-weighting [25] is similar to the tf×idf-weighting except for the fact that length normalisation is used as part of the word weighting formula. $a_{ik} = \frac{f_{ik} \log\left(\frac{N}{n_i}\right)}{\sqrt{\sum_{j=1}^{M} \left[ f_{jk} \log\left(\frac{N}{n_j}\right) \right]^{2}}}$ (2.5) ltc-weighting A s... |

1320 | Additive logistic regression: a statistical view of boosting - Friedman, Hastie, et al. |

918 |
Relevance feedback in information retrieval
- Rocchio
- 1971
Citation Context ...ion, including regression models [30], nearest neighbour classifiers [30], decision trees [19], Bayesian classifiers [19], Support Vector Machines [15], rule learning algorithms [6], relevance feedback [23], voted classification [27], and neural networks [28]. In this report we give a survey of the state-of-the-art in text categorisation. To be able to measure progress in this field, it is important to use a... |

574 | Using linear algebra for intelligent information retrieval
- Berry, Dumais, et al.
- 1996
Citation Context ...parameterisation is the process of constructing new features as combinations or transformations of the original features. In this section we describe one such approach, Latent Semantic Indexing (LSI) [2, 7]. LSI is based on the assumption that there is some underlying or latent structure in the pattern of word usage across documents, and that statistical techniques can be used to estimate this structure... |

557 | Inductive learning algorithms and representations for text categorization
- Dumais, Platt, et al.
- 1998
Citation Context ...In this section we describe previous work on text categorisation where the Reuters-21578 collection has been used to evaluate the methods. The papers that are described are the works of Dumais et al. [10], Joachims [13], Schapire et al. [26], Weiss et al. [27] and Yang [29]. Tables 5.1 and 5.2 summarise the papers. All authors have used the ModApte split. The first table contains the number of training an... |

554 | An Evaluation of Statistical Approaches to Text Categorization
- Yang
- 1999
Citation Context ... an approach called interpolated (Norwegian Computing Center, P.B. 114 Blindern, N-0314 Oslo, Norway. Tel.: (+47) 22 85 25 00, Fax: (+47) 22 69 76 60. Text Categorisation: A Survey, 23) 11-point average precision [29] may be used. In this approach the recall for one specific document is defined to be: $\text{recall} = \frac{\text{Number of categories found that are correct}}{\text{Total number of true categories}}$ (4.7) For each of 11 values 0.0... |

507 | A sequential algorithm for training text classifiers
- Lewis, Gale
- 1994
Citation Context ...eak-even point of the system. The break-even point has been commonly used in text categorisation evaluations. F-measure Another evaluation criterion that combines recall and precision is the F-measure [18]: $F_{\beta} = \frac{(\beta^{2} + 1) \cdot \text{precision} \cdot \text{recall}}{\beta^{2} \cdot \text{precision} + \text{recall}}$ (4.6) where $\beta$ is a parameter allowing different weighting of recall and precision. Interpolation For some methods, the category assignments are made by... |

379 | A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization
- Joachims
- 1997
Citation Context ...class $c_j$ is computed as the average vector over all training document vectors that belong to class $c_j$. This means that learning is very fast for this method. 3.2 Naive Bayes The naive Bayes classifier [14] is constructed by using the training data to estimate the probability of each class given the document feature values of a new instance. We use Bayes' theorem to estimate the probabilities: $P(c_j \mid d) =$... |

292 | A comparison of two learning algorithms for text categorization
- Lewis, Ringuette
- 1994
Citation Context ...ion. A number of statistical classification and machine learning techniques have been applied to text categorisation, including regression models [30], nearest neighbour classifiers [30], decision trees [19], Bayesian classifiers [19], Support Vector Machines [15], rule learning algorithms [6], relevance feedback [23], voted classification [27], and neural networks [28]. In this report we give a survey of th... |

264 | Context-sensitive learning methods for text categorization
- Cohen, Singer
- 1996
Citation Context ...lied to text categorisation, including regression models [30], nearest neighbour classifiers [30], decision trees [19], Bayesian classifiers [19], Support Vector Machines [15], rule learning algorithms [6], relevance feedback [23], voted classification [27], and neural networks [28]. In this report we give a survey of the state-of-the-art in text categorisation. To be able to measure progress in this field,... |

245 |
Improving the retrieval of information from external sources
- Dumais
- 1991
Citation Context ...ik $= \frac{\log(f_{ik} + 1.0) \log\left(\frac{N}{n_i}\right)}{\sqrt{\sum_{j=1}^{M} \left[ \log(f_{jk} + 1.0) \log\left(\frac{N}{n_j}\right) \right]^{2}}}$ (2.6) Entropy weighting Entropy-weighting is based on information theoretic ideas and is the most sophisticated weighting scheme. In [9] it turned out to be the most effective scheme in comparison with 6 others. Averaged over five test collections, it was for instance 40 % more effective than word frequency weighting. In the entropy-weigh... |

204 |
An exploratory technique for investigating large quantities of categorical data
- Kass
- 1980
Citation Context ...ining set, and the tree that performs this task with the lowest overall error rate is declared the winner. 3.4.3 Other algorithms Two other well-known decision tree algorithms are C4.5 [22] and CHAID [16]. The differences between these algorithms and CART will be briefly described in this section. C4.5 is the most recent version of the decision-tree algorithm that Quinlan has been evolving and refining f... |

194 |
Classification and Regression Trees
- Breiman, Friedman, et al.
- 1984
Citation Context ...ecision tree to determine whether the document is relevant to the user or not. The decision tree is constructed from the training samples, and one of the most popular approaches for this task is the CART [3] algorithm that will be described here. 3.4.1 Creating the tree (CART) CART builds a binary decision tree by splitting the set of training vectors at each node according to a function of one single ve... |

175 | Automatic Query Expansion Using SMART: TREC 3
- Buckley, Salton, et al.
- 1994
Citation Context ...weighting except for the fact that length normalisation is used as part of the word weighting formula. $a_{ik} = \frac{f_{ik} \log\left(\frac{N}{n_i}\right)}{\sqrt{\sum_{j=1}^{M} \left[ f_{jk} \log\left(\frac{N}{n_j}\right) \right]^{2}}}$ (2.5) ltc-weighting A slightly different approach [5] uses the logarithm of the word frequency instead of the raw word frequency, thus reducing the effects of large differences in frequencies. $a_{ik} = \frac{\log(f_{ik} + 1.0) \log\left(\frac{N}{n_i}\right)}{\sqrt{\sum_{j=1}^{M} \left[ \log(f_{jk} + 1.0) \log\left(\frac{N}{n \ldots}\right) \right.}}$ |

163 | A Neural Network Approach to Topic Spotting
- Wiener
- 1995
Citation Context ...bour classifiers [30], decision trees [19], Bayesian classifiers [19], Support Vector Machines [15], rule learning algorithms [6], relevance feedback [23], voted classification [27], and neural networks [28]. In this report we give a survey of the state-of-the-art in text categorisation. To be able to measure progress in this field, it is important to use a standardised collection of documents for analysis a... |

70 | SVDPACKC (Version 1.0) User's Guide
- Berry, Do, et al.
- 1993
Citation Context ... word-document matrix, giving the best "reduced-dimension" approximation to this matrix. To perform the singular value decomposition, we used the Single-Vector Lanczos Method from the SVDPACKC library [1]. The number of calculated singular values (dimensions) was 200. 6.3 Methods The kNN method is a very simple approach that has previously shown very good performance on text categorisation tasks (See ... |

61 |
Feature selection in statistical learning of text categorization
- Yang, Pedersen
- 1997
Citation Context ...n developing technologies for automatic text categorisation. A number of statistical classification and machine learning techniques have been applied to text categorisation, including regression models [30], nearest neighbour classifiers [30], decision trees [19], Bayesian classifiers [19], Support Vector Machines [15], rule learning algorithms [6], relevance feedback [23], voted classification [27], and n... |

28 | Information management tools for updating an SVD-encoded indexing scheme
- O'Brien
- 1994
Citation Context ...already are in the training set. In this case, the new word usage data may potentially be lost or misrepresented. A third method, SVD-updating, that deals with this problem has recently been developed [20]. However, SVD-updating requires slightly more time and memory than the folding-in approach, meaning that neither approach appears to be uniformly superior over the other. ... |

6 |
Pattern Classification and Scene Analysis
- Duda, Hart
- 1973
Citation Context ... 3.3 K-nearest neighbour To classify an unknown document vector $d$, the k-nearest neighbour (kNN) algorithm [8] ranks the document's neighbours among the training document vectors, and uses the class labels of the $k$ most similar neighbours to predict the class of the input document. The classes of these neighbo... |

5 |
Automatic Text Categorization Using Support Vector
- Kwok
- 1998
Citation Context ...is built. 3.5 Support Vector Machines Support Vector Machines (SVMs) have shown to yield good generalisation performance on a wide variety of classification problems, most recently text categorisation [15, 17]. The SVM integrates dimension reduction and classification. It is only applicable for binary classification tasks, meaning that, using this method text categorisation has to be treated as a series of d... |

1 |
BoosTexter: A System for Multi-Label Text Categorization
- Schapire, Singer
- 1998
Citation Context ... In addition to that, it doesn't handle cases where a document may belong to more than one class. In the following section we present an extension of the original algorithm, the AdaBoost.MH algorithm [26], that can efficiently handle multi-label problems. AdaBoost.MH Let the weight of sample $d_i$ and label $c_k$ in iteration $r$ be $p_{ikr}$. Initially, all weights are equal, i.e. $p_{ik1} = 1/N$ for all samples $d_i$ and ... |

1 |
Maximizing Text-Mining Performance, Will be appearing
- Weiss, Apte, et al.
Citation Context ...models [30], nearest neighbour classifiers [30], decision trees [19], Bayesian classifiers [19], Support Vector Machines [15], rule learning algorithms [6], relevance feedback [23], voted classification [27], and neural networks [28]. In this report we give a survey of the state-of-the-art in text categorisation. To be able to measure progress in this field, it is important to use a standardised collection o... |

1 |
An Algorithm for Suffix
- Porter
- 1980
Citation Context ...d stemming we mean the process of suffix removal to generate word stems. This is done to group words that have the same conceptual meaning, such as walk, walker, walked, and walking. The Porter stemmer [21] is a well-known algorithm for this task. 2.2 Indexing The perhaps most commonly used document representation is the so-called vector space model (SMART) [24]. In the vector space model, documents are... |
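
The word-weighting formulas (2.4) and (2.5) quoted in the Salton and Buckley citation context can be illustrated with a small sketch. This is not code from the survey: the function names `tfidf_weights` and `tfc_weights` and the list-of-lists matrix layout are assumptions made for illustration, and the handling of words that occur in no document (weight 0) is a choice the excerpts do not specify.

```python
import math


def tfidf_weights(freqs):
    """Equation (2.4): a_ik = f_ik * log(N / n_i).

    freqs[i][k] is the raw frequency of word i in document k
    (hypothetical layout: M words by N documents).
    """
    n_docs = len(freqs[0])
    weights = []
    for row in freqs:
        # n_i: number of documents in which word i occurs at least once
        n_i = sum(1 for f in row if f > 0)
        # Assumption: a word that never occurs gets weight 0
        idf = math.log(n_docs / n_i) if n_i else 0.0
        weights.append([f * idf for f in row])
    return weights


def tfc_weights(freqs):
    """Equation (2.5): tf-idf weights with length normalisation per document."""
    a = tfidf_weights(freqs)
    n_words, n_docs = len(a), len(a[0])
    # Euclidean norm of each document column of tf-idf weights
    norms = [
        math.sqrt(sum(a[j][k] ** 2 for j in range(n_words)))
        for k in range(n_docs)
    ]
    return [
        [a[i][k] / norms[k] if norms[k] else 0.0 for k in range(n_docs)]
        for i in range(n_words)
    ]


if __name__ == "__main__":
    # Three words (rows) in three documents (columns)
    counts = [[2, 0, 1], [1, 1, 1], [0, 3, 0]]
    for row in tfc_weights(counts):
        print([round(w, 3) for w in row])
```

A word that occurs in every document (the second row above) receives weight zero under both schemes, since log(N / n_i) = 0; after tfc normalisation every non-empty document vector has unit length, which is what removes the document-length effect the excerpt mentions.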