Results 11 - 20
of
259
Towards Language Independent Automated Learning of Text Categorization Models
- IN PROCEEDINGS OF THE 17TH ANNUAL ACM/SIGIR CONFERENCE
, 1994
"... We describe the results of extensivemachine learning experiments on large collections of Reuters' English and German newswires. The goal of these experiments was to automatically discover classification patterns that can be used for assignment of topics to the individual newswires. Our results wi ..."
Abstract
-
Cited by 79 (2 self)
- Add to MetaCart
We describe the results of extensivemachine learning experiments on large collections of Reuters' English and German newswires. The goal of these experiments was to automatically discover classification patterns that can be used for assignment of topics to the individual newswires. Our results with the English newswire collection show a very large gain in performance as compared to published benchmarks, while our initial results with the German newswires appear very promising. We present our methodology, which seems to be insensitive to the language of the document collections, and discuss issues related to the differences in results that wehave obtained for the two collections.
The Power of Decision Tables
- Proceedings of the European Conference on Machine Learning
, 1995
"... . We evaluate the power of decision tables as a hypothesis space for supervised learning algorithms. Decision tables are one of the simplest hypothesis spaces possible, and usually they are easy to understand. Experimental results show that on artificial and real-world domains containing only discre ..."
Abstract
-
Cited by 79 (5 self)
- Add to MetaCart
. We evaluate the power of decision tables as a hypothesis space for supervised learning algorithms. Decision tables are one of the simplest hypothesis spaces possible, and usually they are easy to understand. Experimental results show that on artificial and real-world domains containing only discrete features, IDTM, an algorithm inducing decision tables, can sometimes outperform state-of-the-art algorithms such as C4.5. Surprisingly, performance is quite good on some datasets with continuous features, indicating that many datasets used in machine learning either do not require these features, or that these features have few values. We also describe an incremental method for performing crossvalidation that is applicable to incremental learning algorithms including IDTM. Using incremental cross-validation, it is possible to cross-validate a given dataset and IDTM in time that is linear in the number of instances, the number of features, and the number of label values. The time for incre...
A knowledge-intensive genetic algorithm for supervised learning
, 1993
"... Abstract. Supervised learning in attribute-based spaces is one of the most popular machine learning problems studied and, consequently, has attracted considerable attention of the genetic algorithm community. The fullmemory approach developed here uses the same nigh-level descriptive language that i ..."
Abstract
-
Cited by 75 (1 self)
- Add to MetaCart
Abstract. Supervised learning in attribute-based spaces is one of the most popular machine learning problems studied and, consequently, has attracted considerable attention of the genetic algorithm community. The fullmemory approach developed here uses the same nigh-level descriptive language that is used in rule-based systems. This allows for an easy utilization of inference rules of the well-known inductive learning methodology, which replace the traditional domain-independent operators and make the search task-specific. Moreover, a closer relationship between the underlying task and the processing mechanisms provides a setting for an application of more powerful task-specific heuristics. Initial results obtained with a prototype implementation for the simplest case of single concepts indicate that genetic algorithms can be effectively used to process nigh-level concepts and incorporate task-specific knowledge. The method of abstracting the genetic algorithm to the problem level, described here for the supervised inductive learning, can be also extended to other domains and tasks, since it provides a framework for combining recently popular genetic algorithm methods with traditional problem-solving methodologies. Moreover, in this particular case, it provides a very powerful tool enabling study of the widely accepted but not so well understood inductive learning methodology.
Issues in Stacked Generalization
- Journal of Artificial Intelligence Research
, 1999
"... Stacked generalization is a general method of using a high-level model to combine lower-level models to achieve greater predictive accuracy. In this paper we address two crucial issues which have been considered to be a `black art' in classification tasks ever since the introduction of stacked gener ..."
Abstract
-
Cited by 71 (1 self)
- Add to MetaCart
Stacked generalization is a general method of using a high-level model to combine lower-level models to achieve greater predictive accuracy. In this paper we address two crucial issues which have been considered to be a `black art' in classification tasks ever since the introduction of stacked generalization in 1992 by Wolpert: the type of generalizer that is suitable to derive the higher-level model, and the kind of attributes that should be used as its input. We find that best results are obtained when the higher-level model combines the confidence (and not just the predictions) of the lower-level ones.
Using taxonomy, discriminants, and signatures for navigating in text databases
- In Proceedings of the 23rd VLDB Conference
, 1997
"... We explore how to organize a text database hierarchically to aid better searching and browsing. We propose to exploit the natural hierarchy of topics, or taxonomy, that many corpora,suchas internet directories, digital libraries, and patent databases enjoy. In our system, the user navigates through ..."
Abstract
-
Cited by 67 (4 self)
- Add to MetaCart
We explore how to organize a text database hierarchically to aid better searching and browsing. We propose to exploit the natural hierarchy of topics, or taxonomy, that many corpora,suchas internet directories, digital libraries, and patent databases enjoy. In our system, the user navigates through the query response not as a at unstructured list, but embedded in the familiar taxonomy, and annotated with document signatures computed dynamically with respect to where the user is located at any time. Weshowhowto update such databases with new documents with high speed and accuracy. Weuse techniques from statistical pattern recognition to e ciently separate the feature words or discriminants from the noise words at each node of the taxonomy. Using these, we build a multi-level classi er. At each node, this classi er can ignore the large number of noise words in a document. Thus the classi er has a small model size and is very fast. However, owing to the use of context-sensitive features, the classi er is very accurate. We report on experiences with the Reuters newswire benchmark, the US Patent database, and web document samples from Yahoo!. 1
Relational Instance-Based Learning
- Proceedings of the Thirteenth International Conference on Machine Learning
, 1996
"... A relational instance-based learning algorithm, called Ribl, is motivated and developed in this paper. We argue that instancebased methods o#er solutions to the often unsatisfactory behavior of current inductive logic programming #ILP# approaches in domains with continuous attribute values a ..."
Abstract
-
Cited by 65 (1 self)
- Add to MetaCart
A relational instance-based learning algorithm, called Ribl, is motivated and developed in this paper. We argue that instancebased methods o#er solutions to the often unsatisfactory behavior of current inductive logic programming #ILP# approaches in domains with continuous attribute values and in domains with noisy attributes and#or examples. Three research issues that emerge when a propositional instance-based learner is adapted to a #rst-order representation are identi#ed: #1# construction of cases from the knowledge base, #2# computation of similaritybetween arbitrarily complex cases, and #3# estimation of the relevance of predicates and attributes. Solutions to these issues are developed. Empirical results indicate that Ribl is able to achieve high classi#cation accuracy in a variety of domains. to appear in: Proc. 13th International Conference on Machine Learning, L. Saitta #ed.#, Morgan Kaufmann, 1996 1 Introduction The #eld of Inductive Logic Programming ...
Regression Modeling in Back-Propagation and Projection Pursuit Learning
, 1994
"... We studied and compared two types of connectionist learning methods for model-free regression problems in this paper. One is the popular back-propagation learning (BPL) well known in the artificial neural networks literature; the other is the projection pursuit learning (PPL) emerged in recent years ..."
Abstract
-
Cited by 61 (1 self)
- Add to MetaCart
We studied and compared two types of connectionist learning methods for model-free regression problems in this paper. One is the popular back-propagation learning (BPL) well known in the artificial neural networks literature; the other is the projection pursuit learning (PPL) emerged in recent years in the statistical estimation literature. Both the BPL and the PPL are based on projections of the data in directions determined from interconnection weights. However, unlike the use of fixed nonlinear activations (usually sigmoidal) for the hidden neurons in BPL, the PPL systematically approximates the unknown nonlinear activations. Moreover, the BPL estimates all the weights simultaneously at each iteration, while the PPL estimates the weights cyclically (neuron-by-neuron and layer-by-layer) at each iteration. Although the BPL and the PPL have comparable training speed when based on a Gauss-Newton optimization algorithm, the PPL proves more parsimonious in that the PPL requires a fewer hi...
Artificial Intelligence and Intrusion Detection: Current and Future Directions
- In Proceedings of the 17th National Computer Security Conference
, 1994
"... Intrusion Detection systems (IDSs) have previously been built by hand. These systems have difficulty successfully classifying intruders, and require a significant amount of computational overhead making it difficult to create robust real-time IDS systems. Artificial Intelligence techniques can reduc ..."
Abstract
-
Cited by 59 (0 self)
- Add to MetaCart
Intrusion Detection systems (IDSs) have previously been built by hand. These systems have difficulty successfully classifying intruders, and require a significant amount of computational overhead making it difficult to create robust real-time IDS systems. Artificial Intelligence techniques can reduce the human effort required to build these systems and can improve their performance. Learning and induction are used to improve the performance of search problems, while clustering has been used for data analysis and reduction. AI has recently been used in Intrusion Detection (ID) for anomaly detection, data reduction and induction, or discovery, of rules explaining audit data. We survey uses of artificial intelligence methods in ID, and present an example using feature selection to improve the classification of network connections. The network connection classification problem is related to ID since intruders can create "private" communications services undetectable by normal means. We als...
Linear and Order Statistics Combiners for Pattern Classification
- Combining Artificial Neural Nets
, 1999
"... Several researchers have experimentally shown that substantial improvements can be obtained in difficult pattern recognition problems by combining or integrating the outputs of multiple classifiers. This chapter provides an analytical framework to quantify the improvements in classification resul ..."
Abstract
-
Cited by 56 (6 self)
- Add to MetaCart
Several researchers have experimentally shown that substantial improvements can be obtained in difficult pattern recognition problems by combining or integrating the outputs of multiple classifiers. This chapter provides an analytical framework to quantify the improvements in classification results due to combining. The results apply to both linear combiners and order statistics combiners. We first show that to a first order approximation, the error rate obtained over and above the Bayes error rate, is directly proportional to the variance of the actual decision boundaries around the Bayes optimum boundary. Combining classifiers in output space reduces this variance, and hence reduces the "added" error. If N unbiased classifiers are combined by simple averaging, the added error rate can be reduced by a factor of N if the individual errors in approximating the decision boundaries are uncorrelated. Expressions are then derived for linear combiners which are biased or correlated, and the effect of output correlations on ensemble performance is quantified. For order statistics based non-linear combiners, we derive expressions that indicate how much the median, the maximum and in general the ith order statistic can improve classifier performance. The analysis presented here facilitates the understanding of the relationships among error rates, classifier boundary distributions, and combining in output space. Experimental results on several public domain data sets are provided to illustrate the benefits of combining and to support the analytical results.

