Hypothesis Selection and Testing by the MDL Principle
The Computer Journal, 1998
Cited by 47 (2 self)
ses where the variance is known or taken as a parameter. 1. INTRODUCTION Although the term 'hypothesis' in statistics is synonymous with that of a probability 'model' as an explanation of data, hypothesis testing is not quite the same problem as model selection. This is because usually a particular hypothesis, called the 'null hypothesis', has already been selected as a favorite model and it will be abandoned in favor of another model only when it clearly fails to explain the currently available data. In model selection, by contrast, all the models considered are regarded on the same footing and the objective is simply to pick the one that best explains the data. For the Bayesians certain models may be favored in terms of a prior probability, but in the minimum description length (MDL) approach to be outlined below, prior knowledge of any kind is to be used in selecting the tentative models, which in the end, unlike in the Bayesians' case, can and will be fitted to data
A Data Mining Framework for Constructing Features and Models for Intrusion Detection Systems
1999
Cited by 38 (7 self)
Intrusion detection is an essential component of critical infrastructure protection mechanisms. The traditional pure "knowledge engineering" process of building Intrusion Detection Systems (IDSs) is very slow, expensive, and error-prone. Current IDSs thus have limited extensibility in the face of changed or upgraded network configurations, and poor adaptability in the face of new attack methods. This thesis describes a novel framework, MADAM ID, for Mining Audit Data for Automated Models for Intrusion Detection. Classification rules are inductively learned from audit records and used as intrusion detection models. A critical requirement for the rules to be effective detection models is that an appropriate set of features first needs to be constructed and included in the audit records. A key contribution of the thesis is thus in automatic "feature construction". Using MADAM ID, raw ...
Is random model better? on its accuracy and efficiency
In Proceedings of Third IEEE International Conference on Data Mining (ICDM-2003), 2003
Cited by 19 (9 self)
Inductive learning searches for an optimal hypothesis that minimizes a given loss function. It is usually assumed that the simplest hypothesis that fits the data is the best approximation to an optimal hypothesis. Since finding the simplest hypothesis is NP-hard for most representations, we generally employ various heuristics to search for its closest match. Computing these heuristics incurs significant cost, making learning inefficient and unscalable for large datasets. At the same time, it is still questionable whether the simplest hypothesis is indeed the closest approximation to the optimal model. The recent success of combining multiple models, such as bagging, boosting and meta-learning, has greatly improved on the accuracy of the simplest hypothesis, providing a strong argument against the optimality of the simplest hypothesis. However, computing these combined hypotheses incurs significantly higher cost. In this paper, we first argue that as long as the error of a hypothesis on each example is within a range dictated by a given loss function, it can still be optimal. Contrary to common beliefs, we propose a completely random decision tree algorithm that achieves much higher accuracy than the single best hypothesis and is comparable to boosted or bagged multiple best hypotheses. The advantage of multiple random trees is their training efficiency as well as their minimal memory requirement.
Understanding the crucial differences between classification and discovery of association rules – a position paper
ACM SIGKDD Explorations, 2000
Cited by 16 (5 self)
The goal of this position paper is to contribute to a clear understanding of the profound differences between the association-rule discovery and the classification tasks. We argue that the classification task can be considered an ill-defined, nondeterministic task, which is unavoidable given the fact that it involves prediction; while the standard association task can be considered a well-defined, deterministic, relatively simple task, which does not involve prediction in the same sense as the classification task does.
Process-Oriented Estimation of Generalization Error
In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, 1999
Cited by 13 (2 self)
Methods to avoid overfitting fall into two broad categories: data-oriented (using separate data for validation) and representation-oriented (penalizing complexity in the model). Both have limitations that are hard to overcome. We argue that fully adequate model evaluation is only possible if the search process by which models are obtained is also taken into account. To this end, we recently proposed a method for process-oriented evaluation (POE), and successfully applied it to rule induction [Domingos, 1998b]. However, for the sake of simplicity this treatment made a number of rather artificial assumptions. In this paper the assumptions are removed, and a simple formula for error estimation is obtained. Empirical trials show the new, better-founded form of POE to be as accurate as the previous one, while further reducing theory sizes. 1 Introduction Overfitting avoidance is a central problem in machine learning. If a learner is sufficiently powerful, whatever repre...
Genetic Programming for Knowledge Discovery in Chest Pain Diagnosis
IEEE Engineering in Medicine and Biology Magazine 19(4), 2000
Cited by 13 (4 self)
This work aims at discovering classification rules for diagnosing certain pathologies. These rules are capable of discriminating among 12 different pathologies whose main symptom is chest pain. To discover these rules we have used genetic programming as well as some concepts of data mining, with emphasis on the discovery of comprehensible knowledge. The fitness function used combines a measure of rule comprehensibility with two indicators commonly used in the medical domain: sensitivity and specificity. Results regarding the predictive accuracy of the discovered rule set as a whole and the predictive accuracy of individual rules are presented and compared to other approaches.
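As a rough illustration of the kind of fitness function this abstract describes, the sketch below computes sensitivity and specificity from a rule's predictions and multiplies them by a simplicity term. The abstract does not say how the three quantities are combined, so the product form, the `simplicity` definition, and the names `rule_fitness`, `n_conditions`, and `max_conditions` are all assumptions made for this example.

```python
def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for boolean predictions and labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    # Guard against classes that never occur in the sample.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def rule_fitness(predicted, actual, n_conditions, max_conditions):
    """Hypothetical fitness: sensitivity * specificity * simplicity.

    The simplicity term is an assumed stand-in for a comprehensibility
    measure; a rule with a single condition scores 1.0.
    """
    sensitivity, specificity = sensitivity_specificity(predicted, actual)
    simplicity = (max_conditions - n_conditions + 1) / max_conditions
    return sensitivity * specificity * simplicity


# Toy data: the rule fires on three cases, two of which are true positives.
predicted = [True, True, False, False, True]
actual = [True, False, False, False, True]
fitness = rule_fitness(predicted, actual, n_conditions=2, max_conditions=4)
print(fitness)
```

A multiplicative form means a rule scores well only if it does well on every term, which is one plausible reading of "combines"; a weighted sum would be an equally valid assumption.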
Automatic Bias Learning: An Inquiry into the Inductive Basis of Induction
1999
Cited by 9 (5 self)
This thesis combines an epistemological concern about induction with a computational exploration of inductive mechanisms. It aims to investigate how inductive performance could be improved by using induction to select appropriate generalisation procedures. The thesis revolves around a meta-learning system, called designed to investigate how inductive performances could be improved by using induction to select appropriate generalisation procedures. The performance of is discussed against the background of epistemological issues concerning induction, such as the role of theoretical vocabularies and the value of simplicity.
A Compact and Accurate Model for Classification
IEEE Transactions on Knowledge and Data Engineering, 2004
Cited by 5 (3 self)
We describe and evaluate an information-theoretic algorithm for data-driven induction of classification models based on a minimal subset of available features. The relationship between input (predictive) features and the target (classification) attribute is modeled by a tree-like structure termed an information network (IN). Unlike other decision-tree models, the information network uses the same input attribute across the nodes of a given layer (level). The input attributes are selected incrementally by the algorithm to maximize a global decrease in the conditional entropy of the target attribute. We use the prepruning approach: when no attribute causes a statistically significant decrease in the entropy, the network construction is stopped. The algorithm is shown empirically to produce much more compact models than other methods of decision-tree learning, while preserving nearly the same level of classification accuracy.
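The attribute-selection step mentioned in this abstract (pick the input attribute that most reduces the conditional entropy of the target) can be sketched as follows. This is only a single selection step on toy data, not the information network algorithm itself: the layered structure and the statistical significance test used for prepruning are omitted, and the function names and the example rows are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conditional_entropy(rows, attr, target):
    """H(target | attr): target entropy averaged over the values of attr."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[target])
    total = len(rows)
    return sum(len(g) / total * entropy(g) for g in groups.values())


def best_attribute(rows, candidates, target):
    """Return (attribute, entropy decrease) for the most informative candidate."""
    base = entropy([row[target] for row in rows])
    gains = {a: base - conditional_entropy(rows, a, target) for a in candidates}
    best = max(gains, key=gains.get)
    return best, gains[best]


# Hypothetical toy data: "outlook" determines "play"; "windy" is uninformative.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
attr, gain = best_attribute(rows, ["outlook", "windy"], "play")
print(attr, gain)
```

A prepruning variant would stop adding attributes once the best decrease fails a significance test; which test the paper uses is not stated in the abstract.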
Pareto-Optimal Patterns in Logical Analysis of Data
Discrete Applied Mathematics, 2001
Cited by 3 (1 self)
Patterns are the key building blocks in the logical analysis of data (LAD). It has been observed in empirical studies and practical applications that some patterns are more "suitable" than others for use in LAD. In this paper, we model various such suitability criteria as partial preorders defined on the set of patterns.
Instance-Based Regression by Partitioning Feature Projections
Applied Intelligence, 2004
Cited by 3 (0 self)
A new instance-based learning method is presented for regression problems with high-dimensional data. As an instance-based approach, the conventional method, KNN, is very popular for classification. Although KNN performs well on classification tasks, it does not perform as well on regression problems. We have developed a new instance-based method, called Regression by Partitioning Feature Projections (RPFP), which is designed to meet the requirement for a lazy method that achieves high levels of accuracy on regression problems. RPFP gives better performance than well-known eager approaches found in machine learning and statistics such as MARS, rule-based regression, and regression tree induction systems. The most important property of RPFP is that it is a projection-based approach that can handle interactions. We show that it outperforms existing eager or lazy approaches on many domains when there are many missing values in the training data.
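For orientation, the conventional KNN baseline that this abstract compares against can be sketched for regression as below. This is plain k-nearest-neighbour averaging, not RPFP, whose partitioning of feature projections is not described in enough detail here to reproduce. The function name `knn_regress`, the Euclidean distance, and the toy data are assumptions made for this example.

```python
import math


def knn_regress(train_x, train_y, query, k=3):
    """Predict a numeric target as the mean of the k nearest training targets.

    Being a lazy method, it does no training: all work happens at query time.
    """
    # Pair each training target with its Euclidean distance to the query.
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical one-dimensional data where the target equals the feature.
train_x = [(0.0,), (1.0,), (2.0,), (10.0,)]
train_y = [0.0, 1.0, 2.0, 10.0]
prediction = knn_regress(train_x, train_y, (1.1,), k=3)
print(prediction)
```

Because every feature enters the distance with equal weight, irrelevant or missing features degrade the neighbourhood, which is the kind of weakness a projection-based method is meant to address.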

