Toward optimal feature selection
 In 13th International Conference on Machine Learning
, 1995
Cited by 387 (9 self)
In this paper, we examine a method for feature subset selection based on Information Theory. Initially, a framework for defining the theoretically optimal, but computationally intractable, method for feature subset selection is presented. We show that our goal should be to eliminate a feature if it gives us little or no additional information beyond that subsumed by the remaining features. In particular, this will be the case for both irrelevant and redundant features. We then give an efficient algorithm for feature selection which computes an approximation to the optimal feature selection criterion. The conditions under which the approximate algorithm is successful are examined. Empirical results are given on a number of data sets, showing that the algorithm effectively handles datasets with a very large number of features.
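The elimination criterion described in this abstract (drop a feature that adds little or no information beyond the remaining features) can be illustrated with a small sketch. This is not the paper's algorithm: it is a plain backward elimination that uses empirical conditional mutual information on discrete data as a stand-in, and the function names, the threshold, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def conditional_mutual_information(rows, labels, feature, others):
    """Empirical I(feature; label | others) in bits, for discrete data."""
    n = len(rows)
    joint = Counter()
    for row, y in zip(rows, labels):
        z = tuple(row[j] for j in others)  # values of the remaining features
        joint[(z, row[feature], y)] += 1
    z_count, zx_count, zy_count = Counter(), Counter(), Counter()
    for (z, x, y), c in joint.items():
        z_count[z] += c
        zx_count[(z, x)] += c
        zy_count[(z, y)] += c
    return sum(
        (c / n) * log2(c * z_count[z] / (zx_count[(z, x)] * zy_count[(z, y)]))
        for (z, x, y), c in joint.items()
    )


def backward_eliminate(rows, labels, threshold=1e-9):
    """Repeatedly drop the feature that adds the least information about
    the label given the other kept features (hypothetical stopping rule:
    stop once every kept feature adds more than `threshold` bits)."""
    kept = list(range(len(rows[0])))
    while len(kept) > 1:
        scores = {
            f: conditional_mutual_information(
                rows, labels, f, [g for g in kept if g != f]
            )
            for f in kept
        }
        weakest = min(scores, key=scores.get)
        if scores[weakest] > threshold:
            break
        kept.remove(weakest)
    return kept


# Toy data (an assumption): feature 0 determines the label, feature 1 is an
# exact copy of feature 0 (redundant), and feature 2 is irrelevant.
rows = [(a, a, b) for a in (0, 1) for b in (0, 1)]
labels = [row[0] for row in rows]
kept = backward_eliminate(rows, labels)
```

On this toy data one of the two duplicate features and the irrelevant feature are removed, which matches the two cases (redundant and irrelevant) named in the abstract. The exact conditional estimate used here needs a count for every combination of remaining feature values, so it does not scale to many features.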
Hypothesis-driven Constructive Induction in AQ17: A Method and Experiments
, 1992
Cited by 111 (34 self)
This paper presents a method for constructive induction in which new problem-relevant attributes are generated by analyzing consecutively created inductive hypotheses. The method starts by creating a set of rules from given examples using the AQ algorithm. These rules are then evaluated according to a rule quality criterion. Subsets of the best-performing rules for each decision class are selected to form new attributes. These new attributes are used to reformulate the training examples used in the previous step, and the whole inductive process repeats. This iterative process ends when the performance accuracy of the rules exceeds a predefined threshold. In several experiments on learning different well-defined transformations, the method consistently outperformed (in terms of predictive accuracy) the AQ15 rule learning method, the GREEDY3 and GROVE decision list learning methods, and the REDWOOD and FRINGE decision tree learning methods.
Wrappers For Performance Enhancement And Oblivious Decision Graphs
, 1995
Cited by 111 (7 self)
In this doctoral dissertation, we study three basic problems in machine learning and two new hypothesis spaces with corresponding learning algorithms. The problems we investigate are: accuracy estimation, feature subset selection, and parameter tuning. The latter two problems are related and are studied under the wrapper approach. The hypothesis spaces we investigate are: decision tables with a default majority rule (DTMs) and oblivious read-once decision graphs (OODGs).
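The wrapper approach to feature subset selection mentioned in this abstract can be sketched as a search over feature subsets in which each candidate is scored by the estimated accuracy of the learner itself. The sketch below is only an illustration under stated assumptions: the learner (1-nearest-neighbour with Hamming distance), the accuracy estimate (leave-one-out), the search strategy (greedy forward selection), and all names are chosen here and are not taken from the dissertation.

```python
def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of 1-nearest-neighbour using only `features`."""
    correct = 0
    for i, row in enumerate(rows):
        best_j, best_d = None, None
        for j, other in enumerate(rows):
            if j == i:
                continue  # hold out the example being predicted
            d = sum(row[f] != other[f] for f in features)  # Hamming distance
            if best_d is None or d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(rows)


def forward_select(rows, labels):
    """Greedy wrapper search: add the feature that most improves the
    estimated accuracy, and stop when no addition improves it."""
    selected = []
    best = loo_accuracy(rows, labels, selected)
    remaining = set(range(len(rows[0])))
    while remaining:
        candidate, score = max(
            ((f, loo_accuracy(rows, labels, selected + [f]))
             for f in sorted(remaining)),
            key=lambda pair: pair[1],
        )
        if score <= best:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = score
    return selected, best


# Toy data (an assumption): feature 0 equals the label, feature 1 is noise.
rows = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (1, 1)]
labels = [0, 0, 1, 1, 0, 1]
selected, accuracy = forward_select(rows, labels)
```

The design point the sketch tries to show is that the subset is judged by the same learner that will finally be used, so the quality of the accuracy estimate directly affects which subset is chosen.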
Sample compression, learnability, and the Vapnik-Chervonenkis dimension
 MACHINE LEARNING
, 1995
Cited by 66 (4 self)
Within the framework of PAC learning, we explore the learnability of concepts from samples using the paradigm of sample compression schemes. A sample compression scheme of size k for a concept class C ⊆ 2^X consists of a compression function and a reconstruction function. The compression function receives a finite sample set consistent with some concept in C and chooses a subset of k examples as the compression set. The reconstruction function forms a hypothesis on X from a compression set of k examples. For any sample set of a concept in C the compression set produced by the compression function must lead to a hypothesis consistent with the whole original sample set when it is fed to the reconstruction function. We demonstrate that the existence of a sample compression scheme of fixed size for a class C is sufficient to ensure that the class C is PAC-learnable. Previous work has shown that a class is PAC-learnable if and only if the Vapnik-Chervonenkis (VC) dimension of the class i...
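The definition above can be made concrete with a standard textbook-style example, chosen here for illustration and not taken from the abstract: closed intervals on the real line admit a sample compression scheme of size k = 2. The sketch below assumes samples are lists of (point, label) pairs; the function names are hypothetical.

```python
def compress(sample):
    """Keep at most two examples: the leftmost and rightmost positives."""
    positives = sorted(x for x, label in sample if label)
    if not positives:
        return []  # an all-negative sample compresses to the empty set
    endpoints = sorted({positives[0], positives[-1]})
    return [(x, True) for x in endpoints]


def reconstruct(compression_set):
    """Form a hypothesis on the real line from the compression set."""
    if not compression_set:
        return lambda x: False  # empty concept: everything is negative
    lo = min(x for x, _ in compression_set)
    hi = max(x for x, _ in compression_set)
    return lambda x: lo <= x <= hi


# A sample consistent with the (assumed) target interval [2, 5].
sample = [(x, 2 <= x <= 5) for x in (0, 1, 2, 3, 4.5, 6, 7)]
kernel = compress(sample)
hypothesis = reconstruct(kernel)
consistent = all(hypothesis(x) == label for x, label in sample)
```

The reconstructed interval is the smallest one containing every positive example. Because the sample is labelled by some interval, no negative example can lie between two positives, so the hypothesis agrees with the whole original sample, which is the consistency requirement stated in the abstract.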
Relating data compression and learnability
, 1986
Cited by 56 (1 self)
We explore the learnability of two-valued functions from samples using the paradigm of Data Compression. A first algorithm (compression) chooses a small subset of the sample which is called the kernel. A second algorithm predicts future values of the function from the kernel, i.e. the algorithm acts as a hypothesis for the function to be learned. The second algorithm must be able to reconstruct the correct function values when given a point of the original sample. We demonstrate that the existence of a suitable data compression scheme is sufficient to ensure learnability. We express the probability that the hypothesis predicts the function correctly on a random sample point as a function of the sample and kernel sizes. No assumptions are made on the probability distributions according to which the sample points are generated. This approach provides an alternative to that of [BEHW86], which uses the Vapnik-Chervonenkis dimension to classify learnable geometric concepts. Our bounds are derived directly from the kernel size of the algorithms rather than from the Vapnik-Chervonenkis dimension of the hypothesis class. The proofs are simpler and the introduced compression scheme provides a rigorous model for studying data compression in connection with machine learning.
A Formal Definition of Intelligence Based on an Intensional Variant of Algorithmic Complexity
 In Proceedings of the International Symposium of Engineering of Intelligent Systems (EIS'98)
, 1998
Cited by 29 (18 self)
Due to the current technology of the computers we can use, we have chosen an extremely abridged emulation of the machine that will effectively run the programs, instead of more proper languages, like λ-calculus (or LISP). We have adapted the "toy RISC" machine of [Hernández & Hernández 1993] with two remarkable features inherited from its object-oriented coding in C++: it is easily tunable for our needs, and it is efficient. We have made it even more reduced, removing any operand in the instruction set, even for the loop operations. We have only three registers, which are AX (the accumulator), BX and CX. The operations Q b we have used for our experiment are in Table 1: LOOPTOP: decrements CX; if it is not equal to the first element, jump to the program top.
Dominance Detection in Meetings Using Easily Obtainable Features
 In Bourlard, H., & Renals, S. (Eds.), Revised Selected Papers of the 2nd Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms
, 2005
Cited by 24 (1 self)
We show that, using a Support Vector Machine classifier, it is possible to determine with a 75% success rate who dominated a particular meeting on the basis of a few basic features. We discuss the corpus we have used, the way we had people judge dominance and the features that were used.
A refinement operator based learning algorithm for the ALC description logic
, 2007
Cited by 21 (8 self)
With the advent of the Semantic Web, description logics have become one of the most prominent paradigms for knowledge representation and reasoning. Progress in research and applications, however, faces a bottleneck due to the lack of available knowledge bases, and it is paramount that suitable automated methods for their acquisition be developed. In this paper, we provide the first learning algorithm based on refinement operators for the most fundamental description logic ALC. We develop the algorithm from thorough theoretical foundations and report on a prototype implementation.
Complexity Theoretic Hardness Results for Query Learning
 COMPUTATIONAL COMPLEXITY
, 1998
Cited by 20 (5 self)
We investigate the complexity of learning for the well-studied model in which the learning algorithm may ask membership and equivalence queries. While complexity theoretic techniques have previously been used to prove hardness results in various learning models, these techniques typically are not strong enough to use when a learning algorithm may make membership queries. We develop a general technique for proving hardness results for learning with membership and equivalence queries (and for more general query models). We apply the technique to show that, assuming NP != coNP, no polynomial-time membership and (proper) equivalence query algorithms exist for exactly learning read-thrice DNF formulas, unions of k ≥ 3 halfspaces over the Boolean domain, or some other related classes. Our hardness results are representation dependent, and do not preclude the existence of representation independent algorithms. The general