Results 1–10 of 55
Toward optimal feature selection
 In 13th International Conference on Machine Learning
, 1995
Abstract

Cited by 472 (9 self)
In this paper, we examine a method for feature subset selection based on Information Theory. Initially, a framework for defining the theoretically optimal, but computationally intractable, method for feature subset selection is presented. We show that our goal should be to eliminate a feature if it gives us little or no additional information beyond that subsumed by the remaining features. In particular, this will be the case for both irrelevant and redundant features. We then give an efficient algorithm for feature selection which computes an approximation to the optimal feature selection criterion. The conditions under which the approximate algorithm is successful are examined. Empirical results are given on a number of data sets, showing that the algorithm effectively handles datasets with a very large number of features.
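The elimination criterion in this abstract ("drop a feature that adds little or no information beyond the remaining features") can be illustrated with a minimal Python sketch. It assumes discrete features and, as a simplifying assumption, measures a feature's extra information by the empirical conditional mutual information I(F; Y | G) given only one other remaining feature G, rather than all remaining features. The names `cond_mutual_info` and `backward_eliminate` are hypothetical; this is not the paper's approximation algorithm.

```python
from collections import Counter
from math import log2


def cond_mutual_info(x, y, z):
    """Empirical conditional mutual information I(X; Y | Z) in bits."""
    n = len(x)
    xyz = Counter(zip(x, y, z))
    xz = Counter(zip(x, z))
    yz = Counter(zip(y, z))
    zc = Counter(z)
    total = 0.0
    for (xi, yi, zi), c in xyz.items():
        # p(x,y,z) * log2[p(x,y,z) p(z) / (p(x,z) p(y,z))], with the factors of n cancelled
        total += (c / n) * log2(c * zc[zi] / (xz[(xi, zi)] * yz[(yi, zi)]))
    return total


def backward_eliminate(features, labels, keep):
    """Drop features one at a time until `keep` (>= 1) remain.

    Simplifying assumption: a feature's score is the least extra information
    it carries about `labels` once any single other remaining feature is
    known; the lowest-scoring feature is removed first.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    remaining = dict(features)
    while len(remaining) > keep:
        def extra_info(name):
            return min(
                cond_mutual_info(remaining[name], labels, remaining[other])
                for other in remaining
                if other != name
            )
        del remaining[min(remaining, key=extra_info)]
    return sorted(remaining)


labels = [0, 0, 1, 1, 0, 1, 0, 1]
features = {
    "a": labels[:],                    # informative
    "a_copy": labels[:],               # redundant copy of "a"
    "noise": [0, 1, 0, 1, 1, 0, 0, 1], # independent of labels
}
print(backward_eliminate(features, labels, keep=1))
```

With one informative feature, a redundant copy of it, and an independent noise feature, keeping a single feature drops the noise feature and one of the two copies, matching the abstract's point that both irrelevant and redundant features carry no additional information.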
Optimal Prefetching via Data Compression
, 1995
Abstract

Cited by 262 (7 self)
Caching and prefetching are important mechanisms for speeding up access time to data on secondary storage. Recent work in competitive online algorithms has uncovered several promising new algorithms for caching. In this paper we apply a form of the competitive philosophy for the first time to the problem of prefetching to develop an optimal universal prefetcher in terms of fault ratio, with particular applications to large-scale databases and hypertext systems. Our prediction algorithms for prefetching are novel in that they are based on data compression techniques that are both theoretically optimal and good in practice. Intuitively, in order to compress data effectively, you have to be able to predict future data well, and thus good data compressors should be able to predict well for purposes of prefetching. We show for powerful models such as Markov sources and nth-order Markov sources that the page fault rates incurred by our prefetching algorithms are optimal in the limit for almost all sequences of page requests.
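The idea that a good compressor is also a good predictor can be sketched in a few lines of Python. This is an illustration in the spirit of the abstract, not the paper's prefetcher: it assumes an LZ78-style parse tree of past page requests as the compressor and a hypothetical policy of prefetching the k most frequently seen continuations of the current context. The class and method names are hypothetical.

```python
class Node:
    """One node of the parse tree: a visit count and child nodes keyed by page."""

    def __init__(self):
        self.count = 0
        self.children = {}


class LZPrefetcher:
    """LZ78-style predictor over a stream of page requests (illustrative only)."""

    def __init__(self):
        self.root = Node()
        self.current = self.root

    def predict(self, k=1):
        """Return up to k pages most often seen after the current context."""
        ranked = sorted(self.current.children.items(),
                        key=lambda item: item[1].count, reverse=True)
        return [page for page, _ in ranked[:k]]

    def access(self, page):
        """Record one request, extending the LZ78 parse of the request stream."""
        child = self.current.children.get(page)
        if child is None:
            # New phrase: add a leaf for this page and restart at the root.
            leaf = Node()
            leaf.count = 1
            self.current.children[page] = leaf
            self.current = self.root
        else:
            # Known continuation: follow it and count the visit.
            child.count += 1
            self.current = child


p = LZPrefetcher()
for page in "ababa":
    p.access(page)
print(p.predict())  # ['b']
```

After the requests a, b, a, b, a the parse has returned to the node for `a`, whose only recorded continuation is `b`, so that page is the one proposed for prefetching.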
Wrappers For Performance Enhancement And Oblivious Decision Graphs
, 1995
Abstract

Cited by 122 (7 self)
In this doctoral dissertation, we study three basic problems in machine learning and two new hypothesis spaces with corresponding learning algorithms. The problems we investigate are: accuracy estimation, feature subset selection, and parameter tuning. The latter two problems are related and are studied under the wrapper approach. The hypothesis spaces we investigate are: decision tables with a default majority rule (DTMs) and oblivious read-once decision graphs (OODGs).
Hypothesis-driven Constructive Induction in AQ17: A Method and Experiments
, 1992
Abstract

Cited by 117 (34 self)
This paper presents a method for constructive induction in which new problem-relevant attributes are generated by analyzing consecutively created inductive hypotheses. The method starts by creating a set of rules from given examples using the AQ algorithm. These rules are then evaluated according to a rule quality criterion. Subsets of the best-performing rules for each decision class are selected to form new attributes. These new attributes are used to reformulate the training examples used in the previous step, and the whole inductive process repeats. This iterative process ends when the performance accuracy of the rules exceeds a predefined threshold. In several experiments on learning different well-defined transformations, the method consistently outperformed (in terms of predictive accuracy) the AQ15 rule learning method, the GREEDY3 and GROVE decision list learning methods, and the REDWOOD and FRINGE decision tree learning methods.
Sample compression, learnability, and the Vapnik-Chervonenkis dimension
 Machine Learning
, 1995
Abstract

Cited by 83 (5 self)
Within the framework of PAC learning, we explore the learnability of concepts from samples using the paradigm of sample compression schemes. A sample compression scheme of size k for a concept class C ⊆ 2^X consists of a compression function and a reconstruction function. The compression function receives a finite sample set consistent with some concept in C and chooses a subset of k examples as the compression set. The reconstruction function forms a hypothesis on X from a compression set of k examples. For any sample set of a concept in C, the compression set produced by the compression function must lead to a hypothesis consistent with the whole original sample set when it is fed to the reconstruction function. We demonstrate that the existence of a sample compression scheme of fixed size for a class C is sufficient to ensure that the class C is PAC-learnable. Previous work has shown that a class is PAC-learnable if and only if the Vapnik-Chervonenkis (VC) dimension of the class i...
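A concrete sample compression scheme helps make the definition above tangible. The following Python sketch uses an illustrative concept class chosen here, not taken from the paper: closed intervals on the real line (X = ℝ). Keeping only the smallest and largest positive examples gives a scheme of size k = 2, and the reconstruction function returns the smallest closed interval containing them. The function names are hypothetical.

```python
from typing import Callable, List, Tuple

Example = Tuple[float, bool]  # (point, label)


def compress(sample: List[Example]) -> List[Example]:
    """Keep at most two examples: the smallest and largest positive points."""
    positives = [x for x, label in sample if label]
    if not positives:
        return []
    lo, hi = min(positives), max(positives)
    return [(lo, True)] if lo == hi else [(lo, True), (hi, True)]


def reconstruct(kernel: List[Example]) -> Callable[[float], bool]:
    """Hypothesis: the smallest closed interval containing the kept points."""
    if not kernel:
        return lambda x: False
    lo = min(x for x, _ in kernel)
    hi = max(x for x, _ in kernel)
    return lambda x: lo <= x <= hi


sample = [(0.5, False), (1.0, True), (2.0, True), (3.5, True), (4.0, False)]
h = reconstruct(compress(sample))
print(all(h(x) == y for x, y in sample))  # True
```

The reconstructed interval is consistent with the whole sample whenever the sample is labeled by some interval: every positive point lies inside the kept hull, and every negative point lies outside the true interval, which contains that hull.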
Relating data compression and learnability
, 1986
Abstract

Cited by 65 (1 self)
We explore the learnability of two-valued functions from samples using the paradigm of data compression. A first algorithm (compression) chooses a small subset of the sample, which is called the kernel. A second algorithm predicts future values of the function from the kernel, i.e. the algorithm acts as a hypothesis for the function to be learned. The second algorithm must be able to reconstruct the correct function values when given a point of the original sample. We demonstrate that the existence of a suitable data compression scheme is sufficient to ensure learnability. We express the probability that the hypothesis predicts the function correctly on a random sample point as a function of the sample and kernel sizes. No assumptions are made on the probability distributions according to which the sample points are generated. This approach provides an alternative to that of [BEHW86], which uses the Vapnik-Chervonenkis dimension to classify learnable geometric concepts. Our bounds are derived directly from the kernel size of the algorithms rather than from the Vapnik-Chervonenkis dimension of the hypothesis class. The proofs are simpler, and the introduced compression scheme provides a rigorous model for studying data compression in connection with machine learning.
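The claim that the error depends only on the sample size m and the kernel size k can be made concrete with the textbook form of such a bound. This sketch assumes the kernel is an unordered set of exactly k of the m i.i.d. sample points and that the reconstructed hypothesis agrees with the remaining m - k points; a union bound over the C(m, k) possible kernels then gives C(m, k)(1 - ε)^(m-k) ≤ C(m, k)e^(-ε(m-k)) = δ. The paper's own statement and constants may differ, and the function name is hypothetical.

```python
from math import comb, log


def compression_error_bound(m: int, k: int, delta: float) -> float:
    """Error level eps such that, with probability at least 1 - delta over
    m i.i.d. examples, a hypothesis reconstructed from a k-example kernel
    and consistent with the whole sample has true error at most eps.

    Derived from comb(m, k) * (1 - eps)**(m - k) <= comb(m, k) * exp(-eps * (m - k)),
    setting the right-hand side equal to delta.
    """
    if not 0 <= k < m:
        raise ValueError("need 0 <= k < m")
    if not 0 < delta < 1:
        raise ValueError("need 0 < delta < 1")
    return (log(comb(m, k)) + log(1 / delta)) / (m - k)


print(round(compression_error_bound(1000, 2, 0.05), 4))  # 0.0161
```

No VC dimension appears anywhere in the bound: only m, k and δ do, which is the contrast with [BEHW86] that the abstract draws.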
A Formal Definition of Intelligence Based on an Intensional Variant of Algorithmic Complexity
 In Proceedings of the International Symposium of Engineering of Intelligent Systems (EIS'98)
, 1998
Abstract

Cited by 38 (19 self)
Due to the current technology of the computers we can use, we have chosen an extremely abridged emulation of the machine that will effectively run the programs, instead of more proper languages like λ-calculus (or LISP). We have adapted the "toy RISC" machine of [Hernández & Hernández 1993] with two remarkable features inherited from its object-oriented coding in C++: it is easily tunable for our needs, and it is efficient. We have made it even more reduced, removing any operand in the instruction set, even for the loop operations. We have only three registers: AX (the accumulator), BX and CX. The operations Q b we have used for our experiment are in Table 1: LOOPTOP decrements CX; if it is not equal to the first element, jump to the program top.
A refinement operator based learning algorithm for the ALC description logic
, 2007
Abstract

Cited by 32 (14 self)
With the advent of the Semantic Web, description logics have become one of the most prominent paradigms for knowledge representation and reasoning. Progress in research and applications, however, faces a bottleneck due to the lack of available knowledge bases, and it is paramount that suitable automated methods for their acquisition be developed. In this paper, we provide the first learning algorithm based on refinement operators for the most fundamental description logic ALC. We develop the algorithm from thorough theoretical foundations and report on a prototype implementation.
Dominance Detection in Meetings Using Easily Obtainable Features
 In Bourlard, H., & Renals, S. (Eds.), Revised Selected Papers of the 2nd Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms
, 2005
Abstract

Cited by 27 (1 self)
We show that, using a Support Vector Machine classifier, it is possible to determine with a 75% success rate who dominated a particular meeting on the basis of a few basic features. We discuss the corpus we have used, the way we had people judge dominance and the features that were used.