Results 1 – 9 of 9
Hierarchical mixtures of experts and the EM algorithm
 Neural Computation
, 1994
Abstract

Cited by 723 (19 self)
We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIM’s). Learning is treated as a maximum likelihood problem; in particular, we present an Expectation-Maximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an on-line learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain.
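The paper's model is hierarchical with input-dependent gating networks; the sketch below is a deliberately flat, one-level simplification (a mixture of two linear regression experts with input-independent mixing coefficients) that still shows the EM machinery the abstract describes. The synthetic data, slopes, and initialization are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data drawn from two linear regimes (a toy stand-in for the
# robot-dynamics experiments mentioned in the abstract).
n = 400
x = rng.uniform(-1, 1, size=n)
regime = rng.random(n) < 0.5
y = np.where(regime, 2.0 * x + 1.0, -1.5 * x) + 0.1 * rng.standard_normal(n)
X = np.column_stack([x, np.ones(n)])        # design matrix with intercept

K = 2
W = np.array([[1.0, 0.0], [-1.0, 0.0]])     # asymmetric init breaks symmetry
pi = np.full(K, 1.0 / K)                    # mixing coefficients
sigma2 = np.ones(K)                         # per-expert noise variances

for _ in range(50):
    # E-step: posterior responsibility of each expert for each point.
    resid = y[None, :] - W @ X.T            # shape (K, n)
    logp = (np.log(pi)[:, None]
            - 0.5 * resid**2 / sigma2[:, None]
            - 0.5 * np.log(2 * np.pi * sigma2)[:, None])
    logp -= logp.max(axis=0)
    r = np.exp(logp)
    r /= r.sum(axis=0)

    # M-step: weighted least squares per expert; update pi and variances.
    for k in range(K):
        sw = np.sqrt(r[k])
        W[k] = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        sigma2[k] = (r[k] * (y - X @ W[k])**2).sum() / r[k].sum()
    pi = r.mean(axis=1)

slopes = sorted(W[:, 0])                    # should approach [-1.5, 2.0]
```

The hierarchical version replaces the constant `pi` with softmax gating networks fitted by IRLS, nested over a tree of such mixtures.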
An analysis of Bayesian classifiers
 IN PROCEEDINGS OF THE TENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE
, 1992
Abstract

Cited by 333 (17 self)
In this paper we present an average-case analysis of the Bayesian classifier, a simple induction algorithm that fares remarkably well on many learning tasks. Our analysis assumes a monotone conjunctive target concept, and independent, noise-free Boolean attributes. We calculate the probability that the algorithm will induce an arbitrary pair of concept descriptions and then use this to compute the probability of correct classification over the instance space. The analysis takes into account the number of training instances, the number of attributes, the distribution of these attributes, and the level of class noise. We also explore the behavioral implications of the analysis by presenting ...
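The classifier analyzed is what is now usually called naive Bayes. A minimal sketch under the abstract's assumptions (monotone conjunctive target, independent noise-free Boolean attributes); the particular conjunction, attribute count, and smoothing scheme are illustrative choices, not from the paper.

```python
import itertools
import random

random.seed(1)

n_attrs = 4

def target(x):
    # Monotone conjunctive target concept (illustrative): x0 AND x1.
    return int(x[0] and x[1])

train = [tuple(random.randint(0, 1) for _ in range(n_attrs)) for _ in range(500)]
labels = [target(x) for x in train]

def fit(train, labels):
    """Estimate P(class) and P(attr_i = 1 | class) with Laplace smoothing."""
    model = {}
    for c in (0, 1):
        xs = [x for x, y in zip(train, labels) if y == c]
        prior = (len(xs) + 1) / (len(train) + 2)
        cond = [(sum(x[i] for x in xs) + 1) / (len(xs) + 2)
                for i in range(n_attrs)]
        model[c] = (prior, cond)
    return model

def predict(model, x):
    best, best_p = None, -1.0
    for c, (prior, cond) in model.items():
        p = prior
        for i, v in enumerate(x):
            p *= cond[i] if v else 1.0 - cond[i]
        if p > best_p:
            best, best_p = c, p
    return best

model = fit(train, labels)
# Exact accuracy over the whole instance space, in the spirit of the
# paper's average-case analysis.
space = list(itertools.product((0, 1), repeat=n_attrs))
acc = sum(predict(model, x) == target(x) for x in space) / len(space)
```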
Approximation Algorithms for Projective Clustering
 Proceedings of the ACM SIGMOD International Conference on Management of data, Philadelphia
, 2000
Abstract

Cited by 246 (21 self)
We consider the following two instances of the projective clustering problem: Given a set S of n points in R^d and an integer k > 0, cover S by k hyperstrips (resp. hypercylinders) so that the maximum width of a hyperstrip (resp., the maximum diameter of a hypercylinder) is minimized. Let w* be the smallest value so that S can be covered by k hyperstrips (resp. hypercylinders), each of width (resp. diameter) at most w*. In the plane, the two problems are equivalent. It is NP-hard to compute k planar strips of width even at most Cw*, for any constant C > 0 [50]. This paper contains four main results related to projective clustering: (i) For d = 2, we present a randomized algorithm that computes O(k log k) strips of width at most 6w* that cover S. Its expected running time is O(nk^2 log^4 n) if k^2 log k <= n; it also works for larger values of k, but then the expected running time is O(n^{2/3} k^{8/3} log^4 n). We also propose another algorithm that computes a c...
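For the planar base case (k = 1), the thinnest strip covering a point set is flush with an edge of the convex hull. A short sketch of that base computation, which the paper's approximation algorithms build on; this is not the paper's O(k log k) algorithm, and all names are illustrative.

```python
import math

def cross(o, a, b):
    """Signed area of the triangle (o, a, b); > 0 means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(pts):
    """Andrew's monotone-chain convex hull, counter-clockwise order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts
    def chain(points):
        h = []
        for p in points:
            while len(h) >= 2 and cross(h[-2], h[-1], p) <= 0:
                h.pop()
            h.append(p)
        return h
    lower, upper = chain(pts), chain(reversed(pts))
    return lower[:-1] + upper[:-1]

def strip_width(pts):
    """Width of the thinnest single strip covering pts: minimize, over
    hull edges, the farthest point-to-edge-line distance."""
    h = convex_hull(pts)
    best = float("inf")
    for i in range(len(h)):
        (ax, ay), (bx, by) = h[i], h[(i + 1) % len(h)]
        ex, ey = bx - ax, by - ay
        norm = math.hypot(ex, ey)
        if norm == 0:
            continue
        d = max(abs((px - ax) * ey - (py - ay) * ex) / norm for px, py in h)
        best = min(best, d)
    return best

w = strip_width([(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)])
```

For the unit square above the optimal strip has width 1; collinear points are covered by a strip of width 0.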
Wrappers For Performance Enhancement And Oblivious Decision Graphs
, 1995
Abstract

Cited by 107 (8 self)
In this doctoral dissertation, we study three basic problems in machine learning and two new hypothesis spaces with corresponding learning algorithms. The problems we investigate are: accuracy estimation, feature subset selection, and parameter tuning. The latter two problems are related and are studied under the wrapper approach. The hypothesis spaces we investigate are: decision tables with a default majority rule (DTMs) and oblivious read-once decision graphs (OODGs).
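A toy sketch combining two of the abstract's ingredients: the wrapper approach (greedy forward feature selection scored by held-out accuracy of the induction algorithm itself) with a DTM as the wrapped learner. The target concept, feature count, and data split are invented for illustration.

```python
import random
from collections import Counter, defaultdict

random.seed(0)

N_FEATS = 5

def label(x):
    # Only features 0 and 2 are relevant (illustrative target concept).
    return int(x[0] and x[2])

def draw():
    # Feature 2 is biased toward 1 so that feature 0 is informative alone.
    return tuple(int(random.random() < 0.9) if i == 2 else random.randint(0, 1)
                 for i in range(N_FEATS))

pairs = [(x, label(x)) for x in (draw() for _ in range(300))]
train, test = pairs[:200], pairs[200:]

def dtm_accuracy(feats, train, test):
    """Decision table with a default majority rule (DTM): memorize the
    majority label per projected row; unseen rows get the global majority."""
    table, overall = defaultdict(Counter), Counter()
    for x, y in train:
        table[tuple(x[f] for f in feats)][y] += 1
        overall[y] += 1
    default = overall.most_common(1)[0][0]
    hits = 0
    for x, y in test:
        row = table.get(tuple(x[f] for f in feats))
        hits += (row.most_common(1)[0][0] if row else default) == y
    return hits / len(test)

# Wrapper search: greedy forward selection scored by held-out accuracy.
selected = []
while len(selected) < N_FEATS:
    current = dtm_accuracy(selected, train, test)
    scores = {f: dtm_accuracy(selected + [f], train, test)
              for f in range(N_FEATS) if f not in selected}
    best_f = max(scores, key=scores.get)
    if scores[best_f] <= current:
        break
    selected.append(best_f)
```

The search stops as soon as no single added feature improves held-out accuracy, so here it recovers exactly the two relevant features.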
An Application of Pattern Matching in Intrusion Detection
, 1994
Abstract

Cited by 68 (4 self)
This report examines and classifies the characteristics of signatures used in misuse intrusion detection. Efficient algorithms to match patterns in some of these classes are described. A generalized model for matching intrusion signatures based on Colored Petri Nets is presented, and some of its properties are derived.
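The report classifies signatures by matching complexity, up to a Colored Petri Net model; the simplest class, patterns over single audit events, can be sketched as regular-expression matching against log lines. The signature names, regexes, and log lines below are invented for illustration.

```python
import re

# Hypothetical signatures: name -> regex over a single audit-log line.
SIGNATURES = {
    "failed-login": re.compile(r"failed login .* from (\S+)"),
    "sensitive-file-read": re.compile(r"open\('/etc/(passwd|shadow)'\)"),
}

def match_signatures(log_lines):
    """Return a (line number, signature name) pair for every hit."""
    hits = []
    for lineno, line in enumerate(log_lines, 1):
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits

log = [
    "open('/etc/passwd') by uid 1004",
    "failed login for root from 10.0.0.7",
    "session opened for user alice",
]
hits = match_signatures(log)
```

Signatures that correlate several events over time (the report's richer classes) need state, which is what the Colored Petri Net model supplies.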
Learning from incomplete data
, 1994
Abstract

Cited by 58 (0 self)
Real-world learning tasks often involve high-dimensional data sets with complex patterns of missing features. In this paper we review the problem of learning from incomplete data from two statistical perspectives: the likelihood-based and the Bayesian. The goal is twofold: to place current neural network approaches to missing data within a statistical framework, and to describe a set of algorithms, derived from the likelihood-based framework, that handle clustering, classification, and function approximation from incomplete data in a principled and efficient manner. These algorithms are based on mixture modeling and make two distinct appeals to the Expectation-Maximization (EM) principle (Dempster et al., 1977), both for the estimation of mixture components and for coping with the missing data.
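A minimal instance of the likelihood-based machinery: EM for a single bivariate Gaussian where some second coordinates are missing at random. The E-step fills each missing value with its conditional mean and adds the conditional variance to the sufficient statistics; the paper's algorithms generalize this to mixtures. All constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic bivariate Gaussian; about 40% of the y-coordinates are missing.
n = 2000
mean_true = np.array([1.0, -2.0])
cov_true = np.array([[1.0, 0.8], [0.8, 2.0]])
data = rng.multivariate_normal(mean_true, cov_true, size=n)
miss = rng.random(n) < 0.4          # mask of rows with y unobserved

mu = np.zeros(2)
S = np.eye(2)

for _ in range(100):
    # E-step: complete the data with conditional means; track the
    # conditional variance of each imputed y.
    y_hat = data[:, 1].copy()
    cvar = np.zeros(n)
    slope = S[0, 1] / S[0, 0]
    y_hat[miss] = mu[1] + slope * (data[miss, 0] - mu[0])
    cvar[miss] = S[1, 1] - S[0, 1] ** 2 / S[0, 0]

    # M-step: re-estimate mean and covariance from the expected statistics.
    mu = np.array([data[:, 0].mean(), y_hat.mean()])
    dx = data[:, 0] - mu[0]
    dy = y_hat - mu[1]
    S = np.array([
        [np.mean(dx * dx), np.mean(dx * dy)],
        [np.mean(dx * dy), np.mean(dy * dy) + cvar.mean()],
    ])
```

Note the `cvar.mean()` term: plugging in only the conditional means (plain imputation) would systematically underestimate the variance of y.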
Learning, Bayesian Probability, Graphical Models, and Abduction
 Abduction and Induction: Essays on their Relation and Integration, Chapter 10
, 1998
Abstract

Cited by 7 (0 self)
In this chapter I review Bayesian statistics as used for induction and relate it to logic-based abduction. Much reasoning under uncertainty, including induction, is based on Bayes' rule. Bayes' rule is interesting precisely because it provides a mechanism for abduction. I review work of Buntine that argues that much of the work on Bayesian learning can be best viewed in terms of graphical models such as Bayesian networks, and review previous work of Poole that relates Bayesian networks to logic-based abduction. This lets us see how much of the work on induction can be viewed in terms of logic-based abduction. I then explore what this means for extending logic-based abduction to richer representations, such as learning decision trees with probabilities at the leaves. Much of this paper is tutorial in nature; both the probabilistic and logic-based notions of abduction and induction are introduced and motivated.
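The mechanism in question is simply Bayes' rule used to rank candidate explanations of an observation. A toy numeric sketch; the hypotheses and probabilities are invented for illustration.

```python
# Candidate explanations ("abducibles") for the observation "grass is wet".
priors = {"rain": 0.3, "sprinkler": 0.1, "neither": 0.6}       # P(h)
likelihood = {"rain": 0.9, "sprinkler": 0.8, "neither": 0.05}  # P(wet | h)

# Bayes' rule: P(h | wet) = P(h) * P(wet | h) / P(wet).
evidence = sum(priors[h] * likelihood[h] for h in priors)
posterior = {h: priors[h] * likelihood[h] / evidence for h in priors}
best = max(posterior, key=posterior.get)   # most probable explanation
```

Here the posterior concentrates on "rain" even though "neither" has the largest prior, because the likelihood term rewards hypotheses that actually explain the evidence.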
The minimum description length principle applied to feature learning and analogical mapping
 MCC Tech. Rep
, 1990
Abstract

Cited by 5 (1 self)
This paper describes an algorithm for orthogonal clustering. That is, it finds multiple partitions of a domain. The Minimum Description Length (MDL) Principle is used to define a parameter-free evaluation function over all possible sets of partitions. In contrast, conventional clustering algorithms can only find a single partition of a set of data. While they can be applied iteratively to create hierarchies, these are limited to tree structures. Orthogonal clustering, on the other hand, cannot form hierarchies deeper than one layer. Ideally one would want an algorithm which does both. However there are important problems for which orthogonal clustering is desirable. In particular, orthogonal clusters correspond to feature vectors, which are widely used throughout cognitive science. Hopefully, orthogonal clusters will also be useful for finding analogies. A side effect which deserves more exploration is the induction of domain axioms in which the features ...
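A hedged sketch of two-part MDL scoring for a single partition of Boolean data (the paper generalizes this to sets of orthogonal partitions): the model cost states each item's cluster label and one probability per feature column, and the data cost entropy-codes each column within each cluster. The data set, partitions, and exact parameter-cost convention are illustrative assumptions.

```python
import math

def code_length(bits):
    """Bits to encode one Boolean column: entropy coding of the values,
    plus 0.5*log2(n) bits to state the column's probability parameter."""
    n = len(bits)
    k = sum(bits)
    cost = 0.5 * math.log2(n)
    for count in (k, n - k):
        if count:
            cost += count * -math.log2(count / n)
    return cost

def partition_dl(items, partition):
    """Two-part MDL score: bits for each item's cluster label plus bits
    for each feature column encoded within each cluster."""
    n, n_feat = len(items), len(items[0])
    dl = n * math.log2(len(partition))     # cluster labels
    for cluster in partition:
        for j in range(n_feat):
            dl += code_length([items[i][j] for i in cluster])
    return dl

# Six items whose two features co-vary: a partition aligned with the
# features codes each column cheaply inside its clusters.
items = [(1, 1), (1, 1), (1, 1), (0, 0), (0, 0), (0, 0)]
good = [[0, 1, 2], [3, 4, 5]]   # clusters align with both feature columns
bad = [[0, 1, 3], [2, 4, 5]]    # clusters cut across the feature columns

dl_good = partition_dl(items, good)
dl_bad = partition_dl(items, bad)
```

MDL thus turns "which set of partitions is best?" into a comparison of total code lengths, with no tunable parameters.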
Structured Concept Discovery: Theory and Methods
, 1994
Abstract

Cited by 4 (0 self)
The field of knowledge discovery is concerned with the theory and processes involved in finding and representing patterns and regularities previously unknown. A new generation of knowledge discovery tools now deals with structured concepts: these capture associations between relations among the components of structured objects. This paper outlines a logic used to express structured concepts, and surveys a number of systems performing structured concept discovery. The paper concludes with a discussion of important future research directions for the field.