Hierarchical mixtures of experts and the EM algorithm
- Neural Computation
, 1994
Cited by 635 (20 self)
We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIM’s). Learning is treated as a maximum likelihood problem; in particular, we present an Expectation-Maximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an on-line learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain.
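As a rough illustration of the idea in this abstract, the sketch below shows a single-level mixture of experts (not the full tree): a softmax gate mixes linear experts, and the E-step of EM computes each expert's posterior responsibility for an observation. The function names, the Gaussian noise model with a fixed `sigma`, and the single-level structure are assumptions made for this example; they are not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))

def gaussian_pdf(y, mean, sigma):
    """Density of a 1-D Gaussian (assumed noise model for each expert)."""
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_predict(x, gate_weights, expert_weights):
    """Return the gated prediction, the gate probabilities, and expert means."""
    g = softmax([dot(v, x) for v in gate_weights])   # mixture coefficients
    mus = [dot(w, x) for w in expert_weights]        # linear expert outputs
    prediction = sum(gi * mu for gi, mu in zip(g, mus))
    return prediction, g, mus

def responsibilities(x, y, gate_weights, expert_weights, sigma=1.0):
    """E-step: posterior probability that each expert generated y given x."""
    _, g, mus = mixture_predict(x, gate_weights, expert_weights)
    joint = [gi * gaussian_pdf(y, mu, sigma) for gi, mu in zip(g, mus)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical toy parameters: two experts, two input features.
x = [1.0, 2.0]
gate_weights = [[0.0, 0.0], [0.0, 0.0]]     # uniform gate
expert_weights = [[1.0, 0.0], [0.0, 1.0]]   # expert means are x[0] and x[1]
pred, gate, means = mixture_predict(x, gate_weights, expert_weights)
h = responsibilities(x, 1.0, gate_weights, expert_weights)
```

In an M-step, these responsibilities would serve as weights when refitting each expert and the gate; that part is omitted here.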
Substructure Discovery Using Minimum Description Length and Background Knowledge
- Journal of Artificial Intelligence Research
, 1994
Cited by 127 (34 self)
The ability to identify interesting and repetitive substructures is an essential component to discovering knowledge in structural data. We describe a new version of our Subdue substructure discovery system based on the minimum description length principle. The Subdue system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previously-discovered substructures in the data, multiple passes of Subdue produce a hierarchical description of the structural regularities in the data. Subdue uses a computationally-bounded inexact graph match that identifies similar, but not identical, instances of a substructure and finds an approximate measure of closeness of two substructures when under computational constraints. In addition to the minimum description length principle, other background knowledge can be used by Subdue to guide the search towards more appropriate substructures. Experiments in a variety of domains demonstrate Subdu...
Iterative Optimization and Simplification of Hierarchical Clusterings
- Journal of Artificial Intelligence Research
, 1995
Cited by 96 (1 self)
Clustering is often used for discovering structure in data. Clustering systems differ in the objective function used to evaluate clustering quality and the control strategy used to search the space of clusterings. Ideally, the search strategy should consistently construct clusterings of high quality, but be computationally inexpensive as well. In general, we cannot have it both ways, but we can partition the search so that a system inexpensively constructs a `tentative' clustering for initial examination, followed by iterative optimization, which continues to search in background for improved clusterings. Given this motivation, we evaluate an inexpensive strategy for creating initial clusterings, coupled with several control strategies for iterative optimization, each of which repeatedly modifies an initial clustering in search of a better one. One of these methods appears novel as an iterative optimization strategy in clustering contexts. Once a clustering has been construct...
Utility-Based Categorization
, 1993
Cited by 3 (1 self)
The ability to categorize and use concepts effectively is a basic requirement of any intelligent actor. The utility-based approach to categorization is founded on the thesis that categorization is fundamentally in service of action, i.e., the choice of concepts made by an actor is critical to its choice of appropriate actions. This is in contrast to classical and similarity-based approaches which seek logical completeness in concept description with respect to sensory data rather than action-oriented effectiveness. Utility-based categorization is normative and not descriptive. It prescribes how an intelligent agent ought to conceptualize to act effectively. It provides ideals for categorization, specifies criteria for the design of effective computational agents, and provides a model of ideal competence. A decision-theoretic framework for utility-based categorization which involves reasoning about alternative categorization models of varying levels of abstraction is proposed. Categorization mode...
A Combined Latent Class and . . .
- ACCEPTED FOR PUBLICATION TO IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2001
We present a general framework for data analysis and visualisation by means of topographic organization and clustering. Imposing distributional assumptions on the assumed underlying latent factors makes the proposed model suitable for both visualisation and clustering. The system's noise will be modeled in parametric form, as a member of the exponential family of distributions and this allows us to deal with different (continuous or discrete) types of observables in a unified framework. In this paper we focus on discrete case formulations which, contrary to self organizing methods for continuous data, imply variants of Bregman divergencies as measures of dissimilarity between data and reference points, and also define the matching nonlinear relation between latent and observable variables. Therefore, the trait variant of the model can be seen as a data-driven noisy nonlinear Independent Component Analysis, which is capable of revealing meaningful structure in the multivariate observable data and visualise it in two dimensions. The class variant (which performs the clustering) of our model performs data-driven parametric mixture modeling. The combined (trait and class) model along with the associated estimation procedures allows us to interpret the visualisation result, in the sense of a topographic ordering. One important application of this work is the discovery of underlying semantic structure in text based documents. Experimental results on various subsets of the Twenty-Newsgroups text corpus and binary coded digits data are given by way of demonstration.
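For readers unfamiliar with the term, the following sketch shows the generic definition of a Bregman divergence, D(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>, for a convex function phi. The abstract does not say which variants the paper uses, so the two generators below (squared norm and negative entropy) are only standard textbook examples, and all function names are hypothetical.

```python
import math

def bregman(phi, grad_phi, x, y):
    """Bregman divergence of x from y for a convex generator phi."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner

# Generator 1: squared Euclidean norm, which yields squared Euclidean distance.
def sq_norm(v):
    return sum(vi * vi for vi in v)

def sq_norm_grad(v):
    return [2.0 * vi for vi in v]

# Generator 2: negative entropy, which yields the (generalized) KL divergence.
# Entries must be strictly positive.
def neg_entropy(v):
    return sum(vi * math.log(vi) for vi in v)

def neg_entropy_grad(v):
    return [math.log(vi) + 1.0 for vi in v]

d_euclid = bregman(sq_norm, sq_norm_grad, [1.0, 2.0], [3.0, 5.0])
p, q = [0.2, 0.8], [0.5, 0.5]
d_kl = bregman(neg_entropy, neg_entropy_grad, p, q)
```

With the squared norm the divergence is symmetric; with negative entropy it is not, which is why the order of data point and reference point matters for discrete data.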

