Hierarchical mixtures of experts and the EM algorithm
Neural Computation, 1994
Abstract

Cited by 833 (20 self)
We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIM’s). Learning is treated as a maximum likelihood problem; in particular, we present an Expectation-Maximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an online learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain.
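As a rough illustration of the E-step idea in the abstract above, the following sketch computes posterior responsibilities for a single-level mixture of experts. It is a simplification under stated assumptions, not the paper's full hierarchical procedure: the gate is a softmax over linear scores, each expert is a linear model with Gaussian noise of fixed width, and all names (`e_step_posteriors`, `gate_weights`, `expert_weights`, `sigma`) are hypothetical.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def gaussian_pdf(y, mean, sigma):
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def e_step_posteriors(x, y, gate_weights, expert_weights, sigma=1.0):
    """Posterior probability that each expert generated the pair (x, y).

    Assumed model (illustrative only): softmax gate over linear scores,
    linear experts with Gaussian noise of standard deviation sigma.
    """
    # Mixture coefficients: input-dependent priors from the gate.
    priors = softmax([dot(v, x) for v in gate_weights])
    # Mixture components: likelihood of y under each expert's prediction.
    likelihoods = [gaussian_pdf(y, dot(w, x), sigma) for w in expert_weights]
    # Bayes' rule: prior times likelihood, normalized over experts.
    joint = [g * p for g, p in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Two experts predicting +x and -x; an uninformative gate (all-zero weights).
x = [1.0, 1.0]  # [bias term, input value]
h = e_step_posteriors(x, 1.0, [[0.0, 0.0], [0.0, 0.0]], [[0.0, 1.0], [0.0, -1.0]])
```

In an EM loop these responsibilities would weight each expert's (and the gate's) maximum likelihood fit in the M-step; the hierarchical case nests the same computation at each level of the tree.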
Substructure Discovery Using Minimum Description Length and Background Knowledge
Journal of Artificial Intelligence Research, 1994
Abstract

Cited by 185 (43 self)
The ability to identify interesting and repetitive substructures is an essential component to discovering knowledge in structural data. We describe a new version of our Subdue substructure discovery system based on the minimum description length principle. The Subdue system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previously discovered substructures in the data, multiple passes of Subdue produce a hierarchical description of the structural regularities in the data. Subdue uses a computationally bounded inexact graph match that identifies similar, but not identical, instances of a substructure and finds an approximate measure of closeness of two substructures when under computational constraints. In addition to the minimum description length principle, other background knowledge can be used by Subdue to guide the search towards more appropriate substructures. Experiments in a variety of domains demonstrate Subdu...
Iterative Optimization and Simplification of Hierarchical Clusterings
Journal of Artificial Intelligence Research, 1995
Abstract

Cited by 118 (2 self)
Clustering is often used for discovering structure in data. Clustering systems differ in the objective function used to evaluate clustering quality and the control strategy used to search the space of clusterings. Ideally, the search strategy should consistently construct clusterings of high quality, but be computationally inexpensive as well. In general, we cannot have it both ways, but we can partition the search so that a system inexpensively constructs a `tentative' clustering for initial examination, followed by iterative optimization, which continues to search in background for improved clusterings. Given this motivation, we evaluate an inexpensive strategy for creating initial clusterings, coupled with several control strategies for iterative optimization, each of which repeatedly modifies an initial clustering in search of a better one. One of these methods appears novel as an iterative optimization strategy in clustering contexts. Once a clustering has been construct...
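The general pattern described above, starting from an inexpensive initial clustering and repeatedly modifying it while the objective improves, can be sketched as a simple hill climber. This is only an illustration under assumptions: it works on a flat (not hierarchical) clustering of one-dimensional points, moves one observation at a time, and uses within-cluster squared error as a stand-in objective. The function names `sse` and `redistribute` are hypothetical and are not taken from the paper.

```python
def sse(clusters):
    """Within-cluster sum of squared deviations (assumed stand-in objective)."""
    total = 0.0
    for members in clusters:
        if members:
            mean = sum(members) / len(members)
            total += sum((x - mean) ** 2 for x in members)
    return total

def redistribute(clusters, objective=sse):
    """Move single observations between clusters while the objective drops."""
    clusters = [list(c) for c in clusters]  # work on a copy
    improved = True
    while improved:
        improved = False
        best = objective(clusters)
        for src in range(len(clusters)):
            for x in list(clusters[src]):  # snapshot: the list is mutated below
                for dst in range(len(clusters)):
                    if dst == src:
                        continue
                    # Tentatively move x from src to dst.
                    clusters[src].remove(x)
                    clusters[dst].append(x)
                    score = objective(clusters)
                    if score < best - 1e-12:
                        best = score
                        improved = True
                        break  # keep the move, go on to the next observation
                    # Undo the move: it did not improve the clustering.
                    clusters[dst].pop()
                    clusters[src].append(x)
    return clusters

# A deliberately poor initial clustering of two well-separated groups.
result = redistribute([[1.0, 2.0, 10.0], [11.0, 12.0, 3.0]])
```

Each accepted move strictly lowers the objective, so the loop terminates at a local optimum; a better initial clustering generally means fewer passes.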
Utility-Based Categorization
1993
Abstract

Cited by 3 (1 self)
The ability to categorize and use concepts effectively is a basic requirement of any intelligent actor. The utility-based approach to categorization is founded on the thesis that categorization is fundamentally in service of action, i.e., the choice of concepts made by an actor is critical to its choice of appropriate actions. This is in contrast to classical and similarity-based approaches, which seek logical completeness in concept description with respect to sensory data rather than action-oriented effectiveness. Utility-based categorization is normative and not descriptive. It prescribes how an intelligent agent ought to conceptualize to act effectively. It provides ideals for categorization, specifies criteria for the design of effective computational agents, and provides a model of ideal competence. A decision-theoretic framework for utility-based categorization which involves reasoning about alternative categorization models of varying levels of abstraction is proposed. Categorization mode...
A Combined Latent Class and . . .
Accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001
Abstract
We present a general framework for data analysis and visualisation by means of topographic organization and clustering. Imposing distributional assumptions on the assumed underlying latent factors makes the proposed model suitable for both visualisation and clustering. The system's noise will be modeled in parametric form, as a member of the exponential family of distributions, and this allows us to deal with different (continuous or discrete) types of observables in a unified framework. In this paper we focus on discrete case formulations which, contrary to self-organizing methods for continuous data, imply variants of Bregman divergences as measures of dissimilarity between data and reference points, and also define the matching nonlinear relation between latent and observable variables. Therefore, the trait variant of the model can be seen as a data-driven noisy nonlinear Independent Component Analysis, which is capable of revealing meaningful structure in the multivariate observable data and visualising it in two dimensions. The class variant of our model (which performs the clustering) performs data-driven parametric mixture modeling. The combined (trait and class) model, along with the associated estimation procedures, allows us to interpret the visualisation result in the sense of a topographic ordering. One important application of this work is the discovery of underlying semantic structure in text-based documents. Experimental results on various subsets of the Twenty-Newsgroups text corpus and binary-coded digits data are given by way of demonstration.
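For readers unfamiliar with the dissimilarity measures mentioned in this abstract, the sketch below evaluates the standard Bregman divergence D(p, q) = phi(p) - phi(q) - <grad phi(q), p - q> for two textbook choices of the convex function phi. It does not reproduce the specific variants used in the paper; the function names are hypothetical, and the negative-entropy case assumes strictly positive entries.

```python
import math

def bregman(phi, grad_phi, p, q):
    """Bregman divergence of p from q for a convex function phi."""
    linear = sum(g * (pi - qi) for g, pi, qi in zip(grad_phi(q), p, q))
    return phi(p) - phi(q) - linear

# phi(v) = ||v||^2 yields the squared Euclidean distance.
def sq_norm(v):
    return sum(x * x for x in v)

def sq_norm_grad(v):
    return [2.0 * x for x in v]

# phi(v) = sum v_i log v_i (negative entropy) yields the generalized
# Kullback-Leibler divergence; entries must be strictly positive.
def neg_entropy(v):
    return sum(x * math.log(x) for x in v)

def neg_entropy_grad(v):
    return [math.log(x) + 1.0 for x in v]

p = [0.5, 0.5]
q = [0.9, 0.1]
d_euclid = bregman(sq_norm, sq_norm_grad, p, q)
d_kl = bregman(neg_entropy, neg_entropy_grad, p, q)
```

When both vectors sum to one, as here, the generalized KL divergence reduces to the usual sum of p_i log(p_i / q_i), which is the natural choice for discrete (for example, word-count) data.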