Results 1–10 of 10
Operations for Learning with Graphical Models
 Journal of Artificial Intelligence Research, 1994
Abstract

Cited by 249 (12 self)
This paper is a multidisciplinary review of empirical, statistical learning from a graphical model perspective. Well-known examples of graphical models include Bayesian networks, directed graphs representing a Markov chain, and undirected networks representing a Markov field. These graphical models are extended to model data analysis and empirical learning using the notation of plates. Graphical operations for simplifying and manipulating a problem are provided, including decomposition, differentiation, and the manipulation of probability models from the exponential family. Two standard algorithm schemas for learning are reviewed in a graphical framework: Gibbs sampling and the expectation maximization algorithm. Using these operations and schemas, some popular algorithms can be synthesized from their graphical specification. This includes versions of linear regression, techniques for feedforward networks, and learning Gaussian and discrete Bayesian networks from data. The paper conclu...
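One of the two algorithm schemas the abstract names is expectation maximization. As a minimal illustrative sketch only (a 1-D two-component Gaussian mixture, not the paper's plate notation or graphical derivation), the E-step/M-step alternation looks like this; `em_gmm_1d` and its initialization are choices made here for illustration:

```python
import math

def em_gmm_1d(xs, iters=50):
    """EM for a two-component 1-D Gaussian mixture (illustrative sketch)."""
    # Simple initialization: component means at the data extremes,
    # unit variances, equal mixing weights.
    mu = [min(xs), max(xs)]
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [pi[k] / math.sqrt(2 * math.pi * var[k])
                    * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                    for k in range(2)]
            z = sum(dens)
            resp.append([d / z for d in dens])
        # M-step: re-estimate weights, means, variances from responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2
                         for r, x in zip(resp, xs)) / nk + 1e-6
    return pi, mu, var
```

On data drawn from two well-separated clusters, the recovered means land near the two cluster centers after a few iterations.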
Hierarchical Learning with Procedural Abstraction Mechanisms
 1997
Abstract

Cited by 33 (2 self)
Evolutionary computation (EC) consists of the design and analysis of probabilistic algorithms inspired by the principles of natural selection and variation. Genetic Programming (GP) is one subfield of EC that emphasizes desirable features such as the use of procedural representations, the capability to discover and exploit intrinsic characteristics of the application domain, and the flexibility to adapt the shape and complexity of learned models. Approaches that learn monolithic representations are considerably less likely to be effective for complex problems, and standard GP is no exception. The main goal of this dissertation is to extend GP capabilities with automatic mechanisms to cope with problems of increasing complexity. Humans succeed here by skillfully using hierarchical decomposition and abstraction mechanisms. The translation of such mechanisms into a general computer implementation is a tremendous challenge, which requires a firm understanding of the interplay between repr...
Graphical Models for Discovering Knowledge
 1995
Abstract

Cited by 28 (2 self)
There are many different ways of representing knowledge, and for each of these ways there are many different discovery algorithms. How can we compare different representations? How can we mix, match and merge representations and algorithms on new problems with their own unique requirements? This chapter introduces probabilistic modeling as a philosophy for addressing these questions and presents graphical models for representing probabilistic models. Probabilistic graphical models are a unified qualitative and quantitative framework for representing and reasoning with probabilities and independencies.

4.1 Introduction

Perhaps one common element of the discovery systems described in this and previous books on knowledge discovery is that they are all different. Since the class of discovery problems is a challenging one, we cannot write a single program to address all of knowledge discovery. The KEFIR discovery system applied to health care by Matheus, Piatetsky-Shapiro, and McNeill (199...
A Fast and Robust General Purpose Clustering Algorithm
 In Pacific Rim International Conference on Artificial Intelligence, 2000
Abstract

Cited by 16 (2 self)
General purpose and highly applicable clustering methods are usually required during the early stages of knowledge discovery exercises. k-Means has been adopted as the prototype of iterative model-based clustering because of its speed, simplicity and capability to work within the format of very large databases. However, k-Means has several disadvantages derived from its statistical simplicity. We propose an algorithm that remains very efficient, generally applicable, multidimensional but is more robust to noise and outliers. We achieve this by using the discrete median rather than the mean as the estimator of the center of a cluster. Comparison with k-Means, Expectation Maximization and Gibbs sampling demonstrates the advantages of our algorithm.
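The core idea in the abstract, replacing the mean with a median as the cluster-center estimator, can be sketched in one dimension. This is not the paper's algorithm, just a k-medians-style illustration of why the substitution buys robustness; `k_medians` and its initialization are choices made here:

```python
def k_medians(points, k, iters=20):
    """k-Means-style iteration, but with the median as the center estimator."""
    # Simple (naive) initialization: first k points as centers.
    centers = points[:k]
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: abs(p - centers[c]))
            clusters[j].append(p)
        # Update step: the new center is the median of the assigned points.
        # Unlike the mean used by k-Means, a single extreme outlier cannot
        # drag the median arbitrarily far from the bulk of the cluster.
        for j, cl in enumerate(clusters):
            if cl:
                s = sorted(cl)
                centers[j] = s[len(s) // 2]
    return centers
```

With a gross outlier in the data (e.g. 1000 among values near 1–3 and 100–102), the median-based centers stay inside the two genuine clusters, whereas a mean-based update would be pulled toward the outlier.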
Time Series Learning with Probabilistic Network Composites
 University of Illinois, 1998
Abstract

Cited by 9 (9 self)
The purpose of this research is to extend the theory of uncertain reasoning over time through integrated, multi-strategy learning. Its focus is on decomposable, concept learning problems for classification of spatiotemporal sequences. Systematic methods of task decomposition using attribute-driven methods, especially attribute partitioning, are investigated. This leads to a novel and important type of unsupervised learning in which the feature construction (or extraction) step is modified to account for multiple sources of data and to systematically search for embedded temporal patterns. This modified technique is combined with traditional cluster definition methods to provide an effective mechanism for decomposition of time series learning problems. The decomposition process interacts with model selection from a collection of probabilistic models such as temporal artificial neural networks and temporal Bayesian networks. Models are chosen using a new quantitative (metric-based) approach that estimates expected performance of a learning architecture, algorithm, and mixture model on a newly defined subproblem. By mapping subproblems to customized configurations of probabilistic networks for time series learning, a hierarchical, supervised learning system with enhanced generalization quality can be automatically built. The system can improve data fusion
Robust Linear Discriminant Trees
 In AI & Statistics 95
Abstract

Cited by 8 (2 self)
We present a new method for the induction of classification trees with linear discriminants as the partitioning function at each internal node. This paper presents two main contributions: first, a novel objective function called soft entropy which is used to identify optimal coefficients for the linear discriminants, and second, a novel method for removing outliers called iterative refiltering which boosts performance on many datasets. These two ideas are presented in the context of a single learning algorithm called DTSEPIR, which is compared with the CART and OC1 algorithms.

36.1 Introduction

Recursive partitioning classifiers, or decision trees, are an important nonparametric function representation in statistics and machine learning (Friedman 1977, Breiman, Friedman, Olshen & Stone 1984, Quinlan 1986, Quinlan 1993). Their wide and successful use in fielded applications and their simple intuitive appeal make decision tree learning algorithms an important area of study. In this p...
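To make the objective being optimized concrete: a linear discriminant split sends each point to one side of the hyperplane w·x + b = 0, and split quality is the weighted class entropy of the two sides. The sketch below uses the ordinary hard entropy; the paper's soft entropy (whose exact definition is in the paper, not reproduced here) replaces the hard side-membership indicator with a smooth membership so the objective becomes differentiable in w and b. The helper names are ours:

```python
import math

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    h = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        h -= p * math.log2(p)
    return h

def split_impurity(points, labels, w, b):
    """Weighted entropy of the two sides of the hyperplane w.x + b = 0."""
    left, right = [], []
    for p, y in zip(points, labels):
        side = sum(wi * xi for wi, xi in zip(w, p)) + b
        (left if side < 0 else right).append(y)
    n = len(labels)
    return (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
```

A hyperplane that separates the classes perfectly has impurity 0; optimizing over (w, b) against a smoothed version of this quantity is the role soft entropy plays in the abstract.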
Fast Randomized Algorithms for Robust Estimation of Location
Abstract

Cited by 2 (2 self)
A fundamental procedure appearing within such clustering methods as k-Means, Expectation Maximization, Fuzzy C-Means and Minimum Message Length is that of computing estimators of location. Most estimators of location exhibiting useful robustness properties require at least quadratic time to compute, far too slow for large data mining applications. In this paper, we propose O(Dn√n)-time randomized algorithms for computing robust estimators of location, where n is the size of the data set, and D is the dimension. Keywords: clustering, spatial data mining, robust statistics, location.

1 Introduction

When analyzing large sets of spatial information (both 2-dimensional and higher-dimensional), classical multivariate statistical procedures such as variable standardization, multivariate studentizing, outlier detection, discriminant analysis, principal components, factor analysis, structural models and canonical correlations all require that the center and scatter of a cloud of...
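The abstract's starting point is that robust location estimators (such as the medoid, the data point minimizing total distance to the rest) cost O(n²D) when computed exactly. One generic randomized way to trade exactness for speed, shown purely as a sketch and not as the paper's algorithm, is to evaluate the cost only at a random sample of candidate points; `sampled_medoid` and its parameters are ours:

```python
import random

def sampled_medoid(points, n_candidates=30, seed=0):
    """Approximate medoid: best of a random sample of candidate centers.

    Exact medoid costs O(n^2 * D); evaluating only n_candidates random
    candidates against all n points costs O(n_candidates * n * D).
    """
    rng = random.Random(seed)
    candidates = rng.sample(points, min(n_candidates, len(points)))

    def cost(c):
        # Sum of Euclidean distances from candidate c to every data point.
        return sum(sum((a - b) ** 2 for a, b in zip(c, p)) ** 0.5
                   for p in points)

    return min(candidates, key=cost)
```

Because the estimate is a data point minimizing a sum of distances, a single gross outlier raises every candidate's cost roughly equally and cannot become the estimate itself, unlike the coordinate-wise mean.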
Learning with probabilistic representations
 Machine Learning, 1997
Abstract

Cited by 2 (1 self)
Machine learning cannot occur without some means to represent the learned knowledge. Researchers have long recognized the influence of representational choices, and the major paradigms in machine learning are organized not around induction algorithms or performance elements as much as around representational classes. Major examples include logical
Software for Data Analysis With Graphical Models
 In Fifth International Artificial Intelligence and Statistics Workshop, Ft. Lauderdale, FL, 1995
Abstract

Cited by 1 (0 self)
Probabilistic graphical models are being used widely in artificial intelligence and statistics, for instance, in diagnosis and expert systems, as a framework for representing and reasoning with probabilities and independencies. They come with corresponding algorithms for performing statistical inference. This offers a unifying framework for prototyping and/or generating data analysis algorithms from graphical specifications. This paper illustrates the framework with an example and then presents some basic techniques for the task: problem decomposition and the calculation of exact Bayes factors. Other tools already developed, such as automatic differentiation, Gibbs sampling, and use of the EM algorithm, make this a broad basis for the generation of data analysis software.

1 Introduction

This paper argues that the data analysis tasks of learning and knowledge discovery can be handled using graphical models. This meta-level use of graphical models was first suggested by Spiegelhalter an...
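One of the basic techniques the abstract names is the calculation of exact Bayes factors. As a toy illustration only (two fully specified Bernoulli models for coin-flip data, where the Bayes factor reduces to a plain likelihood ratio; the paper works with graphical specifications, not this helper):

```python
def bayes_factor(heads, tails, p1, p2):
    """Bayes factor of model 1 (P(heads)=p1) vs model 2 (P(heads)=p2).

    With fully specified (fixed-parameter) models there is nothing to
    integrate, so the Bayes factor is exactly the likelihood ratio.
    """
    l1 = p1 ** heads * (1 - p1) ** tails
    l2 = p2 ** heads * (1 - p2) ** tails
    return l1 / l2
```

A value above 1 favors the first model; 8 heads in 10 flips favors P(heads)=0.8 over a fair coin, while 5 heads in 10 favors the fair coin.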