Results 1 - 9 of 9
Model-Based Clustering, Discriminant Analysis, and Density Estimation
 JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
, 2000
Cited by 319 (26 self)
Cluster analysis is the automated search for groups of related observations in a data set. Most clustering done in practice is based largely on heuristic but intuitively reasonable procedures and most clustering methods available in commercial software are also of this type. However, there is little systematic guidance associated with these methods for solving important practical questions that arise in cluster analysis, such as "How many clusters are there?", "Which clustering method should be used?" and "How should outliers be handled?". We outline a general methodology for model-based clustering that provides a principled statistical approach to these issues. We also show that this can be useful for other problems in multivariate analysis, such as discriminant analysis and multivariate density estimation. We give examples from medical diagnosis, minefield detection, cluster recovery from noisy data, and spatial density estimation. Finally, we mention limitations of the methodology, a...
Learning with mixtures of trees
 Journal of Machine Learning Research
, 2000
Cited by 117 (2 self)
This paper describes the mixtures-of-trees model, a probabilistic model for discrete multidimensional domains. Mixtures-of-trees generalize the probabilistic trees of Chow and Liu [6] in a different and complementary direction to that of Bayesian networks. We present efficient algorithms for learning mixtures-of-trees models in maximum likelihood and Bayesian frameworks. We also discuss additional efficiencies that can be obtained when data are “sparse,” and we present data structures and algorithms that exploit such sparseness. Experimental results demonstrate the performance of the model for both density estimation and classification. We also discuss the sense in which tree-based classifiers perform an implicit form of feature selection, and demonstrate a resulting insensitivity to irrelevant attributes.
Learning Evaluation Functions to Improve Optimization by Local Search
 Journal of Machine Learning Research
, 2000
Cited by 57 (0 self)
This paper describes algorithms that learn to improve search performance on large-scale optimization tasks. The main algorithm, Stage, works by learning an evaluation function that predicts the outcome of a local search algorithm, such as hillclimbing or Walksat, from features of states visited during search. The learned evaluation function is then used to bias future search trajectories toward better optima on the same problem. Another algorithm, XStage, transfers previously learned evaluation functions to new, similar optimization problems. Empirical results are provided on seven large-scale optimization domains: bin-packing, channel routing, Bayesian network structure-finding, radiotherapy treatment planning, cartogram design, Boolean satisfiability, and Boggle board setup.
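The loop this abstract describes (run local search, label the visited states with the outcome reached, fit an evaluation function, then search on that function to pick the next starting point) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the toy objective, the single scalar feature, the least-squares line used as the learned function, and all function names are assumptions made for the example.

```python
import random

def hillclimb(x, score, neighbors):
    """Greedy descent on `score`; returns the list of visited states."""
    path = [x]
    while True:
        best = min(neighbors(x), key=score)
        if score(best) >= score(x):
            return path  # no strictly better neighbor: local optimum
        x = best
        path.append(x)

def fit_line(xs, ys):
    """Least-squares fit y ~ a*x + b for one scalar feature (assumed model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    a = 0.0 if var == 0 else sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return a, my - a * mx

def stage(objective, neighbors, feature, domain, start, rounds=20, seed=0):
    """Alternate between searching the objective and the learned predictor."""
    rng = random.Random(seed)
    feats, outcomes = [], []
    best_x = x = start
    for _ in range(rounds):
        path = hillclimb(x, objective, neighbors)
        end = path[-1]
        if objective(end) < objective(best_x):
            best_x = end
        # Label every visited state with the outcome the search reached from it.
        for s in path:
            feats.append(feature(s))
            outcomes.append(objective(end))
        a, b = fit_line(feats, outcomes)
        predicted = lambda s: a * feature(s) + b
        # Search on the learned function to choose a promising new start.
        new_start = hillclimb(end, predicted, neighbors)[-1]
        x = new_start if new_start != end else rng.choice(domain)
    return best_x

# Hypothetical toy problem: local minima at 5, 15, ..., 95 that improve as x grows.
N = 100

def objective(x):
    return abs(x % 10 - 5) + (N - x) / 10

def neighbors(x):
    return [y for y in (x - 1, x + 1) if 0 <= y < N]
```

Calling `stage(objective, neighbors, feature=lambda s: s, domain=range(N), start=0)` returns the best state found across all rounds. Because the best state is tracked from the first run onward, the result is never worse than a single hillclimbing run from the same start.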
Bayesian Network Anomaly Pattern Detection for Disease Outbreaks
 In Proceedings of the Twentieth International Conference on Machine Learning
, 2003
Cited by 39 (6 self)
Early disease outbreak detection systems typically monitor health care data for irregularities by comparing the distribution of recent data against a baseline distribution. Determining the baseline is difficult due to the presence of different trends in health care data, such as trends caused by the day of week and by seasonal variations in temperature and weather. Creating the baseline distribution without taking these trends into account can lead to unacceptably high false positive counts and slow detection times.
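To illustrate why the baseline matters, the sketch below compares a daily count against a pooled baseline and against a baseline restricted to the same day of week. It is a deliberately simplified stand-in, not the Bayesian-network method of the paper; the z-score test, the function names, and the toy counts are all assumptions made for the example.

```python
from statistics import mean, stdev

def zscore(count, baseline_counts):
    """Standard score of `count` relative to a list of baseline counts."""
    return (count - mean(baseline_counts)) / stdev(baseline_counts)

def pooled_zscore(history, count):
    """Baseline that ignores day of week: pool every historical count."""
    return zscore(count, [c for _, c in history])

def same_weekday_zscore(history, count, weekday):
    """Baseline that conditions on day of week (0 = Monday, ..., 6 = Sunday)."""
    return zscore(count, [c for d, c in history if d == weekday])

# Hypothetical history of (weekday, visit count) pairs: busy Mondays, quiet Sundays.
history = [(0, 10), (0, 12), (0, 14), (6, 2), (6, 3), (6, 4)]
```

With this toy history, a Sunday count of 8 looks unremarkable against the pooled baseline but stands out against the Sunday-only baseline, which is the kind of trend effect the abstract refers to.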
An accelerated Chow and Liu algorithm: fitting tree distributions to high-dimensional sparse data
, 1999
Cited by 17 (0 self)
Chow and Liu [2] introduced an algorithm for fitting a multivariate distribution with a tree (i.e., a density model that assumes that there are only pairwise dependencies between variables and that the graph of these dependencies is a spanning tree). The original algorithm is quadratic in the dimension of the domain, and linear in the number of data points that define the target distribution P. This paper shows that for sparse, discrete data, fitting a tree distribution can be done in time and memory that is jointly subquadratic in the number of variables and the size of the data set. The new algorithm, called the acCL algorithm, takes advantage of the sparsity of the data to accelerate the computation of pairwise marginals and the sorting of the resulting mutual informations, achieving speed ups of up to 2-3 orders of magnitude in the experiments. Copyright © Massachusetts Institute of Technology, 1998. This report describes research done at the Dept. of Electrical Enginee...
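For orientation, the original quadratic Chow and Liu procedure that this abstract starts from can be sketched as below: score every pair of variables by empirical mutual information, then keep a maximum-weight spanning tree. This is a minimal sketch of the baseline only, not the accelerated acCL algorithm; the function names and the use of Kruskal's algorithm with union-find are choices made for the example.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(data, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(data)
    joint = Counter((row[i], row[j]) for row in data)
    pi = Counter(row[i] for row in data)
    pj = Counter(row[j] for row in data)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (pi[a] * pj[b]))
    return mi

def chow_liu_tree(data):
    """Return the edges of a maximum-weight spanning tree over the variables."""
    d = len(data[0])
    # Quadratic step: one mutual-information score per pair of variables.
    scored = sorted(
        ((mutual_information(data, i, j), i, j)
         for i, j in combinations(range(d), 2)),
        reverse=True,
    )
    parent = list(range(d))  # union-find forest for Kruskal's algorithm

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = []
    for _, i, j in scored:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding this edge does not create a cycle
            parent[ri] = rj
            edges.append((i, j))
    return edges
```

Here `data` is a list of equal-length tuples of discrete values. The pairwise scoring and the sort over all pairs are the steps the abstract says acCL accelerates for sparse data.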
Real-valued all-dimensions search: Low-overhead rapid searching over subsets of attributes
 Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence
, 2002
Cited by 5 (1 self)
This paper is about searching the combinatorial space of contingency tables during the inner loop of a nonlinear statistical optimization. Examples of this operation in various data analytic communities include searching for nonlinear combinations of attributes that contribute significantly to a regression (Statistics), searching for items to include in a decision list (machine learning) and association rule hunting (Data Mining). This paper investigates a new, efficient approach to this class of problems, called RADSEARCH (Real-valued All-Dimensions-tree Search). RADSEARCH finds the global optimum, and this gives us the opportunity to empirically evaluate the question: apart from algorithmic elegance, what does this attention to optimality buy us? We compare RADSEARCH with other recent successful search algorithms such as CN2, PRIM, APriori, OPUS and DenseMiner. Finally, we introduce RADREG, a new regression algorithm for learning real-valued outputs based on RADSEARCHing for high-order interactions.
MOTC with Examples: An Interactive Aid for Multidimensional Hypothesis Generation
, 2000
E4  Machine Learning
Machine learning's focus on ill-defined problems and highly flexible methods makes it ideally suited for KDD applications. Among the ideas machine learning contributes to KDD are the importance of empirical validation, the impossibility of learning without a priori assumptions, and the utility of limited-search or limited-representation methods. Machine learning provides methods for incorporating knowledge into the learning process, changing and combining representations, combatting the curse of dimensionality, and learning comprehensible models. KDD challenges for machine learning include scaling up its algorithms to large databases, using cost information in learning, automating data preprocessing, and enabling rapid development of applications. KDD opens
Learning Evaluation Functions to Improve Local Search
This paper describes Stage, a learning algorithm that automatically improves search performance on large-scale optimization problems. Stage learns an evaluation function that predicts the outcome of a local search algorithm, such as hillclimbing or Walksat, from features of states visited during search. The learned evaluation function is used to bias future search trajectories toward better optima on the same problem. This paper presents the Stage algorithm; an extension, XStage, that transfers learned evaluation functions to new, similar optimization problems; and empirical results on seven large-scale optimization domains: bin-packing, channel routing, Bayes network structure-finding, radiotherapy treatment planning, cartogram design, Boolean satisfiability, and Boggle board setup.