Results 1 
5 of
5
Mixnets: Factored Mixtures of Gaussians in Bayesian Networks with Mixed Continuous And Discrete Variables
, 2000
"... Recently developed techniques have made it possible to quickly learn accurate probability density functions from data in lowdimensional continuous spaces. In particular, mixtures of Gaussians can be fitted to data very quickly using an accelerated EM algorithm that employs multiresolution kdtrees ..."
Abstract

Cited by 7 (2 self)
 Add to MetaCart
Recently developed techniques have made it possible to quickly learn accurate probability density functions from data in lowdimensional continuous spaces. In particular, mixtures of Gaussians can be fitted to data very quickly using an accelerated EM algorithm that employs multiresolution kdtrees (Moore, 1999). In this paper, we propose a kind of Bayesian network in which lowdimensional mixtures of Gaussians over different subsets of the domain’s variables are combined into a coherent joint probability model over the entire domain. The network is also capable of modeling complex dependencies between discrete variables and continuous variables without requiring discretization of the continuous variables. We present efficient heuristic algorithms for automatically learning these networks from data, and perform comparative experiments illustrating how well these networks model real scientific data and synthetic data. We also briefly discuss some possible improvements to the networks, as well as possible applications.
Netcube: A scalable tool for fast data mining and compression
 In Proceedings of the International Conference on Very Large Databases (VLDB
, 2001
"... We propose an novel method of computing and storing DataCubes. Our idea is to use Bayesian Networks, which can generate approximate counts for any query combination of attribute values and “don’t cares. ” A Bayesian network represents the underlying joint probability distribution of the data that we ..."
Abstract

Cited by 4 (0 self)
 Add to MetaCart
We propose an novel method of computing and storing DataCubes. Our idea is to use Bayesian Networks, which can generate approximate counts for any query combination of attribute values and “don’t cares. ” A Bayesian network represents the underlying joint probability distribution of the data that were used to generate it. By means of such a network the proposed method, NetCube, exploits correlations among attributes. Our proposed preprocessing algorithm scales linearly on the size of the database, and is thus scalable; it is also parallelizable with a straightforward parallel implementation. Moreover, we give an algorithm to estimate counts of arbitrary queries that is fast (constant on the database size). Experimental results show that NetCubes have fast generation and use (a few This material is based upon work supported by the National Science
Evolving Dynamic Bayesian Networks with Multiobjective Genetic Algorithms
, 2005
"... A dynamic Bayesian network (DBN) is a probabilistic network that models interdependent entities that change over time. Given example sequences of multivariate data, we use a genetic algorithm to synthesize a network structure that models the causal relationships that explain the sequence. We use a m ..."
Abstract

Cited by 4 (0 self)
 Add to MetaCart
A dynamic Bayesian network (DBN) is a probabilistic network that models interdependent entities that change over time. Given example sequences of multivariate data, we use a genetic algorithm to synthesize a network structure that models the causal relationships that explain the sequence. We use a multiobjective evaluation strategy with a genetic algorithm. The multiobjective criteria are a network’s probabilistic score and structural complexity score. Our use of Pareto ranking is ideal for this application, because it naturally balances the effect of the likelihood and structural simplicity terms used in the BIC network evaluation heuristic. We use a simple structural scoring formula, which tries to keep the number of links in the network approximately equivalent to the number of variables. We also use a simple representation that favours sparsely connected networks similar in structure to those modeling biological phenomenon. Our experiments show promising results when evolving networks ranging from 10 to 30 variables, using a maximal connectivity of between 3 and 4 parents per node. The results from the multiobjective GA were superior to those obtained with a single objective GA.
Fast Factored Density Estimation and Compression with Bayesian Networks
, 2002
"... my family especially my father, Donald. iv Abstract Many important data analysis tasks can be addressed by formulating them as probability estimation problems. For example, a popular general approach to automatic classification problems is to learn a probabilistic model of each class from data in ..."
Abstract

Cited by 3 (1 self)
 Add to MetaCart
my family especially my father, Donald. iv Abstract Many important data analysis tasks can be addressed by formulating them as probability estimation problems. For example, a popular general approach to automatic classification problems is to learn a probabilistic model of each class from data in which the classes are known, and then use Bayes's rule with these models to predict the correct classes of other data for which they are not known. Anomaly detection and scientific discovery tasks can often be addressed by learning probability models over possible events and then looking for events to which these models assign low probabilities. Many data compression algorithms such as Huffman coding and arithmetic coding rely on probabilistic models of the data stream in order achieve high compression rates.
Using Constrained Models for GuaranteedError Semantic Compression
"... While a variety of lossy compression schemes have been developed for certain forms of digital data (e.g., images, audio, video), the area of lossy compression techniques for arbitrary data tables has been left relatively unexplored. Nevertheless, such techniques are clearly motivated by the everincr ..."
Abstract
 Add to MetaCart
While a variety of lossy compression schemes have been developed for certain forms of digital data (e.g., images, audio, video), the area of lossy compression techniques for arbitrary data tables has been left relatively unexplored. Nevertheless, such techniques are clearly motivated by the everincreasing data collection rates of modern enterprises and the need for effective, guaranteedquality approximate answers to queries over massive relational data sets.