Results 1 
9 of
9
Disk Aware Discord Discovery: Finding Unusual Time Series in Terabyte Sized
"... The problem of finding unusual time series has recently attracted much attention, and several promising methods are now in the literature. However, virtually all proposed methods assume that the data reside in main memory. For many realworld problems this is not be the case. For example, in astrono ..."
Abstract

Cited by 14 (4 self)
 Add to MetaCart
(Show Context)
The problem of finding unusual time series has recently attracted much attention, and several promising methods are now in the literature. However, virtually all proposed methods assume that the data reside in main memory. For many realworld problems this is not be the case. For example, in astronomy, multiterabyte time series datasets are the norm. Most current algorithms faced with data which cannot fit in main memory resort to multiple scans of the disk/tape and are thus intractable. In this work we show how one particular definition of unusual time series, the time series discord, can be discovered with a disk aware algorithm. The proposed algorithm is exact and requires only two linear scans of the disk with a tiny buffer of main memory. Furthermore, it is very simple to implement. We use the algorithm to provide further evidence of the effectiveness of the discord definition in areas as diverse as astronomy, web query mining, video surveillance, etc., and show the efficiency of our method on datasets which are many orders of magnitude larger than anything else attempted in the literature. 1.
Interpolating conditional density trees
 A. Darwiche, N. Friedman (Eds.), Uncertainty in Artificial Intelligence
, 2002
"... Joint distributions over many variables are frequently modeled by decomposing them into products of simpler, lowerdimensional conditional distributions, such as in sparsely connected Bayesian networks. However, automatically learning such models can be very computationally expensive when there are ..."
Abstract

Cited by 5 (0 self)
 Add to MetaCart
(Show Context)
Joint distributions over many variables are frequently modeled by decomposing them into products of simpler, lowerdimensional conditional distributions, such as in sparsely connected Bayesian networks. However, automatically learning such models can be very computationally expensive when there are many datapoints and many continuous variables with complex nonlinear relationships, particularly when no good ways of decomposing the joint distribution are known a priori. In such situations, previous research has generally focused on the use of discretization techniques in which each continuous variable has a single discretization that is used throughout the entire network. In this paper, we present and compare a wide variety of treebased algorithms for learning and evaluating conditional density estimates over continuous variables. These trees can be thought of as discretizations that vary according to the particular interactions being modeled; however, the density within a given leaf of the tree need not be assumed constant, and we show that such nonuniform leaf densities lead to more accurate density estimation. We have developed Bayesian network structurelearning algorithms that employ these treebased conditional density representations, and we show that they can be used to practically learn complex joint probability models over dozens of continuous variables from thousands of datapoints. We focus on nding models that are simultaneously accurate, fast to learn, and fast to evaluate once they are learned.
Fast Factored Density Estimation and Compression with Bayesian Networks
"... Gaussian mixture models, compression, interpolating density trees, conditional density estimation To my family  especially my father, Donald. iv Many important data analysis tasks can be addressed by formulating them as probability estimation problems. For example, a popular general approach to au ..."
Abstract

Cited by 3 (1 self)
 Add to MetaCart
(Show Context)
Gaussian mixture models, compression, interpolating density trees, conditional density estimation To my family  especially my father, Donald. iv Many important data analysis tasks can be addressed by formulating them as probability estimation problems. For example, a popular general approach to automatic classication problems is to learn a probabilistic model of each class from data in which the classes are known, and then use Bayes's rule with these models to predict the correct classes of other data for which they are not known. Anomaly detection and scientic discovery tasks can often be addressed by learning probability models over possible events and then looking for events to which these models assign low probabilities. Many data compression algorithms such as Human coding and arithmetic coding rely on probabilistic models of the data
Mixnets: Factored Mixtures of Gaussians in Bayesian Networks with Mixed Continuous And Discrete Variables
"... Recently developed techniques have made it possible to quickly learn accurate probability density functions from data in lowdimensional continuous spaces. In particular, mixtures of Gaussians can be fitted to data very quickly using an accelerated EM algorithm that employs multiresolution�dtrees ( ..."
Abstract
 Add to MetaCart
(Show Context)
Recently developed techniques have made it possible to quickly learn accurate probability density functions from data in lowdimensional continuous spaces. In particular, mixtures of Gaussians can be fitted to data very quickly using an accelerated EM algorithm that employs multiresolution�dtrees (Moore, 1999). In this paper, we propose a kind of Bayesian network in which lowdimensional mixtures of Gaussians over different subsets of the domain’s variables are combined into a coherent joint probability model over the entire domain. The network is also capable of modeling complex dependencies between discrete variables and continuous variables without requiring discretization of the continuous variables. We present efficient heuristic algorithms for automatically learning these networks from data, and perform comparative experiments illustrating how well these networks model real scientific data and synthetic data. We also briefly discuss some possible improvements to the networks, as well as possible applications. 1
Learning from Time Series in the Presence of Noise: Unsupervised and SemiSupervised Approaches
, 2008
"... ..."
(Show Context)
Focus on Mammalian Embryogenomics
"... Advancing the understanding of the embryo transcriptome coregulation using meta, functional, and gene network analysis tools ..."
Abstract
 Add to MetaCart
Advancing the understanding of the embryo transcriptome coregulation using meta, functional, and gene network analysis tools
BMC Systems Biology BioMed Central Research article Inference of gene pathways using mixture Bayesian networks
"... ..."
(Show Context)
Copyright by
, 2008
"... UMI Number: 3305676 INFORMATION TO USERS The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleedthrough, substandard margins, and improper alignment can adversely affect repr ..."
Abstract
 Add to MetaCart
(Show Context)
UMI Number: 3305676 INFORMATION TO USERS The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleedthrough, substandard margins, and improper alignment can adversely affect reproduction. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion. UMI