Results 1 - 3 of 3
Bayesian Biosurveillance of Disease Outbreaks
In Proceedings of the 20th Annual Conference on Uncertainty in Artificial Intelligence (UAI04), 2004
Abstract

Cited by 21 (11 self)
Early, reliable detection of disease outbreaks is a critical problem today. This paper reports an investigation of the use of causal Bayesian networks to model spatiotemporal patterns of a noncontagious disease (respiratory anthrax infection) in a population of people. The number of parameters in such a network can become enormous if not carefully managed. Also, inference needs to be performed in real time as population data stream in. We describe techniques we have applied to address both the modeling and inference challenges. A key contribution of this paper is the explication of assumptions and techniques that are sufficient to allow the scaling of Bayesian network modeling and inference to millions of nodes for real-time surveillance applications.
Detecting anomalous records in categorical datasets
Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007
Abstract

Cited by 17 (2 self)
We consider the problem of detecting anomalies in high-arity categorical datasets. In most applications, anomalies are defined as data points that are 'abnormal'. Quite often we have access to data that consists mostly of normal records, along with a small percentage of unlabelled anomalous records. We are interested in the problem of unsupervised anomaly detection, where we use the unlabelled data for training and detect records that do not follow the definition of normality. A standard approach is to create a model of normal data and compare test records against it. A probabilistic approach builds a likelihood model from the training data. Records are tested for anomalousness based on the complete record likelihood given the probability model. For categorical attributes, Bayes nets give a standard representation of the likelihood. While this approach is good at finding outliers in the dataset, it often tends to detect records with attribute values that are rare. Sometimes, just detecting rare values of an attribute is not desired, and such outliers are not considered anomalies in that context. We present an alternative definition of anomalies and propose an approach of comparing against marginal distributions of attribute subsets. We show that this is a more meaningful way of detecting anomalies and that it performs better on semi-synthetic as well as real-world datasets.
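The contrast drawn in this abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. The smoothed empirical joint distribution stands in for a Bayes net, and the ratio P(a, b) / (P(a) P(b)) over attribute pairs is one plausible reading of "comparing against marginal distributions of attribute subsets". The function names, the add-alpha smoothing, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def fit(records):
    """Count full records, single attribute values, and attribute-pair values."""
    records = [tuple(r) for r in records]
    n_attrs = len(records[0])
    return {
        "n": len(records),
        "joint": Counter(records),
        "single": [Counter(r[i] for r in records) for i in range(n_attrs)],
        "pair": {
            (i, j): Counter((r[i], r[j]) for r in records)
            for i, j in combinations(range(n_attrs), 2)
        },
    }


def smoothed(count, n, alpha=1.0):
    """Add-alpha probability estimate (this smoothing scheme is an assumption)."""
    return (count + alpha) / (n + 2 * alpha)


def record_likelihood(record, model):
    """Likelihood of the complete record under the smoothed empirical joint."""
    return smoothed(model["joint"][tuple(record)], model["n"])


def min_pair_ratio(record, model):
    """Smallest P(a, b) / (P(a) * P(b)) over all attribute pairs of the record."""
    n = model["n"]
    ratios = []
    for (i, j), counts in model["pair"].items():
        p_ab = smoothed(counts[(record[i], record[j])], n)
        p_a = smoothed(model["single"][i][record[i]], n)
        p_b = smoothed(model["single"][j][record[j]], n)
        ratios.append(p_ab / (p_a * p_b))
    return min(ratios)


# Toy (country, language) training records, invented for illustration only.
train = [("US", "English")] * 50 + [("FR", "French")] * 50 + [("IS", "Icelandic")]
model = fit(train)

for rec in [("US", "English"), ("IS", "Icelandic"), ("US", "French")]:
    print(rec, round(record_likelihood(rec, model), 4), round(min_pair_ratio(rec, model), 4))
```

On this toy data, both the rare but self-consistent record ("IS", "Icelandic") and the odd combination ("US", "French") receive a low record likelihood, so a likelihood threshold would flag both. Only the odd combination has a ratio below 1, which is the kind of separation between rare values and unusual co-occurrences that the abstract argues for.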
Bayesian Biosurveillance of Disease Outbreaks
Cooper et al., UAI 2004
Abstract
Early, reliable detection of disease outbreaks is a critical problem today. This paper reports an investigation of the use of causal Bayesian networks to model spatiotemporal patterns of a noncontagious disease (respiratory anthrax infection) in a population of people. The number of parameters in such a network can become enormous, if not carefully managed. Also, inference needs to be performed in real time as population data stream in. We describe techniques we have applied to address both the modeling and inference challenges. A key contribution of this paper is the explication of assumptions and techniques that are sufficient to allow the scaling of Bayesian network modeling and inference to millions of nodes for real-time surveillance applications. The results reported here provide a proof-of-concept that Bayesian networks can serve as the foundation of a system that effectively performs Bayesian biosurveillance of disease outbreaks.